<h1>A game of tokens: write an interpreter in Python with TDD - Part 5</h1>
<p>Leonardo Giordani, 2020-08-09, <a href="https://www.thedigitalcatonline.com/">The Digital Cat</a>: Adventures of a curious cat in the land of programming</p>
<h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2>
<p>This is part 5 of <a href="https://www.thedigitalcatonline.com/blog/2017/05/09/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-1/">A game of tokens</a>, a series of posts where I build an interpreter in Python following a pure TDD methodology and engaging you in a sort of a game: I give you the tests and you have to write the code that passes them. After part 4 I had a long hiatus because I focused on other projects, but now I have resurrected this series and I'm moving on.</p>
<p>First of all I reviewed the first 4 posts, merging the posts that contained the solutions. While this is definitely better for me, I think it might be better for the reader as well, since it should make the series easier to follow. Remember however that you learn if you do, not if you read!</p>
<p>Secondly, I was wondering in which direction to go, and I decided to shamelessly follow the steps of Ruslan Spivak, who first inspired this set of posts and who set off to build a Pascal interpreter; you can find the impressive series of posts Ruslan wrote on <a href="https://ruslanspivak.com">his website</a>. Thank you Ruslan for the great posts!</p>
<p>So, let's go Pascal!</p>
<h2 id="tools-update">Tools update<a class="headerlink" href="#tools-update" title="Permanent link">¶</a></h2>
<p>I introduced black into my development toolset, so I used it to reformat the code</p>
<div class="highlight"><pre><span></span><code>black<span class="w"> </span>smallcalc/*.py<span class="w"> </span>tests/*.py
</code></pre></div>
<p>I also added a configuration file <code>.flake8</code> for Flake8 to prevent the two tools from clashing</p>
<div class="highlight"><pre><span></span><code><span class="k">[flake8]</span>
<span class="c1"># Recommend matching the black line length (default 88),</span>
<span class="c1"># rather than using the flake8 default of 79:</span>
<span class="na">max-line-length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">100</span>
<span class="na">ignore</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">E231 E741</span>
</code></pre></div>
<h2 id="level-17-reserved-keywords-and-new-assignment">Level 17 - Reserved keywords and new assignment<a class="headerlink" href="#level-17-reserved-keywords-and-new-assignment" title="Permanent link">¶</a></h2>
<p>Since Pascal has reserved keywords, I need tokens that have the keyword itself as value (something similar to Erlang's atoms). For this reason I changed <code>test_empty_token_has_length_zero</code> into</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_empty_token_has_the_length_of_the_type_itself</span><span class="p">():</span>
    <span class="n">t</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="s2">"sometype"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="o">==</span> <span class="nb">len</span><span class="p">(</span><span class="s2">"sometype"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="nb">bool</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
</code></pre></div>
<p>and modified the code in the class <code>Token</code> to pass it</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="fm">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="k">else</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">)</span>
</code></pre></div>
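<p>To see the fallback in isolation, here is a minimal sketch of a token class whose length comes from the type when there is no value. It is not the real <code>Token</code> from <code>smallcalc</code>, which was built in the previous posts: the constructor signature and the <code>__eq__</code> method below are assumptions made only to keep the example self-contained.</p>

```python
class Token:
    """Simplified stand-in for smallcalc's Token (an assumption, not the real class)."""

    def __init__(self, _type, value=None):
        self.type = _type
        # Keyword-like tokens carry no value: the type is all they need
        self.value = str(value) if value is not None else None

    def __len__(self):
        # Fall back to the type so that a token without a value is never "empty"
        return len(self.value) if self.value else len(self.type)

    def __eq__(self, other):
        return (self.type, self.value) == (other.type, other.value)


keyword = Token("BEGIN")
name = Token("NAME", "counter")
```

<p>Since <code>bool()</code> falls back to <code>__len__</code> when a class does not define <code>__bool__</code>, in this sketch any token with a non-empty type is truthy, which is what the second assertion of the test checks.</p>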
<p>The keywords I will introduce in this post are <code>BEGIN</code> and <code>END</code>, so I need a test that shows they are supported</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_begin_and_end</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN END"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">),</span>
    <span class="p">]</span>
</code></pre></div>
<p>The block <code>BEGIN ... END</code> is a generic compound block in Pascal (more on this later), and a Pascal program is made of that plus a final dot. Since the dot is already used for floats I need a test that shows it is correctly lexed.</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_final_dot</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN END."</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">DOT</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">),</span>
    <span class="p">]</span>
</code></pre></div>
<p>Last, Pascal assignments are slightly different from what we already implemented, as they use the symbol <code>:=</code> instead of just <code>=</code>. We face a choice here, as we have to decide where to put the logic of our programming language: shall the lexer identify <code>:</code> and <code>=</code> separately, and let the parser deal with the two tokens in sequence, or shall we make the lexer emit an <code>ASSIGNMENT</code> token directly? I went for the first option, so that the lexer can be kept simple (no lookahead in it), but you are obviously free to try something different. For me the test that checks the assignment is</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_assignment_and_semicolon</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"a := 5;"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">NAME</span><span class="p">,</span> <span class="s2">"a"</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">":"</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">"="</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s2">"5"</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">),</span>
    <span class="p">]</span>
</code></pre></div>
<p>You may have noticed I also decided to check for the semicolon in this test. Here too, we might discuss whether it's meaningful to test two different things together. Generally speaking I'm in favour of a high granularity in tests, which however means that I try to avoid testing <em>unrelated</em> and <em>complicated</em> features together. In Pascal, the semicolon is used to separate statements, so it is likely to be found at the end of something like an assignment. For this reason, and considering that it's a small feature, I put it in a context inside this test, and will extract it if more complex requirements arise in the future.</p>
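<p>Going back to the choice between two literal tokens and a dedicated assignment token, this is a rough sketch of what the second option might look like. It is not part of <code>smallcalc</code>: the function <code>tokenize</code>, the token type <code>ASSIGNMENT</code>, and the tuple-based tokens are all made up for this example, which follows the usual "one big regular expression" approach.</p>

```python
import re

# Order matters: ":=" has to be tried before the single-character literals
TOKEN_SPEC = [
    ("ASSIGNMENT", r":="),
    ("NAME", r"[a-zA-Z_]+"),
    ("INTEGER", r"[0-9]+"),
    ("LITERAL", r"[:=;+\-*/()]"),
    ("SKIP", r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (type, value) pairs, dropping whitespace."""
    tokens = []
    for match in MASTER_RE.finditer(text):
        # lastgroup is the name of the alternative that matched
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens


tokens = tokenize("a := 5;")
```

<p>Here the lookahead is hidden inside the regular expression engine, which tries <code>:=</code> before falling back to a lone <code>:</code>. Note that this sketch silently skips characters it does not recognise, something a real lexer should probably report.</p>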
<p>The parser has to be changed to support the new assignment, and to do that we first need to change the tests. The symbol <code>=</code> has to be replaced with <code>:=</code> in the following tests: <code>test_parse_assignment</code>, <code>test_parse_assignment_with_expression</code>, <code>test_parse_assignment_expression_with_variables</code>, and <code>test_parse_line_supports_assigment</code>.</p>
<h3 id="solution">Solution<a class="headerlink" href="#solution" title="Permanent link">¶</a></h3>
<p>Supporting reserved keywords is just a matter of defining specific token types for them</p>
<div class="highlight"><pre><span></span><code><span class="n">BEGIN</span> <span class="o">=</span> <span class="s2">"BEGIN"</span>
<span class="n">END</span> <span class="o">=</span> <span class="s2">"END"</span>
<span class="n">RESERVED_KEYWORDS</span> <span class="o">=</span> <span class="p">[</span><span class="n">BEGIN</span><span class="p">,</span> <span class="n">END</span><span class="p">]</span>
</code></pre></div>
<p>and changing the method <code>_process_name</code> in order to detect them</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">_process_name</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="n">regexp</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s2">"[a-zA-Z_]+"</span><span class="p">)</span>
    <span class="n">match</span> <span class="o">=</span> <span class="n">regexp</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">tail</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">match</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="n">token_string</span> <span class="o">=</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">token_string</span> <span class="ow">in</span> <span class="n">RESERVED_KEYWORDS</span><span class="p">:</span>
        <span class="n">tok</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">token_string</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">tok</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">NAME</span><span class="p">,</span> <span class="n">token_string</span><span class="p">)</span>
    <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span><span class="n">tok</span><span class="p">)</span>
</code></pre></div>
<p>I decided to put the logic in this method because after all reserved keywords are exactly names with a specific meaning. I might have created a dedicated method <code>_process_keyword</code> but it would basically have been a copy of <code>_process_name</code> so this solution makes sense to me.</p>
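<p>The same idea can be tried outside the lexer with a small standalone function. This is only a sketch: <code>classify</code> is a hypothetical helper that returns plain tuples instead of <code>Token</code> objects, so that it can run without the rest of <code>smallcalc</code>.</p>

```python
import re

BEGIN = "BEGIN"
END = "END"
NAME = "NAME"
RESERVED_KEYWORDS = [BEGIN, END]

NAME_RE = re.compile(r"[a-zA-Z_]+")


def classify(text):
    """Return (type, value) for the name at the start of text, or None."""
    match = NAME_RE.match(text)
    if not match:
        return None
    word = match.group()
    if word in RESERVED_KEYWORDS:
        # Reserved keywords carry no value: the type says it all
        return (word, None)
    return (NAME, word)
```

<p>Two details are worth noticing. The regular expression is greedy, so <code>BEGINNING</code> is matched as a whole and ends up being a plain name. The comparison is also case sensitive, so <code>begin</code> is a plain name here; Pascal itself is case insensitive, so a more complete implementation would have to deal with that at some point.</p>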
<p>To support the final dot I added a token for it</p>
<div class="highlight"><pre><span></span><code><span class="n">DOT</span> <span class="o">=</span> <span class="s2">"DOT"</span>
</code></pre></div>
<p>and a processing method</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_dot</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">regexp</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s2">"\.$"</span><span class="p">)</span>
        <span class="n">match</span> <span class="o">=</span> <span class="n">regexp</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">tail</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">match</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">DOT</span><span class="p">))</span>
</code></pre></div>
<p>which is then introduced with a high priority in <code>get_token</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">get_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">eof</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eof</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eof</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eof</span>
        <span class="n">eol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eol</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eol</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eol</span>
        <span class="n">dot</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_dot</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">dot</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">dot</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_process_whitespace</span><span class="p">()</span>
        <span class="n">name</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_name</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">name</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">name</span>
        <span class="n">number</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_number</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">number</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">number</span>
        <span class="n">literal</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_literal</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">literal</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">literal</span>
</code></pre></div>
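<p>The pattern <code>\.$</code> deserves a closer look, as it is what keeps the final dot apart from the dot used in floats. The sketch below isolates it in a hypothetical helper <code>is_final_dot</code>, assuming that <code>tail</code> is the text of the current line that has not been processed yet, which is how I read the code above.</p>

```python
import re

DOT_RE = re.compile(r"\.$")


def is_final_dot(tail):
    """True if the remaining text is a dot and nothing else."""
    # match() anchors at the beginning of the string, "$" at the end
    return DOT_RE.match(tail) is not None
```

<p>Since <code>match</code> only looks at the beginning of the string and <code>$</code> requires the end, the pattern succeeds only when the dot is the last thing left. Something like <code>.5</code> or <code>5.</code> is therefore left to the other processing methods. Keep in mind that in Python <code>$</code> also matches just before a trailing newline, so <code>".\n"</code> is accepted as well.</p>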
<p>To pass the parser tests I just need to change the implementation of <code>parse_assignment</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">parse_assignment</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="n">variable</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_variable</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">":"</span><span class="p">))</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">"="</span><span class="p">))</span>
    <span class="n">value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
    <span class="c1"># the method then builds and returns the assignment node, as before</span>
</code></pre></div>
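<p>To make the "two tokens in sequence" approach more concrete, here is a self-contained sketch that does not depend on <code>smallcalc</code>. The class <code>TokenStream</code> is a hypothetical stand-in for the lexer, tokens are plain tuples, and the expression is reduced to a single integer; only the two calls to <code>discard</code> mirror the code above.</p>

```python
class TokenStream:
    """Hypothetical stand-in for the lexer: a list of (type, value) pairs."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._position = 0

    def get_token(self):
        token = self._tokens[self._position]
        self._position += 1
        return token

    def discard(self, expected):
        # Consume the next token, complaining if it is not the expected one
        found = self.get_token()
        if found != expected:
            raise ValueError(f"Expected {expected}, found {found}")


def parse_assignment(stream):
    """Parse NAME ':' '=' INTEGER into a plain dictionary."""
    _, variable = stream.get_token()
    stream.discard(("LITERAL", ":"))
    stream.discard(("LITERAL", "="))
    _, value = stream.get_token()
    return {"type": "assignment", "variable": variable, "value": int(value)}


node = parse_assignment(
    TokenStream([("NAME", "a"), ("LITERAL", ":"), ("LITERAL", "="), ("INTEGER", "5")])
)
```

<p>One consequence of this choice, at least in this sketch, is that the parser cannot tell <code>:=</code> from <code>: =</code>, because the whitespace is gone by the time the tokens reach it. The old syntax <code>a = 5</code> is rejected, as the first <code>discard</code> finds <code>=</code> where it expects <code>:</code>.</p>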
<h2 id="level-18-statements-and-compound-statements">Level 18 - Statements and compound statements<a class="headerlink" href="#level-18-statements-and-compound-statements" title="Permanent link">¶</a></h2>
<p>In Pascal a compound statement is a list of statements enclosed between <code>BEGIN</code> and <code>END</code>, so the final grammar we want to have in this post is</p>
<div class="highlight"><pre><span></span><code>compound_statement : BEGIN statement_list END
statement_list : statement | statement SEMI statement_list
statement : compound_statement | assignment_statement | empty
assignment_statement : variable ASSIGN expr
</code></pre></div>
<p>As you can see this is a recursive definition, as the <code>statement_list</code> contains one or more <code>statement</code>, and each of them can be a <code>compound_statement</code>. The following is indeed a valid Pascal program</p>
<div class="highlight"><pre><span></span><code><span class="k">BEGIN</span>
<span class="w">    </span><span class="k">BEGIN</span>
<span class="w">        </span><span class="k">BEGIN</span>
<span class="w">            </span><span class="nb">writeln</span><span class="p">(</span><span class="s">'Valid!'</span><span class="p">)</span>
<span class="w">        </span><span class="k">END</span>
<span class="w">    </span><span class="k">END</span>
<span class="k">END</span><span class="o">.</span>
</code></pre></div>
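<p>The recursion in the grammar maps quite naturally onto functions that call each other. The following is a rough sketch of that idea and not the solution of this level: it works on a plain list of strings instead of using the lexer, it treats <code>:=</code> as a single token, it reduces <code>expr</code> to a single token, and all the function names are made up for the example.</p>

```python
def parse_compound(tokens, position=0):
    """Parse BEGIN statement_list END, returning (node, next position)."""
    if tokens[position] != "BEGIN":
        raise SyntaxError(f"Expected BEGIN, found {tokens[position]}")
    statements, position = parse_statement_list(tokens, position + 1)
    if tokens[position] != "END":
        raise SyntaxError(f"Expected END, found {tokens[position]}")
    return {"type": "compound_statement", "statements": statements}, position + 1


def parse_statement_list(tokens, position):
    """Parse statements separated by ";", skipping the empty ones."""
    statements = []
    while True:
        node, position = parse_statement(tokens, position)
        if node is not None:
            statements.append(node)
        if tokens[position] != ";":
            return statements, position
        position += 1


def parse_statement(tokens, position):
    """Parse a compound statement, an assignment, or nothing at all."""
    if tokens[position] == "BEGIN":
        # This is where the recursion happens
        return parse_compound(tokens, position)
    if tokens[position + 1 : position + 2] == [":="]:
        # Assumption: the expression is a single token, to keep things short
        variable, value = tokens[position], tokens[position + 2]
        return {"type": "assignment", "variable": variable, "value": value}, position + 3
    return None, position  # the empty statement


tokens = "BEGIN BEGIN x := 5 ; y := 6 END ; END".split()
tree, _ = parse_compound(tokens)
```

<p>Each function returns the node it built together with the position of the next unread token, which is one simple way to share the state between recursive calls. The sketch does no bounds checking, so a missing <code>END</code> results in an <code>IndexError</code> rather than in a proper syntax error.</p>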
<p>Recursive algorithms are not simple, and it takes some time to tackle them properly. Let's try to implement one small feature at a time. The first test is that <code>parse_statement</code> should be able to parse assignments</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_statement_assignment</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"x := 5"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
        <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
        <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
    <span class="p">}</span>
</code></pre></div>
<p>In future, statements will be more than just assignments, so this test is the first of many others that we will eventually have for <code>parse_statement</code>. The second test we need is that a compound statement can contain an empty list of statements.</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_empty_compound_statement</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN END"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_compound_statement</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span> <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[]}</span>
</code></pre></div>
<p>After this is done, I want to test that the compound statement can contain a single statement</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_compound_statement_one_statement</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN x:= 5 END"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_compound_statement</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
        <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
            <span class="p">}</span>
        <span class="p">],</span>
    <span class="p">}</span>
</code></pre></div>
<p>and multiple statements separated by semicolon</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_compound_statement_multiple_statements</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN x:= 5; y:=6; z:=7 END"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_compound_statement</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
        <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"y"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">6</span><span class="p">},</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"z"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">7</span><span class="p">},</span>
            <span class="p">},</span>
        <span class="p">],</span>
    <span class="p">}</span>
</code></pre></div>
<h3 id="solution_1">Solution<a class="headerlink" href="#solution_1" title="Permanent link">¶</a></h3>
<p>To pass the first test it is sufficient to add a method <code>parse_statement</code> that calls <code>parse_assignment</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>
</code></pre></div>
<p>The second test requires a bit more code. I need to define a method <code>parse_compound_statement</code> and this has to return a specific new type of node. A compound statement is a list of statements that have to be executed in order, so it's time to define a class <code>CompoundStatementNode</code></p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CompoundStatementNode</span><span class="p">(</span><span class="n">Node</span><span class="p">):</span>
    <span class="n">node_type</span> <span class="o">=</span> <span class="s2">"compound_statement"</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">statements</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">statements</span> <span class="o">=</span> <span class="n">statements</span> <span class="k">if</span> <span class="n">statements</span> <span class="k">else</span> <span class="p">[]</span>

    <span class="k">def</span> <span class="nf">asdict</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="s2">"type"</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">node_type</span><span class="p">,</span>
            <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span><span class="n">statement</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="k">for</span> <span class="n">statement</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">statements</span><span class="p">],</span>
        <span class="p">}</span>
</code></pre></div>
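<p>To see how the nested <code>asdict</code> calls produce the dictionaries used in the tests, here is a minimal standalone sketch. The classes <code>Node</code>, <code>IntegerNode</code> and <code>AssignmentNode</code> below are simplified stand-ins that I assume only for illustration; the real project has its own versions of them.</p>

```python
# Stand-in node classes, assumed for illustration only; the real project
# defines its own Node hierarchy in the parser module.
class Node:
    node_type = "node"

    def asdict(self):
        return {"type": self.node_type}


class IntegerNode(Node):
    node_type = "integer"

    def __init__(self, value):
        self.value = value

    def asdict(self):
        return {"type": self.node_type, "value": self.value}


class AssignmentNode(Node):
    node_type = "assignment"

    def __init__(self, variable, value):
        self.variable = variable
        self.value = value

    def asdict(self):
        return {
            "type": self.node_type,
            "variable": self.variable,
            "value": self.value.asdict(),
        }


class CompoundStatementNode(Node):
    node_type = "compound_statement"

    def __init__(self, statements=None):
        self.statements = statements if statements else []

    def asdict(self):
        # Each child serialises itself, so nesting comes for free
        return {
            "type": self.node_type,
            "statements": [s.asdict() for s in self.statements],
        }


inner = CompoundStatementNode([AssignmentNode("y", IntegerNode(6))])
outer = CompoundStatementNode([AssignmentNode("x", IntegerNode(5)), inner])
tree = outer.asdict()
```

<p>Since a compound statement only asks its children to serialise themselves, a compound statement that contains another compound statement needs no special treatment.</p>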
<p>and at this point <code>parse_compound_statement</code> is trivial, at least for now</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_compound_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">))</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">CompoundStatementNode</span><span class="p">()</span>
</code></pre></div>
<p>With the third test we have to add the processing of a single statement. As this is optional, it's a good use case for our lexer as a context manager</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_compound_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">))</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
            <span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
                <span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">CompoundStatementNode</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span>
</code></pre></div>
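<p>If you wonder why a context manager is a good fit for optional rules, the following is a minimal sketch of the idea. It assumes that the lexer saves its position when the <code>with</code> block starts and, if the block fails, rewinds and suppresses the error. The names <code>ToyLexer</code>, <code>TokenError</code> and <code>parse_optional_semicolon</code> are hypothetical and are not part of the project.</p>

```python
class TokenError(ValueError):
    """Hypothetical error raised when the expected token is not found."""


class ToyLexer:
    """Hypothetical lexer over a pre-split list of tokens (illustration only)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.position = 0
        self._saved = []

    def __enter__(self):
        # Remember where we were before the optional rule starts
        self._saved.append(self.position)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        saved = self._saved.pop()
        if exc_type is not None and issubclass(exc_type, TokenError):
            # The rule failed: rewind and suppress the exception
            self.position = saved
            return True
        return False

    def discard(self, expected):
        if self.position >= len(self.tokens) or self.tokens[self.position] != expected:
            raise TokenError(f"expected {expected!r}")
        self.position += 1


def parse_optional_semicolon(lexer):
    with lexer:
        lexer.discard(";")
        return True
    # Reached only if the block above failed and the lexer rewound
    return False


lexer = ToyLexer(["x", ";", "y"])
first = parse_optional_semicolon(lexer)   # "x" is not ";", nothing is consumed
position_after_failure = lexer.position
lexer.position = 1
second = parse_optional_semicolon(lexer)  # ";" is consumed
```

<p>The failed attempt leaves the lexer exactly where it was, which is what makes it safe to try a rule that might not apply.</p>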
<p>And finally, for the fourth test, I have to process optional further statements separated by semicolons. For this, I make use of the method <code>peek_token</code> to look ahead and see if there is another statement to process</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_compound_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">))</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
            <span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
                <span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
            <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">):</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">))</span>
                <span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
                <span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
                    <span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">CompoundStatementNode</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span>
</code></pre></div>
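<p>The look-ahead loop can be tried in isolation with a toy version that works on a plain list of strings instead of the real lexer. This is only a sketch under that assumption: here a "statement" is any token that is not a semicolon, and the function name and the token format are made up for the example.</p>

```python
from collections import deque


def parse_statement_list(tokens):
    """Parse `statement (";" statement)*` from a list of toy string tokens."""
    queue = deque(tokens)
    statements = []

    def parse_statement():
        # A toy "statement" is any token that is not a separator;
        # returning None models an empty statement
        if queue and queue[0] != ";":
            return queue.popleft()
        return None

    statement = parse_statement()
    if statement:
        statements.append(statement)

    # Look ahead without consuming: go on only while a ";" follows
    while queue and queue[0] == ";":
        queue.popleft()
        statement = parse_statement()
        if statement:
            statements.append(statement)

    return statements


three = parse_statement_list(["x:=5", ";", "y:=6", ";", "z:=7"])
trailing = parse_statement_list(["x:=5", ";"])
empty = parse_statement_list([])
```

<p>Peeking before discarding means that the loop never consumes a token it cannot handle, so whatever follows the list (in the parser, the token <code>END</code>) is left in place for the caller.</p>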
<h2 id="level-19-recursive-compound-statements">Level 19 - Recursive compound statements<a class="headerlink" href="#level-19-recursive-compound-statements" title="Permanent link">¶</a></h2>
<p>To verify that compound statements are actually recursive, we can add this test</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_compound_statement_multiple_statements_with_compund_statement</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN x:= 5; BEGIN y := 6 END ; z:=7 END"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_compound_statement</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
        <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
                <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
                    <span class="p">{</span>
                        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                        <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"y"</span><span class="p">,</span>
                        <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">6</span><span class="p">},</span>
                    <span class="p">}</span>
                <span class="p">],</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"z"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">7</span><span class="p">},</span>
            <span class="p">},</span>
        <span class="p">],</span>
    <span class="p">}</span>
</code></pre></div>
<p>where the second statement is a compound statement itself. After this is done we can test the visitor (<code>tests/test_calc_visitor.py</code>) and see if we can process single statements</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_compound_statement_one_statement</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
        <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
            <span class="p">}</span>
        <span class="p">],</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">None</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">5</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
</code></pre></div>
<p>then multiple statements</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_compound_statement_multiple_statements</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
        <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"y"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">6</span><span class="p">},</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"z"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">7</span><span class="p">},</span>
            <span class="p">},</span>
        <span class="p">],</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">None</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">5</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"y"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"y"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">6</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"y"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"z"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"z"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">7</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"z"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
</code></pre></div>
<p>and recursive compound statements</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_compound_statement_multiple_statements_with_compund_statement</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
        <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
                <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
                    <span class="p">{</span>
                        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                        <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"y"</span><span class="p">,</span>
                        <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">6</span><span class="p">},</span>
                    <span class="p">}</span>
                <span class="p">],</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"z"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">7</span><span class="p">},</span>
            <span class="p">},</span>
        <span class="p">],</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">None</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">5</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"y"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"y"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">6</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"y"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s2">"z"</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s2">"z"</span><span class="p">)</span> <span class="o">==</span> <span class="mi">7</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s2">"z"</span><span class="p">)</span> <span class="o">==</span> <span class="s2">"integer"</span>
</code></pre></div>
<h3 id="solution_2">Solution<a class="headerlink" href="#solution_2" title="Permanent link">¶</a></h3>
<p>Before I added the first test I quickly refactored the code to follow the grammar a bit more closely, introducing <code>parse_statement_list</code> and calling it from <code>parse_compound_statement</code>. This is just a matter of isolating the part of the code that deals with the list of statements in its own method</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_statement_list</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
            <span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">):</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">))</span>
            <span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
            <span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
                <span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nodes</span>

    <span class="k">def</span> <span class="nf">parse_compound_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">))</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="n">nodes</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement_list</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">CompoundStatementNode</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span>
</code></pre></div>
<p>After this I introduce the new test, and to pass it I need to change <code>parse_statement</code> so that it first tries to parse an assignment and, if that fails, parses a compound statement</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>

        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_compound_statement</span><span class="p">()</span>
</code></pre></div>
<p>Before I move to the visitor, I want to discuss a choice that I have here. The current version of the method <code>parse_statement_list</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_statement_list</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
            <span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
        <span class="k">while</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">):</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">))</span>
            <span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
            <span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
                <span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nodes</span>
</code></pre></div>
<p>might be easily written in a recursive way, to better match the grammar, becoming</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_statement_list</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">nodes</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="n">statement_node</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_statement</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">statement_node</span><span class="p">:</span>
            <span class="n">nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">statement_node</span><span class="p">)</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s2">";"</span><span class="p">))</span>
            <span class="n">nodes</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">parse_statement_list</span><span class="p">())</span>
        <span class="k">return</span> <span class="n">nodes</span>
</code></pre></div>
<p>If you replace the code you will see that all the tests pass, so the solution is technically correct. While recursive algorithms are elegant and compact, however, in this case I will stick to the first version. A recursive approach is bounded by the maximum recursion depth of the interpreter (1000 frames by default in CPython), so a long enough list of statements would make the parser fail; while in this little project we probably won't run into this issue, I think it is worth mentioning. Both solutions are correct, though, so feel free to choose the recursive path if you happen to like it more.</p>
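<p>To make the limit concrete, here is a minimal sketch that is not part of smallcalc: <code>collect_recursive</code> and <code>collect_iterative</code> are hypothetical helpers that mimic the shape of the two versions of <code>parse_statement_list</code> on a plain Python list, assuming the standard CPython behaviour of raising <code>RecursionError</code> when too many calls are nested.</p>

```python
import sys


def collect_recursive(statements, index=0):
    """Hypothetical helper: one nested call per statement."""
    if index >= len(statements):
        return []
    return [statements[index]] + collect_recursive(statements, index + 1)


def collect_iterative(statements):
    """Hypothetical helper: a loop, so the call depth stays constant."""
    nodes = []
    for statement in statements:
        nodes.append(statement)
    return nodes


# An input with more statements than the interpreter allows nested calls
many = ["x := 1"] * (sys.getrecursionlimit() * 2)

# The loop handles it without problems
assert collect_iterative(many) == many

# The recursive version gives up before reaching the end of the list
try:
    collect_recursive(many)
    recursive_ok = True
except RecursionError:
    recursive_ok = False
```

<p>For short inputs the two helpers return the same list, which is why the tests cannot tell the two versions apart.</p>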
<p>The tests for the visitor can be passed with a minimal change, as the visitor itself just needs to be aware of <code>compound_statement</code> nodes and to know how to process them. So, I added a new condition to the method <code>visit</code></p>
<div class="highlight"><pre><span></span><code>        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s2">"type"</span><span class="p">]</span> <span class="o">==</span> <span class="s2">"compound_statement"</span><span class="p">:</span>
            <span class="k">for</span> <span class="n">statement</span> <span class="ow">in</span> <span class="n">node</span><span class="p">[</span><span class="s2">"statements"</span><span class="p">]:</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">statement</span><span class="p">)</span>
</code></pre></div>
<p>which makes all three new visitor tests pass.</p>
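<p>If you want to see the idea in isolation, the following is a toy sketch and not the actual smallcalc visitor: <code>MiniVisitor</code> and its attribute <code>variables</code> are names I made up for this example, and the class supports only the three node types needed to walk nested compound statements.</p>

```python
class MiniVisitor:
    """Toy visitor (hypothetical, not the smallcalc one)."""

    def __init__(self):
        # Assumption: variables are stored in a plain dictionary
        self.variables = {}

    def visit(self, node):
        if node["type"] == "integer":
            return node["value"]

        if node["type"] == "assignment":
            self.variables[node["variable"]] = self.visit(node["value"])
            return None

        if node["type"] == "compound_statement":
            # A compound statement just visits its children in order;
            # nested compound statements are handled by the same branch
            for statement in node["statements"]:
                self.visit(statement)
            return None

        raise ValueError(f"Unknown node type: {node['type']}")


ast = {
    "type": "compound_statement",
    "statements": [
        {
            "type": "assignment",
            "variable": "x",
            "value": {"type": "integer", "value": 5},
        },
        {
            "type": "compound_statement",
            "statements": [
                {
                    "type": "assignment",
                    "variable": "y",
                    "value": {"type": "integer", "value": 6},
                }
            ],
        },
    ],
}

visitor = MiniVisitor()
visitor.visit(ast)
```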
<h2 id="level-20-pascal-programs-and-case-insensitive-names">Level 20 - Pascal programs and case insensitive names<a class="headerlink" href="#level-20-pascal-programs-and-case-insensitive-names" title="Permanent link">¶</a></h2>
<p>A Pascal program ends with a dot, so we should introduce a new endpoint <code>parse_program</code> and test that it works. The first test verifies that we can parse an empty program</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_empty_program</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN END."</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_program</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span> <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[]}</span>
</code></pre></div>
<p>and the second tests that the final dot can't be missing</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">pytest</span>
<span class="kn">from</span> <span class="nn">smallcalc.calc_lexer</span> <span class="kn">import</span> <span class="n">TokenError</span>


<span class="k">def</span> <span class="nf">test_parse_program_requires_the_final_dot</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN END"</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">pytest</span><span class="o">.</span><span class="n">raises</span><span class="p">(</span><span class="n">TokenError</span><span class="p">):</span>
        <span class="n">p</span><span class="o">.</span><span class="n">parse_program</span><span class="p">()</span>
</code></pre></div>
<p>Notice that I imported <code>pytest</code> and the <code>TokenError</code> exception to build a negative test (i.e. to test something that fails). The last test verifies a non-empty program can be parsed</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_program_with_nested_statements</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"BEGIN x:= 5; BEGIN y := 6 END ; z:=7 END."</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_program</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
        <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">5</span><span class="p">},</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"compound_statement"</span><span class="p">,</span>
                <span class="s2">"statements"</span><span class="p">:</span> <span class="p">[</span>
                    <span class="p">{</span>
                        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                        <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"y"</span><span class="p">,</span>
                        <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">6</span><span class="p">},</span>
                    <span class="p">}</span>
                <span class="p">],</span>
            <span class="p">},</span>
            <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
                <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"z"</span><span class="p">,</span>
                <span class="s2">"value"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span> <span class="s2">"value"</span><span class="p">:</span> <span class="mi">7</span><span class="p">},</span>
            <span class="p">},</span>
        <span class="p">],</span>
    <span class="p">}</span>
</code></pre></div>
<p>When all these tests pass we are almost done for this post, and we just need to make the parser treat names in a case-insensitive way. In Pascal, both variables and keywords are case-insensitive, so <code>BEGIN</code> and <code>begin</code> are the same keyword (as is <code>BeGiN</code>, though I think this might be a misinterpretation of the concept of "snake case" =) ), and the same holds for variables: you can define <code>MYVAR</code> and use <code>myvar</code>.</p>
<p>To test this behaviour I changed the test <code>test_get_tokens_understands_uppercase_letters</code> into <code>test_get_tokens_is_case_insensitive</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_is_case_insensitive</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"SomeVar"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">NAME</span><span class="p">,</span> <span class="s2">"somevar"</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">),</span>
    <span class="p">]</span>
</code></pre></div>
<p>and added the test for the two keywords we defined so far</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_begin_and_end_case_insensitive</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"begin end"</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">BEGIN</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">END</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">),</span>
    <span class="p">]</span>
</code></pre></div>
<h3 id="solution_3">Solution<a class="headerlink" href="#solution_3" title="Permanent link">¶</a></h3>
<p>To parse a program we need to introduce the aptly named endpoint <code>parse_program</code>, which just parses a compound statement (the program) and the final dot.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_program</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">compound_statement</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_compound_statement</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">DOT</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">compound_statement</span>
</code></pre></div>
<p>As for the case-insensitive names, it's just a matter of changing the method <code>_process_name</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_name</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">regexp</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s2">"[a-zA-Z_]+"</span><span class="p">)</span>
        <span class="n">match</span> <span class="o">=</span> <span class="n">regexp</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">tail</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">match</span><span class="p">:</span>
            <span class="k">return</span> <span class="kc">None</span>
        <span class="n">token_string</span> <span class="o">=</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">token_string</span><span class="o">.</span><span class="n">upper</span><span class="p">()</span> <span class="ow">in</span> <span class="n">RESERVED_KEYWORDS</span><span class="p">:</span>
            <span class="n">tok</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">token_string</span><span class="o">.</span><span class="n">upper</span><span class="p">())</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">tok</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">NAME</span><span class="p">,</span> <span class="n">token_string</span><span class="o">.</span><span class="n">lower</span><span class="p">())</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span><span class="n">tok</span><span class="p">)</span>
</code></pre></div>
<p>Note that I decided to store keywords internally in uppercase and variable names in lowercase. This is really just a matter of personal taste at this point of the project (and probably always will be), so feel free to follow the structure you like the most.</p>
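<p>The same normalisation can be tried outside the lexer with a small standalone sketch. The function <code>classify_name</code> and the tuples it returns are hypothetical and exist only for this example; the set <code>RESERVED_KEYWORDS</code> is assumed to contain just the two keywords defined so far.</p>

```python
import re

# Assumption: only the two keywords introduced so far
RESERVED_KEYWORDS = {"BEGIN", "END"}
NAME_REGEXP = re.compile(r"[a-zA-Z_]+")


def classify_name(text):
    """Return a (type, value) tuple for the name at the start of text.

    Hypothetical helper: keywords are normalised to uppercase and
    have no value, variable names are normalised to lowercase.
    """
    match = NAME_REGEXP.match(text)
    if not match:
        return None

    token_string = match.group()

    if token_string.upper() in RESERVED_KEYWORDS:
        return (token_string.upper(), None)

    return ("NAME", token_string.lower())


keyword = classify_name("BeGiN")
variable = classify_name("SomeVar")
not_a_name = classify_name("42")
```

<p>Whatever convention you choose, the important thing is that the comparison with the reserved keywords and the stored variable name both use a normalised form of the string.</p>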
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>That was something! I was honestly impressed by how easily I could introduce changes in the language and add new features, a testament to how powerful a tool the TDD methodology is to have in your belt. Thanks again to <a href="https://ruslanspivak.com/pages/about/">Ruslan Spivak</a> for his work and his inspiring posts!</p>
<p>The code I developed in this post is available on the GitHub repository tagged with <code>part5</code> (<a href="https://github.com/lgiordani/smallcalc/tree/part5">link</a>).</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>
A game of tokens: solution - Part 4 | 2018-06-02T13:30:00+00:00 | 2018-06-02T13:30:00+00:00 | Leonardo Giordani | tag:www.thedigitalcatonline.com,2018-06-02:/blog/2018/06/02/a-game-of-tokens-solution-part-4/
<p>This post originally contained my solution to the challenge posted <a href="https://www.thedigitalcatonline.com/blog/2018/06/02/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-4/">here</a>. I moved those solutions inside the post itself, under the "Solution" subsections.</p>
A game of tokens: write an interpreter in Python with TDD - Part 4 | 2018-06-02T13:00:00+00:00 | 2020-08-05T11:00:00+00:00 | Leonardo Giordani | tag:www.thedigitalcatonline.com,2018-06-02:/blog/2018/06/02/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-4/
<h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2>
<p>In the first three parts of this series of posts we developed a calculator together using a pure TDD methodology. In the <a href="https://www.thedigitalcatonline.com/blog/2017/10/31/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-3/">previous post</a> we added support for variables.</p>
<p>In this new post we will first add the exponentiation operation. The operator will be challenging because it has a high priority, so we will need to spice up the peek functions to look at multiple tokens.</p>
<p>Then I will show you how I performed a refactoring of the code introducing a new version of the lexer that greatly simplifies the code of the parser.</p>
<h2 id="level-15-exponentiation">Level 15 - Exponentiation<a class="headerlink" href="#level-15-exponentiation" title="Permanent link">¶</a></h2>
<p><em>That is power.</em> - Conan the Barbarian (1982)</p>
<p>The exponentiation operation is simple, and Python uses the double star operator to represent it</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="mi">2</span><span class="o">**</span><span class="mi">3</span>
<span class="mi">8</span>
</code></pre></div>
<p>The main problem that we will face implementing it is the priority of such an operation. Traditionally, this operator takes precedence over the basic arithmetic operations (addition, subtraction, multiplication, division). So if I write</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="mi">1</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">**</span> <span class="mi">3</span>
<span class="mi">9</span>
</code></pre></div>
<p>Python correctly computes <code>1 + (2 ** 3)</code> and not <code>(1 + 2) ** 3</code>. As we did with multiplication and division, then, we will need to create a specific step to parse this operation.</p>
<p>In smallcalc, the exponentiation will be associated with the symbol <code>^</code>, so <code>2^3</code> will mean 2 to the power of 3 (<code>2**3</code> in Python).</p>
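<p>To give an idea of what "a specific step" means, here is a rough standalone sketch of a recursive descent evaluator where each priority level has its own method. It is not the smallcalc parser: <code>MiniParser</code>, <code>tokenize</code>, and the choice of making <code>^</code> right-associative are all assumptions of this example, and it computes values directly instead of building nodes.</p>

```python
import re


def tokenize(text):
    """Hypothetical tokenizer: integers and single-character symbols."""
    return re.findall(r"\d+|[-+*/^()]", text)


class MiniParser:
    """Toy evaluator, one method per priority level (lowest first)."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse_expression(self):
        # Lowest priority: addition and subtraction
        value = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            right = self.parse_term()
            value = value + right if op == "+" else value - right
        return value

    def parse_term(self):
        # Multiplication and division work on exponentiations
        value = self.parse_exponentiation()
        while self.peek() in ("*", "/"):
            op = self.advance()
            right = self.parse_exponentiation()
            value = value * right if op == "*" else value / right
        return value

    def parse_exponentiation(self):
        # The dedicated step: it sits between term and factor
        left = self.parse_factor()
        if self.peek() == "^":
            self.advance()
            # Assumption: right-associative, so 2^3^2 is 2^(3^2)
            return left ** self.parse_exponentiation()
        return left

    def parse_factor(self):
        # Highest priority: integers, unary minus, parentheses
        tok = self.advance()
        if tok == "(":
            value = self.parse_expression()
            self.advance()  # skip the closing parenthesis
            return value
        if tok == "-":
            return -self.parse_factor()
        return int(tok)


result = MiniParser("1 + 2 ^ 3").parse_expression()
```

<p>Since <code>parse_term</code> calls <code>parse_exponentiation</code> and not <code>parse_factor</code>, the power is computed before the multiplication or the sum gets a chance to see its operands.</p>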
<h3 id="lexer">Lexer<a class="headerlink" href="#lexer" title="Permanent link">¶</a></h3>
<p>The lexer has a simple task, that of recognising the symbol <code>'^'</code> as a <code>LITERAL</code> token. The test goes into <code>tests/test_calc_lexer.py</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_exponentiation</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'2 ^ 3'</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'2'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'^'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'3'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>
</code></pre></div>
<p>Does your code already pass the test? If yes, why?</p>
<h3 id="parser">Parser<a class="headerlink" href="#parser" title="Permanent link">¶</a></h3>
<p>It's time to test the proper parsing of the exponentiation operation. Add this test to <code>tests/test_calc_parser.py</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_exponentiation</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"2 ^ 3"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'exponentiation'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'^'</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
<p>As you can see this test directly checks the method <code>parse_exponentiation</code>, so at this stage you just need to implement that properly.</p>
<p>To allow the use of the exponentiation operator <code>'^'</code> in the calculator, however, we have to integrate it with the rest of the parse functions, so we will add three tests to the same file. The first one tests that the natural priority of the exponentiation operator is higher than that of multiplication</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_exponentiation_with_other_operators</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"3 * 2 ^ 3"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'*'</span>
        <span class="p">},</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'exponentiation'</span><span class="p">,</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
            <span class="p">},</span>
            <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
            <span class="p">},</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'^'</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
<p>The second one checks that the parentheses still change the priority</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_exponentiation_with_parenthesis</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"(3 + 2) ^ 3"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'exponentiation'</span><span class="p">,</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'^'</span>
        <span class="p">},</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
            <span class="p">},</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
            <span class="p">},</span>
            <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
            <span class="p">}</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
<p>And the third one checks that unary operators still have a higher priority than the exponentiation operator</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_exponentiation_with_negative_base</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"-2 ^ 2"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'exponentiation'</span><span class="p">,</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'^'</span>
        <span class="p">},</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'unary'</span><span class="p">,</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'-'</span>
            <span class="p">},</span>
            <span class="s1">'content'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
            <span class="p">}</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
<p>My advice is to add the first test and try to pass it by changing the code. If your change doesn't touch too much of the existing parse methods, chances are that the following two tests will pass as well.</p>
<h3 id="visitor">Visitor<a class="headerlink" href="#visitor" title="Permanent link">¶</a></h3>
<p>Last, we need to properly expose the exponentiation operation in the CLI, which means changing the visitor so that it supports nodes of type <code>'exponentiation'</code>. The test that we need to add to <code>tests/test_calc_visitor.py</code> is</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_exponentiation</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'exponentiation'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'^'</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="s1">'integer'</span><span class="p">)</span>
</code></pre></div>
<p>And the change to the <code>CalcVisitor</code> class should be very easy as we need to simply process a new type of node.</p>
<hr>
<h3 id="solution">Solution<a class="headerlink" href="#solution" title="Permanent link">¶</a></h3>
<p>The lexer can process the exponentiation operator <code>^</code> out of the box as a <code>LITERAL</code> token, so no changes to the code are needed.</p>
<p>The test <code>test_parse_exponentiation</code> can be passed by adding a <code>PowerNode</code> class.</p>
<p>Note: After I wrote and committed the solution I realised that the class called <code>PowerNode</code> should have been called <code>ExponentiationNode</code>, the former being a leftover of a previous incorrect nomenclature. I will eventually fix it in one of the refactoring steps, trying to convert a mistake into a good example of TDD.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">PowerNode</span><span class="p">(</span><span class="n">BinaryNode</span><span class="p">):</span>
    <span class="n">node_type</span> <span class="o">=</span> <span class="s1">'exponentiation'</span>
</code></pre></div>
<p>and a method <code>parse_exponentiation</code> to the parser</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_exponentiation</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="o">==</span> <span class="s1">'^'</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_symbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">PowerNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
<p>This allows the parser to explicitly parse the exponentiation operation, but when the operation is mixed with others the parser doesn't know how to deal with it, as <code>parse_exponentiation</code> is not called by any other method.</p>
<p>To pass the <code>test_parse_exponentiation_with_other_operators</code> test we need to change the method <code>parse_term</code> and call <code>parse_exponentiation</code> instead of <code>parse_factor</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">while</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span>\
                <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'*'</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">]:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_symbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
            <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
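<p>To see why calling <code>parse_exponentiation</code> from <code>parse_term</code> is enough, here is a minimal stand-alone sketch of the same idea. It is not the project's code: the <code>TinyParser</code> class, its plain-list tokens and its tuple nodes are assumptions made up for this illustration only. It shows that recursing on the right operand makes <code>^</code> right-associative, and that delegating from <code>parse_term</code> makes <code>^</code> bind tighter than <code>*</code> and <code>/</code>.</p>

```python
# Stand-alone sketch (hypothetical TinyParser, not the CalcParser API).
# Tokens are plain integers and strings; nodes are (operator, left, right).


class TinyParser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        # Return the next token without consuming it, None at the end
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def get(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse_factor(self):
        # Only integers here; the real method also handles unary
        # operators, parentheses and variables
        return self.get()

    def parse_exponentiation(self):
        left = self.parse_factor()
        if self.peek() == '^':
            operator = self.get()
            # Recursing on the right operand makes '^' right-associative
            right = self.parse_exponentiation()
            return (operator, left, right)
        return left

    def parse_term(self):
        # Delegating to parse_exponentiation gives '^' a higher
        # precedence than '*' and '/'
        left = self.parse_exponentiation()
        while self.peek() in ('*', '/'):
            operator = self.get()
            right = self.parse_exponentiation()
            left = (operator, left, right)
        return left


print(TinyParser([2, '^', 3, '^', 2]).parse_term())  # ('^', 2, ('^', 3, 2))
print(TinyParser([3, '*', 2, '^', 3]).parse_term())  # ('*', 3, ('^', 2, 3))
```

<p>The loop in <code>parse_term</code> builds left-nested trees for <code>*</code> and <code>/</code>, while the recursion in <code>parse_exponentiation</code> builds right-nested trees for <code>^</code>.</p>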
<p>The full code of the <code>CalcParser</code> class is now</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CalcParser</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">_parse_symbol</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">LiteralNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">parse_integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">IntegerNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">_parse_variable</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">VariableNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">parse_factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'-'</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">]:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_symbol</span><span class="p">()</span>
            <span class="n">factor</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">UnaryNode</span><span class="p">(</span><span class="n">operator</span><span class="p">,</span> <span class="n">factor</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="o">==</span> <span class="s1">'('</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard_type</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">)</span>
            <span class="n">expression</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard_type</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">)</span>
            <span class="k">return</span> <span class="n">expression</span>
        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">NAME</span><span class="p">:</span>
            <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">VariableNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">parse_exponentiation</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="o">==</span> <span class="s1">'^'</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_symbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">PowerNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">left</span>

    <span class="k">def</span> <span class="nf">parse_term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">while</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span>\
                <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'*'</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">]:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_symbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
            <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">left</span>

    <span class="k">def</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">while</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span>\
                <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'+'</span><span class="p">,</span> <span class="s1">'-'</span><span class="p">]:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_symbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
            <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">left</span>

    <span class="k">def</span> <span class="nf">parse_assignment</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">variable</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_variable</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'='</span><span class="p">))</span>
        <span class="n">value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">AssignmentNode</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">parse_line</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">stash</span><span class="p">()</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>
        <span class="k">except</span> <span class="n">clex</span><span class="o">.</span><span class="n">TokenError</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
</code></pre></div>
<p>The given test <code>test_visitor_exponentiation</code> requires the <code>CalcVisitor</code> to process nodes of type <code>exponentiation</code>. The code required to do this is</p>
<div class="highlight"><pre><span></span><code>        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'exponentiation'</span><span class="p">:</span>
            <span class="n">lvalue</span><span class="p">,</span> <span class="n">ltype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'left'</span><span class="p">])</span>
            <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'right'</span><span class="p">])</span>
            <span class="k">return</span> <span class="n">lvalue</span> <span class="o">**</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">ltype</span>
</code></pre></div>
<p>as a final case for the <code>CalcVisitor</code> class. The full code of the class is now</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CalcVisitor</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">variables</span> <span class="o">=</span> <span class="p">{}</span>

    <span class="k">def</span> <span class="nf">isvariable</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">name</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">variables</span>

    <span class="k">def</span> <span class="nf">valueof</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">variables</span><span class="p">[</span><span class="n">name</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">typeof</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">variables</span><span class="p">[</span><span class="n">name</span><span class="p">][</span><span class="s1">'type'</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'integer'</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">],</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'variable'</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]),</span> <span class="bp">self</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">])</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'unary'</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
            <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'content'</span><span class="p">])</span>
            <span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
                <span class="k">return</span> <span class="o">-</span> <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span>
            <span class="k">return</span> <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'binary'</span><span class="p">:</span>
            <span class="n">lvalue</span><span class="p">,</span> <span class="n">ltype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'left'</span><span class="p">])</span>
            <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'right'</span><span class="p">])</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
            <span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'+'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">+</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">-</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'*'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">*</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'/'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">//</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'assignment'</span><span class="p">:</span>
            <span class="n">right_value</span><span class="p">,</span> <span class="n">right_type</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">])</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">variables</span><span class="p">[</span><span class="n">node</span><span class="p">[</span><span class="s1">'variable'</span><span class="p">]]</span> <span class="o">=</span> <span class="p">{</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="n">right_value</span><span class="p">,</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="n">right_type</span>
            <span class="p">}</span>
            <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'exponentiation'</span><span class="p">:</span>
            <span class="n">lvalue</span><span class="p">,</span> <span class="n">ltype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'left'</span><span class="p">])</span>
            <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'right'</span><span class="p">])</span>
            <span class="k">return</span> <span class="n">lvalue</span> <span class="o">**</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">ltype</span>
</code></pre></div>
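<p>As a quick check of what the exponentiation case computes, here is a small stand-alone sketch that evaluates the same kind of dictionaries. The function <code>evaluate</code> and the helpers <code>integer</code> and <code>power</code> are hypothetical names invented for this example, not part of the project; only the node layout and the <code>lvalue ** rvalue, ltype</code> rule come from the code above.</p>

```python
# Stand-alone sketch: evaluate() and the helpers below are hypothetical,
# they only mirror the node layout used by the tests in this post.


def evaluate(node):
    """Evaluate a dict AST made of integer, unary and exponentiation nodes."""
    if node['type'] == 'integer':
        return node['value'], 'integer'
    if node['type'] == 'unary':
        value, value_type = evaluate(node['content'])
        if node['operator']['value'] == '-':
            return -value, value_type
        return value, value_type
    if node['type'] == 'exponentiation':
        lvalue, ltype = evaluate(node['left'])
        rvalue, _ = evaluate(node['right'])
        # Same rule as the visitor: the result keeps the type of the base
        return lvalue ** rvalue, ltype
    raise ValueError(f"unsupported node type: {node['type']}")


def integer(value):
    return {'type': 'integer', 'value': value}


def power(left, right):
    return {
        'type': 'exponentiation',
        'left': left,
        'right': right,
        'operator': {'type': 'literal', 'value': '^'},
    }


minus_two = {
    'type': 'unary',
    'operator': {'type': 'literal', 'value': '-'},
    'content': integer(2),
}

print(evaluate(power(integer(2), integer(3))))  # (8, 'integer')
print(evaluate(power(minus_two, integer(2))))   # (4, 'integer')
print(evaluate(power(integer(2), minus_two)))   # (0.25, 'integer')
```

<p>The second call uses the tree that the parser test expects for <code>-2 ^ 2</code>, where the unary minus is the left operand, so the result is 4. The third call shows a side effect of relying on Python's <code>**</code>: with a negative exponent the value becomes a float while the type label stays <code>'integer'</code>. None of the tests given here covers that case.</p>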
<hr>
<h2 id="intermezzo-refactoring-with-tests">Intermezzo - Refactoring with tests<a class="headerlink" href="#intermezzo-refactoring-with-tests" title="Permanent link">¶</a></h2>
<p><em>See? You just had to see it in context.</em> - Scrooged (1988)</p>
<p>So our little project is growing, and the TDD methodology we are following gives us plenty of confidence in what we did. There are surely bugs we are not aware of, but we can be confident that the cases we tested are handled correctly by our code.</p>
<p>As happens in many projects, at a certain point it's time for refactoring. We implemented solutions to the problems that we found along the way, but are we sure we avoided duplicating code, that we chose the best solution for some algorithms, or more simply that the names we chose for the variables are clear?</p>
<p>Refactoring basically means changing the internal structure of something without changing its external behaviour, and tests are a priceless help in this phase. The tests we wrote are there to ensure that the previous behaviour does not change. Or, if it changes, that we are perfectly aware of it.</p>
<p>In this section, thus, I want to guide you through a refactoring guided by tests. If you are following this series and writing your own code this section will not add anything to the project, but I recommend that you read it anyway, as it shows why tests are so important in a software project.</p>
<p>If you want to follow the refactoring on the repository you can create a branch on the tag <code>context-manager-refactoring</code> and work there. From that commit I implemented the steps you will find in the next sections.</p>
<h3 id="context-managers">Context managers<a class="headerlink" href="#context-managers" title="Permanent link">¶</a></h3>
<p>The main issue with the current code is that the lexer cannot automatically recover a past status, that is, we cannot easily try to parse something and, when we discover that the initial guess is wrong, go back in time and start over.</p>
<p>Let's consider the method <code>parse_line</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_line</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">stash</span><span class="p">()</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>
        <span class="k">except</span> <span class="n">clex</span><span class="o">.</span><span class="n">TokenError</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
</code></pre></div>
<p>Since a line can contain either an assignment or an expression, the first thing this function does is to save the lexer status with <code>stash</code> and try to parse an assignment. If the code is not an assignment, somewhere in the code the <code>TokenError</code> exception is raised, and <code>parse_line</code> restores the previous status of the lexer and tries to parse an expression.</p>
<p>The same thing happens in other methods like <code>parse_term</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">while</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span>\
                <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'*'</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">]:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_symbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
            <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
<p>where the use of <code>lexer.peek_token</code> and the <code>while</code> loop show that the lexer class requires too much control from its user.</p>
<p>Back to <code>parse_line</code>, it's clear that the code works, but it is not immediately easy to understand what the function does and when the old status is recovered. I would really prefer something like</p>
<div class="highlight"><pre><span></span><code><span class="c1"># PSEUDOCODE</span>
<span class="k">def</span> <span class="nf">parse_line</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="n">ATTEMPT</span><span class="p">:</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>
    <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
</code></pre></div>
<p>where I used a pseudo-keyword <code>ATTEMPT:</code> to signal that somehow the lexer status is automatically stored at the beginning and retrieved at the end of it.</p>
<p>There's a very powerful concept in Python that allows us to write code like this, and it is called <em>context manager</em>. I won't go into the theory and syntax of context managers here; please refer to the Python documentation or your favourite course/book/website to discover how context managers work.</p>
<p>If I can add context manager features to the lexer the code of <code>parse_line</code> might become</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_line</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
</code></pre></div>
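<p>The control flow of a <code>return</code> inside a <code>with</code> block followed by a second <code>return</code> can look surprising at first. The following is a minimal sketch, not the code of the project: the class <code>Attempt</code> and the function <code>parse</code> are hypothetical names, and <code>ValueError</code> plays the role that <code>TokenError</code> has in the parser.</p>

```python
class Attempt:
    """Hypothetical context manager that swallows ValueError only."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Returning True tells Python to suppress the exception, and
        # execution resumes with the first statement after the with block.
        return exc_type is not None and issubclass(exc_type, ValueError)


def parse(text):
    with Attempt():
        # First guess: the text is an integer.
        # int() raises ValueError when the guess is wrong.
        return ("integer", int(text))

    # Reached only when the first guess failed and the error was suppressed.
    return ("name", text)


first = parse("42")
second = parse("abc")
```

<p>When the first guess succeeds the function returns from inside the <code>with</code> block; when it fails, <code>__exit__</code> suppresses the exception and the second <code>return</code> is executed.</p>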
<h3 id="lexer_1">Lexer<a class="headerlink" href="#lexer_1" title="Permanent link">¶</a></h3>
<p>The first move is to transform the lexer into a context manager that does nothing. The test in <code>tests/test_calc_lexer.py</code> is</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_lexer_as_context_manager</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'abcd'</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">l</span><span class="p">:</span>
        <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">NAME</span><span class="p">,</span> <span class="s1">'abcd'</span><span class="p">)</span>
</code></pre></div>
<p>When this works we have to be sure that the lexer does not restore the previous state outside the <code>with</code> statement if the code inside the statement ended without errors. The new test is</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_lexer_as_context_manager_does_not_restore_the_status_if_no_error</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 * 5'</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">l</span><span class="p">:</span>
        <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'*'</span><span class="p">)</span>
</code></pre></div>
<p>Conversely, we need to be sure that the status is restored when the code inside the <code>with</code> statement fails, which is the whole point of the context manager. This is tested by</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_lexer_as_context_manager_restores_the_status_if_token_error</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 * 5'</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">l</span><span class="p">:</span>
        <span class="n">l</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="n">l</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">raise</span> <span class="n">clex</span><span class="o">.</span><span class="n">TokenError</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
</code></pre></div>
<p>When these three tests pass, we have a fully working context manager lexer, which reacts to <code>TokenError</code> exceptions by going back to the previously stored status.</p>
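<p>To make the expected behaviour concrete, here is a rough sketch of how a lexer could save and restore its status through the context manager protocol. It is only an illustration under stated assumptions: <code>ToyLexer</code> is a hypothetical class that splits the input on whitespace and returns plain strings, the local <code>TokenError</code> is a stand-in for <code>clex.TokenError</code>, and the real <code>CalcLexer</code> may well implement <code>stash</code> and <code>pop</code> differently.</p>

```python
class TokenError(Exception):
    """Stand-in for clex.TokenError."""


class ToyLexer:
    """Hypothetical lexer that splits on whitespace; not the real CalcLexer."""

    def __init__(self):
        self._tokens = []
        self._position = 0
        self._stack = []

    def load(self, text):
        self._tokens = text.split()
        self._position = 0

    def get_token(self):
        if self._position >= len(self._tokens):
            raise TokenError("no more tokens")
        token = self._tokens[self._position]
        self._position += 1
        return token

    def stash(self):
        # Save the current status (here just the position in the input).
        self._stack.append(self._position)

    def pop(self):
        # Go back to the most recently saved status.
        self._position = self._stack.pop()

    def __enter__(self):
        self.stash()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None and issubclass(exc_type, TokenError):
            # The attempt failed: restore the status, suppress the error.
            self.pop()
            return True
        # Success or unrelated exception: discard the saved status
        # and let Python continue (or propagate) as usual.
        self._stack.pop()
        return False


lexer = ToyLexer()
lexer.load("3 * 5")

with lexer:
    lexer.get_token()
    lexer.get_token()
    raise TokenError  # the status is rolled back to the beginning

after_failure = lexer.get_token()

with lexer:
    inside_success = lexer.get_token()

after_success = lexer.get_token()
```

<p>Keeping the saved positions in a stack is a design choice of this sketch: it allows <code>with</code> blocks to be nested, which happens as soon as one parsing method that uses the context manager calls another one.</p>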
<h3 id="parser_1">Parser<a class="headerlink" href="#parser_1" title="Permanent link">¶</a></h3>
<p>If the context manager lexer works as intended we should be able to replace the code of the parser without changing any test. The new code for <code>parse_line</code> is the one I showed before</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_line</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
</code></pre></div>
<p>and it works flawlessly.</p>
<p>The context manager part of the lexer, however, works only if the code inside the <code>with</code> statement raises a <code>TokenError</code> exception when it fails. That exception is a signal to the context manager that the parsing path is not leading anywhere and that it shall go back to the previous state.</p>
<h4 id="manage-literals">Manage literals<a class="headerlink" href="#manage-literals" title="Permanent link">¶</a></h4>
<p>The method <code>_parse_symbol</code> is often used after some checks like</p>
<div class="highlight"><pre><span></span><code>        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'-'</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">]:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_symbol</span><span class="p">()</span>
</code></pre></div>
<p>I would prefer to include the checks in the method itself, so that it might be included in a <code>with</code> statement. First of all the method can be renamed to <code>_parse_literal</code>, and being an internal method I don't expect any test to fail</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_parse_literal</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">LiteralNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
</code></pre></div>
<p>The method should also raise a <code>TokenError</code> when the token is not a <code>LITERAL</code>, and when the values are not the expected ones</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_parse_literal</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">values</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">t</span><span class="o">.</span><span class="n">type</span> <span class="o">!=</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">clex</span><span class="o">.</span><span class="n">TokenError</span>
        <span class="k">if</span> <span class="n">values</span> <span class="ow">and</span> <span class="n">t</span><span class="o">.</span><span class="n">value</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">values</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">clex</span><span class="o">.</span><span class="n">TokenError</span>
        <span class="k">return</span> <span class="n">LiteralNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
</code></pre></div>
<p>Note that using the default value for the <code>values</code> parameter I didn't change the current behaviour. The whole battery of tests still passes without errors.</p>
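<p>The effect of the optional <code>values</code> parameter can be seen in isolation with a small sketch. The function <code>check_literal</code> is a hypothetical standalone version of the two checks, working on plain strings instead of tokens, and <code>LITERAL</code> and <code>TokenError</code> are local stand-ins for the names defined in <code>clex</code>.</p>

```python
class TokenError(Exception):
    """Stand-in for clex.TokenError."""


LITERAL = "literal"  # stand-in for clex.LITERAL


def check_literal(token_type, token_value, values=None):
    """Hypothetical standalone version of the checks in _parse_literal."""
    if token_type != LITERAL:
        raise TokenError
    # When values is None (or empty) any literal value is accepted,
    # which is why callers that pass no argument are not affected.
    if values and token_value not in values:
        raise TokenError
    return token_value


unrestricted = check_literal(LITERAL, "*")
restricted = check_literal(LITERAL, "-", ["+", "-"])
```

<p>Calling <code>check_literal(LITERAL, "*", ["+", "-"])</code> raises <code>TokenError</code>, which is exactly the signal a surrounding <code>with</code> statement needs to roll back the lexer.</p>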
<h4 id="parsing-factors">Parsing factors<a class="headerlink" href="#parsing-factors" title="Permanent link">¶</a></h4>
<p>The next method that we can start changing is <code>parse_factor</code>. The first pattern this function tries to parse is a unary node</p>
<div class="highlight"><pre><span></span><code>        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'-'</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">]:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_literal</span><span class="p">()</span>
            <span class="n">factor</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">UnaryNode</span><span class="p">(</span><span class="n">operator</span><span class="p">,</span> <span class="n">factor</span><span class="p">)</span>
</code></pre></div>
<p>which may be converted to</p>
<div class="highlight"><pre><span></span><code>        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_literal</span><span class="p">([</span><span class="s1">'+'</span><span class="p">,</span> <span class="s1">'-'</span><span class="p">])</span>
            <span class="n">content</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">UnaryNode</span><span class="p">(</span><span class="n">operator</span><span class="p">,</span> <span class="n">content</span><span class="p">)</span>
</code></pre></div>
<p>while still passing the whole test suite.</p>
<p>The second pattern is an expression surrounded by parentheses</p>
<div class="highlight"><pre><span></span><code>        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="o">==</span> <span class="s1">'('</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard_type</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">)</span>
            <span class="n">expression</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard_type</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">)</span>
            <span class="k">return</span> <span class="n">expression</span>
</code></pre></div>
<p>and this is easily converted to the new syntax as well</p>
<div class="highlight"><pre><span></span><code>        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_parse_literal</span><span class="p">([</span><span class="s1">'('</span><span class="p">])</span>
            <span class="n">expression</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_parse_literal</span><span class="p">([</span><span class="s1">')'</span><span class="p">])</span>
            <span class="k">return</span> <span class="n">expression</span>
</code></pre></div>
<h4 id="parsing-exponentiation-operations">Parsing exponentiation operations<a class="headerlink" href="#parsing-exponentiation-operations" title="Permanent link">¶</a></h4>
<p>To change the method <code>parse_exponentiation</code> we first need to make <code>_parse_variable</code> raise a <code>TokenError</code> in case of a wrong token</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_parse_variable</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">t</span><span class="o">.</span><span class="n">type</span> <span class="o">!=</span> <span class="n">clex</span><span class="o">.</span><span class="n">NAME</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">clex</span><span class="o">.</span><span class="n">TokenError</span>
        <span class="k">return</span> <span class="n">VariableNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
</code></pre></div>
<p>then we can change the method we are interested in</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_exponentiation</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_literal</span><span class="p">([</span><span class="s1">'^'</span><span class="p">])</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">PowerNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
<p>Doing this last substitution I also notice that there is some duplicated code in <code>parse_factor</code>, and I replace it with a call to <code>_parse_variable</code>. The test suite keeps passing, giving me the certainty that what I did does not change the behaviour of the code (at least the behaviour that is covered by tests).</p>
<h4 id="parsing-terms">Parsing terms<a class="headerlink" href="#parsing-terms" title="Permanent link">¶</a></h4>
<p>Now, the method <code>parse_term</code> will be problematic. To implement this function I used a <code>while</code> loop that keeps calling the method <code>parse_exponentiation</code> as long as the separation token is a <code>LITERAL</code> with value <code>*</code> or <code>/</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">while</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span>\
                <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'*'</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">]:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_literal</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
            <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
<p>This is not a pure recursive call, then, and replacing the code with the context manager lexer would result in errors, because the context manager doesn't loop. The same situation is replicated in <code>parse_expression</code>.</p>
<p>This is another typical situation that we face when refactoring code. We realise that the required change is made of multiple steps and that multiple tests will fail until all the steps are completed.</p>
<p>There is no single solution to this problem, but TDD gives you some hints to deal with it. The most important "rule" that I follow when I work in a TDD environment is that there should be at most one failing test at a time. And when a code change makes multiple tests fail there is a simple way to reach this situation: comment out tests.</p>
<p>Yes, you should temporarily get rid of tests, so you can concentrate on writing code that passes the subset of active tests. Then you will add one test at a time, fixing the code or the tests according to your needs. When you refactor it might be necessary to change the tests as well, as sometimes we test parts of the code that are not exactly an external boundary.</p>
<p>I can now change the code of the <code>parse_term</code> function introducing the context manager</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_literal</span><span class="p">([</span><span class="s1">'*'</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">])</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
<p>and the test suite runs with one single failing test, <code>test_parse_term_with_multiple_operations</code>. I now have to work on it and try to understand why the test fails.</p>
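<p>One way to investigate a failure like this is to reproduce it on a toy scale. The sketch below is not the project's code: <code>ToyLexer</code>, <code>parse_operand</code>, <code>parse_literal</code> and <code>parse_term_single_attempt</code> are hypothetical names, tokens are plain strings, and nodes are tuples. Under these assumptions it shows that a single <code>with</code> block consumes at most one operator and leaves the rest of the input unparsed.</p>

```python
class TokenError(Exception):
    """Stand-in for clex.TokenError."""


class ToyLexer:
    """Hypothetical whitespace-splitting lexer with rollback on TokenError."""

    def __init__(self, text):
        self.tokens = text.split()
        self.position = 0
        self.saved = []

    def get_token(self):
        if self.position >= len(self.tokens):
            raise TokenError
        token = self.tokens[self.position]
        self.position += 1
        return token

    def __enter__(self):
        self.saved.append(self.position)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        saved = self.saved.pop()
        if exc_type is not None and issubclass(exc_type, TokenError):
            self.position = saved
            return True
        return False


def parse_operand(lexer):
    token = lexer.get_token()
    if not token.isdigit():
        raise TokenError
    return int(token)


def parse_literal(lexer, values):
    token = lexer.get_token()
    if token not in values:
        raise TokenError
    return token


def parse_term_single_attempt(lexer):
    left = parse_operand(lexer)
    with lexer:
        # This block runs once: there is nothing here that repeats it.
        operator = parse_literal(lexer, ["*", "/"])
        right = parse_operand(lexer)
        return (left, operator, right)
    return left


lexer = ToyLexer("2 * 3 / 4")
tree = parse_term_single_attempt(lexer)
remaining = lexer.tokens[lexer.position:]
```

<p>In this toy version <code>tree</code> contains only the multiplication, while <code>remaining</code> still holds the division operator and its operand.</p>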
<p>I decided to go for a pure recursive approach (no more <code>while</code> loops), which is what standard language parsers do. After working on it the new version of <code>parse_term</code> is</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_literal</span><span class="p">([</span><span class="s1">'*'</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">])</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
<p>And this change makes 3 tests fail: the same <code>test_parse_term_with_multiple_operations</code> that was failing before, plus <code>test_parse_exponentiation_with_other_operators</code> and <code>test_parse_exponentiation_with_parenthesis</code>. The last two actually test the method <code>parse_exponentiation</code>, which uses <code>parse_term</code>. This means that I can temporarily comment them out and concentrate on the first one.</p>
<p>What I discover is that changing the code to use the recursive approach changes the output of the functions. The previous output of <code>parse_term</code> applied to <code>2 * 3 / 4</code> was</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
    <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
    <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'*'</span>
        <span class="p">}</span>
    <span class="p">},</span>
    <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
    <span class="p">},</span>
    <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'/'</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>that is, the multiplication was stored first. Moving to a recursive approach makes the <code>parse_term</code> function return this</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
    <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
    <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
    <span class="p">},</span>
    <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'/'</span>
        <span class="p">}</span>
    <span class="p">},</span>
    <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'*'</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>I think it is pretty clear that this structure, once visited, will return the same value as the previous one, as multiplication and division can be swapped. We will have to pay attention to this swap when operators with different priority are involved, like sum and multiplication, but for this test we can agree the result is no different.</p>
<p>This means that we may change the test and make it pass. Let me stress it once more: we have to understand why the test doesn't pass, and once we have understood the reason and decided it is acceptable, we can change the test.</p>
<p>Tests are not immutable, they are mere safeguards that raise alarms when you change the behaviour. It's up to you to deal with the alarm and to decide what to do.</p>
<p>Once the test has been modified and the test suite passes, it's time to uncomment the first of the two remaining tests, <code>test_parse_exponentiation_with_other_operators</code>. This test uses <code>parse_term</code> to parse a string that contains an exponentiation, but the new code of the method <code>parse_term</code> doesn't call the <code>parse_exponentiation</code> function. So the test fails.</p>
<h4 id="parsing-exponentiation">Parsing exponentiation<a class="headerlink" href="#parsing-exponentiation" title="Permanent link">¶</a></h4>
<p>That test tries to parse a string that contains a multiplication and an exponentiation, so the method that we should use to process it is <code>parse_term</code>. The current version of <code>parse_term</code>, however, doesn't consider exponentiation, so the new code is</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_exponentiation</span><span class="p">()</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_literal</span><span class="p">([</span><span class="s1">'*'</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">])</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
<p>which makes all the active tests pass.</p>
<p>There is still one commented test, <code>test_parse_exponentiation_with_parenthesis</code>, that now passes with the new code.</p>
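<p>To see why delegating to <code>parse_exponentiation</code> is enough, here is a minimal, self-contained sketch of the same recursive structure. It is not the smallcalc code: the class <code>MiniParser</code>, the helper <code>tokenize</code>, the use of <code>^</code> as the power operator, and the plain dictionaries used as nodes are all assumptions made for illustration, and it peeks at the next token instead of using the lexer as a context manager.</p>

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: integers and single-character operators only
    return re.findall(r"\d+|[-+*/^()]", text)


class MiniParser:
    """Illustrative parser, not the smallcalc CalcParser."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        # Return the next token without consuming it, or None at the end
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse_factor(self):
        # In this sketch a factor is just an integer
        return {"type": "integer", "value": int(self.advance())}

    def parse_exponentiation(self):
        left = self.parse_factor()
        if self.peek() == "^":
            self.advance()
            right = self.parse_exponentiation()
            return {"type": "exponentiation", "left": left, "right": right}
        return left

    def parse_term(self):
        # The left operand comes from the higher-priority rule
        left = self.parse_exponentiation()
        if self.peek() in ("*", "/"):
            operator = self.advance()
            right = self.parse_term()  # recursion on the rest of the term
            return {
                "type": "binary",
                "left": left,
                "right": right,
                "operator": operator,
            }
        return left


tree = MiniParser("2 * 3 ^ 4").parse_term()
```

<p>In this sketch the exponentiation ends up nested inside the right operand of the multiplication, because every operand of <code>parse_term</code> is first handed to the rule with higher priority.</p>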
<h4 id="parsing-expressions">Parsing expressions<a class="headerlink" href="#parsing-expressions" title="Permanent link">¶</a></h4>
<p>The new version of <code>parse_expression</code> has the same issue we found with <code>parse_term</code>, that is the recursive approach changes the output.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
        <span class="k">with</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_literal</span><span class="p">([</span><span class="s1">'+'</span><span class="p">,</span> <span class="s1">'-'</span><span class="p">])</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
<p>As before, we have to decide if the change is acceptable or if it is a real error. As happened for <code>parse_term</code>, the test can be safely changed to match the new code output.</p>
<h2 id="level-16-float-numbers">Level 16 - Float numbers<a class="headerlink" href="#level-16-float-numbers" title="Permanent link">¶</a></h2>
<p>So far, our calculator can handle only integer values, so it's time to add support for float numbers. This change shouldn't be complex: floating point numbers are easy to parse as they are basically two integer numbers separated by a dot.</p>
<h3 id="lexer_2">Lexer<a class="headerlink" href="#lexer_2" title="Permanent link">¶</a></h3>
<p>To test if the lexer understands floating point numbers it's enough to add this to <code>tests/test_calc_lexer.py</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_floats</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3.6'</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">FLOAT</span><span class="p">,</span> <span class="s1">'3.6'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>
</code></pre></div>
<h3 id="parser_2">Parser<a class="headerlink" href="#parser_2" title="Permanent link">¶</a></h3>
<p>To support float numbers it's enough to add that feature to the method that we use to parse integers. We can rename <code>parse_integer</code> to <code>parse_number</code>, fix the test <code>test_parse_integer</code>, and add <code>test_parse_float</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_integer</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"5"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_number</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="mi">5</span>
    <span class="p">}</span>


<span class="k">def</span> <span class="nf">test_parse_float</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"5.8"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_number</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'float'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="mf">5.8</span>
    <span class="p">}</span>
</code></pre></div>
<h3 id="visitor_1">Visitor<a class="headerlink" href="#visitor_1" title="Permanent link">¶</a></h3>
<p>The same thing that we did for the parser is valid for the visitor. We just need to copy the test for the integer numbers and adapt it</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_float</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'float'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="mf">12.345</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="mf">12.345</span><span class="p">,</span> <span class="s1">'float'</span><span class="p">)</span>
</code></pre></div>
<p>This, however, leaves a bug in the visitor. The Test-Driven Development methodology can help you write and change your code, but it cannot completely prevent bugs. Indeed, if you don't write a test for a given case, TDD can't do anything to find and remove the bugs that affect it.</p>
<p>The bug I noticed after a while is that the visitor doesn't correctly manage an operation between integers and floats, returning an integer result. For example, if you sum <code>4</code> with <code>5.1</code> you should get <code>9.1</code> with type <code>float</code>. To test this behaviour we can add this code</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_expression_sum_with_float</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'float'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mf">5.1</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="mf">9.1</span><span class="p">,</span> <span class="s1">'float'</span><span class="p">)</span>
</code></pre></div>
<hr>
<h3 id="solution_1">Solution<a class="headerlink" href="#solution_1" title="Permanent link">¶</a></h3>
<p>The first thing the lexer needs is a label to identify <code>FLOAT</code> tokens</p>
<div class="highlight"><pre><span></span><code><span class="n">FLOAT</span> <span class="o">=</span> <span class="s1">'FLOAT'</span>
</code></pre></div>
<p>then the method <code>_process_integer</code> can be extended to process float numbers as well. To do this the method is renamed to <code>_process_number</code>, the regular expression is modified, and the <code>token_type</code> is managed according to the presence of the dot.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_number</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">regexp</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'[\d\.]+'</span><span class="p">)</span>
        <span class="n">match</span> <span class="o">=</span> <span class="n">regexp</span><span class="o">.</span><span class="n">match</span><span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">tail</span>
        <span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">match</span><span class="p">:</span>
            <span class="k">return</span> <span class="kc">None</span>
        <span class="n">token_string</span> <span class="o">=</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">()</span>
        <span class="n">token_type</span> <span class="o">=</span> <span class="n">FLOAT</span> <span class="k">if</span> <span class="s1">'.'</span> <span class="ow">in</span> <span class="n">token_string</span> <span class="k">else</span> <span class="n">INTEGER</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span>
            <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">token_type</span><span class="p">,</span> <span class="n">token_string</span><span class="p">)</span>
        <span class="p">)</span>
</code></pre></div>
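<p>The classification step can be tried in isolation with the following sketch. The function <code>classify</code> and the two module-level patterns are hypothetical names, not part of smallcalc. <code>LOOSE</code> is the pattern used in the method above, while <code>STRICT</code> is a possible alternative, shown only to highlight that the loose pattern also accepts strings with more than one dot.</p>

```python
import re

# Pattern used by _process_number in the solution above
LOOSE = re.compile(r"[\d\.]+")
# Hypothetical stricter alternative: digits, optionally a dot and more digits
STRICT = re.compile(r"\d+(\.\d+)?")


def classify(text, regexp=LOOSE):
    """Return (token_type, token_string) for a number at the start of text."""
    match = regexp.match(text)
    if not match:
        return None
    token_string = match.group()
    # The presence of a dot decides between the two token types
    token_type = "FLOAT" if "." in token_string else "INTEGER"
    return token_type, token_string


print(classify("3.6"))
print(classify("42 + 1"))
print(classify("1.2.3"))
print(classify("1.2.3", STRICT))
```

<p>With the loose pattern the whole string <code>1.2.3</code> is returned as a single <code>FLOAT</code> token, which a later call to <code>float</code> would reject, while the stricter pattern stops at <code>1.2</code>. Whether this matters depends on how much input validation you want in the lexer.</p>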
<p>Remember that the <code>get_token</code> function has to be modified to use the new name of the method. The new code is</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">get_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">eof</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eof</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eof</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eof</span>
        <span class="n">eol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eol</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eol</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eol</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_process_whitespace</span><span class="p">()</span>
        <span class="n">name</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_name</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">name</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">name</span>
        <span class="n">integer</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_number</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">integer</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">integer</span>
        <span class="n">literal</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_literal</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">literal</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">literal</span>
</code></pre></div>
<p>Moving to the parser, first we need to add a new type of node</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">FloatNode</span><span class="p">(</span><span class="n">ValueNode</span><span class="p">):</span>
    <span class="n">node_type</span> <span class="o">=</span> <span class="s1">'float'</span>
</code></pre></div>
<p>The new version of <code>parse_integer</code>, renamed <code>parse_number</code>, shall deal with both cases but also raise the <code>TokenError</code> exception if the parsing fails</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_number</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">t</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">IntegerNode</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">))</span>
        <span class="k">elif</span> <span class="n">t</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">FLOAT</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">FloatNode</span><span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">))</span>
        <span class="k">raise</span> <span class="n">clex</span><span class="o">.</span><span class="n">TokenError</span>
</code></pre></div>
<p>In the visitor, the change to support <code>float</code> nodes is trivial: we just need to handle them alongside the <code>integer</code> case</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'integer'</span><span class="p">,</span> <span class="s1">'float'</span><span class="p">]:</span>
            <span class="k">return</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">],</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span>
</code></pre></div>
<p>To fix the missing type promotion when dealing with integers and floats it's enough to add </p>
<div class="highlight"><pre><span></span><code>            <span class="k">if</span> <span class="n">ltype</span> <span class="o">==</span> <span class="s1">'float'</span><span class="p">:</span>
                <span class="n">rtype</span> <span class="o">=</span> <span class="n">ltype</span>
</code></pre></div>
<p>just before evaluating the operator in the binary nodes. The full code of the method <code>visit</code> is then</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'integer'</span><span class="p">,</span> <span class="s1">'float'</span><span class="p">]:</span>
            <span class="k">return</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">],</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'variable'</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]),</span> <span class="bp">self</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">])</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'unary'</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
            <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'content'</span><span class="p">])</span>
            <span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
                <span class="k">return</span> <span class="o">-</span> <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span>
            <span class="k">return</span> <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'binary'</span><span class="p">:</span>
            <span class="n">lvalue</span><span class="p">,</span> <span class="n">ltype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'left'</span><span class="p">])</span>
            <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'right'</span><span class="p">])</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
            <span class="k">if</span> <span class="n">ltype</span> <span class="o">==</span> <span class="s1">'float'</span><span class="p">:</span>
                <span class="n">rtype</span> <span class="o">=</span> <span class="n">ltype</span>
            <span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'+'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">+</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">-</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'*'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">*</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'/'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">//</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'assignment'</span><span class="p">:</span>
            <span class="n">right_value</span><span class="p">,</span> <span class="n">right_type</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">])</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">variables</span><span class="p">[</span><span class="n">node</span><span class="p">[</span><span class="s1">'variable'</span><span class="p">]]</span> <span class="o">=</span> <span class="p">{</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="n">right_value</span><span class="p">,</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="n">right_type</span>
            <span class="p">}</span>
            <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'exponentiation'</span><span class="p">:</span>
            <span class="n">lvalue</span><span class="p">,</span> <span class="n">ltype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'left'</span><span class="p">])</span>
            <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'right'</span><span class="p">])</span>
            <span class="k">return</span> <span class="n">lvalue</span> <span class="o">**</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">ltype</span>
</code></pre></div>
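<p>The binary branch returns <code>rtype</code>, so the check on <code>ltype</code> covers the case in which only the left operand is a float; when only the right operand is a float, <code>rtype</code> is already <code>'float'</code>. The same rule can be written symmetrically, as in the following sketch. The functions <code>promote</code> and <code>add</code> are hypothetical helpers that work on <code>(value, type)</code> pairs like the ones returned by <code>visit</code>; they are not part of smallcalc.</p>

```python
def promote(ltype, rtype):
    # Hypothetical helper: the result is a float if either operand is a float
    if "float" in (ltype, rtype):
        return "float"
    return "integer"


def add(left, right):
    # Each operand is a (value, type) pair, like the ones returned by visit
    lvalue, ltype = left
    rvalue, rtype = right
    return lvalue + rvalue, promote(ltype, rtype)


# The operand order does not affect the resulting type
print(add((5.1, "float"), (4, "integer")))
print(add((4, "integer"), (5.1, "float")))
print(add((4, "integer"), (5, "integer")))
```

<p>A helper like this keeps the promotion rule in one place, which may be convenient if more numeric types are added later.</p>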
<hr>
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>This post showed you how powerful the TDD methodology is when it comes to refactoring, or in general when the code has to be changed. Remember that tests can be changed if there are good reasons, and that the main point is to understand what's happening in your code and in the cases that you already tested.</p>
<p>The code I developed in this post is available on the GitHub repository tagged with <code>part4</code> (<a href="https://github.com/lgiordani/smallcalc/tree/part4">link</a>).</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>A game of tokens: solution - Part 32017-10-31T12:00:00+00:002017-10-31T12:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2017-10-31:/blog/2017/10/31/a-game-of-tokens-solution-part-3/<p>This post originally contained my solution to the challenge posted <a href="https://www.thedigitalcatonline.com/blog/2017/10/31/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-3/">here</a>. I moved those solutions inside the post itself, under the "Solution" subsections.</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>A game of tokens: write an interpreter in Python with TDD - Part 32017-10-31T11:00:00+00:002020-08-05T11:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2017-10-31:/blog/2017/10/31/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-3/<h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2>
<p>This is the third instalment of a series of posts on how to write an interpreter in Python. In the <a href="https://www.thedigitalcatonline.com/blog/2017/05/09/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-1/">first part</a> we developed together a small command line calculator that could sum and subtract numbers, while in the <a href="https://www.thedigitalcatonline.com/blog/2017/10/01/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-2/">second part</a> we went further adding multiplication, division and unary plus and minus.</p>
<p>In this third part we will start adding variables to our calculator, moving towards a proper programming language.</p>
<h2 id="mezzanine-refactoring">Mezzanine - Refactoring<a class="headerlink" href="#mezzanine-refactoring" title="Permanent link">¶</a></h2>
<p>Often, after writing some code, you realise that some of the original choices you made are not perfect, especially when it comes to variable and function names. Furthermore, you may realise that some of your functions are too long, and you may consider splitting them into multiple functions to make the code easier to understand and to use.</p>
<p>It is time, then, to reconsider the code of smallcalc and see if we can improve it. Luckily, having all our tests in place, we may refactor it, that is we can change the code with a high degree of confidence, as the tests check that the behaviour of the whole system doesn't change.</p>
<p>The first change is the naming of the method <code>parse_addsymbol</code>, which now can be more aptly named <code>_parse_symbol</code>. As Martin Fowler says in his book "Refactoring" (a recommended reading): "<em>Life being what it is, you won't get your names right the first time. [...] Remember your code is for a human first and a computer second. Humans need good names.</em>" The name of the method will be prefixed with an underscore because this method is used only internally, and shouldn't be used by third parties.</p>
<p>The proper way to change the name of a method involves calling the new method from the old one, but in this case we may safely rely on tests to tell us what needs to be fixed (this is because our codebase is small). We may thus open the file <code>smallcalc/calc_parser.py</code> and change the name to <code>_parse_symbol</code>. At this point, running the test suite, you should have 11 failures. You can fix them with a text replace action in your editor of choice, but I recommend that you watch the tests fail before replacing the text. The 3 replacements are in the methods <code>parse_factor</code>, <code>parse_term</code>, and <code>parse_expression</code>.</p>
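<p>For larger codebases, the "proper way" mentioned above can be sketched as follows. This is only an illustration: the class <code>Parser</code> and the returned value are hypothetical stand-ins, and the deprecation warning is an optional addition that is not part of smallcalc.</p>

```python
import warnings


class Parser:
    """Hypothetical stand-in for the real parser class."""

    def _parse_symbol(self):
        # New, preferred name: the actual logic lives here.
        return "symbol"

    def parse_addsymbol(self):
        # Old name kept for a while: it only forwards to the new method,
        # so existing callers keep working while they are migrated.
        warnings.warn(
            "parse_addsymbol is deprecated, use _parse_symbol",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._parse_symbol()
```

<p>Once no caller uses the old name any more, the forwarding method can be removed.</p>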
<p>I then wanted to add two methods, <code>discard</code> and <code>discard_type</code>, to the lexer, to better control what gets discarded. At the moment the code is using <code>self.lexer.get_token</code>, which doesn't allow us to explicitly check what we are dropping. These are the tests that I added to <code>tests/test_calc_lexer.py</code></p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">pytest</span>


<span class="k">def</span> <span class="nf">test_discard_tokens</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 + 5'</span><span class="p">)</span>
    <span class="n">l</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'3'</span><span class="p">))</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">test_discard_checks_equality</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 + 5'</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">pytest</span><span class="o">.</span><span class="n">raises</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">TokenError</span><span class="p">):</span>
        <span class="n">l</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'5'</span><span class="p">))</span>


<span class="k">def</span> <span class="nf">test_discard_tokens_by_type</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 + 5'</span><span class="p">)</span>
    <span class="n">l</span><span class="o">.</span><span class="n">discard_type</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">test_discard_type_checks_equality</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 + 5'</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">pytest</span><span class="o">.</span><span class="n">raises</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">TokenError</span><span class="p">):</span>
        <span class="n">l</span><span class="o">.</span><span class="n">discard_type</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">)</span>
</code></pre></div>
<p>As you can see, the idea is for both methods to require a parameter, either the token or the type. The code that passes these tests consists of a custom exception in <code>smallcalc/calc_lexer.py</code></p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">TokenError</span><span class="p">(</span><span class="ne">ValueError</span><span class="p">):</span>
<span class="w">    </span><span class="sd">""" The expected token cannot be found """</span>
</code></pre></div>
<p>and, in the <code>CalcLexer</code> class in the same file</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">discard</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span> <span class="o">!=</span> <span class="n">token</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">TokenError</span><span class="p">(</span>
                <span class="s1">'Expected token </span><span class="si">{}</span><span class="s1">, found </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
                    <span class="n">token</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_current_token</span>
                <span class="p">))</span>

    <span class="k">def</span> <span class="nf">discard_type</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">_type</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">t</span><span class="o">.</span><span class="n">type</span> <span class="o">!=</span> <span class="n">_type</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">TokenError</span><span class="p">(</span>
                <span class="s1">'Expected token of type </span><span class="si">{}</span><span class="s1">, found </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
                    <span class="n">_type</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_current_token</span><span class="o">.</span><span class="n">type</span>
                <span class="p">))</span>
</code></pre></div>
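<p>To see how the two methods behave in isolation, here is a minimal sketch that you can run without the rest of smallcalc. The <code>Token</code> named tuple and the <code>ToyLexer</code> class are hypothetical stand-ins that serve tokens from a prepared list; only <code>discard</code> and <code>discard_type</code> mirror the code shown above.</p>

```python
from collections import namedtuple

# Stand-in for the project's token class (an assumption of this sketch).
Token = namedtuple("Token", ["type", "value"])


class TokenError(ValueError):
    """The expected token cannot be found"""


class ToyLexer:
    """Hypothetical lexer that serves tokens from a prepared list."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._current_token = None

    def get_token(self):
        # Consume and remember the next token.
        self._current_token = self._tokens.pop(0)
        return self._current_token

    def discard(self, token):
        if self.get_token() != token:
            raise TokenError(
                "Expected token {}, found {}".format(
                    token, self._current_token
                ))

    def discard_type(self, _type):
        t = self.get_token()
        if t.type != _type:
            raise TokenError(
                "Expected token of type {}, found {}".format(
                    _type, self._current_token.type
                ))


lexer = ToyLexer([
    Token("INTEGER", "3"),
    Token("LITERAL", "+"),
    Token("INTEGER", "5"),
])
lexer.discard(Token("INTEGER", "3"))  # matches, silently dropped
lexer.discard_type("LITERAL")  # matches by type, silently dropped

try:
    lexer.discard_type("LITERAL")  # the next token is an INTEGER
except TokenError as exc:
    message = str(exc)
```

<p>Note that in this sketch the token is consumed even when the check fails, because the check happens after the call to <code>get_token</code>.</p>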
<p>As I am satisfied with the code that I have now, I will move on to add new features.</p>
<h2 id="level-13-variables">Level 13 - Variables<a class="headerlink" href="#level-13-variables" title="Permanent link">¶</a></h2>
<p><em>I have been assigned by my strength and cunning.</em> - Up (2009)</p>
<p>Variables are labels assigned to values, so what we need to add is a way for the user to make this assignment and then to use variables instead of actual values. The simplest syntax, used by many languages, is <code>name = value</code> and we will stick to this. Usually languages allow only a subset of symbols in the name of a variable, so we will accept names made of lower- and uppercase letters that may also contain underscores.</p>
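<p>To make the naming rule concrete, here is a small sketch that checks candidate names with a regular expression. The pattern and the helper <code>is_valid_name</code> are assumptions made for illustration; the lexer itself is free to implement the rule differently.</p>

```python
import re

# Assumed rule for this sketch: one or more ASCII letters or underscores.
NAME_PATTERN = re.compile(r"[a-zA-Z_]+")


def is_valid_name(text):
    # fullmatch succeeds only if the whole string respects the pattern.
    return NAME_PATTERN.fullmatch(text) is not None


accepted = [n for n in ["somevar", "SomeVar", "some_var", "x1", "3x"]
            if is_valid_name(n)]
```

<p>With this rule, names that contain digits such as <code>x1</code> are rejected.</p>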
<h3 id="lexer">Lexer<a class="headerlink" href="#lexer" title="Permanent link">¶</a></h3>
<p>We want the lexer to recognise a new token called <code>NAME</code>, so the test we have to add to <code>tests/test_calc_lexer.py</code> is</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_letters</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'somevar'</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">NAME</span><span class="p">,</span> <span class="s1">'somevar'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>
</code></pre></div>
<p>This test checks only the support for lowercase letters. Since we also want to support uppercase letters and underscores, we need another pair of tests</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_uppercase_letters</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'SomeVar'</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">NAME</span><span class="p">,</span> <span class="s1">'SomeVar'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>


<span class="k">def</span> <span class="nf">test_get_tokens_understands_names_with_underscores</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'some_var'</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">NAME</span><span class="p">,</span> <span class="s1">'some_var'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>
</code></pre></div>
<p>If you wonder why I created two tests instead of just one with both uppercase letters and underscores, the reason is that I generally prefer to have tests that focus on one specific feature. This obviously depends on the level of granularity that you want, and in this case we are discussing very simple features, so I would not argue if I saw both tested at the same time.</p>
<h3 id="parser">Parser<a class="headerlink" href="#parser" title="Permanent link">¶</a></h3>
<p>To support variables in expressions we need to change the behaviour of <code>parse_factor</code>, which is the method where we parse the building blocks like integers or unary operators. The test you need to add to <code>tests/test_calc_parser.py</code> is</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_factor_variable</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"somevar"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'variable'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'somevar'</span>
    <span class="p">}</span>
</code></pre></div>
<p>After this we want to provide support for variable assignments. Working on the parser we need only to output the correct node so the test is pretty straightforward</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_assignment</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"x = 5"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'assignment'</span><span class="p">,</span>
        <span class="s1">'variable'</span><span class="p">:</span> <span class="s1">'x'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">5</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
<p>This test tries to assign the value <code>5</code> to the variable <code>x</code>, but in general we want to support assignment with expressions, so we should test this behaviour as well, including the presence of variables</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_assignment_with_expression</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"x = 4 * (3 + 5)"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'assignment'</span><span class="p">,</span>
        <span class="s1">'variable'</span><span class="p">:</span> <span class="s1">'x'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'*'</span>
            <span class="p">},</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
            <span class="p">},</span>
            <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
                <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                    <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                    <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
                <span class="p">},</span>
                <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                    <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                    <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
                <span class="p">},</span>
                <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                    <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                    <span class="s1">'value'</span><span class="p">:</span> <span class="mi">5</span>
                <span class="p">}</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>


<span class="k">def</span> <span class="nf">test_parse_assignment_expression_with_variables</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"x = y + 4"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"assignment"</span><span class="p">,</span>
        <span class="s2">"variable"</span><span class="p">:</span> <span class="s2">"x"</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
            <span class="p">},</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'variable'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'y'</span>
            <span class="p">},</span>
            <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
            <span class="p">},</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
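<p>The dictionaries expected by these tests suggest what the new nodes should look like. The following is a minimal sketch of two node classes whose <code>asdict</code> output has that shape; the class names <code>IntegerNode</code>, <code>VariableNode</code>, and <code>AssignmentNode</code> and their constructors are assumptions made for illustration, so your own classes may well differ.</p>

```python
class IntegerNode:
    """Hypothetical node for an integer literal."""

    def __init__(self, value):
        self.value = int(value)

    def asdict(self):
        return {"type": "integer", "value": self.value}


class VariableNode:
    """Hypothetical node for a variable that is being read."""

    def __init__(self, name):
        self.name = name

    def asdict(self):
        return {"type": "variable", "value": self.name}


class AssignmentNode:
    """Hypothetical node for `name = value`."""

    def __init__(self, variable, value):
        self.variable = variable  # a VariableNode
        self.value = value  # any node that provides asdict()

    def asdict(self):
        return {
            "type": "assignment",
            "variable": self.variable.name,
            # The assigned value is itself a node, so we recurse.
            "value": self.value.asdict(),
        }


node = AssignmentNode(VariableNode("x"), IntegerNode(5))
```

<p>Here <code>node.asdict()</code> produces the same dictionary that <code>test_parse_assignment</code> expects.</p>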
<h3 id="visitor">Visitor<a class="headerlink" href="#visitor" title="Permanent link">¶</a></h3>
<p>It is now time to implement the code that actually stores and retrieves variables, which is what happens in the visitor when an <code>assignment</code> or a <code>variable</code> node is processed. For the moment we do not have specific requirements for variables and we can treat the storage space as a big global dictionary.</p>
<p>The test we want to pass specifies the initial API of the storage space when we assign a value to a variable</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_assignment</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'assignment'</span><span class="p">,</span>
        <span class="s1">'variable'</span><span class="p">:</span> <span class="s1">'x'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">5</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="kc">None</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">isvariable</span><span class="p">(</span><span class="s1">'x'</span><span class="p">)</span> <span class="ow">is</span> <span class="kc">True</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="s1">'x'</span><span class="p">)</span> <span class="o">==</span> <span class="mi">5</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="s1">'x'</span><span class="p">)</span> <span class="o">==</span> <span class="s1">'integer'</span>
</code></pre></div>
<p>We want the visitor to provide three new methods, <code>isvariable</code>, <code>valueof</code>, and <code>typeof</code>, that allow us to interact with the variables we defined.</p>
<p>The last change that the visitor requires is some code that allows it to read the value of variables to be able to use them when computing the result of an expression. The test is then</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_variable</span><span class="p">():</span>
    <span class="n">assignment_ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'assignment'</span><span class="p">,</span>
        <span class="s1">'variable'</span><span class="p">:</span> <span class="s1">'x'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">123</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">read_ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'variable'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'x'</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">assignment_ast</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">read_ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="mi">123</span><span class="p">,</span> <span class="s1">'integer'</span><span class="p">)</span>
</code></pre></div>
<p>where two different ASTs have been created. The first one assigns a value to the variable, the second one reads it and returns its value. Note that the visitor returns both value and type of the variable, which seems reasonable to implement later checks of equality or other operations on variables.</p>
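<p>The "big global dictionary" idea can be sketched as follows. This is not the smallcalc visitor: <code>SketchVisitor</code> is a hypothetical class that only handles <code>integer</code>, <code>assignment</code>, and <code>variable</code> nodes, and the layout of the internal dictionary is an assumption. It follows the API described by the two tests above.</p>

```python
class SketchVisitor:
    """Hypothetical visitor that keeps variables in one dictionary."""

    def __init__(self):
        # Maps a variable name to {"value": ..., "type": ...}.
        self.variables = {}

    def isvariable(self, name):
        return name in self.variables

    def valueof(self, name):
        return self.variables[name]["value"]

    def typeof(self, name):
        return self.variables[name]["type"]

    def visit(self, node):
        if node["type"] == "integer":
            return node["value"], "integer"

        if node["type"] == "assignment":
            # Evaluate the right-hand side first, then store the result.
            value, _type = self.visit(node["value"])
            self.variables[node["variable"]] = {
                "value": value,
                "type": _type,
            }
            return None, None

        if node["type"] == "variable":
            name = node["value"]
            return self.valueof(name), self.typeof(name)

        raise ValueError(
            "Unsupported node type: {}".format(node["type"])
        )


v = SketchVisitor()
v.visit({
    "type": "assignment",
    "variable": "x",
    "value": {"type": "integer", "value": 123},
})
result = v.visit({"type": "variable", "value": "x"})
```

<p>Storing the type together with the value is what allows the visitor to return both when the variable is read.</p>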
<hr>
<h3 id="solution">Solution<a class="headerlink" href="#solution" title="Permanent link">¶</a></h3>
<p>To pass the <code>test_get_tokens_understands_letters</code> test I added a method <code>_process_name</code> to the <code>CalcLexer</code> class</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_name</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">regexp</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s1">'[a-z]+'</span><span class="p">)</span>
        <span class="n">match</span> <span class="o">=</span> <span class="n">regexp</span><span class="o">.</span><span class="n">match</span><span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">tail</span>
        <span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">match</span><span class="p">:</span>
            <span class="k">return</span> <span class="kc">None</span>
        <span class="n">token_string</span> <span class="o">=</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">()</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span>
            <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">NAME</span><span class="p">,</span> <span class="n">token_string</span><span class="p">)</span>
        <span class="p">)</span>
</code></pre></div>
<p>and then added it to the method <code>get_token</code>. The new version of the latter is then</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">get_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">eof</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eof</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eof</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eof</span>
        <span class="n">eol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eol</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eol</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eol</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_process_whitespace</span><span class="p">()</span>
        <span class="n">name</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_name</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">name</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">name</span>
        <span class="n">integer</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_integer</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">integer</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">integer</span>
        <span class="n">literal</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_literal</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">literal</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">literal</span>
</code></pre></div>
<p>At this point, to pass the remaining tests <code>test_get_tokens_understands_uppercase_letters</code> and <code>test_get_tokens_understands_names_with_underscores</code>, it is sufficient to change the regular expression used in <code>_process_name</code> so that it also accepts uppercase letters and underscores</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">_process_name</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">regexp</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s1">'[a-zA-Z_]+'</span><span class="p">)</span>
<span class="n">match</span> <span class="o">=</span> <span class="n">regexp</span><span class="o">.</span><span class="n">match</span><span class="p">(</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">tail</span>
<span class="p">)</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">match</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
<span class="n">token_string</span> <span class="o">=</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">()</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span>
<span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">NAME</span><span class="p">,</span> <span class="n">token_string</span><span class="p">)</span>
<span class="p">)</span>
</code></pre></div>
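<p>If you want to check the new pattern outside the lexer, a minimal sketch like the following might help. It uses only the standard <code>re</code> module; the helper <code>leading_name</code> is a hypothetical name introduced for this example and is not part of the project.</p>

```python
import re

# Same pattern used in _process_name: letters of either case and underscores
name_regexp = re.compile('[a-zA-Z_]+')


def leading_name(text):
    """Return the name found at the start of text, or None if there is none."""
    match = name_regexp.match(text)
    return match.group() if match else None


print(leading_name("some_Var = 5"))  # a name with uppercase and underscore
print(leading_name("42 + x"))        # the text does not start with a name
```

<p>Note that digits are not part of the pattern, so with this regular expression a text like <code>x1</code> yields only the name <code>x</code>.</p>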
<p>The required change to <code>parse_factor</code> is simple, but since we will be returning a new type of node we have to define it</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">VariableNode</span><span class="p">(</span><span class="n">ValueNode</span><span class="p">):</span>
<span class="n">node_type</span> <span class="o">=</span> <span class="s1">'variable'</span>
</code></pre></div>
<p>We can then add the required <code>if</code> statement in <code>parse_factor</code>, which becomes</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">parse_factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
<span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'-'</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">]:</span>
<span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_symbol</span><span class="p">()</span>
<span class="n">factor</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
<span class="k">return</span> <span class="n">UnaryNode</span><span class="p">(</span><span class="n">operator</span><span class="p">,</span> <span class="n">factor</span><span class="p">)</span>
<span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="o">==</span> <span class="s1">'('</span><span class="p">:</span>
<span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard_type</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">)</span>
<span class="n">expression</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard_type</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">)</span>
<span class="k">return</span> <span class="n">expression</span>
<span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">NAME</span><span class="p">:</span>
<span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
<span class="k">return</span> <span class="n">VariableNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
</code></pre></div>
<p>The second test that we have to pass checks if the parser can understand variable assignments. First of all we need to define <code>AssignmentNode</code> which is the node we will return to the visitor.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">AssignmentNode</span><span class="p">(</span><span class="n">Node</span><span class="p">):</span>
<span class="n">node_type</span> <span class="o">=</span> <span class="s1">'assignment'</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">variable</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">variable</span> <span class="o">=</span> <span class="n">variable</span>
<span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span>
<span class="k">def</span> <span class="nf">asdict</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="k">return</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">node_type</span><span class="p">,</span>
<span class="s1">'variable'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">variable</span><span class="o">.</span><span class="n">value</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="o">.</span><span class="n">asdict</span><span class="p">(),</span>
<span class="p">}</span>
</code></pre></div>
<p>At this point we need a method to parse a variable in <code>CalcParser</code>. This method is very similar to <code>_parse_symbol</code> and <code>parse_integer</code></p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">_parse_variable</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
<span class="k">return</span> <span class="n">VariableNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
</code></pre></div>
<p>Since the test is running <code>parse_assignment</code> we just need to add that method. We want the assignment to have a variable as its left member and an expression as its right member</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">parse_assignment</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="n">variable</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_variable</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard_type</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">)</span>
<span class="n">value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
<span class="k">return</span> <span class="n">AssignmentNode</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
</code></pre></div>
<p>This code makes both the <code>test_parse_assignment</code> and the <code>test_parse_assignment_with_expression</code> tests pass.</p>
<p>As discussed in the introductory text before the test code, the variable storage space can be a simple dictionary. The key is the name of the variable, and the content is another dictionary with the keys <code>value</code> and <code>type</code>. This is sufficient for the moment and should also be easy to extend when new requirements arise.</p>
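<p>To make the shape of this storage concrete, here is a small standalone sketch. The function <code>store</code> is a hypothetical helper used only for illustration; in the project the dictionary lives inside the visitor.</p>

```python
# Hypothetical standalone version of the variable storage described above
variables = {}


def store(name, value, type_):
    """Record a variable as a small dictionary with 'value' and 'type'."""
    variables[name] = {'value': value, 'type': type_}


store('x', 5, 'integer')
store('x', 7, 'integer')  # assigning again simply overwrites the entry

print(variables)
```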
<p>The <code>CalcVisitor</code> class can then be changed to include the new methods and an <code>__init__</code> that initializes the dictionary. I also added the relevant <code>if</code> statement to the method <code>visit</code> of the same class. The new <code>CalcVisitor</code> is then</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CalcVisitor</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">variables</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">def</span> <span class="nf">isvariable</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
<span class="k">return</span> <span class="n">name</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">variables</span>
<span class="k">def</span> <span class="nf">valueof</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">variables</span><span class="p">[</span><span class="n">name</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">typeof</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">variables</span><span class="p">[</span><span class="n">name</span><span class="p">][</span><span class="s1">'type'</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
<span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'integer'</span><span class="p">:</span>
<span class="k">return</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">],</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'unary'</span><span class="p">:</span>
<span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
<span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'content'</span><span class="p">])</span>
<span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
<span class="k">return</span> <span class="o">-</span> <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span>
<span class="k">return</span> <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span>
<span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'binary'</span><span class="p">:</span>
<span class="n">lvalue</span><span class="p">,</span> <span class="n">ltype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'left'</span><span class="p">])</span>
<span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'right'</span><span class="p">])</span>
<span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'+'</span><span class="p">:</span>
<span class="k">return</span> <span class="n">lvalue</span> <span class="o">+</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
<span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
<span class="k">return</span> <span class="n">lvalue</span> <span class="o">-</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
<span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'*'</span><span class="p">:</span>
<span class="k">return</span> <span class="n">lvalue</span> <span class="o">*</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
<span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'/'</span><span class="p">:</span>
<span class="k">return</span> <span class="n">lvalue</span> <span class="o">//</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
<span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'assignment'</span><span class="p">:</span>
<span class="n">right_value</span><span class="p">,</span> <span class="n">right_type</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">])</span>
<span class="bp">self</span><span class="o">.</span><span class="n">variables</span><span class="p">[</span><span class="n">node</span><span class="p">[</span><span class="s1">'variable'</span><span class="p">]]</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">right_value</span><span class="p">,</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="n">right_type</span>
<span class="p">}</span>
<span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
</code></pre></div>
<p>To pass the second test we only need to change the method <code>visit</code>, adding an <code>if</code> statement for the <code>variable</code> nodes. The new version of the method is</p>
<div class="highlight"><pre><span></span><code> <span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
<span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'integer'</span><span class="p">:</span>
<span class="k">return</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">],</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'variable'</span><span class="p">:</span>
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">valueof</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]),</span> <span class="bp">self</span><span class="o">.</span><span class="n">typeof</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">])</span>
<span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'unary'</span><span class="p">:</span>
<span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
<span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'content'</span><span class="p">])</span>
<span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
<span class="k">return</span> <span class="o">-</span> <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span>
<span class="k">return</span> <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span>
<span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'binary'</span><span class="p">:</span>
<span class="n">lvalue</span><span class="p">,</span> <span class="n">ltype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'left'</span><span class="p">])</span>
<span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'right'</span><span class="p">])</span>
<span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'+'</span><span class="p">:</span>
<span class="k">return</span> <span class="n">lvalue</span> <span class="o">+</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
<span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
<span class="k">return</span> <span class="n">lvalue</span> <span class="o">-</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
<span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'*'</span><span class="p">:</span>
<span class="k">return</span> <span class="n">lvalue</span> <span class="o">*</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
<span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'/'</span><span class="p">:</span>
<span class="k">return</span> <span class="n">lvalue</span> <span class="o">//</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
<span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'assignment'</span><span class="p">:</span>
<span class="n">right_value</span><span class="p">,</span> <span class="n">right_type</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">])</span>
<span class="bp">self</span><span class="o">.</span><span class="n">variables</span><span class="p">[</span><span class="n">node</span><span class="p">[</span><span class="s1">'variable'</span><span class="p">]]</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">right_value</span><span class="p">,</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="n">right_type</span>
<span class="p">}</span>
<span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="kc">None</span>
</code></pre></div>
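<p>The interplay between assignment and variable lookup can be seen in isolation with a condensed stand-in for the visitor. The class <code>MiniVisitor</code> below is a hypothetical sketch that handles only integers, variables, the operators <code>+</code> and <code>*</code>, and assignments; it is not the full <code>CalcVisitor</code>.</p>

```python
class MiniVisitor:
    """Condensed stand-in for CalcVisitor, for illustration only."""

    def __init__(self):
        self.variables = {}

    def visit(self, node):
        if node['type'] == 'integer':
            return node['value'], node['type']
        if node['type'] == 'variable':
            stored = self.variables[node['value']]
            return stored['value'], stored['type']
        if node['type'] == 'binary':
            lvalue, _ = self.visit(node['left'])
            rvalue, rtype = self.visit(node['right'])
            operator = node['operator']['value']
            if operator == '+':
                return lvalue + rvalue, rtype
            if operator == '*':
                return lvalue * rvalue, rtype
        if node['type'] == 'assignment':
            value, type_ = self.visit(node['value'])
            self.variables[node['variable']] = {'value': value, 'type': type_}
            return None, None
        raise ValueError("unsupported node: {}".format(node))


# The tree for "x = 5"
assignment = {
    'type': 'assignment',
    'variable': 'x',
    'value': {'type': 'integer', 'value': 5},
}

# The tree for "2 * x + 4"
expression = {
    'type': 'binary',
    'left': {
        'type': 'binary',
        'left': {'type': 'integer', 'value': 2},
        'right': {'type': 'variable', 'value': 'x'},
        'operator': {'type': 'literal', 'value': '*'},
    },
    'right': {'type': 'integer', 'value': 4},
    'operator': {'type': 'literal', 'value': '+'},
}

v = MiniVisitor()
print(v.visit(assignment))  # (None, None)
print(v.visit(expression))  # (14, 'integer')
```

<p>Visiting the expression before the assignment would raise a <code>KeyError</code>, because the variable <code>x</code> would not be in the dictionary yet.</p>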
<hr>
<h2 id="level-14-parsing-expressions-and-assignments">Level 14 - Parsing expressions and assignments<a class="headerlink" href="#level-14-parsing-expressions-and-assignments" title="Permanent link">¶</a></h2>
<p><em>Speak words we can all understand!</em> - The Lord of the Rings: The Fellowship of the Ring (2001)</p>
<p>We are missing a final step. The CLI uses <code>parse_expression</code> as its default entry point, which means that it doesn't understand variable assignments for the time being. We therefore need to introduce a new entry point, <code>parse_line</code>, that we will use to process general language statements. The test for this goes in <code>tests/test_calc_parser.py</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_line_supports_expression</span><span class="p">():</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
<span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"2 * x + 4"</span><span class="p">)</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_line</span><span class="p">()</span>
<span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
<span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
<span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
<span class="p">},</span>
<span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'variable'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="s1">'x'</span>
<span class="p">},</span>
<span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="s1">'*'</span>
<span class="p">}</span>
<span class="p">},</span>
<span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
<span class="p">},</span>
<span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>and checks that <code>parse_line</code> can parse expressions (which can be solved by simply wrapping <code>parse_expression</code> with it). The second test checks that <code>parse_line</code> can parse variable assignments and goes in the same file</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_line_supports_assigment</span><span class="p">():</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
<span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"x = 5"</span><span class="p">)</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_line</span><span class="p">()</span>
<span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'assignment'</span><span class="p">,</span>
<span class="s1">'variable'</span><span class="p">:</span> <span class="s1">'x'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="mi">5</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>At this point we can change the entry point in the CLI, using <code>parse_line</code> instead of <code>parse_expression</code>. The new CLI is then</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">calc_parser</span> <span class="k">as</span> <span class="n">cpar</span>
<span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">calc_visitor</span> <span class="k">as</span> <span class="n">cvis</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
<span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">text</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span class="s1">'smallcalc :> '</span><span class="p">)</span>
<span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_line</span><span class="p">()</span>
<span class="n">res</span> <span class="o">=</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">())</span>
<span class="nb">print</span><span class="p">(</span><span class="n">res</span><span class="p">)</span>
<span class="k">except</span> <span class="ne">EOFError</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"Bye!"</span><span class="p">)</span>
<span class="k">break</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">text</span><span class="p">:</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
<span class="n">main</span><span class="p">()</span>
</code></pre></div>
<p>Try firing up the CLI and enjoy a calculator with variables! Everything works, but you now know that it is not magic: it is the outcome of a good amount of code. And you wrote it, so you may proudly say that you created a simple but working programming language.</p>
<hr>
<h3 id="solution_1">Solution<a class="headerlink" href="#solution_1" title="Permanent link">¶</a></h3>
<p>To pass the first test, as suggested, I added the method <code>parse_line</code> as a wrapper around <code>parse_expression</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_line</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
</code></pre></div>
<p>The second test requires some changes to <code>parse_line</code>. As I do not know in advance whether the line is an assignment or an expression, I decided to stash the state of the lexer and try the assignment first. If that raises a <code>TokenError</code> I just pop the stashed state and parse the line as an expression instead</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_line</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">stash</span><span class="p">()</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_assignment</span><span class="p">()</span>
        <span class="k">except</span> <span class="n">clex</span><span class="o">.</span><span class="n">TokenError</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
</code></pre></div>
<p>At the same time <code>parse_assignment</code> has to be changed. The current code parses a variable and then discards whatever literal follows, which is too generic: an expression like <code>x * 2</code> does not raise an error, so <code>parse_line</code> would never fall back to <code>parse_expression</code>. The new code for that method is then</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_assignment</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">variable</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_parse_variable</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">discard</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'='</span><span class="p">))</span>
        <span class="n">value</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">AssignmentNode</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
</code></pre></div>
<p>where I explicitly discard a literal <code>=</code> sign.</p>
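<p>The stash/pop backtracking described above can be sketched in isolation. This is a minimal illustration and not the code of smallcalc: <code>ToyLexer</code>, its whitespace-based tokens, and the tuples returned by the two parsing functions are assumptions made only to keep the example self-contained.</p>

```python
class TokenError(ValueError):
    """Raised when the parser meets a token it did not expect."""


class ToyLexer:
    # Hypothetical stand-in for the real lexer: tokens are whitespace-separated
    def __init__(self, text):
        self.tokens = text.split()
        self.position = 0
        self._stashed = []

    def stash(self):
        # Remember the current position so that it can be restored later
        self._stashed.append(self.position)

    def pop(self):
        # Go back to the most recently stashed position
        self.position = self._stashed.pop()

    def get_token(self):
        token = self.tokens[self.position] if self.position < len(self.tokens) else None
        self.position += 1
        return token

    def discard(self, expected):
        # Consume the next token, complaining if it is not the expected one
        if self.get_token() != expected:
            raise TokenError(f"expected {expected!r}")


def parse_assignment(lexer):
    variable = lexer.get_token()
    lexer.discard("=")  # only an explicit '=' makes this an assignment
    return ("assignment", variable, lexer.tokens[lexer.position:])


def parse_expression(lexer):
    return ("expression", lexer.tokens[lexer.position:])


def parse_line(lexer):
    lexer.stash()
    try:
        return parse_assignment(lexer)
    except TokenError:
        lexer.pop()  # rewind to where we were before the failed attempt
        return parse_expression(lexer)


print(parse_line(ToyLexer("x = 5 + 1")))  # ('assignment', 'x', ['5', '+', '1'])
print(parse_line(ToyLexer("x * 2")))      # ('expression', ['x', '*', '2'])
```

<p>Without the call to <code>pop</code> the fallback would start from the token after <code>*</code>, and the variable <code>x</code> would be lost.</p>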
<hr>
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>Managing variables may look like a very easy task, but as soon as we start implementing functions and local scopes we will have to move to something richer than a simple global dictionary. Memory management is another big topic that I didn't touch here; perhaps in the future I will discuss garbage collection and related problems.</p>
<p>The code I developed in this post is available on the GitHub repository tagged with <code>part3</code> (<a href="https://github.com/lgiordani/smallcalc/tree/part3">link</a>).</p>
<p>In the next issue we will tackle together the task of adding the power operator, support for floating point numbers, and a big refactoring with context managers that will greatly simplify the code.</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>A game of tokens: solution - Part 22017-10-17T13:00:00+01:002017-10-17T13:00:00+01:00Leonardo Giordanitag:www.thedigitalcatonline.com,2017-10-17:/blog/2017/10/17/a-game-of-tokens-solution-part-2/<p>This post originally contained my solution to the challenge posted <a href="https://www.thedigitalcatonline.com/blog/2017/10/01/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-2/">here</a>. I moved those solutions inside the post itself, under the "Solution" subsections.</p>
A game of tokens: write an interpreter in Python with TDD - Part 22017-10-01T15:00:00+01:002020-08-05T11:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2017-10-01:/blog/2017/10/01/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-2/<h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2>
<p>Welcome to the second part of the series of posts about writing an interpreter with Python and TDD. In the <a href="https://www.thedigitalcatonline.com/blog/2017/05/09/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-1/">first post</a> we developed together a simple calculator that can handle integers, addition and subtraction. In this instalment I'll give you new tests that will guide you through the implementation of multiplication, division, parentheses, and unary operators. I will obviously reference the structure I used in my solution, but your mileage may vary, so feel free to ignore the comments or the suggested solutions in case your code is different.</p>
<h2 id="level-8-multiplication-and-division">Level 8 - Multiplication and division<a class="headerlink" href="#level-8-multiplication-and-division" title="Permanent link">¶</a></h2>
<p><em>"They're coming outta the walls. They're coming outta the goddamn walls."</em> - Aliens (1986)</p>
<p>As you remember from the previous post, our interpreter is made of three different components: the lexer, the parser, and the visitor. So, to implement the missing basic operations, multiplication and division, we need to start with the lexer and ensure that it understands the traditional symbols <code>*</code> and <code>/</code>.</p>
<h3 id="lexer">Lexer<a class="headerlink" href="#lexer" title="Permanent link">¶</a></h3>
<p>Put the following tests in the <code>tests/test_calc_lexer.py</code> file</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_multiplication</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 * 5'</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'3'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'*'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'5'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>


<span class="k">def</span> <span class="nf">test_get_tokens_understands_division</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 / 5'</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'3'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'5'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>
</code></pre></div>
<p>Do the tests fail? Why? Please remember that when tests pass without requiring any code change you have to ask yourself "Why do they pass?", and be sure that you understood the answer before going further. Otherwise you might be adding tests that are wrong, or tests for things that have already been tested, and in either case you should act on them.</p>
<h3 id="parser">Parser<a class="headerlink" href="#parser" title="Permanent link">¶</a></h3>
<p>Now that the lexer understands the symbols we can start considering the parser. The parser has to output a sensible structure that represents the new operations, which is not different from what it outputs for the sum and the difference. Add the following tests to <code>tests/test_calc_parser.py</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_term</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"2 * 3"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'*'</span>
        <span class="p">}</span>
    <span class="p">}</span>


<span class="k">def</span> <span class="nf">test_parse_term_with_multiple_operations</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"2 * 3 / 4"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
            <span class="p">},</span>
            <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
            <span class="p">},</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'*'</span>
            <span class="p">}</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'/'</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
<p>This time you should have some failures, so go and edit the <code>CalcParser</code> class in order to pass the tests. As the two new binary operations are at this level the same as sum and difference you <em>could</em> change the method <code>parse_expression</code> (try it!). This will however make things harder later when we prioritise operations (multiplications have to be performed before sums), so my advice is to introduce a method <code>parse_term</code> in the parser, which is the method used in the tests.</p>
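<p>To see why the priority of operations matters, here is a small sketch that is not part of smallcalc: <code>evaluate_flat</code> and <code>OPERATIONS</code> are names invented for this illustration. It evaluates a space-separated expression strictly from left to right, which is what a parser that treats all four operators the same way would lead to.</p>

```python
import operator

# Hypothetical lookup table: one Python function per supported symbol
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,
}


def evaluate_flat(text):
    # Apply the operators in the order they appear, ignoring priorities
    tokens = text.split()
    result = int(tokens[0])
    for symbol, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPERATIONS[symbol](result, int(operand))
    return result


print(evaluate_flat("2 + 3 * 4"))  # 20, because it computes (2 + 3) * 4
print(2 + 3 * 4)                   # 14, the result we actually want
```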
<h3 id="visitor">Visitor<a class="headerlink" href="#visitor" title="Permanent link">¶</a></h3>
<p>Now it's the visitor's turn, where the syntax tree gets analysed and actually executed. Add the following tests to the <code>tests/test_calc_visitor.py</code> file and then make them pass by changing the <code>CalcVisitor</code> class accordingly.</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_term_multiplication</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">5</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'*'</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="mi">20</span><span class="p">,</span> <span class="s1">'integer'</span><span class="p">)</span>


<span class="k">def</span> <span class="nf">test_visitor_term_division</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">11</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'/'</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'integer'</span><span class="p">)</span>
</code></pre></div>
<hr>
<h3 id="solution">Solution<a class="headerlink" href="#solution" title="Permanent link">¶</a></h3>
<p>The tests we added for the lexer already pass. This is not surprising, as the lexer is designed to return everything it doesn't know as a <code>LITERAL</code> (<code>smallcalc/calc_lexer.py:119</code>). As we already instructed the lexer to skip spaces the new operators are happily digested. As I discussed in the previous post, I decided for this project not to assign operators a specific token, so from this point of view our lexer is pretty open and could already understand instructions like <code>3 $ 5</code> or <code>7 : 9</code>, even though they do not have any meaning in our new language (yet, maybe).</p>
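<p>The "everything unknown is a <code>LITERAL</code>" behaviour can be illustrated with a tiny stand-alone tokenizer. This is only a sketch and not the lexer of smallcalc: the function <code>tokenize</code> and the tuple-based tokens are assumptions, and the final <code>EOL</code> and <code>EOF</code> tokens are left out for brevity.</p>

```python
import re

# Integers get their own token type; any other non-space character
# falls through to the generic LITERAL type
TOKEN_REGEX = re.compile(r"(?P<INTEGER>\d+)|(?P<LITERAL>\S)")


def tokenize(text):
    # lastgroup is the name of the alternative that matched
    return [(match.lastgroup, match.group()) for match in TOKEN_REGEX.finditer(text)]


print(tokenize("3 * 5"))  # [('INTEGER', '3'), ('LITERAL', '*'), ('INTEGER', '5')]
print(tokenize("3 $ 5"))  # [('INTEGER', '3'), ('LITERAL', '$'), ('INTEGER', '5')]
```

<p>No rule mentions <code>*</code>, <code>/</code>, or <code>$</code> explicitly, which is why tests for new symbols can pass without any change to the code.</p>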
<p>The parser is not so merciful, and the two new tests do not pass. We are explicitly calling a method <code>parse_term</code> that is not defined, so a success would have been very worrying. In these two tests <code>parse_term</code> is called explicitly and there is no relationship with the other methods named <code>parse_*</code>, so we can implement it as a stand-alone method.</p>
<p>We know that a <code>term</code> is an operation between two integers, so we can follow what we did with <code>parse_expression</code>. The first thing we do is to parse the first integer, then we peek the next token and decide what to do. If the token is a <code>LITERAL</code> we suppose it is the operation symbol, otherwise we probably hit the end of the file and we just return the previously read integer. After the operator we parse the second integer and pack the three elements in a <code>BinaryNode</code>. As another multiplication or division may follow, we repeat the process in a loop, each time using the node built so far as the left operand; this is what produces the left-nested structure required by the second test.</p>
<p>[Note: I noticed that the <code>parse_addsymbol</code> could now be named <code>parse_literal</code> but this wasn't done when I prepared the source code. Regardless of the name, however, what this method does is to just pack a literal in a <code>LiteralNode</code> and return it.]</p>
<p>The whole parser is now the following</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CalcParser</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">parse_addsymbol</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">LiteralNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">parse_integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">IntegerNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">parse_term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">while</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_addsymbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
            <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">left</span>

    <span class="k">def</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">while</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_addsymbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
            <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
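<p>The <code>while</code> loop in <code>parse_term</code> matters for the shape of the tree. The following sketch, which uses plain tuples instead of the real node classes and two hypothetical helper functions, compares the loop with a version that recurses on the right operand.</p>

```python
def parse_term_loop(tokens):
    # Mirrors the while loop: fold each new operand into the left side
    tokens = list(tokens)
    left = int(tokens.pop(0))
    while tokens:
        operator = tokens.pop(0)
        right = int(tokens.pop(0))
        left = (left, operator, right)
    return left


def parse_term_recursive(tokens):
    # For comparison only: recurse on everything that follows the operator
    tokens = list(tokens)
    left = int(tokens.pop(0))
    if not tokens:
        return left
    operator = tokens.pop(0)
    return (left, operator, parse_term_recursive(tokens))


tokens = "2 * 3 / 4".split()
print(parse_term_loop(tokens))       # ((2, '*', 3), '/', 4)
print(parse_term_recursive(tokens))  # (2, '*', (3, '/', 4))
```

<p>Only the first shape matches <code>test_parse_term_with_multiple_operations</code>. The difference is not cosmetic: with integer division the first tree evaluates to <code>1</code>, the second to <code>0</code>.</p>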
<p>The visitor was instructed only to deal with sums and subtractions, and it treats everything that is not the former as the latter. This is why the new tests give as results <code>1</code> (that is, <code>5 - 4</code>) and <code>7</code> (that is, <code>11 - 4</code>). We just need to extend the <code>if</code> statement to include the new operations</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CalcVisitor</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'integer'</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">],</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span>

        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'binary'</span><span class="p">:</span>
            <span class="n">lvalue</span><span class="p">,</span> <span class="n">ltype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'left'</span><span class="p">])</span>
            <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'right'</span><span class="p">])</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
            <span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'+'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">+</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">-</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'*'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">*</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'/'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">//</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
</code></pre></div>
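<p>The two wrong results mentioned above can be reproduced with a short sketch. The function <code>visit_old</code> is only an assumption about the shape of the previous visitor (a sum, or a subtraction for anything else), and <code>binary</code> is a hypothetical helper that builds the dictionary form of a binary node.</p>

```python
def visit_old(node):
    # Assumed previous behaviour: '+' is a sum, anything else a subtraction
    if node["type"] == "integer":
        return node["value"], node["type"]
    lvalue, _ = visit_old(node["left"])
    rvalue, rtype = visit_old(node["right"])
    if node["operator"]["value"] == "+":
        return lvalue + rvalue, rtype
    return lvalue - rvalue, rtype


def binary(left, operator, right):
    # Hypothetical helper that builds the dictionary form of a binary node
    return {
        "type": "binary",
        "left": {"type": "integer", "value": left},
        "right": {"type": "integer", "value": right},
        "operator": {"type": "literal", "value": operator},
    }


print(visit_old(binary(5, "*", 4)))   # (1, 'integer'), computed as 5 - 4
print(visit_old(binary(11, "/", 4)))  # (7, 'integer'), computed as 11 - 4
print(11 // 4, 11 / 4)                # 2 2.75
```

<p>The last line shows why the solution uses <code>//</code>: the division test expects <code>(2, 'integer')</code>, while the operator <code>/</code> would return the float <code>2.75</code>.</p>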
<hr>
<p>Now we have a pretty simple but fully working calculator! Enjoy the <code>cli.py</code>, as YOU did it this time! I remember I was pretty excited the first time I ran a command line calculator that I had written myself. But hold tight, because you are going to learn and implement much more!</p>
<h2 id="level-9-mixing-operators">Level 9 - Mixing operators<a class="headerlink" href="#level-9-mixing-operators" title="Permanent link">¶</a></h2>
<p><em>"Don't cross the streams."</em> - Ghostbusters (1984)</p>
<p>Ok, it's time to do some serious math. What happens if you mix sums and multiplications? Let's try it and see how our interpreter reacts. We already know that the lexer happily digests all four symbols, so we can head straight to the parser and add the following test to <code>tests/test_calc_parser.py</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_expression_with_term</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"2 + 3 * 4"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
            <span class="p">},</span>
            <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
            <span class="p">},</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'*'</span>
<span class="p">}</span>
<span class="p">},</span>
<span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>Chances are that this will fail miserably. You probably have to rework <code>parse_expression</code> a bit, as it is ignoring the new entry, <code>parse_term</code>. Please note that <code>2 * 3 + 4</code> must give <code>10</code> according to the standard math rules, and not <code>14</code>. This happens because multiplication is performed before sum, and the order depends uniquely on the structure created by the parser, and not on the visitor (which is at this point a pretty dumb component).</p>
<p>Once the parser outputs the correct structure the visitor shouldn't have issues, as it is already behaving in a recursive way. If you want to check feel free to add relevant tests, however.</p>
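<p>To see why the structure alone decides the result, here is a minimal sketch that evaluates the two possible trees for <code>2 + 3 * 4</code>. The functions <code>evaluate</code>, <code>integer</code> and <code>binary</code> are hypothetical helpers written for this illustration; they work on plain dictionaries shaped like the output of <code>asdict()</code> and are not part of the project code.</p>

```python
# Hypothetical helpers, not part of smallcalc: they work on dictionaries
# shaped like the output of asdict() used in the tests.
def evaluate(node):
    if node["type"] == "integer":
        return node["value"]

    left = evaluate(node["left"])
    right = evaluate(node["right"])
    operator = node["operator"]["value"]

    if operator == "+":
        return left + right
    if operator == "-":
        return left - right
    if operator == "*":
        return left * right
    if operator == "/":
        return left // right  # integer division, as in the visitor
    raise ValueError(f"Unknown operator {operator}")


def integer(value):
    return {"type": "integer", "value": value}


def binary(left, operator, right):
    return {
        "type": "binary",
        "left": left,
        "right": right,
        "operator": {"type": "literal", "value": operator},
    }


# 2 + (3 * 4): the structure the test expects
expected_tree = binary(integer(2), "+", binary(integer(3), "*", integer(4)))

# (2 + 3) * 4: the structure built when precedence is ignored
wrong_tree = binary(binary(integer(2), "+", integer(3)), "*", integer(4))

print(evaluate(expected_tree))  # 14
print(evaluate(wrong_tree))  # 20
```

<p>The evaluation code is identical in both cases. Only the nesting of the nodes changes, and with it the result.</p>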
<hr>
<h3 id="solution_1">Solution<a class="headerlink" href="#solution_1" title="Permanent link">¶</a></h3>
<p>Ouch! It looks like putting multiplications and sums in the same line is not really working. As you may recall we didn't link <code>parse_term</code> with the other methods, and we use a generic function to treat literals. This works in principle, but doesn't consider operator precedence.</p>
<p>When we try to evaluate <code>2 + 3 * 4</code> the output of the parser is</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"binary"</span><span class="p">,</span>
<span class="s2">"left"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"binary"</span><span class="p">,</span>
<span class="s2">"left"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="mi">2</span>
<span class="p">},</span>
<span class="s2">"right"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="mi">3</span>
<span class="p">},</span>
<span class="s2">"operator"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"literal"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="s2">"+"</span>
<span class="p">}</span>
<span class="p">},</span>
<span class="s2">"right"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"integer"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="mi">4</span>
<span class="p">},</span>
<span class="s2">"operator"</span><span class="p">:</span> <span class="p">{</span>
<span class="s2">"type"</span><span class="p">:</span> <span class="s2">"literal"</span><span class="p">,</span>
<span class="s2">"value"</span><span class="p">:</span> <span class="s2">"*"</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>As you can clearly see the parser recognised the multiplication operator, but then returns a nested sum (the output of a recursive call of <code>parse_term</code>). This gives the sum a greater precedence than that of the multiplication, which is against the mathematical rules we want to follow here. <code>2 + 3 * 4</code> shall be considered <code>2 + (3 * 4)</code> and not <code>(2 + 3) * 4</code>.</p>
<p>To fix this we have to rework <code>parse_term</code>. First of all it shall accept only the <code>*</code> and <code>/</code> operators, then it shall return the left part if it finds a different literal. Even <code>parse_expression</code> shall change a bit: the first thing to do is to call <code>parse_term</code> instead of <code>parse_integer</code> and then to return the left part.</p>
<p>The new code is then</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">while</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span>\
                <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'*'</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">]:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_addsymbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
            <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">left</span>

    <span class="k">def</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">while</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_addsymbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
            <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
<p>Let's see what happens when parsing <code>2 * 3 + 4</code>. The test calls <code>parse_expression</code>, which immediately runs <code>parse_term</code>. The latter recognises <code>2</code> and <code>*</code>, so it enters its loop, reads <code>3</code> as the right operand and builds the binary node for <code>2 * 3</code>. This means that the multiplication is the first operation we build, the one with higher precedence. The loop then peeks at <code>+</code>, but doesn't know what to do with it as we specifically consider only <code>*</code> and <code>/</code>, so it stops and returns the binary node. Back in <code>parse_expression</code>, the variable <code>left</code> will then contain the binary node that represents <code>2 * 3</code>. The function will then finish by adding the binary node for the sum.</p>
<p>Take your time to understand the mechanism, perhaps trying with different operations like <code>2 + 4 * 6 - 8</code>, which should return <code>18</code>.</p>
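<p>If you want to play with the mechanism in isolation, the following is a minimal standalone sketch of the same two-level structure. It is not the project code: <code>TinyParser</code> is a hypothetical class, tokens are plain strings separated by spaces, nodes are tuples instead of <code>BinaryNode</code> objects, and the loop in <code>parse_expression</code> is restricted to <code>+</code> and <code>-</code> for simplicity.</p>

```python
# A standalone sketch: tokens are plain strings and nodes are tuples,
# unlike the Token and BinaryNode classes used in smallcalc.
class TinyParser:
    def __init__(self, text):
        # Assumes tokens are separated by spaces, e.g. "2 + 4 * 6 - 8"
        self.tokens = text.split()
        self.position = 0

    def peek(self):
        # Look at the next token without consuming it
        if self.position < len(self.tokens):
            return self.tokens[self.position]
        return None

    def get(self):
        # Consume and return the next token
        token = self.peek()
        self.position += 1
        return token

    def parse_integer(self):
        return int(self.get())

    def parse_term(self):
        # Handles only * and /, so these bind tighter than + and -
        left = self.parse_integer()
        while self.peek() in ("*", "/"):
            operator = self.get()
            right = self.parse_integer()
            left = (left, operator, right)
        return left

    def parse_expression(self):
        # Every operand of + and - is a whole term
        left = self.parse_term()
        while self.peek() in ("+", "-"):
            operator = self.get()
            right = self.parse_term()
            left = (left, operator, right)
        return left


print(TinyParser("2 * 3 + 4").parse_expression())
# ((2, '*', 3), '+', 4)
print(TinyParser("2 + 4 * 6 - 8").parse_expression())
# ((2, '+', (4, '*', 6)), '-', 8)
```

<p>In the second tree the multiplication is the innermost node, so it is evaluated first: <code>4 * 6</code> gives <code>24</code>, then <code>2 + 24</code> gives <code>26</code>, and finally <code>26 - 8</code> gives <code>18</code>.</p>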
<hr>
<h2 id="level-10-parentheses">Level 10 - Parentheses<a class="headerlink" href="#level-10-parentheses" title="Permanent link">¶</a></h2>
<p><em>"When nine hundred years old you reach, look as good you will not."</em> - Return of the Jedi (1983)</p>
<p><a href="https://en.wikipedia.org/wiki/Bracket#Parentheses">Parentheses</a> are curved brackets used in mathematics to change the order of operations. As this part is pretty important I will spend some time on it, because the order of operations will also be a concern when it comes to language operators, and not only when dealing with mathematical operations. As explained in the previous section, almost everything at this point happens in the parser, as the resulting structure that we give to the visitor is the one that rules the precedence of operations.</p>
<p>Let's start to check that the lexer understands the parentheses symbols <code>(</code> and <code>)</code>.</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_parentheses</span><span class="p">():</span>
<span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
<span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 * ( 5 + 7 )'</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
<span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'3'</span><span class="p">),</span>
<span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'*'</span><span class="p">),</span>
<span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'('</span><span class="p">),</span>
<span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'5'</span><span class="p">),</span>
<span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">),</span>
<span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'7'</span><span class="p">),</span>
<span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">')'</span><span class="p">),</span>
<span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
<span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
<span class="p">]</span>
</code></pre></div>
<p>As our lexer is pretty open-minded it shouldn't raise any objections and happily pass the test (why?). </p>
<p>As always, instead, its neighbour the parser is not that forgiving, and I bet it will make a fuss. Let's try and feed it with some simple expression with parentheses</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_expression_with_parentheses</span><span class="p">():</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
<span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"(2 + 3)"</span><span class="p">)</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
<span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
<span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
<span class="p">},</span>
<span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
<span class="p">},</span>
<span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>To make this test pass my suggestion is to introduce a method <code>parse_factor</code>, where the term <em>factor</em> encompasses both integers and the expressions between parentheses. In the latter case, obviously, you will need to call <code>parse_expression</code>, which somehow breaks the hierarchical structure of methods in the parser.</p>
<hr>
<h3 id="solution_2">Solution<a class="headerlink" href="#solution_2" title="Permanent link">¶</a></h3>
<p>Let's have some Lisp time here and introduce parentheses. As happened for the new mathematical operators, parentheses are already accepted by the lexer as simple literals, so the first test passes without any change in the code. The parser complains, however, as it always expects an integer (<code>smallcalc/calc_parser.py:76</code>).</p>
<p>As I suggested, my idea is to introduce a method that parses a so-called <em>factor</em>, which can be either an integer or an expression between parentheses.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CalcParser</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">parse_addsymbol</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">LiteralNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">parse_integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">IntegerNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">parse_factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="o">==</span> <span class="s1">'('</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
            <span class="n">expression</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">expression</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
</code></pre></div>
<p>The method <code>parse_term</code> now has to call <code>parse_factor</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_term</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">while</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span>\
                <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'*'</span><span class="p">,</span> <span class="s1">'/'</span><span class="p">]:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_addsymbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
            <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
<p>Lastly, we need to slightly change <code>parse_expression</code>, introducing a check on the literal token value. This happens because I decided to identify everything with a literal, so the method has to rule out every literal it is not interested in managing, like the closing parenthesis. If you introduce specific tokens for operations, parentheses, etc., this change is not required (but you won't use <code>clex.LITERAL</code> at that point).</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">while</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span>\
                <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'+'</span><span class="p">,</span> <span class="s1">'-'</span><span class="p">]:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_addsymbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_term</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
            <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
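<p>The three levels (expression, term, factor) can also be tried out in a compact standalone form. The sketch below is an assumption-laden miniature, not the project code: <code>TinyCalc</code> and <code>tokenize</code> are hypothetical names, tokens are plain strings, values are computed directly instead of building nodes, and the closing parenthesis is consumed without any error checking. In this sketch both operands of a multiplication are parsed as factors, so a parenthesised expression can appear on either side of <code>*</code>.</p>

```python
import re


def tokenize(text):
    # Hypothetical tokenizer: integers, or any single non-space character
    return re.findall(r"\d+|\S", text)


class TinyCalc:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.position = 0

    def peek(self):
        if self.position < len(self.tokens):
            return self.tokens[self.position]
        return None

    def get(self):
        token = self.peek()
        self.position += 1
        return token

    def factor(self):
        # A factor is an integer or an expression between parentheses
        if self.peek() == "(":
            self.get()  # discard "("
            value = self.expression()
            self.get()  # discard ")" (not validated in this sketch)
            return value
        return int(self.get())

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            operator = self.get()
            right = self.factor()
            value = value * right if operator == "*" else value // right
        return value

    def expression(self):
        value = self.term()
        # Only + and - continue the loop, so ")" ends the expression
        while self.peek() in ("+", "-"):
            operator = self.get()
            right = self.term()
            value = value + right if operator == "+" else value - right
        return value


print(TinyCalc("(2 + 3) * 4").expression())  # 20
print(TinyCalc("3 * (5 + 7)").expression())  # 36
print(TinyCalc("2 + 3 * 4").expression())  # 14
```

<p>Note how the check on <code>+</code> and <code>-</code> in <code>expression</code> is what lets the inner call return when it meets the closing parenthesis, leaving it to <code>factor</code> to consume.</p>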
<hr>
<h2 id="level-11-priorities">Level 11 - Priorities<a class="headerlink" href="#level-11-priorities" title="Permanent link">¶</a></h2>
<p><em>"You got issues, Quill."</em> - Guardians of the Galaxy (2014)</p>
<p>As parentheses have been introduced to change the default priority rules between operators we need to be sure that this happens. We can test it easily with this code</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_parentheses_change_priority</span><span class="p">():</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
<span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"(2 + 3) * 4"</span><span class="p">)</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
<span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
<span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
<span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
<span class="p">},</span>
<span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
<span class="p">},</span>
<span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
<span class="p">}</span>
<span class="p">},</span>
<span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
<span class="p">},</span>
<span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="s1">'*'</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>Once your parser passes this test you have a full-fledged calculator that supports parentheses. Make sure to test the new features in the CLI. Do multiple parentheses work? Why?</p>
<hr>
<h3 id="solution_3">Solution<a class="headerlink" href="#solution_3" title="Permanent link">¶</a></h3>
<p>This is another feature that comes for free with the previous changes, as the first thing that <code>parse_expression</code> does is to run <code>parse_term</code>, and the first thing the latter does is to run <code>parse_factor</code>, which in turn manages expressions between parentheses. If the expression is not enclosed between parentheses the method <code>parse_factor</code> doesn't call <code>parse_expression</code> and just returns the integer.</p>
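<p>The same chain of calls explains why nested parentheses work: every opening parenthesis starts a fresh expression, and the parentheses themselves leave no trace in the resulting tree. Here is a small sketch of this, assuming a hypothetical function <code>parse</code> that works on plain strings and returns nested tuples instead of the node classes of the project.</p>

```python
import re


def parse(text):
    """Return a nested-tuple tree for text (hypothetical helper)."""
    tokens = re.findall(r"\d+|\S", text)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def get():
        nonlocal position
        token = peek()
        position += 1
        return token

    def factor():
        if peek() == "(":
            get()  # opening parenthesis
            node = expression()  # a factor can contain a whole expression
            get()  # closing parenthesis, not validated in this sketch
            return node
        return int(get())

    def term():
        node = factor()
        while peek() in ("*", "/"):
            operator = get()
            node = (node, operator, factor())
        return node

    def expression():
        node = term()
        while peek() in ("+", "-"):
            operator = get()
            node = (node, operator, term())
        return node

    return expression()


print(parse("(2 + 3) * 4"))
# ((2, '+', 3), '*', 4)
print(parse("((2 + 3)) * 4"))
# ((2, '+', 3), '*', 4)
print(parse("2 * (3 + (4 - 1))"))
# (2, '*', (3, '+', (4, '-', 1)))
```

<p>The first two inputs produce the same tree, which shows that redundant parentheses are harmless, while the third shows that each level of nesting simply becomes a deeper node.</p>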
<hr>
<h2 id="level-12-unary-operators">Level 12 - Unary operators<a class="headerlink" href="#level-12-unary-operators" title="Permanent link">¶</a></h2>
<p><em>"There can be only one!"</em> - Highlander (1986)</p>
<p>Now it's time to introduce unary operators, which are very important in programming languages. Just think of <code>not x</code> and you will immediately understand why you need them. Unary operators do not fit in the current structure of our interpreter as the parser is always expecting either an integer or an open parenthesis as the first token.</p>
<p>Let's first write a test for the simplest unary operator, which is a minus (as in <code>-2</code>). Remember that we are testing the parser here, as the lexer is already able to recognise the minus sign.</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_factor_supports_unary_operator</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"-5"</span><span class="p">)</span>

    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>

    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'unary'</span><span class="p">,</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'-'</span>
        <span class="p">},</span>
        <span class="s1">'content'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">5</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
<p>When your parser passes this test, we have to make sure that the unary minus can also be applied to expressions between parentheses:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_factor_supports_negative_expressions</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"-(2 + 3)"</span><span class="p">)</span>

    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>

    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'unary'</span><span class="p">,</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'-'</span>
        <span class="p">},</span>
        <span class="s1">'content'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
            <span class="p">},</span>
            <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
            <span class="p">},</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
<p>Once the parser is able to pass these two tests we are confident that the unary minus can be used in front of all the basic elements of our expressions. At this point it is time to execute the unary expressions produced by the parsing layer, so include this test for the visitor:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_unary_minus</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'unary'</span><span class="p">,</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'-'</span>
        <span class="p">},</span>
        <span class="s1">'content'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
            <span class="p">},</span>
            <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
            <span class="p">},</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="o">-</span><span class="mi">5</span><span class="p">,</span> <span class="s1">'integer'</span><span class="p">)</span>
</code></pre></div>
<p>Change the visitor to pass this test and you can go straight to the CLI and start using negative numbers or negative expressions. Can you execute something like <code>--2</code> (minus minus 2)? What is the result? Why?</p>
<p>Now let's go back to the parser and ensure that the unary plus can be used as well. This is the test:</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_factor_supports_unary_plus</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"+(2 + 3)"</span><span class="p">)</span>

    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>

    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'unary'</span><span class="p">,</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
        <span class="p">},</span>
        <span class="s1">'content'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
            <span class="p">},</span>
            <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
            <span class="p">},</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
<p>The code should be trivial, as you already manage the unary minus. The corresponding test for the visitor is</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_unary_plus</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'unary'</span><span class="p">,</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
        <span class="p">},</span>
        <span class="s1">'content'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
            <span class="p">},</span>
            <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
            <span class="p">},</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s1">'integer'</span><span class="p">)</span>
</code></pre></div>
<p>Once your code passes all the tests head to the CLI and try to run something like <code>-+--++-3</code>. Does it work?</p>
<hr>
<h3 id="solution_4">Solution<a class="headerlink" href="#solution_4" title="Permanent link">¶</a></h3>
<p>The minus unary operator uses a literal that we already manage in the lexer, so there is nothing to do there. The first test I gave you checks if the parser can process a factor in the form <code>-5</code>.</p>
<p>The current implementation of <code>parse_factor</code> processes either an expression enclosed between parentheses or an integer, so the test doesn't pass: it fails complaining that the minus sign is not a valid integer in base 10. The solution is pretty straightforward, as it is enough to add another <code>if</code> that manages the minus sign. When we encounter such a sign, however, we have to return a different type of node, as the test states, so we also have to introduce the corresponding class.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">UnaryNode</span><span class="p">(</span><span class="n">Node</span><span class="p">):</span>
    <span class="n">node_type</span> <span class="o">=</span> <span class="s1">'unary'</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">content</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">operator</span> <span class="o">=</span> <span class="n">operator</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">content</span> <span class="o">=</span> <span class="n">content</span>

    <span class="k">def</span> <span class="nf">asdict</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">result</span> <span class="o">=</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">node_type</span><span class="p">,</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">operator</span><span class="o">.</span><span class="n">asdict</span><span class="p">(),</span>
            <span class="s1">'content'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">content</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span>
        <span class="p">}</span>

        <span class="k">return</span> <span class="n">result</span>


<span class="k">class</span> <span class="nc">CalcParser</span><span class="p">:</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">parse_addsymbol</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">LiteralNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">parse_integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">IntegerNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">parse_factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>

        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_addsymbol</span><span class="p">()</span>
            <span class="n">factor</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">UnaryNode</span><span class="p">(</span><span class="n">operator</span><span class="p">,</span> <span class="n">factor</span><span class="p">)</span>

        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="o">==</span> <span class="s1">'('</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
            <span class="n">expression</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">expression</span>

        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
</code></pre></div>
<p>The second test passes automatically because <code>parse_factor</code> intercepts the <code>-</code> literal before the <code>(</code> one.</p>
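<p>To see why the recursion takes care of both cases, here is a minimal standalone sketch that does not use the smallcalc classes. The <code>tokenize</code> and <code>literal</code> helpers and the list-based token stream are assumptions made for illustration only; the nodes are plain dictionaries shaped like the ones in the tests.</p>

```python
import re


def tokenize(text):
    # Hypothetical helper: integers and single-character literals only
    return re.findall(r"\d+|[-+()]", text)


def literal(value):
    # Hypothetical helper that builds a literal node as a plain dict
    return {"type": "literal", "value": value}


def parse_factor(tokens):
    """Parse one factor, consuming tokens from the front of the list."""
    token = tokens[0]

    if token in ("-", "+"):
        tokens.pop(0)
        # Whatever follows the sign is itself a factor, so we recurse:
        # this covers -5, -(2 + 3) and --2 with the same branch
        return {
            "type": "unary",
            "operator": literal(token),
            "content": parse_factor(tokens),
        }

    if token == "(":
        tokens.pop(0)  # opening parenthesis
        node = parse_expression(tokens)
        tokens.pop(0)  # closing parenthesis
        return node

    return {"type": "integer", "value": int(tokens.pop(0))}


def parse_expression(tokens):
    """Parse sums and differences of factors (no products in this sketch)."""
    left = parse_factor(tokens)
    while tokens and tokens[0] in ("+", "-"):
        operator = literal(tokens.pop(0))
        right = parse_factor(tokens)
        left = {
            "type": "binary",
            "left": left,
            "right": right,
            "operator": operator,
        }
    return left


nested = parse_factor(tokenize("--2"))
print(nested["content"]["type"])
```

<p>Since the sign branch calls <code>parse_factor</code> again, a string like <code>--2</code> produces a unary node whose content is another unary node.</p>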
<p>The visitor has to be updated with the new type of <code>unary</code> node. The new visitor is then</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CalcVisitor</span><span class="p">:</span>

    <span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'integer'</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">],</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span>

        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'unary'</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
            <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'content'</span><span class="p">])</span>

            <span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
                <span class="k">return</span> <span class="o">-</span> <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span>

        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'binary'</span><span class="p">:</span>
            <span class="n">lvalue</span><span class="p">,</span> <span class="n">ltype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'left'</span><span class="p">])</span>
            <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'right'</span><span class="p">])</span>

            <span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>

            <span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'+'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">+</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">-</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'*'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">*</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">elif</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'/'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">//</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
</code></pre></div>
<p>Now the unary plus is easy to sort out, as we just need to take it into account in <code>parse_factor</code> along with the unary minus.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_factor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>

        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'-'</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">]:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_addsymbol</span><span class="p">()</span>
            <span class="n">factor</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_factor</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">UnaryNode</span><span class="p">(</span><span class="n">operator</span><span class="p">,</span> <span class="n">factor</span><span class="p">)</span>

        <span class="k">if</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span> <span class="ow">and</span> <span class="n">next_token</span><span class="o">.</span><span class="n">value</span> <span class="o">==</span> <span class="s1">'('</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
            <span class="n">expression</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
            <span class="k">return</span> <span class="n">expression</span>

        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
</code></pre></div>
<p>And the visitor is missing a single return after the <code>if</code> statement that deals with the unary minus.</p>
<div class="highlight"><pre><span></span><code>        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'unary'</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
            <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'content'</span><span class="p">])</span>

            <span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span>
                <span class="k">return</span> <span class="o">-</span> <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span>

            <span class="k">return</span> <span class="n">cvalue</span><span class="p">,</span> <span class="n">ctype</span>
</code></pre></div>
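<p>The following standalone sketch shows how the recursive visit handles a chain of signs such as <code>-+--++-3</code>. It is a simplified function rather than the <code>CalcVisitor</code> class, and the <code>wrap</code> helper that builds the nested nodes by hand is an assumption introduced for illustration.</p>

```python
def visit(node):
    """Evaluate integer and unary nodes given as plain dictionaries."""
    if node["type"] == "integer":
        return node["value"], node["type"]

    if node["type"] == "unary":
        operator = node["operator"]["value"]
        # Evaluate the wrapped node first, then apply the sign to the result
        cvalue, ctype = visit(node["content"])
        if operator == "-":
            return -cvalue, ctype
        return cvalue, ctype

    raise ValueError(f"Unsupported node type: {node['type']}")


def wrap(signs, value):
    # Hypothetical helper: builds the tree a parser would return for a
    # sequence of signs in front of an integer, innermost sign last
    node = {"type": "integer", "value": value}
    for sign in reversed(signs):
        node = {
            "type": "unary",
            "operator": {"type": "literal", "value": sign},
            "content": node,
        }
    return node


double_minus = visit(wrap("--", 2))
long_chain = visit(wrap("-+--++-", 3))
print(double_minus, long_chain)
```

<p>Each minus flips the sign once and each plus leaves the value untouched, so only the parity of the number of minus signs matters.</p>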
<hr>
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>That's all for this post. If you feel brave or do not like to wait for the next post, go and try adding new operators! Next time I will cover variables, assignments and operators like the power operation (<code>2^3</code>).</p>
<p>The code I developed in this post is available on the GitHub repository tagged with <code>part2</code> (<a href="https://github.com/lgiordani/smallcalc/tree/part2">link</a>).</p>
<h2 id="updates">Updates<a class="headerlink" href="#updates" title="Permanent link">¶</a></h2>
<p>2017-12-24: <code>test_parse_term_with_multiple_operations</code> has been changed after Victor Uriarte spotted an error in the tree construction. See the updates section of the first post in the series for a full explanation of the issue.</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>A game of tokens: solution - Part 12017-07-12T10:00:00+01:002017-07-12T10:00:00+01:00Leonardo Giordanitag:www.thedigitalcatonline.com,2017-07-12:/blog/2017/07/12/a-game-of-tokens-solution-part-1/<p>This post originally contained my solution to the challenge posted <a href="https://www.thedigitalcatonline.com/blog/2017/05/09/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-1/">here</a>. I moved those solutions inside the post itself, under the "Solution" subsections.</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>A game of tokens: write an interpreter in Python with TDD - Part 12017-05-09T23:00:00+01:002020-08-05T11:00:00+00:00Leonardo Giordanitag:www.thedigitalcatonline.com,2017-05-09:/blog/2017/05/09/a-game-of-tokens-write-an-interpreter-in-python-with-tdd-part-1/<p>How to write a programming language in Python, a TDD game</p><h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2>
<p>Writing an interpreter or a compiler is usually considered one of the greatest goals that a programmer can achieve, and with good reason. I do not believe the importance of going through this experience is primarily due to its difficulty, though. After all, writing an efficient compiler is difficult, but the same is true for a good web framework, or a feature-rich editor.</p>
<p>Being able to write an interpreter is a significant skill mainly because of its recursive (or self-referring) nature. Think about it: you use a language to write a new language. And this new language, if it becomes sufficiently rich, can eventually be used to create its own compiler.</p>
<p><strong>A language can be used to write the program that executes that same language.</strong></p>
<p>Didn't this last sentence fire you with enthusiasm? It makes me eager to start!</p>
<p>Compilers have been the subject of academic research since the 1950s, with the works of <a href="https://en.wikipedia.org/wiki/Grace_Hopper">Hopper</a> and <a href="https://en.wikipedia.org/wiki/Alick_Glennie">Glennie</a>, so trying to provide an overview in a few lines is basically impossible. I highly recommend checking the online resources listed at the bottom of the post if you are seriously interested in the matter.</p>
<p>In this series of posts I want to try an experiment. I want to guide you through the creation of a simple interpreter in Python using a pure TDD (Test-Driven Development) approach. The posts will be structured like a game, where every level is represented by a new test that I will add to the suite. If you are not confident with TDD, you will find more on it in the specific section.</p>
<p>Following this series you will learn about Python, compilers, interpreters, parsers, lexers, test-driven development, refactoring, coverage, regular expressions, classes, context managers. Wow, that's a lot!</p>
<p>Are you ready to start?</p>
<h2 id="on-the-tdd-game">On the TDD game<a class="headerlink" href="#on-the-tdd-game" title="Permanent link">¶</a></h2>
<p>This series of posts will introduce you to TDD with a sort of game. I'll give you the test, and you are supposed to write something that passes that test, finishing the level. <strong>Update</strong>: I decided to move solutions into the same post where the challenge is given, you will find them in specific sections named "Solution" after each level.</p>
<p>My best advice for the TDD game is: remember that the easiest solution for a test that requires the output <code>A</code> is to write a function that returns exactly <code>A</code>.</p>
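<p>As a tiny illustration of this advice, assume the first test of a level only checks a single sum. The <code>add</code> function and the test name below are hypothetical and not part of smallcalc.</p>

```python
def add(a, b):
    # The only test so far expects 5, so returning exactly 5 is the
    # easiest code that passes it; the arguments are ignored on purpose
    return 5


def test_add_two_and_three():
    assert add(2, 3) == 5


test_add_two_and_three()
print("level passed")
```

<p>A later test, for example one that expects <code>add(1, 1)</code> to return <code>2</code>, would fail against this version, and that failure is what forces you to write the general solution.</p>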
<p><strong>Beautiful is better than ugly, but ugly and tested is better than beautiful and untested.</strong></p>
<h2 id="about-the-language">About the language<a class="headerlink" href="#about-the-language" title="Permanent link">¶</a></h2>
<p>At the time of writing the language we are going to implement is a simple calculator with support for <strong>integers</strong> and <strong>floats</strong>, <strong>binary operators</strong> (addition, subtraction, multiplication, division, and power), <strong>unary operators</strong> (negation), <strong>nested expressions</strong> (parentheses) and <strong>variables</strong>.</p>
<p>The name <strong>smallcalc</strong> is a homage to one of the most innovative and influential languages ever conceived: <a href="https://en.wikipedia.org/wiki/Smalltalk">Smalltalk</a>.</p>
<p>I do not know if the final version will be something richer, it depends on how much fun you will find in the series. So, if you are interested, just ask! You can drop a line of appreciation <a href="https://twitter.com/thedigicat">on Twitter</a>.</p>
<p>At the time of writing, then, the language grammar is</p>
<div class="highlight"><pre><span></span><code>factor : ('+' | '-') factor | '(' expression ')' | variable | number
power : factor [ '^' power ]*
term : power [ ('*' | '/') term ]*
expression : term [ ('+' | '-') expression ]*
assignment : variable '=' expression
line : assignment | expression
</code></pre></div>
<p>The syntax of the grammar is pretty self-explanatory if you have some programming background. If you want to know more about grammars like the one above start from the links in the resources section.</p>
<h2 id="tdd-and-refactoring">TDD and refactoring<a class="headerlink" href="#tdd-and-refactoring" title="Permanent link">¶</a></h2>
<p>If you already know what TDD is feel free to skip this section.</p>
<p><strong>TDD</strong> means <strong>Test-Driven Development</strong>, and in short it is a programming methodology that requires you to write a test for a feature before implementing the feature itself. Much has been said on the benefits of TDD elsewhere. I personally think it is one of the most effective ways to work on a programming task, and something that every programmer should know. I wrote a post on TDD with Python that you can find <a href="https://www.thedigitalcatonline.com/blog/2015/05/13/python-oop-tdd-example-part1/">here</a>.</p>
<p>A <strong>test</strong>, in TDD, is code that uses the code you are going to develop. You will start with a project skeleton and add the tests I will present in the posts one at a time. Once you add the test, you have to write the code that passes the test. Your code doesn't need to be beautiful or smart, it just needs to pass the test. Then you can move to the following test and start the cycle again.</p>
<p>After adding some tests you can start considering <strong>refactoring</strong>, which means changing the existing code in order to make it more beautiful, simpler or better organised. Every change has to be tested against the existing battery of tests. If the tests do not fail your change is correct, at least in terms of the behaviour that the tests are checking.</p>
<p><strong>Coverage</strong> is a measure of how much of your code is exercised by your tests. We call some code "covered" by a test if executing the test makes that code run. So, for example, if you have a condition (an <code>if</code> block) you should write two tests: one to cover the first branch, and another to cover the second. If you work with a strict TDD methodology your coverage is always going to be 100%, because you write only the code that makes the tests pass.</p>
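<p>A small sketch of what covering both branches of an <code>if</code> block means in practice. The function <code>sign_label</code> is hypothetical and exists only for this example.</p>

```python
# Hypothetical function, used only to illustrate branch coverage.

def sign_label(number):
    if number < 0:
        return "negative"
    return "non-negative"


def test_sign_label_negative():
    # Runs the body of the `if` block.
    assert sign_label(-3) == "negative"


def test_sign_label_non_negative():
    # Runs the line after the `if` block.
    assert sign_label(5) == "non-negative"
```

<p>If you remove either test, a coverage report with <code>term-missing</code> should list the line that no test executes.</p>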
<p>You can find more on TDD on this blog <a href="/categories/tdd/">here</a>.</p>
<h2 id="about-the-project">About the project<a class="headerlink" href="#about-the-project" title="Permanent link">¶</a></h2>
<p>The main components of our interpreter are the following:</p>
<ul>
<li><strong>Token</strong>: a token is the minimal element of the language syntax, like an integer (not a digit, but a group of them), a name (not a letter but a group of them), or a symbol (like the mathematical operations).</li>
<li><strong>Buffer</strong>: the input text (the program) has to be managed by a specific component. Parsing the input text has many requirements, among them being able to read upcoming parts of the text and to move back, or to move to specific locations.</li>
<li><strong>Lexer</strong>: this is the first component of standard interpreters. Its job is to divide the stream of input characters into meaningful chunks called tokens. It will process a string like "123 + x" and output three tokens: an integer, a symbol and a variable name.</li>
<li><strong>Parser</strong>: the second component of standard interpreters. It analyses the stream of tokens produced by the lexer and produces a data structure that represents the whole program.</li>
<li><strong>Visitor</strong>: the output of the parser is processed by a component that will either write the equivalent in another language or execute it.</li>
<li><strong>Command Line Interface (CLI)</strong>: the whole stack can be directly used by a REPL (Read, Evaluate, Print Loop), a command line interface similar to the one Python provides. There each line is lexed, parsed, and visited, and the result is printed immediately.</li>
</ul>
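<p>To make the job of the lexer more concrete, here is a rough sketch that splits the string "123 + x" into three chunks. It is not the lexer we are going to build in this series: the function <code>toy_lex</code>, the pattern, and the type names are assumptions made for this example, and the tokens are plain tuples instead of <code>Token</code> objects.</p>

```python
import re

# Assumed token types for this sketch: INTEGER, NAME and SYMBOL.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<INTEGER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<SYMBOL>[-+*/^()=]))"
)


def toy_lex(text):
    """Split text into (type, value) tuples, ignoring whitespace."""
    text = text.rstrip()
    tokens = []
    position = 0
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise ValueError("Unexpected character at column {}".format(position))
        kind = match.lastgroup  # name of the alternative that matched
        tokens.append((kind, match.group(kind)))
        position = match.end()  # skip the text we just recognised
    return tokens


print(toy_lex("123 + x"))
```

<p>The call above should produce an integer, a symbol and a name, in this order. The parser would then consume this list instead of the raw characters.</p>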
<p>I will provide two classes: <code>Token</code> and <code>TextBuffer</code>. These will save you from spending too much time on the basic tools, and allow you to get straight into the game. Since those classes obviously come with their own test suite, you are free to develop them on your own. You should however start from the same tests that I used, otherwise your interface might end up being incompatible with the rest of the project.</p>
<h2 id="initial-setup">Initial setup<a class="headerlink" href="#initial-setup" title="Permanent link">¶</a></h2>
<p>I prepared <a href="https://github.com/lgiordani/smallcalc">this repository</a>, which contains everything you need to start the project.</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/lgiordani/smallcalc.git
</code></pre></div>
<p>Once you have cloned the repository, set up a Python virtual environment using your favourite method/tool and install the testing requirements</p>
<div class="highlight"><pre><span></span><code>pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements/test.txt
</code></pre></div>
<p>At this point you should be able to run the test suite. For this project we are going to use <a href="http://www.pytest.org">pytest</a>, so the command line is</p>
<div class="highlight"><pre><span></span><code>pytest<span class="w"> </span>-svv
</code></pre></div>
<p>or, if you want to check your code coverage,</p>
<div class="highlight"><pre><span></span><code>pytest<span class="w"> </span>-svv<span class="w"> </span>--cov-report<span class="w"> </span>term-missing<span class="w"> </span>--cov<span class="o">=</span>smallcalc<span class="w"> </span>
</code></pre></div>
<h2 id="tokens">Tokens<a class="headerlink" href="#tokens" title="Permanent link">¶</a></h2>
<p>The first class that I provide to start working on our interpreter is <code>Token</code>.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">Token</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">_type</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">position</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">type</span> <span class="o">=</span> <span class="n">_type</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">value</span><span class="p">)</span> <span class="k">if</span> <span class="n">value</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="k">else</span> <span class="kc">None</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">position</span> <span class="o">=</span> <span class="n">position</span>
    <span class="k">def</span> <span class="fm">__str__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">position</span><span class="p">:</span>
            <span class="k">return</span> <span class="s2">"Token(</span><span class="si">{}</span><span class="s2">, '</span><span class="si">{}</span><span class="s2">')"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">,</span>
                <span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">,</span>
            <span class="p">)</span>
        <span class="k">return</span> <span class="s2">"Token(</span><span class="si">{}</span><span class="s2">, '</span><span class="si">{}</span><span class="s2">', line=</span><span class="si">{}</span><span class="s2">, col=</span><span class="si">{}</span><span class="s2">)"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">,</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">,</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">position</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">position</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
        <span class="p">)</span>
    <span class="fm">__repr__</span> <span class="o">=</span> <span class="fm">__str__</span>
    <span class="k">def</span> <span class="fm">__eq__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">other</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="n">other</span><span class="o">.</span><span class="n">type</span><span class="p">,</span> <span class="n">other</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
    <span class="k">def</span> <span class="fm">__len__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">:</span>
            <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
        <span class="k">return</span> <span class="mi">0</span>
    <span class="k">def</span> <span class="fm">__bool__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="kc">True</span>
</code></pre></div>
<p>This represents one syntax unit in which we divide the input text. The token can contain information about its original position, which can be useful in case of syntax errors to print meaningful messages for the user. The class implements the method <code>__eq__</code> to provide comparison between tokens.</p>
<p>The value of a token is always a string, and shall be converted into a different type by an external object according to the value that the token assumes. For example the string <code>'123'</code> can be interpreted as an integer, but could also be the name of a variable if our language supports such a feature.</p>
<p>Remember that everything you find in this class has been introduced to make one or more tests pass, so check the test suite to understand how the object can be used.</p>
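<p>The following sketch shows the behaviour described above: values are stored as strings, the position is ignored when tokens are compared, and a token without a value is still truthy. To keep the snippet runnable on its own it contains a condensed copy of <code>Token</code> (without <code>__str__</code>); in the repository you would import the real class with <code>from smallcalc import tok as token</code>. The constant <code>INTEGER</code> is an assumption made for this example.</p>

```python
class Token:
    # Condensed copy of the class shown above, repeated only so that
    # this snippet runs on its own.
    def __init__(self, _type, value=None, position=None):
        self.type = _type
        self.value = str(value) if value is not None else None
        self.position = position

    def __eq__(self, other):
        return (self.type, self.value) == (other.type, other.value)

    def __len__(self):
        return len(self.value) if self.value else 0

    def __bool__(self):
        return True


INTEGER = "INTEGER"  # assumed type name, not defined by the class itself

plain = Token(INTEGER, 123)
located = Token(INTEGER, "123", position=(0, 4))

assert plain.value == "123"    # the value is always stored as a string
assert plain == located        # the position is not part of the comparison
assert len(plain) == 3
assert len(Token("EOF")) == 0  # a token without a value has length 0...
assert bool(Token("EOF"))      # ...but it is still truthy
```

<p>The method <code>__bool__</code> matters because Python falls back to <code>__len__</code> to decide the truth value of an object, so without it a token with no value would be considered false.</p>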
<h2 id="buffer">Buffer<a class="headerlink" href="#buffer" title="Permanent link">¶</a></h2>
<p>The second element that you will find in the initial setup is the class <code>TextBuffer</code>, that provides a very basic manager for an input text file</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">EOLError</span><span class="p">(</span><span class="ne">ValueError</span><span class="p">):</span>
<span class="w">    </span><span class="sd">""" Signals that the buffer is reading after the end of a line."""</span>
<span class="k">class</span> <span class="nc">EOFError</span><span class="p">(</span><span class="ne">ValueError</span><span class="p">):</span>
<span class="w">    </span><span class="sd">""" Signals that the buffer is reading after the end of the text."""</span>
<span class="k">class</span> <span class="nc">TextBuffer</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">reset</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">line</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">column</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">def</span> <span class="nf">load</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">text</span> <span class="o">=</span> <span class="n">text</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lines</span> <span class="o">=</span> <span class="n">text</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">)</span> <span class="k">if</span> <span class="n">text</span> <span class="k">else</span> <span class="p">[]</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">reset</span><span class="p">()</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">current_line</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">lines</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">line</span><span class="p">]</span>
        <span class="k">except</span> <span class="ne">IndexError</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">EOFError</span><span class="p">(</span>
                <span class="s2">"EOF reading line </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">line</span><span class="p">)</span>
            <span class="p">)</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">current_char</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_line</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">column</span><span class="p">]</span>
        <span class="k">except</span> <span class="ne">IndexError</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">EOLError</span><span class="p">(</span>
                <span class="s2">"EOL reading column </span><span class="si">{}</span><span class="s2"> at line </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
                    <span class="bp">self</span><span class="o">.</span><span class="n">column</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">line</span>
                <span class="p">)</span>
            <span class="p">)</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">next_char</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_line</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">column</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span>
        <span class="k">except</span> <span class="ne">IndexError</span><span class="p">:</span>
            <span class="k">raise</span> <span class="n">EOLError</span><span class="p">(</span>
                <span class="s2">"EOL reading column </span><span class="si">{}</span><span class="s2"> at line </span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
                    <span class="bp">self</span><span class="o">.</span><span class="n">column</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">line</span>
                <span class="p">)</span>
            <span class="p">)</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">tail</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_line</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">column</span><span class="p">:]</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">position</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">line</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">column</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">newline</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">line</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">column</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">def</span> <span class="nf">skip</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">steps</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">column</span> <span class="o">+=</span> <span class="n">steps</span>
    <span class="k">def</span> <span class="nf">goto</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">line</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">line</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">column</span> <span class="o">=</span> <span class="n">line</span><span class="p">,</span> <span class="n">column</span>
</code></pre></div>
<p>As happened for the <code>Token</code> class, you can read the tests to understand how to use the class. Basically, however, the class can <code>load</code> an input text and extract the <code>current_line</code>, the <code>current_char</code>, and the <code>next_char</code>. You can also <code>skip</code> a given number of characters, <code>goto</code> a given position, extract the current <code>position</code> and read the <code>tail</code>, which is the remaining text from the current position to the end of the line.</p>
<p>This class has not been optimized or designed to manage big files or continuous streams of text. This is perfectly fine for our current project, but be aware that for a real compiler you might want to implement something more powerful.</p>
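<p>A short walk through the buffer on a two-line text may help. To keep the snippet runnable on its own it contains a trimmed copy of <code>TextBuffer</code> with only the members used here (the error handling of <code>current_line</code>, <code>next_char</code> and <code>goto</code> are left out); in the repository the real class lives in <code>smallcalc/text_buffer.py</code>. The input text is made up for the example.</p>

```python
class EOLError(ValueError):
    """Signals that the buffer is reading after the end of a line."""


class TextBuffer:
    # Trimmed copy of the class shown above, with only the members
    # needed by the walk-through below.
    def __init__(self, text=None):
        self.load(text)

    def load(self, text):
        self.text = text
        self.lines = text.split("\n") if text else []
        self.line = 0
        self.column = 0

    @property
    def current_line(self):
        return self.lines[self.line]

    @property
    def current_char(self):
        try:
            return self.current_line[self.column]
        except IndexError:
            raise EOLError(
                "EOL reading column {} at line {}".format(self.column, self.line)
            )

    @property
    def tail(self):
        return self.current_line[self.column:]

    @property
    def position(self):
        return (self.line, self.column)

    def newline(self):
        self.line += 1
        self.column = 0

    def skip(self, steps=1):
        self.column += steps


buffer = TextBuffer("abc def\nghi")
assert buffer.current_char == "a"

buffer.skip(4)                   # jump over "abc "
assert buffer.position == (0, 4)
assert buffer.tail == "def"

buffer.skip(3)                   # now we are past the end of the line
try:
    buffer.current_char
except EOLError:
    buffer.newline()             # move to the beginning of the next line

assert buffer.position == (1, 0)
assert buffer.current_line == "ghi"
```

<p>Note that <code>skip</code> never fails: the error is raised only when you try to read a character that is not there, which is how a lexer can detect the end of a line.</p>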
<h2 id="cli">CLI<a class="headerlink" href="#cli" title="Permanent link">¶</a></h2>
<p>The third element I provide is a simple REPL (<a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">Read–eval–print loop</a>) that at the moment just echoes any text you input and gracefully exits when you press Ctrl+D. There are no tests for the CLI, and there will be none. Testing endpoints like this is complex and not always worth the effort, as in this case.</p>
<p>The command line can be run from the project main directory with</p>
<div class="highlight"><pre><span></span><code>python<span class="w"> </span>cli.py
</code></pre></div>
<h2 id="level-1-end-of-file">Level 1 - End of file<a class="headerlink" href="#level-1-end-of-file" title="Permanent link">¶</a></h2>
<p><em>"End? No, the journey doesn't end here."</em> - The Lord of the Rings: The Return of the King (2003)</p>
<p>The first thing a Lexer shall be able to do is to load and process an empty text. This should return an <code>EOF</code> (<code>End Of File</code>) token. <code>EOF</code> is used to signal that the input buffer has ended and that there is no more text to process.</p>
<p>The method <code>get_tokens</code> returns all the tokens of the input stream in a single list.</p>
<p>Add this code to <code>tests/test_calc_lexer.py</code></p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">tok</span> <span class="k">as</span> <span class="n">token</span>
<span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">calc_lexer</span> <span class="k">as</span> <span class="n">clex</span>
<span class="k">def</span> <span class="nf">test_get_tokens_understands_eof</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">''</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>
</code></pre></div>
<p>To avoid misleading errors you should also create the empty file <code>smallcalc/calc_lexer.py</code>, as without that file pytest will raise an <code>ImportError</code>.</p>
<p>This is our first test, and if you run the test suite now you will see that it fails. This is expected, as there is no code to pass the test.</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>pytest<span class="w"> </span>-svv<span class="w"> </span>--cov-report<span class="w"> </span>term-missing<span class="w"> </span>--cov<span class="o">=</span><span class="nv">smallcalc</span>
<span class="o">==================================</span><span class="w"> </span><span class="nv">FAILURES</span><span class="w"> </span><span class="o">===================================</span>
_______________________<span class="w"> </span>test_get_tokens_understands_eof<span class="w"> </span>_______________________
<span class="w"> </span>def<span class="w"> </span>test_get_tokens_understands_eof<span class="o">()</span>:
><span class="w"> </span><span class="nv">l</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>clex.CalcLexer<span class="o">()</span>
E<span class="w"> </span>AttributeError:<span class="w"> </span>module<span class="w"> </span><span class="s1">'smallcalc.calc_lexer'</span><span class="w"> </span>has<span class="w"> </span>no<span class="w"> </span>attribute<span class="w"> </span><span class="s1">'CalcLexer'</span>
tests/test_calc_lexer.py:6:<span class="w"> </span><span class="nv">AttributeError</span>
<span class="o">=====================</span><span class="w"> </span><span class="m">1</span><span class="w"> </span>failed,<span class="w"> </span><span class="m">29</span><span class="w"> </span>passed<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">0</span>.08<span class="w"> </span><span class="nv">seconds</span><span class="w"> </span><span class="o">=====================</span>
</code></pre></div>
<p>Implement now a class <code>CalcLexer</code> in the file <code>smallcalc/calc_lexer.py</code> that makes the test pass. Remember that you just need the code to pass this test. So do not implement complex systems now and go for the simplest solution (hint: the test expects that specific output).</p>
<p>The <code>EOF</code> constant can be a simple string with the value <code>'EOF'</code>.</p>
<p>It is worth executing the test suite with coverage (check the command line above), which will tell you if you over-engineered your code. You should aim for 100% coverage, always.</p>
<hr>
<h3 id="solution">Solution<a class="headerlink" href="#solution" title="Permanent link">¶</a></h3>
<p>To pass the test, the class <code>CalcLexer</code> can use the provided class <code>text_buffer.TextBuffer</code>, which exposes a method <code>load</code>, and wrap that method in <code>CalcLexer.load</code>. The test does not provide any input, so the easiest solution is just to return the required token. The test requires us to implement the method <code>get_tokens</code>, but I preferred to isolate the code in a method called <code>get_token</code> and to call the latter from <code>get_tokens</code>. The file <code>smallcalc/calc_lexer.py</code> is then</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">text_buffer</span>
<span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">tok</span> <span class="k">as</span> <span class="n">token</span>
<span class="n">EOF</span> <span class="o">=</span> <span class="s1">'EOF'</span>
<span class="k">class</span> <span class="nc">CalcLexer</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="o">=</span><span class="s1">''</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span> <span class="o">=</span> <span class="n">text_buffer</span><span class="o">.</span><span class="n">TextBuffer</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">load</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">get_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_current_token</span> <span class="o">=</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">EOF</span><span class="p">)</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_current_token</span>
    <span class="k">def</span> <span class="nf">get_tokens</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">get_token</span><span class="p">()]</span>
</code></pre></div>
<hr>
<p>You can see here in practice what I mentioned in the introduction about TDD. The method <code>get_token</code> returns a hardcoded <code>token.Token(EOF)</code>, because <em>that is enough to pass the test</em>. It is not enough to be a good Lexer, but if we write and pass the right tests, this will happen in time. Be smart, be strict: write the minimal code needed to pass the test.</p>
<p>Being really strict, however, this solution is already over-engineered. The code</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">get_tokens</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">EOF</span><span class="p">)]</span>
</code></pre></div>
<p>would be enough to pass the test. It would also be the first thing we change as soon as we add another test. So, let me amend the previous advice: be strict, with a pinch of salt.</p>
<h2 id="level-2-single-digit-integers">Level 2 - Single digit integers<a class="headerlink" href="#level-2-single-digit-integers" title="Permanent link">¶</a></h2>
<p><em>"You're missing just a couple of digits there."</em> - Iron Man (2008)</p>
<p>The requirement for this section is</p>
<div class="highlight"><pre><span></span><code># The only accepted value for the input is one single digit between 0 and 9
integer: [0-9]
</code></pre></div>
<h3 id="lexer">Lexer<a class="headerlink" href="#lexer" title="Permanent link">¶</a></h3>
<p>Since a calculator has to deal with numbers let us implement support for integers (we will add floating point numbers later). The first thing that we need is to recognise single-digit integers. This is the test that you have to add to <code>tests/test_calc_lexer.py</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_token_understands_integers</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3'</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'3'</span><span class="p">)</span>
</code></pre></div>
<p>Note that here we are testing <code>get_token</code> and not <code>get_tokens</code>. This method will come in handy later, so it is worth testing it here. As soon as that works you can test the behaviour of <code>get_tokens</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_integers</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3'</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'3'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>
</code></pre></div>
<p>Please note that now the lexer shall output both an <code>EOF</code> and an <code>EOL</code> token, as the current line of code ends. The biggest issue you have to face here is that when you recognise a token then you have to skip it in the source text.</p>
<p>After this test you may end up with some code duplication, as <code>get_token</code> and <code>get_tokens</code> perform similar tasks. If you haven't already, please call the former from the latter. It could also be worth doing some refactoring. Remember: you can confidently change your code, because as long as the tests pass your changes are correct! This is the true power of TDD.</p>
<p>If you refactor the code creating helper methods you should make them "private" by prefixing their name with an underscore. This also means that you do not need to test them, in principle (watch <a href="https://www.youtube.com/watch?v=URSWYvyc42M">this talk</a> by Sandi Metz on this subject).</p>
<h3 id="parser">Parser<a class="headerlink" href="#parser" title="Permanent link">¶</a></h3>
<p>Now that we have a working lexer that recognises integers let us work on the parser. This has to use the lexer to process a text and produce a tree of nodes that represent the syntactic structure of the processed code. Don't worry if it seems extremely complex, it is actually pretty simple if you follow the tests.</p>
<p>Edit the <code>tests/test_calc_parser.py</code> file and insert this code</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">calc_parser</span> <span class="k">as</span> <span class="n">cpar</span>
<span class="k">def</span> <span class="nf">test_parse_integer</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"5"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="mi">5</span>
    <span class="p">}</span>
</code></pre></div>
<p>The <code>node</code> variable is an instance of a specific class that contains integers, <code>IntegerNode</code> (but you are free to name it as you want, as this is not tested). Please note that this class doesn't consider the value as a string any more, but as a proper (Python) integer (<code>'value': 5</code>). Now edit the file <code>smallcalc/calc_parser.py</code> and write some code that passes the test.</p>
<p>Does it work? Well, you just wrote your first parser! Congratulations! From here to something that understands C++ or Python the journey is pretty long, but the initial steps are promising.</p>
<h3 id="visitor">Visitor<a class="headerlink" href="#visitor" title="Permanent link">¶</a></h3>
<p>Let us consider the visitor, now. This is the run-time component of our language, the part that actually runs through the tree of nodes and executes it. This part, thus, is where most of the actual behaviour of the language happens. For instance, the fact that the symbol "+" actually sums integers is because the visitor implements that operation.</p>
<p>This can seem a trivial consideration, but if you think about the division between integers you immediately understand that the visitor has a great responsibility. Does the symbol <code>/</code> divide integers with or without floating point math? Python 3, for instance, opted for a floating point division, and introduced the <code>//</code> operator for the integer version of the operation, but other languages behave differently.</p>
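<p>To make the point concrete, here is a tiny illustration in plain Python 3 (nothing from <code>smallcalc</code> is involved, and the variable names are mine): the same two integers give different results depending on which division the language designer picked.</p>

```python
# Python 3 semantics: "/" is true division, "//" is floor division.
true_division = 7 / 2
floor_division = 7 // 2

print(true_division)   # 3.5
print(floor_division)  # 3

# Floor division rounds towards negative infinity, which differs from the
# truncation towards zero used by integer division in languages such as C.
negative_floor = -7 // 2
print(negative_floor)  # -4
```

<p>Whichever behaviour we want for our language, it is the visitor that will have to implement it.</p>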
<p>I'll discuss this in more detail later, when we will implement mathematical operations. For the time being, let us create the <code>tests/test_calc_visitor.py</code> file and introduce the following test</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">calc_visitor</span> <span class="k">as</span> <span class="n">cvis</span>
<span class="k">def</span> <span class="nf">test_visitor_integer</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
        <span class="s1">'value'</span><span class="p">:</span> <span class="mi">12</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="s1">'integer'</span><span class="p">)</span>
</code></pre></div>
<p>As you can see, at this stage the visitor has a trivial job, which is to just return the value and the type of the number that it finds in the tree. Note that the visitor provides a method <code>visit</code> which is type agnostic (i.e. it doesn't care about the type of the node). This is correct, as the visitor has to traverse the whole tree recursively and to react to the different nodes without a previous knowledge of what it should expect.</p>
<p>As simple as the visitor can be, now we can make our CLI use the parser and the visitor to understand and execute one simple command, which is to parse a single-digit integer and print it with its type. Change the <code>cli.py</code> file to</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">calc_parser</span> <span class="k">as</span> <span class="n">cpar</span>
<span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">calc_visitor</span> <span class="k">as</span> <span class="n">cvis</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">text</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span class="s1">'smallcalc :> '</span><span class="p">)</span>
            <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
            <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
            <span class="n">res</span> <span class="o">=</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">())</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">res</span><span class="p">)</span>
        <span class="k">except</span> <span class="ne">EOFError</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s2">"Bye!"</span><span class="p">)</span>
            <span class="k">break</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">text</span><span class="p">:</span>
            <span class="k">continue</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div>
<p>Test it to check that everything works. If your code passes the tests I gave you, the result is guaranteed.</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>python<span class="w"> </span>cli.py<span class="w"> </span>
smallcalc<span class="w"> </span>:><span class="w"> </span><span class="m">3</span>
<span class="o">(</span><span class="m">3</span>,<span class="w"> </span><span class="s1">'integer'</span><span class="o">)</span>
</code></pre></div>
<p>Let me recap what we just created. We wrote a lexer, which is a component that splits the input text into different tokens with a meaning, and we instructed it to react to single-digit integers. Then, we created a parser, which is the component that tries to make sense of several tokens put together, applying syntactical rules. Last, the visitor runs through the output of the parser and actually performs the actions that the grammar describes. All this to just print out an integer? Seems overkill! It is, actually, but there is a lot to come, and this separation of levels will come in handy.</p>
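<p>If it helps to see the three stages side by side, the following is a deliberately simplified sketch of the same pipeline written as three plain functions. It is not the <code>smallcalc</code> code: the function names <code>lex</code>, <code>parse</code> and <code>visit</code> are made up for the illustration, tokens are bare tuples, and there is no <code>EOL</code> handling.</p>

```python
DIGITS = "0123456789"


def lex(text):
    """Lexer: split the text into (type, value) tokens."""
    tokens = []
    for char in text:
        if char not in DIGITS:
            raise ValueError(f"Unexpected character: {char!r}")
        tokens.append(("INTEGER", char))
    tokens.append(("EOF", None))
    return tokens


def parse(tokens):
    """Parser: apply the only rule we have so far, 'an integer'."""
    token_type, value = tokens[0]
    if token_type != "INTEGER":
        raise ValueError("Expected an integer")
    # The parser is where the string becomes a proper Python integer.
    return {"type": "integer", "value": int(value)}


def visit(node):
    """Visitor: execute the tree, which for now means returning the value."""
    if node["type"] == "integer":
        return node["value"], node["type"]
    raise ValueError(f"Unknown node type: {node['type']!r}")


result = visit(parse(lex("3")))
print(result)  # (3, 'integer')
```

<p>Each function only knows about the output of the previous one, which is exactly the separation that the real components maintain.</p>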
<hr>
<h3 id="solution_1">Solution<a class="headerlink" href="#solution_1" title="Permanent link">¶</a></h3>
<p>The two functions <code>get_token</code> and <code>get_tokens</code> have to evolve to deal with the new requirements, and to avoid having too much code in a single function I created some private helpers (where "private" has the Python meaning of "please don't use them").</p>
<p>The idea behind <code>get_tokens</code> is to call <code>get_token</code> until the <code>EOF</code> token is returned, even though we want the latter to be present in the final result.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">get_tokens</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="n">tokens</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">while</span> <span class="n">t</span> <span class="o">!=</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="s1">'EOF'</span><span class="p">):</span>
            <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">t</span><span class="p">)</span>
            <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="s1">'EOF'</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">tokens</span>
</code></pre></div>
<p>Then I decided to make <code>get_token</code> the central hub of my process with the following paradigm: the function tries to extract a specific token (<code>_process_integer</code>, in this case) and to return it; if the token cannot be extracted, the function tries the following one. At the moment I don't have any other type of token, but I will have them soon.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">get_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">eof</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eof</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eof</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eof</span>
        <span class="n">eol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eol</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eol</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eol</span>
        <span class="n">integer</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_integer</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">integer</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">integer</span>
</code></pre></div>
<p>The three helpers shall just try to extract and return the token they have been assigned or None. After some refactoring I came up with three functions (two of them as properties) that simplify common tasks. <code>_current_char</code> and <code>_current_line</code> are just wrappers around two attributes of <code>self._text_storage</code>, while <code>_set_current_token_and_skip</code> is a bit more complex and ensures that the <code>_current_token</code> is always up to date.</p>
<div class="highlight"><pre><span></span><code>    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">_current_char</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">current_char</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">_current_line</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">current_line</span>

    <span class="k">def</span> <span class="nf">_set_current_token_and_skip</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">skip</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">token</span><span class="p">))</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_current_token</span> <span class="o">=</span> <span class="n">token</span>
        <span class="k">return</span> <span class="n">token</span>
</code></pre></div>
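<p>Two details of these helpers depend on the <code>Token</code> class, which was written in a previous part and is not shown here. <code>_set_current_token_and_skip</code> calls <code>len(token)</code>, and <code>get_token</code> uses checks like <code>if eol:</code>. The class below is a hypothetical stand-in, written under the assumption that a token's length is the length of its value and that tokens without a value (like <code>EOL</code>) have length zero. With those assumptions the class needs an explicit <code>__bool__</code>, because Python falls back to <code>__len__</code> to decide whether an object is true.</p>

```python
class Token:
    """Hypothetical stand-in for smallcalc's token.Token (an assumption)."""

    def __init__(self, token_type, value=None):
        self.type = token_type
        self.value = str(value) if value is not None else None

    def __eq__(self, other):
        if not isinstance(other, Token):
            return NotImplemented
        return (self.type, self.value) == (other.type, other.value)

    def __len__(self):
        # Number of characters the token occupies in the source text.
        return len(self.value) if self.value else 0

    def __bool__(self):
        # Without this, a zero-length token such as EOL would be falsy
        # and a check like "if eol:" would silently discard it.
        return True


integer = Token("INTEGER", "3")
eol = Token("EOL")

print(len(integer))  # 1
print(len(eol))      # 0
print(bool(eol))     # True
```

<p>If your own <code>Token</code> has a different shape the helpers may need to change accordingly, for example by comparing the result with <code>None</code> explicitly.</p>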
<p>Once these functions are in place I can write the actual helpers for the token extraction. The method <code>_process_eol</code> leverages <code>self._text_storage</code>, which raises an <code>EOLError</code> when the end of line has been reached. So all I need to do is to try to get the current char and return <code>None</code> if nothing happens. In case an <code>EOLError</code> exception is raised I move the text storage to the next line and run <code>_set_current_token_and_skip</code> with the end of line token.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_eol</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_current_char</span>
            <span class="k">return</span> <span class="kc">None</span>
        <span class="k">except</span> <span class="n">text_buffer</span><span class="o">.</span><span class="n">EOLError</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">newline</span><span class="p">()</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span>
                <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">EOL</span><span class="p">)</span>
            <span class="p">)</span>
</code></pre></div>
<p>The helper to process the end of file (<code>_process_eof</code>) is exactly like <code>_process_eol</code>, using <code>self._current_line</code> and <code>text_buffer.EOFError</code>.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_eof</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_current_line</span>
            <span class="k">return</span> <span class="kc">None</span>
        <span class="k">except</span> <span class="n">text_buffer</span><span class="o">.</span><span class="n">EOFError</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span>
                <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">EOF</span><span class="p">)</span>
            <span class="p">)</span>
</code></pre></div>
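<p>Both helpers turn an exception raised by the text storage into a token. To see the mechanism in isolation, here is a self-contained sketch with a stub buffer. Everything in it is an assumption made for the illustration: <code>StubBuffer</code> only mimics the attributes used above (<code>current_line</code>, <code>current_char</code>, <code>skip</code>, <code>newline</code>), the end-of-file exception is called <code>BufferEOFError</code> to avoid shadowing Python's built-in <code>EOFError</code>, and tokens are plain tuples.</p>

```python
class EOLError(Exception):
    """The cursor is past the end of the current line."""


class BufferEOFError(Exception):
    """The cursor is past the last line (text_buffer.EOFError in the post)."""


class StubBuffer:
    """Hypothetical, minimal replacement for the real text buffer."""

    def __init__(self, text):
        self.lines = text.split("\n")
        self.line = 0
        self.column = 0

    @property
    def current_line(self):
        if self.line >= len(self.lines):
            raise BufferEOFError
        return self.lines[self.line]

    @property
    def current_char(self):
        line = self.current_line
        if self.column >= len(line):
            raise EOLError
        return line[self.column]

    def skip(self, steps=1):
        self.column += steps

    def newline(self):
        self.line += 1
        self.column = 0


def next_token(buffer):
    """Same order as get_token: EOF first, then EOL, then integer."""
    try:
        buffer.current_line
    except BufferEOFError:
        return ("EOF", None)
    try:
        char = buffer.current_char
    except EOLError:
        buffer.newline()
        return ("EOL", None)
    buffer.skip(1)
    return ("INTEGER", char)


buffer = StubBuffer("3")
tokens = []
while True:
    tok = next_token(buffer)
    tokens.append(tok)
    if tok[0] == "EOF":
        break

print(tokens)  # [('INTEGER', '3'), ('EOL', None), ('EOF', None)]
```

<p>The order of the checks matters: asking for the current char when there are no lines left would hit the end of file first, so the end of file has to be ruled out before the end of line.</p>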
<p>At this point of the development the incoming token can only be <code>EOL</code>, <code>EOF</code>, or an integer, so the <code>_process_integer</code> function doesn't need to return <code>None</code>. Therefore, it is sufficient to create an integer token with the current char and return it.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span>
            <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_current_char</span><span class="p">)</span>
        <span class="p">)</span>
</code></pre></div>
<p>The above methods use two new global variables <code>EOL</code> and <code>INTEGER</code> that are defined at the beginning of the file along with <code>EOF</code></p>
<div class="highlight"><pre><span></span><code><span class="n">EOL</span> <span class="o">=</span> <span class="s1">'EOL'</span>
<span class="n">INTEGER</span> <span class="o">=</span> <span class="s1">'INTEGER'</span>
</code></pre></div>
<p><code>CalcParser</code> is the only class that is tested, but forecasting (actually, knowing) that we are going to manage multiple types of nodes, I isolated the code for the <code>IntegerNode</code> in its own class. There is no need to abstract things further for the time being, so <code>IntegerNode</code> doesn't inherit from any other class.</p>
<p>From a pure TDD point of view this is wrong, because I should have written some tests for the <code>IntegerNode</code> class before writing it. The purpose of this exercise, however, is to guide you through the creation of a simple compiler, so tests are already given, and I will turn a blind eye to my own exception to the rule (how convenient!).</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">calc_lexer</span> <span class="k">as</span> <span class="n">clex</span>


<span class="k">class</span> <span class="nc">IntegerNode</span><span class="p">:</span>
    <span class="n">node_type</span> <span class="o">=</span> <span class="s1">'integer'</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">asdict</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">node_type</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span>
        <span class="p">}</span>


<span class="k">class</span> <span class="nc">CalcParser</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">parse_integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">IntegerNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
</code></pre></div>
<p><code>CalcVisitor</code> is by far the simplest class at the moment, as the only node we are managing is the one with an <code>integer</code> type.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CalcVisitor</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'integer'</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">],</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span>
</code></pre></div>
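<p>A single <code>if</code> is all the tests require at this stage. As more node types appear, one possible way to keep <code>visit</code> type agnostic is to look up a handler by the node type. The following is only a sketch of that idea, not the solution used in this series: the class name <code>DispatchingVisitor</code> and the <code>_visit_</code> naming scheme are my own assumptions.</p>

```python
class DispatchingVisitor:
    """Hypothetical variant of CalcVisitor that finds handlers by node type."""

    def visit(self, node):
        # Build the handler name from the node type, e.g. "_visit_integer".
        handler = getattr(self, "_visit_" + node["type"], None)
        if handler is None:
            raise ValueError(f"Unknown node type: {node['type']!r}")
        return handler(node)

    def _visit_integer(self, node):
        return node["value"], node["type"]


visitor = DispatchingVisitor()
result = visitor.visit({"type": "integer", "value": 12})
print(result)  # (12, 'integer')
```

<p>With this layout, supporting a new node type means adding a method, while <code>visit</code> itself stays untouched.</p>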
<hr>
<h2 id="level-3-binary-operations-addition">Level 3 - Binary operations: addition<a class="headerlink" href="#level-3-binary-operations-addition" title="Permanent link">¶</a></h2>
<p><em>"You're about to become a permanent addition to this archaeological find."</em> - Raiders of the Lost Ark (1981)</p>
<p>Let's update the grammar of the language with <code>addsymbol</code> and <code>expression</code></p>
<div class="highlight"><pre><span></span><code>integer: [0-9]
# Label the symbol '+' with the name 'addsymbol'
addsymbol: '+'
# An expression is an integer followed by an addsymbol followed by another integer
expression: integer addsymbol integer
</code></pre></div>
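<p>To get a feeling for which strings this grammar accepts, the three rules can be translated into a single regular expression. This is just a quick check written with Python's <code>re</code> module, not the way the lexer and the parser work; the name <code>is_expression</code> is made up for the example.</p>

```python
import re

# integer: [0-9]
# addsymbol: '+'
# expression: integer addsymbol integer
EXPRESSION = re.compile(r"[0-9]\+[0-9]")


def is_expression(text):
    """Return True if the whole text matches the 'expression' rule."""
    return EXPRESSION.fullmatch(text) is not None


print(is_expression("3+5"))    # True
print(is_expression("3 + 5"))  # False: spaces are not part of the grammar
print(is_expression("12+5"))   # False: integers are still single digits
print(is_expression("3"))      # False: a lone integer is not an expression
```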
<p>At the moment, our parser doesn't sound like an important component, as its output is just a refurbished version of the lexer's output. The visitor, in turn, doesn't really perform any action other than printing in a different format what the parser produces.</p>
<p>Let us try to introduce a simple mathematical operation, then, that should spice up our components. The new test for the lexer component (<code>tests/test_calc_lexer.py</code>) is</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_unspaced_sum_of_integers</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3+5'</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'3'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'5'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>
</code></pre></div>
<p>Please note that there are no spaces, as our lexer doesn't know how to deal with them yet. As you can see the output is straightforward, so go and change the <code>CalcLexer</code> class to make this test pass without breaking any of the ones you already wrote. Check for coverage, to spot possible overengineered parts, and if necessary refactor the class to keep methods as simple as possible.</p>
<p>The parser now has a more complex job than before, though not yet really challenging. The test for the parser is</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_expression</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"2+3"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
<p>I want to resume here the discussion about mathematical operators and the role of the visitor that I started in the previous section. As you can see the expression is a generic binary operator, with a <code>left</code> term, a <code>right</code> term, and an <code>operator</code>. The operator, furthermore, is just a literal whose value is the symbol we use for that binary operation.</p>
<p>This parser, thus, is pretty ignorant of the different operations we can perform, giving the whole responsibility to the visitor. We could, however, implement the parser to make it produce something more specific, like for example a <code>binary_sum</code> or <code>addition</code> node, which represents only the addition, and which wouldn't need the <code>'operator'</code> key, as it is implicit in the node type.</p>
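<p>To compare the two options, the sketch below starts from the generic node that the test expects and derives the more specific shape from it. The <code>addition</code> node type and the <code>specialise</code> function are hypothetical, they are not part of <code>smallcalc</code> and no test in this series requires them.</p>

```python
# The generic shape expected by the parser test.
generic = {
    "type": "binary",
    "left": {"type": "integer", "value": 2},
    "right": {"type": "integer", "value": 3},
    "operator": {"type": "literal", "value": "+"},
}


def specialise(node):
    """Turn a generic binary '+' node into a hypothetical 'addition' node."""
    if node["type"] == "binary" and node["operator"]["value"] == "+":
        # The operator is implied by the node type, so the key disappears.
        return {
            "type": "addition",
            "left": node["left"],
            "right": node["right"],
        }
    return node


specific = specialise(generic)
print(specific["type"])         # addition
print("operator" in specific)   # False
```

<p>With the second shape the parser has to know about every operation, while the visitor gets simpler. With the first one it is the other way around.</p>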
<p>The amount of work done by the parser and by the visitor is a peculiarity of the specific language or program, so feel free to experiment. For the moment you have to stick to one solution as you are guided by the tests that I wrote, but as soon as you have grasped the concepts and started writing a new language, you will be free to implement each component as you prefer.</p>
<p>Finally, the visitor shall implement the actual mathematical operation. The test is</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_expression_sum</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">5</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="mi">9</span><span class="p">,</span> <span class="s1">'integer'</span><span class="p">)</span>
</code></pre></div>
<p>As soon as you have changed the method <code>visit</code> to deal with <code>'binary'</code> nodes, you can test the new syntax in the CLI. Since we changed <code>visit</code> internally, that part of the CLI doesn't require any modification. We have, however, to change the parser entry point from <code>parse_integer</code> to <code>parse_expression</code>, so the new <code>cli.py</code> file will be</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">calc_parser</span> <span class="k">as</span> <span class="n">cpar</span>
<span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">calc_visitor</span> <span class="k">as</span> <span class="n">cvis</span>


<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>

    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">text</span> <span class="o">=</span> <span class="nb">input</span><span class="p">(</span><span class="s1">'smallcalc :> '</span><span class="p">)</span>
            <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
            <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
            <span class="n">res</span> <span class="o">=</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">())</span>
            <span class="nb">print</span><span class="p">(</span><span class="n">res</span><span class="p">)</span>
        <span class="k">except</span> <span class="ne">EOFError</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s2">"Bye!"</span><span class="p">)</span>
            <span class="k">break</span>

        <span class="k">if</span> <span class="ow">not</span> <span class="n">text</span><span class="p">:</span>
            <span class="k">continue</span>


<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">main</span><span class="p">()</span>
</code></pre></div>
<p>And a quick test of the CLI confirms that everything works fine</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>python<span class="w"> </span>cli.py<span class="w"> </span>
smallcalc<span class="w"> </span>:><span class="w"> </span><span class="m">2</span>+4
<span class="o">(</span><span class="m">6</span>,<span class="w"> </span><span class="s1">'integer'</span><span class="o">)</span>
</code></pre></div>
<p>Everything? Well, not exactly. If I type just a single integer in the CLI the whole program crashes with an exception. If your solution doesn't blow up with a single integer, it means that you (probably) overengineered it a little. This is fine, but if you had implemented just what was needed to pass the tests the result would have been an error in that case.</p>
<p>Why do we have an error? Because we now parse the input with <code>parse_expression</code> and this method expects its input to be a fully formed expression, not a single integer. Generally speaking, our parser's entry point should be able to parse different syntax structures. We will improve this behaviour later, when we address the problem of nested expressions.</p>
<hr>
<h3 id="solution_2">Solution<a class="headerlink" href="#solution_2" title="Permanent link">¶</a></h3>
<p>The helper <code>_process_literal</code> does what <code>_process_integer</code> did before, which is to blindly return a token, this time with the <code>LITERAL</code> type.</p>
<div class="highlight"><pre><span></span><code><span class="n">LITERAL</span> <span class="o">=</span> <span class="s1">'LITERAL'</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_literal</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span>
            <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">LITERAL</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_current_char</span><span class="p">)</span>
        <span class="p">)</span>
</code></pre></div>
<p>The helper <code>_process_integer</code>, on the other hand, changes to return <code>None</code> when no integer can be parsed, which is easily checked with <code>isdigit</code>.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">_current_char</span><span class="o">.</span><span class="n">isdigit</span><span class="p">():</span>
            <span class="k">return</span> <span class="kc">None</span>

        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span>
            <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_current_char</span><span class="p">)</span>
        <span class="p">)</span>
</code></pre></div>
<p>Last, the method <code>get_token</code> receives <code>_process_literal</code> as an additional case.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">get_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">eof</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eof</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eof</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eof</span>

        <span class="n">eol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eol</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eol</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eol</span>

        <span class="n">integer</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_integer</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">integer</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">integer</span>

        <span class="n">literal</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_literal</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">literal</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">literal</span>
</code></pre></div>
<p>The parser needs a node that represents the literal, namely <code>LiteralNode</code>, and a node to represent a binary operation, called <code>BinaryNode</code>. To avoid duplicating methods I created the <code>ValueNode</code> class and made both <code>IntegerNode</code> and <code>LiteralNode</code> inherit from that.</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">smallcalc</span> <span class="kn">import</span> <span class="n">calc_lexer</span> <span class="k">as</span> <span class="n">clex</span>


<span class="k">class</span> <span class="nc">Node</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">asdict</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">{}</span>  <span class="c1"># pragma: no cover</span>


<span class="k">class</span> <span class="nc">ValueNode</span><span class="p">(</span><span class="n">Node</span><span class="p">):</span>
    <span class="n">node_type</span> <span class="o">=</span> <span class="s1">'value_node'</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span>

    <span class="k">def</span> <span class="nf">asdict</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">node_type</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">value</span>
        <span class="p">}</span>


<span class="k">class</span> <span class="nc">IntegerNode</span><span class="p">(</span><span class="n">ValueNode</span><span class="p">):</span>
    <span class="n">node_type</span> <span class="o">=</span> <span class="s1">'integer'</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">value</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">LiteralNode</span><span class="p">(</span><span class="n">ValueNode</span><span class="p">):</span>
    <span class="n">node_type</span> <span class="o">=</span> <span class="s1">'literal'</span>


<span class="k">class</span> <span class="nc">BinaryNode</span><span class="p">(</span><span class="n">Node</span><span class="p">):</span>
    <span class="n">node_type</span> <span class="o">=</span> <span class="s1">'binary'</span>

    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">left</span> <span class="o">=</span> <span class="n">left</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">operator</span> <span class="o">=</span> <span class="n">operator</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">right</span> <span class="o">=</span> <span class="n">right</span>

    <span class="k">def</span> <span class="nf">asdict</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">result</span> <span class="o">=</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">node_type</span><span class="p">,</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">left</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span>
        <span class="p">}</span>

        <span class="n">result</span><span class="p">[</span><span class="s1">'right'</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">right</span><span class="p">:</span>
            <span class="n">result</span><span class="p">[</span><span class="s1">'right'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">right</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span>

        <span class="n">result</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">operator</span><span class="p">:</span>
            <span class="n">result</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">operator</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span>

        <span class="k">return</span> <span class="n">result</span>
</code></pre></div>
<p>The most important change, however, is in <code>CalcParser</code>, where I added the methods <code>parse_addsymbol</code> and <code>parse_expression</code>.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CalcParser</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">parse_addsymbol</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">LiteralNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">parse_integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">t</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">IntegerNode</span><span class="p">(</span><span class="n">t</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
        <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_addsymbol</span><span class="p">()</span>
        <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>

        <span class="k">return</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
</code></pre></div>
<p>The visitor has to add the processing code for <code>binary</code> nodes, which assumes the operation is a sum, so it just needs to visit the left and right nodes.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CalcVisitor</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'integer'</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">],</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span>

        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'binary'</span><span class="p">:</span>
            <span class="n">lvalue</span><span class="p">,</span> <span class="n">ltype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'left'</span><span class="p">])</span>
            <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'right'</span><span class="p">])</span>
            <span class="k">return</span> <span class="n">lvalue</span> <span class="o">+</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
</code></pre></div>
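<p>Since <code>visit</code> calls itself on the left and right children, it should already be able to evaluate nested <code>binary</code> nodes, even though the current parser never produces them. The following is a minimal sketch that repeats the visitor so that it runs on its own; the nested dictionary is written by hand purely for illustration and is an assumption, not the output of <code>parse_expression</code>. Note that the <code>operator</code> key is never inspected, which is why every binary node is treated as a sum.</p>

```python
class CalcVisitor:
    def visit(self, node):
        # Leaf node: return the value together with its type
        if node['type'] == 'integer':
            return node['value'], node['type']

        # Binary node: evaluate both children recursively, then add.
        # The operator is ignored, so every binary node is a sum.
        if node['type'] == 'binary':
            lvalue, ltype = self.visit(node['left'])
            rvalue, rtype = self.visit(node['right'])
            return lvalue + rvalue, rtype


# Hand-written AST for (1 + 2) + 3 (hypothetical, the parser cannot build it yet)
nested = {
    'type': 'binary',
    'left': {
        'type': 'binary',
        'left': {'type': 'integer', 'value': 1},
        'right': {'type': 'integer', 'value': 2},
        'operator': {'type': 'literal', 'value': '+'},
    },
    'right': {'type': 'integer', 'value': 3},
    'operator': {'type': 'literal', 'value': '+'},
}

result = CalcVisitor().visit(nested)
print(result)
```

<p>If the sketch behaves as intended, the recursion bottoms out at the three <code>integer</code> leaves and the sums are combined on the way back up.</p>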
<hr>
<h2 id="level-4-multi-digit-integers">Level 4 - Multi-digit integers<a class="headerlink" href="#level-4-multi-digit-integers" title="Permanent link">¶</a></h2>
<p><em>"So many."</em> - Braveheart (1995)</p>
<p>Let's move on and allow integers to be made of multiple digits.</p>
<div class="highlight"><pre><span></span><code># An integer is a sequence of digits, + here means `one or more`
integer: [0-9]+
addsymbol: '+'
expression: integer addsymbol integer
</code></pre></div>
<p>Up to now our language can handle only single-digit integers, so this part shall be enhanced before moving to more complex syntax structures. The only component that requires a change is the lexer, as it should emit one single token containing all the digits. The test, consequently, goes in <code>tests/test_calc_lexer.py</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_multidigit_integers</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'356'</span><span class="p">)</span>

    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'356'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>
</code></pre></div>
<p>There are many ways to solve this problem. One of the simplest (and a perfectly valid one) is using regular expressions (which are, if you think about it, another language).</p>
<p>If you do not know how to use regular expressions do yourself a favour and learn them! You can find a nice tutorial on them at <a href="https://regexone.com/">RegexOne</a>. If you already know the syntax but don't know how to use them in Python, <a href="https://developers.google.com/edu/python/regular-expressions">this Google for Education page</a> and the <a href="https://docs.python.org/3/howto/regex.html">official documentation</a> are your friends.</p>
<p>After this test the CLI should be able to handle expressions like <code>123+456</code>. We don't need any change in the parser or in the visitor. Can you tell why?</p>
<hr>
<h3 id="solution_3">Solution<a class="headerlink" href="#solution_3" title="Permanent link">¶</a></h3>
<p>To provide support for multi-digit integers we just need to change the method <code>_process_integer</code> of the lexer. The new version makes use of a very simple regular expression.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">re</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_integer</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">regexp</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'\d+'</span><span class="p">)</span>

        <span class="n">match</span> <span class="o">=</span> <span class="n">regexp</span><span class="o">.</span><span class="n">match</span><span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">tail</span>
        <span class="p">)</span>

        <span class="k">if</span> <span class="ow">not</span> <span class="n">match</span><span class="p">:</span>
            <span class="k">return</span> <span class="kc">None</span>

        <span class="n">token_string</span> <span class="o">=</span> <span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">()</span>

        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_set_current_token_and_skip</span><span class="p">(</span>
            <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">INTEGER</span><span class="p">,</span> <span class="nb">int</span><span class="p">(</span><span class="n">token_string</span><span class="p">))</span>
        <span class="p">)</span>
</code></pre></div>
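<p>To see what the regular expression does in isolation, here is a small sketch that uses only the standard <code>re</code> module. The function <code>leading_integer</code> is a hypothetical helper introduced for this example, and I am assuming that the text passed to it plays the role of the not yet consumed part of the input. The key point is that <code>match</code> only succeeds at the beginning of the string, and that <code>\d+</code> grabs as many consecutive digits as it can.</p>

```python
import re

# One or more digits; the raw string keeps the backslash intact
integer_re = re.compile(r'\d+')


def leading_integer(text):
    """Return the run of digits at the start of text, or None.

    Hypothetical helper, not part of smallcalc.
    """
    match = integer_re.match(text)
    if not match:
        return None
    return match.group()


print(leading_integer('356'))      # all three digits are captured
print(leading_integer('123+456'))  # stops at the first non-digit
print(leading_integer('+456'))     # no digit at the start, so no match
```

<p>Since the match stops at the first character that is not a digit, whatever follows (here the <code>+</code>) is left untouched for the next call to the lexer.</p>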
<p>The reason why we don't need to change the parser and the visitor is that nothing changed at that level. We altered the way the lexer identifies an integer token, but once that has been isolated the following steps are exactly the same as before.</p>
<hr>
<h2 id="level-5-whitespaces">Level 5 - Whitespaces<a class="headerlink" href="#level-5-whitespaces" title="Permanent link">¶</a></h2>
<p><em>"Follow the white rabbit."</em> - The Matrix (1999)</p>
<p>The second limitation that our language has at the moment is that it cannot handle whitespaces. If you try to input an expression like <code>3 + 4</code> in the CLI the program will crash with an exception (why?). Traditionally, whitespaces are completely ignored by programming languages: in Python, as well as in C and many other languages, writing <code>3+4</code>, <code>3 + 4</code>, <code>3+ 4</code>, or <code>3 +4</code> doesn't change the meaning at all. In Python, however, whitespaces matter at the beginning of the line, as indentation is used in lieu of braces to delimit blocks.</p>
<p>How can we put such a behaviour in our language? Again, the lexer is the component in charge, as it should just skip whitespaces. So add this test to <code>tests/test_calc_lexer.py</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_ignores_spaces</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 + 5'</span><span class="p">)</span>

    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'3'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'5'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>
</code></pre></div>
<p>While you change the <code>CalcLexer</code> class to make it pass this test, ask yourself if the current structure of the class satisfies you or if it is the right time to refactor it, possibly even heavily rewriting some parts of it. You have a good test suite, now, so you can be sure that what you implemented is correct, at least according to the current requirements.</p>
<p>Note that we are hitting a limitation of unit testing here: we should test that the language skips <em>any</em> amount of whitespaces, but it is impossible to write a test for this. We can test 1, 2, 100 whitespaces, but never <em>any</em> amount. The pragmatic solution here is to test that the language skips one whitespace and to leave further tests to be written only if specific errors arise in the future. The code should however try to provide a generic solution.</p>
<hr>
<h3 id="solution_4">Solution<a class="headerlink" href="#solution_4" title="Permanent link">¶</a></h3>
<p>To process whitespaces I needed to add a helper called <code>_process_whitespace</code> with the same structure as the new <code>_process_integer</code>.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_process_whitespace</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">regexp</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="sa">r</span><span class="s1">'\ +'</span><span class="p">)</span>

        <span class="n">match</span> <span class="o">=</span> <span class="n">regexp</span><span class="o">.</span><span class="n">match</span><span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">tail</span>
        <span class="p">)</span>

        <span class="k">if</span> <span class="ow">not</span> <span class="n">match</span><span class="p">:</span>
            <span class="k">return</span> <span class="kc">None</span>

        <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">skip</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">()))</span>
</code></pre></div>
<p>Note that the solution here is <code>'\ +'</code> which skips any amount of whitespaces, even though <code>'\ '</code> would have been enough to pass the test. As I said before, every time we have to test cases with "any" in them we have to be a bit less strict. TDD is not perfect, and remember that at the end of the day it's more important to have something that works and is not perfect than something that is perfect and doesn't work at all.</p>
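<p>The difference between the two patterns can be checked with a short, self-contained sketch. The function <code>skipped</code> is a hypothetical helper written for this example (it is not part of the lexer) and simply reports how many leading characters a pattern consumes. It also shows the pragmatic compromise mentioned earlier: we can loop over a few sample widths, but never over every possible one.</p>

```python
import re


def skipped(pattern, text):
    """Return how many leading characters of text the pattern consumes.

    Hypothetical helper used only for this comparison.
    """
    match = re.compile(pattern).match(text)
    return len(match.group()) if match else 0


# The generic pattern consumes every leading space, whatever the amount
widths = (1, 2, 100)
generic = [skipped(r'\ +', ' ' * n + '+ 5') for n in widths]

# The minimal pattern consumes exactly one space, which is enough for
# the test in this level but not for wider gaps
minimal = [skipped(r'\ ', ' ' * n + '+ 5') for n in widths]

print(generic)
print(minimal)
```

<p>Both patterns behave the same on an input with a single space, which is exactly why the test alone cannot tell them apart.</p>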
<p>This time I am not interested in returning whitespace tokens; I just want to skip them. The helper is therefore added to <code>get_token</code> without a <code>return</code> statement.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">get_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">eof</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eof</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eof</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eof</span>

        <span class="n">eol</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_eol</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">eol</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">eol</span>

        <span class="bp">self</span><span class="o">.</span><span class="n">_process_whitespace</span><span class="p">()</span>

        <span class="n">integer</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_integer</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">integer</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">integer</span>

        <span class="n">literal</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_process_literal</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">literal</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">literal</span>
</code></pre></div>
<hr>
<h2 id="level-6-subtraction">Level 6 - Subtraction<a class="headerlink" href="#level-6-subtraction" title="Permanent link">¶</a></h2>
<p><em>"I can add, subtract. I can make coffee. I can shuffle cards."</em> - The Bourne Identity (2002)</p>
<div class="highlight"><pre><span></span><code>integer: [0-9]+
# An addsymbol can be the symbol '+' or the symbol '-'
addsymbol: '+' | '-'
expression: integer addsymbol integer
</code></pre></div>
<p>Now that we have addressed two basic issues of our language we can start enhancing the higher level syntactical structures. Since we implemented the addition operation, the most natural step forward is to implement subtraction. As for the addition, this change will involve all three layers of the language: lexer, parser, and visitor.</p>
<p>Let us start by teaching the lexer to understand the minus sign. The test that we need is the following (in <code>tests/test_calc_lexer.py</code>)</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_subtraction</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 - 5'</span><span class="p">)</span>

    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'3'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'-'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'5'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>
</code></pre></div>
<p>Does the lexer require any change? Why?</p>
<p>As you can see the decision to handle operators and symbols as <code>LITERAL</code> tokens allows us to introduce new symbols without the need to change the lexer. This obviously means that we will need to tell the symbols apart in a later stage, as nothing happens automatically. We could have decided to represent each symbol with a specific token, like <code>PLUS</code> and <code>MINUS</code>, but if you think about it, this would not have really changed the code in later stages, as <code>PLUS</code> is just another symbol, exactly like <code>+</code> is.</p>
<p>Using specific tokens, however, can simplify things if we want to handle multi-character literals. If we have an operator like <code>-></code> (as in C) or <code>//</code> (like in Python), or something more complex, we could prefer to handle those in the lexer, emitting a single token with a specific name.</p>
<p>We could introduce in the lexer a table of accepted values for literals, which would lead to earlier and better error reporting. At the moment our language accepts every literal between two integers (try with <code>$</code>, for example), but fails to process them in the parser, interpreting any literal as <code>+</code>. Feel free to expand the project in such a direction if you want.</p>
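<p>As a rough idea of what such a table could look like, here is a minimal sketch. The names <code>ACCEPTED_LITERALS</code> and <code>check_literal</code> are hypothetical and are not part of the project; the sketch only assumes that the lexer has already isolated the literal as a string.</p>

```python
# Hypothetical sketch: ACCEPTED_LITERALS and check_literal are illustrative
# names, they do not exist in the smallcalc code base.
ACCEPTED_LITERALS = {'+', '-'}


def check_literal(value):
    """Return the literal unchanged, or raise if it is not in the table."""
    if value not in ACCEPTED_LITERALS:
        raise ValueError(f"Unknown literal {value!r}")
    return value


print(check_literal('-'))  # prints -

try:
    check_literal('$')
except ValueError as exc:
    print(exc)  # prints Unknown literal '$'
```

<p>A check like this would let the lexer reject <code>$</code> immediately, instead of leaving the problem to a later stage.</p>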
<p>The test for the parser is the following</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_expression_understands_subtraction</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"2-3"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'-'</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
<p>which is a small variation of the previously implemented <code>test_parse_expression</code>.</p>
<p>The considerations made for the lexer are perfectly valid for the parser as well, so you should need no code change at this point. The last test we have to add is that of the visitor, which is again very similar to the previous one</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_expression_subtraction</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">5</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'-'</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'integer'</span><span class="p">)</span>
</code></pre></div>
<p>This time, however, we are at the end of the processing chain, and we have to deal with the minus sign, that is, we have to actually subtract numbers. To make this test pass you will need to change something in your <code>CalcVisitor</code> class.</p>
<p>Once your code passes this test, a quick test of the CLI shows that everything works as intended</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>python<span class="w"> </span>cli.py<span class="w"> </span>
smallcalc<span class="w"> </span>:><span class="w"> </span><span class="m">4</span><span class="w"> </span>-<span class="w"> </span><span class="m">6</span>
<span class="o">(</span>-2,<span class="w"> </span><span class="s1">'integer'</span><span class="o">)</span>
</code></pre></div>
<p>and since we rely on Python to perform the actual subtraction we get negative numbers for free. Pay attention: we can have negative numbers in the results, but we cannot input negative numbers. This is something that we will have to add later.</p>
<hr>
<h3 id="solution_5">Solution<a class="headerlink" href="#solution_5" title="Permanent link">¶</a></h3>
<p>Adding the addition binary operation changed code in the lexer, the parser, and in the visitor. That operation was however considered a generic binary operation, and only the visitor implements the actual <code>+</code> operation. So adding the subtraction works out of the box for the first two stages and requires me to change the visitor only, with a simple <code>if</code> condition on the value of the operator.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CalcVisitor</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">visit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'integer'</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">],</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'binary'</span><span class="p">:</span>
            <span class="n">lvalue</span><span class="p">,</span> <span class="n">ltype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'left'</span><span class="p">])</span>
            <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'right'</span><span class="p">])</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'operator'</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
            <span class="k">if</span> <span class="n">operator</span> <span class="o">==</span> <span class="s1">'+'</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">+</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="k">return</span> <span class="n">lvalue</span> <span class="o">-</span> <span class="n">rvalue</span><span class="p">,</span> <span class="n">rtype</span>
</code></pre></div>
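<p>The solution above treats every operator other than <code>+</code> as a subtraction. A possible alternative, sketched here and not taken from the project, is to map each symbol to a function of the standard <code>operator</code> module and to raise an error for anything else. The names <code>OPERATIONS</code> and <code>StrictCalcVisitor</code> are hypothetical.</p>

```python
import operator

# Assumption: only '+' and '-' are valid operators at this level.
OPERATIONS = {
    '+': operator.add,
    '-': operator.sub,
}


class StrictCalcVisitor:
    # Hypothetical variant of CalcVisitor that rejects unknown operators.
    def visit(self, node):
        if node['type'] == 'integer':
            return node['value'], node['type']

        if node['type'] == 'binary':
            lvalue, ltype = self.visit(node['left'])
            rvalue, rtype = self.visit(node['right'])
            symbol = node['operator']['value']
            if symbol not in OPERATIONS:
                raise ValueError(f"Unsupported operator {symbol!r}")
            return OPERATIONS[symbol](lvalue, rvalue), rtype

        raise ValueError(f"Unknown node type {node['type']!r}")


ast = {
    'type': 'binary',
    'left': {'type': 'integer', 'value': 5},
    'right': {'type': 'integer', 'value': 4},
    'operator': {'type': 'literal', 'value': '-'},
}
print(StrictCalcVisitor().visit(ast))  # prints (1, 'integer')
```

<p>With a table like this, adding a new binary operator later means adding one entry, while a stray symbol such as <code>$</code> produces an explicit error instead of a silent subtraction.</p>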
<hr>
<h2 id="level-7-multiple-operations">Level 7 - Multiple operations<a class="headerlink" href="#level-7-multiple-operations" title="Permanent link">¶</a></h2>
<p><em>"The machine simply does not operate as expected."</em> - The Prestige (2006)</p>
<div class="highlight"><pre><span></span><code>integer: [0-9]+
addsymbol: '+' | '-'
# An expression starts with a single integer and optionally
# contains an addsymbol and another expression
# (this is a recursive definition)
expression: integer (addsymbol expression)?
</code></pre></div>
<p>Before we dive into the fascinating but complex topic of nested operations, let us take a look at multiple operations and implement them, that is, the application of a chain of "similar" operators with the same priority.</p>
<p>Since this tutorial is a practical approach to the construction of an interpreter, I will not go too deep into the subject matter. Feel free to check the references if you are interested in such topics. For the moment, it is sufficient to understand that addition and subtraction are two operations that have the same precedence, which means that neither of them has to be executed before the other: a chain of such operations is simply evaluated from left to right.</p>
<p>For instance, the expression <code>3 + 4 - 5</code> is evaluated as <code>(3 + 4) - 5 = 7 - 5 = 2</code>, where the expression between parentheses is executed first. In this specific case grouping from the right gives the same result, <code>3 + (4 - 5) = 3 + (-1) = 2</code>, but this is not true in general: <code>10 - 1 + 1</code> is <code>(10 - 1) + 1 = 10</code>, while <code>10 - (1 + 1) = 8</code>.</p>
<p>From the interpreter's point of view, then, we can process a chain of additions and subtractions in the order in which they appear, without being concerned about precedence, which greatly simplifies our job. As the output of the parser is a tree, however, we need to find a way to represent such a chain of operations in that form. One way is to nest expressions, which means that each operation is a single <code>binary</code> node, with the left term containing everything that has been processed so far and the right one a single integer. In the previous example <code>3 + 4 - 5</code> is represented by a subtraction between <code>3 + 4</code> and <code>5</code>. <code>3 + 4</code>, in turn, is another binary node, an addition between <code>3</code> and <code>4</code>.</p>
<p>Let us start checking if the lexer understands multiple operations</p>
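<p>To see how the shape of the tree influences the result, here is a small sketch that builds the two possible trees for <code>10 - 1 + 1</code> and evaluates them. The helpers <code>integer</code>, <code>binary</code>, and <code>evaluate</code> are hypothetical and only mimic the dictionaries used in the tests; they are not part of the project.</p>

```python
def integer(value):
    # Hypothetical helper that builds an integer node as a plain dictionary.
    return {'type': 'integer', 'value': value}


def binary(left, symbol, right):
    # Hypothetical helper that builds a binary node as a plain dictionary.
    return {
        'type': 'binary',
        'left': left,
        'right': right,
        'operator': {'type': 'literal', 'value': symbol},
    }


def evaluate(node):
    # Minimal recursive evaluation, assuming only '+' and '-' exist.
    if node['type'] == 'integer':
        return node['value']
    left = evaluate(node['left'])
    right = evaluate(node['right'])
    if node['operator']['value'] == '+':
        return left + right
    return left - right


# (10 - 1) + 1: the tree grows on the left
left_growing = binary(binary(integer(10), '-', integer(1)), '+', integer(1))
# 10 - (1 + 1): the tree grows on the right
right_growing = binary(integer(10), '-', binary(integer(1), '+', integer(1)))

print(evaluate(left_growing))   # prints 10
print(evaluate(right_growing))  # prints 8
```

<p>Only the left-growing tree gives the result that we expect from the usual left-to-right evaluation.</p>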
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_tokens_understands_multiple_operations</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 + 5 - 7'</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_tokens</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'3'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'5'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'-'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'7'</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOL</span><span class="p">),</span>
        <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">EOF</span><span class="p">)</span>
    <span class="p">]</span>
</code></pre></div>
<p>If the current version of your lexer doesn't pass this test make the necessary changes to the code.</p>
<p>Now we should modify the parser. While expressing the test is very simple, actually creating the code that makes it pass is not that trivial. So, as an intermediate step, I will make you implement the code that allows the parser to check upcoming tokens.</p>
<p>One solution to this problem is to save the state of the lexer, get as many tokens as we need, and then restore the saved state. Inspired by Git, I called those methods <code>stash</code> and <code>pop</code>. Put the following test in <code>tests/test_calc_lexer.py</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_lexer_can_stash_and_pop_status</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 5'</span><span class="p">)</span>
    <span class="n">l</span><span class="o">.</span><span class="n">stash</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">INTEGER</span><span class="p">,</span> <span class="s1">'3'</span><span class="p">)</span>
</code></pre></div>
<p>As you can see the <code>get_token</code> call between <code>stash</code> and <code>pop</code> doesn't leave any trace.</p>
<p>Once your code is working implement the second test. Create a method <code>peek_token</code> that performs all the previous actions together</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_lexer_can_peek_token</span><span class="p">():</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">clex</span><span class="o">.</span><span class="n">CalcLexer</span><span class="p">()</span>
    <span class="n">l</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s1">'3 + 5'</span><span class="p">)</span>
    <span class="n">l</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">l</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span> <span class="o">==</span> <span class="n">token</span><span class="o">.</span><span class="n">Token</span><span class="p">(</span><span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">,</span> <span class="s1">'+'</span><span class="p">)</span>
</code></pre></div>
<p>You can implement <code>peek_token</code> very easily leveraging <code>stash</code> and <code>pop</code>.</p>
<p>Now we are ready to face the test that covers multiple operations, which goes in <code>tests/test_calc_parser.py</code></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_parse_expression_with_multiple_operations</span><span class="p">():</span>
    <span class="n">p</span> <span class="o">=</span> <span class="n">cpar</span><span class="o">.</span><span class="n">CalcParser</span><span class="p">()</span>
    <span class="n">p</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">"2 + 3 - 4"</span><span class="p">)</span>
    <span class="n">node</span> <span class="o">=</span> <span class="n">p</span><span class="o">.</span><span class="n">parse_expression</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">node</span><span class="o">.</span><span class="n">asdict</span><span class="p">()</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">2</span>
            <span class="p">},</span>
            <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
            <span class="p">},</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
            <span class="p">}</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'-'</span>
        <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div>
<p>A note of warning: probably the first version of the code that makes this test pass will be horrible, as the logic involved is not trivial. Remember that your <strong>first goal is to make the test pass and then, with the battery of tests in your arsenal, to tidy up the code</strong>.</p>
<p>As usual, the last test involves the visitor</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_visitor_expression_with_multiple_operations</span><span class="p">():</span>
    <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
        <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'binary'</span><span class="p">,</span>
            <span class="s1">'left'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">3</span>
            <span class="p">},</span>
            <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="mi">4</span>
            <span class="p">},</span>
            <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
                <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
                <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'-'</span>
            <span class="p">}</span>
        <span class="p">},</span>
        <span class="s1">'right'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'integer'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="mi">200</span>
        <span class="p">},</span>
        <span class="s1">'operator'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="s1">'+'</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">cvis</span><span class="o">.</span><span class="n">CalcVisitor</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">v</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="mi">199</span><span class="p">,</span> <span class="s1">'integer'</span><span class="p">)</span>
</code></pre></div>
<p>What changes do you need to make to the <code>CalcVisitor</code> class? Why?</p>
<hr>
<h3 id="solution_6">Solution<a class="headerlink" href="#solution_6" title="Permanent link">¶</a></h3>
<p>I made no assumptions on the length of the tokens stream in <code>get_tokens</code>, so processing multiple tokens comes out of the box in the lexer.</p>
<p>Adding <code>stash</code> and <code>pop</code> is not very complex, as the tests show exactly what we need to save and retrieve. Here I leverage the <code>position</code> attribute and the <code>goto</code> method of the <code>TextBuffer</code> class.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">CalcLexer</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">text</span><span class="o">=</span><span class="s1">''</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span> <span class="o">=</span> <span class="n">text_buffer</span><span class="o">.</span><span class="n">TextBuffer</span><span class="p">(</span><span class="n">text</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_status</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_current_token</span> <span class="o">=</span> <span class="kc">None</span>

    <span class="p">[</span><span class="o">...</span><span class="p">]</span>

    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">_current_status</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">status</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="n">status</span><span class="p">[</span><span class="s1">'text_storage'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">position</span>
        <span class="n">status</span><span class="p">[</span><span class="s1">'current_token'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_current_token</span>
        <span class="k">return</span> <span class="n">status</span>

    <span class="k">def</span> <span class="nf">stash</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_status</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_current_status</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">pop</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">status</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_status</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_text_storage</span><span class="o">.</span><span class="n">goto</span><span class="p">(</span><span class="o">*</span><span class="n">status</span><span class="p">[</span><span class="s1">'text_storage'</span><span class="p">])</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_current_token</span> <span class="o">=</span> <span class="n">status</span><span class="p">[</span><span class="s1">'current_token'</span><span class="p">]</span>
</code></pre></div>
<p>Once <code>stash</code> and <code>pop</code> are in place implementing <code>peek_token</code> is trivial</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">peek_token</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">stash</span><span class="p">()</span>
        <span class="n">token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_token</span><span class="p">()</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">pop</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">token</span>
</code></pre></div>
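<p>Since the saved states are kept in a list, <code>stash</code> and <code>pop</code> can be nested. The following self-contained sketch shows the idea on a deliberately simplified lexer: <code>ToyLexer</code> is a hypothetical stand-in for <code>CalcLexer</code>, its tokens are plain strings, and its whole state is a single index.</p>

```python
class ToyLexer:
    # Hypothetical stand-in for CalcLexer: tokens are plain strings
    # and the only state that has to be saved is the current index.
    def __init__(self, text):
        self._tokens = text.split()
        self._index = 0
        self._status = []

    def get_token(self):
        if self._index == len(self._tokens):
            return 'EOF'
        token = self._tokens[self._index]
        self._index += 1
        return token

    def stash(self):
        # Push the current state on the stack
        self._status.append(self._index)

    def pop(self):
        # Restore the most recently stashed state
        self._index = self._status.pop()

    def peek_token(self):
        self.stash()
        token = self.get_token()
        self.pop()
        return token


lexer = ToyLexer('3 + 5')
lexer.stash()              # outer checkpoint, before '3'
lexer.get_token()          # consumes '3'
lexer.stash()              # inner checkpoint, before '+'
lexer.get_token()          # consumes '+'
lexer.pop()                # back to the inner checkpoint
print(lexer.peek_token())  # prints +
lexer.pop()                # back to the outer checkpoint
print(lexer.get_token())   # prints 3
```

<p>Each <code>pop</code> restores the most recent <code>stash</code>, which is what allows <code>peek_token</code> to be called safely even while another stashed state is pending.</p>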
<p>Finally, <code>peek_token</code> allows me to add support for multiple operations in the parser.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">left</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
        <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">while</span> <span class="n">next_token</span><span class="o">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">clex</span><span class="o">.</span><span class="n">LITERAL</span><span class="p">:</span>
            <span class="n">operator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_addsymbol</span><span class="p">()</span>
            <span class="n">right</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">parse_integer</span><span class="p">()</span>
            <span class="n">left</span> <span class="o">=</span> <span class="n">BinaryNode</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">operator</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
            <span class="n">next_token</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">lexer</span><span class="o">.</span><span class="n">peek_token</span><span class="p">()</span>
        <span class="k">return</span> <span class="n">left</span>
</code></pre></div>
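<p>The key point of the loop is that the node built at each step becomes the left term of the next one. The same folding can be sketched without the lexer on a flat list of values. The function <code>parse_chain</code> is hypothetical and assumes a well-formed list that alternates integers and symbols.</p>

```python
def parse_chain(tokens):
    # Hypothetical helper: tokens is a flat list such as [2, '+', 3, '-', 4].
    # Assumption: the list is well formed (integer, symbol, integer, ...).
    left = {'type': 'integer', 'value': tokens[0]}

    # Pair each symbol with the integer that follows it
    for symbol, value in zip(tokens[1::2], tokens[2::2]):
        # The tree built so far becomes the left term of the new node
        left = {
            'type': 'binary',
            'left': left,
            'right': {'type': 'integer', 'value': value},
            'operator': {'type': 'literal', 'value': symbol},
        }

    return left


tree = parse_chain([2, '+', 3, '-', 4])
print(tree['operator']['value'])          # prints -
print(tree['left']['operator']['value'])  # prints +
print(tree['right']['value'])             # prints 4
```

<p>The root of the resulting tree is the last operation of the chain, and the previous operations are nested in its left term, which is the same shape required by <code>test_parse_expression_with_multiple_operations</code>.</p>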
<hr>
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>Phew! That was something, wasn't it? We went from nothing to a trivial calculator, but the engine we have under the bonnet is clearly powerful, so I'm already looking forward to implementing more complex syntax elements like round brackets, multiplication, and division. Sooner or later this should become a language, so we will also need variables, functions, scopes, and so on.</p>
<p>The code I developed in this post is available on the GitHub repository tagged with <code>part1</code> (<a href="https://github.com/lgiordani/smallcalc/tree/part1">link</a>).</p>
<p>Well, see you in the next post of the series, then!</p>
<h2 id="resources">Resources<a class="headerlink" href="#resources" title="Permanent link">¶</a></h2>
<p>Some links on the history of compilers</p>
<ul>
<li><a href="http://gcc.gnu.org/wiki/History">GCC history</a></li>
<li><a href="https://en.wikipedia.org/wiki/History_of_compiler_construction">History of compiler construction on Wikipedia</a></li>
</ul>
<p>Tutorials and analysis of compilers and parsers</p>
<ul>
<li>The beautiful <a href="https://ruslanspivak.com/lsbasi-part1/">"Let’s Build A Simple Interpreter"</a> series by Ruslan Spivak. Thanks Ruslan!</li>
<li>How to implement a programming language in JavaScript <a href="http://lisperator.net/pltut/">on Lisperator.net</a> by Mihai Bazon.</li>
<li><a href="http://www.buildyourownlisp.com/">Build Your Own Lisp</a></li>
<li><a href="http://blog.reverberate.org/2013/07/ll-and-lr-parsing-demystified.html">LL and LR Parsing Demystified</a> by Josh Haberman.</li>
</ul>
<p>Grammars</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form">Backus-Naur form on Wikipedia</a></li>
<li><a href="https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form">Extended Backus-Naur form on Wikipedia</a></li>
<li><a href="http://matt.might.net/articles/grammars-bnf-ebnf/">The language of languages</a></li>
</ul>
<h2 id="updates">Updates<a class="headerlink" href="#updates" title="Permanent link">¶</a></h2>
<p>2017-12-24: Victor Uriarte (<a href="https://github.com/vmuriart">vmuriart</a>) spotted an important issue in a previous version of the post. The last two tests (<code>test_parse_expression_with_multiple_operations</code> and <code>test_visitor_expression_with_multiple_operations</code>) used a right-growing tree instead of a left-growing one. The problem with a right-growing tree is that an operator affects <em>everything</em> that is on its right side, that is, the whole rest of the operation. Thus, an operation like <code>10 - 1 + 1</code> would become <code>10 - (1 + 1)</code>, which evaluates to 8 instead of 10. I fixed the tests and the solution I give in the next posts. You can read Victor's issue <a href="https://github.com/lgiordani/smallcalc/issues/4">here</a>. Thanks Victor for spotting it!</p>
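<p>To make the difference concrete, here is a minimal sketch that builds both trees for <code>10 - 1 + 1</code> and evaluates them. It is not the smallcalc code: the tuple-based nodes and the helpers <code>build_left</code>, <code>build_right</code>, and <code>evaluate</code> are hypothetical names that I use only for this illustration.</p>

```python
# A sketch only: nodes are plain tuples (left, operator, right),
# not the BinaryNode class used in the post.


def evaluate(node):
    # A leaf is a plain integer, anything else is a binary node
    if isinstance(node, int):
        return node

    left, operator, right = node
    left_value = evaluate(left)
    right_value = evaluate(right)

    if operator == "+":
        return left_value + right_value

    return left_value - right_value


def build_left(values, operators):
    # Left-growing: the tree built so far becomes the LEFT child
    # of the next node, like the loop in parse_expression
    tree = values[0]
    for operator, value in zip(operators, values[1:]):
        tree = (tree, operator, value)
    return tree


def build_right(values, operators):
    # Right-growing: the whole rest of the operation becomes
    # the RIGHT child of the first operator
    if len(values) == 1:
        return values[0]
    rest = build_right(values[1:], operators[1:])
    return (values[0], operators[0], rest)


left_tree = build_left([10, 1, 1], ["-", "+"])
right_tree = build_right([10, 1, 1], ["-", "+"])

print(left_tree)             # ((10, '-', 1), '+', 1)
print(evaluate(left_tree))   # 10
print(right_tree)            # (10, '-', (1, '+', 1))
print(evaluate(right_tree))  # 8
```

<p>The left-growing version mirrors what a loop that keeps wrapping the current tree in a new node does, which is why subtraction ends up being applied in the order in which it is written.</p>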
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>