<p>TDD in Python with pytest - Part 5, by Leonardo Giordani (The Digital Cat, https://www.thedigitalcatonline.com/). Published 2020-09-21, updated 2021-03-06.</p>
<p>This is the fifth and last post in the series "TDD in Python with pytest" where I develop a simple project following a strict TDD methodology. The posts come from my book <a href="https://leanpub.com/clean-architectures-in-python">Clean Architectures in Python</a> and have been reviewed to get rid of some bad naming choices of the version published in the book.</p>
<p>You can find the first post <a href="https://www.thedigitalcatonline.com/blog/2020/09/10/tdd-in-python-with-pytest-part-1/">here</a>.</p>
<p>In this post I will conclude the discussion about mocks by introducing patching.</p>
<h2 id="patching">Patching<a class="headerlink" href="#patching" title="Permanent link">¶</a></h2>
<p>Mocks are very simple to introduce in your tests whenever your objects accept classes or instances from outside. In that case, as shown in the previous sections, you just have to instantiate the class <code>Mock</code> and pass the resulting object to your system. However, when the external classes instantiated by your library are hardcoded this simple trick does not work. In this case you have no chance to pass a fake object instead of the real one.</p>
<p>This is exactly the case addressed by patching. Patching, in a testing framework, means to replace a globally reachable object with a mock, thus achieving the goal of having the code run unmodified, while part of it has been hot swapped, that is, replaced at run time.</p>
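<p>The difference between the two situations can be sketched as follows. The classes <code>InjectedSizer</code> and <code>HardcodedSizer</code> are hypothetical and exist only for this illustration; they are not part of the project discussed in this post.</p>

```python
import os
from unittest.mock import Mock, patch


class InjectedSizer:
    """Hypothetical class that receives its collaborator from outside."""

    def __init__(self, getsize):
        self.getsize = getsize

    def size_of(self, path):
        return self.getsize(path)


class HardcodedSizer:
    """Hypothetical class that reaches for os.path.getsize directly."""

    def size_of(self, path):
        return os.path.getsize(path)


# Injection: we simply hand over a mock instead of the real function.
getsize_mock = Mock(return_value=42)
injected_result = InjectedSizer(getsize_mock).size_of("whatever.txt")

# Hardcoded dependency: there is nothing to pass in, so we temporarily
# replace the globally reachable function with a mock.
with patch("os.path.getsize", return_value=42):
    hardcoded_result = HardcodedSizer().size_of("whatever.txt")
```

<p>In both cases the file <code>whatever.txt</code> does not need to exist, as the real <code>os.path.getsize</code> is never executed.</p>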
<h3 id="a-warm-up-example">A warm-up example<a class="headerlink" href="#a-warm-up-example" title="Permanent link">¶</a></h3>
<p>Clone the repository <code>fileinfo</code> that you can find <a href="https://github.com/lgiordani/fileinfo">here</a> and move to the branch <code>develop</code>. As I did for the project <code>simple_calculator</code>, the branch <code>master</code> contains the full solution, and I use it to maintain the repository, but if you want to code along you need to start from scratch. If you prefer, you can fork it on GitHub and work on your own copy of the repository.</p>
<div class="highlight"><pre><span></span><code>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/lgiordani/fileinfo
<span class="nb">cd</span><span class="w"> </span>fileinfo
git<span class="w"> </span>checkout<span class="w"> </span>--track<span class="w"> </span>origin/develop
</code></pre></div>
<p>Create a virtual environment following your preferred process and install the requirements</p>
<div class="highlight"><pre><span></span><code>pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements/dev.txt
</code></pre></div>
<p>You should at this point be able to run</p>
<div class="highlight"><pre><span></span><code>pytest<span class="w"> </span>-svv
</code></pre></div>
<p>and get an output like</p>
<div class="highlight"><pre><span></span><code>=============================== test session starts ===============================
platform linux -- Python XXXX, pytest-XXXX, py-XXXX, pluggy-XXXX --
fileinfo/venv3/bin/python3
cachedir: .cache
rootdir: fileinfo, inifile: pytest.ini
plugins: cov-XXXX
collected 0 items
============================== no tests ran in 0.02s ==============================
</code></pre></div>
<p>Let us start with a very simple example. Patching can be complex to grasp at the beginning so it is better to start learning it with trivial use cases. The purpose of this library is to develop a simple class that returns information about a given file. The class shall be instantiated with the file path, which can be relative.</p>
<p>The starting point is the class with the method <code>__init__</code>. If you want you can develop the class using TDD, but for the sake of brevity I will not show here all the steps that I followed. This is the set of tests I have in <code>tests/test_fileinfo.py</code></p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="kn">from</span> <span class="nn">fileinfo.fileinfo</span> <span class="kn">import</span> <span class="n">FileInfo</span>
<span class="k">def</span> <span class="nf">test_init</span><span class="p">():</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">filename</span> <span class="o">==</span> <span class="n">filename</span>

<span class="k">def</span> <span class="nf">test_init_relative</span><span class="p">():</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">relative_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">relative_path</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">filename</span> <span class="o">==</span> <span class="n">filename</span>
</code></pre></div>
<p>and this is the code of the class <code>FileInfo</code> in the file <code>fileinfo/fileinfo.py</code></p>
<div class="highlight"><span class="filename">fileinfo/fileinfo.py</span><pre><span></span><code><span class="kn">import</span> <span class="nn">os</span>
<span class="k">class</span> <span class="nc">FileInfo</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">path</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">original_path</span> <span class="o">=</span> <span class="n">path</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">filename</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">basename</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
</code></pre></div>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/first-version">first-version</a></p>
<p>As you can see the class is extremely simple, and the tests are straightforward. So far I didn't add anything new to what we discussed in the previous posts.</p>
<p>Now I want the method <code>get_info</code> to return a tuple with the file name, the original path the class was instantiated with, and the absolute path of the file. Pretending we are in the directory <code>/some/absolute/path</code>, the class should work as shown here</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="s1">'../book_list.txt'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span>
<span class="p">(</span><span class="s1">'book_list.txt'</span><span class="p">,</span> <span class="s1">'../book_list.txt'</span><span class="p">,</span> <span class="s1">'/some/absolute/book_list.txt'</span><span class="p">)</span>
</code></pre></div>
<p>You can quickly realise that you have a problem writing the test. There is no way to easily test something like "the absolute path", since the outcome of the function called in the test is supposed to vary with the directory in which the test itself is run. Let us try to write part of the test</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_info</span><span class="p">():</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">original_path</span><span class="p">,</span> <span class="s1">'???'</span><span class="p">)</span>
</code></pre></div>
<p>where the <code>'???'</code> string highlights that I cannot put something sensible to test the absolute path of the file.</p>
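<p>To see why the expected value cannot be hardcoded, here is a minimal sketch that resolves the same relative path from two different working directories. The directory names <code>a</code>, <code>b</code> and <code>work</code> are arbitrary choices made for this illustration.</p>

```python
import os
import tempfile

relative_path = "../somefile.ext"
start_dir = os.getcwd()
results = []

with tempfile.TemporaryDirectory() as base:
    try:
        for name in ("a", "b"):
            # Build <base>/a/work and <base>/b/work and move into each of them.
            workdir = os.path.join(base, name, "work")
            os.makedirs(workdir)
            os.chdir(workdir)
            # The relative path is the same, the absolute one is not.
            results.append(os.path.abspath(relative_path))
    finally:
        # Go back before the temporary directory is removed.
        os.chdir(start_dir)
```

<p>The two elements of <code>results</code> differ, even though <code>relative_path</code> never changed. A test that compares the output of <code>os.path.abspath</code> with a fixed string would thus depend on where it is executed.</p>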
<p>Patching is the way to solve this problem. You know that the function will use some code to get the absolute path of the file. So, within the scope of this test only, you can replace that code with something different and perform the test. Since the replacement code has a known outcome writing the test is now possible.</p>
<p>Patching, thus, means to inform Python that during the execution of a specific portion of the code you want a globally accessible module/object replaced by a mock. Let's see how we can use it in our example</p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="kn">from</span> <span class="nn">unittest.mock</span> <span class="kn">import</span> <span class="n">patch</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">test_get_info</span><span class="p">():</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">patch</span><span class="p">(</span><span class="s1">'os.path.abspath'</span><span class="p">)</span> <span class="k">as</span> <span class="n">abspath_mock</span><span class="p">:</span>
        <span class="n">test_abspath</span> <span class="o">=</span> <span class="s1">'some/abs/path'</span>
        <span class="n">abspath_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_abspath</span>
        <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">original_path</span><span class="p">,</span> <span class="n">test_abspath</span><span class="p">)</span>
</code></pre></div>
<p>You can clearly see the context in which the patching happens, as it is enclosed in a <code>with</code> statement. Inside this statement the function <code>os.path.abspath</code> is replaced by a mock created by the function <code>patch</code> and bound to the name <code>abspath_mock</code>. So, while Python executes the lines of code enclosed by the statement <code>with</code>, any call to <code>os.path.abspath</code> is actually a call to <code>abspath_mock</code>, and returns whatever the mock has been configured to return.</p>
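<p>The limited scope of the replacement can be checked directly. The following is a small sketch, independent from the project code, that compares the object reachable as <code>os.path.abspath</code> inside and outside the <code>with</code> statement; the variable names are chosen for this illustration only.</p>

```python
import os
from unittest.mock import patch

original_abspath = os.path.abspath

with patch("os.path.abspath") as abspath_mock:
    abspath_mock.return_value = "some/abs/path"
    # Inside the block the name os.path.abspath refers to the mock.
    replaced_inside = os.path.abspath is abspath_mock
    value_inside = os.path.abspath("../somefile.ext")

# Outside the block the original function is back in place.
restored_outside = os.path.abspath is original_abspath
```

<p>All the code that runs inside the block sees the mock, including code in other modules that calls <code>os.path.abspath</code>, while everything that runs after the block uses the real function again.</p>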
<p>The first thing we can do, then, is to give the mock a known <code>return_value</code>. This way we solve the issue that we had with the initial code, that is using an external component that returns an unpredictable result. The line</p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="kn">from</span> <span class="nn">unittest.mock</span> <span class="kn">import</span> <span class="n">patch</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">test_get_info</span><span class="p">():</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">patch</span><span class="p">(</span><span class="s1">'os.path.abspath'</span><span class="p">)</span> <span class="k">as</span> <span class="n">abspath_mock</span><span class="p">:</span>
        <span class="n">test_abspath</span> <span class="o">=</span> <span class="s1">'some/abs/path'</span>
<span class="hll">        <span class="n">abspath_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_abspath</span>
</span>        <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
        <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">original_path</span><span class="p">,</span> <span class="n">test_abspath</span><span class="p">)</span>
</code></pre></div>
<p>instructs the patching mock to return the given string as a result, regardless of the real values of the file under consideration.</p>
<p>The code that makes the test pass is</p>
<div class="highlight"><span class="filename">fileinfo/fileinfo.py</span><pre><span></span><code><span class="k">class</span> <span class="nc">FileInfo</span><span class="p">:</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">get_info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">filename</span><span class="p">,</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">original_path</span><span class="p">,</span>
            <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">abspath</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">original_path</span><span class="p">)</span>
        <span class="p">)</span>
</code></pre></div>
<p>When this code is executed by the test the function <code>os.path.abspath</code> is replaced at run time by the mock that we prepared there, which basically ignores the input value <code>self.original_path</code> and returns the fixed value it was instructed to use.</p>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/patch-with-context-manager">patch-with-context-manager</a></p>
<p>It is worth at this point discussing outgoing messages again. The code that we are considering here is a clear example of an outgoing query, as the method <code>get_info</code> is not interested in changing the status of the external component. In the previous post we reached the conclusion that testing the return value of outgoing queries is pointless and should be avoided. With <code>patch</code> we are replacing the external component with something that we know, using it to test that our object correctly handles the value returned by the outgoing query. We are thus not testing the external component, as it has been replaced, and we are definitely not testing the mock, as its return value is already known.</p>
<p>Obviously to write the test you have to know that you are going to use the function <code>os.path.abspath</code>, so patching is somehow a "less pure" practice in TDD. In pure OOP/TDD you are only concerned with the external behaviour of the object, and not with its internal structure. This example, however, shows that this pure approach has some limitations that you have to cope with, and patching is a clean way to do it.</p>
<h2 id="the-patching-decorator">The patching decorator<a class="headerlink" href="#the-patching-decorator" title="Permanent link">¶</a></h2>
<p>The function <code>patch</code> we imported from the module <code>unittest.mock</code> is very powerful, as it can temporarily replace an external object. If the replacement has to or can be active for the whole test, there is a cleaner way to inject your mocks, which is to use <code>patch</code> as a function decorator.</p>
<p>This means that you can decorate the test function, passing as argument the same argument you would pass if <code>patch</code> was used in a <code>with</code> statement. This requires however a small change in the test function prototype, as it has to receive an additional argument, which will become the mock.</p>
<p>Let's change <code>test_get_info</code>, removing the statement <code>with</code> and decorating the function with <code>patch</code></p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="nd">@patch</span><span class="p">(</span><span class="s1">'os.path.abspath'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_get_info</span><span class="p">(</span><span class="n">abspath_mock</span><span class="p">):</span>
    <span class="n">test_abspath</span> <span class="o">=</span> <span class="s1">'some/abs/path'</span>
    <span class="n">abspath_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_abspath</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">original_path</span><span class="p">,</span> <span class="n">test_abspath</span><span class="p">)</span>
</code></pre></div>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/patch-with-function-decorator">patch-with-function-decorator</a></p>
<p>As you can see the decorator <code>patch</code> works like a big <code>with</code> statement for the whole function. The argument <code>abspath_mock</code> passed to the test becomes internally the mock that replaces <code>os.path.abspath</code>. Obviously this way you replace <code>os.path.abspath</code> for the whole function, so you have to decide case by case which form of the function <code>patch</code> you need to use.</p>
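<p>A minimal sketch of this behaviour outside a test suite might look like the following. The function <code>resolve_under_patch</code> is a hypothetical name introduced for this illustration; the point is that the mock is appended after the arguments supplied by the caller, and that the patch is active only while the decorated function runs.</p>

```python
import os
from unittest.mock import patch

original_abspath = os.path.abspath


@patch("os.path.abspath")
def resolve_under_patch(path, abspath_mock):
    # The caller passes only `path`; the decorator adds `abspath_mock`.
    abspath_mock.return_value = "some/abs/path"
    return os.path.abspath(path), os.path.abspath is abspath_mock


value, was_patched = resolve_under_patch("../somefile.ext")

# Once the function has returned, the original function is restored.
restored = os.path.abspath is original_abspath
```

<p>This is the same mechanism at work in the test above: pytest calls the test function without arguments, and the decorator provides the mock.</p>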
<h2 id="multiple-patches">Multiple patches<a class="headerlink" href="#multiple-patches" title="Permanent link">¶</a></h2>
<p>You can patch more than one object in the same test. For example, consider the case where the method <code>get_info</code> calls <code>os.path.getsize</code> in addition to <code>os.path.abspath</code> in order to return the size of the file. You have at this point two different outgoing queries, and you have to replace both with mocks to make your class work during the test.</p>
<p>This can be easily done with an additional <code>patch</code> decorator</p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="nd">@patch</span><span class="p">(</span><span class="s1">'os.path.getsize'</span><span class="p">)</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s1">'os.path.abspath'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_get_info</span><span class="p">(</span><span class="n">abspath_mock</span><span class="p">,</span> <span class="n">getsize_mock</span><span class="p">):</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="n">test_abspath</span> <span class="o">=</span> <span class="s1">'some/abs/path'</span>
    <span class="n">abspath_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_abspath</span>
    <span class="n">test_size</span> <span class="o">=</span> <span class="mi">1234</span>
    <span class="n">getsize_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_size</span>
    <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">original_path</span><span class="p">,</span> <span class="n">test_abspath</span><span class="p">,</span> <span class="n">test_size</span><span class="p">)</span>
</code></pre></div>
<p>Please note that the decorator which is nearest to the function is applied first. Always remember that the decorator syntax with <code>@</code> is a shortcut to replace the function with the output of the decorator, so two decorators result in</p>
<div class="highlight"><pre><span></span><code><span class="nd">@decorator1</span>
<span class="nd">@decorator2</span>
<span class="k">def</span> <span class="nf">myfunction</span><span class="p">():</span>
    <span class="k">pass</span>
</code></pre></div>
<p>which is a shortcut for</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">myfunction</span><span class="p">():</span>
    <span class="k">pass</span>
<span class="n">myfunction</span> <span class="o">=</span> <span class="n">decorator1</span><span class="p">(</span><span class="n">decorator2</span><span class="p">(</span><span class="n">myfunction</span><span class="p">))</span>
</code></pre></div>
<p>This explains why, in the test code, the function receives first <code>abspath_mock</code> and then <code>getsize_mock</code>. The first decorator applied to the function is the patch of <code>os.path.abspath</code>, which appends the mock that we call <code>abspath_mock</code>. Then the patch of <code>os.path.getsize</code> is applied and this appends its own mock.</p>
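<p>Both facts can be verified with a short sketch. The decorator factory <code>tag</code> and the function <code>received_mocks</code> are hypothetical names used only here: the first records the order in which stacked decorators are applied, the second checks which mock ends up in which positional argument.</p>

```python
import os
from unittest.mock import patch

applied = []


def tag(name):
    """Hypothetical decorator factory that records when it is applied."""

    def decorator(func):
        applied.append(name)
        return func

    return decorator


@tag("top")
@tag("bottom")
def myfunction():
    pass


@patch("os.path.getsize")
@patch("os.path.abspath")
def received_mocks(first_mock, second_mock):
    # Check which patched function each positional argument replaces.
    return (
        first_mock is os.path.abspath,
        second_mock is os.path.getsize,
    )
```

<p>After this code has run, <code>applied</code> contains <code>"bottom"</code> followed by <code>"top"</code>, and calling <code>received_mocks()</code> confirms that the first argument is the mock of <code>os.path.abspath</code> and the second one the mock of <code>os.path.getsize</code>.</p>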
<p>The code that makes the test pass is</p>
<div class="highlight"><span class="filename">fileinfo/fileinfo.py</span><pre><span></span><code><span class="k">class</span> <span class="nc">FileInfo</span><span class="p">:</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">get_info</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">filename</span><span class="p">,</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">original_path</span><span class="p">,</span>
            <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">abspath</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">original_path</span><span class="p">),</span>
            <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">getsize</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">original_path</span><span class="p">)</span>
        <span class="p">)</span>
</code></pre></div>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/multiple-patches">multiple-patches</a></p>
<p>We can write the above test using two <code>with</code> statements as well</p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="k">def</span> <span class="nf">test_get_info</span><span class="p">():</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">patch</span><span class="p">(</span><span class="s1">'os.path.abspath'</span><span class="p">)</span> <span class="k">as</span> <span class="n">abspath_mock</span><span class="p">:</span>
        <span class="n">test_abspath</span> <span class="o">=</span> <span class="s1">'some/abs/path'</span>
        <span class="n">abspath_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_abspath</span>
        <span class="k">with</span> <span class="n">patch</span><span class="p">(</span><span class="s1">'os.path.getsize'</span><span class="p">)</span> <span class="k">as</span> <span class="n">getsize_mock</span><span class="p">:</span>
            <span class="n">test_size</span> <span class="o">=</span> <span class="mi">1234</span>
            <span class="n">getsize_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_size</span>
            <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
            <span class="k">assert</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span> <span class="o">==</span> <span class="p">(</span>
                <span class="n">filename</span><span class="p">,</span>
                <span class="n">original_path</span><span class="p">,</span>
                <span class="n">test_abspath</span><span class="p">,</span>
                <span class="n">test_size</span>
            <span class="p">)</span>
</code></pre></div>
<p>Using more than one <code>with</code> statement, however, makes the code difficult to read, in my opinion, so in general I prefer to avoid complex <code>with</code> trees if I do not really need to use a limited scope of the patching.</p>
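<p>If you do need the limited scope, one possible way to flatten the tree is to list both context managers in a single <code>with</code> statement. This is a sketch of an alternative that is not used in the repository; the class <code>FileInfo</code> is copied inline only to make the example run on its own.</p>

```python
import os
from unittest.mock import patch


class FileInfo:
    # Inline copy of the class shown above, so the sketch is self-contained.
    def __init__(self, path):
        self.original_path = path
        self.filename = os.path.basename(path)

    def get_info(self):
        return (
            self.filename,
            self.original_path,
            os.path.abspath(self.original_path),
            os.path.getsize(self.original_path),
        )


def test_get_info():
    filename = "somefile.ext"
    original_path = "../{}".format(filename)
    test_abspath = "some/abs/path"
    test_size = 1234

    # Two patches, one level of indentation.
    with patch("os.path.abspath") as abspath_mock, \
            patch("os.path.getsize") as getsize_mock:
        abspath_mock.return_value = test_abspath
        getsize_mock.return_value = test_size

        fi = FileInfo(original_path)
        assert fi.get_info() == (
            filename,
            original_path,
            test_abspath,
            test_size,
        )
```

<p>The two patches are entered left to right and undone in reverse order when the block ends, exactly as with nested statements.</p>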
<h2 id="checking-call-parameters">Checking call parameters<a class="headerlink" href="#checking-call-parameters" title="Permanent link">¶</a></h2>
<p>When you patch, the internal algorithm of the patched function is not executed, as the mock that replaces it just returns the values it has been instructed to return. This is connected to what we said about testing external systems, so everything is good, but while we don't want to test the internals of the module <code>os.path</code>, we do want to be sure that we are passing the correct values to the external functions.</p>
<p>This is why mocks provide methods like <code>assert_called_with</code> (and other similar methods), through which we can check the values passed to a patched method when it is called. Let's add the checks to the test</p>
<div class="highlight"><span class="filename">tests/test_fileinfo.py</span><pre><span></span><code><span class="nd">@patch</span><span class="p">(</span><span class="s1">'os.path.getsize'</span><span class="p">)</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s1">'os.path.abspath'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_get_info</span><span class="p">(</span><span class="n">abspath_mock</span><span class="p">,</span> <span class="n">getsize_mock</span><span class="p">):</span>
    <span class="n">test_abspath</span> <span class="o">=</span> <span class="s1">'some/abs/path'</span>
    <span class="n">abspath_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_abspath</span>
    <span class="n">filename</span> <span class="o">=</span> <span class="s1">'somefile.ext'</span>
    <span class="n">original_path</span> <span class="o">=</span> <span class="s1">'../</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">filename</span><span class="p">)</span>
    <span class="n">test_size</span> <span class="o">=</span> <span class="mi">1234</span>
    <span class="n">getsize_mock</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_size</span>
    <span class="n">fi</span> <span class="o">=</span> <span class="n">FileInfo</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
    <span class="n">info</span> <span class="o">=</span> <span class="n">fi</span><span class="o">.</span><span class="n">get_info</span><span class="p">()</span>
    <span class="n">abspath_mock</span><span class="o">.</span><span class="n">assert_called_with</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
    <span class="n">getsize_mock</span><span class="o">.</span><span class="n">assert_called_with</span><span class="p">(</span><span class="n">original_path</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">info</span> <span class="o">==</span> <span class="p">(</span><span class="n">filename</span><span class="p">,</span> <span class="n">original_path</span><span class="p">,</span> <span class="n">test_abspath</span><span class="p">,</span> <span class="n">test_size</span><span class="p">)</span>
</code></pre></div>
<p>As you can see, I first invoke <code>fi.get_info</code> storing the result in the variable <code>info</code>, then check that the patched functions have been called with the correct parameters, and finally assert the format of the output.</p>
<p>The test passes, confirming that we are passing the correct values.</p>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/addding-checks-for-input-values">addding-checks-for-input-values</a></p>
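<p>It might be useful to see what happens when the values do not match. In this small sketch a plain <code>Mock</code> stands in for the patched function, and the path <code>some/other/path</code> is an arbitrary wrong value picked for the illustration.</p>

```python
from unittest.mock import Mock

getsize_mock = Mock(return_value=1234)
getsize_mock("../somefile.ext")

# Matching arguments: the assertions pass silently.
getsize_mock.assert_called_with("../somefile.ext")
getsize_mock.assert_called_once_with("../somefile.ext")

# Different arguments: the mock raises AssertionError.
try:
    getsize_mock.assert_called_with("some/other/path")
except AssertionError:
    mismatch_detected = True
else:
    mismatch_detected = False
```

<p>Since the failure is a standard <code>AssertionError</code>, pytest reports it like any other failed assertion in the test.</p>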
<h2 id="patching-immutable-objects">Patching immutable objects<a class="headerlink" href="#patching-immutable-objects" title="Permanent link">¶</a></h2>
<p>The most widespread version of Python is CPython, which is written, as the name suggests, in C. Part of the standard library is also written in C, while the rest is written in Python itself.</p>
<p>The objects (classes, modules, functions, etc.) that are implemented in C are shared between interpreters, and this requires those objects to be immutable, so that you cannot alter them at runtime from a single interpreter.</p>
<p>An example of this immutability can be given easily using a Python console</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">a</span> <span class="o">=</span> <span class="mi">1</span>
<span class="o">>>></span> <span class="n">a</span><span class="o">.</span><span class="n">conjugate</span> <span class="o">=</span> <span class="mi">5</span>
<span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="ne">AttributeError</span><span class="p">:</span> <span class="s1">'int'</span> <span class="nb">object</span> <span class="n">attribute</span> <span class="s1">'conjugate'</span> <span class="ow">is</span> <span class="n">read</span><span class="o">-</span><span class="n">only</span>
</code></pre></div>
<p>Here I'm trying to replace a method with an integer, which is pointless per se, but clearly shows the issue we are facing.</p>
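<p>The same contrast can be sketched at the class level, which is closer to what patching does. The class <code>Temperature</code> is a made-up example written in Python, while <code>int</code> is implemented in C in CPython.</p>

```python
class Temperature:
    """A class written in Python: its attributes can be replaced."""

    def describe(self):
        return "original"


# Replacing a method on a Python class works
Temperature.describe = lambda self: "replaced"
python_result = Temperature().describe()

# Replacing a method on a type implemented in C raises TypeError
try:
    int.conjugate = lambda self: 0
    c_type_error = None
except TypeError as exc:
    c_type_error = exc
```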
<p>What has this immutability to do with patching? What <code>patch</code> does is actually to temporarily replace an attribute of an object (method of a class, class of a module, etc.), which also means that if we try to replace an attribute in an immutable object the patching action will fail.</p>
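<p>To make the word "temporarily" concrete, the following sketch patches <code>os.getcwd</code> (chosen only because it is a function of a mutable module) and checks that the original function is back in place once the context manager exits. The path <code>/fake/dir</code> is invented for the example.</p>

```python
import os
from unittest.mock import patch

original = os.getcwd

with patch("os.getcwd") as getcwd_mock:
    getcwd_mock.return_value = "/fake/dir"
    # Inside the block the attribute os.getcwd is the mock
    inside = os.getcwd()

# Outside the block the original attribute has been restored
restored = os.getcwd is original
```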
<p>A typical example of this problem is the module <code>datetime</code>, which is also one of the best candidates for patching, since the output of time functions is by definition time-varying.</p>
<p>Let me show the problem with a simple class that logs operations. I will temporarily break the TDD methodology writing first the class and then the tests, so that you can appreciate the problem.</p>
<p>Create a file called <code>logger.py</code> and put there the following code</p>
<div class="highlight"><span class="filename">fileinfo/logger.py</span><pre><span></span><code><span class="kn">import</span> <span class="nn">datetime</span>
<span class="k">class</span> <span class="nc">Logger</span><span class="p">:</span>
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">messages</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">def</span> <span class="nf">log</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">message</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">messages</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">datetime</span><span class="o">.</span><span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">(),</span> <span class="n">message</span><span class="p">))</span>
</code></pre></div>
<p>This is pretty simple, but testing this code is problematic, because the method <code>log</code> produces results that depend on the actual execution time. The call to <code>datetime.datetime.now</code> is however an outgoing query, and as such it can be replaced by a mock with <code>patch</code>.</p>
<p>If we try to do it, however, we will have a bitter surprise. This is the test code, which you can put in <code>tests/test_logger.py</code></p>
<div class="highlight"><span class="filename">tests/test_logger.py</span><pre><span></span><code><span class="kn">from</span> <span class="nn">unittest.mock</span> <span class="kn">import</span> <span class="n">patch</span>
<span class="kn">from</span> <span class="nn">fileinfo.logger</span> <span class="kn">import</span> <span class="n">Logger</span>
<span class="nd">@patch</span><span class="p">(</span><span class="s1">'datetime.datetime.now'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_log</span><span class="p">(</span><span class="n">mock_now</span><span class="p">):</span>
<span class="n">test_now</span> <span class="o">=</span> <span class="mi">123</span>
<span class="n">test_message</span> <span class="o">=</span> <span class="s2">"A test message"</span>
<span class="n">mock_now</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_now</span>
<span class="n">test_logger</span> <span class="o">=</span> <span class="n">Logger</span><span class="p">()</span>
<span class="n">test_logger</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">test_message</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">test_logger</span><span class="o">.</span><span class="n">messages</span> <span class="o">==</span> <span class="p">[(</span><span class="n">test_now</span><span class="p">,</span> <span class="n">test_message</span><span class="p">)]</span>
</code></pre></div>
<p>When you try to execute this test you will get the following error</p>
<div class="highlight"><pre><span></span><code><span class="n">TypeError</span><span class="o">:</span><span class="w"> </span><span class="n">can</span><span class="s1">'t set attributes of built-in/extension type '</span><span class="n">datetime</span><span class="o">.</span><span class="na">datetime</span><span class="err">'</span>
</code></pre></div>
<p>which is raised because patching tries to replace the function <code>now</code> in <code>datetime.datetime</code> with a mock, and since the class <code>datetime.datetime</code> is immutable this operation fails.</p>
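<p>You can reproduce the failure outside the test suite with a few lines. This is only a sketch that assumes CPython with the C implementation of <code>datetime</code>; the exact wording of the error message changes between Python versions, so the code only looks at the exception type.</p>

```python
import datetime
from unittest.mock import patch

error = None
try:
    # patch tries to set the attribute 'now' on the class datetime.datetime
    with patch("datetime.datetime.now"):
        pass
except TypeError as exc:
    error = exc
```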
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/initial-logger-not-working">initial-logger-not-working</a></p>
<p>There are several ways to address this problem. All of them, however, start from the fact that importing or subclassing an immutable object gives you a mutable "copy" of that object.</p>
<p>The easiest example in this case is the module <code>datetime</code> itself. In the function <code>test_log</code> we tried to patch the object <code>datetime.datetime.now</code> directly, affecting the builtin module <code>datetime</code>. The file <code>logger.py</code>, however, does import <code>datetime</code>, so the latter becomes a local symbol in the module <code>logger</code>. This is exactly the key for our patching. Let us change the code to</p>
<div class="highlight"><span class="filename">tests/test_logger.py</span><pre><span></span><code><span class="nd">@patch</span><span class="p">(</span><span class="s1">'fileinfo.logger.datetime.datetime'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_log</span><span class="p">(</span><span class="n">mock_datetime</span><span class="p">):</span>
<span class="n">test_now</span> <span class="o">=</span> <span class="mi">123</span>
<span class="n">test_message</span> <span class="o">=</span> <span class="s2">"A test message"</span>
<span class="n">mock_datetime</span><span class="o">.</span><span class="n">now</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">test_now</span>
<span class="n">test_logger</span> <span class="o">=</span> <span class="n">Logger</span><span class="p">()</span>
<span class="n">test_logger</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">test_message</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">test_logger</span><span class="o">.</span><span class="n">messages</span> <span class="o">==</span> <span class="p">[(</span><span class="n">test_now</span><span class="p">,</span> <span class="n">test_message</span><span class="p">)]</span>
</code></pre></div>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/fileinfo/tree/correct-patching">correct-patching</a></p>
<p>If you run the test now, you can see that the patching works. What we did was to inject our mock in <code>fileinfo.logger.datetime.datetime</code> instead of <code>datetime.datetime.now</code>. Two things changed, thus, in our test. First, we reach the object through the name <code>datetime</code> imported in the file <code>logger.py</code> instead of addressing the module directly. Second, we replace the whole class <code>datetime.datetime</code> and not just its method <code>now</code>, because the class is an attribute of a module, which can be reassigned, while <code>now</code> is an attribute of the immutable class. If you try to patch <code>fileinfo.logger.datetime.datetime.now</code> you will find that it is still immutable.</p>
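<p>Here is a self-contained sketch of the same technique. The function <code>timestamp</code> is a hypothetical stand-in for <code>Logger.log</code>, and since it lives in the script itself the target is spelled <code>datetime.datetime</code>. The point is that the mock replaces the class as a whole, and <code>now</code> is then configured on the mock.</p>

```python
import datetime
from unittest.mock import patch


def timestamp():
    # Looks up datetime.datetime at call time, like Logger.log does
    return datetime.datetime.now()


real_class = datetime.datetime

with patch("datetime.datetime") as mock_datetime:
    mock_datetime.now.return_value = 123
    patched_value = timestamp()

# The real class is back once the context manager exits
restored = datetime.datetime is real_class
```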
<p>Another possible solution to this problem is to create a function that invokes the immutable object and returns its value. This last function can be easily patched, because it just uses the builtin objects and thus is not immutable. This solution, however, requires changing the source code to allow testing, which is far from being optimal. Obviously it is better to introduce a small change in the code and have it tested than to leave it untested, but whenever possible I try to avoid solutions that introduce code which wouldn't be required without tests.</p>
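<p>A possible shape for this alternative is sketched below. It is a variation of the class <code>Logger</code> shown earlier, where the helper is a method named <code>_now</code>; both the name and the choice of a method instead of a module-level function are assumptions made for this example.</p>

```python
import datetime
from unittest.mock import patch


class Logger:
    def __init__(self):
        self.messages = []

    def _now(self):
        # Thin wrapper around the immutable object (hypothetical helper)
        return datetime.datetime.now()

    def log(self, message):
        self.messages.append((self._now(), message))


# The wrapper is plain Python code, so it can be replaced without issues
with patch.object(Logger, "_now", return_value=123):
    test_logger = Logger()
    test_logger.log("A test message")
```

<p>The price is the extra method, which exists only to make the class testable.</p>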
<h2 id="mocks-and-proper-tdd">Mocks and proper TDD<a class="headerlink" href="#mocks-and-proper-tdd" title="Permanent link">¶</a></h2>
<p>Following a strict TDD methodology means writing a test before writing the code that passes that test. This can be done because we use the object under test as a black box, interacting with it through its API, and thus not knowing anything of its internal structure.</p>
<p>When we mock systems we break this assumption. In particular we need to open the black box every time we need to patch a hardcoded external system. Let's say, for example, that the object under test creates a temporary directory to perform some data processing. This is a detail of the implementation and we are not supposed to know it while testing the object, but since we need to mock the directory creation to avoid interaction with the external system (storage) we need to become aware of what happens internally.</p>
<p>This also means that writing a test for the object before writing the implementation of the object itself is difficult. Pretty often, thus, such objects are built with TDD but iteratively, where mocks are introduced after the code has been written.</p>
<p>While this is a violation of the strict TDD methodology, I don't consider it a bad practice. TDD helps us to write better code consistently, but good code can be written even without tests. The real outcome of TDD is a test suite that is capable of detecting regressions or the removal of important features in the future. This means that breaking strict TDD for a small part of the code (patching objects) will not affect the real result of the process, only change the way we achieve it.</p>
<h2 id="a-warning">A warning<a class="headerlink" href="#a-warning" title="Permanent link">¶</a></h2>
<p>Mocks are a good way to approach parts of the system that are not under test but that are still part of the code that we are running. This is particularly true for parts of the code that we wrote, whose internal structure is ultimately known. When the external system is complex and completely detached from our code, mocking starts to become complicated and the risk is that we spend more time faking parts of the system than actually writing code.</p>
<p>In these cases we definitely crossed the barrier between unit testing and integration testing. You may see mocks as the bridge between the two, as they allow you to keep unit-testing parts that are naturally connected ("integrated") with external systems, but there is a point where you need to recognise that you need to change your approach.</p>
<p>This threshold is not fixed, and I can't give you a rule to recognise it, but I can give you some advice. First of all keep an eye on how many things you need to mock to make a test run, as an increasing number of mocks in a single test is definitely a sign of something wrong in the testing approach. My rule of thumb is that when I have to create more than 3 mocks, an alarm goes off in my mind and I start questioning what I am doing.</p>
<p>The second piece of advice is to always consider the complexity of the mocks. You may find yourself patching a class but then having to create monsters like <code>cls_mock().func1().func2().func3.assert_called_with(x=42)</code> which is a sign that the part of the system that you are mocking is deep into some code that you cannot really access, because you don't know its internal mechanisms.</p>
<p>The third piece of advice is to consider mocks as "hooks" that you throw at the external system, and that break its hull to reach its internal structure. These hooks are obviously against the assumption that we can interact with a system knowing only its external behaviour, or its API. As such, you should keep in mind that each mock you create is a step back from this perfect assumption, thus "breaking the spell" of the decoupled interaction. Doing this makes it increasingly complex to create mocks, and this will contribute to keeping you aware of what you are doing (or overdoing).</p>
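<p>To see why the chained assertion mentioned above is a warning sign, consider this sketch. The function <code>code_under_test</code> is hypothetical and only exists to produce the chain of calls; the test has to retrace every step of that chain, which means it knows a lot about the internals of the mocked system.</p>

```python
from unittest.mock import Mock


def code_under_test(cls):
    # Hypothetical code that digs deep into an external object
    cls().func1().func2().func3(x=42)


cls_mock = Mock()
code_under_test(cls_mock)

# Calling a mock always returns the same child mock, so the assertion
# can follow the same path taken by the code
cls_mock().func1().func2().func3.assert_called_with(x=42)

# The equivalent, more explicit spelling of the same deep mock
deep_mock = cls_mock.return_value.func1.return_value.func2.return_value.func3
call_count = deep_mock.call_count
```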
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>Mocks are a very powerful tool that allows us to test code that contains outgoing messages. In particular they allow us to test the arguments of outgoing commands. Patching is a good way to overcome the fact that some external components are hardcoded in our code and are thus unreachable through the arguments passed to the classes or the methods under analysis.</p>
<h2 id="updates">Updates<a class="headerlink" href="#updates" title="Permanent link">¶</a></h2>
<p>2021-03-06 GitHub user <a href="https://github.com/4myhw">4myhw</a> spotted an inconsistency between the code on GitHub and the code in the post. Thanks!</p>
<p>2022-11-19 GitHub user <a href="https://github.com/rioj7">rioj7</a> found and corrected a typo. Thanks!</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>
<p><strong>TDD in Python with pytest - Part 4</strong> (Leonardo Giordani, 2020-09-17T11:30:00+02:00)</p>
<p>This is the fourth post in the series "TDD in Python with pytest" where I develop a simple project following a strict TDD methodology. The posts come from my book <a href="https://leanpub.com/clean-architectures-in-python">Clean Architectures in Python</a> and have been reviewed to get rid of some bad naming choices of the version published in the book.</p>
<p>You can find the first post <a href="https://www.thedigitalcatonline.com/blog/2020/09/10/tdd-in-python-with-pytest-part-1/">here</a>.</p>
<p>In this post I will discuss a very interesting and useful testing tool: mocks.</p>
<h2 id="basic-concepts">Basic concepts<a class="headerlink" href="#basic-concepts" title="Permanent link">¶</a></h2>
<p>As we saw in the previous post the relationship between the component that we are testing and other components of the system can be complex. Sometimes idempotency and isolation are not easy to achieve, and testing outgoing commands requires checking the parameters sent to the external component, which is not trivial.</p>
<p>The main difficulty comes from the fact that your code is actually using the external system. When you run it in production the external system will provide the data that your code needs and the whole process can work as intended. During testing, however, you don't want to be bound to the external system, for the reasons explained in the previous post, but at the same time you need it to make your code work.</p>
<p>So, you face a complex issue. On the one hand your code is connected to the external system (be it hardcoded or chosen programmatically), but on the other hand you want it to run without the external system being active (or even present).</p>
<p>This problem can be solved with the use of mocks. A mock, in the testing jargon, is an object that simulates the behaviour of another (more complex) object. Wherever your code connects to an external system, during testing you can replace the latter with a mock, pretending the external system is there and properly checking that your component behaves as intended.</p>
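<p>As a first taste of the idea, here is a small sketch. The function <code>fetch_username</code> and the method <code>get_user</code> are invented for the example; the mock plays the role of the external system, and the attribute <code>return_value</code> used here is explained later in this post.</p>

```python
from unittest.mock import Mock


def fetch_username(client, user_id):
    # Hypothetical component: asks an external system for a record
    record = client.get_user(user_id)
    return record["name"].upper()


# The mock replaces the external system during the test
client_mock = Mock()
client_mock.get_user.return_value = {"name": "ada"}

username = fetch_username(client_mock, 7)

# Check that the component sent the right parameters to the system
client_mock.get_user.assert_called_with(7)
```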
<h2 id="first-steps">First steps<a class="headerlink" href="#first-steps" title="Permanent link">¶</a></h2>
<p>Let us try and work with a mock in Python and see what it can do. First of all fire up a Python shell and import the library </p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>
</code></pre></div>
<p>The main object that the library provides is <code>Mock</code> and you can instantiate it without any argument</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">Mock</span><span class="p">()</span>
</code></pre></div>
<p>This object has the peculiar property of creating methods and attributes on the fly when you require them. Let us first look inside the object to get an idea of what it provides</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="nb">dir</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>
<span class="p">[</span>
<span class="s1">'assert_any_call'</span><span class="p">,</span> <span class="s1">'assert_called_once_with'</span><span class="p">,</span>
<span class="s1">'assert_called_with'</span><span class="p">,</span> <span class="s1">'assert_has_calls'</span><span class="p">,</span>
<span class="s1">'attach_mock'</span><span class="p">,</span> <span class="s1">'call_args'</span><span class="p">,</span> <span class="s1">'call_args_list'</span><span class="p">,</span>
<span class="s1">'call_count'</span><span class="p">,</span> <span class="s1">'called'</span><span class="p">,</span> <span class="s1">'configure_mock'</span><span class="p">,</span>
<span class="s1">'method_calls'</span><span class="p">,</span> <span class="s1">'mock_add_spec'</span><span class="p">,</span> <span class="s1">'mock_calls'</span><span class="p">,</span>
<span class="s1">'reset_mock'</span><span class="p">,</span> <span class="s1">'return_value'</span><span class="p">,</span> <span class="s1">'side_effect'</span>
<span class="p">]</span>
</code></pre></div>
<p>As you can see there are some methods which are already defined in the object <code>Mock</code>. Let's try to read a non-existent attribute</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span>
<span class="o"><</span><span class="n">Mock</span> <span class="n">name</span><span class="o">=</span><span class="s1">'mock.some_attribute'</span> <span class="nb">id</span><span class="o">=</span><span class="s1">'140222043808432'</span><span class="o">></span>
<span class="o">>>></span> <span class="nb">dir</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>
<span class="p">[</span>
<span class="s1">'assert_any_call'</span><span class="p">,</span> <span class="s1">'assert_called_once_with'</span><span class="p">,</span>
<span class="s1">'assert_called_with'</span><span class="p">,</span> <span class="s1">'assert_has_calls'</span><span class="p">,</span>
<span class="s1">'attach_mock'</span><span class="p">,</span> <span class="s1">'call_args'</span><span class="p">,</span> <span class="s1">'call_args_list'</span><span class="p">,</span>
<span class="s1">'call_count'</span><span class="p">,</span> <span class="s1">'called'</span><span class="p">,</span> <span class="s1">'configure_mock'</span><span class="p">,</span>
<span class="s1">'method_calls'</span><span class="p">,</span> <span class="s1">'mock_add_spec'</span><span class="p">,</span> <span class="s1">'mock_calls'</span><span class="p">,</span>
<span class="s1">'reset_mock'</span><span class="p">,</span> <span class="s1">'return_value'</span><span class="p">,</span> <span class="s1">'side_effect'</span><span class="p">,</span>
<span class="s1">'some_attribute'</span>
<span class="p">]</span>
</code></pre></div>
<p>As you can see this class is somewhat different from what you are used to. First of all, its instances do not raise an <code>AttributeError</code> when asked for a non-existent attribute, but they happily return another instance of <code>Mock</code> itself. Second, the attribute you tried to access has now been created inside the object and accessing it returns the same mock object as before.</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span>
<span class="o"><</span><span class="n">Mock</span> <span class="n">name</span><span class="o">=</span><span class="s1">'mock.some_attribute'</span> <span class="nb">id</span><span class="o">=</span><span class="s1">'140222043808432'</span><span class="o">></span>
</code></pre></div>
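<p>The same observations can be written as checks in a script instead of comparing the <code>id</code> values by eye. This is just a compact restatement of the console session above.</p>

```python
from unittest.mock import Mock

m = Mock()

first = m.some_attribute
second = m.some_attribute

# The attribute is created on first access and then reused
same_object = first is second

# The created attribute is itself a Mock
is_mock = isinstance(first, Mock)

# The new attribute now shows up in the object
listed = "some_attribute" in dir(m)
```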
<p>Mock objects are callables, which means that they may act both as attributes and as methods. If you try to call the mock, it just returns another mock with a name that includes parentheses to signal its callable nature</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="o"><</span><span class="n">Mock</span> <span class="n">name</span><span class="o">=</span><span class="s1">'mock.some_attribute()'</span> <span class="nb">id</span><span class="o">=</span><span class="s1">'140247621475856'</span><span class="o">></span>
</code></pre></div>
<p>As you can understand, such objects are the perfect tool to mimic other objects or systems, since they may expose any API without raising exceptions. To use them in tests, however, we need them to behave just like the original, which implies returning sensible values or performing real operations.</p>
<h2 id="simple-return-values">Simple return values<a class="headerlink" href="#simple-return-values" title="Permanent link">¶</a></h2>
<p>The simplest thing a mock can do for you is to return a given value every time you call one of its methods. This is configured setting the attribute <code>return_value</code> of a mock object</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="mi">42</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="mi">42</span>
</code></pre></div>
<p>Now, as you can see the object does not return a mock object any more, instead it just returns the static value stored in the attribute <code>return_value</code>. Since in Python everything is an object you can return here any type of value: simple types like an integer or a string, more complex structures like dictionaries or lists, classes that you defined, instances of those, or functions.</p>
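<p>For instance, a method that is expected to return a structured value can be simulated with a dictionary. The method name <code>get_config</code> and the keys are made up for this sketch.</p>

```python
from unittest.mock import Mock

m = Mock()

# Any object can be used as a return value, here a dictionary
m.get_config.return_value = {"debug": True, "retries": 3}

config = m.get_config()
retries = config["retries"]
```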
<p>Pay attention that what the mock returns is exactly the object that it is instructed to use as return value. If the return value is a callable such as a function, calling the mock will return the function itself and not the result of the function. Let me give you an example</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="k">def</span> <span class="nf">print_answer</span><span class="p">():</span>
<span class="o">...</span> <span class="nb">print</span><span class="p">(</span><span class="s2">"42"</span><span class="p">)</span>
<span class="o">...</span>
<span class="o">>>></span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="n">print_answer</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="o"><</span><span class="n">function</span> <span class="n">print_answer</span> <span class="n">at</span> <span class="mh">0x7f8df1e3f400</span><span class="o">></span>
</code></pre></div>
<p>As you can see calling <code>some_attribute</code> just returns the value stored in <code>return_value</code>, that is the function itself. This is not exactly what we were aiming for. To make the mock call the object that we use as a return value we have to use a slightly more complex attribute called <code>side_effect</code>.</p>
<h2 id="complex-return-values">Complex return values<a class="headerlink" href="#complex-return-values" title="Permanent link">¶</a></h2>
<p>The <code>side_effect</code> parameter of mock objects is a very powerful tool. It accepts three different flavours of objects: callables, iterables, and exceptions, and changes its behaviour accordingly.</p>
<p>If you pass an exception the mock will raise it</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">side_effect</span> <span class="o">=</span> <span class="ne">ValueError</span><span class="p">(</span><span class="s1">'A custom value error'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="n">File</span> <span class="s2">"/usr/lib/python3.6/unittest/mock.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">939</span><span class="p">,</span> <span class="ow">in</span> <span class="fm">__call__</span>
<span class="k">return</span> <span class="n">_mock_self</span><span class="o">.</span><span class="n">_mock_call</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="n">File</span> <span class="s2">"/usr/lib/python3.6/unittest/mock.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">995</span><span class="p">,</span> <span class="ow">in</span> <span class="n">_mock_call</span>
<span class="k">raise</span> <span class="n">effect</span>
<span class="ne">ValueError</span><span class="p">:</span> <span class="n">A</span> <span class="n">custom</span> <span class="n">value</span> <span class="n">error</span>
</code></pre></div>
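<p>This is handy when you want to test how your code reacts to a failing external system. In the following sketch the function <code>safe_read</code> and the method <code>read</code> are hypothetical names chosen for the example.</p>

```python
from unittest.mock import Mock


def safe_read(storage, key):
    # Hypothetical component that shields callers from storage errors
    try:
        return storage.read(key)
    except ValueError:
        return None


storage_mock = Mock()
storage_mock.read.side_effect = ValueError("A custom value error")

# The mock raises the exception, the component handles it
result = safe_read(storage_mock, "some-key")
```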
<p>If you pass an iterable, such as for example a generator, a plain list, tuple, or similar objects, the mock will yield the values of that iterable, i.e. return every value contained in the iterable on subsequent calls of the mock.</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">side_effect</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="mi">0</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="mi">1</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="mi">2</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="n">Traceback</span> <span class="p">(</span><span class="n">most</span> <span class="n">recent</span> <span class="n">call</span> <span class="n">last</span><span class="p">):</span>
<span class="n">File</span> <span class="s2">"<stdin>"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">1</span><span class="p">,</span> <span class="ow">in</span> <span class="o"><</span><span class="n">module</span><span class="o">></span>
<span class="n">File</span> <span class="s2">"/usr/lib/python3.6/unittest/mock.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">939</span><span class="p">,</span> <span class="ow">in</span> <span class="fm">__call__</span>
<span class="k">return</span> <span class="n">_mock_self</span><span class="o">.</span><span class="n">_mock_call</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="n">File</span> <span class="s2">"/usr/lib/python3.6/unittest/mock.py"</span><span class="p">,</span> <span class="n">line</span> <span class="mi">998</span><span class="p">,</span> <span class="ow">in</span> <span class="n">_mock_call</span>
<span class="n">result</span> <span class="o">=</span> <span class="nb">next</span><span class="p">(</span><span class="n">effect</span><span class="p">)</span>
<span class="ne">StopIteration</span>
</code></pre></div>
<p>As promised, the mock just returns every object found in the iterable (in this case a <code>range</code> object) one at a time until the iterable is exhausted. According to the iterator protocol once every item has been returned the object raises the <code>StopIteration</code> exception, which means that you can safely use it in a loop.</p>
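<p>The two flavours seen so far can be combined: when an item of the iterable is an exception, the mock raises it instead of returning it. The sketch below uses this to simulate a service that fails once and then recovers; <code>fetch_with_retry</code> and <code>fetch</code> are names invented for the example.</p>

```python
from unittest.mock import Mock


def fetch_with_retry(svc, attempts=2):
    # Hypothetical component that retries on connection errors
    for _ in range(attempts):
        try:
            return svc.fetch()
        except ConnectionError:
            continue
    return None


service = Mock()
# First call raises, second call returns a value
service.fetch.side_effect = [ConnectionError("temporary failure"), "payload"]

result = fetch_with_retry(service)
calls = service.fetch.call_count
```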
<p>Last, if you feed <code>side_effect</code> a callable, the latter will be executed with the parameters passed when calling the attribute. Let's consider again the simple example given in the previous section</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="k">def</span> <span class="nf">print_answer</span><span class="p">():</span>
<span class="o">...</span> <span class="nb">print</span><span class="p">(</span><span class="s2">"42"</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">side_effect</span> <span class="o">=</span> <span class="n">print_answer</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">()</span>
<span class="mi">42</span>
</code></pre></div>
<p>A slightly more complex example is that of a function with arguments</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="k">def</span> <span class="nf">print_number</span><span class="p">(</span><span class="n">num</span><span class="p">):</span>
<span class="o">...</span>     <span class="nb">print</span><span class="p">(</span><span class="s2">"Number:"</span><span class="p">,</span> <span class="n">num</span><span class="p">)</span>
<span class="o">...</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">side_effect</span> <span class="o">=</span> <span class="n">print_number</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
<span class="n">Number</span><span class="p">:</span> <span class="mi">5</span>
</code></pre></div>
<p>As you can see, the arguments passed to the attribute are directly used as arguments for the stored function. This is very powerful, especially if you stop thinking about "functions" and start considering "callables". Indeed, given the nature of Python objects, we know that instantiating an object is not different from calling a function, which means that <code>side_effect</code> can be given a class and return an instance of it</p>
<div class="highlight"><pre><span></span><code><span class="o">>>></span> <span class="k">class</span> <span class="nc">Number</span><span class="p">:</span>
<span class="o">...</span>     <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">value</span><span class="p">):</span>
<span class="o">...</span>         <span class="bp">self</span><span class="o">.</span><span class="n">_value</span> <span class="o">=</span> <span class="n">value</span>
<span class="o">...</span>     <span class="k">def</span> <span class="nf">print_value</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="o">...</span>         <span class="nb">print</span><span class="p">(</span><span class="s2">"Value:"</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_value</span><span class="p">)</span>
<span class="o">...</span>
<span class="o">>>></span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="o">.</span><span class="n">side_effect</span> <span class="o">=</span> <span class="n">Number</span>
<span class="o">>>></span> <span class="n">n</span> <span class="o">=</span> <span class="n">m</span><span class="o">.</span><span class="n">some_attribute</span><span class="p">(</span><span class="mi">26</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">n</span>
<span class="o"><</span><span class="n">__main__</span><span class="o">.</span><span class="n">Number</span> <span class="nb">object</span> <span class="n">at</span> <span class="mh">0x7f8df1aa4470</span><span class="o">></span>
<span class="o">>>></span> <span class="n">n</span><span class="o">.</span><span class="n">print_value</span><span class="p">()</span>
<span class="n">Value</span><span class="p">:</span> <span class="mi">26</span>
</code></pre></div>
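<p>The same mechanism applies to plain functions that return a value: whatever the callable returns becomes the result of the mock call (unless it returns <code>mock.DEFAULT</code>). A minimal sketch, where the function <code>double</code> is a hypothetical helper used only for this illustration:</p>

```python
from unittest import mock


def double(num):
    # Hypothetical helper, not part of the project
    return num * 2


m = mock.Mock()
m.some_attribute.side_effect = double

# The argument is forwarded to double, and its return value comes back
result = m.some_attribute(21)

# The mock still records the call, even though a real function ran
m.some_attribute.assert_called_with(21)
```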
<h2 id="asserting-calls">Asserting calls<a class="headerlink" href="#asserting-calls" title="Permanent link">¶</a></h2>
<p>As I explained in the previous post, outgoing commands shall be tested by checking the correctness of the message argument. This can be easily done with mocks, as these objects record every call that they receive and the arguments passed with it.</p>
<p>Let's see a practical example</p>
<div class="highlight"><pre><span></span><code><span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>
<span class="kn">import</span> <span class="nn">myobj</span>


<span class="k">def</span> <span class="nf">test_connect</span><span class="p">():</span>
    <span class="n">external_obj</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">Mock</span><span class="p">()</span>
    <span class="n">myobj</span><span class="o">.</span><span class="n">MyObj</span><span class="p">(</span><span class="n">external_obj</span><span class="p">)</span>
    <span class="n">external_obj</span><span class="o">.</span><span class="n">connect</span><span class="o">.</span><span class="n">assert_called_with</span><span class="p">()</span>
</code></pre></div>
<p>Here, the class <code>myobj.MyObj</code> needs to connect to an external object, for example a remote repository or a database. The only thing we need to know for testing purposes is whether the class called the method <code>connect</code> of the external object without any parameter.</p>
<p>So the first thing we do in this test is to instantiate the mock object. This is a fake version of the external object, and its only purpose is to accept calls from the object <code>MyObj</code> under test and possibly return sensible values. Then we instantiate the class <code>MyObj</code> passing the external object. We expect the class to call the method <code>connect</code>, so we express this expectation by calling <code>external_obj.connect.assert_called_with</code>.</p>
<p>What happens behind the scenes? The class <code>MyObj</code> receives the fake external object and somewhere in its initialization process calls the method <code>connect</code> of the mock object. This call creates the method itself as a mock object. This new mock records the parameters used to call it and the subsequent call to its method <code>assert_called_with</code> checks that the method was called and that no parameters were passed.</p>
<p>In this case an object like</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">MyObj</span><span class="p">():</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">repo</span><span class="p">):</span>
        <span class="n">repo</span><span class="o">.</span><span class="n">connect</span><span class="p">()</span>
</code></pre></div>
<p>would pass the test, as the object passed as <code>repo</code> is a mock that does nothing but record the calls. As you can see, the method <code>__init__</code> actually calls <code>repo.connect</code>, and <code>repo</code> is expected to be a full-featured external object that provides <code>connect</code> in its API. Calling <code>repo.connect</code> when <code>repo</code> is a mock object, instead, silently creates the method (as another mock object) and records that the method has been called once without arguments.</p>
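<p>What the mock recorded can also be inspected directly. The following is a minimal, self-contained sketch: the class is defined inline instead of being imported from <code>myobj</code>, and the variables <code>is_mock</code>, <code>count</code>, and <code>args</code> are introduced only for illustration.</p>

```python
from unittest import mock


class MyObj:
    def __init__(self, repo):
        repo.connect()


external_obj = mock.Mock()
MyObj(external_obj)

# The attribute connect was created on first access and is itself a mock
is_mock = isinstance(external_obj.connect, mock.Mock)

# call_count and call_args expose what the mock recorded
count = external_obj.connect.call_count
args = external_obj.connect.call_args
```

<p>Here <code>count</code> is 1 and <code>args</code> compares equal to <code>mock.call()</code>, that is a call without arguments, which is exactly what <code>assert_called_with()</code> verifies.</p>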
<p>The method <code>assert_called_with</code> allows us to also check the parameters we passed when calling. To show this let us pretend that we expect the method <code>MyObj.setup</code> to call <code>setup(cache=True, max_connections=256)</code> on the external object. Remember that this is an outgoing command, so we are interested in checking the parameters and not the result.</p>
<p>The new test can be something like</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_setup</span><span class="p">():</span>
    <span class="n">external_obj</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">Mock</span><span class="p">()</span>
    <span class="n">obj</span> <span class="o">=</span> <span class="n">myobj</span><span class="o">.</span><span class="n">MyObj</span><span class="p">(</span><span class="n">external_obj</span><span class="p">)</span>
    <span class="n">obj</span><span class="o">.</span><span class="n">setup</span><span class="p">()</span>
    <span class="n">external_obj</span><span class="o">.</span><span class="n">setup</span><span class="o">.</span><span class="n">assert_called_with</span><span class="p">(</span><span class="n">cache</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">max_connections</span><span class="o">=</span><span class="mi">256</span><span class="p">)</span>
</code></pre></div>
<p>In this case an object that passes the test can be</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">MyObj</span><span class="p">():</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">repo</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_repo</span> <span class="o">=</span> <span class="n">repo</span>
        <span class="n">repo</span><span class="o">.</span><span class="n">connect</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">setup</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_repo</span><span class="o">.</span><span class="n">setup</span><span class="p">(</span><span class="n">cache</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">max_connections</span><span class="o">=</span><span class="mi">256</span><span class="p">)</span>
</code></pre></div>
<p>If we change the method <code>setup</code> to</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">setup</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_repo</span><span class="o">.</span><span class="n">setup</span><span class="p">(</span><span class="n">cache</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</code></pre></div>
<p>the test will fail with the following error</p>
<div class="highlight"><pre><span></span><code>E AssertionError: Expected call: setup(cache=True, max_connections=256)
E Actual call: setup(cache=True)
</code></pre></div>
<p>Which I consider a very clear explanation of what went wrong during the test execution.</p>
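<p>The failure can be reproduced without the class <code>MyObj</code> by calling the mock directly. This is a minimal sketch; the variable <code>message</code> is introduced only to capture the error text, whose exact wording depends on the Python version in use.</p>

```python
from unittest import mock

external_obj = mock.Mock()
# Simulate the modified method setup, which omits max_connections
external_obj.setup(cache=True)

message = None
try:
    external_obj.setup.assert_called_with(cache=True, max_connections=256)
except AssertionError as exc:
    # The message reports both the expected and the actual call
    message = str(exc)
```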
<p>As you can read in the official documentation, the object <code>Mock</code> provides other methods and attributes, like <code>assert_called_once_with</code>, <code>assert_any_call</code>, <code>assert_has_calls</code>, <code>assert_not_called</code>, <code>called</code>, <code>call_count</code>, and many others. Each of those explores a different aspect of the mock behaviour concerning calls. Make sure to read their description and go through the examples.</p>
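<p>A short tour of some of these methods and attributes may help. The sketch below calls a mock twice and then queries it; the attribute names <code>setup</code> and <code>teardown</code> and the variables <code>called</code>, <code>count</code>, and <code>once</code> are chosen only for illustration.</p>

```python
from unittest import mock

m = mock.Mock()
m.setup(cache=True)
m.setup(cache=False)

called = m.setup.called      # has the mock been called at least once?
count = m.setup.call_count   # how many times it has been called

# Passes if any of the recorded calls matches
m.setup.assert_any_call(cache=True)

# Passes if the recorded calls contain this sequence, in this order
m.setup.assert_has_calls([mock.call(cache=True), mock.call(cache=False)])

# Passes because teardown was never called
m.teardown.assert_not_called()

# Fails because setup was called twice, not once
try:
    m.setup.assert_called_once_with(cache=False)
    once = True
except AssertionError:
    once = False
```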
<h2 id="a-simple-example">A simple example<a class="headerlink" href="#a-simple-example" title="Permanent link">¶</a></h2>
<p>To learn how to use mocks in a practical case, let's work together on a new module in the <code>simple_calculator</code> package. The target is to write a class that downloads a JSON file with data on meteorites and computes some statistics on the dataset using the class <code>SimpleCalculator</code>. The file is provided by NASA at <a href="https://data.nasa.gov/resource/y77d-th95.json">this URL</a>.</p>
<p>The class contains a method <code>get_data</code> that queries the remote server and returns the data, and a method <code>average_mass</code> that uses the method <code>SimpleCalculator.avg</code> to compute the average mass of the meteorites and return it. In a real world case, like for example in a scientific application, I would probably split the class in two. One class manages the data, updating it whenever it is necessary, and another one manages the statistics. For the sake of simplicity, however, I will keep the two functionalities together in this example.</p>
<p>Let's see a quick example of what is supposed to happen inside our code. An excerpt of the file provided by the server is</p>
<div class="highlight"><pre><span></span><code><span class="p">[</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"fall"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Fell"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"geolocation"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Point"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"coordinates"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mf">6.08333</span><span class="p">,</span><span class="w"> </span><span class="mf">50.775</span><span class="p">]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="s2">"1"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"mass"</span><span class="p">:</span><span class="s2">"21"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"Aachen"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"nametype"</span><span class="p">:</span><span class="s2">"Valid"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"recclass"</span><span class="p">:</span><span class="s2">"L5"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"reclat"</span><span class="p">:</span><span class="s2">"50.775000"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"reclong"</span><span class="p">:</span><span class="s2">"6.083330"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"year"</span><span class="p">:</span><span class="s2">"1880-01-01T00:00:00.000"</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"fall"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Fell"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"geolocation"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Point"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"coordinates"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mf">10.23333</span><span class="p">,</span><span class="w"> </span><span class="mf">56.18333</span><span class="p">]</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="s2">"2"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"mass"</span><span class="p">:</span><span class="s2">"720"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="s2">"Aarhus"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"nametype"</span><span class="p">:</span><span class="s2">"Valid"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"recclass"</span><span class="p">:</span><span class="s2">"H6"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"reclat"</span><span class="p">:</span><span class="s2">"56.183330"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"reclong"</span><span class="p">:</span><span class="s2">"10.233330"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"year"</span><span class="p">:</span><span class="s2">"1951-01-01T00:00:00.000"</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">]</span>
</code></pre></div>
<p>So a good way to compute the average mass of the meteorites is</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">urllib.request</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">simple_calculator.main</span> <span class="kn">import</span> <span class="n">SimpleCalculator</span>
<span class="n">URL</span> <span class="o">=</span> <span class="p">(</span><span class="s2">"https://data.nasa.gov/resource/y77d-th95.json"</span><span class="p">)</span>
<span class="k">with</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">URL</span><span class="p">)</span> <span class="k">as</span> <span class="n">url</span><span class="p">:</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">url</span><span class="o">.</span><span class="n">read</span><span class="p">()</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span>
<span class="n">masses</span> <span class="o">=</span> <span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'mass'</span><span class="p">])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="s1">'mass'</span> <span class="ow">in</span> <span class="n">d</span><span class="p">]</span>
<span class="nb">print</span><span class="p">(</span><span class="n">masses</span><span class="p">)</span>
<span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
<span class="n">avg_mass</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">(</span><span class="n">masses</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">avg_mass</span><span class="p">)</span>
</code></pre></div>
<p>Here the list comprehension filters out the elements that do not have a <code>mass</code> key. This code prints the value 50190.19568930039, which is the average mass of the meteorites contained in the file.</p>
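<p>The filtering step can be tried offline on a handful of records. This is a minimal sketch under two assumptions: the JSON string is a shortened stand-in for the real file (the third record, without a mass, is invented for the example), and a plain arithmetic mean replaces <code>SimpleCalculator.avg</code> so that the snippet runs on its own.</p>

```python
import json

# Shortened stand-in for the downloaded file; record 3 has no mass
raw = '[{"id": "1", "mass": "21"}, {"id": "2", "mass": "720"}, {"id": "3"}]'

data = json.loads(raw)

# Masses are stored as strings, so they are converted to float;
# records without the key are skipped
masses = [float(d["mass"]) for d in data if "mass" in d]

# Arithmetic mean, standing in for SimpleCalculator.avg
avg_mass = sum(masses) / len(masses)
```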
<p>Now we have a proof of concept of the algorithm, so we can start writing the tests. We might initially come up with a simple solution like</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_average_mass</span><span class="p">():</span>
    <span class="n">metstats</span> <span class="o">=</span> <span class="n">MeteoriteStats</span><span class="p">()</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">metstats</span><span class="o">.</span><span class="n">get_data</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">metstats</span><span class="o">.</span><span class="n">average_mass</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">==</span> <span class="mf">50190.19568930039</span>
</code></pre></div>
<p>This little test, however, has two big issues. First, the method <code>get_data</code> is supposed to use the Internet connection to get the data from the server. This is a typical example of an outgoing query, as we are not trying to change the state of the web server providing the data, and you already know that you should not test the return value of an outgoing query. Second, you can see here why you shouldn't use real data when testing either: the data coming from the server can change over time, and this can invalidate your tests.</p>
<p>Testing such a case becomes very simple with mocks. Since the class has a public method <code>get_data</code> that interacts with the external component, it is enough to temporarily replace it with a mock that provides sensible values. Create the file <code>tests/test_meteorites.py</code> and put this code in it</p>
<div class="highlight"><span class="filename">tests/test_meteorites.py</span><pre><span></span><code><span class="kn">from</span> <span class="nn">unittest</span> <span class="kn">import</span> <span class="n">mock</span>
<span class="kn">from</span> <span class="nn">simple_calculator.meteorites</span> <span class="kn">import</span> <span class="n">MeteoriteStats</span>
<span class="k">def</span> <span class="nf">test_average_mass</span><span class="p">():</span>
    <span class="n">metstats</span> <span class="o">=</span> <span class="n">MeteoriteStats</span><span class="p">()</span>

    <span class="n">metstats</span><span class="o">.</span><span class="n">get_data</span> <span class="o">=</span> <span class="n">mock</span><span class="o">.</span><span class="n">Mock</span><span class="p">()</span>
    <span class="n">metstats</span><span class="o">.</span><span class="n">get_data</span><span class="o">.</span><span class="n">return_value</span> <span class="o">=</span> <span class="p">[</span>
        <span class="p">{</span>
            <span class="s2">"fall"</span><span class="p">:</span> <span class="s2">"Fell"</span><span class="p">,</span>
            <span class="s2">"geolocation"</span><span class="p">:</span> <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"Point"</span><span class="p">,</span>
                <span class="s2">"coordinates"</span><span class="p">:</span> <span class="p">[</span><span class="mf">6.08333</span><span class="p">,</span> <span class="mf">50.775</span><span class="p">]</span>
            <span class="p">},</span>
            <span class="s2">"id"</span><span class="p">:</span><span class="s2">"1"</span><span class="p">,</span>
            <span class="s2">"mass"</span><span class="p">:</span><span class="s2">"21"</span><span class="p">,</span>
            <span class="s2">"name"</span><span class="p">:</span><span class="s2">"Aachen"</span><span class="p">,</span>
            <span class="s2">"nametype"</span><span class="p">:</span><span class="s2">"Valid"</span><span class="p">,</span>
            <span class="s2">"recclass"</span><span class="p">:</span><span class="s2">"L5"</span><span class="p">,</span>
            <span class="s2">"reclat"</span><span class="p">:</span><span class="s2">"50.775000"</span><span class="p">,</span>
            <span class="s2">"reclong"</span><span class="p">:</span><span class="s2">"6.083330"</span><span class="p">,</span>
            <span class="s2">"year"</span><span class="p">:</span><span class="s2">"1880-01-01T00:00:00.000"</span><span class="p">},</span>
        <span class="p">{</span>
            <span class="s2">"fall"</span><span class="p">:</span> <span class="s2">"Fell"</span><span class="p">,</span>
            <span class="s2">"geolocation"</span><span class="p">:</span> <span class="p">{</span>
                <span class="s2">"type"</span><span class="p">:</span> <span class="s2">"Point"</span><span class="p">,</span>
                <span class="s2">"coordinates"</span><span class="p">:</span> <span class="p">[</span><span class="mf">10.23333</span><span class="p">,</span> <span class="mf">56.18333</span><span class="p">]</span>
            <span class="p">},</span>
            <span class="s2">"id"</span><span class="p">:</span><span class="s2">"2"</span><span class="p">,</span>
            <span class="s2">"mass"</span><span class="p">:</span><span class="s2">"720"</span><span class="p">,</span>
            <span class="s2">"name"</span><span class="p">:</span><span class="s2">"Aarhus"</span><span class="p">,</span>
            <span class="s2">"nametype"</span><span class="p">:</span><span class="s2">"Valid"</span><span class="p">,</span>
            <span class="s2">"recclass"</span><span class="p">:</span><span class="s2">"H6"</span><span class="p">,</span>
            <span class="s2">"reclat"</span><span class="p">:</span><span class="s2">"56.183330"</span><span class="p">,</span>
            <span class="s2">"reclong"</span><span class="p">:</span><span class="s2">"10.233330"</span><span class="p">,</span>
            <span class="s2">"year"</span><span class="p">:</span><span class="s2">"1951-01-01T00:00:00.000"</span>
        <span class="p">}</span>
    <span class="p">]</span>

    <span class="n">result</span> <span class="o">=</span> <span class="n">metstats</span><span class="o">.</span><span class="n">average_mass</span><span class="p">(</span><span class="n">metstats</span><span class="o">.</span><span class="n">get_data</span><span class="p">())</span>

    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">370.5</span>
</code></pre></div>
<p>When we run this test we are not testing that the external server provides the correct data. We are testing the process implemented by <code>average_mass</code>, feeding the algorithm some known input. This is not different from the first tests that we implemented: in that case we were testing an addition, here we are testing a more complex algorithm, but the concept is the same.</p>
<p>We can now write a class that passes this test. Put the following code in <code>simple_calculator/meteorites.py</code> alongside <code>main.py</code></p>
<div class="highlight"><span class="filename">simple_calculator/meteorites.py</span><pre><span></span><code><span class="kn">import</span> <span class="nn">urllib.request</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">simple_calculator.main</span> <span class="kn">import</span> <span class="n">SimpleCalculator</span>
<span class="n">URL</span> <span class="o">=</span> <span class="p">(</span><span class="s2">"https://data.nasa.gov/resource/y77d-th95.json"</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">MeteoriteStats</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">get_data</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">with</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="n">URL</span><span class="p">)</span> <span class="k">as</span> <span class="n">url</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">url</span><span class="o">.</span><span class="n">read</span><span class="p">()</span><span class="o">.</span><span class="n">decode</span><span class="p">())</span>

    <span class="k">def</span> <span class="nf">average_mass</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
        <span class="n">masses</span> <span class="o">=</span> <span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'mass'</span><span class="p">])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="s1">'mass'</span> <span class="ow">in</span> <span class="n">d</span><span class="p">]</span>
        <span class="k">return</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">(</span><span class="n">masses</span><span class="p">)</span>
</code></pre></div>
<p>As you can see, the class contains the code we wrote as a proof of concept, slightly reworked to match the methods we used in the test. Run the test suite now, and you will see that the latest test we wrote passes.</p>
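<p>To make it evident that the real <code>get_data</code> never runs once it has been replaced, here is a condensed, self-contained sketch. The class below is a simplified stand-in and not the project code: its <code>get_data</code> deliberately raises an exception instead of opening a connection, and a plain arithmetic mean replaces <code>SimpleCalculator.avg</code>.</p>

```python
from unittest import mock


class MeteoriteStats:
    # Simplified stand-in for the class under test (assumption)
    def get_data(self):
        raise RuntimeError("network access is not allowed in tests")

    def average_mass(self, data):
        masses = [float(d["mass"]) for d in data if "mass" in d]
        return sum(masses) / len(masses)


metstats = MeteoriteStats()
# The instance attribute shadows the method defined in the class
metstats.get_data = mock.Mock(return_value=[{"mass": "21"}, {"mass": "720"}])

result = metstats.average_mass(metstats.get_data())

# The mock was called once, and the original method was never executed
metstats.get_data.assert_called_once_with()
```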
<p>Please note that we are not testing the method <code>get_data</code>. That method uses the function <code>urllib.request.urlopen</code>, which opens an Internet connection without passing through any other public object that we can replace at run time during the test. We then need a tool to replace internal parts of our objects when we run them, and this is provided by patching, which will be the topic of the next post.</p>
<p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/meteoritestats-class">meteoritestats-class-added</a></p>
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>Mocks are very important, and as a Python programmer you need to know the subtleties of their implementation. Aside from the technical details, however, I believe it is mandatory to master the different types of tests that I discussed in the previous post, and to learn when to use simple assertions and when to pull a bigger gun like a mock object.</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>
<p><strong>TDD in Python with pytest - Part 3</strong> (Leonardo Giordani, published 2020-09-15)</p>
<p>This is the third post in the series "TDD in Python from scratch" where I develop a simple project following a strict TDD methodology. The posts come from my book <a href="https://leanpub.com/clean-architectures-in-python">Clean Architectures in Python</a> and have been reviewed to get rid of some bad naming choices of the version published in the book.</p>
<p>What I introduced in the previous two posts is commonly called "unit testing", since it focuses on testing a single and very small unit of code. As simple as it may seem, the TDD process has some caveats that are worth being discussed. In this chapter I discuss some aspects of TDD and unit testing that I consider extremely important.</p>
<h2 id="tests-should-be-fast">Tests should be fast<a class="headerlink" href="#tests-should-be-fast" title="Permanent link">¶</a></h2>
<p>You will run your tests many times, potentially every time you save your code. Your tests are the watchdogs of your code, the dashboard warning lights that signal a correct status or some malfunction. This means that your testing suite should be <em>fast</em>. If you have to wait minutes for each execution to finish, chances are that you will end up running your tests only after some long coding session, which means that you are not using them as guides.</p>
<p>It's true however that some tests may be intrinsically slow, or that the test suite might be so big that running it would take an amount of time which makes continuous testing uncomfortable. In this case you should identify a subset of tests that run quickly and that can show you if something is not working properly, the so-called "smoke tests", and leave the rest of the suite for longer executions that you run less frequently. Typically, the library part of your project has tests that run very quickly, as testing functions does not require specific set-ups, while the user interface tests (be it a CLI or a GUI) are usually slower. If your tests are well-structured you can also run just the tests that are connected with the subsystem that you are dealing with.</p>
<h2 id="tests-should-be-idempotent">Tests should be idempotent<a class="headerlink" href="#tests-should-be-idempotent" title="Permanent link">¶</a></h2>
<p><em>Idempotency</em> in mathematics and computer science identifies processes that can be run multiple times without changing the status of the system. Since the status doesn't change, the tests can be run in whichever order without changing their results. If a test interacts with an external system leaving it in a different state, you will have random failures depending on the execution order.</p>
<p>The typical example is when you interact with the filesystem in your tests. A test may create a file and not remove it, and this makes another test fail because the file already exists, or because the directory is not empty. Whatever you do while interacting with external systems has to be reverted after the test. If you run your tests concurrently, however, even this precaution is not enough.</p>
<p>This poses a big problem, as interacting with external systems is definitely to be considered dangerous. Mocks, introduced in the next chapter, are a very good tool to deal with this aspect of testing.</p>
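<p>The clean-up described above can be sketched with the standard library alone. This is a minimal example, assuming a hypothetical function <code>write_report</code> that is not part of the project: the test works inside a temporary directory, so whatever it creates is removed when the test ends and running it twice gives the same result.</p>

```python
import tempfile
from pathlib import Path


def write_report(directory, text):
    # Hypothetical function under test: it refuses to overwrite an existing file
    path = Path(directory) / "report.txt"
    if path.exists():
        raise FileExistsError(path)
    path.write_text(text)
    return path


def test_write_report_creates_file():
    # The temporary directory is deleted when the block ends,
    # so the test leaves no trace on the filesystem
    with tempfile.TemporaryDirectory() as directory:
        path = write_report(directory, "hello")
        assert path.read_text() == "hello"
```

<p>Pytest also provides the fixture <code>tmp_path</code>, which gives each test its own temporary directory and can replace the explicit context manager used here.</p>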
<h2 id="tests-should-be-isolated">Tests should be isolated<a class="headerlink" href="#tests-should-be-isolated" title="Permanent link">¶</a></h2>
<p>In computer science <em>isolation</em> means that a component shall not change its behaviour depending on something that happens externally. In particular it shouldn't be affected by the execution of other components in the system (spatial isolation) and by the previous execution of the component itself (temporal isolation). Each test should run as much as possible in an isolated universe.</p>
<p>While this is easy to achieve for small components, like we did with the class <code>SimpleCalculator</code>, it might be almost impossible to do in more complex cases. Whenever you write a routine that deals with time, for example, be it the current date or a time interval, you are faced with something that flows incessantly and that cannot be stopped or slowed down. This is also true in other cases, for example if you are testing a routine that accesses an external service like a website. If the website is not reachable the test will fail, but this failure comes from an external source, not from the code under test.</p>
<p>Mocks or fake objects are a good tool to enforce isolation in tests that need to communicate with external actors in the system.</p>
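<p>When the external factor is time, a simple alternative to a mock is to let the caller pass the current time in. The following is only a sketch, and the function <code>greeting</code> and its parameter <code>now</code> are hypothetical names invented for this example.</p>

```python
from datetime import datetime


def greeting(now=None):
    # Hypothetical function: the current time can be injected by the caller,
    # and the real clock is used only when nothing is passed
    now = now or datetime.now()
    return "Good morning" if now.hour < 12 else "Good afternoon"


def test_greeting_in_the_morning():
    # A fixed datetime makes the result independent of when the test runs
    assert greeting(datetime(2020, 9, 21, 9, 30)) == "Good morning"


def test_greeting_in_the_afternoon():
    assert greeting(datetime(2020, 9, 21, 15, 0)) == "Good afternoon"
```

<p>Both tests give the same result whenever they run, because the only source of time they see is the value they provide.</p>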
<h2 id="external-systems">External systems<a class="headerlink" href="#external-systems" title="Permanent link">¶</a></h2>
<p>It is important to understand that the above definitions (idempotency, isolation) depend on the scope of the test. You should consider <em>external</em> whatever part of the system is not directly involved in the test, even though you need to use it to run the test itself. You should also try to reduce the scope of the test as much as possible.</p>
<p>Let me give you an example. Consider a web application and imagine a test that checks that a user can log in. The login process involves many layers: the user inputs the username and the password in a GUI and submits the form, the GUI communicates with the core of the application that finds the user in the DB and checks the password hash against the one stored there, then sends back a message that grants access to the user, and the GUI stores a cookie to keep the user logged in. Suppose now that the test fails. Where is the error? Is it in the query that retrieves the user from the DB? Or in the routine that hashes the password? Or is it just an issue in the connectivity between the application and the database?</p>
<p>As you can see there are too many possible points of failure. While this is a perfectly valid <em>integration test</em>, it is definitely not a <em>unit test</em>. Unit tests try to test the smallest possible units of code in your system, usually simple routines like functions or object methods. Integration tests, instead, put together whole systems that have already been tested and test that they can work together.</p>
<p>Too many times developers confuse integration tests with unit tests. One simple example: every time a web framework makes you test your models against a real database you are mixing a unit test (the methods of the model object work) with an integration one (the model object connects with the database and can store/retrieve data). You have to learn how to properly identify what is external to your system in the scope of a given test, so your tests can be focused and small.</p>
<h2 id="focus-on-messages">Focus on messages<a class="headerlink" href="#focus-on-messages" title="Permanent link">¶</a></h2>
<p>I can never recommend enough Sandi Metz's talk <a href="https://speakerdeck.com/skmetz/magic-tricks-of-testing-railsconf">"The Magic Tricks of Testing"</a> where she considers the different messages that a software component has to deal with. She comes up with 3 different origins for messages (incoming, sent to self, and outgoing) and 2 types (query and command). The very interesting conclusion she reaches is that you should only test half of them, and I believe this is one of the most useful results you can learn as a software developer. In this section I will shamelessly start from Sandi Metz's categorisations and give a personal view of the matter. I absolutely recommend watching the original talk as it is both short and very effective.</p>
<p>Testing is all about the behaviour of a component when it is used, i.e. when it is connected to other components that interact with it. This interaction is well represented by the word "message", which has hereafter the simple meaning of "data exchanged between two actors".</p>
<p>We can then classify the interactions happening in our system, and thus to our components, by flow and by type (Sandi Metz speaks of <em>origin</em> and <em>type</em>).</p>
<h3 id="message-flow">Message flow<a class="headerlink" href="#message-flow" title="Permanent link">¶</a></h3>
<p>The flow is defined as the tuple <code>(source, destination)</code>, that is where the message comes from and where it goes. There are three different combinations that we are interested in: <code>(outside, self)</code>, <code>(self, self)</code>, and <code>(self, outside)</code>, where <code>self</code> is the object we are testing, and <code>outside</code> is a generic object that lives in the system. There is a fourth combination, <code>(outside, outside)</code>, that is not relevant for testing, since it doesn't involve the object under analysis.</p>
<p>So <code>(outside, self)</code> contains all the messages that other parts of the system send to our component. These messages correspond to the public API of the component, that is the set of entry points the component makes available to interact with it. Notable examples are the public methods of an object in an object-oriented programming language or the HTTP endpoints of a Web application. This flow represents the <em>incoming messages</em>.</p>
<p>At the opposite side of the spectrum there is <code>(self, outside)</code>, which is the set of messages that the component under test sends to other parts of the system. These are for example the external calls that an object makes to a library or to other objects, or the API of other applications we rely on, like databases or Web applications. This flow describes all the <em>outgoing messages</em>.</p>
<p>Between the two there is <code>(self, self)</code>, which identifies the messages that the component sends to itself, i.e. the use that the component makes of its own internal API. This can be the set of private methods of an object or the business logic inside a Web application. The important thing about this last case is that while the component is seen as a black box by the rest of the system it actually has an internal structure and it uses it to run. This flow contains all the <em>private messages</em>.</p>
<h3 id="message-type">Message type<a class="headerlink" href="#message-type" title="Permanent link">¶</a></h3>
<p>Messages can be further divided according to the interaction the source requires to have with the target: <em>queries</em> and <em>commands</em>. Queries are messages that do not change the status of the component; they just extract information. The class <code>SimpleCalculator</code> that we developed in the previous section is a typical example of an object that exposes query methods. Adding two numbers doesn't change the status of the object, and you will receive the same answer every time you call the method <code>add</code>.</p>
<p>Commands are the opposite. They do not extract any information, but they change the status of the object. A method of an object that increases an internal counter or a method that adds values to an array are perfect examples of commands.</p>
<p>It's perfectly normal to combine a query and a command in a single message, as long as you are aware that your message is changing the status of the component. Remember that changing the status is something that can have concrete side effects.</p>
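<p>The distinction between the two message types can be shown with a tiny class. This is an illustrative sketch only, and the class <code>Counter</code> with its methods is a hypothetical example that does not belong to the project.</p>

```python
class Counter:
    # Hypothetical class, used only to illustrate queries and commands
    def __init__(self):
        self._value = 0

    def value(self):
        # Query: returns information and leaves the state untouched
        return self._value

    def increment(self):
        # Command: changes the state and returns nothing
        self._value += 1

    def increment_and_get(self):
        # Command and query combined: changes the state and returns a value
        self._value += 1
        return self._value
```

<p>Calling <code>value</code> any number of times gives the same answer, while every call to <code>increment</code> or <code>increment_and_get</code> leaves the object in a different state.</p>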
<h2 id="the-testing-grid">The testing grid<a class="headerlink" href="#the-testing-grid" title="Permanent link">¶</a></h2>
<p>Combining 3 flows and 2 message types we get 6 different message cases that involve the component under test. For each one of these cases we have to decide how to test the interaction represented by that flow and message type.</p>
<h3 id="incoming-queries">Incoming queries<a class="headerlink" href="#incoming-queries" title="Permanent link">¶</a></h3>
<p>An incoming query is a message that an external actor sends to get a value from your component. Testing this behaviour is straightforward, as you just need to write a test that sends the message and makes an assertion on the returned value. A concrete example of this is what we did to test the method <code>add</code> of <code>SimpleCalculator</code>.</p>
<h3 id="incoming-commands">Incoming commands<a class="headerlink" href="#incoming-commands" title="Permanent link">¶</a></h3>
<p>An incoming command comes from an external actor that wants to change the status of the system. There should be a way for an external actor to check the status, which translates into the need to have either a companion incoming query message that exposes the status (or at least the part of the status affected by the command), or the knowledge that the change is going to affect the behaviour of another query. A simple example might be a method that sets the precision (number of digits) of the division in the object <code>SimpleCalculator</code>. Setting that value changes the result of a query, which can be used to test the effect of the incoming command.</p>
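<p>The precision example can be sketched as follows. The class <code>PrecisionCalculator</code> and the method <code>set_precision</code> are hypothetical names made up for this illustration and are not part of <code>SimpleCalculator</code> as developed in the series.</p>

```python
class PrecisionCalculator:
    # Hypothetical variant of SimpleCalculator, used only for this example
    def __init__(self):
        self._precision = 2

    def set_precision(self, digits):
        # Incoming command: changes the internal state
        self._precision = digits

    def div(self, a, b):
        # Incoming query: its result depends on the state set by the command
        return round(a / b, self._precision)


def test_set_precision_affects_div():
    calculator = PrecisionCalculator()
    calculator.set_precision(3)
    # The command is tested through the query it affects
    assert calculator.div(1, 3) == 0.333
```

<p>The test never reads the private attribute: it observes the effect of the command only through the public query.</p>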
<h3 id="private-queries">Private queries<a class="headerlink" href="#private-queries" title="Permanent link">¶</a></h3>
<p>A private query is a message that the component sends to self to get a value without affecting its own state, and it is basically nothing more than an explicit use of some internal logic. This happens often in object-oriented languages because you extracted some common logic from one or more methods of an object and created a private method to avoid duplication.</p>
<p>Since private queries use the internal logic you shouldn't test them. This might be surprising, as private methods are code, and code should be tested, but remember that other methods are calling them, so the effects of that code are not invisible: they are tested by the tests of the public entry points, although indirectly. The only effect you would achieve by testing private methods is to lock the tests to the internal implementation of the component, which by definition shouldn't be used by anyone outside of the component itself. This, in turn, makes refactoring painful, because you have to keep redundant tests in sync with the changes that you make, instead of using them as a guide for the code changes like TDD wants you to do.</p>
<p>As Sandi Metz says, however, this is not an inflexible rule. Whenever you see that testing an internal method makes the structure more robust feel free to do it. Be aware that you are locking the implementation, so do it only where it makes a real difference businesswise.</p>
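<p>A short sketch of what indirect testing looks like in practice. The class <code>Basket</code> and its methods are hypothetical and exist only for this example: the private method is never called by the tests, but any error in it would make them fail.</p>

```python
class Basket:
    # Hypothetical component, used only for this example
    def __init__(self, prices):
        self._prices = list(prices)

    def _subtotal(self):
        # Private query: internal logic extracted to avoid duplication
        return sum(self._prices)

    def total(self):
        # Public entry point built on the private query
        return self._subtotal()

    def total_with_shipping(self, shipping):
        # Another public entry point that reuses the same private query
        return self._subtotal() + shipping


def test_total():
    assert Basket([10, 20]).total() == 30


def test_total_with_shipping():
    assert Basket([10, 20]).total_with_shipping(5) == 35
```

<p>If <code>_subtotal</code> is later renamed, inlined, or rewritten, these tests keep working unchanged, which is exactly what makes refactoring safe.</p>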
<h3 id="private-commands">Private commands<a class="headerlink" href="#private-commands" title="Permanent link">¶</a></h3>
<p>Private commands shouldn't be treated differently than private queries. They change the status of the component, but this is again part of the internal logic of the component itself, so you shouldn't test private commands either. As stated for private queries, feel free to do it if this makes a real difference.</p>
<h3 id="outgoing-queries-and-commands">Outgoing queries and commands<a class="headerlink" href="#outgoing-queries-and-commands" title="Permanent link">¶</a></h3>
<p>An outgoing query is a message that the component under test sends to an external actor asking for a value, without changing the status of the actor itself. The correctness of the returned value, given the inputs, is not part of what you want to test, because that is an incoming query for the external actor. Let me repeat this: you don't want to test that the external actor returns the correct value given some inputs.</p>
<p>This is perhaps one of the biggest mistakes that programmers make when they test their applications. Definitely it is a mistake that I made many times. We tend to introduce tests that, starting from the code of our component, end up testing different components.</p>
<p>Outgoing commands are messages sent to external actors in order to change their state. Since our component sends such messages to cause an effect in another part of the system we have to be sure that the sent values are correct. We do not want to test that the state of the external actor changes accordingly, as this is part of the testing suite of the external actor itself (incoming command).</p>
<p>From this consideration it is evident that you shouldn't test the results of any outgoing query or command. Possibly, you should avoid running them at all, otherwise you will need the external system to be up and running when you run the test suite.</p>
<p>We want to be sure, however, that our component uses the API of the external actor in a proper way and the standard technique to test this is to use mocks, that is components that simulate other components. Mocks are an important tool in the TDD methodology and for this reason they are the topic of the next chapter.</p>
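<p>As a small preview of the technique, here is a sketch of a test for an outgoing command that uses <code>Mock</code> from the standard library module <code>unittest.mock</code>. The class <code>Notifier</code>, the object <code>mailer</code>, and its method <code>send</code> are hypothetical names chosen for this example.</p>

```python
from unittest.mock import Mock


class Notifier:
    # Hypothetical component: it sends an outgoing command to a mailer
    def __init__(self, mailer):
        self._mailer = mailer

    def welcome(self, address):
        self._mailer.send(address, "Welcome!")


def test_welcome_sends_the_right_message():
    # The mock replaces the external actor, so no real mail system is needed
    mailer = Mock()
    notifier = Notifier(mailer)

    notifier.welcome("user@example.com")

    # We check the message that was sent, not what the mailer does with it
    mailer.send.assert_called_once_with("user@example.com", "Welcome!")
```

<p>The test checks only that the component uses the API of the external actor correctly. Whether the mailer actually delivers the message belongs to the test suite of the mailer itself.</p>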
<div class="highlight"><pre><span></span><code>| Flow | Type | Test? |
|----------|---------|-------|
| Incoming | Query | Yes |
| Incoming | Command | Yes |
| Private | Query | Maybe |
| Private | Command | Maybe |
| Outgoing | Query | Mock |
| Outgoing | Command | Mock |
</code></pre></div>
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>Since I discovered TDD, few things have changed the way I write code more than these considerations on what I am supposed to test. Out of 6 different types of messages we discovered that 2 shouldn't be tested, 2 require a very simple technique based on assertions, and the last 2 are the only ones that require an advanced technique (mocks). This should cheer you up, as for once a good methodology doesn't add new rules and further worries, but removes one third of them, even forbidding you to implement them!</p>
<p>In the next two posts I will discuss mocks and patches, two very important testing tools to have in your belt.</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>TDD in Python with pytest - Part 22020-09-11T10:30:00+02:002023-09-03T19:00:00+02:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-09-11:/blog/2020/09/11/tdd-in-python-with-pytest-part-2/<p>This is the second post in the series <strong>TDD in Python with pytest</strong> where I develop a simple project following a strict TDD methodology. The posts come from my book <a href="https://leanpub.com/clean-architectures-in-python">Clean Architectures in Python</a> and have been reviewed to get rid of some bad naming choices of the version published in the book.</p><p>You can find the first post <a href="https://www.thedigitalcatonline.com/blog/2020/09/10/tdd-in-python-with-pytest-part-1/">here</a>.</p><h2 id="step-7---division-0afe">Step 7 - Division<a class="headerlink" href="#step-7---division-0afe" title="Permanent link">¶</a></h2><p>The requirements state that there shall be a division function, and that it has to return a float value. This is a simple condition to test, as it is sufficient to divide two numbers that do not give an integer result</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_div_two_numbers_float</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">div</span><span class="p">(</span><span class="mi">13</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">6.5</span>
</pre></div> </div> </div><p>The test suite fails with the usual error that signals a missing method. The implementation of this function is very simple as the operator <code>/</code> in Python performs a float division</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">div</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">a</span> <span class="o">/</span> <span class="n">b</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-7-float-division">step-7-float-division</a></p><p>If you run the test suite again all the tests should pass. There is a second requirement about this operation, however, that states that division by zero shall return <code>inf</code>.</p><p>I already mentioned in the previous post that this is not a good requirement, and please don't go around telling people that I told you to create functions that return either floats or strings. This is a simple requirement that I will use to show you how to deal with exceptions.</p><p>The test that comes from the requirement is simple</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_div_by_zero_returns_inf</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">div</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="nb">float</span><span class="p">(</span><span class="s1">'inf'</span><span class="p">)</span>
</pre></div> </div> </div><p>And the test suite fails now with this message</p><div class="code"><div class="content"><div class="highlight"><pre>__________________________ test_div_by_zero_returns_inf ___________________________
    def test_div_by_zero_returns_inf():
        calculator = SimpleCalculator()
>       result = calculator.div(5, 0)
tests/test_main.py:70:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <simple_calculator.main.SimpleCalculator object at 0x7f0b0b733990>, a = 5, b = 0 <span class="callout">1</span>
    def div(self, a, b):
>       return a / b
E       ZeroDivisionError: division by zero
simple_calculator/main.py:17: ZeroDivisionError
</pre></div> </div> </div><p>Note that when an exception happens in the code and not in the test, the pytest output changes slightly. The first part of the message shows where the test fails, but then there is a second part that shows the internal code that raised the exception and provides information about the value of local variables on the first line <span class="callout">1</span>.</p><p>We might implement two different solutions to satisfy this requirement and its test. The first one is to check whether <code>b</code> is 0 before dividing</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">div</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">b</span><span class="p">:</span>
            <span class="k">return</span> <span class="nb">float</span><span class="p">(</span><span class="s1">'inf'</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">a</span> <span class="o">/</span> <span class="n">b</span>
</pre></div> </div> </div><p>and the second one is to intercept the exception with a <code>try/except</code> block</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">div</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">a</span> <span class="o">/</span> <span class="n">b</span>
        <span class="k">except</span> <span class="ne">ZeroDivisionError</span><span class="p">:</span>
            <span class="k">return</span> <span class="nb">float</span><span class="p">(</span><span class="s1">'inf'</span><span class="p">)</span>
</pre></div> </div> </div><p>Both solutions make the test suite pass, so both are correct. I leave to you the decision about which is the best one, syntactically speaking.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-7-division-by-zero">step-7-division-by-zero</a></p><h2 id="step-8---testing-exceptions-ca11">Step 8 - Testing exceptions<a class="headerlink" href="#step-8---testing-exceptions-ca11" title="Permanent link">¶</a></h2><p>A further requirement is that multiplication by zero must raise a <code>ValueError</code> exception. This means that we need a way to test if our code raises an exception, which is the opposite of what we did until now. In the previous tests, the condition to pass was that there was no exception in the code, while in this test the condition will be that an exception has been raised.</p><p>Again, this is a requirement I made up just for the sake of showing you how to deal with exceptions, so if you think this is a silly behaviour for a multiplication function you are probably right.</p><p>Pytest provides a context manager named <code>raises</code> that runs the code contained in it and passes only if the given exception is produced by that code.</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">pytest</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">test_mul_by_zero_raises_exception</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="k">with</span> <span class="n">pytest</span><span class="o">.</span><span class="n">raises</span><span class="p">(</span><span class="ne">ValueError</span><span class="p">):</span>
        <span class="n">calculator</span><span class="o">.</span><span class="n">mul</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
</pre></div> </div> </div><p>In this case, thus, pytest runs the line <code>calculator.mul(3, 0)</code>. If the method doesn't raise the exception <code>ValueError</code> the test will fail. Indeed, if you run the test suite now, you will get the following failure</p><div class="code"><div class="content"><div class="highlight"><pre>________________________ test_mul_by_zero_raises_exception ________________________
    def test_mul_by_zero_raises_exception():
        calculator = SimpleCalculator()
        with pytest.raises(ValueError):
>           calculator.mul(3, 0)
E           Failed: DID NOT RAISE <class 'ValueError'>
tests/test_main.py:81: Failed
</pre></div> </div> </div><p>which signals that the code didn't raise the expected exception.</p><p>The code that makes the test pass needs to test if one of the inputs of the function <code>mul</code> is 0. This can be done with the help of the built-in function <code>all</code>, which accepts an iterable and returns <code>True</code> only if all the values contained in it are true. Since in Python the value <code>0</code> is not true, we may write</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">mul</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">all</span><span class="p">(</span><span class="n">args</span><span class="p">):</span>
            <span class="k">raise</span> <span class="ne">ValueError</span>
        <span class="k">return</span> <span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">x</span><span class="o">*</span><span class="n">y</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
</pre></div> </div> </div><p>and make the test suite pass. The condition checks that there are no false values in the tuple <code>args</code>, that is there are no zeros.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-8-multiply-by-zero">step-8-multiply-by-zero</a></p><h2 id="step-9---a-more-complex-set-of-requirements-9dc2">Step 9 - A more complex set of requirements<a class="headerlink" href="#step-9---a-more-complex-set-of-requirements-9dc2" title="Permanent link">¶</a></h2><p>Until now the requirements were pretty simple, and it was easy to map each of them directly into tests. It's time to try to tackle a more complex problem. The remaining requirements say that the class has to provide a function to compute the average of an iterable, and that this function shall accept two optional upper and lower thresholds to remove outliers.</p><p>Let's break these two requirements into a set of simpler ones</p><ol><li>The function accepts an iterable and computes the average, i.e. <code>avg([2, 5, 12, 98]) == 29.25</code></li><li>The function accepts an optional upper threshold. It must remove all the values that are greater than the threshold before computing the average, i.e. <code>avg([2, 5, 12, 98], ut=90) == avg([2, 5, 12])</code></li><li>The function accepts an optional lower threshold. It must remove all the values that are less than the threshold before computing the average, i.e. <code>avg([2, 5, 12, 98], lt=10) == avg([12, 98])</code></li><li>The upper threshold is not included when removing data, i.e. <code>avg([2, 5, 12, 98], ut=12) == avg([2, 5, 12])</code></li><li>The lower threshold is not included when removing data, i.e. <code>avg([2, 5, 12, 98], lt=5) == avg([5, 12, 98])</code></li><li>The function works with an empty list, returning <code>0</code>, i.e. <code>avg([]) == 0</code></li><li>The function works if the list is empty after outlier removal, i.e. 
<code>avg([12, 98], lt=15, ut=90) == 0</code></li><li>The function outlier removal works if the list is empty, i.e. <code>avg([], lt=15, ut=90) == 0</code></li></ol><p>As you can see a requirement can produce multiple tests. Some of these are clearly expressed by the requirement (numbers 1, 2, 3), some of these are choices that we make (numbers 4, 5, 6) and can be discussed, some are boundary cases that we have to discover thinking about the problem (numbers 6, 7, 8).</p><p>There is a fourth category of tests, which are the ones that come from bugs that you discover. We will discuss those later in this chapter.</p><p>Now, if you followed the posts coding along it is time to try to tackle a problem on your own. Why don't you try to go on and implement these features? Each of the eight requirements can be directly mapped into a test, and you know how to write tests and code that passes them. The next steps show my personal solution, which is just one of the possible ones, so you can compare what you did with what I came up with to solve the tests.</p><h3 id="step-9.1---average-of-an-iterable-4522">Step 9.1 - Average of an iterable</h3><p>Let's start adding a test for requirement number 1</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_correct_average</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">])</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">29.25</span>
</pre></div> </div> </div><p>We feed the function <code>avg</code> a list of generic numbers, whose average we calculated with an external tool. The first run of the test suite fails with the usual complaint about a missing function, and we can make the test pass with a simple use of <code>sum</code> and <code>len</code>, as both built-in functions work on lists and other sized collections</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
    <span class="p">[</span><span class="o">...</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
</pre></div> </div> </div><p>Here, <code>it</code> stands for iterable. Keep in mind that <code>len</code> requires a sized container, so at this stage the function works with lists, tuples, and similar collections.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-1-average-of-an-iterable">step-9-1-average-of-an-iterable</a></p><h3 id="step-9.2---upper-threshold-e0a5">Step 9.2 - Upper threshold</h3><p>The second requirement mentions an upper threshold, but we are free with regard to the API, i.e. the requirement doesn't specify how the threshold is supposed to be passed or named. I decided to call the upper threshold parameter <code>ut</code>, so the test becomes</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_removes_upper_outliers</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">],</span> <span class="n">ut</span><span class="o">=</span><span class="mi">90</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="o">.</span><span class="n">approx</span><span class="p">(</span><span class="mf">6.333333</span><span class="p">)</span>
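</pre></div> </div> </div><p>To make the numbers in this test concrete, the following sketch reproduces the expected value by hand. It is only an illustration: the filtering is written inline rather than through <code>avg</code>, and <code>math.isclose</code> from the standard library is used as a stand-in for an approximate comparison, which is an assumption of this sketch and not a description of how <code>pytest.approx</code> works internally.</p><div class="code"><div class="content"><div class="highlight"><pre>import math

values = [2, 5, 12, 98]
ut = 90

# Keep only the elements that do not exceed the threshold: 98 is dropped
kept = [x for x in values if ut >= x]
average = sum(kept) / len(kept)  # 19 / 3, a periodic decimal

print(kept)                                           # [2, 5, 12]
print(average == 6.333333)                            # False
print(math.isclose(average, 6.333333, rel_tol=1e-6))  # True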
</pre></div> </div> </div><p>As you can see the parameter <code>ut=90</code> is supposed to remove the element <code>98</code> from the list and then compute the average of the remaining elements. Since the result has an infinite number of digits I used the function <code>pytest.approx</code> to check the result.</p><p>The test suite fails because the function <code>avg</code> doesn't accept the parameter <code>ut</code></p><div class="code"><div class="content"><div class="highlight"><pre>_________________________ test_avg_removes_upper_outliers _________________________
    def test_avg_removes_upper_outliers():
        calculator = SimpleCalculator()
>       result = calculator.avg([2, 5, 12, 98], ut=90)
E       TypeError: avg() got an unexpected keyword argument 'ut'
tests/test_main.py:95: TypeError
</pre></div> </div> </div><p>There are now two problems that we have to solve, as happened with the second test we wrote in this project. The new <code>ut</code> argument needs a default value, so we have to manage that case, and then we have to make the upper threshold work. My solution is</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">ut</span><span class="p">:</span>
            <span class="n">ut</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">it</span> <span class="k">if</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">ut</span><span class="p">]</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span>
</pre></div> </div> </div><p>The idea here is that <code>ut</code> is used to filter the iterable, keeping all the elements that are less than or equal to the threshold. This means that the default value for the threshold has to be neutral with regard to this filtering operation. Using the maximum value of the iterable makes the whole algorithm work in every case, while for example using a big fixed value like <code>9999</code> would introduce a bug, as one of the elements of the iterable might be bigger than that value.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-2-upper-threshold">step-9-2-upper-threshold</a></p><h3 id="step-9.3---lower-threshold-b88a">Step 9.3 - Lower threshold</h3><p>The lower threshold is the mirror of the upper threshold, so it doesn't require much explanation. The test is</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_removes_lower_outliers</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">],</span> <span class="n">lt</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="n">pytest</span><span class="o">.</span><span class="n">approx</span><span class="p">(</span><span class="mi">55</span><span class="p">)</span>
</pre></div> </div> </div><p>and the code of the function <code>avg</code> now becomes</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">lt</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">lt</span><span class="p">:</span>
            <span class="n">lt</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">ut</span><span class="p">:</span>
            <span class="n">ut</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">it</span> <span class="k">if</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">lt</span> <span class="ow">and</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">ut</span><span class="p">]</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-3-lower-threshold">step-9-3-lower-threshold</a></p><h3 id="step-9.4-and-9.5---boundary-inclusion-e6fe">Step 9.4 and 9.5 - Boundary inclusion</h3><p>As you can see from the code of the function <code>avg</code>, the upper and lower thresholds are included in the comparison, so we might consider the requirements as already satisfied. TDD, however, pushes you to write a test for each requirement (as we saw, it's not unusual to actually have multiple tests per requirement), and this is what we are going to do. </p><p>The reason behind this is that you might get the expected behaviour for free, like in this case, because some other code that you wrote to pass a different test provides that feature as a side effect. You don't know, however, what will happen to that code in the future, so if you don't have tests that show that all your requirements are satisfied you might lose features without knowing it.</p><p>The test for the fourth requirement is</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_upper_threshold_is_included</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">],</span> <span class="n">ut</span><span class="o">=</span><span class="mi">98</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">29.25</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-4-upper-threshold-is-included">step-9-4-upper-threshold-is-included</a></p><p>while the test for the fifth one is</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_lower_threshold_is_included</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">],</span> <span class="n">lt</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">29.25</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-5-lower-threshold-is-included">step-9-5-lower-threshold-is-included</a></p><p>And, as expected, both pass without any change in the code. Do you remember rule number 5? You should ask yourself why the tests don't fail. In this case we reasoned about that before, so we can accept that the new tests don't require any code change to pass.</p><h3 id="step-9.6---empty-list-2dcd">Step 9.6 - Empty list</h3><p>Requirement number 6 is something that wasn't clearly specified in the project description so we decided to return 0 as the average of an empty list. You are free to change the requirement and decide to raise an exception, for example.</p><p>The test that implements this requirement is</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_empty_list</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([])</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">0</span>
</pre></div> </div> </div><p>and the test suite fails with the following error</p><div class="code"><div class="content"><div class="highlight"><pre>_______________________________ test_avg_empty_list _______________________________
    def test_avg_empty_list():
        calculator = SimpleCalculator()
>       result = calculator.avg([])
tests/test_main.py:127:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <simple_calculator.main.SimpleCalculator object at 0x7feeb7098a10>, it = [], lt = None, ut = None
    def avg(self, it, lt=None, ut=None):
        if not lt:
>           lt = min(it)
E           ValueError: min() arg is an empty sequence
simple_calculator/main.py:26: ValueError
</pre></div> </div> </div><p>The function <code>min</code> that we used to compute the default lower threshold doesn't work with an empty list, so the code raises an exception. The simplest solution is to check the length of the iterable before computing the default thresholds</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">lt</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">it</span><span class="p">):</span>
            <span class="k">return</span> <span class="mi">0</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">lt</span><span class="p">:</span>
            <span class="n">lt</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">ut</span><span class="p">:</span>
            <span class="n">ut</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">it</span> <span class="k">if</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">lt</span> <span class="ow">and</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">ut</span><span class="p">]</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span>
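</pre></div> </div> </div><p>The failure fixed above can be reproduced in isolation. The sketch below shows the exception raised by <code>min</code> on an empty list and, as one possible alternative to the explicit length check, the <code>default</code> keyword that the built-in <code>min</code> and <code>max</code> accept. The helper name <code>lower_bound</code> is hypothetical and is not part of the project.</p><div class="code"><div class="content"><div class="highlight"><pre>def lower_bound(it):
    # Hypothetical helper: None signals that there is nothing to compare
    return min(it, default=None)


try:
    min([])
except ValueError as error:
    # The exact wording of the message depends on the Python version
    print(f"ValueError: {error}")

print(lower_bound([]))               # None
print(lower_bound([2, 5, 12, 98]))   # 2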
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-6-empty-list">step-9-6-empty-list</a></p><p>As you can see the function <code>avg</code> is already pretty rich, but at the same time it is well structured and understandable. This obviously happens because the example is trivial, but cleaner code is definitely among the benefits of TDD.</p><h3 id="step-9.7---empty-list-after-applying-the-thresholds-deed">Step 9.7 - Empty list after applying the thresholds</h3><p>The next requirement deals with the case in which the outlier removal process empties the list. The test is the following</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_manages_empty_list_after_outlier_removal</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">],</span> <span class="n">lt</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="mi">90</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">0</span>
</pre></div> </div> </div><p>and the test suite fails with a <code>ZeroDivisionError</code>, because the length of the iterable is now 0.</p><div class="code"><div class="content"><div class="highlight"><pre>________________ test_avg_manages_empty_list_after_outlier_removal ________________
    def test_avg_manages_empty_list_after_outlier_removal():
        calculator = SimpleCalculator()
>       result = calculator.avg([12, 98], lt=15, ut=90)
tests/test_main.py:135:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <simple_calculator.main.SimpleCalculator object at 0x7f9e60c3ba90>, it = [12, 98], lt = 15, ut = 90
    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0
        if not lt:
            lt = min(it)
        if not ut:
            ut = max(it)
        _it = [x for x in it if x >= lt and x <= ut]
>       return sum(_it)/len(_it)
E       ZeroDivisionError: division by zero
simple_calculator/main.py:36: ZeroDivisionError
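</pre></div> </div> </div><p>The error can also be reproduced outside the class with the values used in the test. In this sketch the filtering is written inline, and the upper comparison is spelled with the threshold on the left, which is equivalent to the one used in the function; the variable names are chosen only for illustration.</p><div class="code"><div class="content"><div class="highlight"><pre>values = [12, 98]
lt, ut = 15, 90

# Both elements fall outside the thresholds, so nothing survives the filter
filtered = [x for x in values if x >= lt and ut >= x]
print(filtered)       # []
print(len(filtered))  # 0

try:
    sum(filtered) / len(filtered)
except ZeroDivisionError as error:
    print(f"ZeroDivisionError: {error}")  # ZeroDivisionError: division by zero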
</pre></div> </div> </div><p>The easiest solution is to introduce a new check on the length of the iterable</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">lt</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">it</span><span class="p">):</span>
            <span class="k">return</span> <span class="mi">0</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">lt</span><span class="p">:</span>
            <span class="n">lt</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">ut</span><span class="p">:</span>
            <span class="n">ut</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">it</span><span class="p">)</span>
        <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">it</span> <span class="k">if</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">lt</span> <span class="ow">and</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">ut</span><span class="p">]</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">):</span>
            <span class="k">return</span> <span class="mi">0</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span>
</pre></div> </div> </div><p>And this code makes the test suite pass. As I stated before, code that makes the tests pass is considered correct, but you are always allowed to improve it. In this case I don't really like the repetition of the length check, so I might try to refactor the function to get a cleaner solution. Since I have all the tests that show that the requirements are satisfied, I am free to try to change the code of the function.</p><p>After some attempts I found this solution</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">lt</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="n">_it</span> <span class="o">=</span> <span class="n">it</span><span class="p">[:]</span>
        <span class="k">if</span> <span class="n">lt</span><span class="p">:</span>
            <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">_it</span> <span class="k">if</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">lt</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">ut</span><span class="p">:</span>
            <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">_it</span> <span class="k">if</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">ut</span><span class="p">]</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">):</span>
            <span class="k">return</span> <span class="mi">0</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span>
</pre></div> </div> </div><p>which looks reasonably clean, and makes the whole test suite pass.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-7-empty-list-after-thresholds">step-9-7-empty-list-after-thresholds</a></p><h3 id="step-9.8---empty-list-before-applying-the-thresholds-a7ab">Step 9.8 - Empty list before applying the thresholds</h3><p>The last requirement checks another boundary case, which happens when the list is empty and we specify one or both of the thresholds. This test will check that the outlier removal code doesn't assume the list contains elements.</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_manages_empty_list_before_outlier_removal</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([],</span> <span class="n">lt</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="mi">90</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">0</span>
</pre></div> </div> </div><p>This test doesn't fail. So, according to the TDD methodology, we should provide a reason why this happens and decide if we want to keep the test. The reason is that the two list comprehensions used to filter the elements work perfectly with empty lists. As for the test, it comes directly from a corner case, and it checks a behaviour which is not already covered by other tests. This makes me decide to keep the test.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-8-empty-list-before-thresholds">step-9-8-empty-list-before-thresholds</a></p><h3 id="step-9.9---zero-as-lowerupper-threshold-35dc">Step 9.9 - Zero as lower/upper threshold</h3><p>This is perhaps the most important step of the whole chapter, for two reasons.</p><p>First of all, the test added in this step was contributed by two readers of my book about clean architectures (<a href="https://github.com/faustgertz">Faust Gertz</a> and <a href="https://github.com/IrishPrime">Michael O'Neill</a>), and this shows a real TDD workflow. After you publish your package (or your book, in this case) someone notices a wrong behaviour in some use case. This might be a big flaw or a tiny corner case, but in any case they can come up with a test that exposes the bug, and maybe even with a patch to the code, but the most important part is the test.</p><p>Whoever discovers the bug has a clear way to show it, and you, as an author/maintainer/developer, can add that test to your suite and work on the code until it passes. The rest of the test suite will block any change in the code that disrupts the behaviour you already tested. 
As I already stressed multiple times, we could do the same without TDD, but if we need to change a substantial amount of code there is nothing like a test suite that can guarantee we are not re-introducing bugs (also called regressions).</p><p>Second, this step shows an important part of the TDD workflow: checking corner cases. In general you should pay a lot of attention to the boundaries of a domain, and test the behaviour of the code in those cases.</p><p>This test shows that the code doesn't manage zero-valued lower thresholds correctly</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_manages_zero_value_lower_outlier</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">lt</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">0.5</span>
</pre></div> </div> </div><p>The reason is that the function <code>avg</code> contains a check like <code>if lt:</code>, which is skipped when <code>lt</code> is 0, as 0 is a falsy value. The check should be <code>if lt is not None:</code>, so that part of the function <code>avg</code> becomes</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>        <span class="k">if</span> <span class="n">lt</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">_it</span> <span class="k">if</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">lt</span><span class="p">]</span>
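</pre></div> </div> </div><p>The difference between the two checks is easy to see in isolation. The following is a minimal sketch, not the project code: the two function names are hypothetical and each one only applies the lower threshold, so that the effect of the condition stands out.</p><div class="code"><div class="content"><div class="highlight"><pre>def filter_with_truthiness_check(values, lt=None):
    # Buggy variant: 0 is falsy, so a threshold of 0 is silently ignored
    if lt:
        values = [x for x in values if x >= lt]
    return values


def filter_with_none_check(values, lt=None):
    # Fixed variant: only None means "no threshold was given"
    if lt is not None:
        values = [x for x in values if x >= lt]
    return values


print(filter_with_truthiness_check([-1, 0, 1], lt=0))  # [-1, 0, 1]
print(filter_with_none_check([-1, 0, 1], lt=0))        # [0, 1]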
</pre></div> </div> </div><p>It is immediately clear that the upper threshold has the same issue, so the two tests I added are</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_manages_zero_value_lower_outlier</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">lt</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">0.5</span>


<span class="k">def</span> <span class="nf">test_avg_manages_zero_value_upper_outlier</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">([</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">ut</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="o">-</span><span class="mf">0.5</span>
</pre></div> </div> </div><p>and the final version of <code>avg</code> is</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">lt</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="n">_it</span> <span class="o">=</span> <span class="n">it</span><span class="p">[:]</span>
        <span class="k">if</span> <span class="n">lt</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">_it</span> <span class="k">if</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">lt</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">ut</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">_it</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">_it</span> <span class="k">if</span> <span class="n">x</span> <span class="o"><=</span> <span class="n">ut</span><span class="p">]</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">):</span>
            <span class="k">return</span> <span class="mi">0</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">_it</span><span class="p">)</span>
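</pre></div> </div> </div><p>For readers who want to experiment without the class, the final version can be sketched as a free function and run against the values used in the tests of this chapter. The module-level name <code>avg</code> is an assumption of this sketch, and the upper comparison is written with the threshold on the left, which is equivalent to the one in the method.</p><div class="code"><div class="content"><div class="highlight"><pre>def avg(it, lt=None, ut=None):
    # Standalone copy of the method, kept as close as possible to it
    _it = it[:]
    if lt is not None:
        _it = [x for x in _it if x >= lt]
    if ut is not None:
        _it = [x for x in _it if ut >= x]
    if not len(_it):
        return 0
    return sum(_it) / len(_it)


print(avg([2, 5, 12, 98]))          # 29.25
print(avg([2, 5, 12, 98], lt=10))   # 55.0
print(avg([12, 98], lt=15, ut=90))  # 0
print(avg([], lt=15, ut=90))        # 0
print(avg([-1, 0, 1], lt=0))        # 0.5
print(avg([-1, 0, 1], ut=0))        # -0.5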
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-9-zero-as-lower-upper-threshold">step-9-9-zero-as-lower-upper-threshold</a></p><h3 id="step-9.10---refactoring-for-generators-baa5">Step 9.10 - Refactoring for generators</h3><p>One of the readers of this series, <a href="https://github.com/labdmitriy">Dmitry Labazkin</a>, noticed that the final implementation has some drawbacks, namely:</p><ul><li>According to the requirements, this method should accept any iterable, but the implementation can't process generators (which are iterators and also iterables). For example, the function <code>len()</code> cannot be used with generators.</li><li>The iterable is copied, which is something we try to avoid to reduce memory usage.</li><li>Overall, the iterable is read 4 times, which affects performance.</li></ul><p>These are interesting points, and he provides an implementation that solves them all. It's important to mention that the first point is closely related to the requirements, so it should be represented by a unit test, while the other two are connected with performance and cannot be tested with pytest. However, any refactoring that produces code we consider better (for example from a performance point of view) can be tested by the existing tests. In other words, we can provide an alternative implementation and still make sure it works correctly.</p><p>Dmitry adds a test to check that generators are supported</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_avg_accepts_generators</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">avg</span><span class="p">(</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">98</span><span class="p">])</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mf">29.25</span>
</pre></div> </div> </div><p>His implementation of the function <code>avg()</code> passes that test and the previous ones we wrote</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">it</span><span class="p">,</span> <span class="n">lt</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">ut</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="n">count</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="n">total</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="k">for</span> <span class="n">number</span> <span class="ow">in</span> <span class="n">it</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">lt</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">number</span> <span class="o"><</span> <span class="n">lt</span><span class="p">:</span>
                <span class="k">continue</span>
            <span class="k">if</span> <span class="n">ut</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">number</span> <span class="o">></span> <span class="n">ut</span><span class="p">:</span>
                <span class="k">continue</span>
            <span class="n">count</span> <span class="o">+=</span> <span class="mi">1</span>
            <span class="n">total</span> <span class="o">+=</span> <span class="n">number</span>
        <span class="k">if</span> <span class="n">count</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="k">return</span> <span class="mi">0</span>
        <span class="k">return</span> <span class="n">total</span> <span class="o">/</span> <span class="n">count</span>
</pre></div> </div> </div><p>One might argue that this implementation is less <em>pythonic</em> as it doesn't use fancy list comprehensions, but again, that is a matter of style (and performance). The point about generators is correct, but if that wasn't included in the requirements we might accept either implementation. I personally believe this new implementation is much better than the previous one, as I like to keep a small memory footprint, but if we were sure the calculator is used only on small sequences the concern might be overkill.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-9-10-refactoring-for-generators">step-9-10-refactoring-for-generators</a></p><h2 id="recap-of-the-tdd-rules-92ff">Recap of the TDD rules<a class="headerlink" href="#recap-of-the-tdd-rules-92ff" title="Permanent link">¶</a></h2><p>Through this very simple example we learned 6 important rules of the TDD methodology. Let us review them, now that we have some experience that can make the words meaningful:</p><ol><li>Test first, code later</li><li>Add the bare minimum amount of code you need to pass the tests</li><li>You shouldn't have more than one failing test at a time</li><li>Write code that passes the test. Then refactor it.</li><li>A test should fail the first time you run it. If it doesn't, ask yourself why you are adding it.</li><li>Never refactor without tests.</li></ol><h2 id="how-many-assertions-a828">How many assertions?<a class="headerlink" href="#how-many-assertions-a828" title="Permanent link">¶</a></h2><p>I am frequently asked "How many assertions do you put in a test?", and I consider this question important enough to discuss it in a dedicated section. To answer this question I want to briefly go back to the nature of TDD and the role of the test suite that we run.</p><p>The whole point of automated tests is to run through a set of checkpoints that can quickly reveal that there is a problem in a specific area. 
Mind the words "quickly" and "specific". When I run the test suite and an error occurs I'd like to be able to understand as fast as possible where the problem lies. This doesn't (always) mean that the problem will have a quick resolution, but at least I can be immediately aware of which part of the system is misbehaving.</p><p>On the other hand, we don't want to have too many tests for the same condition. On the contrary, we want to avoid testing the same condition more than once, as tests have to be maintained. A test suite that is too fine-grained might result in too many tests failing because of the same problem in the code, which might be daunting and not very informative.</p><p>My advice is to group together assertions that can be executed after running the same setup, if they test the same process. For example, you might consider the two functions <code>add</code> and <code>sub</code> that we tested in this chapter. They require the same setup, which is to instantiate the class <code>SimpleCalculator</code> (a setup that they share with many other tests), but they are actually testing two different processes. A good sign of this is that, if you grouped them in a single test, you would have to rename it to <code>test_add_or_sub</code>, and a failure in this test would require further investigation of the test output to check which method of the class is failing.</p><p>If you have to test that a method returns positive even numbers, instead, you might consider running the method and then writing two assertions, one that checks that the number is positive, and one that checks it is even. This makes sense, as a failure in one of the two means a failure of the whole process.</p><p>As a rule of thumb, then, consider if the test is a logical <code>AND</code> between conditions or a logical <code>OR</code>. 
In the former case go for multiple assertions, in the latter create multiple test functions.</p><h2 id="how-to-manage-bugs-or-missing-features-f2e8">How to manage bugs or missing features<a class="headerlink" href="#how-to-manage-bugs-or-missing-features-f2e8" title="Permanent link">¶</a></h2><p>In this chapter we developed the project from scratch, so the challenge was to come up with a series of small tests starting from the requirements. At a certain point in the life of your project you will have a stable version in production (this expression has many definitions, but in general it means "used by someone other than you") and you will need to maintain it. This means that people will file bug reports and feature requests, and TDD gives you a clear strategy to deal with those.</p><p>From the TDD point of view both a bug and a missing feature are cases not currently covered by a test, so I will refer to them collectively as bugs, but don't forget that I'm talking about missing features as well. </p><p>The first thing you need to do is to write one or more tests that expose the bug. This way you can easily decide when the code that you wrote is correct or good enough. For example, let's assume that a user files an issue on the project <code>SimpleCalculator</code> saying: "The function <code>add</code> doesn't work with negative numbers". You should definitely try to get a concrete example from the user that wrote the issue and some information about the execution environment (as it is always possible that the problem comes from a different source, like for example an old version of a library your package relies on), but in the meantime you can come up with at least 3 tests: one that involves two negative numbers, one with a negative number as the first argument, and one with a negative number as the second argument.</p><p>You shouldn't write down all of them at once. Write the first test that you think might expose the issue and see if it fails. 
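</p><p>As a minimal sketch, the first candidate could be the test with two negative numbers. The class below is only a stand-in that lets the snippet run on its own, and it assumes that <code>add</code> sums its arguments with the built-in <code>sum</code>; in the project you would import the real <code>SimpleCalculator</code> instead. The test name is also made up for this example. With this stand-in the test passes, which means that it does not expose the reported problem.</p>

```python
class SimpleCalculator:
    """Stand-in for the project's class (assumption): add() sums its arguments."""

    def add(self, *args):
        return sum(args)


def test_add_two_negative_numbers():
    # First candidate test for the reported issue: both operands are negative.
    calculator = SimpleCalculator()

    result = calculator.add(-4, -5)

    assert result == -9
```

<p>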
If it doesn't, discard it and write a new one. From the TDD point of view, if you don't have a failing test there is no bug, so you have to come up with at least one test that exposes the issue you are trying to solve.</p><p>At this point you can move on and try to change the code. Remember that you shouldn't have more than one failing test at a time, so start doing this as soon as you discover a test case that shows there is a problem in the code.</p><p>Once you reach a point where the test suite passes without errors, stop and try to run the code in the environment where the bug was first discovered (for example sharing a branch with the user that created the ticket) and iterate the process.</p><h2 id="the-problem-of-types-2b1a">The problem of types<a class="headerlink" href="#the-problem-of-types-2b1a" title="Permanent link">¶</a></h2><p>Besides contributing to the TDD steps, Dmitry Labazkin asked some relevant questions about types, which I will summarise here. You can read his original questions in <a href="https://github.com/TheDigitalCatOnline/blog_source/issues/11">issue #11</a> and <a href="https://github.com/TheDigitalCatOnline/blog_source/issues/12">issue #12</a>.</p><p>The question of type checking is thorny, and since this is an introductory series I will discuss it briefly and give some pointers. Don't get me wrong, though. As I will say later, this is one of the most important topics we can discuss in computer science.</p><p>Overall the problem Dmitry raises is that operators like addition and multiplication are valid for types other than integers (like floats) and also non-numeric ones (like strings). In Python, it is possible to multiply a string by a number and obtain a concatenation of that number of copies of the original string. 
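</p><p>A quick sketch of this behaviour, using only built-in types (the variable names are made up for the example):</p>

```python
# Addition and multiplication work on integers and floats...
int_sum = 4 + 5
float_product = 1.5 * 2

# ...and on strings, where they mean concatenation and repetition.
joined = "abc" + "def"
repeated = "abc" * 3
```

<p>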
At the same time, however, subtraction and division are not defined for strings, so some of the questions we can ask are:</p><ul><li>can <code>SimpleCalculator</code> be used on non-integer numeric types?</li><li>can <code>SimpleCalculator</code> be used on non-numeric types?</li><li>shall we explicitly check in the code that the input values belong to a certain type?</li><li>shall we write tests to rule out other types?</li></ul><p>As I said, such questions are deceptively simple, so let's tackle them step by step.</p><p>Let's assume it makes sense for our class to work with numeric types. In Python there is no way to prevent a program from calling <code>SimpleCalculator().add("string1", "string2")</code>, which would fail as the current implementation uses the built-in function <code>sum</code> that doesn't work on strings (unless you call it with a specific initial value). However, calling <code>SimpleCalculator().mul("abc", 3)</code> would result in <code>"abcabcabc"</code>, as the internal implementation quietly supports strings.</p><p>Given the inconsistency, we might be tempted to rule out non-numeric types explicitly. In other words, we might want to add code to our calculator that <em>actively checks</em> if we are passing a non-numeric type. In that case we shall also add tests for those types, according to the TDD methodology, as no code can be added without tests.</p><p>The reason why this topic is thorny is that Python relies heavily on <em>polymorphism</em>, which means that it is more interested in the <em>behaviour</em> of an object than in its <em>nature</em>. In other words, an object can be considered a number because <em>it is an instance</em> of <code>int</code> or <code>float</code>, for example, but it could just be a class we made up that <em>behaves like</em> one of those types. 
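</p><p>As an illustration, here is a small sketch of a made-up class that is neither an <code>int</code> nor a <code>float</code> but still supports the operator <code>+</code>. The names <code>Metres</code> and <code>add</code> are hypothetical and are not part of the project.</p>

```python
class Metres:
    """Hypothetical class: it is not an int or a float, but it supports +."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Metres(self.value + other.value)


def add(a, b):
    # Relies only on behaviour: anything that supports + is accepted.
    return a + b


total = add(Metres(4), Metres(5))

# The result behaves like a number under +, yet it is not a built-in number.
is_builtin_number = isinstance(total, (int, float))
```

<p>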
Using Abstract Base Classes like <a href="https://docs.python.org/3/library/numbers.html">numbers</a> is useful to check if an object is an instance of one of the types encompassed by the hierarchy (again, types such as <code>int</code> and <code>float</code>) but doesn't automatically include everything that behaves like a number. We can create a class that behaves like <code>int</code> without belonging to the hierarchy of <code>numbers</code>.</p><p>Ultimately, this is the reason why Python programmers have to remember that the operator <code>+</code> can be used with types like <code>int</code>, <code>str</code>, and <code>list</code>, but cannot be used with dictionaries. Conversely, <code>len</code> can be used on dictionaries and lists, but cannot be used on integers. We need to remember it, as these operators are polymorphic (there is no operator <code>int+</code> or <code>float+</code>) but don't make sense or are not implemented for some types.</p><p>Those basic operators and functions raise an exception when the wrong type is passed, so we might be tempted to do the same and explicitly raise an exception when the wrong type is passed to <code>SimpleCalculator</code>. Again, the focus is on behaviour and implementation. If our implementation doesn't work with instances of certain classes an exception will occur already, and we don't need to raise one explicitly. The aforementioned snippet <code>SimpleCalculator().add("string1", "string2")</code> would raise a <code>TypeError</code> because the underlying <code>sum</code> doesn't like strings.</p><p>In conclusion, my answers to the questions above are:</p><p>Can <code>SimpleCalculator</code> be used on non-integer numeric types? Probably, given the implementation is not specific to integers, but if we want to be sure we should add some tests to expose the functionality. So far, according to TDD, the class is certified to work with integers only. 
In this case, I might want to add some tests to show that it works with floats. But if someone feeds the class float-like objects that for some reason do not support the operator <code>/</code>, some part of the calculator won't work, and there is no way to test all those conditions.</p><p>Can <code>SimpleCalculator</code> be used on non-numeric types? Yes, to a certain extent. <code>mul</code> can be used on sequences, for example. It is a calculator, though, so it doesn't make much sense to try to use it on non-numeric types. Users can feed the calculator any sort of non-numeric types and we cannot do anything to prevent it.</p><p>Shall we explicitly check in the code that the input values belong to a certain type? This goes against the nature of Python: if a certain function or method doesn't work with a specific type an exception will be raised.</p><p>Shall we write tests to rule out other types? Since it is basically impossible to write code that narrows the set of accepted types it is also impossible to write <em>useful</em> tests to check this. We can check that it doesn't work on strings, but what about other sequences? We can check it doesn't work with classes that inherit from <code>Sequence</code>, but what about classes that do not and behave the same?</p><p>In a dynamically typed language like Python, polymorphism and operator overloading are embedded in the language. I think the deeply polymorphic nature of Python is one of the most important aspects any user of this language should understand. It is an incredibly sharp double-edged sword, as it is at the same time extremely powerful and dangerous. "Everything is an object" might sound very simple at first, but it hides a degree of complexity that sooner or later has to be faced by those who want to be proficient with the language.</p><p>I wrote some posts that might help you to understand these topics. 
You can find them grouped <a href="https://www.thedigitalcatonline.com/blog/2020/04/26/object-oriented-programming-concepts-in-python/">here</a>.</p><h2 id="final-words-9803">Final words<a class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>I hope you found the project entertaining and that you can now appreciate the power of TDD. The journey doesn't end here, though. In the next post I will discuss the practice of writing unit tests in depth, and then introduce you to another powerful tool: mocks.</p><h2 id="updates-0083">Updates<a class="headerlink" href="#updates-0083" title="Permanent link">¶</a></h2><p>2021-01-03: <a href="https://github.com/4myhw">George</a> fixed a typo, thanks!</p><p>2023-09-03: <a href="https://github.com/labdmitriy">Dmitry Labazkin</a> provided a new test for the method <code>avg</code> and a better implementation. He also asked relevant questions about type checking that I addressed in a new section. Thanks Dmitry!</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>TDD in Python with pytest - Part 12020-09-10T10:30:00+02:002023-09-03T19:00:00+02:00Leonardo Giordanitag:www.thedigitalcatonline.com,2020-09-10:/blog/2020/09/10/tdd-in-python-with-pytest-part-1/<p>This series of posts comes directly from my book <a href="https://leanpub.com/clean-architectures-in-python">Clean Architectures in Python</a>. As I am reviewing the book to prepare a second edition, I realised that Harry Percival was right when he said that the initial part on TDD shouldn't be in the book. 
That's a prerequisite to follow the chapters on the clean architecture, but it is something many programmers already know and they might be surprised to find it in a book that discusses architectures.</p><p>So, I decided to move it here before I start working on a new version of the book. I also followed the advice of <a href="https://github.com/valorien">valorien</a>, who pointed out that the main example had some bad naming choices, and so I reworked the code.</p><h2 id="introduction-8835">Introduction<a class="headerlink" href="#introduction-8835" title="Permanent link">¶</a></h2><p>Test-Driven Development (TDD) is fortunately one of the names that I can spot most frequently when people talk about methodologies. Unfortunately, many programmers still do not follow it, fearing that it will impose a further burden on the already difficult life of a developer.</p><p>In this chapter I will try to outline the basic concept of TDD and to show you how your job as a programmer can greatly benefit from it. I will develop a very simple project to show how to practically write software following this methodology.</p><p>TDD is a <em>methodology</em>, something that can help you to create better code. But it is not going to solve all your problems. As with all methodologies you have to pay attention not to commit blindly to it. 
Try to understand the reasons why certain practices are suggested by the methodology and you will also understand when and why you can or have to be flexible.</p><p>Keep also in mind that testing is a broader concept that doesn't end with TDD, which focuses a lot on unit testing, a specific type of test that helps you to develop the API of your library/package. There are other types of tests, like integration or functional ones, that are not specifically part of the TDD methodology, strictly speaking, even though the TDD approach can be extended to any testing activity.</p><h2 id="a-real-life-example-5470">A real-life example<a class="headerlink" href="#a-real-life-example-5470" title="Permanent link">¶</a></h2><p>Let's start with a simple example taken from a programmer's everyday life.</p><p>The programmer is in the office with other colleagues, trying to nail down an issue in some part of the software. Suddenly the boss storms into the office, and addresses the programmer:</p><p><strong>Boss</strong>: I just met with the rest of the board. Our clients are not happy, we didn't fix enough bugs in the last two months.</p><p><strong>Programmer</strong>: I see. How many bugs did we fix?</p><p><strong>Boss</strong>: Well, not enough!</p><p><strong>Programmer</strong>: OK, so how many bugs do we have to fix every month?</p><p><strong>Boss</strong>: More!</p><p>I guess you feel very sorry for the poor programmer. Apart from the aggressive attitude of the boss, what is the real issue in this conversation? At the end of it there is no hint for the programmer and their colleagues about what to do next. They don't have any clue about what they have to change. They can definitely try to work harder, but the boss didn't refer to actual figures, so it will be definitely hard for the developers to understand if they improved "enough".</p><p>The classical <a href="https://en.wikipedia.org/wiki/Sorites_paradox">sorites paradox</a> may help to understand the issue. 
One of the standard formulations, taken from the Wikipedia page, is</p><div class="callout"><div class="content"><p>1,000,000 grains of sand is a heap of sand (Premise 1)</p>
<p>A heap of sand minus one grain is still a heap. (Premise 2)</p>
<p>So 999,999 grains is a heap of sand.</p>
<p>A heap of sand minus one grain is still a heap. (Premise 2)</p>
<p>So 999,998 grains is a heap of sand.</p>
<p>So one grain is a heap of sand.</p></div></div><p>Where is the issue? The concept expressed by the word "heap" is nebulous; it is not defined clearly enough to allow the process to find a stable point, or a solution.</p><p>When you write software you face that same challenge. You cannot conceive a function and just expect it "to work", because this is not clearly defined. How do you test if the function that you wrote "works"? What do you mean by "works"? TDD forces you to <strong>clearly state your goal</strong> before you write the code. Actually, the TDD mantra is "Test first, code later", which can be translated to "Goal first, solution later". We will shortly see a practical example of this.</p><p>For the time being, consider that this is a valid practice also outside the realm of software creation. Whoever runs a business knows that you need to be able to extract some numbers (KPIs) from the activity of your company, because it is by comparing those numbers with some predefined thresholds that you can easily tell if the business is healthy or not. KPIs are a form of test, and you have to define them in advance, according to the expectations or needs that you have. </p><p>Pay attention. Nothing prevents you from changing the thresholds as a reaction to external events. You may consider that, given the incredible heat wave that hit your country, the number of coats that your company sold could not reach the goal. So, because of a specific event, you can justify a change in the test (KPI). If you didn't have the test you would have just generically recorded that you earned less money.</p><p>Going back to software and TDD, following this methodology you are forced to state clear goals like</p><div class="code"><div class="content"><div class="highlight"><pre><span class="nb">sum</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span> <span class="o">==</span> <span class="mi">9</span>
</pre></div> </div> </div><p>Let me read this test for you: there will be a <code>sum</code> function available in the system that accepts two integers. If the two integers are 4 and 5 the function will return 9.</p><p>As you can see there are many things that are tested by this statement.</p><ul><li>The function exists and can be imported</li><li>The function accepts two integers</li><li>Passing 4 and 5 as inputs, the output of the function will be 9.</li></ul><p>Note that at this stage there is no code that implements the function <code>sum</code>, so the test will fail for sure.</p><p>As we will see with a practical example in the next chapter, what I explained in this section will become a set of rules of the methodology.</p><h2 id="a-simple-tdd-project-e470">A simple TDD project<a class="headerlink" href="#a-simple-tdd-project-e470" title="Permanent link">¶</a></h2><p>The project we are going to develop is available at <a href="https://github.com/lgiordani/simple_calculator">https://github.com/lgiordani/simple_calculator</a>.</p><p>This project is purposefully extremely simple. You don't need to be an experienced Python programmer to follow this chapter, but you need to know the basics of the language. The goal of this series of posts is not that of making you write the best Python code, but that of allowing you to learn the TDD workflow, so don't be too worried if your code is not perfect.</p><p>Methodologies are like sports or arts: you cannot learn them just by reading their description in a book. You have to practice them. Thus, you should avoid, as much as possible, just following this chapter and reading the code passively. Instead, you should try to write the code and to try new solutions to the problems that I discuss. This is very important, as it actually makes you use TDD. 
This way, at the end of the chapter you will have a personal experience of what TDD is like.</p><p>The repository is tagged, and at the end of each section you will find a link to the relative tag that contains <em>my</em> working solution. Please note that it is entirely possible your solution is different from mine: there are several aspects of coding, like for example style, that are not related to unit testing and TDD.</p><h2 id="setup-the-project-5c88">Setup the project<a class="headerlink" href="#setup-the-project-5c88" title="Permanent link">¶</a></h2><p>Clone the project repository and move to the branch <code>develop</code>. The branch <code>master</code> contains the full solution, and I use it to maintain the repository, but if you want to code along you need to start from scratch. I recommend you fork the repository on GitHub so that you are able to commit your changes.</p><div class="code"><div class="content"><div class="highlight"><pre>git clone https://github.com/YOURUSERNAME/simple_calculator
cd simple_calculator
git checkout --track origin/develop
</pre></div> </div> </div><p>Create a virtual environment following your preferred process and install the requirements</p><div class="code"><div class="content"><div class="highlight"><pre>pip install -r requirements/dev.txt
</pre></div> </div> </div><p>You should at this point be able to run</p><div class="code"><div class="content"><div class="highlight"><pre>pytest -svv
</pre></div> </div> </div><p>and get an output like</p><div class="code"><div class="content"><div class="highlight"><pre>================================ test session starts ===============================
platform XXXX -- Python XXXX, pytest-XXXX, py-XXXX, pluggy-XXXX -- XXXX
cachedir: .pytest_cache
rootdir: XXXX
configfile: XXXX
plugins: XXXX
collected 0 items
=============================== no tests ran in 0.02s ==============================
</pre></div> </div> </div><p>You can see here the operating system and a short list of the versions of the main packages involved in running pytest: Python, pytest itself, and some of its components and plugins. You can also see here where pytest is reading its configuration from. As this header is standard I will omit it from the output that I will show in the rest of the chapter. The specific versions of the packages are not important for this series.</p><h2 id="requirements-dd57">Requirements<a class="headerlink" href="#requirements-dd57" title="Permanent link">¶</a></h2><p>The goal of the project is to write a class <code>SimpleCalculator</code> that performs calculations: addition, subtraction, multiplication, and division. Addition and multiplication shall accept multiple arguments. Division shall return a float value, and division by zero shall return the string <code>"inf"</code>. Multiplication by zero must raise a <code>ValueError</code> exception. The class will also provide a function to compute the average of an iterable like a list. This function gets two optional upper and lower thresholds and should remove from the computation the values that fall outside these boundaries.</p><p>As you can see the requirements are pretty simple, and a couple of them are definitely not "good" requirements, like the behaviour of division and multiplication. I added those requirements for the sake of example, to show how to deal with exceptions when developing in TDD.</p><p>An interesting topic to discuss is that of data types: shall the calculator perform addition between integers or between floats? What about complex numbers, strings, and other items that can be "added" together? And what about the other operations? 
I consider this an advanced topic, in particular in Python, so for now I will consider only integers as inputs and discuss the problem of different types later in the series.</p><h2 id="step-1---adding-two-numbers-513b">Step 1 - Adding two numbers<a class="headerlink" href="#step-1---adding-two-numbers-513b" title="Permanent link">¶</a></h2><p>The first test we are going to write is one that checks if the class <code>SimpleCalculator</code> can perform an addition. Add the following code to the file <code>tests/test_main.py</code></p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">simple_calculator.main</span> <span class="kn">import</span> <span class="n">SimpleCalculator</span> <span class="callout">1</span>
<span class="k">def</span> <span class="nf">test_add_two_numbers</span><span class="p">():</span> <span class="callout">2</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">9</span>
</pre></div> </div> </div><p>As you can see the first thing we do is to import the class <code>SimpleCalculator</code> <span class="callout">1</span> that we are supposed to write. This class doesn't exist yet; don't worry, you didn't skip any step.</p><p>The test is a standard function <span class="callout">2</span> (this is how pytest works), and the function name shall begin with <code>test_</code> so that pytest can automatically discover all the tests. I tend to give my tests a descriptive name, so it is easier later to come back and understand what the test is about with a quick glance. You are free to follow the style you prefer but in general remember that naming components in a proper way is one of the most difficult things in programming. So it is better to get a handle on it as soon as possible.</p><p>The body of the test function is pretty simple. The class <code>SimpleCalculator</code> is instantiated, and the method <code>add</code> of the instance is called with two numbers, 4 and 5. The result is stored in the variable <code>result</code>, which is later the subject of the test itself. The statement <code>assert result == 9</code> first computes <code>result == 9</code> which is a boolean, with a value that is either <code>True</code> or <code>False</code>. The keyword <code>assert</code>, then, silently passes if the argument is <code>True</code>, but raises an <code>AssertionError</code> exception if it is <code>False</code>.</p><p>And this is how you write tests in pytest: if your code doesn't raise any exception the test passes, otherwise it fails. The keyword <code>assert</code> is used to force an exception in case of a wrong result. Remember that pytest doesn't consider the return value of the function, so it can detect a failure only if it raises an exception.</p><p>Save the file and go back to the terminal. 
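</p><p>The behaviour of <code>assert</code> described above can also be checked in isolation before running the suite. The following is only a minimal sketch: the helper <code>run_check</code> is a hypothetical name used for illustration and is not part of the project.</p>
<div class="code"><div class="content"><div class="highlight"><pre>
```python
# Illustrative sketch (not part of the project files): how a bare assert behaves.


def run_check(value):
    """Return 'passed' if the assertion holds, 'failed' if it raises."""
    try:
        # Same shape as the assertion in the test: the comparison is
        # evaluated first, then assert raises AssertionError if it is False.
        assert value == 9
    except AssertionError:
        return "failed"
    return "passed"


print(run_check(4 + 5))  # the comparison is True, so assert passes silently
print(run_check(4 + 4))  # the comparison is False, so assert raises
```
</pre></div> </div> </div>
<p>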
Execute <code>pytest -svv</code> and you should receive the following error message</p><div class="code"><div class="content"><div class="highlight"><pre>====================================== ERRORS ======================================
_______________________ ERROR collecting tests/test_main.py _______________________
[...]
tests/test_main.py:4: in &lt;module&gt;
from simple_calculator.main import SimpleCalculator
E ImportError: cannot import name 'SimpleCalculator' from 'simple_calculator.main'
!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!
============================== 1 error in 0.20 seconds =============================
</pre></div> </div> </div><p>No surprise here, actually, as we just tried to use something that doesn't exist. This is good: the test is showing us that something we suppose exists actually doesn't.</p><div class="callout"><div class="content"><p><strong>TDD rule number 1:</strong> Test first, code later</p></div></div><p>This, by the way, is not yet an error in a test. The error happens very early, during the test collection phase (as shown by the message in the bottom line <code>Interrupted: 1 errors during collection</code>). Even so, the methodology is still valid, as we wrote a test and it fails because of an error or a missing feature in the code.</p><p>Let's fix this issue. Open the file <code>simple_calculator/main.py</code> and add this code</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
    <span class="k">pass</span>
</pre></div> </div> </div><p>But, I hear you scream, this class doesn't implement any of the requirements that are in the project. Yes, this is the hardest lesson you have to learn when you start using TDD. The development of the code is ruled by the tests, not by the requirements. The requirements are used to write the tests, the tests are used to write the code. You shouldn't worry about something that is more than one level above the current one in this workflow.</p><div class="callout"><div class="content"><p><strong>TDD rule number 2:</strong> Add the reasonably minimum amount of code you need to pass the tests</p></div></div><p>Run the test again, and this time you should receive a different error, that is</p><div class="code"><div class="content"><div class="highlight"><pre>tests/test_main.py::test_add_two_numbers FAILED
===================================== FAILURES =====================================
______________________________ test_add_two_numbers _______________________________
def test_add_two_numbers():
calculator = SimpleCalculator()
> result = calculator.add(4, 5)
E AttributeError: 'SimpleCalculator' object has no attribute 'add'
tests/test_main.py:10: AttributeError
============================= 1 failed in 0.04 seconds =============================
</pre></div> </div> </div><p>This is the first proper pytest failure report that we receive. You see a list of files containing tests and the result of each test</p><div class="code"><div class="content"><div class="highlight"><pre>tests/test_main.py::test_add_two_numbers FAILED
</pre></div> </div> </div><p>Later we will see that the syntax <code>FILENAME::TESTNAME</code> can be given directly to pytest to run a single test. In this case we already have only one test, but later you might run a single failing test giving the name shown here on the command line. For example</p><div class="code"><div class="content"><div class="highlight"><pre>pytest -svv tests/test_main.py::test_add_two_numbers
</pre></div> </div> </div><p>The second part of the output shows details on the failing tests, if any</p><div class="code"><div class="content"><div class="highlight"><pre>______________________________ test_add_two_numbers _______________________________
def test_add_two_numbers():
calculator = SimpleCalculator()
> result = calculator.add(4, 5)
E AttributeError: 'SimpleCalculator' object has no attribute 'add'
tests/test_main.py:10: AttributeError
</pre></div> </div> </div><p>For each failing test, pytest shows a header with the name of the test and the part of the code that raised the exception. At the end of each box, pytest shows the line of the test file where the error happened.</p><p>Back to the project. The new error is no surprise, as the test uses the method <code>add</code> that wasn't defined in the class. I bet you already guessed what I'm going to do, didn't you? This is the code that you should add to the class</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="hll">    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span><span class="hll">        <span class="k">pass</span></span>
</pre></div> </div> </div><p>And again, as you notice, we made the smallest possible addition to the code to pass the test. Running pytest again you should receive a different error message</p><div class="code"><div class="content"><div class="highlight"><pre>_______________________________ test_add_two_numbers _______________________________
def test_add_two_numbers():
calculator = SimpleCalculator()
> result = calculator.add(4, 5)
E TypeError: add() takes 1 positional argument but 3 were given
tests/test_main.py:10: TypeError
</pre></div> </div> </div><p>The function we defined doesn't accept any argument other than <code>self</code> (<code>def add(self)</code>), but the call in the test (<code>calculator.add(4, 5)</code>) passes three of them. Remember that in Python the instance is passed implicitly as <code>self</code> when you call a method, so it counts as the first argument. Our move at this point is to change the function to accept the parameters that it is supposed to receive, namely two numbers. The code now becomes</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="hll">    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
</span>        <span class="k">pass</span>
</pre></div> </div> </div><p>Run the test again, and you will receive another error</p><div class="code"><div class="content"><div class="highlight"><pre>______________________________ test_add_two_numbers ________________________________
def test_add_two_numbers():
calculator = SimpleCalculator()
result = calculator.add(4, 5)
> assert result == 9
E assert None == 9
E -None
E +9
tests/test_main.py:12: AssertionError
</pre></div> </div> </div><p>The function returns <code>None</code>, as it doesn't contain any code, while the test expects it to return <code>9</code>. What do you think is the minimum code you can add to pass this test?</p><p>Well, the answer is</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="hll">        <span class="k">return</span> <span class="mi">9</span></span>
</pre></div> </div> </div><p>and this may surprise you (it should!). You might have been tempted to add some code that performs an addition between <code>a</code> and <code>b</code>, but this would violate the TDD principles, because you would have been driven by the requirements and not by the tests.</p><p>When you run pytest again, you will be rewarded by a success message</p><div class="code"><div class="content"><div class="highlight"><pre>tests/test_main.py::test_add_two_numbers PASSED
</pre></div> </div> </div><p>I know this sounds weird, but think about it for a moment: if your code works (that is, it passes the tests), you don't need to change anything, as your tests should specify everything the code should do. Maybe in the future you will discover that this solution is not good enough, and at that point you will have to change it (this will happen with the next test, in this case). But for now everything works, and you shouldn't implement more than this.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-1-adding-two-numbers">step-1-adding-two-numbers</a></p><h2 id="step-2---adding-three-numbers-c8d7">Step 2 - Adding three numbers<a class="headerlink" href="#step-2---adding-three-numbers-c8d7" title="Permanent link">¶</a></h2><p>The requirements state that "Addition and multiplication shall accept multiple arguments". This means that we should be able to execute not only <code>add(4, 5)</code> like we did, but also <code>add(4, 5, 11)</code>, <code>add(4, 5, 11, 2)</code>, and so on. We can start testing this behaviour with the following test, which you should put in <code>tests/test_main.py</code>, after the previous test that we wrote.</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_add_three_numbers</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">15</span>
</pre></div> </div> </div><p>This test fails when we run the test suite</p><div class="code"><div class="content"><div class="highlight"><pre>_____________________________ test_add_three_numbers _______________________________
def test_add_three_numbers():
calculator = SimpleCalculator()
> result = calculator.add(4, 5, 6)
E TypeError: SimpleCalculator.add() takes 3 positional arguments but 4 were given
tests/test_main.py:18: TypeError
</pre></div> </div> </div><p>for the obvious reason that the function we wrote in the previous section accepts only 2 arguments other than <code>self</code>. What is the minimum code that you can write to fix this test?</p><p>Well, the simplest solution is to add another argument, so my first attempt is</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">):</span>
        <span class="k">return</span> <span class="mi">9</span>
</pre></div> </div> </div><p>which solves the previous error, but creates a new one. If that wasn't enough, it also makes the first test fail!</p><div class="code"><div class="content"><div class="highlight"><pre>______________________________ test_add_two_numbers ________________________________
def test_add_two_numbers():
calculator = SimpleCalculator()
> result = calculator.add(4, 5)
E TypeError: SimpleCalculator.add() missing 1 required positional argument: 'c'
tests/test_main.py:10: TypeError
_____________________________ test_add_three_numbers _______________________________
def test_add_three_numbers():
calculator = SimpleCalculator()
result = calculator.add(4, 5, 6)
> assert result == 15
E assert 9 == 15
tests/test_main.py:20: AssertionError
</pre></div> </div> </div><p>The first test now fails because the new <code>add</code> method requires three arguments and we are passing only two. The second test fails because the method <code>add</code> returns <code>9</code> and not <code>15</code> as expected by the test.</p><p>When multiple tests fail it's easy to feel discouraged and lost. Where are you supposed to start fixing this? Well, one possible solution is to undo the previous change and to try a different solution, but in general you should try to get to a situation in which only one test fails.</p><div class="callout"><div class="content"><p><strong>TDD rule number 3:</strong> You shouldn't have more than one failing test at a time</p></div></div><p>This is very important as it allows you to focus on one single test and thus one single problem. Clearly, we need to keep an eye on the global problem that we are trying to solve, but real test batteries can contain hundreds of tests and it is not practical to try to tackle all of them together.</p><p>Commenting out tests to make them inactive is a perfectly valid way to have only one failing test. Pytest, however, has a smarter solution: you can use the option <code>-k</code>, which allows you to select the tests whose names match a given expression. That option has a lot of expressive power, but for now we can just give it the name of the test that we want to run</p><div class="code"><div class="content"><div class="highlight"><pre>pytest -svv -k test_add_two_numbers
</pre></div> </div> </div><p>This option allows you to select multiple tests that share the same prefix, for example. If you want to run a single specific test you can also name it on the command line with the syntax we discussed previously</p><div class="code"><div class="content"><div class="highlight"><pre>pytest -svv tests/test_main.py::test_add_two_numbers
</pre></div> </div> </div><p>Either way, pytest will run only the first test and return the same result returned before, since we didn't change the test itself</p><div class="code"><div class="content"><div class="highlight"><pre>______________________________ test_add_two_numbers ________________________________
def test_add_two_numbers():
calculator = SimpleCalculator()
> result = calculator.add(4, 5)
E TypeError: SimpleCalculator.add() missing 1 required positional argument: 'c'
tests/test_main.py:10: TypeError
</pre></div> </div> </div><p>To fix this error we can obviously revert the addition of the third argument, but this would mean going back to the previous solution. Obviously tests focus on a very small part of the code, but we have to keep in mind what we are doing in terms of the big picture. A better solution is to add a default value to the third argument. The additive identity is <code>0</code>, so the new code of the method <code>add</code> is</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="hll">    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
</span>        <span class="k">return</span> <span class="mi">9</span>
</pre></div> </div> </div><p>And this makes the first test pass. At this point we can run the full suite with <code>pytest -svv</code> and see what happens</p><div class="code"><div class="content"><div class="highlight"><pre>_____________________________ test_add_three_numbers ______________________________
def test_add_three_numbers():
calculator = SimpleCalculator()
result = calculator.add(4, 5, 6)
> assert result == 15
E assert 9 == 15
tests/test_main.py:20: AssertionError
</pre></div> </div> </div><p>The second test still fails, because the returned value that we hard coded doesn't match the expected one. At this point the tests show that our previous solution (<code>return 9</code>) is not sufficient anymore, and we have to try to implement something more complex.</p><p>I want to stress this. You should implement the minimal change in the code that makes tests pass. If that solution is not enough there will be a test that shows it. Now, as you can see, the addition of a new requirement changes the tests, adding a new one, and the old solution is not sufficient any more.</p><p>How can we solve this? We know that writing <code>return 15</code> will make the first test fail (you may try, if you want), so here we have to be a bit smarter and try a better solution, that in this case is actually to implement a real sum</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
<span class="hll">        <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span> <span class="o">+</span> <span class="n">c</span></span>
</pre></div> </div> </div><p>This solution makes both tests pass, so the entire suite runs without errors.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-2-adding-three-numbers">step-2-adding-three-numbers</a></p><p>I can see your face: you are probably frowning at the fact that it took us 10 minutes to write a method that performs the addition of two or three numbers. On the one hand, keep in mind that I'm going at a very slow pace, this being an introduction, and for these first tests it is better to take the time to properly understand every single step. Later, when you are used to TDD, some of these steps will be implicit. On the other hand, TDD <em>is</em> slower than untested development, but the time that you invest writing tests now is usually negligible compared to the amount of time you would spend trying to identify and fix bugs later.</p><h2 id="step-3---adding-multiple-numbers-5bb3">Step 3 - Adding multiple numbers<a class="headerlink" href="#step-3---adding-multiple-numbers-5bb3" title="Permanent link">¶</a></h2><p>The requirements are not yet satisfied, however, as they mention "multiple" numbers and not just three. How can we test that we can add a generic amount of numbers? We might add a <code>test_add_four_numbers</code>, a <code>test_add_five_numbers</code>, and so on, but this will cover specific cases and will never cover all of them. Sad to say, it is impossible to test that generic condition or, at least in this case, so complex that it is not worth trying.</p><p>What you shall do in TDD is to test boundary cases. In general you should always try to find the so-called "corner cases" of your algorithm and write tests that show that the code covers them. 
For example, if you are testing some code that accepts as input a number from 1 to 100, you need a test that runs it with a generic number like 42 (which is far from being generic, but don't panic!), but you definitely want to have a specific test that runs the algorithm with the number 1 and one that runs with the number 100. You also want to have tests that show the algorithm doesn't work with 0 and with 101, but we will talk later about testing error conditions.</p><p>In our example there is no real limitation to the number of arguments that you pass to your function. Before Python 3.7 there was a limit of 255 arguments, which has been removed in that version of the language, but these are limitations enforced by an external system, and they are not real boundaries of your algorithm.</p><p>The definition of "external system" obviously depends on what you are testing. If you are implementing a programming language you want to have tests that show how many arguments you can pass to a function, or that check the amount of memory used by certain language features. In this case we accept the Python language as the environment in which we work, so we don't want to test its features.</p><p>The solution, in this case, might be to test a reasonably high number of input arguments, to check that everything works. In particular, we should try to keep in mind that our goal is to devise a solution that is as generic as possible. 
For example, we easily realise that we cannot come up with a function like</p><div class="code"><div class="content"><div class="highlight"><pre> <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">d</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">e</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">f</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">g</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">h</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">):</span>
</pre></div> </div> </div><p>as it is not <em>generic</em>, it is just covering a greater amount of inputs (9, in this case, but not 10 or more).</p><p>That said, a good test might be the following</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_add_many_numbers</span><span class="p">():</span>
    <span class="n">numbers</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="o">*</span><span class="n">numbers</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">4950</span>
</pre></div> </div> </div><p>which creates a sequence (strictly speaking a <code>range</code> object, which is an iterable) of all the numbers from 0 to 99. The sum of all those numbers is 4950, which is what the algorithm shall return.</p><p>Please note that the assertion doesn't implement any algorithm to find the solution. I calculated the answer manually and hard coded it in the test. You should try as much as possible to minimise the algorithmic complexity of tests and just "state the facts" instead. The reason is simple: the more complex the code of the test is, the higher the chances of introducing a bug <em>in the test</em>.</p><p>The test suite fails because we are giving the function too many arguments</p><div class="code"><div class="content"><div class="highlight"><pre>______________________________ test_add_many_numbers _______________________________
def test_add_many_numbers():
numbers = range(100)
calculator = SimpleCalculator()
> result = calculator.add(*numbers)
E TypeError: SimpleCalculator.add() takes from 3 to 4 positional arguments but 101 were given
tests/test_main.py:28: TypeError
</pre></div> </div> </div><p>The minimum amount of code that we can add, this time, will not be so trivial, as we have to pass three tests. This is actually the greatest advantage of TDD: the tests that we wrote are still there and will check that the previous conditions are still satisfied. And since tests are committed with the code they will always be there.</p><p>The Python way to support a generic number of arguments (technically called <em>variadic functions</em>) is through the use of the syntax <code>*args</code>, which stores in <code>args</code> a tuple that contains all the arguments.</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">args</span><span class="p">)</span>
</pre></div> </div> </div><p>At this point we can use the built-in function <code>sum</code> to sum all the arguments. This solution makes the whole test suite pass without errors, so it is correct.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-3-adding-multiple-numbers">step-3-adding-multiple-numbers</a></p><p>Pay attention here, please. In TDD, a solution is not correct when it is beautiful, when it is smart, or when it uses the latest feature of the language. All these things are good, but TDD wants your code to pass the tests. So, your code might be ugly, convoluted, and slow, but if it passes the test it is correct. This in turn means that TDD doesn't cover all the needs of your software project. Delivering fast routines, for example, might be part of the advantage you have over your competitors, but it is not really testable with the TDD methodology (typically, performance testing is done in a completely different way).</p><p>Part of the TDD methodology, then, deals with "refactoring", which means changing the code in a way that doesn't change the outputs, which in turn means that all your tests keep passing. Once you have a proper test suite in place, you can focus on the beauty of the code, or you can introduce smart solutions according to what the language allows you to do. We will discuss refactoring further later in this post.</p><div class="callout"><div class="content"><p><strong>TDD rule number 4:</strong> Write code that passes the test. Then refactor it.</p></div></div><h2 id="step-4---subtraction-952c">Step 4 - Subtraction<a class="headerlink" href="#step-4---subtraction-952c" title="Permanent link">¶</a></h2><p>From the requirements we know that we have to implement a function to subtract numbers, but they don't mention multiple arguments (as it would be complex to define what subtracting 3 or more numbers actually means). 
The test that implements this requirement is</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_subtract_two_numbers</span><span class="p">():</span>
    <span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">7</span>
</pre></div> </div> </div><p>which doesn't pass with the following error</p><div class="code"><div class="content"><div class="highlight"><pre>____________________________ test_subtract_two_numbers ____________________________
def test_subtract_two_numbers():
calculator = SimpleCalculator()
> result = calculator.sub(10, 3)
E AttributeError: 'SimpleCalculator' object has no attribute 'sub'
tests/test_main.py:36: AttributeError
</pre></div> </div> </div><p>Now that you understand the TDD process, and know that you should avoid over-engineering, you can also skip some of the steps that we went through in the previous sections. A good solution for this test is</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre>    <span class="k">def</span> <span class="nf">sub</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">a</span> <span class="o">-</span> <span class="n">b</span>
</pre></div> </div> </div><p>which makes the test suite pass.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-4-subtraction">step-4-subtraction</a></p><h2 id="step-5---multiplication-20bb">Step 5 - Multiplication<a class="headerlink" href="#step-5---multiplication-20bb" title="Permanent link">¶</a></h2><p>It's time to move to multiplication, which has many similarities to addition. The requirements state that we have to provide a function to multiply numbers and that this function shall allow us to multiply multiple arguments. In TDD you should try to tackle problems one by one, possibly dividing a bigger requirement into multiple smaller ones.</p><p>In this case the first test can be the multiplication of two numbers, as it was for addition.</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_mul_two_numbers</span><span class="p">():</span>
<span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">mul</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">24</span>
</pre></div> </div> </div><p>And the test suite fails as expected with the following error</p><div class="code"><div class="content"><div class="highlight"><pre>______________________________ test_mul_two_numbers _______________________________
def test_mul_two_numbers():
calculator = SimpleCalculator()
> result = calculator.mul(6, 4)
E AttributeError: 'SimpleCalculator' object has no attribute 'mul'
tests/test_main.py:44: AttributeError
</pre></div> </div> </div><p>We now face a classic TDD dilemma. Shall we implement the solution to this test as a function that multiplies two numbers, knowing that the next test will invalidate it, or shall we accept that the real target is a variadic function and thus use <code>*args</code> directly?</p><p>In this case the choice is not really important, as we are dealing with very simple functions. In other cases, however, it might be worth recognising that we are facing the same issue we solved in a similar case and try to implement a smarter solution from the very beginning. In general, however, you should not implement anything that you don't plan to test in one of the next few tests that you will write.</p><p>If we decide to follow strict TDD, that is, to implement the simplest solution first, the bare minimum code that passes the test would be</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre> <span class="k">def</span> <span class="nf">mul</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="k">return</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-5-multiply-two-numbers">step-5-multiply-two-numbers</a></p><p>To show you how to deal with redundant tests I will in this case choose the second path, and implement a smarter solution for the present test. Keep in mind, however, that it is perfectly correct to implement the solution shown above and then move on and try to solve the problem of multiple arguments later.</p><p>The problem of multiplying a tuple of numbers can be solved in Python using the function <code>reduce</code>. This function implements a typical algorithm that "reduces" an array to a single number, applying a given function. The algorithm steps are the following</p><ol><li>Apply the function to the first two elements</li><li>Remove the first two elements from the array</li><li>Apply the function to the result of the previous step and to the first element of the array</li><li>Remove the first element</li><li>If there are still elements in the array go back to step 3</li></ol><p>So, suppose the function is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">mul2</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="k">return</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
</pre></div> </div> </div><p>and the array is</p><div class="code"><div class="content"><div class="highlight"><pre><span class="n">a</span> <span class="o">=</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">3</span><span class="p">]</span>
</pre></div> </div> </div><p>The steps followed by the algorithm will be</p><ol><li>Apply the function to 2 and 6 (first two elements). The result is <code>2 * 6</code>, that is 12</li><li>Remove the first two elements, the array is now <code>a = [4, 8, 3]</code></li><li>Apply the function to 12 (result of the previous step) and 4 (first element of the array). The new result is <code>12 * 4</code>, that is 48</li><li>Remove the first element, the array is now <code>a = [8, 3]</code></li><li>Apply the function to 48 (result of the previous step) and 8 (first element of the array). The new result is <code>48 * 8</code>, that is 384</li><li>Remove the first element, the array is now <code>a = [3]</code></li><li>Apply the function to 384 (result of the previous step) and 3 (first element of the array). The new result is <code>384 * 3</code>, that is 1152</li><li>Remove the first element, the array is now empty and the procedure ends</li></ol><p>Going back to our class <code>SimpleCalculator</code>, we can import <code>reduce</code> from the module <code>functools</code> and use it on the tuple <code>args</code>. We need to provide the function to apply, which we can define inside the function <code>mul</code> itself.</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">reduce</span>
<span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">mul</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">mul2</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
<span class="k">return</span> <span class="n">a</span> <span class="o">*</span> <span class="n">b</span>
<span class="k">return</span> <span class="n">reduce</span><span class="p">(</span><span class="n">mul2</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-5-multiply-two-numbers-smart">step-5-multiply-two-numbers-smart</a></p><p>More information about the algorithm <code>reduce</code> can be found on the MapReduce Wikipedia page <a href="https://en.wikipedia.org/wiki/MapReduce">https://en.wikipedia.org/wiki/MapReduce</a>. The Python function documentation can be found at <a href="https://docs.python.org/3.10/library/functools.html#functools.reduce">https://docs.python.org/3.10/library/functools.html#functools.reduce</a>.</p><p>The above code makes the test suite pass, so we can move on and address the next problem. As happened with addition we cannot properly test that the function accepts a potentially infinite number of arguments, so we can test a reasonably high number of inputs.</p><div class="code"><div class="title"><code>tests/test_main.py</code></div><div class="content"><div class="highlight"><pre><span class="k">def</span> <span class="nf">test_mul_many_numbers</span><span class="p">():</span>
<span class="n">numbers</span> <span class="o">=</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
<span class="n">calculator</span> <span class="o">=</span> <span class="n">SimpleCalculator</span><span class="p">()</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">mul</span><span class="p">(</span><span class="o">*</span><span class="n">numbers</span><span class="p">)</span>
<span class="k">assert</span> <span class="n">result</span> <span class="o">==</span> <span class="mi">362880</span>
</pre></div> </div> </div><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-5-multiply-many-numbers">step-5-multiply-many-numbers</a></p><p>We might use 100 arguments as we did with addition, but the multiplication of all numbers from 1 to 100 gives a result with 158 digits and I don't really need to clutter the tests file with such a monstrosity. As I said, testing multiple arguments is testing a boundary, and the idea is that if the algorithm works for 2 numbers and for 10 it will work for 10 thousand arguments as well.</p><p>If we run the test suite now all tests pass, and <em>this should worry you</em>.</p><p>Yes, you shouldn't be happy. When you follow TDD each new test that you add should fail. If it doesn't fail you should ask yourself if it is worth adding that test or not. This is because chances are that you are adding a useless test and we don't want to add useless code, because code has to be maintained, so the less the better.</p><p>In this case, however, we know why the test already passes. We implemented a smarter algorithm as a solution for the first test knowing that we would end up trying to solve a more generic problem. And the value of this new test is that it shows that multiple arguments can be used, while the first test doesn't.</p><p>So, after these considerations, we can be happy that the second test already passes.</p><div class="callout"><div class="content"><p><strong>TDD rule number 5:</strong> A test should fail the first time you run it. If it doesn't, ask yourself why you are adding it.</p></div></div><h2 id="step-6---refactoring-b6bd">Step 6 - Refactoring<a class="headerlink" href="#step-6---refactoring-b6bd" title="Permanent link">¶</a></h2><p>Previously, I introduced the concept of refactoring, which means changing the code without altering the results. How can you be sure you are not altering the behaviour of your code? Well, this is what the tests are for. 
If the new code keeps passing the test suite you can be sure that you didn't remove any feature.</p><p>In theory, refactoring shouldn't add any new behaviour to the code either, as it should be a behaviour-preserving transformation. There is no real practical way to check this, and we will not bother with it now. You should be concerned with this if you are discussing security, as your code shouldn't add any entry point you don't want to be there. In this case you will need tests that check the absence of features instead of their presence.</p><p>This means that if you have no tests you shouldn't refactor. But, after all, if you have no tests you shouldn't have any code, either, so refactoring shouldn't be a problem you have. If you have some code without tests (I know you have it, I do), you should seriously consider writing tests for it, at least before changing it. More on this in a later section.</p><p>For the time being, let's see if we can work on the code of the class <code>SimpleCalculator</code> without altering the results. I do not really like the definition of the function <code>mul2</code> inside the function <code>mul</code>. It is obviously perfectly fine and valid, but for the sake of example I will pretend we have to get rid of it.</p><p>Python provides a useful function to multiply two objects in the module <code>operator</code> of the standard library</p><div class="code"><div class="title"><code>simple_calculator/main.py</code></div><div class="content"><div class="highlight"><pre><span class="kn">import</span> <span class="nn">operator</span>
<span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">reduce</span>
<span class="k">class</span> <span class="nc">SimpleCalculator</span><span class="p">:</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
<span class="k">def</span> <span class="nf">mul</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
<span class="k">return</span> <span class="n">reduce</span><span class="p">(</span><span class="n">operator</span><span class="o">.</span><span class="n">mul</span><span class="p">,</span> <span class="n">args</span><span class="p">)</span>
</pre></div> </div> </div><p>Running the test suite I can see that all the tests pass, so my refactoring is correct.</p><p><strong>Git tag:</strong> <a href="https://github.com/lgiordani/simple_calculator/tree/step-6-refactoring">step-6-refactoring</a></p><div class="callout"><div class="content"><p><strong>TDD rule number 6:</strong> Never refactor without tests.</p></div></div><h2 id="final-words-9803">Final words<a class="headerlink" href="#final-words-9803" title="Permanent link">¶</a></h2><p>Well, I think we learned a lot. We started with no knowledge of TDD and we managed to implement a fully tested class with 3 methods. We also briefly touched on the topic of refactoring, which is of paramount importance in development. In the next post I will cover the remaining requirements: division, testing exceptions, and the average function.</p><h2 id="updates-0083">Updates<a class="headerlink" href="#updates-0083" title="Permanent link">¶</a></h2><p>2021-01-03: <a href="https://github.com/4myhw">George</a> fixed a typo, thanks!</p><p>2021-08-11: <a href="https://github.com/floatingpurr">Andrea Mignone</a> fixed a link. Thank you!</p><p>2023-09-03: <a href="https://github.com/labdmitriy">Dmitry Labazkin</a> and <a href="https://github.com/blablatdinov">Ilaletdinov Almaz</a> suggested using <code>operator.mul</code> instead of a <code>lambda</code> in the final refactoring. Thanks both!</p><h2 id="feedback-d845">Feedback<a class="headerlink" href="#feedback-d845" title="Permanent link">¶</a></h2><p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. 
The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>Refactoring with tests in Python: a practical example2017-07-21T09:30:00+01:002017-07-21T09:30:00+01:00Leonardo Giordanitag:www.thedigitalcatonline.com,2017-07-21:/blog/2017/07/21/refactoring-with-test-in-python-a-practical-example/<p>A step-by-step review of a refactoring session of Python code, using TDD</p><p>This post contains a step-by-step example of a refactoring session guided by tests. When dealing with untested or legacy code, refactoring is dangerous, and tests can help us do it the right way, minimizing the number of bugs we introduce and possibly avoiding them completely.</p>
<p>Refactoring is not easy. It requires a double effort to understand code that others wrote, or that we wrote in the past, and moving around parts of it, simplifying it, in one word <strong>improving</strong> it, is by no means something for the faint-hearted. Like programming, refactoring has its rules and best practices, but it can be described as a mixture of technique, intuition, experience, risk.</p>
<p>Programming, after all, is craftsmanship.</p>
<h2 id="the-starting-point">The starting point<a class="headerlink" href="#the-starting-point" title="Permanent link">¶</a></h2>
<p>The simple use case I will use for this post is that of a service API that we can access, and that produces data in JSON format, namely a <strong>list</strong> of elements like the one shown here</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">"age"</span><span class="p">:</span><span class="w"> </span><span class="mi">20</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"surname"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Frazier"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"John"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"salary"</span><span class="p">:</span><span class="w"> </span><span class="s2">"£28943"</span>
<span class="p">}</span>
</code></pre></div>
<p>Once we convert this to a Python data structure we obtain a list of dictionaries, where <code>'age'</code> is an integer, and the remaining fields are strings.</p>
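<p>As a minimal sketch of that conversion, assuming the response body is available as a string (the name <code>payload</code> is hypothetical and not part of the original project), the standard <code>json</code> module produces exactly this kind of structure:</p>

```python
import json

# Hypothetical response body: a JSON list containing the element shown above
payload = '[{"age": 20, "surname": "Frazier", "name": "John", "salary": "£28943"}]'

data = json.loads(payload)

# The result is a list of dictionaries
first = data[0]
age_type = type(first["age"])        # int
salary_type = type(first["salary"])  # str, currency symbol included
```

<p>Note that the salary keeps its currency symbol, which is why the code shown later has to strip the first character before converting it to an integer.</p>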
<p>Someone then wrote a class that computes some statistics on the input data. This class, called <code>DataStats</code>, provides a single method <code>stats()</code>, whose inputs are the data returned by the service (in JSON format), and two integers called <code>iage</code> and <code>isalary</code>. Those, according to the short documentation of the class, are the initial age and the initial salary used to compute the average yearly increase of the salary on the whole dataset.</p>
<p>The code is the following</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">math</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="k">class</span> <span class="nc">DataStats</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
<span class="c1"># iage and isalary are the starting age and salary used to</span>
<span class="c1"># compute the average yearly increase of salary.</span>
<span class="c1"># Compute average yearly increase</span>
<span class="n">average_age_increase</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span>
<span class="nb">sum</span><span class="p">([</span><span class="n">e</span><span class="p">[</span><span class="s1">'age'</span><span class="p">]</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span> <span class="o">-</span> <span class="n">iage</span>
<span class="n">average_salary_increase</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span>
<span class="nb">sum</span><span class="p">([</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span> <span class="o">-</span> <span class="n">isalary</span>
<span class="n">yearly_avg_increase</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span>
<span class="n">average_salary_increase</span><span class="o">/</span><span class="n">average_age_increase</span><span class="p">)</span>
<span class="c1"># Compute max salary</span>
<span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
<span class="n">threshold</span> <span class="o">=</span> <span class="s1">'£'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="n">salaries</span><span class="p">))</span>
<span class="n">max_salary</span> <span class="o">=</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span> <span class="n">threshold</span><span class="p">]</span>
<span class="c1"># Compute min salary</span>
<span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
<span class="n">min_salary</span> <span class="o">=</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span>
<span class="s1">'£</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">min</span><span class="p">(</span><span class="n">salaries</span><span class="p">)))]</span>
<span class="k">return</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">({</span>
<span class="s1">'avg_age'</span><span class="p">:</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="nb">sum</span><span class="p">([</span><span class="n">e</span><span class="p">[</span><span class="s1">'age'</span><span class="p">]</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)),</span>
<span class="s1">'avg_salary'</span><span class="p">:</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span>
<span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)),</span>
<span class="s1">'avg_yearly_increase'</span><span class="p">:</span> <span class="n">yearly_avg_increase</span><span class="p">,</span>
<span class="s1">'max_salary'</span><span class="p">:</span> <span class="n">max_salary</span><span class="p">,</span>
<span class="s1">'min_salary'</span><span class="p">:</span> <span class="n">min_salary</span>
<span class="p">})</span>
</code></pre></div>
<h2 id="the-goal">The goal<a class="headerlink" href="#the-goal" title="Permanent link">¶</a></h2>
<p>It is fairly easy, even for the untrained eye, to spot some issues in the previous class. A list of the most striking ones is</p>
<ul>
<li>The class exposes a single method and has no <code>__init__()</code>, thus the same functionality could be provided by a single function.</li>
<li>The <code>stats()</code> method is too big, and performs too many tasks. This makes debugging very difficult, as there is a single inextricable piece of code that does everything.</li>
<li>There is a lot of code duplication, or at least several lines that are very similar. Most notably the two operations <code>'£' + str(max(salaries))</code> and <code>'£{}'.format(str(min(salaries)))</code>, the two different lines starting with <code>salaries =</code>, and the several list comprehensions.</li>
</ul>
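<p>To make the last point concrete, here is a small sketch (the list of salaries is hypothetical) showing that the two formatting idioms build the very same string, which suggests they could share a single implementation:</p>

```python
salaries = [27888, 67137, 70472]  # hypothetical values

# The two idioms used in stats() to rebuild the salary string
with_concatenation = '£' + str(max(salaries))
with_format = '£{}'.format(str(max(salaries)))

# Both expressions produce the same result
same_result = with_concatenation == with_format
```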
<p>So, since we are going to use this code in some part of our Amazing New Project™, we would like to fix these issues.</p>
<p>The class, however, is working perfectly. It has been used in production for many years and there are no known bugs, so our operation has to be a <strong>refactoring</strong>, which means that we want to write something better, preserving the behaviour of the previous object.</p>
<h2 id="the-path">The path<a class="headerlink" href="#the-path" title="Permanent link">¶</a></h2>
<p>In this post I want to show you how you can safely refactor such a class using tests. This is different from TDD, but the two are closely related. The class we have has not been created using TDD, as there are no tests, but we can use tests to ensure its behaviour is preserved. This should therefore be called Test Driven Refactoring (TDR).</p>
<p>The idea behind TDR is pretty simple. First, we have to write a test that checks the behaviour of some code, possibly a small part with a clearly defined scope and output. This is a posthumous (or late) unit test, and it simulates what the author of the code should have provided (cough cough, it was you some months ago...).</p>
<p>Once you have your unit test you can go and modify the code, knowing that the behaviour of the resulting object will be the same as that of the previous one. As you can easily understand, the effectiveness of this methodology depends strongly on the quality of the tests themselves, possibly more than when developing with TDD, and this is why refactoring is hard.</p>
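<p>The idea can be sketched with a deliberately tiny example. The function <code>legacy_average</code> below is hypothetical and stands in for any untested code that is assumed to work in production; the test simply records what the code does today.</p>

```python
def legacy_average(values):
    # Hypothetical legacy code: untested, but assumed to work in production
    return int(sum(values) / len(values))


def test_legacy_average():
    # The expected value is the output recorded from the current
    # implementation, not a value derived from a specification
    assert legacy_average([68, 49, 70]) == 62
```

<p>With such a test in place, any rewrite of the body of <code>legacy_average</code> has to keep the test green, and this is the safety net that the refactoring relies on.</p>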
<h2 id="caveats">Caveats<a class="headerlink" href="#caveats" title="Permanent link">¶</a></h2>
<p>Two remarks before we start our refactoring. The first is that such a class could easily be refactored to some functional code. As you will be able to infer from the final result there is no real reason to keep an object-oriented approach for this code. I decided to go that way, however, as it gave me the possibility to show a design pattern called wrapper, and the refactoring technique that leverages it.</p>
<p>The second remark is that in pure TDD it is strongly advised not to test internal methods, that is those methods that do not form the public API of the object. In general, we identify such methods in Python by prefixing their name with an underscore, and the reason not to test them is that TDD wants you to shape objects according to the object-oriented programming methodology, which considers objects as <strong>behaviours</strong> and not as <strong>structures</strong>. Thus, we are only interested in testing public methods.</p>
<p>It is also true, however, that sometimes even though we do not want to make a method public, that method contains some complex logic that we want to test. So, in my opinion the TDD advice should sound like "Test internal methods only when they contain some non-trivial logic".</p>
<p>When it comes to refactoring, however, we are somehow deconstructing a previously existing structure, and usually we end up creating a lot of private methods to help extract and generalise parts of the code. My advice in this case is to test those methods, as this gives you a higher degree of confidence in what you are doing. With experience you will then learn which tests are required and which are not.</p>
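<p>As an illustration only, the following sketch shows what testing a private helper looks like. The class <code>SalaryReport</code> and its method <code>_to_number</code> are hypothetical names, not the ones used in the repository.</p>

```python
class SalaryReport:
    # Hypothetical class, used only to illustrate the point

    def _to_number(self, salary):
        # Private helper extracted during a refactoring:
        # strips the currency symbol and converts the rest to int
        return int(salary[1:])

    def highest(self, salaries):
        # Public method that relies on the private helper
        return max(self._to_number(s) for s in salaries)


def test_to_number_strips_currency_symbol():
    # The test calls the private method directly
    report = SalaryReport()
    assert report._to_number("£28943") == 28943
```

<p>The leading underscore only signals intent, so nothing prevents the test from calling the method; whether the test is worth keeping once the refactoring is over is a separate decision.</p>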
<h2 id="setup-of-the-testing-environment">Setup of the testing environment<a class="headerlink" href="#setup-of-the-testing-environment" title="Permanent link">¶</a></h2>
<p>Clone <a href="https://github.com/lgiordani/datastats">this repository</a> and create a virtual environment. Activate it and install the required packages with </p>
<div class="highlight"><pre><span></span><code>pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements.txt
</code></pre></div>
<p>The repository already contains a configuration file for pytest and you should customise it to avoid entering your virtual environment directory. Go and fix the <code>norecursedirs</code> parameter in that file, adding the name of the virtual environment you just created; I usually name my virtual environments with a <code>venv</code> prefix, and this is why that variable contains the entry <code>venv*</code>.</p>
<p>At this point you should be able to run <code>pytest -svv</code> in the root directory of the repository (the one that contains <code>pytest.ini</code>), and obtain a result similar to the following</p>
<div class="highlight"><pre><span></span><code><span class="o">==========================</span><span class="w"> </span><span class="nb">test</span><span class="w"> </span>session<span class="w"> </span><span class="nv">starts</span><span class="w"> </span><span class="o">==========================</span>
platform<span class="w"> </span>linux<span class="w"> </span>--<span class="w"> </span>Python<span class="w"> </span><span class="m">3</span>.5.3,<span class="w"> </span>pytest-3.1.2,<span class="w"> </span>py-1.4.34,<span class="w"> </span>pluggy-0.4.0
cachedir:<span class="w"> </span>.cache
rootdir:<span class="w"> </span>datastats,<span class="w"> </span>inifile:<span class="w"> </span>pytest.ini
plugins:<span class="w"> </span>cov-2.5.1
collected<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="nv">items</span><span class="w"> </span>
<span class="o">======================</span><span class="w"> </span>no<span class="w"> </span>tests<span class="w"> </span>ran<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">0</span>.00<span class="w"> </span><span class="nv">seconds</span><span class="w"> </span><span class="o">======================</span>
</code></pre></div>
<p>The given repository contains two branches. <code>master</code> is the one that you are on, and contains the initial setup, while <code>develop</code> points to the last step of the whole refactoring process. Every step of this post contains a reference to the commit that contains the changes introduced in that section.</p>
<h2 id="step-1-testing-the-endpoints">Step 1 - Testing the endpoints<a class="headerlink" href="#step-1-testing-the-endpoints" title="Permanent link">¶</a></h2>
<p>Commit: <a href="https://github.com/lgiordani/datastats/commit/27a1d8ccd5b0a57fa6d9d5f3bd80874538f14ed2">27a1d8c</a></p>
<p>When you start refactoring a system, regardless of the size, you have to test the endpoints. This means that you consider the system as a black box (i.e. you do not know what is inside) and just check the external behaviour. In this case we can write a test that initialises the class and runs the <code>stats()</code> method with some test data, possibly <strong>real</strong> data, and checks the output. Obviously we will write the test using the actual output returned by the method as the expected value, so this test passes automatically.</p>
<p>Querying the server we get the following data</p>
<div class="highlight"><pre><span></span><code><span class="n">test_data</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s2">"id"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Laith"</span><span class="p">,</span>
<span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Simmons"</span><span class="p">,</span>
<span class="s2">"age"</span><span class="p">:</span> <span class="mi">68</span><span class="p">,</span>
<span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£27888"</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"id"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Mikayla"</span><span class="p">,</span>
<span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Henry"</span><span class="p">,</span>
<span class="s2">"age"</span><span class="p">:</span> <span class="mi">49</span><span class="p">,</span>
<span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£67137"</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"id"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Garth"</span><span class="p">,</span>
<span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Fields"</span><span class="p">,</span>
<span class="s2">"age"</span><span class="p">:</span> <span class="mi">70</span><span class="p">,</span>
<span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£70472"</span>
<span class="p">}</span>
<span class="p">]</span>
</code></pre></div>
<p>and calling the <code>stats()</code> method with that data, with <code>iage</code> set to <code>20</code>, and <code>isalary</code> set to <code>20000</code>, we get the following JSON result</p>
<div class="highlight"><pre><span></span><code><span class="p">{</span>
<span class="w"> </span><span class="nt">"avg_age"</span><span class="p">:</span><span class="w"> </span><span class="mi">62</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"avg_salary"</span><span class="p">:</span><span class="w"> </span><span class="mi">55165</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"avg_yearly_increase"</span><span class="p">:</span><span class="w"> </span><span class="mi">837</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"max_salary"</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Garth"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"surname"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Fields"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"age"</span><span class="p">:</span><span class="w"> </span><span class="mi">70</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"salary"</span><span class="p">:</span><span class="w"> </span><span class="s2">"£70472"</span>
<span class="w"> </span><span class="p">}],</span>
<span class="w"> </span><span class="nt">"min_salary"</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span>
<span class="w"> </span><span class="nt">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Laith"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"surname"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Simmons"</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"age"</span><span class="p">:</span><span class="w"> </span><span class="mi">68</span><span class="p">,</span>
<span class="w"> </span><span class="nt">"salary"</span><span class="p">:</span><span class="w"> </span><span class="s2">"£27888"</span>
<span class="w"> </span><span class="p">}]</span>
<span class="p">}</span>
</code></pre></div>
<p>Caveat: I'm using a single very short set of real data, namely a list of 3 dictionaries. In a real case I would test the black box with many different use cases, to ensure I am not just checking some corner case.</p>
<p>The test is the following</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">json</span>

<span class="kn">from</span> <span class="nn">datastats.datastats</span> <span class="kn">import</span> <span class="n">DataStats</span>


<span class="k">def</span> <span class="nf">test_json</span><span class="p">():</span>
    <span class="n">test_data</span> <span class="o">=</span> <span class="p">[</span>
        <span class="p">{</span>
            <span class="s2">"id"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
            <span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Laith"</span><span class="p">,</span>
            <span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Simmons"</span><span class="p">,</span>
            <span class="s2">"age"</span><span class="p">:</span> <span class="mi">68</span><span class="p">,</span>
            <span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£27888"</span>
        <span class="p">},</span>
        <span class="p">{</span>
            <span class="s2">"id"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span>
            <span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Mikayla"</span><span class="p">,</span>
            <span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Henry"</span><span class="p">,</span>
            <span class="s2">"age"</span><span class="p">:</span> <span class="mi">49</span><span class="p">,</span>
            <span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£67137"</span>
        <span class="p">},</span>
        <span class="p">{</span>
            <span class="s2">"id"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
            <span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Garth"</span><span class="p">,</span>
            <span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Fields"</span><span class="p">,</span>
            <span class="s2">"age"</span><span class="p">:</span> <span class="mi">70</span><span class="p">,</span>
            <span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£70472"</span>
        <span class="p">}</span>
    <span class="p">]</span>

    <span class="n">ds</span> <span class="o">=</span> <span class="n">DataStats</span><span class="p">()</span>

    <span class="k">assert</span> <span class="n">ds</span><span class="o">.</span><span class="n">stats</span><span class="p">(</span><span class="n">test_data</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">20000</span><span class="p">)</span> <span class="o">==</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span>
        <span class="p">{</span>
            <span class="s1">'avg_age'</span><span class="p">:</span> <span class="mi">62</span><span class="p">,</span>
            <span class="s1">'avg_salary'</span><span class="p">:</span> <span class="mi">55165</span><span class="p">,</span>
            <span class="s1">'avg_yearly_increase'</span><span class="p">:</span> <span class="mi">837</span><span class="p">,</span>
            <span class="s1">'max_salary'</span><span class="p">:</span> <span class="p">[{</span>
                <span class="s2">"id"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
                <span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Garth"</span><span class="p">,</span>
                <span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Fields"</span><span class="p">,</span>
                <span class="s2">"age"</span><span class="p">:</span> <span class="mi">70</span><span class="p">,</span>
                <span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£70472"</span>
            <span class="p">}],</span>
            <span class="s1">'min_salary'</span><span class="p">:</span> <span class="p">[{</span>
                <span class="s2">"id"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
                <span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Laith"</span><span class="p">,</span>
                <span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Simmons"</span><span class="p">,</span>
                <span class="s2">"age"</span><span class="p">:</span> <span class="mi">68</span><span class="p">,</span>
                <span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£27888"</span>
            <span class="p">}]</span>
        <span class="p">}</span>
    <span class="p">)</span>
</code></pre></div>
<p>As said before, this test is obviously passing, having been artificially constructed from a real execution of the code.</p>
<p>Well, this test is very important! Now we know that if we change something inside the code, altering the behaviour of the class, at least one test will fail.</p>
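<p>The caveat above about checking many different use cases can be sketched as a table of recorded input/output pairs. This is only an illustration: <code>legacy_discount()</code>, <code>RECORDED_CASES</code> and <code>check_recorded_cases()</code> are hypothetical names invented for this example and are not part of the <code>datastats</code> project. The point is the shape of the check, not the function under test.</p>

```python
# Hypothetical black box standing in for any legacy code we want to refactor.
def legacy_discount(price, customer_type):
    if customer_type == "gold":
        return round(price * 0.8, 2)
    if customer_type == "silver":
        return round(price * 0.9, 2)
    return price


# (arguments, output) pairs captured by running the current code once.
# They describe what the code does today, not what it ought to do.
RECORDED_CASES = [
    ((100, "gold"), 80.0),
    ((100, "silver"), 90.0),
    ((100, "bronze"), 100),
]


def check_recorded_cases():
    # Every recorded case must keep producing the recorded output.
    for args, expected in RECORDED_CASES:
        assert legacy_discount(*args) == expected
    return len(RECORDED_CASES)


check_recorded_cases()
```

<p>With pytest, the same table could feed a parametrised test, so that each recorded case is reported separately.</p>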
<h2 id="step-2-getting-rid-of-the-json-format">Step 2 - Getting rid of the JSON format<a class="headerlink" href="#step-2-getting-rid-of-the-json-format" title="Permanent link">¶</a></h2>
<p>Commit: <a href="https://github.com/lgiordani/datastats/commit/65e2997d71ade752633229186c6669803a46f185">65e2997</a></p>
<p>The method returns its output in JSON format, and looking at the class it is pretty evident that the conversion is done by <code>json.dumps()</code>.</p>
<p>The structure of the code is the following</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">DataStats</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="p">[</span><span class="n">code_part_1</span><span class="p">]</span>
        <span class="k">return</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">({</span>
            <span class="p">[</span><span class="n">code_part_2</span><span class="p">]</span>
        <span class="p">})</span>
</code></pre></div>
<p>Where obviously <code>code_part_2</code> depends on <code>code_part_1</code>. The first refactoring, then, will follow this procedure</p>
<p>1. We write a test called <code>test__stats()</code> for a <code>_stats()</code> method that is supposed to return the data as a Python structure. We can infer the latter manually from the JSON or by running <code>json.loads()</code> in a Python shell. The test fails.</p>
<p>2. We <strong>duplicate</strong> the code of the <code>stats()</code> method that produces the data, putting it in the new <code>_stats()</code> method. The test passes.</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">DataStats</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">_stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="p">[</span><span class="n">code_part_1</span><span class="p">]</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">code_part_2</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="p">[</span><span class="n">code_part_1</span><span class="p">]</span>
        <span class="k">return</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">({</span>
            <span class="p">[</span><span class="n">code_part_2</span><span class="p">]</span>
        <span class="p">})</span>
</code></pre></div>
<p>3. We remove the duplicated code in <code>stats()</code> replacing it with a call to <code>_stats()</code></p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">DataStats</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">_stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="p">[</span><span class="n">code_part_1</span><span class="p">]</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">code_part_2</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_stats</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">)</span>
        <span class="p">)</span>
</code></pre></div>
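<p>Point 1 of the procedure above mentions inferring the Python structure from the JSON with <code>json.loads()</code>. A minimal sketch of that round trip follows; the JSON string is a shortened, illustrative version of the real output, and the names <code>raw</code> and <code>expected</code> are chosen just for this example.</p>

```python
import json

# A shortened JSON string, in the shape returned by a method such as stats().
raw = '{"avg_age": 62, "max_salary": [{"id": 3, "salary": "\\u00a370472"}]}'

# json.loads() converts the JSON text into plain Python dicts, lists and strings.
expected = json.loads(raw)
assert expected == {"avg_age": 62, "max_salary": [{"id": 3, "salary": "£70472"}]}

# json.dumps() escapes non-ASCII characters by default (ensure_ascii=True),
# so serialising the structure again reproduces the original text.
assert json.dumps(expected) == raw
```

<p>The structure obtained this way is what a test for <code>_stats()</code> can compare against directly, without going through <code>json.dumps()</code>.</p>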
<p>At this point we could refactor the initial test <code>test_json()</code> that we wrote, but this is an advanced consideration, and I'll leave it for some later notes.</p>
<p>So now the code of our class looks like this</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">DataStats</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">_stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="c1"># iage and isalary are the starting age and salary used to</span>
        <span class="c1"># compute the average yearly increase of salary.</span>

        <span class="c1"># Compute average yearly increase</span>
        <span class="n">average_age_increase</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span>
            <span class="nb">sum</span><span class="p">([</span><span class="n">e</span><span class="p">[</span><span class="s1">'age'</span><span class="p">]</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span> <span class="o">-</span> <span class="n">iage</span>
        <span class="n">average_salary_increase</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span>
            <span class="nb">sum</span><span class="p">([</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span> <span class="o">-</span> <span class="n">isalary</span>

        <span class="n">yearly_avg_increase</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span>
            <span class="n">average_salary_increase</span><span class="o">/</span><span class="n">average_age_increase</span><span class="p">)</span>

        <span class="c1"># Compute max salary</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="n">threshold</span> <span class="o">=</span> <span class="s1">'£'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="n">salaries</span><span class="p">))</span>

        <span class="n">max_salary</span> <span class="o">=</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span> <span class="n">threshold</span><span class="p">]</span>

        <span class="c1"># Compute min salary</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="n">min_salary</span> <span class="o">=</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span>
                      <span class="s1">'£</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">min</span><span class="p">(</span><span class="n">salaries</span><span class="p">)))]</span>

        <span class="k">return</span> <span class="p">{</span>
            <span class="s1">'avg_age'</span><span class="p">:</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="nb">sum</span><span class="p">([</span><span class="n">e</span><span class="p">[</span><span class="s1">'age'</span><span class="p">]</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)),</span>
            <span class="s1">'avg_salary'</span><span class="p">:</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span>
                <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)),</span>
            <span class="s1">'avg_yearly_increase'</span><span class="p">:</span> <span class="n">yearly_avg_increase</span><span class="p">,</span>
            <span class="s1">'max_salary'</span><span class="p">:</span> <span class="n">max_salary</span><span class="p">,</span>
            <span class="s1">'min_salary'</span><span class="p">:</span> <span class="n">min_salary</span>
        <span class="p">}</span>

    <span class="k">def</span> <span class="nf">stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">_stats</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">)</span>
        <span class="p">)</span>
</code></pre></div>
<p>and we have two tests that check the correctness of it.</p>
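<p>As a sanity check, the three numeric values expected by the tests can be reproduced by hand from the test data. The snippet below is a standalone sketch that mirrors the arithmetic of <code>_stats()</code> using plain lists; the variable names are chosen for this example and do not come from the repository.</p>

```python
import math

# Ages and salaries of the three records in the test data.
ages = [68, 49, 70]
salaries = [27888, 67137, 70472]

# Starting age and salary passed to the method in the tests.
iage, isalary = 20, 20000

avg_age = math.floor(sum(ages) / len(ages))             # floor(187 / 3)
avg_salary = math.floor(sum(salaries) / len(salaries))  # floor(165497 / 3)

# Average salary increase divided by average age increase, rounded down.
avg_yearly_increase = math.floor((avg_salary - isalary) / (avg_age - iage))

print(avg_age, avg_salary, avg_yearly_increase)  # 62 55165 837
```

<p>Note that each average is rounded down before the starting value is subtracted, which is the same order of operations used in the method.</p>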
<h2 id="step-3-refactoring-the-tests">Step 3 - Refactoring the tests<a class="headerlink" href="#step-3-refactoring-the-tests" title="Permanent link">¶</a></h2>
<p>Commit: <a href="https://github.com/lgiordani/datastats/commit/d61901754b83ccc36fa25bcebf88da7cace28ff2">d619017</a></p>
<p>It is pretty clear that the <code>test_data</code> list of dictionaries is bound to be used in every test we will perform, so it is high time we moved that to a global variable. There is no point now in using a fixture, as the test data is just static data.</p>
<p>We could also move the output data to a global variable, but the upcoming tests are not using the whole output dictionary any more, so we can postpone the decision.</p>
<p>The test suite now looks like this</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">datastats.datastats</span> <span class="kn">import</span> <span class="n">DataStats</span>
<span class="n">test_data</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s2">"id"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Laith"</span><span class="p">,</span>
<span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Simmons"</span><span class="p">,</span>
<span class="s2">"age"</span><span class="p">:</span> <span class="mi">68</span><span class="p">,</span>
<span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£27888"</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"id"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Mikayla"</span><span class="p">,</span>
<span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Henry"</span><span class="p">,</span>
<span class="s2">"age"</span><span class="p">:</span> <span class="mi">49</span><span class="p">,</span>
<span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£67137"</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"id"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Garth"</span><span class="p">,</span>
<span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Fields"</span><span class="p">,</span>
<span class="s2">"age"</span><span class="p">:</span> <span class="mi">70</span><span class="p">,</span>
<span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£70472"</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="k">def</span> <span class="nf">test_json</span><span class="p">():</span>
    <span class="n">ds</span> <span class="o">=</span> <span class="n">DataStats</span><span class="p">()</span>

    <span class="k">assert</span> <span class="n">ds</span><span class="o">.</span><span class="n">stats</span><span class="p">(</span><span class="n">test_data</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">20000</span><span class="p">)</span> <span class="o">==</span> <span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span>
        <span class="p">{</span>
            <span class="s1">'avg_age'</span><span class="p">:</span> <span class="mi">62</span><span class="p">,</span>
            <span class="s1">'avg_salary'</span><span class="p">:</span> <span class="mi">55165</span><span class="p">,</span>
            <span class="s1">'avg_yearly_increase'</span><span class="p">:</span> <span class="mi">837</span><span class="p">,</span>
            <span class="s1">'max_salary'</span><span class="p">:</span> <span class="p">[{</span>
                <span class="s2">"id"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
                <span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Garth"</span><span class="p">,</span>
                <span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Fields"</span><span class="p">,</span>
                <span class="s2">"age"</span><span class="p">:</span> <span class="mi">70</span><span class="p">,</span>
                <span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£70472"</span>
            <span class="p">}],</span>
            <span class="s1">'min_salary'</span><span class="p">:</span> <span class="p">[{</span>
                <span class="s2">"id"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
                <span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Laith"</span><span class="p">,</span>
                <span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Simmons"</span><span class="p">,</span>
                <span class="s2">"age"</span><span class="p">:</span> <span class="mi">68</span><span class="p">,</span>
                <span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£27888"</span>
            <span class="p">}]</span>
        <span class="p">}</span>
    <span class="p">)</span>


<span class="k">def</span> <span class="nf">test__stats</span><span class="p">():</span>
    <span class="n">ds</span> <span class="o">=</span> <span class="n">DataStats</span><span class="p">()</span>

    <span class="k">assert</span> <span class="n">ds</span><span class="o">.</span><span class="n">_stats</span><span class="p">(</span><span class="n">test_data</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">20000</span><span class="p">)</span> <span class="o">==</span> <span class="p">{</span>
        <span class="s1">'avg_age'</span><span class="p">:</span> <span class="mi">62</span><span class="p">,</span>
        <span class="s1">'avg_salary'</span><span class="p">:</span> <span class="mi">55165</span><span class="p">,</span>
        <span class="s1">'avg_yearly_increase'</span><span class="p">:</span> <span class="mi">837</span><span class="p">,</span>
        <span class="s1">'max_salary'</span><span class="p">:</span> <span class="p">[{</span>
            <span class="s2">"id"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
            <span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Garth"</span><span class="p">,</span>
            <span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Fields"</span><span class="p">,</span>
            <span class="s2">"age"</span><span class="p">:</span> <span class="mi">70</span><span class="p">,</span>
            <span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£70472"</span>
        <span class="p">}],</span>
        <span class="s1">'min_salary'</span><span class="p">:</span> <span class="p">[{</span>
            <span class="s2">"id"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
            <span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Laith"</span><span class="p">,</span>
            <span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Simmons"</span><span class="p">,</span>
            <span class="s2">"age"</span><span class="p">:</span> <span class="mi">68</span><span class="p">,</span>
            <span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£27888"</span>
        <span class="p">}]</span>
    <span class="p">}</span>
</code></pre></div>
<h2 id="step-4-isolate-the-average-age-algorithm">Step 4 - Isolate the average age algorithm<a class="headerlink" href="#step-4-isolate-the-average-age-algorithm" title="Permanent link">¶</a></h2>
<p>Commit: <a href="https://github.com/lgiordani/datastats/commit/9db18036eee2f6712384195fcd970303387291f6">9db1803</a></p>
<p>Isolating independent features is a key target of software design. Thus, our refactoring shall aim to disentangle the code by dividing it into small, separate functions.</p>
<p>The output dictionary contains five keys, and each of them corresponds to a value computed either on the fly (for <code>avg_age</code> and <code>avg_salary</code>) or by the method's code (for <code>avg_yearly_increase</code>, <code>max_salary</code>, and <code>min_salary</code>). We can start replacing the code that computes the value of each key with dedicated methods, trying to isolate the algorithms.</p>
<p>To isolate some code, the first thing to do is to duplicate it, putting it into a dedicated method. As we are refactoring with tests, the first thing is to write a test for this method.</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test__avg_age</span><span class="p">():</span>
    <span class="n">ds</span> <span class="o">=</span> <span class="n">DataStats</span><span class="p">()</span>

    <span class="k">assert</span> <span class="n">ds</span><span class="o">.</span><span class="n">_avg_age</span><span class="p">(</span><span class="n">test_data</span><span class="p">)</span> <span class="o">==</span> <span class="mi">62</span>
</code></pre></div>
<p>We know that the method's output shall be <code>62</code> as that is the value we have in the output data of the original <code>stats()</code> method. Please note that there is no need to pass <code>iage</code> and <code>isalary</code> as they are not used in the refactored code.</p>
<p>The test fails, so we can dutifully go and duplicate the code we use to compute <code>'avg_age'</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_avg_age</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="nb">sum</span><span class="p">([</span><span class="n">e</span><span class="p">[</span><span class="s1">'age'</span><span class="p">]</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
</code></pre></div>
<p>and once the test passes we can replace the duplicated code in <code>_stats()</code> with a call to <code>_avg_age()</code></p>
<div class="highlight"><pre><span></span><code>        <span class="k">return</span> <span class="p">{</span>
            <span class="s1">'avg_age'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_avg_age</span><span class="p">(</span><span class="n">data</span><span class="p">),</span>
            <span class="s1">'avg_salary'</span><span class="p">:</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span>
                <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)),</span>
            <span class="s1">'avg_yearly_increase'</span><span class="p">:</span> <span class="n">yearly_avg_increase</span><span class="p">,</span>
            <span class="s1">'max_salary'</span><span class="p">:</span> <span class="n">max_salary</span><span class="p">,</span>
            <span class="s1">'min_salary'</span><span class="p">:</span> <span class="n">min_salary</span>
        <span class="p">}</span>
</code></pre></div>
<p>After that, check that no test is failing. Well done! We isolated the first feature, and our refactoring has already produced three tests.</p>
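<p>The duplicate-then-replace move can be checked in isolation. The following is a minimal sketch, not the project's code: <code>avg_age()</code> is written here as a standalone function, and the three ages are sample values assumed for illustration. It shows that the helper and the inline expression it replaces give the same result, which is what makes the swap safe.</p>

```python
import math

# Sample records assumed for illustration; only the 'age' key matters here.
data = [{'age': 68}, {'age': 49}, {'age': 70}]


def avg_age(data):
    # Same expression that was previously inlined in the returned dictionary
    return math.floor(sum([e['age'] for e in data]) / len(data))


# The inline computation and the extracted helper must agree
inline_value = math.floor(sum([e['age'] for e in data]) / len(data))
helper_value = avg_age(data)
assert inline_value == helper_value
```

<p>In the real project this comparison is performed implicitly by the test suite, which exercises both the helper and <code>_stats()</code>.</p>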
<h2 id="step-5-isolate-the-average-salary-algorithm">Step 5 - Isolate the average salary algorithm<a class="headerlink" href="#step-5-isolate-the-average-salary-algorithm" title="Permanent link">¶</a></h2>
<p>Commit: <a href="https://github.com/lgiordani/datastats/commit/412220145ea4d7ef846b1d1f289b4ddefc4fb24b">4122201</a></p>
<p>The <code>avg_salary</code> key works exactly like <code>avg_age</code>, just with different code. The refactoring process is therefore the same as before, and the result is a new <code>test__avg_salary()</code> test</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test__avg_salary</span><span class="p">():</span>
    <span class="n">ds</span> <span class="o">=</span> <span class="n">DataStats</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">ds</span><span class="o">.</span><span class="n">_avg_salary</span><span class="p">(</span><span class="n">test_data</span><span class="p">)</span> <span class="o">==</span> <span class="mi">55165</span>
</code></pre></div>
<p>a new <code>_avg_salary()</code> method</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_avg_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="nb">sum</span><span class="p">([</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
</code></pre></div>
<p>and a new version of the final return value</p>
<div class="highlight"><pre><span></span><code>        <span class="k">return</span> <span class="p">{</span>
            <span class="s1">'avg_age'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_avg_age</span><span class="p">(</span><span class="n">data</span><span class="p">),</span>
            <span class="s1">'avg_salary'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_avg_salary</span><span class="p">(</span><span class="n">data</span><span class="p">),</span>
            <span class="s1">'avg_yearly_increase'</span><span class="p">:</span> <span class="n">yearly_avg_increase</span><span class="p">,</span>
            <span class="s1">'max_salary'</span><span class="p">:</span> <span class="n">max_salary</span><span class="p">,</span>
            <span class="s1">'min_salary'</span><span class="p">:</span> <span class="n">min_salary</span>
        <span class="p">}</span>
</code></pre></div>
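<p>The expression <code>int(e['salary'][1:])</code> in <code>_avg_salary()</code> may look cryptic at first: salaries are stored as strings with a leading currency symbol, so the slice <code>[1:]</code> drops the first character before the conversion to an integer. The sketch below isolates that step; the records are reduced to the <code>'salary'</code> key only, using the salary values that appear in the tests of this post.</p>

```python
import math

# Only the 'salary' key is needed here; the values are those used in the tests
data = [{'salary': '£27888'}, {'salary': '£67137'}, {'salary': '£70472'}]

# '£27888'[1:] drops the first character and leaves '27888'
amounts = [int(e['salary'][1:]) for e in data]

# math.floor() rounds the average down to the nearest integer
avg_salary = math.floor(sum(amounts) / len(data))
```

<p>With these values the sum is 165497, the exact average is about 55165.67, and rounding down gives the 55165 expected by <code>test__avg_salary()</code>.</p>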
<h2 id="step-6-isolate-the-average-yearly-increase-algorithm">Step 6 - Isolate the average yearly increase algorithm<a class="headerlink" href="#step-6-isolate-the-average-yearly-increase-algorithm" title="Permanent link">¶</a></h2>
<p>Commit: <a href="https://github.com/lgiordani/datastats/commit/4005145f39d36fda0519127d57e1b4099d24e72b">4005145</a></p>
<p>The remaining three keys are computed with algorithms that, being longer than one line, couldn't be squeezed directly into the definition of the dictionary. The refactoring process, however, does not really change; as before, we first test a helper method, then we define it by duplicating the code, and finally we call the helper, removing the code duplication.</p>
<p>For the average yearly increase of the salary we have a new test</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test__avg_yearly_increase</span><span class="p">():</span>
    <span class="n">ds</span> <span class="o">=</span> <span class="n">DataStats</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">ds</span><span class="o">.</span><span class="n">_avg_yearly_increase</span><span class="p">(</span><span class="n">test_data</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">20000</span><span class="p">)</span> <span class="o">==</span> <span class="mi">837</span>
</code></pre></div>
<p>a new method that passes the test</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_avg_yearly_increase</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="c1"># iage and isalary are the starting age and salary used to</span>
        <span class="c1"># compute the average yearly increase of salary.</span>

        <span class="c1"># Compute average yearly increase</span>
        <span class="n">average_age_increase</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span>
            <span class="nb">sum</span><span class="p">([</span><span class="n">e</span><span class="p">[</span><span class="s1">'age'</span><span class="p">]</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span> <span class="o">-</span> <span class="n">iage</span>
        <span class="n">average_salary_increase</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span>
            <span class="nb">sum</span><span class="p">([</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span> <span class="o">-</span> <span class="n">isalary</span>

        <span class="k">return</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="n">average_salary_increase</span><span class="o">/</span><span class="n">average_age_increase</span><span class="p">)</span>
</code></pre></div>
<p>and a new version of the <code>_stats()</code> method</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="c1"># Compute max salary</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="n">threshold</span> <span class="o">=</span> <span class="s1">'£'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="n">salaries</span><span class="p">))</span>
        <span class="n">max_salary</span> <span class="o">=</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span> <span class="n">threshold</span><span class="p">]</span>

        <span class="c1"># Compute min salary</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="n">min_salary</span> <span class="o">=</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span>
                      <span class="s1">'£</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">min</span><span class="p">(</span><span class="n">salaries</span><span class="p">)))]</span>

        <span class="k">return</span> <span class="p">{</span>
            <span class="s1">'avg_age'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_avg_age</span><span class="p">(</span><span class="n">data</span><span class="p">),</span>
            <span class="s1">'avg_salary'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_avg_salary</span><span class="p">(</span><span class="n">data</span><span class="p">),</span>
            <span class="s1">'avg_yearly_increase'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_avg_yearly_increase</span><span class="p">(</span>
                <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">),</span>
            <span class="s1">'max_salary'</span><span class="p">:</span> <span class="n">max_salary</span><span class="p">,</span>
            <span class="s1">'min_salary'</span><span class="p">:</span> <span class="n">min_salary</span>
        <span class="p">}</span>
</code></pre></div>
<p>Please note that we are not removing any code duplication other than the one we introduce in order to refactor. The first achievement we should aim for is to completely isolate independent features.</p>
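<p>To see where the value 837 in <code>test__avg_yearly_increase()</code> comes from, the computation can be replayed step by step. This is only a sketch under stated assumptions: the ages 68 and 70 and the three salaries appear in the tests shown in this post, while the age of the second record (49) is an assumed value, chosen to be consistent with the expected result.</p>

```python
import math

# Ages 68 and 70 and all three salaries come from the tests in this post;
# the age 49 of the second record is an assumption made for this sketch.
data = [
    {'age': 68, 'salary': '£27888'},
    {'age': 49, 'salary': '£67137'},
    {'age': 70, 'salary': '£70472'},
]
iage, isalary = 20, 20000  # starting age and salary passed by the test

average_age = math.floor(sum([e['age'] for e in data]) / len(data))
average_salary = math.floor(
    sum([int(e['salary'][1:]) for e in data]) / len(data))

# Differences between the averages and the starting values
average_age_increase = average_age - iage
average_salary_increase = average_salary - isalary

yearly_increase = math.floor(average_salary_increase / average_age_increase)
```

<p>Under these assumptions the average age is 62 and the average salary is 55165, so the increases are 42 years and 35165, and 35165/42 rounded down is 837.</p>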
<h2 id="step-7-isolate-max-and-min-salary-algorithms">Step 7 - Isolate max and min salary algorithms<a class="headerlink" href="#step-7-isolate-max-and-min-salary-algorithms" title="Permanent link">¶</a></h2>
<p>Commit: <a href="https://github.com/lgiordani/datastats/commit/17b24138e712f9174b072a579a2dfc9e2800e6ac">17b2413</a></p>
<p>When refactoring we should always do one thing at a time, but for the sake of conciseness I'll show here the result of two refactoring steps at once. I recommend that the reader perform them as independent steps, as I did when I wrote the code posted below.</p>
<p>The new tests are</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test__max_salary</span><span class="p">():</span>
    <span class="n">ds</span> <span class="o">=</span> <span class="n">DataStats</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">ds</span><span class="o">.</span><span class="n">_max_salary</span><span class="p">(</span><span class="n">test_data</span><span class="p">)</span> <span class="o">==</span> <span class="p">[{</span>
        <span class="s2">"id"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
        <span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Garth"</span><span class="p">,</span>
        <span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Fields"</span><span class="p">,</span>
        <span class="s2">"age"</span><span class="p">:</span> <span class="mi">70</span><span class="p">,</span>
        <span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£70472"</span>
    <span class="p">}]</span>


<span class="k">def</span> <span class="nf">test__min_salary</span><span class="p">():</span>
    <span class="n">ds</span> <span class="o">=</span> <span class="n">DataStats</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">ds</span><span class="o">.</span><span class="n">_min_salary</span><span class="p">(</span><span class="n">test_data</span><span class="p">)</span> <span class="o">==</span> <span class="p">[{</span>
        <span class="s2">"id"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
        <span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Laith"</span><span class="p">,</span>
        <span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Simmons"</span><span class="p">,</span>
        <span class="s2">"age"</span><span class="p">:</span> <span class="mi">68</span><span class="p">,</span>
        <span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£27888"</span>
    <span class="p">}]</span>
</code></pre></div>
<p>The new methods in the <code>DataStats</code> class are</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_max_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="c1"># Compute max salary</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="n">threshold</span> <span class="o">=</span> <span class="s1">'£'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="n">salaries</span><span class="p">))</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span> <span class="n">threshold</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">_min_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="c1"># Compute min salary</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span>
                <span class="s1">'£</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">min</span><span class="p">(</span><span class="n">salaries</span><span class="p">)))]</span>
</code></pre></div>
<p>and the <code>_stats()</code> method is now really tiny</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">{</span>
            <span class="s1">'avg_age'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_avg_age</span><span class="p">(</span><span class="n">data</span><span class="p">),</span>
            <span class="s1">'avg_salary'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_avg_salary</span><span class="p">(</span><span class="n">data</span><span class="p">),</span>
            <span class="s1">'avg_yearly_increase'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_avg_yearly_increase</span><span class="p">(</span>
                <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">),</span>
            <span class="s1">'max_salary'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_max_salary</span><span class="p">(</span><span class="n">data</span><span class="p">),</span>
            <span class="s1">'min_salary'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">_min_salary</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
        <span class="p">}</span>
</code></pre></div>
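<p>One detail of these helpers that is easy to overlook is that they return a list, which is why the tests compare the result with a one-element list. Since the selection keeps every record whose salary equals the threshold, ties produce more than one result. The following sketch illustrates this with a standalone function (<code>max_salary()</code>, not the method of the class) and hypothetical records that are not part of the project's test data.</p>

```python
def max_salary(data):
    # Same logic as the helper shown above, written as a plain function
    salaries = [int(e['salary'][1:]) for e in data]
    threshold = '£' + str(max(salaries))
    return [e for e in data if e['salary'] == threshold]


# Hypothetical records in which two people share the top salary
data = [
    {'name': 'A', 'salary': '£1000'},
    {'name': 'B', 'salary': '£3000'},
    {'name': 'C', 'salary': '£3000'},
]

top = max_salary(data)
top_names = [e['name'] for e in top]
```

<p>Here <code>top</code> contains both records with a salary of £3000, in the order in which they appear in the input.</p>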
<h2 id="step-8-reducing-code-duplication">Step 8 - Reducing code duplication<a class="headerlink" href="#step-8-reducing-code-duplication" title="Permanent link">¶</a></h2>
<p>Commit: <a href="https://github.com/lgiordani/datastats/commit/b559a5c91ef58e1e734ac97b676468d09a460a45">b559a5c</a></p>
<p>Now that we have the main tests in place we can start changing the code of the various helper methods. These are now small enough to allow us to change the code without further tests. While this is true in this case, in general there is no definition of what "small enough" means, just as there is no real definition of what a "unit test" is. Generally speaking, you should be confident that the change you are making is covered by the tests you have. If this is not the case, you'd better add one or more tests until you feel confident enough.</p>
<p>The two methods <code>_max_salary()</code> and <code>_min_salary()</code> share a great deal of code, even though the second one is more concise</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_max_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="c1"># Compute max salary</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="n">threshold</span> <span class="o">=</span> <span class="s1">'£'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="n">salaries</span><span class="p">))</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span> <span class="n">threshold</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">_min_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="c1"># Compute min salary</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span>
                <span class="s1">'£</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">min</span><span class="p">(</span><span class="n">salaries</span><span class="p">)))]</span>
</code></pre></div>
<p>I'll start by making the <code>threshold</code> variable explicit in the second function. As soon as I change something, I'll run the tests to check that the external behaviour did not change.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_max_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="c1"># Compute max salary</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="n">threshold</span> <span class="o">=</span> <span class="s1">'£'</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="n">salaries</span><span class="p">))</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span> <span class="n">threshold</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">_min_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="c1"># Compute min salary</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="n">threshold</span> <span class="o">=</span> <span class="s1">'£</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">min</span><span class="p">(</span><span class="n">salaries</span><span class="p">)))</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span> <span class="n">threshold</span><span class="p">]</span>
</code></pre></div>
<p>Now it is pretty evident that the two functions are the same except for the <code>min()</code> and <code>max()</code> functions. They still use different variable names and different code to format the threshold, so my first action is to even them out, copying the code of <code>_min_salary()</code> to <code>_max_salary()</code> and changing <code>min()</code> to <code>max()</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_max_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="c1"># Compute max salary</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="n">threshold</span> <span class="o">=</span> <span class="s1">'£</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="n">salaries</span><span class="p">)))</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span> <span class="n">threshold</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">_min_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="c1"># Compute min salary</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="n">threshold</span> <span class="o">=</span> <span class="s1">'£</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nb">min</span><span class="p">(</span><span class="n">salaries</span><span class="p">)))</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span> <span class="n">threshold</span><span class="p">]</span>
</code></pre></div>
<p>Now I can create another helper called <code>_select_salary()</code> that duplicates that code and accepts a function, used instead of <code>min()</code> or <code>max()</code>. As I did before, first I duplicate the code, and then remove the duplication by calling the new function.</p>
<p>After a few steps, the code looks like this</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_select_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">func</span><span class="p">):</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
        <span class="n">threshold</span> <span class="o">=</span> <span class="s1">'£</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">func</span><span class="p">(</span><span class="n">salaries</span><span class="p">)))</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span> <span class="n">threshold</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">_max_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_select_salary</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="nb">max</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">_min_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_select_salary</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="nb">min</span><span class="p">)</span>
</code></pre></div>
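<p>The helper works because functions are ordinary objects in Python, so <code>min</code> and <code>max</code> can be passed as arguments without being called. The sketch below shows the same idea with a standalone function instead of a method; the records contain only the <code>'salary'</code> key, and the <code>lambda</code> in the last call is a hypothetical extension that is not part of the project.</p>

```python
def select_salary(data, func):
    # func receives the list of integer salaries and returns one of them
    salaries = [int(d['salary'][1:]) for d in data]
    threshold = '£{}'.format(str(func(salaries)))
    return [e for e in data if e['salary'] == threshold]


data = [{'salary': '£27888'}, {'salary': '£67137'}, {'salary': '£70472'}]

# The built-ins are passed by name, without parentheses
highest = select_salary(data, max)
lowest = select_salary(data, min)

# Any callable that picks one value from the list can be used
# (hypothetical example: the second smallest salary)
second = select_salary(data, lambda values: sorted(values)[1])
```

<p>The only thing that changes between the calls is the selection function, which is exactly the difference that remained between <code>_max_salary()</code> and <code>_min_salary()</code>.</p>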
<p>I then noticed some code duplication between <code>_avg_salary()</code> and <code>_select_salary()</code></p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_avg_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="nb">sum</span><span class="p">([</span><span class="nb">int</span><span class="p">(</span><span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_select_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">func</span><span class="p">):</span>
        <span class="n">salaries</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
</code></pre></div>
<p>and decided to extract the common algorithm into a method called <code>_salaries()</code>. As before, I write the test first</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_salaries</span><span class="p">():</span>
    <span class="n">ds</span> <span class="o">=</span> <span class="n">DataStats</span><span class="p">()</span>
    <span class="k">assert</span> <span class="n">ds</span><span class="o">.</span><span class="n">_salaries</span><span class="p">(</span><span class="n">test_data</span><span class="p">)</span> <span class="o">==</span> <span class="p">[</span><span class="mi">27888</span><span class="p">,</span> <span class="mi">67137</span><span class="p">,</span> <span class="mi">70472</span><span class="p">]</span>
</code></pre></div>
<p>then I implement the method</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_salaries</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
</code></pre></div>
<p>and finally I replace the duplicated code with a call to the new method</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_salaries</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">][</span><span class="mi">1</span><span class="p">:])</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="n">data</span><span class="p">]</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_select_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">func</span><span class="p">):</span>
        <span class="n">threshold</span> <span class="o">=</span> <span class="s1">'£</span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">func</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_salaries</span><span class="p">(</span><span class="n">data</span><span class="p">))))</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">e</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span> <span class="k">if</span> <span class="n">e</span><span class="p">[</span><span class="s1">'salary'</span><span class="p">]</span> <span class="o">==</span> <span class="n">threshold</span><span class="p">]</span>
</code></pre></div>
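<p>The same substitution applies to <code>_avg_salary()</code>, which the snippets above do not show. A minimal sketch of what that method might look like after the change, assuming a stripped-down <code>DataStats</code> with only the two methods involved and test data reduced to the <code>salary</code> key:</p>

```python
import math


class DataStats:
    # Stripped-down sketch: only the methods touched by this step.
    def _salaries(self, data):
        # Drop the leading currency symbol and convert to int.
        return [int(d['salary'][1:]) for d in data]

    def _avg_salary(self, data):
        # The inline list comprehension becomes a call to _salaries().
        return math.floor(sum(self._salaries(data)) / len(data))


# Reduced test data (assumption: only the 'salary' key is needed here).
test_data = [
    {"salary": "£27888"},
    {"salary": "£67137"},
    {"salary": "£70472"},
]

avg_salary = DataStats()._avg_salary(test_data)
```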
<p>While doing this I noticed that <code>_avg_yearly_increase()</code> contains the same code, and fixed it there as well.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_avg_yearly_increase</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="c1"># iage and isalary are the starting age and salary used to</span>
        <span class="c1"># compute the average yearly increase of salary.</span>
        <span class="c1"># Compute average yearly increase</span>
        <span class="n">average_age_increase</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span>
            <span class="nb">sum</span><span class="p">([</span><span class="n">e</span><span class="p">[</span><span class="s1">'age'</span><span class="p">]</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">data</span><span class="p">])</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span> <span class="o">-</span> <span class="n">iage</span>
        <span class="n">average_salary_increase</span> <span class="o">=</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span>
            <span class="nb">sum</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_salaries</span><span class="p">(</span><span class="n">data</span><span class="p">))</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">))</span> <span class="o">-</span> <span class="n">isalary</span>
        <span class="k">return</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="n">average_salary_increase</span><span class="o">/</span><span class="n">average_age_increase</span><span class="p">)</span>
</code></pre></div>
<p>It would be useful at this point to store the input data inside the class and to use it as <code>self.data</code> instead of passing it around to all the class's methods. This however would break the class's API, as <code>DataStats</code> is currently initialised without any data. Later I will show how to introduce changes that potentially break the API, and briefly discuss the issue. For the moment, however, I'll keep changing the class without modifying the external interface.</p>
<p>It looks like <code>age</code> has the same code duplication issues as <code>salary</code>, so with the same procedure I introduce the <code>_ages()</code> method and change the <code>_avg_age()</code> and <code>_avg_yearly_increase()</code> methods accordingly.</p>
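<p>The post does not list the age-related code, so the following is only a sketch of what the "same procedure" could produce: a test for <code>_ages()</code>, the method itself, and <code>_avg_age()</code> rewritten on top of it. The class is reduced to these two methods and the test data to the <code>age</code> key, both assumptions made to keep the example self-contained.</p>

```python
import math

# Reduced test data (assumption: only the 'age' key matters here).
test_data = [{"age": 68}, {"age": 49}, {"age": 70}]


class DataStats:
    def _ages(self, data):
        # Extract the list of ages, mirroring what _salaries() does.
        return [d['age'] for d in data]

    def _avg_age(self, data):
        # Same floor-of-mean computation, now built on _ages().
        return math.floor(sum(self._ages(data)) / len(data))


def test_ages():
    ds = DataStats()
    assert ds._ages(test_data) == [68, 49, 70]
```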
<p>Speaking of <code>_avg_yearly_increase()</code>, the code of that method contains the code of the <code>_avg_age()</code> and <code>_avg_salary()</code> methods, so it is worth replacing it with two calls. As I am moving code between existing methods, I do not need further tests.</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_avg_yearly_increase</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="c1"># iage and isalary are the starting age and salary used to</span>
        <span class="c1"># compute the average yearly increase of salary.</span>
        <span class="c1"># Compute average yearly increase</span>
        <span class="n">average_age_increase</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_avg_age</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">-</span> <span class="n">iage</span>
        <span class="n">average_salary_increase</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_avg_salary</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="o">-</span> <span class="n">isalary</span>
        <span class="k">return</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="n">average_salary_increase</span><span class="o">/</span><span class="n">average_age_increase</span><span class="p">)</span>
</code></pre></div>
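<p>To see what the refactored method computes, the arithmetic can be traced by hand on the three records used in the tests. The starting values <code>iage = 20</code> and <code>isalary = 20000</code> are assumptions chosen only for illustration.</p>

```python
import math

# Ages and salaries of the three records used in the tests.
ages = [68, 49, 70]
salaries = [27888, 67137, 70472]

# Hypothetical starting age and salary, chosen only for this example.
iage, isalary = 20, 20000

avg_age = math.floor(sum(ages) / len(ages))              # 187 / 3 -> 62
avg_salary = math.floor(sum(salaries) / len(salaries))   # 165497 / 3 -> 55165

average_age_increase = avg_age - iage            # 42
average_salary_increase = avg_salary - isalary   # 35165

avg_yearly_increase = math.floor(average_salary_increase / average_age_increase)
print(avg_yearly_increase)  # 837
```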
<h2 id="step-9-advanced-refactoring">Step 9 - Advanced refactoring<a class="headerlink" href="#step-9-advanced-refactoring" title="Permanent link">¶</a></h2>
<p>Commit: <a href="https://github.com/lgiordani/datastats/commit/cc0b0a105ebc882cb73831b177e881bb65f4b491">cc0b0a1</a></p>
<p>The initial class didn't have any <code>__init__()</code> method, and was thus missing the encapsulation part of the object-oriented paradigm. There was no reason to keep the class, as the <code>stats()</code> method could have easily been extracted and provided as a plain function.</p>
<p>This is much more evident now that we refactored the method, because we have 10 methods that accept <code>data</code> as a parameter. It would be nice to load the input data into the class at instantiation time, and then access it as <code>self.data</code>. This would greatly improve the readability of the class, and also justify its existence.</p>
<p>If we introduce an <code>__init__()</code> method that requires a parameter, however, we will change the class's API, breaking the compatibility with the code that imports and uses it. Since we want to keep that compatibility, we have to devise a way to provide both the advantages of a new, clean class and of a stable API. This is not always perfectly achievable, but in this case the <a href="https://en.wikipedia.org/wiki/Adapter_pattern">Adapter design pattern</a> (also known as Wrapper) can perfectly solve the issue.</p>
<p>The goal is to change the current class to match the new API, and then build a class that wraps the first one and provides the old API. The strategy is not that different from what we did previously, only this time we will deal with classes instead of methods. With a stupendous effort of my imagination I named the new class <code>NewDataStats</code>. Sorry, sometimes you just have to get the job done.</p>
<p>The first thing to do, as happens very often with refactoring, is to duplicate the code, and when we insert new code we need to have tests that justify it. The tests will be the same as before, as the new class shall provide the same functionalities as the previous one, so I just create a new file called <code>test_newdatastats.py</code> and start with the first test, <code>test_init()</code>.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">datastats.datastats</span> <span class="kn">import</span> <span class="n">NewDataStats</span>
<span class="n">test_data</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s2">"id"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Laith"</span><span class="p">,</span>
<span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Simmons"</span><span class="p">,</span>
<span class="s2">"age"</span><span class="p">:</span> <span class="mi">68</span><span class="p">,</span>
<span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£27888"</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"id"</span><span class="p">:</span> <span class="mi">2</span><span class="p">,</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Mikayla"</span><span class="p">,</span>
<span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Henry"</span><span class="p">,</span>
<span class="s2">"age"</span><span class="p">:</span> <span class="mi">49</span><span class="p">,</span>
<span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£67137"</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s2">"id"</span><span class="p">:</span> <span class="mi">3</span><span class="p">,</span>
<span class="s2">"name"</span><span class="p">:</span> <span class="s2">"Garth"</span><span class="p">,</span>
<span class="s2">"surname"</span><span class="p">:</span> <span class="s2">"Fields"</span><span class="p">,</span>
<span class="s2">"age"</span><span class="p">:</span> <span class="mi">70</span><span class="p">,</span>
<span class="s2">"salary"</span><span class="p">:</span> <span class="s2">"£70472"</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="k">def</span> <span class="nf">test_init</span><span class="p">():</span>
    <span class="n">ds</span> <span class="o">=</span> <span class="n">NewDataStats</span><span class="p">(</span><span class="n">test_data</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">ds</span><span class="o">.</span><span class="n">data</span> <span class="o">==</span> <span class="n">test_data</span>
</code></pre></div>
<p>This test doesn't pass, and the code that implements the class is very simple</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">NewDataStats</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">data</span>
</code></pre></div>
<p>Now I can start an iterative process:</p>
<ol>
<li>I will copy one of the tests of <code>DataStats</code> and adapt it to <code>NewDataStats</code></li>
<li>I will copy some code from <code>DataStats</code> to <code>NewDataStats</code>, adapting it to the new API and making it pass the test.</li>
</ol>
<p>At this point iteratively removing methods from <code>DataStats</code> and replacing them with a call to <code>NewDataStats</code> would be overkill. I'll show you in the next section why, and what we can do to avoid that.</p>
<p>An example of the resulting tests for <code>NewDataStats</code> is the following</p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">test_ages</span><span class="p">():</span>
    <span class="n">ds</span> <span class="o">=</span> <span class="n">NewDataStats</span><span class="p">(</span><span class="n">test_data</span><span class="p">)</span>
    <span class="k">assert</span> <span class="n">ds</span><span class="o">.</span><span class="n">_ages</span><span class="p">()</span> <span class="o">==</span> <span class="p">[</span><span class="mi">68</span><span class="p">,</span> <span class="mi">49</span><span class="p">,</span> <span class="mi">70</span><span class="p">]</span>
</code></pre></div>
<p>and the code that passes the test is</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_ages</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">d</span><span class="p">[</span><span class="s1">'age'</span><span class="p">]</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">]</span>
</code></pre></div>
<p>Once finished, I noticed that methods like <code>_ages()</code> no longer require an input parameter, so I can convert them to properties and change the tests accordingly.</p>
<div class="highlight"><pre><span></span><code>    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">_ages</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">d</span><span class="p">[</span><span class="s1">'age'</span><span class="p">]</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">]</span>
</code></pre></div>
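<p>The corresponding test change is small: the property is read like an attribute, without parentheses. A sketch, assuming a <code>NewDataStats</code> reduced to <code>__init__()</code> and <code>_ages</code>, and test data reduced to the <code>age</code> key:</p>

```python
# Reduced test data (assumption: only the 'age' key matters here).
test_data = [{"age": 68}, {"age": 49}, {"age": 70}]


class NewDataStats:
    def __init__(self, data):
        self.data = data

    @property
    def _ages(self):
        return [d['age'] for d in self.data]


def test_ages():
    ds = NewDataStats(test_data)
    # No parentheses: _ages is now accessed like an attribute.
    assert ds._ages == [68, 49, 70]
```

<p>Calling <code>ds._ages()</code> would now raise a <code>TypeError</code>, since the property returns a list and the parentheses would try to call that list.</p>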
<p>It is time to replace the methods of <code>DataStats</code> with calls to <code>NewDataStats</code>. We could do it method by method, but actually the only thing that we really need is to replace <code>stats()</code>. So the new code is</p>
<div class="highlight"><pre><span></span><code><span class="k">class</span> <span class="nc">DataStats</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">stats</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">):</span>
        <span class="n">nds</span> <span class="o">=</span> <span class="n">NewDataStats</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">nds</span><span class="o">.</span><span class="n">stats</span><span class="p">(</span><span class="n">iage</span><span class="p">,</span> <span class="n">isalary</span><span class="p">)</span>
</code></pre></div>
<p>And since none of the other methods are used any more, we can safely delete them, checking that the remaining tests do not fail. Speaking of tests, removing those methods makes the tests of <code>DataStats</code> that exercise them fail, so we need to remove those tests as well.</p>
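<p>One way to gain confidence in the wrapper is to check that the old and the new API return the same result for the same input. The sketch below assumes a heavily simplified <code>NewDataStats.stats()</code> that returns a dictionary with a single value; the real method computes more than that, and the starting values passed in are arbitrary.</p>

```python
import math


class NewDataStats:
    def __init__(self, data):
        self.data = data

    def stats(self, iage, isalary):
        # Simplified stand-in for the real stats(): it only reports the
        # average age and does not use the starting values.
        ages = [d['age'] for d in self.data]
        return {'avg_age': math.floor(sum(ages) / len(ages))}


class DataStats:
    # Adapter: keeps the old signature and delegates to the new class.
    def stats(self, data, iage, isalary):
        nds = NewDataStats(data)
        return nds.stats(iage, isalary)


# Reduced test data (assumption: only the 'age' key matters here).
test_data = [{"age": 68}, {"age": 49}, {"age": 70}]

old_api = DataStats().stats(test_data, 20, 20000)
new_api = NewDataStats(test_data).stats(20, 20000)
assert old_api == new_api
```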
<h2 id="step-10-still-room-for-improvement">Step 10 - Still room for improvement<a class="headerlink" href="#step-10-still-room-for-improvement" title="Permanent link">¶</a></h2>
<p>As refactoring is an iterative process, it will often happen that you think you did everything that was possible, only to spot later that you missed something. In this case the missing step was spotted by Harun Yasar, who noticed another small code duplication.</p>
<p>The two functions</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_avg_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_salaries</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">))</span>

    <span class="k">def</span> <span class="nf">_avg_age</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_ages</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">))</span>
</code></pre></div>
<p>share the same logic, so we can definitely isolate that and call the common code in each function</p>
<div class="highlight"><pre><span></span><code>    <span class="k">def</span> <span class="nf">_floor_avg</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">sum_of_numbers</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">math</span><span class="o">.</span><span class="n">floor</span><span class="p">(</span><span class="n">sum_of_numbers</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">data</span><span class="p">))</span>

    <span class="k">def</span> <span class="nf">_avg_salary</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_floor_avg</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_salaries</span><span class="p">))</span>

    <span class="k">def</span> <span class="nf">_avg_age</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_floor_avg</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_ages</span><span class="p">))</span>
</code></pre></div>
<p>which passes all the tests and is thus correct.</p>
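<p>A quick way to check the result by hand is to run the three methods on the test data. The sketch assumes a <code>NewDataStats</code> reduced to the members involved, with <code>_salaries</code> and <code>_ages</code> implemented as properties as described earlier:</p>

```python
import math


class NewDataStats:
    def __init__(self, data):
        self.data = data

    @property
    def _salaries(self):
        # Drop the leading currency symbol and convert to int.
        return [int(d['salary'][1:]) for d in self.data]

    @property
    def _ages(self):
        return [d['age'] for d in self.data]

    def _floor_avg(self, sum_of_numbers):
        # Shared logic: floor of the sum divided by the number of records.
        return math.floor(sum_of_numbers / len(self.data))

    def _avg_salary(self):
        return self._floor_avg(sum(self._salaries))

    def _avg_age(self):
        return self._floor_avg(sum(self._ages))


test_data = [
    {"age": 68, "salary": "£27888"},
    {"age": 49, "salary": "£67137"},
    {"age": 70, "salary": "£70472"},
]

ds = NewDataStats(test_data)
print(ds._avg_salary(), ds._avg_age())  # 55165 62
```

<p>Note that <code>_floor_avg()</code> divides by <code>len(self.data)</code>, so it relies on the summed list having exactly one entry per record, which holds for both properties here.</p>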
<p>Whenever I get corrected by someone who read one of my posts and just learned something new I feel so happy, because it means that the message is clear!</p>
<h2 id="final-words">Final words<a class="headerlink" href="#final-words" title="Permanent link">¶</a></h2>
<p>I hope this little tour of a refactoring session didn't turn out to be too trivial, and helped you grasp the basic concepts of this technique. If you are interested in the subject I'd strongly recommend the classic book by Martin Fowler, "Refactoring: Improving the Design of Existing Code", which is a collection of refactoring patterns. The reference language is Java, but the concepts are easily adapted to Python.</p>
<h2 id="updates">Updates<a class="headerlink" href="#updates" title="Permanent link">¶</a></h2>
<p>2017-07-28: <a href="https://github.com/delirious-lettuce">delirious-lettuce</a> and <a href="https://github.com/superbeckgit">Matt Beck</a> did a very serious proofread and spotted many typos. Thank you both for reading the post and for taking the time to submit the issues!</p>
<p>2020-02-15: <a href="https://github.com/harunyasar">Harun Yasar</a> spotted a missing refactoring in two functions. Thanks!</p>
<h2 id="feedback">Feedback<a class="headerlink" href="#feedback" title="Permanent link">¶</a></h2>
<p>Feel free to reach me on <a href="https://twitter.com/thedigicat">Twitter</a> if you have questions. The <a href="https://github.com/TheDigitalCatOnline/blog_source/issues">GitHub issues</a> page is the best place to submit corrections.</p>