The Digital Cat - refactoring

TDD in Python with pytest - Part 5

2020-09-21T10:30:00+02:00

This is the fifth and last post in the series "TDD in Python with pytest" where I develop a simple project following a strict TDD methodology. The posts come from my book Clean Architectures in Python and have been reviewed to get rid of some bad naming choices of the version published in the book.

You can find the first post here.

In this post I will conclude the discussion about mocks by introducing patching.

Patching¶

Mocks are very simple to introduce in your tests whenever your objects accept classes or instances from outside. In that case, as shown in the previous sections, you just have to instantiate the class Mock and pass the resulting object to your system. However, when the external classes instantiated by your library are hardcoded this simple trick does not work. In this case you have no chance to pass a fake object instead of the real one.

This is exactly the case addressed by patching. Patching, in a testing framework, means to replace a globally reachable object with a mock, thus achieving the goal of having the code run unmodified, while part of it has been hot swapped, that is, replaced at run time.

A warm-up example¶

Clone the repository fileinfo that you can find here and move to the branch develop. As I did for the project simple_calculator, the branch master contains the full solution, and I use it to maintain the repository, but if you want to code along you need to start from scratch. If you prefer, you can of course fork it on GitHub and make your own copy of the repository.

git clone https://github.com/lgiordani/fileinfo
cd fileinfo
git checkout --track origin/develop

Create a virtual environment following your preferred process and install the requirements

pip install -r requirements/dev.txt

You should at this point be able to run

pytest -svv

and get an output like

=============================== test session starts ===============================
platform linux -- Python XXXX, pytest-XXXX, py-XXXX, pluggy-XXXX --
fileinfo/venv3/bin/python3
cachedir: .cache
rootdir: fileinfo, inifile: pytest.ini
plugins: cov-XXXX
collected 0 items 

============================== no tests ran in 0.02s ==============================

Let us start with a very simple example. Patching can be complex to grasp at the beginning so it is better to start learning it with trivial use cases. The purpose of this library is to develop a simple class that returns information about a given file. The class shall be instantiated with the file path, which can be relative.

The starting point is the class with the method __init__. If you want you can develop the class using TDD, but for the sake of brevity I will not show here all the steps that I followed. This is the set of tests I have in tests/test_fileinfo.py

tests/test_fileinfo.py

from fileinfo.fileinfo import FileInfo


def test_init():
    filename = 'somefile.ext'
    fi = FileInfo(filename)
    assert fi.filename == filename


def test_init_relative():
    filename = 'somefile.ext'
    relative_path = '../{}'.format(filename)
    fi = FileInfo(relative_path)
    assert fi.filename == filename

and this is the code of the class FileInfo in the file fileinfo/fileinfo.py

fileinfo/fileinfo.py

import os


class FileInfo:
    def __init__(self, path):
        self.original_path = path
        self.filename = os.path.basename(path)

Git tag: first-version

As you can see the class is extremely simple, and the tests are straightforward. So far I didn't add anything new to what we discussed in the previous posts.

Now I want the method get_info to return a tuple with the file name, the original path the class was instantiated with, and the absolute path of the file. Pretending we are in the directory /some/absolute/path, the class should work as shown here

>>> fi = FileInfo('../book_list.txt')
>>> fi.get_info()
('book_list.txt', '../book_list.txt', '/some/absolute/book_list.txt')

You can quickly realise that you have a problem writing the test. There is no way to easily test something like "the absolute path", since the outcome of the function called in the test is supposed to vary with the path of the test itself. Let us try to write part of the test

def test_get_info():
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)
    fi = FileInfo(original_path)
    assert fi.get_info() == (filename, original_path, '???')

where the '???' string highlights that I cannot put something sensible to test the absolute path of the file.

Patching is the way to solve this problem. You know that the function will use some code to get the absolute path of the file. So, within the scope of this test only, you can replace that code with something different and perform the test. Since the replacement code has a known outcome writing the test is now possible.

Patching, thus, means to inform Python that during the execution of a specific portion of the code you want a globally accessible module/object replaced by a mock. Let's see how we can use it in our example

tests/test_fileinfo.py

from unittest.mock import patch

[...]

def test_get_info():
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    with patch('os.path.abspath') as abspath_mock:
        test_abspath = 'some/abs/path'
        abspath_mock.return_value = test_abspath
        fi = FileInfo(original_path)
        assert fi.get_info() == (filename, original_path, test_abspath)

You clearly see the context in which the patching happens, as it is enclosed in a with statement. Inside this statement the function os.path.abspath will be replaced by a mock created by the function patch and called abspath_mock. So, while Python executes the lines of code enclosed by the statement with, any call to os.path.abspath is actually a call to the object abspath_mock.
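To make the scope of the replacement visible, here is a minimal sketch that does not involve the class FileInfo at all. The variable names (original, inside_is_mock, and so on) are only illustrative; the sketch just checks what the name os.path.abspath points to inside and outside the with statement.

```python
import os.path
from unittest.mock import patch

# Keep a reference to the real function to compare it later
original = os.path.abspath

with patch('os.path.abspath') as abspath_mock:
    abspath_mock.return_value = 'some/abs/path'

    # Inside the block the name os.path.abspath points to the mock
    inside_is_mock = os.path.abspath is abspath_mock
    inside_result = os.path.abspath('../somefile.ext')

# Outside the block the original function has been restored
outside_is_original = os.path.abspath is original
```

The restore step happens when the with block exits, even if the code inside it raises an exception.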

The first thing we can do, then, is to give the mock a known return_value. This way we solve the issue that we had with the initial code, that is using an external component that returns an unpredictable result. The line

abspath_mock.return_value = test_abspath

instructs the patching mock to return the given string as a result, regardless of the real values of the file under consideration.

The code that makes the test pass is

fileinfo/fileinfo.py

class FileInfo:
    [...]

    def get_info(self):
        return (
            self.filename,
            self.original_path,
            os.path.abspath(self.original_path)
        )

When this code is executed by the test the function os.path.abspath is replaced at run time by the mock that we prepared there, which basically ignores the input value self.original_path and returns the fixed value it was instructed to use.

Git tag: patch-with-context-manager

It is worth at this point discussing outgoing messages again. The code that we are considering here is a clear example of an outgoing query, as the method get_info is not interested in changing the status of the external component. In the previous post we reached the conclusion that testing the return value of outgoing queries is pointless and should be avoided. With patch we are replacing the external component with something that we know, using it to test that our object correctly handles the value returned by the outgoing query. We are thus not testing the external component, as it has been replaced, and we are definitely not testing the mock, as its return value is already known.

Obviously to write the test you have to know that you are going to use the function os.path.abspath, so patching is somehow a "less pure" practice in TDD. In pure OOP/TDD you are only concerned with the external behaviour of the object, and not with its internal structure. This example, however, shows that this pure approach has some limitations that you have to cope with, and patching is a clean way to do it.

The patching decorator¶

The function patch we imported from the module unittest.mock is very powerful, as it can temporarily replace an external object. If the replacement has to or can be active for the whole test, there is a cleaner way to inject your mocks, which is to use patch as a function decorator.

This means that you can decorate the test function, passing as argument the same argument you would pass if patch was used in a with statement. This requires however a small change in the test function prototype, as it has to receive an additional argument, which will become the mock.

Let's change test_get_info, removing the statement with and decorating the function with patch

tests/test_fileinfo.py

@patch('os.path.abspath')
def test_get_info(abspath_mock):
    test_abspath = 'some/abs/path'
    abspath_mock.return_value = test_abspath

    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    fi = FileInfo(original_path)
    assert fi.get_info() == (filename, original_path, test_abspath)

Git tag: patch-with-function-decorator

As you can see the decorator patch works like a big with statement for the whole function. The argument abspath_mock passed to the test becomes internally the mock that replaces os.path.abspath. Obviously this way you replace os.path.abspath for the whole function, so you have to decide case by case which form of the function patch you need to use.
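The following sketch shows the decorator form on a plain function, outside of pytest, assuming a hypothetical function called patched_call. Note that the decorated function is called without arguments: the mock is supplied by patch.

```python
import os.path
from unittest.mock import Mock, patch


@patch('os.path.abspath')
def patched_call(abspath_mock):
    # The patch is active for the whole body of the function
    abspath_mock.return_value = 'some/abs/path'
    return os.path.abspath('anything')


# patch injects the mock, so no argument is passed here
result = patched_call()

# Once the function has returned, the real function is back in place
restored = not isinstance(os.path.abspath, Mock)
```

pytest calls test functions the same way, which is why the test only has to declare the additional argument.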

Multiple patches¶

You can patch more than one object in the same test. For example, consider the case where the method get_info calls os.path.getsize in addition to os.path.abspath in order to return the size of the file. You have at this point two different outgoing queries, and you have to replace both with mocks to make your class work during the test.

This can be easily done with an additional patch decorator

tests/test_fileinfo.py

@patch('os.path.getsize')
@patch('os.path.abspath')
def test_get_info(abspath_mock, getsize_mock):
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    test_abspath = 'some/abs/path'
    abspath_mock.return_value = test_abspath

    test_size = 1234
    getsize_mock.return_value = test_size

    fi = FileInfo(original_path)
    assert fi.get_info() == (filename, original_path, test_abspath, test_size)

Please note that the decorator which is nearest to the function is applied first. Always remember that the decorator syntax with @ is a shortcut to replace the function with the output of the decorator, so two decorators result in

@decorator1
@decorator2
def myfunction():
    pass

which is a shortcut for

def myfunction():
    pass
myfunction = decorator1(decorator2(myfunction))

This explains why, in the test code, the function receives first abspath_mock and then getsize_mock. The first decorator applied to the function is the patch of os.path.abspath, which appends the mock that we call abspath_mock. Then the patch of os.path.getsize is applied and this appends its own mock.
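The order of application can be checked with a small sketch that has nothing to do with mocks. The helper tag and the list applied are hypothetical names used only for this illustration; each decorator records its name at the moment it is applied to the function.

```python
applied = []


def tag(name):
    # Build a decorator that records when it is applied
    def decorator(func):
        applied.append(name)
        return func
    return decorator


@tag('outer')
@tag('inner')
def myfunction():
    pass
```

After the definition the list applied contains 'inner' followed by 'outer', which mirrors the order in which the mocks are appended to the arguments of the test function.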

The code that makes the test pass is

fileinfo/fileinfo.py

class FileInfo:
    [...]

    def get_info(self):
        return (
            self.filename,
            self.original_path,
            os.path.abspath(self.original_path),
            os.path.getsize(self.original_path)
        )

Git tag: multiple-patches

We can write the above test using two with statements as well

tests/test_fileinfo.py

def test_get_info():
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    with patch('os.path.abspath') as abspath_mock:
        test_abspath = 'some/abs/path'
        abspath_mock.return_value = test_abspath

        with patch('os.path.getsize') as getsize_mock:
            test_size = 1234
            getsize_mock.return_value = test_size

            fi = FileInfo(original_path)
            assert fi.get_info() == (
                filename,
                original_path,
                test_abspath,
                test_size
            )

Using more than one with statement, however, makes the code difficult to read, in my opinion, so in general I prefer to avoid complex with trees if I do not really need to use a limited scope of the patching.
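If you need a limited scope but want to avoid nesting, Python allows several context managers in a single with statement. The following is a sketch of that form, calling the patched functions directly instead of going through FileInfo to keep it self-contained.

```python
import os.path
from unittest.mock import patch

# Two patches in a single with statement, no nesting required
with patch('os.path.abspath') as abspath_mock, \
        patch('os.path.getsize') as getsize_mock:
    abspath_mock.return_value = 'some/abs/path'
    getsize_mock.return_value = 1234

    result = (
        os.path.abspath('../somefile.ext'),
        os.path.getsize('../somefile.ext'),
    )
```

Both patches are removed when the block exits, exactly as with the nested version.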

Checking call parameters¶

When you patch, the code of the patched method is not executed, as the mock just returns the values it has been instructed to return. This is connected to what we said about testing external systems, so everything is good, but while we don't want to test the internals of the module os.path, we want to be sure that we are passing the correct values to the external methods.

This is why mocks provide methods like assert_called_with (and other similar methods), through which we can check the values passed to a patched method when it is called. Let's add the checks to the test

tests/test_fileinfo.py

@patch('os.path.getsize')
@patch('os.path.abspath')
def test_get_info(abspath_mock, getsize_mock):
    test_abspath = 'some/abs/path'
    abspath_mock.return_value = test_abspath

    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    test_size = 1234
    getsize_mock.return_value = test_size

    fi = FileInfo(original_path)
    info = fi.get_info() 

    abspath_mock.assert_called_with(original_path)
    getsize_mock.assert_called_with(original_path)
    assert info == (filename, original_path, test_abspath, test_size)

As you can see, I first invoke fi.get_info storing the result in the variable info, check that the patched methods have been called with the correct parameters, and then assert the format of its output.

The test passes, confirming that we are passing the correct values.
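It may help to see what happens when the check fails. This is a small sketch on a standalone mock, not part of the project: assert_called_with returns silently when the last call matches and raises AssertionError when it does not.

```python
from unittest.mock import Mock

getsize_mock = Mock(return_value=1234)
getsize_mock('../somefile.ext')

# The recorded call matches, so nothing happens
getsize_mock.assert_called_with('../somefile.ext')

try:
    # The recorded call does not match this argument
    getsize_mock.assert_called_with('another/path')
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```

In a pytest run that AssertionError is reported like any other failed assertion.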

Git tag: addding-checks-for-input-values

Patching immutable objects¶

The most widespread version of Python is CPython, which is written, as the name suggests, in C. Part of the standard library is also written in C, while the rest is written in Python itself.

The objects (classes, modules, functions, etc.) that are implemented in C are shared between interpreters, and this requires those objects to be immutable, so that you cannot alter them at runtime from a single interpreter.

An example of this immutability can be given easily using a Python console

>>> a = 1
>>> a.conjugate = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object attribute 'conjugate' is read-only

Here I'm trying to replace a method with an integer, which is pointless per se, but clearly shows the issue we are facing.

What has this immutability to do with patching? What patch does is actually to temporarily replace an attribute of an object (method of a class, class of a module, etc.), which also means that if we try to replace an attribute in an immutable object the patching action will fail.

A typical example of this problem is the module datetime, which is also one of the best candidates for patching, since the output of time functions is by definition time-varying.

Let me show the problem with a simple class that logs operations. I will temporarily break the TDD methodology writing first the class and then the tests, so that you can appreciate the problem.

Create a file called logger.py and put there the following code

fileinfo/logger.py

import datetime


class Logger:
    def __init__(self):
        self.messages = []

    def log(self, message):
        self.messages.append((datetime.datetime.now(), message))

This is pretty simple, but testing this code is problematic, because the method log produces results that depend on the actual execution time. The call to datetime.datetime.now is however an outgoing query, and as such it can be replaced by a mock with patch.

If we try to do it, however, we will have a bitter surprise. This is the test code, which you can put in tests/test_logger.py

tests/test_logger.py

from unittest.mock import patch

from fileinfo.logger import Logger


@patch('datetime.datetime.now')
def test_log(mock_now):
    test_now = 123
    test_message = "A test message"
    mock_now.return_value = test_now

    test_logger = Logger()
    test_logger.log(test_message)
    assert test_logger.messages == [(test_now, test_message)]

When you try to execute this test you will get the following error

TypeError: can't set attributes of built-in/extension type 'datetime.datetime'

which is raised because patching tries to replace the function now in datetime.datetime with a mock, and since the class datetime.datetime is immutable this operation fails.
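The same failure can be reproduced without patch. The sketch below assumes CPython, where datetime.datetime is implemented in C; on other implementations the assignment might succeed. It performs a direct assignment, which is roughly what patch attempts on the class.

```python
import datetime

try:
    # Try to replace the method now directly on the class
    datetime.datetime.now = lambda: 123
    assignment_failed = False
except TypeError as exc:
    # CPython refuses to set attributes on built-in/extension types
    assignment_failed = True
    error_message = str(exc)
```

The exact wording of the error message depends on the Python version, so the sketch only stores it without comparing it to anything.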

Git tag: initial-logger-not-working

There are several ways to address this problem. All of them, however, start from the fact that importing or subclassing an immutable object gives you a mutable "copy" of that object.

The easiest example in this case is the module datetime itself. In the function test_log we tried to patch directly the object datetime.datetime.now, affecting the builtin module datetime. The file logger.py, however, does import datetime, so this latter becomes a local symbol in the module logger. This is exactly the key for our patching. Let us change the code to

tests/test_logger.py

@patch('fileinfo.logger.datetime.datetime')
def test_log(mock_datetime):
    test_now = 123
    test_message = "A test message"
    mock_datetime.now.return_value = test_now

    test_logger = Logger()
    test_logger.log(test_message)
    assert test_logger.messages == [(test_now, test_message)]

Git tag: correct-patching

If you run the test now, you can see that the patching works. What we did was to inject our mock in fileinfo.logger.datetime.datetime instead of datetime.datetime.now. Two things changed, thus, in our test. First, we are patching the name datetime as it is seen from the file logger.py instead of addressing the builtin module directly. Second, we have to patch the whole class datetime.datetime and not just its method now, because the class itself cannot be modified, while the attribute of the module that points to it can be replaced. If you try to patch fileinfo.logger.datetime.datetime.now you will find that it is still immutable.

Another possible solution to this problem is to create a function that invokes the immutable object and returns its value. This last function can be easily patched, because it just uses the builtin objects and thus is not immutable. This solution, however, requires changing the source code to allow testing, which is far from being optimal. Obviously it is better to introduce a small change in the code and have it tested than to leave it untested, but whenever possible I try to avoid solutions that introduce code which wouldn't be required without tests.
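A minimal sketch of this alternative follows. It is not the code of the repository: the wrapper is written as a method called _now (a hypothetical name) instead of a module-level function, so that the example fits in a single file, and it is replaced with patch.object.

```python
import datetime
from unittest.mock import patch


class Logger:
    def __init__(self):
        self.messages = []

    def _now(self):
        # Thin wrapper around the immutable datetime.datetime.now
        return datetime.datetime.now()

    def log(self, message):
        self.messages.append((self._now(), message))


def test_log():
    test_now = 123
    test_message = "A test message"

    # Logger is a plain Python class, so its attributes can be replaced
    with patch.object(Logger, '_now', return_value=test_now):
        test_logger = Logger()
        test_logger.log(test_message)

    assert test_logger.messages == [(test_now, test_message)]


test_log()
```

The method _now exists only to make the class testable, which is exactly the drawback described above.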

Mocks and proper TDD¶

Following a strict TDD methodology means writing a test before writing the code that passes that test. This can be done because we use the object under test as a black box, interacting with it through its API, and thus not knowing anything of its internal structure.

When we mock systems we break this assumption. In particular we need to open the black box every time we need to patch a hardcoded external system. Let's say, for example, that the object under test creates a temporary directory to perform some data processing. This is a detail of the implementation and we are not supposed to know it while testing the object, but since we need to mock the directory creation to avoid interaction with the external system (storage) we need to become aware of what happens internally.

This also means that writing a test for the object before writing the implementation of the object itself is difficult. Pretty often, thus, such objects are built with TDD but iteratively, where mocks are introduced after the code has been written.

While this is a violation of the strict TDD methodology, I don't consider it a bad practice. TDD helps us to write better code consistently, but good code can be written even without tests. The real outcome of TDD is a test suite that is capable of detecting regressions or the removal of important features in the future. This means that breaking strict TDD for a small part of the code (patching objects) will not affect the real result of the process, only change the way we achieve it.

A warning¶

Mocks are a good way to approach parts of the system that are not under test but that are still part of the code that we are running. This is particularly true for parts of the code that we wrote, whose internal structure is ultimately known. When the external system is complex and completely detached from our code, mocking starts to become complicated and the risk is that we spend more time faking parts of the system than actually writing code.

In these cases we definitely crossed the barrier between unit testing and integration testing. You may see mocks as the bridge between the two, as they allow you to keep unit-testing parts that are naturally connected ("integrated") with external systems, but there is a point where you need to recognise that you need to change your approach.

This threshold is not fixed, and I can't give you a rule to recognise it, but I can give you some advice. First of all keep an eye on how many things you need to mock to make a test run, as an increasing number of mocks in a single test is definitely a sign of something wrong in the testing approach. My rule of thumb is that when I have to create more than 3 mocks, an alarm goes off in my mind and I start questioning what I am doing.

The second piece of advice is to always consider the complexity of the mocks. You may find yourself patching a class but then having to create monsters like cls_mock().func1().func2().func3.assert_called_with(x=42) which is a sign that the part of the system that you are mocking is deep into some code that you cannot really access, because you don't know its internal mechanisms.

The third piece of advice is to consider mocks as "hooks" that you throw at the external system, and that break its hull to reach its internal structure. These hooks are obviously against the assumption that we can interact with a system knowing only its external behaviour, or its API. As such, you should keep in mind that each mock you create is a step back from this perfect assumption, thus "breaking the spell" of the decoupled interaction. Doing this makes it increasingly complex to create mocks, and this will help keep you aware of what you are doing (or overdoing).

Final words¶

Mocks are a very powerful tool that allows us to test code that contains outgoing messages. In particular they allow us to test the arguments of outgoing commands. Patching is a good way to overcome the fact that some external components are hardcoded in our code and are thus unreachable through the arguments passed to the classes or the methods under analysis.

Updates¶

2021-03-06 GitHub user 4myhw spotted an inconsistency between the code on GitHub and the code in the post. Thanks!

2022-11-19 GitHub user rioj7 found and corrected a typo. Thanks!

Feedback¶

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

TDD in Python with pytest - Part 4

2020-09-17T11:30:00+02:00

This is the fourth post in the series "TDD in Python with pytest" where I develop a simple project following a strict TDD methodology. The posts come from my book Clean Architectures in Python and have been reviewed to get rid of some bad naming choices of the version published in the book.

You can find the first post here.

In this post I will discuss a very interesting and useful testing tool: mocks.

Basic concepts¶

As we saw in the previous post the relationship between the component that we are testing and other components of the system can be complex. Sometimes idempotency and isolation are not easy to achieve, and testing outgoing commands requires checking the parameters sent to the external component, which is not trivial.

The main difficulty comes from the fact that your code is actually using the external system. When you run it in production the external system will provide the data that your code needs and the whole process can work as intended. During testing, however, you don't want to be bound to the external system, for the reasons explained in the previous post, but at the same time you need it to make your code work.

So, you face a complex issue. On the one hand your code is connected to the external system (be it hardcoded or chosen programmatically), but on the other hand you want it to run without the external system being active (or even present).

This problem can be solved with the use of mocks. A mock, in the testing jargon, is an object that simulates the behaviour of another (more complex) object. Wherever your code connects to an external system, during testing you can replace the latter with a mock, pretending the external system is there and properly checking that your component behaves as intended.

First steps¶

Let us try and work with a mock in Python and see what it can do. First of all fire up a Python shell and import the library

>>> from unittest import mock

The main object that the library provides is Mock and you can instantiate it without any argument

>>> m = mock.Mock()

This object has the peculiar property of creating methods and attributes on the fly when you require them. Let us first look inside the object to get an idea of what it provides

>>> dir(m)
[
    'assert_any_call', 'assert_called_once_with',
    'assert_called_with', 'assert_has_calls',
    'attach_mock', 'call_args', 'call_args_list',
    'call_count', 'called', 'configure_mock',
    'method_calls', 'mock_add_spec', 'mock_calls',
    'reset_mock', 'return_value', 'side_effect'
]

As you can see there are some methods which are already defined in the object Mock. Let's try to read a non-existent attribute

>>> m.some_attribute
<Mock name='mock.some_attribute' id='140222043808432'>
>>> dir(m)
[
    'assert_any_call', 'assert_called_once_with',
    'assert_called_with', 'assert_has_calls',
    'attach_mock', 'call_args', 'call_args_list',
    'call_count', 'called', 'configure_mock',
    'method_calls', 'mock_add_spec', 'mock_calls',
    'reset_mock', 'return_value', 'side_effect',
    'some_attribute'
]

As you can see this class is somehow different from what you are used to. First of all, its instances do not raise an AttributeError when asked for a non-existent attribute, but they happily return another instance of Mock itself. Second, the attribute you tried to access has now been created inside the object and accessing it returns the same mock object as before.

>>> m.some_attribute
<Mock name='mock.some_attribute' id='140222043808432'>

Mock objects are callables, which means that they may act both as attributes and as methods. If you try to call the mock, it just returns another mock with a name that includes parentheses to signal its callable nature

>>> m.some_attribute()
<Mock name='mock.some_attribute()' id='140247621475856'>

As you can understand, such objects are the perfect tool to mimic other objects or systems, since they may expose any API without raising exceptions. To use them in tests, however, we need them to behave just like the original, which implies returning sensible values or performing real operations.

Simple return values¶

The simplest thing a mock can do for you is to return a given value every time you call one of its methods. This is configured setting the attribute return_value of a mock object

>>> m.some_attribute.return_value = 42
>>> m.some_attribute()
42

Now, as you can see the object does not return a mock object any more, instead it just returns the static value stored in the attribute return_value. Since in Python everything is an object you can return here any type of value: simple types like an integer or a string, more complex structures like dictionaries or lists, classes that you defined, instances of those, or functions.

Pay attention that what the mock returns is exactly the object that it is instructed to use as return value. If the return value is a callable such as a function, calling the mock will return the function itself and not the result of the function. Let me give you an example

>>> def print_answer():
...     print("42")
...
>>> m.some_attribute.return_value = print_answer
>>> m.some_attribute()
<function print_answer at 0x7f8df1e3f400>

As you can see calling some_attribute just returns the value stored in return_value, that is the function itself. This is not exactly what we were aiming for. To make the mock call the object that we use as a return value we have to use a slightly more complex attribute called side_effect.

Complex return values¶

The side_effect parameter of mock objects is a very powerful tool. It accepts three different flavours of objects: callables, iterables, and exceptions, and changes its behaviour accordingly.

If you pass an exception the mock will raise it

>>> m.some_attribute.side_effect = ValueError('A custom value error')
>>> m.some_attribute()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/unittest/mock.py", line 939, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/usr/lib/python3.6/unittest/mock.py", line 995, in _mock_call
    raise effect
ValueError: A custom value error

If you pass an iterable, such as for example a generator, a plain list, tuple, or similar objects, the mock will yield the values of that iterable, i.e. return every value contained in the iterable on subsequent calls of the mock.

>>> m.some_attribute.side_effect = range(3)
>>> m.some_attribute()
0
>>> m.some_attribute()
1
>>> m.some_attribute()
2
>>> m.some_attribute()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/unittest/mock.py", line 939, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/usr/lib/python3.6/unittest/mock.py", line 998, in _mock_call
    result = next(effect)
StopIteration

As promised, the mock just returns every object found in the iterable (in this case a range object) one at a time until the iterable is exhausted. According to the iterator protocol once every item has been returned the object raises the StopIteration exception, which means that you can safely use it in a loop.

Last, if you feed side_effect a callable, the latter will be executed with the parameters passed when calling the attribute. Let's consider again the simple example given in the previous section

>>> def print_answer():
...     print("42")
...
>>> m.some_attribute.side_effect = print_answer
>>> m.some_attribute()
42

A slightly more complex example is that of a function with arguments

>>> def print_number(num):
...     print("Number:", num)
... 
>>> m.some_attribute.side_effect = print_number
>>> m.some_attribute(5)
Number: 5

As you can see the arguments passed to the attribute are directly used as arguments for the stored function. This is very powerful, especially if you stop thinking about "functions" and start considering "callables". Indeed, given the nature of Python objects we know that instantiating an object is not different from calling a function, which means that side_effect can be given a class and return an instance of it

>>> class Number:
...     def __init__(self, value):
...         self._value = value
...     def print_value(self):
...         print("Value:", self._value)
... 
>>> m.some_attribute.side_effect = Number
>>> n = m.some_attribute(26)
>>> n
<__main__.Number object at 0x7f8df1aa4470>
>>> n.print_value()
Value: 26
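The functions used so far only print something, so it is worth adding a sketch of a callable that returns a value. The function double is a hypothetical example; what it returns becomes the return value of the mock, and the call is recorded as usual.

```python
from unittest import mock

m = mock.Mock()


def double(num):
    # The value returned here is what the mock call returns
    return num * 2


m.some_attribute.side_effect = double
result = m.some_attribute(21)

# The mock still records the call it received
m.some_attribute.assert_called_with(21)
```

This is handy when the value returned by the external component has to depend on the arguments it receives.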

Asserting calls¶

As I explained in the previous post outgoing commands shall be tested checking the correctness of the message argument. This can be easily done with mocks, as these objects record every call that they receive and the arguments passed to it.

Let's see a practical example

from unittest import mock
import myobj


def test_connect():
    external_obj = mock.Mock()

    myobj.MyObj(external_obj)

    external_obj.connect.assert_called_with()

Here, the class myobj.MyObj needs to connect to an external object, for example a remote repository or a database. The only thing we need to know for testing purposes is whether the class called the method connect of the external object without any parameter.

So the first thing we do in this test is to instantiate the mock object. This is a fake version of the external object, and its only purpose is to accept calls from the object MyObj under test and possibly return sensible values. Then we instantiate the class MyObj passing the external object. We expect the class to call the method connect so we express this expectation calling external_obj.connect.assert_called_with.

What happens behind the scenes? The class MyObj receives the fake external object and somewhere in its initialization process calls the method connect of the mock object. This call creates the method itself as a mock object. This new mock records the parameters used to call it and the subsequent call to its method assert_called_with checks that the method was called and that no parameters were passed.

In this case an object like

class MyObj():
    def __init__(self, repo):
        repo.connect()

would pass the test, as the object passed as repo is a mock that does nothing but record the calls. As you can see, the method __init__ actually calls repo.connect, and repo is expected to be a full-featured external object that provides connect in its API. Calling repo.connect when repo is a mock object, instead, silently creates the method (as another mock object) and records that the method has been called once without arguments.
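If you want to see what the mock actually recorded, the following self-contained sketch defines the class inline (instead of importing it from a module myobj) and inspects the attribute mock_calls of the fake external object.

```python
from unittest import mock


class MyObj:
    def __init__(self, repo):
        # Outgoing command sent to the external object
        repo.connect()


external_obj = mock.Mock()
MyObj(external_obj)

# Passes because connect was called without arguments
external_obj.connect.assert_called_with()

# The parent mock keeps a record of the calls made to its children
recorded = external_obj.mock_calls
print(recorded)
```

The list contains a single entry, the call to connect with no arguments, which is exactly what the assertion checks.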

The method assert_called_with allows us to also check the parameters we passed when calling. To show this let us pretend that we expect the method MyObj.setup to call setup(cache=True, max_connections=256) on the external object. Remember that this is an outgoing command, so we are interested in checking the parameters and not the result.

The new test can be something like

def test_setup():
    external_obj = mock.Mock()
    obj = myobj.MyObj(external_obj)
    obj.setup()
    external_obj.setup.assert_called_with(cache=True, max_connections=256)

In this case an object that passes the test can be

class MyObj():
    def __init__(self, repo):
        self._repo = repo
        repo.connect()

    def setup(self):
        self._repo.setup(cache=True, max_connections=256)

If we change the method setup to

    def setup(self):
        self._repo.setup(cache=True)

the test will fail with the following error

E           AssertionError: Expected call: setup(cache=True, max_connections=256)
E           Actual call: setup(cache=True)

Which I consider a very clear explanation of what went wrong during the test execution.

As you can read in the official documentation, the object Mock provides other methods and attributes, like assert_called_once_with, assert_any_call, assert_has_calls, assert_not_called, called, call_count, and many others. Each of those explores a different aspect of the mock behaviour concerning calls. Make sure to read their description and go through the examples.
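As a quick tour of some of these helpers, here is a sketch that calls a mocked method twice and then inspects it. The object repo and the method disconnect are made up for the example, and the arguments mirror the ones used in the test above.

```python
from unittest import mock

repo = mock.Mock()

# Nothing has been called yet, so this passes
repo.disconnect.assert_not_called()

repo.setup(cache=True)
repo.setup(cache=True, max_connections=256)

# assert_called_with only looks at the most recent call
repo.setup.assert_called_with(cache=True, max_connections=256)

# assert_any_call passes if any of the recorded calls matches
repo.setup.assert_any_call(cache=True)

# assert_has_calls checks a sequence of calls, in order
repo.setup.assert_has_calls([
    mock.call(cache=True),
    mock.call(cache=True, max_connections=256),
])

# The method was called twice, so assert_called_once_with fails
try:
    repo.setup.assert_called_once_with(cache=True, max_connections=256)
except AssertionError:
    called_once = False
else:
    called_once = True

print(repo.setup.called, repo.setup.call_count, called_once)
```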

A simple example¶

To learn how to use mocks in a practical case, let's work together on a new module in the simple_calculator package. The target is to write a class that downloads a JSON file with data on meteorites and computes some statistics on the dataset using the class SimpleCalculator. The file is provided by NASA at this URL.

The class contains a method get_data that queries the remote server and returns the data, and a method average_mass that uses the method SimpleCalculator.avg to compute the average mass of the meteorites and return it. In a real world case, like for example in a scientific application, I would probably split the class in two. One class manages the data, updating it whenever it is necessary, and another one manages the statistics. For the sake of simplicity, however, I will keep the two functionalities together in this example.

Let's see a quick example of what is supposed to happen inside our code. An excerpt of the file provided by the server is

[
    {
        "fall": "Fell",
        "geolocation": {
            "type": "Point",
            "coordinates": [6.08333, 50.775]
        },
        "id":"1",
        "mass":"21",
        "name":"Aachen",
        "nametype":"Valid",
        "recclass":"L5",
        "reclat":"50.775000",
        "reclong":"6.083330",
        "year":"1880-01-01T00:00:00.000"
    },
    {
        "fall": "Fell",
        "geolocation": {
            "type": "Point",
            "coordinates": [10.23333, 56.18333]
        },
        "id":"2",
        "mass":"720",
        "name":"Aarhus",
        "nametype":"Valid",
        "recclass":"H6",
        "reclat":"56.183330",
        "reclong":"10.233330",
        "year":"1951-01-01T00:00:00.000"
    }
]

So a good way to compute the average mass of the meteorites is

import urllib.request
import json

from simple_calculator.main import SimpleCalculator

URL = ("https://data.nasa.gov/resource/y77d-th95.json")

with urllib.request.urlopen(URL) as url:
    data = json.loads(url.read().decode())

masses = [float(d['mass']) for d in data if 'mass' in d]

print(masses)

calculator = SimpleCalculator()

avg_mass = calculator.avg(masses)

print(avg_mass)

Here the list comprehension filters out those elements that do not have an attribute mass. The last line prints the value 50190.19568930039, which is the average mass of the meteorites contained in the file.
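The same algorithm can be tried without any network access. The sketch below runs it on a reduced version of the two records shown in the excerpt, plus a third made-up record without a mass that I added only to exercise the filter. To keep the snippet self-contained it uses sum and len as a stand-in for SimpleCalculator.avg, which is an assumption made for this example.

```python
import json

# Reduced version of the excerpt; the third record is made up
excerpt = """
[
    {"id": "1", "name": "Aachen", "mass": "21"},
    {"id": "2", "name": "Aarhus", "mass": "720"},
    {"id": "0", "name": "Record without mass"}
]
"""

data = json.loads(excerpt)

# Records without the key "mass" are skipped
masses = [float(d['mass']) for d in data if 'mass' in d]

# Stand-in for SimpleCalculator.avg
avg_mass = sum(masses) / len(masses)

print(masses, avg_mass)
```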

Now we have a proof of concept of the algorithm, so we can start writing the tests. We might initially come up with a simple solution like

def test_average_mass():
    metstats = MeteoriteStats()

    data = metstats.get_data()

    assert metstats.average_mass(data) == 50190.19568930039

This little test contains, however, two big issues. First of all the method get_data is supposed to use the Internet connection to get the data from the server. This is a typical example of an outgoing query, as we are not trying to change the state of the web server providing the data. You already know that you should not test the return value of an outgoing query, but you can see here why you shouldn't use real data when testing either. The data coming from the server can change in time, and this can invalidate your tests.

Testing such a case becomes very simple with mocks. Since the class has a public method get_data that interacts with the external component, it is enough to temporarily replace it with a mock that provides sensible values. Create the file tests/test_meteorites.py and put this code in it

tests/test_meteorites.py

from unittest import mock

from simple_calculator.meteorites import MeteoriteStats


def test_average_mass():
    metstats = MeteoriteStats()

    metstats.get_data = mock.Mock()
    metstats.get_data.return_value = [
        {
            "fall": "Fell",
            "geolocation": {
                "type": "Point",
                "coordinates": [6.08333, 50.775]
            },
            "id":"1",
            "mass":"21",
            "name":"Aachen",
            "nametype":"Valid",
            "recclass":"L5",
            "reclat":"50.775000",
            "reclong":"6.083330",
            "year":"1880-01-01T00:00:00.000"
        },
        {
            "fall": "Fell",
            "geolocation": {
                "type": "Point",
                "coordinates": [10.23333, 56.18333]
            },
            "id":"2",
            "mass":"720",
            "name":"Aarhus",
            "nametype":"Valid",
            "recclass":"H6",
            "reclat":"56.183330",
            "reclong":"10.233330",
            "year":"1951-01-01T00:00:00.000"
        }
    ]

    result = metstats.average_mass(metstats.get_data())

    assert result == 370.5

When we run this test we are not testing that the external server provides the correct data. We are testing the process implemented by average_mass, feeding the algorithm some known input. This is not different from the first tests that we implemented: in that case we were testing an addition, here we are testing a more complex algorithm, but the concept is the same.

We can now write a class that passes this test. Put the following code in simple_calculator/meteorites.py, alongside main.py

simple_calculator/meteorites.py

import urllib.request
import json

from simple_calculator.main import SimpleCalculator

URL = ("https://data.nasa.gov/resource/y77d-th95.json")


class MeteoriteStats:
    def get_data(self):
        with urllib.request.urlopen(URL) as url:
            return json.loads(url.read().decode())

    def average_mass(self, data):
        calculator = SimpleCalculator()

        masses = [float(d['mass']) for d in data if 'mass' in d]

        return calculator.avg(masses)

As you can see the class contains the code we wrote as a proof of concept, slightly reworked to match the methods we used in the test. Run the test suite now, and you will see that the latest test we wrote passes.

Please note that we are not testing the method get_data. That method uses the function urllib.request.urlopen that opens an Internet connection without passing through any other public object that we can replace at run time during the test. We then need a tool to replace internal parts of our objects when we run them, and this is provided by patching, which will be the topic of the next post.
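As a small preview, and only as a sketch under my own assumptions, this is roughly how the standard function mock.patch could replace urllib.request.urlopen while get_data runs. The class is redefined inline to keep the example self-contained, and the payload is a made-up single record. The full discussion of patching may well take a different route.

```python
import json
import urllib.request
from unittest import mock

URL = "https://data.nasa.gov/resource/y77d-th95.json"


class MeteoriteStats:
    def get_data(self):
        with urllib.request.urlopen(URL) as url:
            return json.loads(url.read().decode())


# Made-up payload, encoded as bytes like a real HTTP response body
fake_payload = json.dumps([{"mass": "21"}]).encode()

with mock.patch("urllib.request.urlopen") as mock_urlopen:
    # urlopen is used as a context manager, so the fake response is
    # the object returned by __enter__
    response = mock_urlopen.return_value.__enter__.return_value
    response.read.return_value = fake_payload

    data = MeteoriteStats().get_data()

# No connection was opened, the mock just recorded the call
mock_urlopen.assert_called_once_with(URL)
print(data)
```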

Git tag: meteoritestats-class-added

Final words¶

Mocks are very important, and as a Python programmer you need to know the subtleties of their implementation. Aside from the technical details, however, I believe it is mandatory to master the different types of tests that I discussed in the previous post, and to learn when to use simple assertions and when to pull a bigger gun like a mock object.

Feedback¶

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

TDD in Python with pytest - Part 3

2020-09-15T08:00:00+02:00

This is the third post in the series "TDD in Python with pytest" where I develop a simple project following a strict TDD methodology. The posts come from my book Clean Architectures in Python and have been reviewed to get rid of some bad naming choices of the version published in the book.

What I introduced in the previous two posts is commonly called "unit testing", since it focuses on testing a single and very small unit of code. As simple as it may seem, the TDD process has some caveats that are worth discussing. In this chapter I discuss some aspects of TDD and unit testing that I consider extremely important.

Tests should be fast¶

You will run your tests many times; ideally, you should run them every time you save your code. Your tests are the watchdogs of your code, the dashboard warning lights that signal a correct status or some malfunction. This means that your testing suite should be fast. If you have to wait minutes for each execution to finish, chances are that you will end up running your tests only after some long coding session, which means that you are not using them as guides.

It's true however that some tests may be intrinsically slow, or that the test suite might be so big that running it would take an amount of time which makes continuous testing uncomfortable. In this case you should identify a subset of tests that run quickly and that can show you if something is not working properly, the so-called "smoke tests", and leave the rest of the suite for longer executions that you run less frequently. Typically, the library part of your project has tests that run very quickly, as testing functions does not require specific set-ups, while the user interface tests (be it a CLI or a GUI) are usually slower. If your tests are well-structured you can also run just the tests that are connected with the subsystem that you are dealing with.

Tests should be idempotent¶

Idempotency in mathematics and computer science identifies processes that can be run multiple times without changing the status of the system. Since the latter doesn't change, the tests can be run in whichever order without changing their results. If a test interacts with an external system leaving it in a different state you will have random failures depending on the execution order.

The typical example is when you interact with the filesystem in your tests. A test may create a file and not remove it, and this makes another test fail because the file already exists, or because the directory is not empty. Whatever you do while interacting with external systems has to be reverted after the test. If you run your tests concurrently, however, even this precaution is not enough.
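A minimal sketch of a filesystem test that cleans up after itself might look like the following. The function write_report is a hypothetical piece of code under test, and the temporary directory comes from the standard module tempfile (pytest offers the fixture tmp_path for a similar purpose).

```python
import os
import tempfile


def write_report(directory, text):
    # Hypothetical code under test: mode "x" fails if the file already
    # exists, like code that expects a clean directory
    path = os.path.join(directory, "report.txt")
    with open(path, "x") as f:
        f.write(text)
    return path


def test_write_report():
    # Each run gets a fresh directory that is removed at the end
    with tempfile.TemporaryDirectory() as tmp:
        path = write_report(tmp, "hello")

        with open(path) as f:
            assert f.read() == "hello"

    # Nothing is left behind for the next test
    assert not os.path.exists(path)


# Running the test twice gives the same result
test_write_report()
test_write_report()
```

If the test wrote to a fixed directory and never removed the file, the second run would fail with FileExistsError.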

This poses a big problem, as interacting with external systems is definitely to be considered dangerous. Mocks, introduced in the next chapter, are a very good tool to deal with this aspect of testing.

Tests should be isolated¶

In computer science isolation means that a component shall not change its behaviour depending on something that happens externally. In particular it shouldn't be affected by the execution of other components in the system (spatial isolation) and by the previous execution of the component itself (temporal isolation). Each test should run as much as possible in an isolated universe.

While this is easy to achieve for small components, like we did with the class SimpleCalculator, it might be almost impossible to do in more complex cases. Whenever you write a routine that deals with time, for example, be it the current date or a time interval, you are faced with something that flows incessantly and that cannot be stopped or slowed down. This is also true in other cases, for example if you are testing a routine that accesses an external service like a website. If the website is not reachable the test will fail, but this failure comes from an external source, not from the code under test.

Mocks or fake objects are a good tool to enforce isolation in tests that need to communicate with external actors in the system.
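For the case of time, one possible approach is to let the routine receive its clock from outside, so that a test can replace it with a mock returning a fixed instant. The function greeting and its parameter clock are hypothetical names used only in this sketch.

```python
from datetime import datetime
from unittest import mock


def greeting(clock=datetime.now):
    # The clock is a callable, so a test can pass a fake one
    hour = clock().hour
    return "Good morning" if hour < 12 else "Good afternoon"


# Fake clocks that always return the same instant
morning_clock = mock.Mock(return_value=datetime(2020, 9, 15, 8, 0))
afternoon_clock = mock.Mock(return_value=datetime(2020, 9, 15, 15, 0))

morning = greeting(morning_clock)
afternoon = greeting(afternoon_clock)

print(morning, afternoon)
```

The results no longer depend on the moment the test suite runs, which is the temporal isolation described above.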

External systems¶

It is important to understand that the above definitions (idempotency, isolation) depend on the scope of the test. You should consider external whatever part of the system is not directly involved in the test, even though you need to use it to run the test itself. You should also try to reduce the scope of the test as much as possible.

Let me give you an example. Consider a web application and imagine a test that checks that a user can log in. The login process involves many layers: the user inputs the username and the password in a GUI and submits the form, the GUI communicates with the core of the application that finds the user in the DB and checks the password hash against the one stored there, then sends back a message that grants access to the user, and the GUI stores a cookie to keep the user logged in. Suppose now that the test fails. Where is the error? Is it in the query that retrieves the user from the DB? Or in the routine that hashes the password? Or is it just an issue in the connectivity between the application and the database?

As you can see there are too many possible points of failure. While this is a perfectly valid integration test, it is definitely not a unit test. Unit tests try to test the smallest possible units of code in your system, usually simple routines like functions or object methods. Integration tests, instead, put together whole systems that have already been tested and test that they can work together.

Too many times developers confuse integration tests with unit tests. One simple example: every time a web framework makes you test your models against a real database you are mixing a unit test (the methods of the model object work) with an integration one (the model object connects with the database and can store/retrieve data). You have to learn how to properly identify what is external to your system in the scope of a given test, so your tests can be focused and small.

Focus on messages¶

I cannot recommend enough Sandi Metz's talk "The Magic Tricks of Testing", where she considers the different messages that a software component has to deal with. She comes up with 3 different origins for messages (incoming, sent to self, and outgoing) and 2 types (query and command). The very interesting conclusion she reaches is that you should only test half of them, and I believe this is one of the most useful results you can learn as a software developer. In this section I will shamelessly start from Sandi Metz's categorisations and give a personal view of the matter. I absolutely recommend watching the original talk as it is both short and very effective.

Testing is all about the behaviour of a component when it is used, i.e. when it is connected to other components that interact with it. This interaction is well represented by the word "message", which has hereafter the simple meaning of "data exchanged between two actors".

We can then classify the interactions happening in our system, and thus to our components, by flow and by type (Sandi Metz speaks of origin and type).

Message flow¶

The flow is defined as the tuple (source, destination), that is where the message comes from and where it is going. There are three different combinations that we are interested in: (outside, self), (self, self), and (self, outside), where self is the object we are testing, and outside is a generic object that lives in the system. There is a fourth combination, (outside, outside), that is not relevant for testing, since it doesn't involve the object under analysis.

So (outside, self) contains all the messages that other parts of the system send to our component. These messages correspond to the public API of the component, that is the set of entry points the component makes available to interact with it. Notable examples are the public methods of an object in an object-oriented programming language or the HTTP endpoints of a Web application. This flow represents the incoming messages.

At the opposite side of the spectrum there is (self, outside), which is the set of messages that the component under test sends to other parts of the system. These are for example the external calls that an object makes to a library or to other objects, or the API of other applications we rely on, like databases or Web applications. This flow describes all the outgoing messages.

Between the two there is (self, self), which identifies the messages that the component sends to itself, i.e. the use that the component makes of its own internal API. This can be the set of private methods of an object or the business logic inside a Web application. The important thing about this last case is that while the component is seen as a black box by the rest of the system it actually has an internal structure and it uses it to run. This flow contains all the private messages.

Message type¶

Messages can be further divided according to the interaction the source requires to have with the target: queries and commands. Queries are messages that do not change the status of the component, they just extract information. The class SimpleCalculator that we developed in the previous section is a typical example of an object that exposes query methods. Adding two numbers doesn't change the status of the object, and you will receive the same answer every time you call the method add.

Commands are the opposite. They do not extract any information, but they change the status of the object. A method of an object that increases an internal counter or a method that adds values to an array are perfect examples of commands.

It's perfectly normal to combine a query and a command in a single message, as long as you are aware that your message is changing the status of the component. Remember that changing the status is something that can have concrete secondary effects.
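The distinction can be sketched with a tiny class, which is a hypothetical example and not part of the project: value is a query, increment is a command, and increment_and_get combines the two.

```python
class Counter:
    def __init__(self):
        self._count = 0

    def value(self):
        # Query: extracts information and leaves the status untouched
        return self._count

    def increment(self):
        # Command: changes the status and returns nothing
        self._count += 1

    def increment_and_get(self):
        # Query and command combined in a single message
        self._count += 1
        return self._count


counter = Counter()

# Repeating a query always gives the same answer
first = counter.value()
second = counter.value()

counter.increment()
after_command = counter.value()

combined = counter.increment_and_get()

print(first, second, after_command, combined)
```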

The testing grid¶

Combining 3 flows and 2 message types we get 6 different message cases that involve the component under test. For each one of these cases we have to decide how to test the interaction represented by that flow and message type.

Incoming queries¶

An incoming query is a message that an external actor sends to get a value from your component. Testing this behaviour is straightforward, as you just need to write a test that sends the message and makes an assertion on the returned value. A concrete example of this is what we did to test the method add of SimpleCalculator.

Incoming commands¶

An incoming command comes from an external actor that wants to change the status of the system. There should be a way for an external actor to check the status, which translates into the need of having either a companion incoming query message that allows you to extract the status (or at least the part of the status affected by the command), or the knowledge that the change is going to affect the behaviour of another query. A simple example might be a method that sets the precision (number of digits) of the division in the object SimpleCalculator. Setting that value changes the result of a query, which can be used to test the effect of the incoming command.
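A possible shape for such a test is sketched below. The class PrecisionCalculator and its method set_precision are hypothetical and are not part of the SimpleCalculator developed in this series. The point is only that the command is verified through the query it affects.

```python
class PrecisionCalculator:
    """Hypothetical stand-in, not the SimpleCalculator of the series."""

    def __init__(self):
        self._precision = 2

    def set_precision(self, digits):
        # Incoming command: changes the status of the object
        self._precision = digits

    def div(self, a, b):
        # Incoming query: its result depends on the status
        return round(a / b, self._precision)


def test_div_uses_default_precision():
    calculator = PrecisionCalculator()

    assert calculator.div(10, 3) == 3.33


def test_set_precision_changes_div_result():
    calculator = PrecisionCalculator()

    calculator.set_precision(4)

    # The command is tested through the query it influences
    assert calculator.div(10, 3) == 3.3333


test_div_uses_default_precision()
test_set_precision_changes_div_result()
```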

Private queries¶

A private query is a message that the component sends to self to get a value without affecting its own state, and it is basically nothing more than an explicit use of some internal logic. This happens often in object-oriented languages because you extracted some common logic from one or more methods of an object and created a private method to avoid duplication.

Since private queries use the internal logic you shouldn't test them. This might be surprising, as private methods are code, and code should be tested, but remember that other methods are calling them, so the effects of that code are not invisible, they are tested by the tests of the public entry points, although indirectly. The only effect you would achieve by testing private methods is to lock the tests to the internal implementation of the component, which by definition shouldn't be used by anyone outside of the component itself. This, in turn, makes refactoring painful, because you have to keep redundant tests in sync with the changes that you make, instead of using them as a guide for the code changes like TDD wants you to do.

As Sandi Metz says, however, this is not an inflexible rule. Whenever you see that testing an internal method makes the structure more robust feel free to do it. Be aware that you are locking the implementation, so do it only where it makes a real difference businesswise.
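To illustrate how a private query ends up being covered indirectly, consider this hypothetical class, where the private method _clean is exercised only through the public method greet.

```python
class Greeter:
    def greet(self, name):
        # Public entry point (incoming query)
        return "Hello, " + self._clean(name) + "!"

    def _clean(self, name):
        # Private query: internal logic extracted to avoid duplication
        return name.strip().title()


def test_greet_normalises_the_name():
    greeter = Greeter()

    # No test calls _clean directly, but its effect is checked here
    assert greeter.greet("  ada lovelace ") == "Hello, Ada Lovelace!"


test_greet_normalises_the_name()
```

If _clean is later renamed, inlined, or split in two, this test keeps working as long as the public behaviour stays the same.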

Private commands¶

Private commands shouldn't be treated differently than private queries. They change the status of the component, but this is again part of the internal logic of the component itself, so you shouldn't test private commands either. As stated for private queries, feel free to do it if this makes a real difference.

Outgoing queries and commands¶

An outgoing query is a message that the component under test sends to an external actor asking for a value, without changing the status of the actor itself. The correctness of the returned value, given the inputs, is not part of what you want to test, because that is an incoming query for the external actor. Let me repeat this: you don't want to test that the external actor returns the correct value given some inputs.

This is perhaps one of the biggest mistakes that programmers make when they test their applications. Definitely it is a mistake that I made many times. We tend to introduce tests that, starting from the code of our component, end up testing different components.

Outgoing commands are messages sent to external actors in order to change their state. Since our component sends such messages to cause an effect in another part of the system we have to be sure that the sent values are correct. We do not want to test that the state of the external actor changes accordingly, as this is part of the testing suite of the external actor itself (incoming command).

From this consideration it is evident that you shouldn't test the results of any outgoing query or command. Possibly, you should avoid running them at all, otherwise you will need the external system to be up and running when you run the test suite.

We want to be sure, however, that our component uses the API of the external actor in a proper way and the standard technique to test this is to use mocks, that is components that simulate other components. Mocks are an important tool in the TDD methodology and for this reason they are the topic of the next chapter.

| Flow     | Type    | Test? |
|----------|---------|-------|
| Incoming | Query   | Yes   |
| Incoming | Command | Yes   |
| Private  | Query   | Maybe |
| Private  | Command | Maybe |
| Outgoing | Query   | Mock  |
| Outgoing | Command | Mock  |

Final words¶

Since the discovery of TDD few things changed the way I write code more than these considerations on what I am supposed to test. Out of 6 different types of tests we discovered that 2 shouldn't be tested, 2 of them require a very simple technique based on assertions, and the last 2 are the only ones that require an advanced technique (mocks). This should cheer you up, as for once a good methodology doesn't add new rules and further worries, but removes one third of them, even forbidding you to implement them!

In the next two posts I will discuss mocks and patches, two very important testing tools to have in your belt.

Feedback¶

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

TDD in Python with pytest - Part 2

2020-09-11T10:30:00+02:00

This is the second post in the series "TDD in Python with pytest" where I develop a simple project following a strict TDD methodology. The posts come from my book Clean Architectures in Python and have been reviewed to get rid of some bad naming choices of the version published in the book.

You can find the first post here.

Step 7 - Division¶

The requirements state that there shall be a division function, and that it has to return a float value. This is a simple condition to test, as it is sufficient to divide two numbers that do not give an integer result

tests/test_main.py

def test_div_two_numbers_float():
    calculator = SimpleCalculator()

    result = calculator.div(13, 2)

    assert result == 6.5

The test suite fails with the usual error that signals a missing method. The implementation of this function is very simple as the operator / in Python performs a float division

simple_calculator/main.py

class SimpleCalculator:
    [...]

    def div(self, a, b):
        return a / b

Git tag: step-7-float-division

If you run the test suite again all the tests should pass. There is a second requirement about this operation, however, that states that division by zero shall return inf.

I already mentioned in the previous post that this is not a good requirement, and please don't go around telling people that I told you to create functions that return either floats or strings. This is a simple requirement that I will use to show you how to deal with exceptions.

The test that comes from the requirement is simple

tests/test_main.py

def test_div_by_zero_returns_inf():
    calculator = SimpleCalculator()

    result = calculator.div(5, 0)

    assert result == float('inf')

And the test suite fails now with this message

__________________________ test_div_by_zero_returns_inf ___________________________

    def test_div_by_zero_returns_inf():
        calculator = SimpleCalculator()

>       result = calculator.div(5, 0)

tests/test_main.py:70:  
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <simple_calculator.main.SimpleCalculator object at 0x7f0b0b733990>, a = 5, b = 0

    def div(self, a, b):
>       return a / b
E       ZeroDivisionError: division by zero

simple_calculator/main.py:17: ZeroDivisionError

Note that when an exception happens in the code and not in the test, the pytest output changes slightly. The first part of the message shows where the test fails, but then there is a second part that shows the internal code that raised the exception and, on its first line, provides the values of the local variables.

We might implement two different solutions to satisfy this requirement and its test. The first one is to check whether b is 0 before dividing

simple_calculator/main.py

    def div(self, a, b):
        if not b:
            return float('inf')

        return a / b

and the second one is to intercept the exception with a try/except block

simple_calculator/main.py

    def div(self, a, b):
        try:
            return a / b
        except ZeroDivisionError:
            return float('inf')

Both solutions make the test suite pass, so both are correct. I leave to you the decision about which is the best one, syntactically speaking.
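If you want to convince yourself that the two versions are interchangeable for these requirements, the following sketch rewrites them as plain functions (div_with_check and div_with_except are names I made up) and runs the same checks against both.

```python
def div_with_check(a, b):
    # First solution: test the divisor before dividing
    if not b:
        return float('inf')

    return a / b


def div_with_except(a, b):
    # Second solution: divide and intercept the exception
    try:
        return a / b
    except ZeroDivisionError:
        return float('inf')


for div in (div_with_check, div_with_except):
    assert div(13, 2) == 6.5
    assert div(5, 0) == float('inf')
    # A float zero is treated in the same way by both versions
    assert div(5, 0.0) == float('inf')
```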

Git tag: step-7-float-division

Step 8 - Testing exceptions¶

A further requirement is that multiplication by zero must raise a ValueError exception. This means that we need a way to test if our code raises an exception, which is the opposite of what we did until now. In the previous tests, the condition to pass was that there was no exception in the code, while in this test the condition will be that an exception has been raised.

Again, this is a requirement I made up just for the sake of showing you how to deal with exceptions, so if you think this is a silly behaviour for a multiplication function you are probably right.

Pytest provides a context manager named raises that runs the code contained in it and passes only if the given exception is produced by that code.

tests/test_main.py

import pytest

[...]

def test_mul_by_zero_raises_exception():
    calculator = SimpleCalculator()

    with pytest.raises(ValueError):
        calculator.mul(3, 0)

In this case, thus, pytest runs the line calculator.mul(3, 0). If the method doesn't raise the exception ValueError the test will fail. Indeed, if you run the test suite now, you will get the following failure

________________________ test_mul_by_zero_raises_exception ________________________

    def test_mul_by_zero_raises_exception():
        calculator = SimpleCalculator()

        with pytest.raises(ValueError):
>           calculator.mul(3, 0)
E           Failed: DID NOT RAISE <class 'ValueError'>

tests/test_main.py:81: Failed

which signals that the code didn't raise the expected exception.

The code that makes the test pass needs to test if one of the inputs of the function mul is 0. This can be done with the help of the built-in function all, which accepts an iterable and returns True only if all the values contained in it are True. Since in Python the value 0 is not true, we may write

simple_calculator/main.py

    def mul(self, *args):
        if not all(args):
            raise ValueError
        return reduce(lambda x, y: x*y, args)

and make the test suite pass. The condition checks that there are no false values in the tuple args, that is there are no zeros.
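The behaviour of all with numbers can be checked in isolation. The sketch below uses a free function mul instead of the method of the class, with the import of reduce made explicit, and catches the exception with a plain try/except so that it runs without pytest.

```python
from functools import reduce


def mul(*args):
    # all returns False as soon as it finds a false value, such as 0
    if not all(args):
        raise ValueError
    return reduce(lambda x, y: x * y, args)


no_zeros = all((3, 4, 5))
with_zero = all((3, 0, 5))

product = mul(3, 4, 5)

try:
    mul(3, 0)
except ValueError:
    raised = True
else:
    raised = False

print(no_zeros, with_zero, product, raised)
```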

Git tag: step-8-multiply-by-zero

Step 9 - A more complex set of requirements¶

Until now the requirements were pretty simple, and it was easy to map each of them directly into tests. It's time to try to tackle a more complex problem. The remaining requirements say that the class has to provide a function to compute the average of an iterable, and that this function shall accept two optional upper and lower thresholds to remove outliers.

Let's break these two requirements into a set of simpler ones

The function accepts an iterable and computes the average, i.e. avg([2, 5, 12, 98]) == 29.25
The function accepts an optional upper threshold. It must remove all the values that are greater than the threshold before computing the average, i.e. avg([2, 5, 12, 98], ut=90) == avg([2, 5, 12])
The function accepts an optional lower threshold. It must remove all the values that are less than the threshold before computing the average, i.e. avg([2, 5, 12, 98], lt=10) == avg([12, 98])
The upper threshold is not included when removing data, i.e. avg([2, 5, 12, 98], ut=12) == avg([2, 5, 12])
The lower threshold is not included when removing data, i.e. avg([2, 5, 12, 98], lt=5) == avg([5, 12, 98])
The function works with an empty list, returning 0, i.e. avg([]) == 0
The function works if the list is empty after outlier removal, i.e. avg([12, 98], lt=15, ut=90) == 0
The function outlier removal works if the list is empty, i.e. avg([], lt=15, ut=90) == 0

As you can see a requirement can produce multiple tests. Some of these are clearly expressed by the requirement (numbers 1, 2, 3), some of these are choices that we make (numbers 4, 5, 6) and can be discussed, some are boundary cases that we have to discover thinking about the problem (numbers 6, 7, 8).

There is a fourth category of tests, which are the ones that come from bugs that you discover. We will discuss those later in this chapter.

Now, if you followed the posts coding along it is time to try to tackle a problem on your own. Why don't you try to go on and implement these features? Each of the eight requirements can be directly mapped into a test, and you know how to write tests and code that passes them. The next steps show my personal solution, which is just one of the possible ones, so you can compare what you did with what I came up with to solve the tests.

Step 9.1 - Average of an iterable

Let's start adding a test for requirement number 1

tests/test_main.py

def test_avg_correct_average():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98])

    assert result == 29.25

We feed the function avg a list of generic numbers, whose average we calculated with an external tool. The first run of the test suite fails with the usual complaint about a missing function, and we can make the test pass with a simple use of sum and len, as both built-in functions work on iterables

simple_calculator/main.py

class SimpleCalculator:
    [...]

    def avg(self, it):
        return sum(it)/len(it)

Here, it stands for iterable, as this function works with anything that supports the loop protocol.

Git tag: step-9-1-average-of-an-iterable

Step 9.2 - Upper threshold

The second requirement mentions an upper threshold, but we are free with regard to the API, i.e. the requirement doesn't specify how the threshold is supposed to be specified or named. I decided to call the upper threshold parameter ut, so the test becomes

tests/test_main.py

def test_avg_removes_upper_outliers():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98], ut=90)

    assert result == pytest.approx(6.333333)

As you can see the parameter ut=90 is supposed to remove the element 98 from the list and then compute the average of the remaining elements. Since the result has an infinite number of digits I used the function pytest.approx to check the result.
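To see why an exact comparison would not work here, consider the following minimal sketch. It uses math.isclose from the standard library as a stand-in for pytest.approx, assuming a relative tolerance of 1e-6 (which is the default of pytest.approx). The two tools are not identical, but they express the same idea.

```python
import math

result = (2 + 5 + 12) / 3

# The exact comparison fails, as 19/3 cannot be written with six decimals
print(result == 6.333333)

# A comparison with a relative tolerance succeeds
print(math.isclose(result, 6.333333, rel_tol=1e-6))
```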

The test suite fails because the function avg doesn't accept the parameter ut

_________________________ test_avg_removes_upper_outliers _________________________

    def test_avg_removes_upper_outliers():
        calculator = SimpleCalculator()

>       result = calculator.avg([2, 5, 12, 98], ut=90)
E       TypeError: avg() got an unexpected keyword argument 'ut'

tests/test_main.py:95: TypeError

There are two problems now that we have to solve, as it happened for the second test we wrote in this project. The new ut argument needs a default value, so we have to manage that case, and then we have to make the upper threshold work. My solution is

simple_calculator/main.py

    def avg(self, it, ut=None):
        if not ut:
            ut = max(it)

        _it = [x for x in it if x <= ut]

        return sum(_it)/len(_it)

The idea here is that ut is used to filter the iterable keeping all the elements that are less than or equal to the threshold. This means that the default value for the threshold has to be neutral with regard to this filtering operation. Using the maximum value of the iterable makes the whole algorithm work in every case, while for example using a big fixed value like 9999 would introduce a bug, as one of the elements of the iterable might be bigger than that value.
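The difference between the two defaults can be shown with a short sketch. The functions below are standalone versions written only for this comparison, and avg_fixed_default is a hypothetical alternative that is not part of the project.

```python
def avg_max_default(it, ut=None):
    # Default threshold: the maximum of the data, so nothing is filtered out
    if not ut:
        ut = max(it)
    _it = [x for x in it if x <= ut]
    return sum(_it) / len(_it)


def avg_fixed_default(it, ut=9999):
    # Hypothetical alternative: a "big enough" fixed default
    _it = [x for x in it if x <= ut]
    return sum(_it) / len(_it)


data = [2, 5, 12, 10000]

print(avg_max_default(data))    # 2504.75, all the values are used
print(avg_fixed_default(data))  # about 6.33, 10000 is silently dropped
```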

Git tag: step-9-2-upper-threshold

Step 9.3 - Lower threshold

The lower threshold is the mirror of the upper threshold, so it doesn't require many explanations. The test is

tests/test_main.py

def test_avg_removes_lower_outliers():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98], lt=10)

    assert result == pytest.approx(55)

and the code of the function avg now becomes

simple_calculator/main.py

    def avg(self, it, lt=None, ut=None):
        if not lt:
            lt = min(it)

        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

        return sum(_it)/len(_it)

Git tag: step-9-3-lower-threshold

Step 9.4 and 9.5 - Boundary inclusion

As you can see from the code of the function avg, the upper and lower thresholds are included in the comparison, so we might consider the requirements as already satisfied. TDD, however, pushes you to write a test for each requirement (as we saw it's not unusual to actually have multiple tests per requirement), and this is what we are going to do.

The reason behind this is that you might get the expected behaviour for free, like in this case, because some other code that you wrote to pass a different test provides that feature as a side effect. You don't know, however, what will happen to that code in the future, so if you don't have tests that show that all your requirements are satisfied you might lose features without knowing it.

The test for the fourth requirement is

tests/test_main.py

def test_avg_upper_threshold_is_included():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98], ut=98)

    assert result == 29.25

Git tag: step-9-4-upper-threshold-is-included

while the test for the fifth one is

tests/test_main.py

def test_avg_lower_threshold_is_included():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98], lt=2)

    assert result == 29.25

Git tag: step-9-5-lower-threshold-is-included

And, as expected, both pass without any change in the code. Do you remember rule number 5? You should ask yourself why the tests don't fail. In this case we reasoned about that before, so we can accept that the new tests don't require any code change to pass.
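The value of such tests becomes evident if we imagine a future change that breaks the boundary. In the following sketch, avg is a standalone copy of the version from step 9.3, while avg_regression is a hypothetical edit in which someone replaces <= with <. The helper upper_threshold_is_included is also made up for the occasion and performs the same check as the test for the fourth requirement.

```python
def avg(it, lt=None, ut=None):
    # Version from step 9.3: both thresholds are included
    if not lt:
        lt = min(it)
    if not ut:
        ut = max(it)
    _it = [x for x in it if x >= lt and x <= ut]
    return sum(_it) / len(_it)


def avg_regression(it, lt=None, ut=None):
    # Hypothetical future change: the upper threshold is now excluded
    if not lt:
        lt = min(it)
    if not ut:
        ut = max(it)
    _it = [x for x in it if x >= lt and x < ut]
    return sum(_it) / len(_it)


def upper_threshold_is_included(avg_function):
    # Same check as test_avg_upper_threshold_is_included
    return avg_function([2, 5, 12, 98], ut=98) == 29.25


print(upper_threshold_is_included(avg))             # True
print(upper_threshold_is_included(avg_regression))  # False
```

Without the dedicated test, the second version could slip through unnoticed.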

Step 9.6 - Empty list

Requirement number 6 is something that wasn't clearly specified in the project description so we decided to return 0 as the average of an empty list. You are free to change the requirement and decide to raise an exception, for example.

The test that implements this requirement is

tests/test_main.py

def test_avg_empty_list():
    calculator = SimpleCalculator()

    result = calculator.avg([])

    assert result == 0

and the test suite fails with the following error

_______________________________ test_avg_empty_list _______________________________

    def test_avg_empty_list():
        calculator = SimpleCalculator()

>       result = calculator.avg([])

tests/test_main.py:127:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <simple_calculator.main.SimpleCalculator object at 0x7feeb7098a10>, it = [], lt = None, ut = None

    def avg(self, it, lt=None, ut=None):
        if not lt:
>           lt = min(it)
E           ValueError: min() arg is an empty sequence

simple_calculator/main.py:26: ValueError

The function min that we used to compute the default lower threshold doesn't work with an empty list, so the code raises an exception. The simplest solution is to check for the length of the iterable before computing the default thresholds

simple_calculator/main.py

    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0

        if not lt:
            lt = min(it)

        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

        return sum(_it)/len(_it)

Git tag: step-9-6-empty-list

As you can see the function avg is already pretty rich, but at the same time it is well structured and understandable. This obviously happens because the example is trivial, but cleaner code is definitely among the benefits of TDD.

Step 9.7 - Empty list after applying the thresholds

The next requirement deals with the case in which the outlier removal process empties the list. The test is the following

tests/test_main.py

def test_avg_manages_empty_list_after_outlier_removal():
    calculator = SimpleCalculator()

    result = calculator.avg([12, 98], lt=15, ut=90)

    assert result == 0

and the test suite fails with a ZeroDivisionError, because the length of the iterable is now 0.

________________ test_avg_manages_empty_list_after_outlier_removal ________________

    def test_avg_manages_empty_list_after_outlier_removal():
        calculator = SimpleCalculator()

>       result = calculator.avg([12, 98], lt=15, ut=90)

tests/test_main.py:135:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <simple_calculator.main.SimpleCalculator object at 0x7f9e60c3ba90>, it = [12, 98], lt = 15, ut = 90

    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0

        if not lt:
            lt = min(it)

        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

>       return sum(_it)/len(_it)
E       ZeroDivisionError: division by zero

simple_calculator/main.py:36: ZeroDivisionError

The easiest solution is to introduce a new check on the length of the iterable

simple_calculator/main.py

    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0

        if not lt:
            lt = min(it)

        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

        if not len(_it):
            return 0

        return sum(_it)/len(_it)

And this code makes the test suite pass. As I stated before, code that makes the tests pass is considered correct, but you are always allowed to improve it. In this case I don't really like the repetition of the length check, so I might try to refactor the function to get a cleaner solution. Since I have all the tests that show that the requirements are satisfied, I am free to try to change the code of the function.

After some attempts I found this solution

simple_calculator/main.py

    def avg(self, it, lt=None, ut=None):
        _it = it[:]

        if lt:
            _it = [x for x in _it if x >= lt]

        if ut:
            _it = [x for x in _it if x <= ut]

        if not len(_it):
            return 0

        return sum(_it)/len(_it)

which looks reasonably clean, and makes the whole test suite pass.

Git tag: step-9-7-empty-list-after-thresholds

Step 9.8 - Empty list before applying the thresholds

The last requirement checks another boundary case, which happens when the list is empty and we specify one of or both the thresholds. This test will check that the outlier removal code doesn't assume the list contains elements.

tests/test_main.py

def test_avg_manages_empty_list_before_outlier_removal():
    calculator = SimpleCalculator()

    result = calculator.avg([], lt=15, ut=90)

    assert result == 0

This test doesn't fail. So, according to the TDD methodology, we should provide a reason why this happens and decide if we want to keep the test. The reason is that the two list comprehensions used to filter the elements work perfectly with empty lists. As for the test, it comes directly from a corner case, and it checks a behaviour which is not already covered by other tests. This makes me decide to keep the test.
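The behaviour of the list comprehensions on an empty list can be verified with a few lines. This is just a sketch of the filtering part of avg, with the thresholds of the test hardcoded.

```python
empty = []

# Filtering an empty list never evaluates the condition
# and simply returns another empty list
filtered = [x for x in empty if x >= 15]
filtered = [x for x in filtered if x <= 90]

print(filtered)       # []
print(len(filtered))  # 0, so avg returns 0 before reaching the division
```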

Git tag: step-9-8-empty-list-before-thresholds

Step 9.9 - Zero as lower/upper threshold

This is perhaps the most important step of the whole chapter, for two reasons.

First of all, the test added in this step was contributed by two readers of my book about clean architectures (Faust Gertz and Michael O'Neill), and this shows a real TDD workflow. After you publish your package (or your book, in this case) someone notices a wrong behaviour in some use case. This might be a big flaw or a tiny corner case, but in any case they can come up with a test that exposes the bug, and maybe even with a patch to the code, but the most important part is the test.

Whoever discovers the bug has a clear way to show it, and you, as an author/maintainer/developer, can add that test to your suite and work on the code until it passes. The rest of the test suite will block any change in the code that disrupts the behaviour you already tested. As I already stressed multiple times, we could do the same without TDD, but if we need to change a substantial amount of code there is nothing like a test suite that can guarantee we are not re-introducing bugs (also called regressions).

Second, this step shows an important part of the TDD workflow: checking corner cases. In general you should pay a lot of attention to the boundaries of a domain, and test the behaviour of the code in those cases.

This test shows that the code doesn't manage zero-valued lower thresholds correctly

tests/test_main.py

def test_avg_manages_zero_value_lower_outlier():
    calculator = SimpleCalculator()

    result = calculator.avg([-1, 0, 1], lt=0)

    assert result == 0.5

The reason is that the function avg contains a check like if lt:, which fails when lt is 0, as that is a false value. The check should be if lt is not None:, so that part of the function avg becomes

simple_calculator/main.py

        if lt is not None:
            _it = [x for x in _it if x >= lt]
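The difference between the two checks can be seen in isolation with the following sketch. Both functions are simplified standalone versions of avg that handle only the lower threshold, and their names are made up for this comparison.

```python
def avg_truthiness(it, lt=None):
    _it = it[:]
    # Bug: lt=0 is a false value, so the filter is skipped
    if lt:
        _it = [x for x in _it if x >= lt]
    if not len(_it):
        return 0
    return sum(_it) / len(_it)


def avg_explicit_none(it, lt=None):
    _it = it[:]
    # Only a missing threshold (None) skips the filter
    if lt is not None:
        _it = [x for x in _it if x >= lt]
    if not len(_it):
        return 0
    return sum(_it) / len(_it)


print(avg_truthiness([-1, 0, 1], lt=0))     # 0.0, -1 was not removed
print(avg_explicit_none([-1, 0, 1], lt=0))  # 0.5
```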

It is immediately clear that the upper threshold has the same issue, so the two tests I added are

tests/test_main.py

def test_avg_manages_zero_value_lower_outlier():
    calculator = SimpleCalculator()

    result = calculator.avg([-1, 0, 1], lt=0)

    assert result == 0.5


def test_avg_manages_zero_value_upper_outlier():
    calculator = SimpleCalculator()

    result = calculator.avg([-1, 0, 1], ut=0)

    assert result == -0.5

and the final version of avg is

simple_calculator/main.py

    def avg(self, it, lt=None, ut=None):
        _it = it[:]

        if lt is not None:
            _it = [x for x in _it if x >= lt]

        if ut is not None:
            _it = [x for x in _it if x <= ut]

        if not len(_it):
            return 0

        return sum(_it)/len(_it)

Git tag: step-9-9-zero-as-lower-upper-threshold

Step 9.10 - Refactoring for generators

One of the readers of this series, Dmitry Labazkin, noticed that the final implementation has some drawbacks, namely:

According to the requirements, this method should accept any iterable, but the implementation can't process generators (which are iterators and also iterables). For example, the function len() cannot be used with generators.
The iterable is copied, which is something we try to avoid to reduce memory usage.
Globally, the iterator is read 4 times, which affects performance.

These are interesting points, and he provides an implementation that solves them all. It's important to mention that the first point is closely related to requirements, so it should be represented by a unit test, while the other two are connected with performance and cannot be tested with pytest. However, any refactoring that produces code we consider better (for example from the performance point of view) can be tested by the existing tests. In other words, we can provide an alternative implementation and still make sure it works correctly.
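The first point is easy to reproduce. The following sketch uses a hypothetical generator function, generate_values, to show that neither len nor slicing works on generators, and that a generator can be consumed only once.

```python
def generate_values():
    # A generator function: values are produced one at a time
    yield from [2, 5, 12, 98]


try:
    len(generate_values())
except TypeError as exc:
    print(f"len() failed: {exc}")

try:
    generate_values()[:]
except TypeError as exc:
    print(f"slicing failed: {exc}")

# A generator can also be consumed only once
gen = generate_values()
print(sum(gen))  # 117
print(sum(gen))  # 0, the generator is exhausted
```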

Dmitry adds a test to check that generators are supported

tests/test_main.py

def test_avg_accepts_generators():
    calculator = SimpleCalculator()
    result = calculator.avg(i for i in [2, 5, 12, 98])
    assert result == 29.25

His implementation of the function avg() passes that test and the previous ones we wrote

simple_calculator/main.py

    def avg(self, it, lt=None, ut=None):
        count = 0
        total = 0

        for number in it:
            if lt is not None and number < lt:
                continue
            if ut is not None and number > ut:
                continue
            count += 1
            total += number

        if count == 0:
            return 0

        return total / count

One might argue that this implementation is less pythonic as it doesn't use fancy list comprehensions, but again, that is a matter of style (and performance). The point about generators is correct, but if that wasn't included in the requirements we might accept either implementation. I personally believe this new implementation is much better than the previous one, as I like to keep a low memory footprint, but if we were sure the calculator is used only on small sequences the concern might be overkill.
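To get a feeling of the memory argument, the following sketch feeds a standalone copy of the single-pass implementation shown above with a generator expression. The input data (squares of the first million integers) is just an example.

```python
def avg(it, lt=None, ut=None):
    # Standalone copy of the single-pass implementation
    count = 0
    total = 0
    for number in it:
        if lt is not None and number < lt:
            continue
        if ut is not None and number > ut:
            continue
        count += 1
        total += number
    if count == 0:
        return 0
    return total / count


# The generator expression yields one value at a time: a list of
# one million elements is never built in memory
squares = (i * i for i in range(1_000_000))
print(avg(squares, ut=100))  # 35.0, the average of 0, 1, 4, ..., 100
```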

Git tag: step-9-10-refactoring-for-generators

Recap of the TDD rules

Through this very simple example we learned 6 important rules of the TDD methodology. Let us review them, now that we have some experience that can make the words meaningful

Test first, code later
Add the bare minimum amount of code you need to pass the tests
You shouldn't have more than one failing test at a time
Write code that passes the test. Then refactor it.
A test should fail the first time you run it. If it doesn't, ask yourself why you are adding it.
Never refactor without tests.

How many assertions?

I am frequently asked "How many assertions do you put in a test?", and I consider this question important enough to discuss it in a dedicated section. To answer this question I want to briefly go back to the nature of TDD and the role of the test suite that we run.

The whole point of automated tests is to run through a set of checkpoints that can quickly reveal that there is a problem in a specific area. Mind the words "quickly" and "specific". When I run the test suite and an error occurs I'd like to be able to understand as fast as possible where the problem lies. This doesn't (always) mean that the problem will have a quick resolution, but at least I can be immediately aware of which part of the system is misbehaving.

On the other hand, we don't want to have too many tests for the same condition. On the contrary, we want to avoid testing the same condition more than once, as tests have to be maintained. A test suite that is too fine-grained might result in too many tests failing because of the same problem in the code, which might be daunting and not very informative.

My advice is to group together assertions that can be executed after running the same setup, if they test the same process. For example, you might consider the two functions add and sub that we tested in this chapter. They require the same setup, which is to instantiate the class SimpleCalculator (a setup that they share with many other tests), but they are actually testing two different processes. A good sign of this is that, if you grouped them, you would have to name the test test_add_or_sub, and a failure in this test would require a further investigation of the test output to check which method of the class is failing.

If you have to test that a method returns positive even numbers, instead, you might consider running the method and then writing two assertions, one that checks that the number is positive, and one that checks that it is even. This makes sense, as a failure in one of the two means a failure of the whole process.

As a rule of thumb, then, consider if the test is a logical AND between conditions or a logical OR. In the former case go for multiple assertions, in the latter create multiple test functions.
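The rule can be sketched as follows. The class SimpleCalculator here is a minimal stand-in for the one developed in this chapter, and next_positive_even is a hypothetical function invented only for this illustration.

```python
class SimpleCalculator:
    # Minimal stand-in for the class developed in this chapter
    def add(self, *args):
        return sum(args)

    def sub(self, a, b):
        return a - b


def next_positive_even(n):
    # Hypothetical function: the first positive even number greater than n
    candidate = max(n, 0) + 1
    return candidate if candidate % 2 == 0 else candidate + 1


# Logical OR: two independent processes, two test functions
def test_add():
    assert SimpleCalculator().add(4, 5) == 9


def test_sub():
    assert SimpleCalculator().sub(10, 3) == 7


# Logical AND: one process, both conditions must hold
def test_next_positive_even():
    result = next_positive_even(7)

    assert result > 0
    assert result % 2 == 0
```

If test_add fails, the name of the test alone tells you which method is misbehaving, while a failure of test_next_positive_even always means that the single process under test is broken.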

How to manage bugs or missing features

In this chapter we developed the project from scratch, so the challenge was to come up with a series of small tests starting from the requirements. At a certain point in the life of your project you will have a stable version in production (this expression has many definitions, but in general it means "used by someone other than you") and you will need to maintain it. This means that people will file bug reports and feature requests, and TDD gives you a clear strategy to deal with those.

From the TDD point of view both a bug and a missing feature are cases not currently covered by a test, so I will refer to them collectively as bugs, but don't forget that I'm talking about the second ones as well.

The first thing you need to do is to write one or more tests that expose the bug. This way you can easily decide when the code that you wrote is correct or good enough. For example, let's assume that a user files an issue on the project SimpleCalculator saying: "The function add doesn't work with negative numbers". You should definitely try to get a concrete example from the user that wrote the issue and some information about the execution environment (as it is always possible that the problem comes from a different source, like for example an old version of a library your package relies on), but in the meanwhile you can come up with at least 3 tests: one that involves two negative numbers, one with a negative number as the first argument, and one with a negative number as the second argument.

You shouldn't write down all of them at once. Write the first test that you think might expose the issue and see if it fails. If it doesn't, discard it and write a new one. From the TDD point of view, if you don't have a failing test there is no bug, so you have to come up with at least one test that exposes the issue you are trying to solve.
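A first candidate for the hypothetical issue above might look like the following sketch. The class is a minimal stand-in for the real one, and the name of the test is just an example.

```python
class SimpleCalculator:
    # Minimal stand-in for the method add developed in this chapter
    def add(self, *args):
        return sum(args)


def test_add_two_negative_numbers():
    calculator = SimpleCalculator()

    result = calculator.add(-4, -5)

    assert result == -9
```

With this stand-in implementation the test passes, so it doesn't expose any bug. In that case we would discard it and try the next idea, for example a negative number as the first argument only.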

At this point you can move on and try to change the code. Remember that you shouldn't have more than one failing test at a time, so start doing this as soon as you discover a test case that shows there is a problem in the code.

Once you reach a point where the test suite passes without errors, stop and try to run the code in the environment where the bug was first discovered (for example sharing a branch with the user that created the ticket) and iterate the process.

The problem of types

Other than contributing to the TDD steps, Dmitry Labazkin asked some relevant questions about types, which I will summarise here. You can read his original questions in issue #11 and issue #12.

The question of type checking is thorny, and since this is an introductory series I will discuss it briefly and give some pointers. Don't get me wrong, though. As I will say later, this is one of the most important topics we can discuss in computer science.

Overall the problem Dmitry raises is that operators like addition and multiplication are valid for types other than integers (like floats) and also non-numeric ones (like strings). In Python, it is possible to multiply a string by a number and obtain a concatenation of that number of copies of the original string. At the same time, however, subtraction and division are not defined for strings, so some of the questions we can ask are:

can SimpleCalculator be used on non-integer numeric types?
can SimpleCalculator be used on non-numeric types?
shall we explicitly check in the code that the input values belong to a certain type?
shall we write tests to rule out other types?

As I said, such questions are deceptively simple, so let's tackle them step by step.

Let's assume it makes sense for our class to work with numeric types. In Python there is no way to prevent a program from calling SimpleCalculator().add("string1", "string2"), which would fail as the current implementation uses the built-in function sum that doesn't work on strings (unless you call it with a specific initial value). However, calling SimpleCalculator().mul("abc", 3) would result in "abcabcabc", as the internal implementation quietly supports strings.
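Both behaviours can be reproduced with a short sketch. The class below is a minimal stand-in that contains only the methods add and mul, implemented with sum and reduce as discussed in this series.

```python
from functools import reduce


class SimpleCalculator:
    # Minimal stand-in with the two methods involved in the discussion
    def add(self, *args):
        return sum(args)

    def mul(self, *args):
        if not all(args):
            raise ValueError
        return reduce(lambda x, y: x * y, args)


calculator = SimpleCalculator()

try:
    calculator.add("string1", "string2")
except TypeError as exc:
    # sum() starts from 0, and 0 + "string1" is not defined
    print(f"add failed: {exc}")

print(calculator.mul("abc", 3))  # abcabcabc
```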

Given the inconsistency, we might be tempted to rule out non-numeric types explicitly. In other words, we might want to add code to our calculator that actively checks if we are passing a non-numeric type. In that case we shall also add tests for those types, according to the TDD methodology, as no code can be added without tests.

The reason why this topic is thorny is that Python relies heavily on polymorphism, which means that it is more interested in the behaviour of an object than in its nature. In other words, an object can be considered a number because it is an instance of int or float, for example, but it could just be a class we made up that behaves like one of those types. Using the Abstract Base Classes provided by the module numbers is useful to check if an object is an instance of one of the types encompassed by the hierarchy (again, types such as int and float) but doesn't automatically include everything that behaves like a number. We can create a class that behaves like int without belonging to the hierarchy of numbers.
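The following sketch shows the limits of such a check. FakeInt is a hypothetical class that supports addition without being registered in the hierarchy of the module numbers.

```python
import numbers


class FakeInt:
    # Hypothetical class that behaves like a number for addition
    # without being part of the numbers hierarchy
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        other_value = other.value if isinstance(other, FakeInt) else other
        return FakeInt(self.value + other_value)

    # sum() starts from 0, so 0 + FakeInt has to work as well
    __radd__ = __add__


print(isinstance(3, numbers.Number))           # True
print(isinstance(2.5, numbers.Number))         # True
print(isinstance(FakeInt(3), numbers.Number))  # False

# Despite the failed isinstance check, the object works with sum()
print(sum([FakeInt(1), FakeInt(2)]).value)     # 3
```

An explicit check based on numbers.Number would reject FakeInt, even though the object works perfectly well with an implementation based on sum.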

Ultimately, this is the reason why Python programmers have to remember that the operator + can be used with types like int, string, and list, but cannot be used with dictionaries. Conversely, len can be used on dictionaries and lists, but cannot be used on integers. We need to remember it, as these operators are polymorphic (there is no operator int+ or float+) but don't make sense or are not implemented for some types.
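These cases can be checked directly in a Python session. The snippet below is only a quick demonstration of the built-in behaviour.

```python
print(1 + 2)          # 3
print("ab" + "cd")    # abcd
print([1] + [2])      # [1, 2]

try:
    {"a": 1} + {"b": 2}
except TypeError as exc:
    print(f"+ failed on dictionaries: {exc}")

print(len({"a": 1}))  # 1
print(len([1, 2]))    # 2

try:
    len(42)
except TypeError as exc:
    print(f"len failed on an integer: {exc}")
```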

Those basic operators and functions raise an exception when the wrong type is passed, so we might be tempted to do the same and explicitly raise an exception when the wrong type is passed to SimpleCalculator. Again, the focus is on behaviour and implementation. If our implementation doesn't work with instances of certain classes an exception will occur already, and we don't need to raise it explicitly. The aforementioned snippet SimpleCalculator().add("string1", "string2") would raise a TypeError because the underlying sum doesn't like strings.

In conclusion, my answers to the questions above are:

Can SimpleCalculator be used on non-integer numeric types? Probably, given the implementation is not specific to integers, but if we want to be sure we should add some tests to expose the functionality. So far, according to TDD, the class is certified to work with integers only. In this case, I might want to add some tests to show that it works with floats. But if someone feeds the class float-like objects that for some reason do not support the operator / some part of the calculator won't work, and there is no way to test all those conditions.

Can SimpleCalculator be used on non-numeric types? Yes, to a certain extent. mul can be used on sequences, for example. It is a calculator, though, so it doesn't make much sense to try to use it on non-numeric types. Users can feed the calculator any sort of non-numeric types and we cannot do anything to prevent it.

Shall we explicitly check in the code that the input values belong to a certain type? This goes against the nature of Python: if a certain function or method doesn't work with a specific type an exception will be raised.

Shall we write tests to rule out other types? Since it is basically impossible to write code that narrows the set of accepted types it is also impossible to write useful tests to check this. We can check that it doesn't work on strings, but what about other sequences? We can check it doesn't work with classes that inherit from Sequence, but what about classes that do not and behave the same?

In a dynamically typed language like Python, polymorphism and operator overloading are embedded in the language. I think the deeply polymorphic nature of Python is one of the most important aspects any user of this language should understand. It is an incredibly sharp double-edged sword, as it is at the same time extremely powerful and dangerous. "Everything is an object" might sound very simple at first, but it hides a degree of complexity that sooner or later has to be faced by those who want to be proficient with the language.

I wrote some posts that might help you to understand these topics. You can find them grouped here.

Final words

I hope you found the project entertaining and that you can now appreciate the power of TDD. The journey doesn't end here, though. In the next post I will discuss the practice of writing unit tests in depth, and then introduce you to another powerful tool: mocks.

Updates

2021-01-03: George fixed a typo, thanks!

2023-09-03: Dmitry Labazkin provided a new test for the method avg and a better implementation. He also asked relevant questions about type checking that I addressed in a new section. Thanks Dmitry!

Feedback

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

TDD in Python with pytest - Part 1

2020-09-10T10:30:00+02:00

This series of posts comes directly from my book Clean Architectures in Python. As I am reviewing the book to prepare a second edition, I realised that Harry Percival was right when he said that the initial part on TDD shouldn't be in the book. That's a prerequisite to follow the chapters on the clean architecture, but it is something many programmers already know and they might be surprised to find it in a book that discusses architectures.

So, I decided to move it here before I start working on a new version of the book. I also followed the advice of valorien, who pointed out that the main example had some bad naming choices, and so I reworked the code.

Introduction

Test-Driven Development (TDD) is fortunately one of the names that I can spot most frequently when people talk about methodologies. Unfortunately, many programmers still do not follow it, fearing that it will impose a further burden on the already difficult life of a developer.

In this chapter I will try to outline the basic concept of TDD and to show you how your job as a programmer can greatly benefit from it. I will develop a very simple project to show how to practically write software following this methodology.

TDD is a methodology, something that can help you to create better code. But it is not going to solve all your problems. As with all methodologies you have to pay attention not to commit blindly to it. Try to understand the reasons why certain practices are suggested by the methodology and you will also understand when and why you can or have to be flexible.

Keep also in mind that testing is a broader concept that doesn't end with TDD, which focuses a lot on unit testing, a specific type of test that helps you to develop the API of your library/package. There are other types of tests, like integration or functional ones, that are not specifically part of the TDD methodology, strictly speaking, even though the TDD approach can be extended to any testing activity.

A real-life example

Let's start with a simple example taken from a programmer's everyday life.

The programmer is in the office with other colleagues, trying to nail down an issue in some part of the software. Suddenly the boss storms into the office, and addresses the programmer:

Boss: I just met with the rest of the board. Our clients are not happy, we didn't fix enough bugs in the last two months.

Programmer: I see. How many bugs did we fix?

Boss: Well, not enough!

Programmer: OK, so how many bugs do we have to fix every month?

Boss: More!

I guess you feel very sorry for the poor programmer. Apart from the aggressive attitude of the boss, what is the real issue in this conversation? At the end of it there is no hint for the programmer and their colleagues about what to do next. They don't have any clue about what they have to change. They can definitely try to work harder, but the boss didn't refer to actual figures, so it will definitely be hard for the developers to understand if they improved "enough".

The classical sorites paradox may help to understand the issue. One of the standard formulations, taken from the Wikipedia page, is

1,000,000 grains of sand is a heap of sand (Premise 1)

A heap of sand minus one grain is still a heap. (Premise 2)

So 999,999 grains is a heap of sand.

A heap of sand minus one grain is still a heap. (Premise 2)

So 999,998 grains is a heap of sand.

So one grain is a heap of sand.

Where is the issue? The concept expressed by the word "heap" is nebulous, it is not defined clearly enough to allow the process to find a stable point, or a solution.

When you write software you face that same challenge. You cannot conceive a function and just expect it "to work", because this is not clearly defined. How do you test if the function that you wrote "works"? What do you mean by "works"? TDD forces you to clearly state your goal before you write the code. Actually, the TDD mantra is "Test first, code later", which can be translated to "Goal first, solution later". We will shortly see a practical example of this.

For the time being, consider that this is a valid practice also outside the realm of software creation. Whoever runs a business knows that you need to be able to extract some numbers (KPIs) from the activity of your company, because it is by comparing those numbers with some predefined thresholds that you can easily tell if the business is healthy or not. KPIs are a form of test, and you have to define them in advance, according to the expectations or needs that you have.

Pay attention. Nothing prevents you from changing the thresholds as a reaction to external events. You may consider that, given the incredible heat wave that hit your country, the amount of coats that your company sold could not reach the goal. So, because of a specific event, you can justify a change in the test (KPI). If you didn't have the test you would have just generically recorded that you earned less money.

Going back to software and TDD, following this methodology you are forced to state clear goals like

sum(4, 5) == 9

Let me read this test for you: there will be a sum function available in the system that accepts two integers. If the two integers are 4 and 5 the function will return 9.

As you can see there are many things that are tested by this statement.

The function exists and can be imported
The function accepts two integers
Passing 4 and 5 as inputs, the output of the function will be 9.

Note that at this stage there is no code that implements the function sum, so the test will fail for sure.

As we will see with a practical example in the next chapter, what I explained in this section will become a set of rules of the methodology.

A simple TDD project¶

The project we are going to develop is available at https://github.com/lgiordani/simple_calculator.

This project is purposefully extremely simple. You don't need to be an experienced Python programmer to follow this chapter, but you need to know the basics of the language. The goal of this series of posts is not that of making you write the best Python code, but that of allowing you to learn the TDD workflow, so don't be too worried if your code is not perfect.

Methodologies are like sports or arts: you cannot learn them just by reading their description in a book. You have to practice them. Thus, you should avoid, as much as possible, just following this chapter and reading the code passively. Instead, you should try to write the code and to try new solutions to the problems that I discuss. This is very important, as it actually makes you use TDD. This way, at the end of the chapter you will have a personal experience of what TDD is like.

The repository is tagged, and at the end of each section you will find a link to the relative tag that contains my working solution. Please note that it is entirely possible your solution is different from mine: there are several aspects of coding, like for example style, that are not related to unit testing and TDD.

Setup the project¶

Clone the project repository and move to the branch develop. The branch master contains the full solution, and I use it to maintain the repository, but if you want to code along you need to start from scratch. I recommend you fork the repository on GitHub so that you are able to commit your changes.

git clone https://github.com/YOURUSERNAME/simple_calculator
cd simple_calculator
git checkout --track origin/develop

Create a virtual environment following your preferred process and install the requirements

pip install -r requirements/dev.txt

You should at this point be able to run

pytest -svv

and get an output like

================================ test session starts ===============================
platform XXXX -- Python XXXX, pytest-XXXX, py-XXXX, pluggy-XXXX -- XXXX
cachedir: .pytest_cache
rootdir: XXXX
configfile: XXXX
plugins: XXXX
collected 0 items 

=============================== no tests ran in 0.02s ==============================

You can see here the operating system and a short list of the versions of the main packages involved in running pytest: Python, pytest itself, and some of its components and plugins. You can also see here where pytest is reading its configuration from. As this header is standard I will omit it from the output that I will show in the rest of the chapter. The specific versions of the packages are not important for this series.

Requirements¶

The goal of the project is to write a class SimpleCalculator that performs calculations: addition, subtraction, multiplication, and division. Addition and multiplication shall accept multiple arguments. Division shall return a float value, and division by zero shall return the string "inf". Multiplication by zero must raise a ValueError exception. The class will also provide a function to compute the average of an iterable like a list. This function gets two optional upper and lower thresholds and should remove from the computation the values that fall outside these boundaries.

As you can see the requirements are pretty simple, and a couple of them are definitely not "good" requirements, like the behaviour of division and multiplication. I added those requirements for the sake of example, to show how to deal with exceptions when developing in TDD.

An interesting topic to discuss is that of data types: shall the calculator perform addition between integers or between floats? What about complex numbers, strings, and other items that can be "added" together? And what about the other operations? I consider this an advanced topic, in particular in Python, so for now I will consider only integers as inputs and discuss the problem of different types later in the series.

Step 1 - Adding two numbers¶

The first test we are going to write is one that checks if the class SimpleCalculator can perform an addition. Add the following code to the file tests/test_main.py

tests/test_main.py

from simple_calculator.main import SimpleCalculator  # 1

def test_add_two_numbers():  # 2
    calculator = SimpleCalculator()

    result = calculator.add(4, 5)

    assert result == 9

As you can see the first thing we do is to import the class SimpleCalculator 1 that we are supposed to write. This class doesn't exist yet, don't worry, you didn't skip any passage.

The test is a standard function 2 (this is how pytest works), and the function name shall begin with test_ so that pytest can automatically discover all the tests. I tend to give my tests a descriptive name, so it is easier later to come back and understand what the test is about with a quick glance. You are free to follow the style you prefer but in general remember that naming components in a proper way is one of the most difficult things in programming. So better to get a handle on it as soon as possible.

The body of the test function is pretty simple. The class SimpleCalculator is instantiated, and the method add of the instance is called with two numbers, 4 and 5. The result is stored in the variable result, which is later the subject of the test itself. The statement assert result == 9 first computes result == 9 which is a boolean, with a value that is either True or False. The keyword assert, then, silently passes if the argument is True, but raises an exception if it is False.

And this is how you write tests in pytest: if your code doesn't raise any exception the test passes, otherwise it fails. The keyword assert is used to force an exception in case of wrong result. Remember that pytest doesn't consider the return value of the function, so it can detect a failure only if it raises an exception.
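The rule can be sketched with a tiny stand-in for a test runner. The helper run_test below is hypothetical, written only for illustration; it is not how pytest is implemented, but it mirrors the pass/fail rule just described: a test fails only when it raises an exception, and its return value is ignored.

```python
def run_test(test_function):
    """Hypothetical mini runner: report PASSED unless the function raises."""
    try:
        test_function()  # the return value is deliberately ignored
    except Exception:
        return "FAILED"
    return "PASSED"


def check_with_assert():
    # 4 + 5 == 10 is False, so assert raises AssertionError
    assert 4 + 5 == 10


def check_with_return():
    # The comparison is False, but nothing is raised
    return 4 + 5 == 10


results = {
    "check_with_assert": run_test(check_with_assert),
    "check_with_return": run_test(check_with_return),
}
```

The function that returns False is reported as passed, which is why a test needs assert (or any other exception) to signal a wrong result.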

Save the file and go back to the terminal. Execute pytest -svv and you should receive the following error message

====================================== ERRORS ======================================
_______________________ ERROR collecting tests/test_main.py _______________________

[...]

tests/test_main.py:4: in <module>
    from simple_calculator.main import SimpleCalculator
E   ImportError: cannot import name 'SimpleCalculator' from 'simple_calculator.main'
!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!
============================== 1 error in 0.20 seconds =============================

No surprise here, actually, as we just tried to use something that doesn't exist. This is good, the test is showing us that something we suppose exists actually doesn't.

TDD rule number 1: Test first, code later

This, by the way, is not yet an error in a test. The error happens very early, during the test collection phase (as shown by the message in the bottom line Interrupted: 1 errors during collection). Given this, the methodology is still valid, as we wrote a test and it fails because of an error or a missing feature in the code.

Let's fix this issue. Open the file simple_calculator/main.py and add this code

simple_calculator/main.py

class SimpleCalculator:
    pass

But, I hear you scream, this class doesn't implement any of the requirements that are in the project. Yes, this is the hardest lesson you have to learn when you start using TDD. The development of the code is ruled by the tests, not by the requirements. The requirements are used to write the tests, the tests are used to write the code. You shouldn't worry about something that is more than one level above the current one in this workflow.

TDD rule number 2: Add the reasonably minimum amount of code you need to pass the tests

Run the test again, and this time you should receive a different error, that is

tests/test_main.py::test_add_two_numbers FAILED

===================================== FAILURES =====================================
______________________________ test_add_two_numbers _______________________________


    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       AttributeError: 'SimpleCalculator' object has no attribute 'add'

tests/test_main.py:10: AttributeError
============================= 1 failed in 0.04 seconds =============================

This is the first proper pytest failure report that we receive. You see a list of files containing tests and the result of each test

tests/test_main.py::test_add_two_numbers FAILED

Later we will see that the syntax FILENAME::TESTNAME can be given directly to pytest to run a single test. In this case we already have only one test, but later you might run a single failing test giving the name shown here on the command line. For example

pytest -svv tests/test_main.py::test_add_two_numbers

The second part of the output shows details on the failing tests, if any

______________________________ test_add_two_numbers _______________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       AttributeError: 'SimpleCalculator' object has no attribute 'add'

tests/test_main.py:10: AttributeError

For each failing test, pytest shows a header with the name of the test and the part of the code that raised the exception. At the end of each box, pytest shows the line of the test file where the error happened.

Back to the project. The new error is no surprise, as the test uses the method add that wasn't defined in the class. I bet you already guessed what I'm going to do, didn't you? This is the code that you should add to the class

simple_calculator/main.py

class SimpleCalculator:
    def add(self):
        pass

And again, as you notice, we made the smallest possible addition to the code to pass the test. Running pytest again you should receive a different error message

_______________________________ test_add_two_numbers _______________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       TypeError: add() takes 1 positional argument but 3 were given

tests/test_main.py:10: TypeError

The function we defined doesn't accept any argument other than self (def add(self)), but in the test we pass three of them: the two explicit ones in calculator.add(4, 5), plus self. Remember that in Python self is passed implicitly when you call a method. Our move at this point is to change the function to accept the parameters that it is supposed to receive, namely two numbers. The code now becomes
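A minimal sketch of the implicit self, using a throwaway class Example that is not part of the project: calling a method on an instance is equivalent to calling the function on the class and passing the instance as the first argument.

```python
class Example:
    def add(self):
        pass

    def add_two(self, a, b):
        return a + b


example = Example()

# These two calls are equivalent: Python passes the instance as self
implicit = example.add_two(4, 5)
explicit = Example.add_two(example, 4, 5)

# add accepts only self, so the two explicit arguments make three in total
try:
    example.add(4, 5)
except TypeError as exc:
    error_message = str(exc)
```

The exact wording of the error message stored in error_message depends on the Python version, but it reports three arguments given, like the pytest output above.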

simple_calculator/main.py

class SimpleCalculator:
    def add(self, a, b):
        pass

Run the test again, and you will receive another error

______________________________ test_add_two_numbers ________________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

        result = calculator.add(4, 5)

>       assert result == 9
E       assert None == 9
E         -None
E         +9

tests/test_main.py:12: AssertionError

The function returns None, as it doesn't contain any code, while the test expects it to return 9. What do you think is the minimum code you can add to pass this test?

Well, the answer is

simple_calculator/main.py

class SimpleCalculator:
    def add(self, a, b):
        return 9

and this may surprise you (it should!). You might have been tempted to add some code that performs an addition between a and b, but this would violate the TDD principles, because you would have been driven by the requirements and not by the tests.

When you run pytest again, you will be rewarded by a success message

tests/test_main.py::test_add_two_numbers PASSED

I know this sounds weird, but think about it for a moment: if your code works (that is, it passes the tests), you don't need to change anything, as your tests should specify everything the code should do. Maybe in the future you will discover that this solution is not good enough, and at that point you will have to change it (this will happen with the next test, in this case). But for now everything works, and you shouldn't implement more than this.

Git tag: step-1-adding-two-numbers

Step 2 - Adding three numbers¶

The requirements state that "Addition and multiplication shall accept multiple arguments". This means that we should be able to execute not only add(4, 5) like we did, but also add(4, 5, 11), add(4, 5, 11, 2), and so on. We can start testing this behaviour with the following test, that you should put in tests/test_main.py, after the previous test that we wrote.

tests/test_main.py

def test_add_three_numbers():
    calculator = SimpleCalculator()

    result = calculator.add(4, 5, 6)

    assert result == 15

This test fails when we run the test suite

_____________________________ test_add_three_numbers _______________________________

    def test_add_three_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5, 6)
E       TypeError: SimpleCalculator.add() takes 3 positional arguments but 4 were given

tests/test_main.py:18: TypeError

for the obvious reason that the function we wrote in the previous section accepts only 2 arguments other than self. What is the minimum code that you can write to fix this test?

Well, the simplest solution is to add another argument, so my first attempt is

simple_calculator/main.py

class SimpleCalculator:
    def add(self, a, b, c):
        return 9

which solves the previous error, but creates a new one. If that wasn't enough, it also makes the first test fail!

______________________________ test_add_two_numbers ________________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       TypeError: SimpleCalculator.add() missing 1 required positional argument: 'c'

tests/test_main.py:10: TypeError
_____________________________ test_add_three_numbers _______________________________

    def test_add_three_numbers():
        calculator = SimpleCalculator()

        result = calculator.add(4, 5, 6)

>       assert result == 15
E       assert 9 == 15

tests/test_main.py:20: AssertionError

The first test now fails because the new add method requires three arguments and we are passing only two. The second test fails because the method add returns 9 and not 15 as expected by the test.

When multiple tests fail it's easy to feel discomforted and lost. Where are you supposed to start fixing this? Well, one possible solution is to undo the previous change and to try a different solution, but in general you should try to get to a situation in which only one test fails.

TDD rule number 3: You shouldn't have more than one failing test at a time

This is very important as it allows you to focus on one single test and thus one single problem. Clearly, we need to keep an eye on the global problem that we are trying to solve, but real test batteries can contain hundreds of tests and it is not practical to try to tackle all of them together.

Commenting out tests to make them inactive is a perfectly valid way to have only one failing test. Pytest, however, has a smarter solution: you can use the option -k that allows you to specify a matching name. That option has a lot of expressive power, but for now we can just give it the name of the test that we want to run

pytest -svv -k test_add_two_numbers

This option allows you to select multiple tests that share the same prefix, for example. If you want to run a single specific test you can also name it on the command line with the syntax we discussed previously

pytest -svv tests/test_main.py::test_add_two_numbers

Either way, pytest will run only the first test and return the same result returned before, since we didn't change the test itself

______________________________ test_add_two_numbers ________________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       TypeError: SimpleCalculator.add() missing 1 required positional argument: 'c'

tests/test_main.py:10: TypeError

To fix this error we can obviously revert the addition of the third argument, but this would mean going back to the previous solution. Obviously tests focus on a very small part of the code, but we have to keep in mind what we are doing in terms of the big picture. A better solution is to add a default value to the third argument. The additive identity is 0, so the new code of the method add is

simple_calculator/main.py

class SimpleCalculator:
    def add(self, a, b, c=0):
        return 9

And this makes the first test pass. At this point we can run the full suite with pytest -svv and see what happens

_____________________________ test_add_three_numbers ______________________________

    def test_add_three_numbers():
        calculator = SimpleCalculator()

        result = calculator.add(4, 5, 6)

>       assert result == 15
E       assert 9 == 15

tests/test_main.py:20: AssertionError

The second test still fails, because the returned value that we hard coded doesn't match the expected one. At this point the tests show that our previous solution (return 9) is not sufficient anymore, and we have to try to implement something more complex.

I want to stress this. You should implement the minimal change in the code that makes tests pass. If that solution is not enough there will be a test that shows it. Now, as you can see, the addition of a new requirement changes the tests, adding a new one, and the old solution is not sufficient any more.

How can we solve this? We know that writing return 15 will make the first test fail (you may try, if you want), so here we have to be a bit smarter and try a better solution, that in this case is actually to implement a real sum

simple_calculator/main.py

class SimpleCalculator:
    def add(self, a, b, c=0):
        return a + b + c

This solution makes both tests pass, so the entire suite runs without errors.

Git tag: step-2-adding-three-numbers

I can see your face, you are probably frowning at the fact that it took us 10 minutes to write a method that performs the addition of two or three numbers. On the one hand, keep in mind that I'm going at a very slow pace, this being an introduction, and for these first tests it is better to take the time to properly understand every single step. Later, when you are used to TDD, some of these steps will be implicit. On the other hand, TDD is slower than untested development, but the time that you invest writing tests now is usually negligible compared to the amount of time you would spend trying to identify and fix bugs later.

Step 3 - Adding multiple numbers¶

The requirements are not yet satisfied, however, as they mention "multiple" numbers and not just three. How can we test that we can add a generic amount of numbers? We might add a test_add_four_numbers, a test_add_five_numbers, and so on, but this will cover specific cases and will never cover all of them. Sad to say, testing that generic condition is impossible or, at least in this case, so complex that it is not worth trying.

What you shall do in TDD is to test boundary cases. In general you should always try to find the so-called "corner cases" of your algorithm and write tests that show that the code covers them. For example, if you are testing some code that accepts as inputs a number from 1 to 100, you need a test that runs it with a generic number like 42 (which is far from being generic, but don't panic!), but you definitely want to have a specific test that runs the algorithm with the number 1 and one that runs with the number 100. You also want to have tests that show the algorithm doesn't work with 0 and with 101, but we will talk later about testing error conditions.
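As a sketch of what such boundary tests might look like, assume a hypothetical function is_valid_percentage (not part of the project) that accepts whole numbers from 1 to 100. Each test states one fact about one point of the input range. To keep the example simple, out-of-range values just return False instead of raising an exception.

```python
def is_valid_percentage(value):
    """Hypothetical function: True for numbers from 1 to 100 included."""
    return 1 <= value <= 100


def test_generic_value():
    assert is_valid_percentage(42)


def test_lower_boundary():
    assert is_valid_percentage(1)


def test_upper_boundary():
    assert is_valid_percentage(100)


def test_just_below_lower_boundary():
    assert not is_valid_percentage(0)


def test_just_above_upper_boundary():
    assert not is_valid_percentage(101)
```

A typical bug in code like this is writing < instead of <=, and only the tests on 1 and 100 would reveal it; the test with 42 would keep passing.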

In our example there is no real limitation to the number of arguments that you pass to your function. Before Python 3.7 there was a limit of 255 arguments, which has been removed in that version of the language, but these are limitations enforced by an external system, and they are not real boundaries of your algorithm.

The definition of "external system" obviously depends on what you are testing. If you are implementing a programming language you want to have tests that show how many arguments you can pass to a function, or that check the amount of memory used by certain language features. In this case we accept the Python language as the environment in which we work, so we don't want to test its features.

The solution, in this case, might be to test a reasonably high number of input arguments, to check that everything works. In particular, we should try to keep in mind that our goal is to devise a solution that is as generic as possible. For example, we easily realise that we cannot come up with a function like

    def add(self, a, b, c=0, d=0, e=0, f=0, g=0, h=0, i=0):

as it is not generic, it is just covering a greater amount of inputs (9, in this case, but not 10 or more).

That said, a good test might be the following

tests/test_main.py

def test_add_many_numbers():
    numbers = range(100)

    calculator = SimpleCalculator()

    result = calculator.add(*numbers)

    assert result == 4950

which creates an array (strictly speaking a range, which is an iterable) of all the numbers from 0 to 99. The sum of all those numbers is 4950, which is what the algorithm shall return.

Please note that the assertion doesn't implement any algorithm to find the solution. I calculated the answer manually and hard coded it in the test. You should try as much as possible to minimise the algorithmic complexity of tests, instead "stating the facts". The reason is simple: the more complex the code of the test is, the higher the chances of introducing a bug in the test.
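The figure 4950 can be double-checked outside the test suite, for example in a Python shell, with the closed formula for the sum of the integers from 0 to n, that is n(n+1)/2 with n = 99. This is only a sanity check of the hard-coded value and is not meant to be part of the test.

```python
n = 99

# Closed formula for 0 + 1 + ... + n
closed_formula = n * (n + 1) // 2

# Brute-force sum of the same numbers used in the test
brute_force = sum(range(100))
```

Both values match the number hard coded in the assertion.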

The test suite fails because we are giving the function too many arguments

______________________________ test_add_many_numbers _______________________________

    def test_add_many_numbers():
        numbers = range(100)

        calculator = SimpleCalculator()

>       result = calculator.add(*numbers)
E       TypeError: SimpleCalculator.add() takes from 3 to 4 positional arguments but 101 were given

tests/test_main.py:28: TypeError

The minimum amount of code that we can add, this time, will not be so trivial, as we have to pass three tests. This is actually the greatest advantage of TDD: the tests that we wrote are still there and will check that the previous conditions are still satisfied. And since tests are committed with the code they will always be there.

The Python way to write functions that accept a generic number of arguments (technically called variadic functions) is through the use of the syntax *args, which stores in args a tuple that contains all the arguments.

simple_calculator/main.py

class SimpleCalculator:
    def add(self, *args):
        return sum(args)

At that point we can use the built-in function sum to sum all the arguments. This solution makes the whole test suite pass without errors, so it is correct.
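A minimal sketch of how *args behaves, using a hypothetical function collect that simply returns what it receives. The star in the definition packs the positional arguments into a tuple, while the star in the call (as in add(*numbers) in the test) unpacks an iterable into separate arguments.

```python
def collect(*args):
    # args is a tuple with all the positional arguments
    return args


packed = collect(4, 5, 6)

# The star in the call unpacks the range into three separate arguments
unpacked = collect(*range(3))

# With no arguments args is an empty tuple, and the sum of an empty tuple is 0
empty_sum = sum(collect())
```

This is also why add keeps passing the first two tests: two or three arguments are just tuples of different lengths.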

Git tag: step-3-adding-multiple-numbers

Pay attention here, please. In TDD, a solution is not correct when it is beautiful, when it is smart, or when it uses the latest feature of the language. All these things are good, but TDD wants your code to pass the tests. So, your code might be ugly, convoluted, and slow, but if it passes the test it is correct. This in turn means that TDD doesn't cover all the needs of your software project. Delivering fast routines, for example, might be part of the advantage you have on your competitors, but it is not really testable with the TDD methodology (typically, performance testing is done in a completely different way).

Part of the TDD methodology, then, deals with "refactoring", which means changing the code in a way that doesn't change the outputs, which in turn means that all your tests keep passing. Once you have a proper test suite in place, you can focus on the beauty of the code, or you can introduce smart solutions according to what the language allows you to do. We will discuss refactoring further later in this post.

TDD rule number 4: Write code that passes the test. Then refactor it.

Step 4 - Subtraction¶

From the requirements we know that we have to implement a function to subtract numbers, but this doesn't mention multiple arguments (as it would be complex to define what subtracting 3 or more numbers actually means). The test that implements this requirement is

tests/test_main.py

def test_subtract_two_numbers():
    calculator = SimpleCalculator()

    result = calculator.sub(10, 3)

    assert result == 7

which doesn't pass with the following error

____________________________ test_subtract_two_numbers ____________________________

    def test_subtract_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.sub(10, 3)
E       AttributeError: 'SimpleCalculator' object has no attribute 'sub'

tests/test_main.py:36: AttributeError

Now that you understand the TDD process, and that you know you should avoid over-engineering, you can also skip some of the steps that we went through in the previous sections. A good solution for this test is

simple_calculator/main.py

    def sub(self, a, b):
        return a - b

which makes the test suite pass.

Git tag: step-4-subtraction

Step 5 - Multiplication¶

It's time to move to multiplication, which has many similarities to addition. The requirements state that we have to provide a function to multiply numbers and that this function shall allow us to multiply multiple arguments. In TDD you should try to tackle problems one by one, possibly dividing a bigger requirement into multiple smaller ones.

In this case the first test can be the multiplication of two numbers, as it was for addition.

tests/test_main.py

def test_mul_two_numbers():
    calculator = SimpleCalculator()

    result = calculator.mul(6, 4)

    assert result == 24

And the test suite fails as expected with the following error

______________________________ test_mul_two_numbers _______________________________

    def test_mul_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.mul(6, 4)
E       AttributeError: 'SimpleCalculator' object has no attribute 'mul'

tests/test_main.py:44: AttributeError

We face now a classical TDD dilemma. Shall we implement the solution to this test as a function that multiplies two numbers, knowing that the next test will invalidate it, or shall we already consider that the target is that of implementing a variadic function and thus use *args directly?

In this case the choice is not really important, as we are dealing with very simple functions. In other cases, however, it might be worth recognising that we are facing the same issue we solved in a similar case and try to implement a smarter solution from the very beginning. In general, however, you should not implement anything that you don't plan to test in one of the next few tests that you will write.

If we decide to follow strict TDD, that is, to implement the simplest solution first, the bare minimum code that passes the test would be

simple_calculator/main.py

    def mul(self, a, b):
        return a * b

Git tag: step-5-multiply-two-numbers

To show you how to deal with redundant tests I will in this case choose the second path, and implement a smarter solution for the present test. Keep in mind, however, that it is perfectly correct to implement the solution shown above and then move on and try to solve the problem of multiple arguments later.

The problem of multiplying a tuple of numbers can be solved in Python using the function reduce. This function implements a typical algorithm that "reduces" an array to a single number, applying a given function. The algorithm steps are the following

1. Apply the function to the first two elements
2. Remove the first two elements from the array
3. Apply the function to the result of the previous step and to the first element of the array
4. Remove the first element
5. If there are still elements in the array go back to step 3

So, suppose the function is

def mul2(a, b):
    return a * b

and the array is

a = [2, 6, 4, 8, 3]

The steps followed by the algorithm will be

1. Apply the function to 2 and 6 (first two elements). The result is 2 * 6, that is 12
2. Remove the first two elements, the array is now a = [4, 8, 3]
3. Apply the function to 12 (result of the previous step) and 4 (first element of the array). The new result is 12 * 4, that is 48
4. Remove the first element, the array is now a = [8, 3]
5. Apply the function to 48 (result of the previous step) and 8 (first element of the array). The new result is 48 * 8, that is 384
6. Remove the first element, the array is now a = [3]
7. Apply the function to 384 (result of the previous step) and 3 (first element of the array). The new result is 384 * 3, that is 1152
8. Remove the first element, the array is now empty and the procedure ends
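The steps above can be sketched in plain Python. The function reduce_sketch is a hypothetical illustration of the algorithm, not the actual implementation of functools.reduce, and it assumes that the input contains at least two elements.

```python
from functools import reduce


def mul2(a, b):
    return a * b


def reduce_sketch(function, array):
    """Illustrative version of the steps above (needs 2+ elements)."""
    remaining = list(array)  # work on a copy, the input is not modified

    # Apply the function to the first two elements and remove them
    result = function(remaining.pop(0), remaining.pop(0))

    # Combine the result with the next element until none is left
    while remaining:
        result = function(result, remaining.pop(0))

    return result


a = [2, 6, 4, 8, 3]

sketch_result = reduce_sketch(mul2, a)
builtin_result = reduce(mul2, a)
```

Both sketch_result and builtin_result are 1152, the value reached at the end of the manual walk-through.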

Going back to our class SimpleCalculator, we might import reduce from the module functools and use it on the array args. We need to provide a function that we can define in the function mul itself.

simple_calculator/main.py

from functools import reduce


class SimpleCalculator:
    [...]

    def mul(self, *args):
        def mul2(a, b):
            return a * b

        return reduce(mul2, args)

Git tag: step-5-multiply-two-numbers-smart

More information about the algorithm reduce can be found on the MapReduce Wikipedia page https://en.wikipedia.org/wiki/MapReduce. The Python function documentation can be found at https://docs.python.org/3.10/library/functools.html#functools.reduce.

The above code makes the test suite pass, so we can move on and address the next problem. As happened with addition we cannot properly test that the function accepts a potentially infinite number of arguments, so we can test a reasonably high number of inputs.

tests/test_main.py

def test_mul_many_numbers():
    numbers = range(1, 10)

    calculator = SimpleCalculator()

    result = calculator.mul(*numbers)

    assert result == 362880

Git tag: step-5-multiply-many-numbers

We might use 100 arguments as we did with addition, but the multiplication of all numbers from 1 to 100 gives a result with 158 digits and I don't really need to clutter the tests file with such a monstrosity. As I said, testing multiple arguments is testing a boundary, and the idea is that if the algorithm works for 2 numbers and for 10 it will work for 10 thousand arguments as well.
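
Both numbers mentioned here can be checked in a Python shell. The following is a quick sketch that uses only the standard library; the variable names are arbitrary.

```python
import math
import operator
from functools import reduce

# The expected value in test_mul_many_numbers is the product 1 * 2 * ... * 9
product_1_to_9 = reduce(operator.mul, range(1, 10))

# Number of decimal digits of the product 1 * 2 * ... * 100
digits_1_to_100 = len(str(math.factorial(100)))
```

Printing digits_1_to_100 shows how long the expected value would be if the test used the numbers from 1 to 100.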

If we run the test suite now all tests pass, and this should worry you.

Yes, you shouldn't be happy. When you follow TDD each new test that you add should fail. If it doesn't fail you should ask yourself if it is worth adding that test or not. This is because chances are that you are adding a useless test and we don't want to add useless code, because code has to be maintained, so the less the better.

In this case, however, we know why the test already passes. We implemented a smarter algorithm as a solution for the first test knowing that we would end up trying to solve a more generic problem. And the value of this new test is that it shows that multiple arguments can be used, while the first test doesn't.

So, after these considerations, we can be happy that the second test already passes.

TDD rule number 5: A test should fail the first time you run it. If it doesn't, ask yourself why you are adding it.

Step 6 - Refactoring¶

Previously, I introduced the concept of refactoring, which means changing the code without altering the results. How can you be sure you are not altering the behaviour of your code? Well, this is what the tests are for. If the new code keeps passing the test suite you can be sure that you didn't remove any feature.

In theory, refactoring shouldn't add any new behaviour to the code, as it should be an idempotent transformation. There is no real practical way to check this, and we will not bother with it now. You should be concerned with this if you are discussing security, as your code shouldn't add any entry point you don't want to be there. In this case you will need tests that check the absence of features instead of their presence.

This means that if you have no tests you shouldn't refactor. But, after all, if you have no tests you shouldn't have any code, either, so refactoring shouldn't be a problem you have. If you have some code without tests (I know you have it, I do), you should seriously consider writing tests for it, at least before changing it. More on this in a later section.

For the time being, let's see if we can work on the code of the class SimpleCalculator without altering the results. I do not really like the definition of the function mul2 inside the function mul. It is obviously perfectly fine and valid, but for the sake of example I will pretend we have to get rid of it.

Python provides a useful function to multiply two objects in the module operator of the standard library

simple_calculator/main.py

import operator
from functools import reduce


class SimpleCalculator:
    [...]

    def mul(self, *args):
        return reduce(operator.mul, args)

Running the test suite I can see that all the tests pass, so my refactoring is correct.
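
Outside the test suite, the equivalence of the two versions can also be checked directly. This is only a sketch: mul_before and mul_after are hypothetical plain functions standing in for the two versions of the method, and the sample inputs are arbitrary.

```python
import operator
from functools import reduce


def mul_before(*args):
    # Version with the nested helper, as in step 5
    def mul2(a, b):
        return a * b

    return reduce(mul2, args)


def mul_after(*args):
    # Refactored version based on operator.mul
    return reduce(operator.mul, args)


# A few sample inputs, chosen arbitrarily for this sketch
samples = [(3, 4), (2, 6, 4, 8, 3), tuple(range(1, 10))]

results = [(mul_before(*s), mul_after(*s)) for s in samples]
```

Each pair in results should contain the same value twice. A check like this does not replace the test suite, it just makes the idea of "same behaviour, different code" concrete.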

Git tag: step-6-refactoring

TDD rule number 6: Never refactor without tests.

Final words¶

Well, I think we learned a lot. We started with no knowledge of TDD and we managed to implement a fully tested class with 3 methods. We also briefly touched on the topic of refactoring, which is of paramount importance in development. In the next post I will cover the remaining requirements: division, testing exceptions, and the average function.

Updates¶

2021-01-03: George fixed a typo, thanks!

2021-08-11: Andrea Mignone fixed a link. Thank you!

2023-09-03: Dmitry Labazkin and Ilaletdinov Almaz suggested using operator.mul instead of a lambda in the final refactoring. Thanks both!

Feedback¶

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

Refactoring with tests in Python: a practical example

2017-07-21T09:30:00+01:00

This post contains a step-by-step example of a refactoring session guided by tests. When dealing with untested or legacy code, refactoring is dangerous, and tests can help us do it the right way, minimizing the number of bugs we introduce, and possibly avoiding them completely.

Refactoring is not easy. It requires a double effort to understand code that others wrote, or that we wrote in the past, and moving around parts of it, simplifying it, in one word improving it, is by no means something for the faint-hearted. Like programming, refactoring has its rules and best practices, but it can be described as a mixture of technique, intuition, experience, risk.

Programming, after all, is craftsmanship.

The starting point¶

The simple use case I will use for this post is that of a service API that we can access, and that produces data in JSON format, namely a list of elements like the one shown here

{
    "age": 20,
    "surname": "Frazier",
    "name": "John",
    "salary": "£28943"
}

Once we convert this to a Python data structure we obtain a list of dictionaries, where 'age' is an integer, and the remaining fields are strings.
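
The conversion can be sketched with the standard module json. The variable response_body below is a made-up stand-in for the body returned by the service, containing a single element shaped like the one shown above.

```python
import json

# A response body shaped like the one described above (a list with one element)
response_body = """
[
    {
        "age": 20,
        "surname": "Frazier",
        "name": "John",
        "salary": "£28943"
    }
]
"""

data = json.loads(response_body)

first = data[0]
field_types = {key: type(value).__name__ for key, value in first.items()}
```

Here data is a list of dictionaries, and field_types maps each key to the name of the Python type of its value.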

Someone then wrote a class that computes some statistics on the input data. This class, called DataStats, provides a single method stats(), whose inputs are the data returned by the service (in JSON format), and two integers called iage and isalary. Those, according to the short documentation of the class, are the initial age and the initial salary used to compute the average yearly increase of the salary on the whole dataset.

The code is the following

import math
import json


class DataStats:

    def stats(self, data, iage, isalary):
        # iage and isalary are the starting age and salary used to
        # compute the average yearly increase of salary.

        # Compute average yearly increase
        average_age_increase = math.floor(
            sum([e['age'] for e in data])/len(data)) - iage
        average_salary_increase = math.floor(
            sum([int(e['salary'][1:]) for e in data])/len(data)) - isalary

        yearly_avg_increase = math.floor(
            average_salary_increase/average_age_increase)

        # Compute max salary
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))

        max_salary = [e for e in data if e['salary'] == threshold]

        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]
        min_salary = [e for e in data if e['salary'] ==
                      '£{}'.format(str(min(salaries)))]

        return json.dumps({
            'avg_age': math.floor(sum([e['age'] for e in data])/len(data)),
            'avg_salary': math.floor(sum(
                [int(e['salary'][1:]) for e in data])/len(data)),
            'avg_yearly_increase': yearly_avg_increase,
            'max_salary': max_salary,
            'min_salary': min_salary
        })

The goal¶

It is fairly easy, even for the untrained eye, to spot some issues in the previous class. A list of the most striking ones is

The class exposes a single method and has no __init__(), thus the same functionality could be provided by a single function.
The stats() method is too big, and performs too many tasks. This makes debugging very difficult, as there is a single inextricable piece of code that does everything.
There is a lot of code duplication, or at least several lines that are very similar. Most notably the two operations '£' + str(max(salaries)) and '£{}'.format(str(min(salaries))), the two different lines starting with salaries =, and the several list comprehensions.
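
As a small illustration of the last point, the two formatting styles build exactly the same kind of string. The list of salaries below contains sample values used only for this sketch.

```python
salaries = [27888, 67137, 70472]  # sample values, used only for this sketch

# The two formatting styles used in stats()
with_concatenation = '£' + str(max(salaries))
with_format = '£{}'.format(str(max(salaries)))
```

The two variables hold the same string, which is why the two lines can later be unified.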

So, since we are going to use this code in some part of our Amazing New Project™, we would like to fix these issues.

The class, however, is working perfectly. It has been used in production for many years and there are no known bugs, so our operation has to be a refactoring, which means that we want to write something better, preserving the behaviour of the previous object.

The path¶

In this post I want to show you how you can safely refactor such a class using tests. This is different from TDD, but the two are closely related. The class we have has not been created using TDD, as there are no tests, but we can use tests to ensure its behaviour is preserved. This should therefore be called Test Driven Refactoring (TDR).

The idea behind TDR is pretty simple. First, we have to write a test that checks the behaviour of some code, possibly a small part with a clearly defined scope and output. This is a posthumous (or late) unit test, and it simulates what the author of the code should have provided (cough cough, it was you some months ago...).

Once you have your unit test you can go and modify the code, knowing that the behaviour of the resulting object will be the same as that of the previous one. As you can easily understand, the effectiveness of this methodology depends strongly on the quality of the tests themselves, possibly more than when developing with TDD, and this is why refactoring is hard.

Caveats¶

Two remarks before we start our refactoring. The first is that such a class could easily be refactored to some functional code. As you will be able to infer from the final result there is no real reason to keep an object-oriented approach for this code. I decided to go that way, however, as it gave me the possibility to show a design pattern called wrapper, and the refactoring technique that leverages it.

The second remark is that in pure TDD it is strongly advised not to test internal methods, that is those methods that do not form the public API of the object. In general, we identify such methods in Python by prefixing their name with an underscore, and the reason not to test them is that TDD wants you to shape objects according to the object-oriented programming methodology, which considers objects as behaviours and not as structures. Thus, we are only interested in testing public methods.

It is also true, however, that sometimes, even though we do not want to make a method public, that method contains some complex logic that we want to test. So, in my opinion the TDD advice should sound like "Test internal methods only when they contain some non-trivial logic".

When it comes to refactoring, however, we are somehow deconstructing a previously existing structure, and usually we end up creating a lot of private methods to help extract and generalise parts of the code. My advice in this case is to test those methods, as this gives you a higher degree of confidence in what you are doing. With experience you will then learn which tests are required and which are not.

Setup of the testing environment¶

Clone this repository and create a virtual environment. Activate it and install the required packages with

pip install -r requirements.txt

The repository already contains a configuration file for pytest and you should customise it to avoid entering your virtual environment directory. Go and fix the norecursedirs parameter in that file, adding the name of the virtual environment you just created; I usually name my virtual environments with a venv prefix, and this is why that variable contains the entry venv*.

At this point you should be able to run pytest -svv in the parent directory of the repository (the one that contains pytest.ini), and obtain a result similar to the following

========================== test session starts ==========================
platform linux -- Python 3.5.3, pytest-3.1.2, py-1.4.34, pluggy-0.4.0
cachedir: .cache
rootdir: datastats, inifile: pytest.ini
plugins: cov-2.5.1
collected 0 items 

====================== no tests ran in 0.00 seconds ======================

The given repository contains two branches. master is the one you are on after cloning, and contains the initial setup, while develop points to the last step of the whole refactoring process. Every step of this post contains a reference to the commit that contains the changes introduced in that section.

Step 1 - Testing the endpoints¶

Commit: 27a1d8c

When you start refactoring a system, regardless of the size, you have to test the endpoints. This means that you consider the system as a black box (i.e. you do not know what is inside) and just check the external behaviour. In this case we can write a test that initialises the class and runs the stats() method with some test data, possibly real data, and checks the output. Obviously we will write the test with the actual output returned by the method, so this test is automatically passing.

Querying the server we get the following data

test_data = [
    {
        "id": 1,
        "name": "Laith",
        "surname": "Simmons",
        "age": 68,
        "salary": "£27888"
    },
    {
        "id": 2,
        "name": "Mikayla",
        "surname": "Henry",
        "age": 49,
        "salary": "£67137"
    },
    {
        "id": 3,
        "name": "Garth",
        "surname": "Fields",
        "age": 70,
        "salary": "£70472"
    }
]

and calling the stats() method with that output, with iage set to 20, and isalary set to 20000, we get the following JSON result

{
    "avg_age": 62,
    "avg_salary": 55165,
    "avg_yearly_increase": 837,
    "max_salary": [{
        "id": 3,
        "name": "Garth",
        "surname": "Fields",
        "age": 70,
        "salary": "£70472"
    }],
    "min_salary": [{
        "id": 1,
        "name": "Laith",
        "surname": "Simmons",
        "age": 68,
        "salary": "£27888"
    }]
}

Caveat: I'm using a single very short set of real data, namely a list of 3 dictionaries. In a real case I would test the black box with many different use cases, to ensure I am not just checking some corner case.
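
The three numeric values of the result can be verified by hand from the test data. The following sketch repeats the arithmetic of stats() on the ages and salaries listed above, with iage set to 20 and isalary set to 20000.

```python
import math

ages = [68, 49, 70]
salaries = [27888, 67137, 70472]
iage, isalary = 20, 20000

avg_age = math.floor(sum(ages) / len(ages))              # floor(187 / 3)
avg_salary = math.floor(sum(salaries) / len(salaries))   # floor(165497 / 3)

# Average salary increase divided by average age increase
avg_yearly_increase = math.floor((avg_salary - isalary) / (avg_age - iage))
```

The results are 62, 55165, and 837, matching the values in the JSON output.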

The test is the following

import json

from datastats.datastats import DataStats


def test_json():
    test_data = [
        {
            "id": 1,
            "name": "Laith",
            "surname": "Simmons",
            "age": 68,
            "salary": "£27888"
        },
        {
            "id": 2,
            "name": "Mikayla",
            "surname": "Henry",
            "age": 49,
            "salary": "£67137"
        },
        {
            "id": 3,
            "name": "Garth",
            "surname": "Fields",
            "age": 70,
            "salary": "£70472"
        }
    ]

    ds = DataStats()

    assert ds.stats(test_data, 20, 20000) == json.dumps(
        {
            'avg_age': 62,
            'avg_salary': 55165,
            'avg_yearly_increase': 837,
            'max_salary': [{
                "id": 3,
                "name": "Garth",
                "surname": "Fields",
                "age": 70,
                "salary": "£70472"
            }],
            'min_salary': [{
                "id": 1,
                "name": "Laith",
                "surname": "Simmons",
                "age": 68,
                "salary": "£27888"
            }]
        }
    )

As said before, this test is obviously passing, having been artificially constructed from a real execution of the code.

Well, this test is very important! Now we know that if we change something inside the code, altering the behaviour of the class, at least one test will fail.

Step 2 - Getting rid of the JSON format¶

Commit: 65e2997

The method returns its output in JSON format, and looking at the class it is pretty evident that the conversion is done by json.dumps().

The structure of the code is the following

class DataStats:

    def stats(self, data, iage, isalary):
        [code_part_1]

        return json.dumps({
            [code_part_2]
        })

Where obviously code_part_2 depends on code_part_1. The first refactoring, then, will follow this procedure

1. We write a test called test__stats() for a _stats() method that is supposed to return the data as a Python structure. We can infer the latter manually from the JSON or running json.loads() from a Python shell. The test fails.

2. We duplicate the code of the stats() method that produces the data, putting it in the new _stats() method. The test passes.

class DataStats:

    def _stats(self, data, iage, isalary):
        [code_part_1]

        return [code_part_2]

    def stats(self, data, iage, isalary):
        [code_part_1]

        return json.dumps({
            [code_part_2]
        })

3. We remove the duplicated code in stats() replacing it with a call to _stats()

class DataStats:

    def _stats(self, data, iage, isalary):
        [code_part_1]

        return [code_part_2]

    def stats(self, data, iage, isalary):
        return json.dumps(
            self._stats(data, iage, isalary)
        )

At this point we could refactor the initial test test_json() that we wrote, but this is an advanced consideration, and I'll leave it for some later notes.
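
The end state of this procedure can be sketched on a toy class. Report, _summary() and summary() are hypothetical names that do not belong to the datastats repository; they only show the shape of a helper that returns a Python structure and a public method that wraps it.

```python
import json


class Report:
    # Hypothetical toy class, not part of the datastats repository

    def _summary(self, values):
        # Helper that returns a plain Python structure
        return {'count': len(values), 'total': sum(values)}

    def summary(self, values):
        # Public method: a thin wrapper that only handles the JSON conversion
        return json.dumps(self._summary(values))


report = Report()

as_json = report.summary([1, 2, 3])
as_python = report._summary([1, 2, 3])

# json.loads() turns the JSON string back into the Python structure
round_trip = json.loads(as_json)
```

The value round_trip is equal to as_python, which is one way to obtain the expected value for the test of the helper starting from the JSON output of the public method.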

So now the code of our class looks like this

class DataStats:

    def _stats(self, data, iage, isalary):
        # iage and isalary are the starting age and salary used to
        # compute the average yearly increase of salary.

        # Compute average yearly increase
        average_age_increase = math.floor(
            sum([e['age'] for e in data])/len(data)) - iage
        average_salary_increase = math.floor(
            sum([int(e['salary'][1:]) for e in data])/len(data)) - isalary

        yearly_avg_increase = math.floor(
            average_salary_increase/average_age_increase)

        # Compute max salary
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))

        max_salary = [e for e in data if e['salary'] == threshold]

        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]
        min_salary = [e for e in data if e['salary'] ==
                      '£{}'.format(str(min(salaries)))]

        return {
            'avg_age': math.floor(sum([e['age'] for e in data])/len(data)),
            'avg_salary': math.floor(sum(
                [int(e['salary'][1:]) for e in data])/len(data)),
            'avg_yearly_increase': yearly_avg_increase,
            'max_salary': max_salary,
            'min_salary': min_salary
        }

    def stats(self, data, iage, isalary):
        return json.dumps(
            self._stats(data, iage, isalary)
        )

and we have two tests that check the correctness of it.

Step 3 - Refactoring the tests¶

Commit: d619017

It is pretty clear that the test_data list of dictionaries is bound to be used in every test we will perform, so it is high time we moved that to a global variable. There is no point now in using a fixture, as the test data is just static data.

We could also move the output data to a global variable, but the upcoming tests are not using the whole output dictionary any more, so we can postpone the decision.

The test suite now looks like

import json

from datastats.datastats import DataStats


test_data = [
    {
        "id": 1,
        "name": "Laith",
        "surname": "Simmons",
        "age": 68,
        "salary": "£27888"
    },
    {
        "id": 2,
        "name": "Mikayla",
        "surname": "Henry",
        "age": 49,
        "salary": "£67137"
    },
    {
        "id": 3,
        "name": "Garth",
        "surname": "Fields",
        "age": 70,
        "salary": "£70472"
    }
]


def test_json():

    ds = DataStats()

    assert ds.stats(test_data, 20, 20000) == json.dumps(
        {
            'avg_age': 62,
            'avg_salary': 55165,
            'avg_yearly_increase': 837,
            'max_salary': [{
                "id": 3,
                "name": "Garth",
                "surname": "Fields",
                "age": 70,
                "salary": "£70472"
            }],
            'min_salary': [{
                "id": 1,
                "name": "Laith",
                "surname": "Simmons",
                "age": 68,
                "salary": "£27888"
            }]
        }
    )


def test__stats():

    ds = DataStats()

    assert ds._stats(test_data, 20, 20000) == {
        'avg_age': 62,
        'avg_salary': 55165,
        'avg_yearly_increase': 837,
        'max_salary': [{
            "id": 3,
            "name": "Garth",
            "surname": "Fields",
            "age": 70,
            "salary": "£70472"
        }],
        'min_salary': [{
            "id": 1,
            "name": "Laith",
            "surname": "Simmons",
            "age": 68,
            "salary": "£27888"
        }]
    }

Step 4 - Isolate the average age algorithm¶

Commit: 9db1803

Isolating independent features is a key target of software design. Thus, our refactoring shall aim to disentangle the code dividing it into small separated functions.

The output dictionary contains five keys, and each of them corresponds to a value computed either on the fly (for avg_age and avg_salary) or by the method's code (for avg_yearly_increase, max_salary, and min_salary). We can start replacing the code that computes the value of each key with dedicated methods, trying to isolate the algorithms.

To isolate some code, the first thing to do is to duplicate it, putting it into a dedicated method. As we are refactoring with tests, the first thing is to write a test for this method.

def test__avg_age():

    ds = DataStats()

    assert ds._avg_age(test_data) == 62

We know that the method's output shall be 62 as that is the value we have in the output data of the original stats() method. Please note that there is no need to pass iage and isalary as they are not used in the refactored code.

The test fails, so we can dutifully go and duplicate the code we use to compute 'avg_age'

    def _avg_age(self, data):
        return math.floor(sum([e['age'] for e in data])/len(data))

and once the test passes we can replace the duplicated code in _stats() with a call to _avg_age()

        return {
            'avg_age': self._avg_age(data),
            'avg_salary': math.floor(sum(
                [int(e['salary'][1:]) for e in data])/len(data)),
            'avg_yearly_increase': yearly_avg_increase,
            'max_salary': max_salary,
            'min_salary': min_salary
        }

After that, we check that no test is failing. Well done! We isolated the first feature, and our refactoring has already produced three tests.

Step 5 - Isolate the average salary algorithm¶

Commit: 4122201

The avg_salary key works exactly like the avg_age, with different code. Thus, the refactoring process is the same as before, and the result should be a new test__avg_salary() test

def test__avg_salary():

    ds = DataStats()

    assert ds._avg_salary(test_data) == 55165

a new _avg_salary() method

    def _avg_salary(self, data):
        return math.floor(sum([int(e['salary'][1:]) for e in data])/len(data))

and a new version of the final return value

        return {
            'avg_age': self._avg_age(data),
            'avg_salary': self._avg_salary(data),
            'avg_yearly_increase': yearly_avg_increase,
            'max_salary': max_salary,
            'min_salary': min_salary
        }

Step 6 - Isolate the average yearly increase algorithm¶

Commit: 4005145

The remaining three keys are computed with algorithms that, being longer than one line, couldn't be squeezed directly in the definition of the dictionary. The refactoring process, however, does not really change; as before, we first test a helper method, then we define it duplicating the code, and last we call the helper removing the code duplication.

For the average yearly increase of the salary we have a new test

def test__avg_yearly_increase():

    ds = DataStats()

    assert ds._avg_yearly_increase(test_data, 20, 20000) == 837

a new method that passes the test

    def _avg_yearly_increase(self, data, iage, isalary):
        # iage and isalary are the starting age and salary used to
        # compute the average yearly increase of salary.

        # Compute average yearly increase
        average_age_increase = math.floor(
            sum([e['age'] for e in data])/len(data)) - iage
        average_salary_increase = math.floor(
            sum([int(e['salary'][1:]) for e in data])/len(data)) - isalary

        return math.floor(average_salary_increase/average_age_increase)

and a new version of the _stats() method

    def _stats(self, data, iage, isalary):
        # Compute max salary
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))

        max_salary = [e for e in data if e['salary'] == threshold]

        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]
        min_salary = [e for e in data if e['salary'] ==
                      '£{}'.format(str(min(salaries)))]

        return {
            'avg_age': self._avg_age(data),
            'avg_salary': self._avg_salary(data),
            'avg_yearly_increase': self._avg_yearly_increase(
                data, iage, isalary),
            'max_salary': max_salary,
            'min_salary': min_salary
        }

Please note that we are not solving any code duplication but the ones that we introduce to refactor. The first achievement we should aim for is to completely isolate independent features.

Step 7 - Isolate max and min salary algorithms¶

Commit: 17b2413

When refactoring we shall always do one thing at a time, but for the sake of conciseness, I'll show here the result of two refactoring steps at once. I recommend that the reader perform them as independent steps, as I did when I wrote the code that I am posting below.

The new tests are

def test__max_salary():

    ds = DataStats()

    assert ds._max_salary(test_data) == [{
        "id": 3,
        "name": "Garth",
        "surname": "Fields",
        "age": 70,
        "salary": "£70472"
    }]


def test__min_salary():

    ds = DataStats()

    assert ds._min_salary(test_data) == [{
        "id": 1,
        "name": "Laith",
        "surname": "Simmons",
        "age": 68,
        "salary": "£27888"
    }]

The new methods in the DataStats class are

    def _max_salary(self, data):
        # Compute max salary
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))

        return [e for e in data if e['salary'] == threshold]

    def _min_salary(self, data):
        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]
        return [e for e in data if e['salary'] ==
                '£{}'.format(str(min(salaries)))]

and the _stats() method is now really tiny

    def _stats(self, data, iage, isalary):
        return {
            'avg_age': self._avg_age(data),
            'avg_salary': self._avg_salary(data),
            'avg_yearly_increase': self._avg_yearly_increase(
                data, iage, isalary),
            'max_salary': self._max_salary(data),
            'min_salary': self._min_salary(data)
        }

Step 8 - Reducing code duplication¶

Commit: b559a5c

Now that we have the main tests in place we can start changing the code of the various helper methods. These are now small enough to allow us to change the code without further tests. While this can be true in this case, however, in general there is no definition of what "small enough" means, as there is no real definition of what a "unit test" is. Generally speaking you should be confident that the change that you are doing is covered by the tests that you have. If this is not the case, you had better add one or more tests until you feel confident enough.

The two methods _max_salary() and _min_salary() share a great deal of code, even though the second one is more concise

    def _max_salary(self, data):
        # Compute max salary
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))

        return [e for e in data if e['salary'] == threshold]

    def _min_salary(self, data):
        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]
        return [e for e in data if e['salary'] ==
                '£{}'.format(str(min(salaries)))]

I'll start by making explicit the threshold variable in the second function. As soon as I change something, I'll run the tests to check that the external behaviour did not change.

    def _max_salary(self, data):
        # Compute max salary
        salaries = [int(e['salary'][1:]) for e in data]
        threshold = '£' + str(max(salaries))

        return [e for e in data if e['salary'] == threshold]

    def _min_salary(self, data):
        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]
        threshold = '£{}'.format(str(min(salaries)))

        return [e for e in data if e['salary'] == threshold]

Now, it is pretty evident that the two functions are the same but for the min() and max() functions. They still use different variable names and different code to format the threshold, so my first action is to even them out, copying the code of _min_salary() to _max_salary() and changing min() to max()

    def _max_salary(self, data):
        # Compute max salary
        salaries = [int(d['salary'][1:]) for d in data]
        threshold = '£{}'.format(str(max(salaries)))

        return [e for e in data if e['salary'] == threshold]

    def _min_salary(self, data):
        # Compute min salary
        salaries = [int(d['salary'][1:]) for d in data]
        threshold = '£{}'.format(str(min(salaries)))

        return [e for e in data if e['salary'] == threshold]

Now I can create another helper called _select_salary() that duplicates that code and accepts a function, used instead of min() or max(). As I did before, first I duplicate the code, and then remove the duplication by calling the new function.

After a few steps, the code looks like this

    def _select_salary(self, data, func):
        salaries = [int(d['salary'][1:]) for d in data]
        threshold = '£{}'.format(str(func(salaries)))

        return [e for e in data if e['salary'] == threshold]

    def _max_salary(self, data):
        return self._select_salary(data, max)

    def _min_salary(self, data):
        return self._select_salary(data, min)
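
Passing min or max as an argument works because functions are ordinary objects in Python. The following sketch rewrites the helper as a plain function named select_salary() and runs it on a few invented records, two of which share the highest salary.

```python
def select_salary(data, func):
    # Standalone version of _select_salary(), written as a plain function
    salaries = [int(d['salary'][1:]) for d in data]
    threshold = '£{}'.format(func(salaries))

    return [e for e in data if e['salary'] == threshold]


# Reduced records, invented for this sketch; two of them share the top salary
people = [
    {'name': 'A', 'salary': '£100'},
    {'name': 'B', 'salary': '£300'},
    {'name': 'C', 'salary': '£300'},
]

highest = select_salary(people, max)
lowest = select_salary(people, min)
```

With this data highest contains the records of B and C, while lowest contains only the record of A, which also shows why the result is a list and not a single element.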

I noticed then a code duplication between _avg_salary() and _select_salary()

    def _avg_salary(self, data):
        return math.floor(sum([int(e['salary'][1:]) for e in data])/len(data))

    def _select_salary(self, data, func):
        salaries = [int(d['salary'][1:]) for d in data]

and decided to extract the common algorithm in a method called _salaries(). As before, I write the test first

def test_salaries():

    ds = DataStats()

    assert ds._salaries(test_data) == [27888, 67137, 70472]

then I implement the method

    def _salaries(self, data):
        return [int(d['salary'][1:]) for d in data]

and eventually I replace the duplicated code with a call to the new method

    def _salaries(self, data):
        return [int(d['salary'][1:]) for d in data]

    def _select_salary(self, data, func):
        threshold = '£{}'.format(str(func(self._salaries(data))))

        return [e for e in data if e['salary'] == threshold]

While doing this I noticed that _avg_yearly_increase() contains the same code, so I fixed it there as well.

    def _avg_yearly_increase(self, data, iage, isalary):
        # iage and isalary are the starting age and salary used to
        # compute the average yearly increase of salary.

        # Compute average yearly increase
        average_age_increase = math.floor(
            sum([e['age'] for e in data])/len(data)) - iage
        average_salary_increase = math.floor(
            sum(self._salaries(data))/len(data)) - isalary

        return math.floor(average_salary_increase/average_age_increase)

It would be useful at this point to store the input data inside the class and to use it as self.data instead of passing it around to all the class's methods. This however would break the class's API, as DataStats is currently initialised without any data. Later I will show how to introduce changes that potentially break the API, and briefly discuss the issue. For the moment, however, I'll keep changing the class without modifying the external interface.

It looks like age has the same code duplication issues as salary, so with the same procedure I introduce the _ages() method and change the _avg_age() and _avg_yearly_increase() methods accordingly.
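
The result of this step might look like the following sketch, which mirrors what was done for _salaries(). The class name DataStatsSketch and the reduced sample data are assumptions used to keep the example self-contained; the actual code in the repository may differ in details.

```python
import math


class DataStatsSketch:
    # Hypothetical stand-in for DataStats, containing only the age helpers

    def _ages(self, data):
        # Extract the list of ages, as _salaries() does for salaries
        return [d['age'] for d in data]

    def _avg_age(self, data):
        # Reuse _ages() instead of repeating the list comprehension
        return math.floor(sum(self._ages(data))/len(data))


sample = [{"age": 68}, {"age": 49}, {"age": 70}]

ds = DataStatsSketch()
ages = ds._ages(sample)
avg_age = ds._avg_age(sample)
```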

Speaking of _avg_yearly_increase(), the code of that method contains the code of the _avg_age() and _avg_salary() methods, so it is worth replacing it with two calls. As I am moving code between existing methods, I do not need further tests.

    def _avg_yearly_increase(self, data, iage, isalary):
        # iage and isalary are the starting age and salary used to
        # compute the average yearly increase of salary.

        # Compute average yearly increase
        average_age_increase = self._avg_age(data) - iage
        average_salary_increase = self._avg_salary(data) - isalary

        return math.floor(average_salary_increase/average_age_increase)

Step 9 - Advanced refactoring¶

Commit: cc0b0a1

The initial class didn't have any __init__() method, and was thus missing the encapsulation part of the object-oriented paradigm. There was no reason to keep the class, as the stats() method could have easily been extracted and provided as a plain function.

This is much more evident now that we refactored the method, because we have 10 methods that accept data as a parameter. It would be nice to load the input data into the class at instantiation time, and then access it as self.data. This would greatly improve the readability of the class, and also justify its existence.

If we introduce an __init__() method that requires a parameter, however, we will change the class's API, breaking the compatibility with the code that imports and uses it. Since we want to keep that compatibility, we have to devise a way to provide both the advantages of a new, clean class and of a stable API. This is not always perfectly achievable, but in this case the Adapter design pattern (also known as Wrapper) can perfectly solve the issue.

The goal is to change the current class to match the new API, and then build a class that wraps the first one and provides the old API. The strategy is not that different from what we did previously, only this time we will deal with classes instead of methods. With a stupendous effort of my imagination I named the new class NewDataStats. Sorry, sometimes you just have to get the job done.

The first step, as happens very often with refactoring, is to duplicate the code, and when we insert new code we need to have tests that justify it. The tests will be the same as before, as the new class shall provide the same functionalities as the previous one, so I just create a new file called test_newdatastats.py and start by putting there the first test, test_init().

import json

from datastats.datastats import NewDataStats


test_data = [
    {
        "id": 1,
        "name": "Laith",
        "surname": "Simmons",
        "age": 68,
        "salary": "£27888"
    },
    {
        "id": 2,
        "name": "Mikayla",
        "surname": "Henry",
        "age": 49,
        "salary": "£67137"
    },
    {
        "id": 3,
        "name": "Garth",
        "surname": "Fields",
        "age": 70,
        "salary": "£70472"
    }
]


def test_init():

    ds = NewDataStats(test_data)

    assert ds.data == test_data

This test doesn't pass yet, and the code that makes it pass is very simple

class NewDataStats:

    def __init__(self, data):
        self.data = data

Now I can start an iterative process:

I will copy one of the tests of DataStats and adapt it to NewDataStats
I will copy some code from DataStats to NewDataStats, adapting it to the new API and making it pass the test.

At this point iteratively removing methods from DataStats and replacing them with a call to NewDataStats would be overkill. I'll show you in the next section why, and what we can do to avoid that.

An example of the resulting tests for NewDataStats is the following

def test_ages():

    ds = NewDataStats(test_data)

    assert ds._ages() == [68, 49, 70]

and the code that passes the test is

    def _ages(self):
        return [d['age'] for d in self.data]

Once finished, I noticed that, as now methods like _ages() do not require an input parameter any more, I can convert them to properties, changing the tests accordingly.

    @property
    def _ages(self):
        return [d['age'] for d in self.data]
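
Changing the tests accordingly means dropping the parentheses, as a property is accessed like an attribute. A minimal sketch of the adapted test follows; the class is reduced to the parts needed here and test_data is shortened to the age field only, both for the sake of a self-contained example.

```python
class NewDataStats:
    # Reduced version of the class, only what this test needs

    def __init__(self, data):
        self.data = data

    @property
    def _ages(self):
        return [d['age'] for d in self.data]


# Shortened test data (the real fixtures contain more fields)
test_data = [{"age": 68}, {"age": 49}, {"age": 70}]


def test_ages():

    ds = NewDataStats(test_data)

    # No call here: _ages is now a property
    assert ds._ages == [68, 49, 70]
```

Calling ds._ages() at this point would raise a TypeError, because the property returns a list and lists are not callable.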

It is time to replace the methods of DataStats with calls to NewDataStats. We could do it method by method, but actually the only thing that we really need is to replace stats(). So the new code is

class DataStats:

    def stats(self, data, iage, isalary):
        nds = NewDataStats(data)
        return nds.stats(iage, isalary)

And since all the other methods are not used any more we can safely delete them, checking that the tests do not fail. Speaking of tests, removing methods will make many tests of DataStats fail, so we need to remove them.
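
The following sketch shows the two APIs side by side. The body of NewDataStats.stats() is a deliberately simplified stand-in that I made up for this example (it only computes an average age and its increase); the real method returns the full statistics.

```python
import math


class NewDataStats:

    def __init__(self, data):
        self.data = data

    def stats(self, iage, isalary):
        # Simplified, hypothetical stand-in for the real stats().
        # isalary is accepted to keep the signature, but not used here.
        avg_age = math.floor(
            sum(d['age'] for d in self.data)/len(self.data))

        return {'avg_age': avg_age, 'avg_age_increase': avg_age - iage}


class DataStats:
    # Adapter: keeps the old API and delegates to the new class

    def stats(self, data, iage, isalary):
        nds = NewDataStats(data)
        return nds.stats(iage, isalary)


sample = [{"age": 68}, {"age": 49}, {"age": 70}]

# Old API: data is passed to stats()
old_api = DataStats().stats(sample, 20, 20000)

# New API: data is passed at instantiation time
new_api = NewDataStats(sample).stats(20, 20000)
```

Code written against the old interface keeps working unchanged, while new code can use NewDataStats directly.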

Step 10 - Still room for improvement¶

As refactoring is an iterative process it will often happen that you think you did everything that was possible, only to spot later that you missed something. In this case the missing step was spotted by Harun Yasar, who noticed another small code duplication.

The two functions

    def _avg_salary(self):
        return math.floor(sum(self._salaries)/len(self.data))

    def _avg_age(self):
        return math.floor(sum(self._ages)/len(self.data))

share the same logic, so we can definitely isolate that and call the common code in each function

    def _floor_avg(self, sum_of_numbers):
        return math.floor(sum_of_numbers / len(self.data))

    def _avg_salary(self):
        return self._floor_avg(sum(self._salaries))

    def _avg_age(self):
        return self._floor_avg(sum(self._ages))

which passes all the tests and is thus correct.
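
In keeping with the habit of testing the helpers, a test for _floor_avg() might look like the sketch below. The test names and the reduced class are assumptions made to keep the example self-contained; the expected values come from the three records of test_data used throughout the post.

```python
import math


class NewDataStats:
    # Reduced version of the class, only the methods involved here

    def __init__(self, data):
        self.data = data

    @property
    def _salaries(self):
        return [int(d['salary'][1:]) for d in self.data]

    @property
    def _ages(self):
        return [d['age'] for d in self.data]

    def _floor_avg(self, sum_of_numbers):
        return math.floor(sum_of_numbers / len(self.data))

    def _avg_salary(self):
        return self._floor_avg(sum(self._salaries))

    def _avg_age(self):
        return self._floor_avg(sum(self._ages))


test_data = [
    {"age": 68, "salary": "£27888"},
    {"age": 49, "salary": "£67137"},
    {"age": 70, "salary": "£70472"},
]


def test_floor_avg():

    ds = NewDataStats(test_data)

    # 187 / 3 = 62.33..., floored to 62
    assert ds._floor_avg(187) == 62


def test_averages_use_floor_avg():

    ds = NewDataStats(test_data)

    assert ds._avg_age() == 62
    assert ds._avg_salary() == 55165
```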

Whenever I get corrected by someone who read one of my posts and just learned something new I feel so happy, because it means that the message is clear!

Final words¶

I hope this little tour of a refactoring session didn't turn out to be too trivial, and helped you to grasp the basic concepts of this technique. If you are interested in the subject I'd strongly recommend the classic book by Martin Fowler "Refactoring: Improving the Design of Existing Code", which is a collection of refactoring patterns. The reference language is Java, but the concepts are easily adapted to Python.

Updates¶

2017-07-28: delirious-lettuce and Matt Beck did a very serious proofread and spotted many typos. Thank you both for reading the post and for taking the time to submit the issues!

2020-02-15: Harun Yasar spotted a missing refactoring in two functions. Thanks!

Feedback¶

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.