The Digital Cat - TDD

Free book "Rust Projects - Write a Redis Clone"

2024-12-24T10:00:00+00:00

I just published version 1.1.0 of my free book "Rust Projects - Write a Redis Clone".

You can download the book from Leanpub!

The book follows the Redis challenge on CodeCrafters and contains a 40% discount for any type of paid membership.

In this version I added a section to chapter 5 to restore some tests that I left commented in one of the previous sections during a refactoring. I also added a final section to address compiler warnings and tidy up the code.

A Rust to-do list CLI app - Part 1

2024-02-14T18:00:00+01:00

This blog was born as a place where I could share what I discovered while I was learning new technologies and concepts. Remaining faithful to this manifesto, I decided to start writing some posts about Rust, as I recently joined the vibrant community behind this language. My roots are in C and Assembly, so I feel at home with Rust, which looks to me like a proper modern version of C. As such, I'm supremely interested in the ideas behind it, and in particular I want to focus on low-level data representation, structures, and memory usage.

After having read the manual and implemented several snippets, I decided to try to implement a more complete application and to annotate the journey here. While I was looking for a tutorial I found this useful post by Claudio Restifo, where he develops a simple to-do list management application.

So, I decided to implement the same application, turning to Claudio's solution when I was stuck or when I wanted to compare his strategy with mine, as it is always extremely useful to see how another coder tackles certain challenges. Thanks Claudio! However, as I'm a big fan of TDD, I'd like to follow that approach, which is something that Claudio doesn't do in his post.

Please keep in mind that these are my first steps with the language, so consider what you read here as the work of a beginner (as I am, with this language). I'm more than happy to receive advice or corrections, so feel free to get in touch if you see anything that can be done in a better way. In the post, you will find annotations that highlight the major topics that I think a Rust programmer should be familiar with.

Requirements

The requirements I set for the application are:

Manage a list of entries. Each entry can be in state "to be done" or "done".
Provide commands to view, add, delete and mark items as "done" or "to be done".
Can save and retrieve data from a file. The file has a default name that can be changed with an option.

This is an extremely basic application, so the command line is: todo [OPTIONS] COMMAND [KEY].

I expect the interaction with the tool to be something like

$ todo list
# TO DO

* Write post
* Buy milk
* Have fun

# DONE

* Feed the cat

$ todo add "Update CV"
$ todo mark-done "Buy milk"
$ todo list
# TO DO

* Write post
* Have fun
* Update CV

# DONE

* Feed the cat
* Buy milk

Initial setup

Starting a new Rust project is extremely simple with Cargo:

$ cargo new todo-cli

This will create the required structure in a new directory and create two files: Cargo.toml and src/main.rs. The latter will contain some placeholder code that we can use to check our setup

main.rs

fn main() {
    println!("Hello, world!");
}

We can run the code with cargo run or build it into a stand-alone executable with cargo build. This will compile the code in debug mode by default and put the executable in the directory target/debug. Cargo can be used to run tests as well (cargo test).

Cargo

Cargo [docs] is the Rust package manager and the default solution to manage dependencies, compile packages and in general to manage your code. It's highly recommended to learn at least the basics of this powerful tool.

See the source code

CLI management

Command line interfaces are typically not part of the classic TDD cycle, as they should be part of integration tests. Now, the definition that the Rust community uses for integration tests is

Integration tests are external to your crate and use only its public interface in the same way any other code would. Their purpose is to test that many parts of your library work correctly together.

So, the integration they consider here is that between multiple parts of a library. What I am referring to here is more properly system integration tests, where we test the public interface of a whole tool. Long story short, I will not write tests for the CLI commands.

In the aforementioned post, Claudio Restifo suggests we can read command line arguments using std::env::args() directly with something like

fn main() {
    let action = std::env::args().nth(1)
        .expect("Please specify an action");
    let item = std::env::args().nth(2)
        .expect("Please specify an item");

    println!("{:?}, {:?}", action, item);
}

Modules

In Rust a module can be used directly as long as it is part of the current project. The standard library is visible by default, while external crates have to be declared in the file Cargo.toml. It is then perfectly acceptable to write let action = std::env::args()....

Use declarations, however, can import other modules into the current namespace, to make the code more readable.

The method nth [docs] returns (not too surprisingly) the nth element of an iterator.

Iterators

The Rust documentation contains a very useful section on iterators [docs].

The function std::env::args [docs] (used to access the command line arguments in the traditional Unix fashion) returns Args [docs], which implements the trait Iterator.

As it happens in object-oriented programming languages (which Rust is not), the expression "implements an interface" is often simplified to "is". So, colloquially speaking, we can say that std::env::Args is an Iterator [docs].

The prototype of nth is

fn nth(&mut self, n: usize) -> Option<Self::Item>

and it mentions Option<Self::Item> as the return type. The type Option provides a method expect [docs] that either returns the content of the Some value or panics, printing the given message.

Option

Option [docs] and Result [docs] are a versatile way to manage optional results (either something or nothing) and results (either something good or an error), and are among the most important structures to learn in Rust.

Running the code above with cargo run produces the following output, where we can see the message set by the first call to expect.

    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/todo-cli`
thread 'main' panicked at src/main.rs:80:42:
Please specify an action
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
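
If the panic is not desirable, the Option returned by the iterator can be converted into a Result and handled explicitly. The following is only a minimal sketch under my own assumptions: the helper parse_args is hypothetical (it is not part of the post's code), and it uses next instead of nth so that the iterator is consumed only once.

```rust
use std::env;

/// Hypothetical helper: extracts the action and the item from an
/// argument iterator, returning an error message instead of panicking.
fn parse_args(mut args: impl Iterator<Item = String>) -> Result<(String, String), String> {
    // Skip the program name, which is the first element.
    args.next();
    // `ok_or` turns a missing argument into an error, `?` propagates it.
    let action = args.next().ok_or("Please specify an action")?;
    let item = args.next().ok_or("Please specify an item")?;
    Ok((action, item))
}

fn main() {
    match parse_args(env::args()) {
        Ok((action, item)) => println!("{:?}, {:?}", action, item),
        Err(message) => eprintln!("{}", message),
    }
}
```

Taking a generic iterator instead of calling std::env::args inside the function is a design choice that makes the helper easy to exercise with a plain vector of strings.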

Better CLI management with Clap

Clap stands for Command Line Argument Parser and is a nice crate that simplifies the creation of advanced command line interfaces. I installed it using

$ cargo add clap --features derive

as detailed in the documentation and my code is now

use clap::Parser;

#[derive(Parser)]
struct Cli {
    command: String,
    key: String,
}

fn main() {
    let args = Cli::parse();

    println!("Command line: {} {}", args.command, args.key);
}

Clap allows me to add long and short options as well, so later I will use it to specify the database file name. For now, however, this is enough.

derive

The attribute derive [docs] is another cornerstone of the language and is used everywhere. The machinery behind it is not trivial, but I recommend getting used to the syntax and the standard use cases.

See the source code

A simple list of elements

From Claudio's post I got the idea of using a hash map for the list of items. That's a simple and effective solution, in particular given the fact that Rust provides the collection type out of the box.

As I want to use TDD, I begin with a test. In Rust, we put unit tests and code in the same file (integration tests live in separate files), so I can write a simple test at the bottom of the file to check that a TodoList type exists and can be initialised.

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn init_todo() {
        let todo = TodoList::new();
    }
}

TDD

TDD is one of my favourite methodologies and I'm happy to see that Rust allows me to follow it. I can't recommend TDD enough! The Rust book contains a pretty detailed chapter on how to write tests.

Clearly, when I run cargo test I get a compile error. Let's implement the type then

use std::collections::HashMap;

[...]

struct TodoList {
    // true = to do, false = done
    items: HashMap<String, bool>,
}

As you see, I had to write a comment as a reminder of the meaning of the boolean values. I also suspect that I will need to use the type HashMap<String, bool> multiple times, so I will probably end up creating a type alias of some sort.

To initialise such structure I have to create an implementation of the function new

Version 1

impl TodoList {
    fn new() -> TodoList {
        let items: HashMap<String, bool> =
            HashMap::<String, bool>::new();

        TodoList { items: items }
    }
}

struct and impl

Rust is not an object-oriented programming language, so it uses plain structs to encapsulate data. The Rust book has a full chapter on struct and impl.

Thanks to type inference, I don't need to spell out the types on both sides of the assignment, so I can write either of the following

Version 2

        let items: HashMap<String, bool> = HashMap::new();

Version 3

        let items = HashMap::<String, bool>::new();

For such a simple initialisation, I might also write directly

Version 4

        TodoList {
            items: HashMap::<String, bool>::new(),
        }

However, I will soon replace the ::new() with something more complicated that reads a file, so I decided to keep version 2. This code passes the test I wrote, so it's good enough for now.

At this point I can also initialise the list in the main function

struct TodoList {
    // true = to do, false = done
    items: HashMap<String, bool>,
}

impl TodoList {
    fn new() -> TodoList {
        let items: HashMap<String, bool> = HashMap::new();

        TodoList { items: items }
    }
}

fn main() {
    let args = Cli::parse();

    let todo = TodoList::new();

    println!("Command line: {} {}", args.command, args.key);
}

Please note that I'm not being too strict with dead code here and the compiler will complain about unused variables and fields. I like this, and I won't add underscores to silence the warnings since they are a good reminder of what I still have to implement.

See the source code

Adding items

A good improvement at this point would be to create a method to add items to the list. First, the mandatory test

    #[test]
    fn add_item() {
        let mut todo = TodoList::new();
        todo.add(String::from("Something to do"));
        assert_eq!(todo.items.get("Something to do"), Some(&true))
    }

The type HashMap provides a method called insert [docs] which is exactly what I need

impl TodoList {

    ...

    fn add(&mut self, key: String) {
        self.items.insert(key, true);
    }
}

And once again this code passes the test, so I consider it good enough.

self and Self

In Rust self is a keyword [docs] and not just a name as it happens in Python. Rust considers self of type Self [docs], which is the type we are implementing in a trait or impl block.

The code fn add(&mut self, key: String) { above is equivalent to fn add(self: &mut Self, key: String) {. However, self cannot be renamed to something like foo, as Rust is expecting a parameter with that specific name.
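
The equivalence between the two forms can be checked with a small sketch. The type Counter and its methods are hypothetical names that I introduce only for illustration; they are not part of the to-do application.

```rust
// Hypothetical type used only to compare the two receiver syntaxes.
struct Counter {
    value: u32,
}

impl Counter {
    // `Self` is an alias for the type being implemented (here `Counter`).
    fn new() -> Self {
        Counter { value: 0 }
    }

    // Shorthand form of the receiver.
    fn increment(&mut self) {
        self.value += 1;
    }

    // Explicit form: the same receiver written with its type.
    fn increment_explicit(self: &mut Self) {
        self.value += 1;
    }
}

fn main() {
    let mut counter = Counter::new();
    counter.increment();
    counter.increment_explicit();
    println!("{}", counter.value);
}
```

Both methods are called with the same dot syntax, as the compiler treats the two signatures in the same way.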

References and mutability

I found it confusing, at first, that in Rust we usually call &mut a mutable reference. In my head, I always translate it into a reference to mutable data as this helps me to remember what I am doing here.

In short, in Rust we need to declare explicitly when we intend to consider a value mutable using the keyword mut [docs], and this is valid also when we pass arguments to functions. If we decide to borrow data instead of moving it, we can use references, which in C terms are equivalent to protected pointers. We can also pass a reference to data that we intend to mutate, which is where &mut comes into play.

However, as I mentioned I think it's important to understand that the reference (a pointer) is not mutating. The data referenced by it is.
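
A minimal sketch of the difference between the two kinds of borrow follows. The functions is_pending and mark_done are hypothetical helpers that I made up for this example, and the boolean follows the convention used in the post (true means "to do").

```rust
// Borrows the flag immutably: the function can read the data but not change it.
fn is_pending(flag: &bool) -> bool {
    *flag
}

// Borrows the flag mutably: the function can change the data behind the reference.
fn mark_done(flag: &mut bool) {
    *flag = false;
}

fn main() {
    // The binding itself must be declared `mut` to allow a `&mut` borrow.
    let mut pending = true;
    println!("before: {}", is_pending(&pending));
    mark_done(&mut pending);
    println!("after: {}", is_pending(&pending));
}
```

In both functions the reference itself never changes; only mark_done modifies the value it points to, through the dereference operator.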

See the source code

Multiple additions and updates

If I add the same key multiple times I want the list to contain only one occurrence, so I test this.

    #[test]
    fn add_item_already_exist() {
        let mut todo = TodoList::new();
        todo.add(String::from("Something to do"));
        todo.add(String::from("Something to do"));
        assert_eq!(todo.items.get("Something to do"), Some(&true));
        assert_eq!(todo.items.len(), 1);
    }

The test passes already, thanks to the properties of the hash map.

I also want the second insertion not to update the value of the existing element, and in this case the test is

    #[test]
    fn add_item_does_not_change_value() {
        let mut todo = TodoList::new();
        todo.add(String::from("Something to do"));

        if let Some(x) = todo.items.get_mut("Something to do") {
            *x = false;
        }

        todo.add(String::from("Something to do"));
        assert_eq!(todo.items.get("Something to do"), Some(&false));
        assert_eq!(todo.items.len(), 1);
    }

I have to manually change the value inside the map using get_mut [docs], which returns a mutable reference to the value. This test doesn't pass, as insert actually updates the existing value.

At the time of writing the method try_insert of HashMap is experimental, so I implemented a custom solution

use std::collections::hash_map::Entry;

[...]


    fn add(&mut self, key: String) {
        if let Entry::Vacant(entry) = self.items.entry(key) {
            entry.insert(true);
        }
    }

Here, I'm basically checking if an entry for key is vacant (does not exist) and I create it only in that case. This code passes all tests.
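
As far as I can tell, the same behaviour can also be obtained with the method or_insert of Entry, which inserts the given value only when the entry is vacant. The sketch below is an alternative that I did not use in the post, reported here with a stripped-down version of TodoList so that it can be run on its own.

```rust
use std::collections::HashMap;

struct TodoList {
    // true = to do, false = done
    items: HashMap<String, bool>,
}

impl TodoList {
    fn new() -> TodoList {
        TodoList { items: HashMap::new() }
    }

    // Alternative to the `if let Entry::Vacant` version: `or_insert`
    // stores `true` only when the key is missing and leaves an
    // existing value untouched.
    fn add(&mut self, key: String) {
        self.items.entry(key).or_insert(true);
    }
}

fn main() {
    let mut todo = TodoList::new();
    todo.add(String::from("Something to do"));
    // Simulate marking the item as done.
    todo.items.insert(String::from("Something to do"), false);
    // Adding the same key again does not reset the value.
    todo.add(String::from("Something to do"));
    println!("{:?}", todo.items.get("Something to do"));
}
```

The explicit if let version makes the "vacant" condition visible, while or_insert is more compact; which one reads better is a matter of taste.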

if let

I consider if let [docs] a very powerful piece of syntax. I care only about one of the possible outcomes, so I don't want to waste time defining it in a full-fledged match.

See the source code

Marking items

The second method I want to add is mark that allows me to set the value of the boolean corresponding to a given key. This will be used to flag an item as "done" or "to be done". The test is

    #[test]
    fn mark_item() {
        let mut todo = TodoList::new();
        todo.add(String::from("Something to do"));
        todo.mark(String::from("Something to do"), false);
        assert_eq!(todo.items.get("Something to do"), Some(&false));
        todo.mark(String::from("Something to do"), true);
        assert_eq!(todo.items.get("Something to do"), Some(&true))
    }

Here, I can follow the same strategy I used in the test add_item_does_not_change_value

impl TodoList {

    ...

    fn mark(&mut self, key: String, value: bool) {
        if let Some(x) = self.items.get_mut(&key) {
            *x = value;
        }
    }
}

What if the key is not in the list, though? The function get_mut returns an Option, but mark should signal with a Result that something didn't work. I can test this with

    #[test]
    fn mark_item_does_not_exist() {
        let mut todo = TodoList::new();
        assert_eq!(
            todo.mark(String::from("Something to do"), false),
            Err(String::from("Something to do"))
        );
    }

The new version of the function is then

impl TodoList {

    ...

    fn mark(&mut self, key: String, value: bool) -> Result<String, String> {
        let x = self.items.get_mut(&key).ok_or(&key)?;
        *x = value;

        Ok(key)
    }
}

The method ok_or [docs] converts an Option into a Result, so I just call ? to propagate the error.

The question mark operator

The operator ? is one of the best features of Rust, and it's explained in this chapter of the Rust Book. I find it such a simple yet extremely powerful way to deal with error propagation.
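
To get a feeling for what the combination of ok_or and ? does, it can be compared with an explicit match. This is only a sketch: to keep it self-contained I wrote two free functions that receive the hash map as an argument (mark_explicit is a hypothetical name), instead of methods of TodoList.

```rust
use std::collections::HashMap;

// Version with `ok_or` and `?`, modelled on the method `mark`.
fn mark(items: &mut HashMap<String, bool>, key: String, value: bool) -> Result<String, String> {
    // On `None`, `?` returns early with the key converted into a `String` error.
    let x = items.get_mut(&key).ok_or(&key)?;
    *x = value;

    Ok(key)
}

// Roughly the same logic written with an explicit `match`.
fn mark_explicit(items: &mut HashMap<String, bool>, key: String, value: bool) -> Result<String, String> {
    match items.get_mut(&key) {
        Some(x) => {
            *x = value;
            Ok(key)
        }
        None => Err(key),
    }
}

fn main() {
    let mut items = HashMap::new();
    items.insert(String::from("Something to do"), true);

    println!("{:?}", mark(&mut items, String::from("Something to do"), false));
    println!("{:?}", mark_explicit(&mut items, String::from("Missing"), false));
}
```

The two functions should behave in the same way for both existing and missing keys; the version with ? simply hides the early return.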

See the source code

Listing items

At this point I want to add the method list that allows me to see the items contained in TodoList. I'd like to separate the logic from the presentation so the method will return two lists of items, one for each value of the connected boolean.

This means that the output of the method should in my opinion be a tuple of iterators, one on the items with state "to be done" and one on the ones in state "done".

Iterators

Iterators are a big thing in Rust, and I can understand why, as they definitely boost performance by saving memory. The Rust book has a chapter on them, and there is clearly plenty of documentation for the related trait.

I start with tests as usual

    #[test]
    fn list_items() {
        let mut todo = TodoList::new();
        todo.add(String::from("Something to do"));
        todo.add(String::from("Something else to do"));
        todo.add(String::from("Something done"));
        todo.mark(String::from("Something done"), false);

        let (todo_items, done_items) = todo.list();

        let todo_items: Vec<String> = todo_items.map(|x| x.clone()).collect();
        let done_items: Vec<String> = done_items.map(|x| x.clone()).collect();

        assert!(todo_items.iter().any(|e| e == "Something to do"));
        assert!(todo_items.contains(&String::from("Something else to do")));
        assert_eq!(todo_items.len(), 2);
        assert!(done_items.contains(&String::from("Something done")));
        assert_eq!(done_items.len(), 1);
    }

There is a lot to say here, and please remember the caveat that I'm not sure what I'm doing is the best thing.

I add some elements to the list and mark one as done, then I call the method list to get two iterators and test them. However, iterators can be traversed only once, so to test them properly I prefer to convert them into vectors using the method collect [docs].

To generate the two iterators I will probably use HashMap::iter [docs], which means they will have an element type &String, as we are interested in the item key.

As far as I can tell, there are several different strategies I can use here.

I can generate vectors of &String using the elements directly from the iterators and then use the method Vec::contains [docs]. However, the latter wants to receive a reference to the searched value, which means that I would end up with

let todo_items: Vec<&String> = todo_items.collect();
assert!(todo_items.contains(&&String::from("Something else to do")));

While this is perfectly reasonable in terms of memory consumption and performance, the double && is a bit ugly. So, considering that I'm writing a test, where performance is not the major concern, I'd prefer to simplify the syntax. I can create a vector of String values and check them

let todo_items: Vec<String> = todo_items.cloned().collect();
assert!(todo_items.contains(&String::from("Something else to do")));

The syntax todo_items.cloned() is equivalent to todo_items.map(|x| x.clone()) and leverages the implicit dereferencing of x. Here, copied() cannot be used as String doesn't implement the trait Copy.
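
The following sketch compares the two conversions on a plain vector of strings, which stands in for the iterators returned by list. The helpers owned_with_cloned and owned_with_map are hypothetical names introduced only for this comparison.

```rust
// Both helpers turn an iterator of `&String` into a vector of owned `String` values.
fn owned_with_cloned<'a>(items: impl Iterator<Item = &'a String>) -> Vec<String> {
    items.cloned().collect()
}

fn owned_with_map<'a>(items: impl Iterator<Item = &'a String>) -> Vec<String> {
    items.map(|x| x.clone()).collect()
}

fn main() {
    let source = vec![
        String::from("Something to do"),
        String::from("Something else to do"),
    ];

    // A vector of references needs a reference to a reference in `contains`.
    let borrowed: Vec<&String> = source.iter().collect();
    println!("{}", borrowed.contains(&&String::from("Something to do")));

    // The two conversions produce the same vector of owned values.
    let cloned = owned_with_cloned(source.iter());
    let mapped = owned_with_map(source.iter());
    println!("{}", cloned == mapped);
    println!("{}", cloned.contains(&String::from("Something else to do")));
}
```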

A good alternative to contains is any, which however works on iterators. A final version of the code is then

let todo_items: Vec<String> = todo_items.cloned().collect();
assert!(todo_items.iter().any(|e| e == "Something to do"));

Which is also more elegant since it uses the comparison between a String (which is the iterator item type) and an &str (the right side). At this point my test is

#[test]
fn list_items() {
    let mut todo = TodoList::new();
    todo.add(String::from("Something to do"));
    todo.add(String::from("Something else to do"));
    todo.add(String::from("Something done"));
    todo.mark(String::from("Something done"), false);

    let (todo_items, done_items) = todo.list();

    let todo_items: Vec<String> = todo_items.cloned().collect();
    let done_items: Vec<String> = done_items.cloned().collect();

    assert!(todo_items.iter().any(|e| e == "Something to do"));
    assert!(todo_items.iter().any(|e| e == "Something else to do"));
    assert_eq!(todo_items.len(), 2);
    assert!(done_items.iter().any(|e| e == "Something done"));
    assert_eq!(done_items.len(), 1);
}

An implementation of the method list that passes this test is

    fn list(&self) ->
       (impl Iterator<Item = &String>, impl Iterator<Item = &String>) {
        (
            self.items.iter().filter(|x| *x.1 == true).map(|x| x.0),
            self.items.iter().filter(|x| *x.1 == false).map(|x| x.0),
        )
    }

Here, the powerful keyword impl declares that whatever comes out of that function implements the Iterator trait with an element type &String. The code uses iter [docs] to create an iterator on the elements of the hash map (element type (&String, &bool)), filters them according to the value of the boolean, then uses map [docs] to extract the first element of each tuple. All in all, the function returns a tuple of Map [docs], which is a type that implements Iterator.
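
To experiment with the method in isolation, this is a self-contained sketch of it. It is a variation of my code, not the exact version in the repository: the closures destructure the tuples instead of using x.0 and x.1, and the struct is initialised directly in main. Since the iteration order of a hash map is unspecified, the example sorts the collected keys before printing them.

```rust
use std::collections::HashMap;

struct TodoList {
    // true = to do, false = done
    items: HashMap<String, bool>,
}

impl TodoList {
    // Returns two lazy iterators over the keys, split by the boolean value.
    fn list(&self) -> (impl Iterator<Item = &String>, impl Iterator<Item = &String>) {
        (
            self.items.iter().filter(|(_, pending)| **pending).map(|(key, _)| key),
            self.items.iter().filter(|(_, pending)| !**pending).map(|(key, _)| key),
        )
    }
}

fn main() {
    let mut items = HashMap::new();
    items.insert(String::from("Write post"), true);
    items.insert(String::from("Have fun"), true);
    items.insert(String::from("Feed the cat"), false);
    let todo = TodoList { items };

    let (todo_items, done_items) = todo.list();

    // Collect and sort, since the iteration order of HashMap is unspecified.
    let mut todo_items: Vec<&String> = todo_items.collect();
    todo_items.sort();
    let mut done_items: Vec<&String> = done_items.collect();
    done_items.sort();

    println!("{:?}", todo_items);
    println!("{:?}", done_items);
}
```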

See the source code

Exposing commands on the CLI

It's time to expose the methods I implemented on the CLI. I realised that commands like add and mark-done require a second argument (the key), while other commands like list don't.

So, the first change is to make the key argument optional.

#[derive(Parser)]
struct Cli {
    command: String,

    key: Option<String>,
}

Purely to have something to play with, I will also add some values to the list in main. This is temporary, as long as I don't implement a file storage mechanism.

fn main() {
    let args = Cli::parse();

    let mut todo = TodoList::new();

    todo.add("Something to do".to_string());
    todo.add("Something else to do".to_string());
    todo.add("Something done".to_string());
    todo.mark("Something done".to_string(), false).unwrap();
}

Last, the command-method binding part. A match construct is the best option in this case, something like

match args.command.as_str() {
    "add" => ...,
    "mark-done" => ...,
    "list" => ...
}

match

The match control flow construct is a blessing that comes directly from functional programming, where pattern matching is an important tool. The Rust book has a chapter dedicated to it and a chapter on the pattern syntax.

However, since each method has a different return type, I need the whole construct to return a uniform Result that can be used to print a meaningful state message at the end of the execution.

The code I wrote is the following

fn main() {
   ...

   let result = match args.command.as_str() {
        "add" => match args.key {
            Some(key) => {
                todo.add(key);
                Ok(())
            }
            None => Err("Key cannot be empty!".to_string()),
        },
        "mark-done" => match args.key {
            Some(key) => todo
                .mark(key, false)
                .map_err(|e| format!("Invalid key {}", e))
                .and(Ok(())),
            None => Err("Key cannot be empty!".to_string()),
        },
        "list" => {
            let (todo_items, done_items) = todo.list();

            println!("# TO DO");
            println!();
            todo_items.for_each(|x| println!(" * {}", x));

            println!();

            println!("# DONE");
            println!();
            done_items.for_each(|x| println!(" * {}", x));

            Ok(())
        }
        cmd => Err(format!("Command {} not recognised", cmd)),
    };

    match result {
        Err(e) => println!("ERROR: {}", e),
        Ok(_) => println!("SUCCESS"),
    }
}

Option and Result

It's paramount to learn how to convert Option [docs] into Result [docs] and vice versa, as well as how to convert a Result type into a different one. Being familiar with functions like map_err [docs] or and [docs] will drastically change the quality of your Rust code.
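
A few of these conversions are shown in the sketch below. The functions require_key, parse_number and check_number are hypothetical examples that I wrote for this note; only the methods they call (ok_or, ok, map_err and and) come from the standard library.

```rust
// Option -> Result: `ok_or` supplies the error for the `None` case.
fn require_key(key: Option<String>) -> Result<String, String> {
    key.ok_or("Key cannot be empty!".to_string())
}

// Result -> Option: `ok` keeps the value and drops the error.
fn parse_number(text: &str) -> Option<i32> {
    text.parse::<i32>().ok()
}

// Result<i32, ParseIntError> -> Result<(), String>: `map_err` replaces
// the error, `and` replaces a successful value with `()`.
fn check_number(text: &str) -> Result<(), String> {
    text.parse::<i32>()
        .map_err(|e| format!("Invalid number {}: {}", text, e))
        .and(Ok(()))
}

fn main() {
    println!("{:?}", require_key(Some("Buy milk".to_string())));
    println!("{:?}", require_key(None));
    println!("{:?}", parse_number("42"));
    println!("{:?}", check_number("not a number"));
}
```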

See the source code

Tidy up

At this point I went through the code and fixed some of the warnings the compiler was still giving me. These all come from the tests, where I created the todo variable but never used it, and where I ignored the results returned by calls of todo.mark. There, I used unwrap [docs] as I'm happy for that to panic if something goes wrong.

See the source code

Final words

What a journey so far! It's really true that you can't consider a language learned until you start from scratch and try to use it to implement a real application. Well, it's not over yet, I'm still missing an important part which is the file storage.

If you have comments, suggestions, or corrections, please let me know! I am more than happy to learn something new from other coders and to publish updates to the post.

Feedback

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

TDD in Python with pytest - Part 5

2020-09-21T10:30:00+02:00

This is the fifth and last post in the series "TDD in Python with pytest" where I develop a simple project following a strict TDD methodology. The posts come from my book Clean Architectures in Python and have been reviewed to get rid of some bad naming choices of the version published in the book.

You can find the first post here.

In this post I will conclude the discussion about mocks introducing patching.

Patching

Mocks are very simple to introduce in your tests whenever your objects accept classes or instances from outside. In that case, as shown in the previous sections, you just have to instantiate the class Mock and pass the resulting object to your system. However, when the external classes instantiated by your library are hardcoded this simple trick does not work. In this case you have no chance to pass a fake object instead of the real one.

This is exactly the case addressed by patching. Patching, in a testing framework, means to replace a globally reachable object with a mock, thus achieving the goal of having the code run unmodified, while part of it has been hot swapped, that is, replaced at run time.

A warm-up example

Clone the repository fileinfo that you can find here and move to the branch develop. As I did for the project simple_calculator, the branch master contains the full solution, and I use it to maintain the repository, but if you want to code along you need to start from scratch. If you prefer, you can of course fork it on GitHub and make your own copy of the repository.

git clone https://github.com/lgiordani/fileinfo
cd fileinfo
git checkout --track origin/develop

Create a virtual environment following your preferred process and install the requirements

pip install -r requirements/dev.txt

You should at this point be able to run

pytest -svv

and get an output like

=============================== test session starts ===============================
platform linux -- Python XXXX, pytest-XXXX, py-XXXX, pluggy-XXXX --
fileinfo/venv3/bin/python3
cachedir: .cache
rootdir: fileinfo, inifile: pytest.ini
plugins: cov-XXXX
collected 0 items 

============================== no tests ran in 0.02s ==============================

Let us start with a very simple example. Patching can be complex to grasp at the beginning so it is better to start learning it with trivial use cases. The purpose of this library is to develop a simple class that returns information about a given file. The class shall be instantiated with the file path, which can be relative.

The starting point is the class with the method __init__. If you want you can develop the class using TDD, but for the sake of brevity I will not show here all the steps that I followed. This is the set of tests I have in tests/test_fileinfo.py

tests/test_fileinfo.py

from fileinfo.fileinfo import FileInfo


def test_init():
    filename = 'somefile.ext'
    fi = FileInfo(filename)
    assert fi.filename == filename


def test_init_relative():
    filename = 'somefile.ext'
    relative_path = '../{}'.format(filename)
    fi = FileInfo(relative_path)
    assert fi.filename == filename

and this is the code of the class FileInfo in the file fileinfo/fileinfo.py

fileinfo/fileinfo.py

import os


class FileInfo:
    def __init__(self, path):
        self.original_path = path
        self.filename = os.path.basename(path)

Git tag: first-version

As you can see the class is extremely simple, and the tests are straightforward. So far I didn't add anything new to what we discussed in the previous posts.

Now I want the method get_info to return a tuple with the file name, the original path the class was instantiated with, and the absolute path of the file. Pretending we are in the directory /some/absolute/path, the class should work as shown here

>>> fi = FileInfo('../book_list.txt')
>>> fi.get_info()
('book_list.txt', '../book_list.txt', '/some/absolute/book_list.txt')

You can quickly realise that you have a problem writing the test. There is no way to easily test something as "the absolute path", since the outcome of the function called in the test is supposed to vary with the path of the test itself. Let us try to write part of the test

def test_get_info():
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)
    fi = FileInfo(original_path)
    assert fi.get_info() == (filename, original_path, '???')

where the '???' string highlights that I cannot put something sensible to test the absolute path of the file.

Patching is the way to solve this problem. You know that the function will use some code to get the absolute path of the file. So, within the scope of this test only, you can replace that code with something different and perform the test. Since the replacement code has a known outcome writing the test is now possible.

Patching, thus, means to inform Python that during the execution of a specific portion of the code you want a globally accessible module/object replaced by a mock. Let's see how we can use it in our example

tests/test_fileinfo.py

from unittest.mock import patch

[...]

def test_get_info():
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    with patch('os.path.abspath') as abspath_mock:
        test_abspath = 'some/abs/path'
        abspath_mock.return_value = test_abspath
        fi = FileInfo(original_path)
        assert fi.get_info() == (filename, original_path, test_abspath)

You clearly see the context in which the patching happens, as it is enclosed in a with statement. Inside this statement the function os.path.abspath will be replaced by a mock created by the function patch and called abspath_mock. So, while Python executes the lines of code enclosed by the statement with, any call to os.path.abspath will actually call the object abspath_mock.

The first thing we can do, then, is to give the mock a known return_value. This way we solve the issue that we had with the initial code, that is using an external component that returns an unpredictable result. The line

        abspath_mock.return_value = test_abspath

instructs the patching mock to return the given string as a result, regardless of the real values of the file under consideration.

The code that makes the test pass is

fileinfo/fileinfo.py

class FileInfo:
    [...]

    def get_info(self):
        return (
            self.filename,
            self.original_path,
            os.path.abspath(self.original_path)
        )

When this code is executed by the test the function os.path.abspath is replaced at run time by the mock that we prepared there, which basically ignores the input value self.original_path and returns the fixed value it was instructed to use.

Git tag: patch-with-context-manager

It is worth at this point discussing outgoing messages again. The code that we are considering here is a clear example of an outgoing query, as the method get_info is not interested in changing the status of the external component. In the previous post we reached the conclusion that testing the return value of outgoing queries is pointless and should be avoided. With patch we are replacing the external component with something that we know, using it to test that our object correctly handles the value returned by the outgoing query. We are thus not testing the external component, as it has been replaced, and we are definitely not testing the mock, as its return value is already known.

Obviously, to write the test you have to know that you are going to use the function os.path.abspath, so patching is a somewhat "less pure" practice in TDD. In pure OOP/TDD you are only concerned with the external behaviour of the object, and not with its internal structure. This example, however, shows that this pure approach has some limitations that you have to cope with, and patching is a clean way to do it.

The patching decorator¶

The function patch we imported from the module unittest.mock is very powerful, as it can temporarily replace an external object. If the replacement has to or can be active for the whole test, there is a cleaner way to inject your mocks, which is to use patch as a function decorator.

This means that you can decorate the test function, passing as argument the same argument you would pass if patch was used in a with statement. This requires however a small change in the test function prototype, as it has to receive an additional argument, which will become the mock.

Let's change test_get_info, removing the statement with and decorating the function with patch

tests/test_fileinfo.py

@patch('os.path.abspath')
def test_get_info(abspath_mock):
    test_abspath = 'some/abs/path'
    abspath_mock.return_value = test_abspath

    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    fi = FileInfo(original_path)
    assert fi.get_info() == (filename, original_path, test_abspath)

Git tag: patch-with-function-decorator

As you can see the decorator patch works like a big with statement for the whole function. The argument abspath_mock passed to the test becomes internally the mock that replaces os.path.abspath. Obviously this way you replace os.path.abspath for the whole function, so you have to decide case by case which form of the function patch you need to use.

Multiple patches¶

You can patch more than one object in the same test. For example, consider the case where the method get_info calls os.path.getsize in addition to os.path.abspath in order to return the size of the file. You have at this point two different outgoing queries, and you have to replace both with mocks to make your class work during the test.

This can be easily done with an additional patch decorator

tests/test_fileinfo.py

@patch('os.path.getsize')
@patch('os.path.abspath')
def test_get_info(abspath_mock, getsize_mock):
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    test_abspath = 'some/abs/path'
    abspath_mock.return_value = test_abspath

    test_size = 1234
    getsize_mock.return_value = test_size

    fi = FileInfo(original_path)
    assert fi.get_info() == (filename, original_path, test_abspath, test_size)

Please note that the decorator which is nearest to the function is applied first. Always remember that the decorator syntax with @ is a shortcut to replace the function with the output of the decorator, so two decorators result in

@decorator1
@decorator2
def myfunction():
    pass

which is a shortcut for

def myfunction():
    pass
myfunction = decorator1(decorator2(myfunction))

This explains why, in the test code, the function receives first abspath_mock and then getsize_mock. The first decorator applied to the function is the patch of os.path.abspath, which appends the mock that we call abspath_mock. Then the patch of os.path.getsize is applied and this appends its own mock.
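
The order can be verified directly with a small sketch. The function check_order is a name I made up for this purpose; it simply reports whether each argument is the mock that currently replaces the corresponding function.

```python
import os.path
from unittest.mock import patch


@patch('os.path.getsize')   # applied second, fills the second argument
@patch('os.path.abspath')   # applied first, fills the first argument
def check_order(abspath_mock, getsize_mock):
    return (
        os.path.abspath is abspath_mock,
        os.path.getsize is getsize_mock
    )


order = check_order()
```

If the two parameter names were swapped in the signature, both comparisons would be false, which is an easy mistake to make when stacking several decorators.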

The code that makes the test pass is

fileinfo/fileinfo.py

class FileInfo:
    [...]

    def get_info(self):
        return (
            self.filename,
            self.original_path,
            os.path.abspath(self.original_path),
            os.path.getsize(self.original_path)
        )

Git tag: multiple-patches

We can write the above test using two with statements as well

tests/test_fileinfo.py

def test_get_info():
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    with patch('os.path.abspath') as abspath_mock:
        test_abspath = 'some/abs/path'
        abspath_mock.return_value = test_abspath

        with patch('os.path.getsize') as getsize_mock:
            test_size = 1234
            getsize_mock.return_value = test_size

            fi = FileInfo(original_path)
            assert fi.get_info() == (
                filename,
                original_path,
                test_abspath,
                test_size
            )

Using more than one with statement, however, makes the code difficult to read, in my opinion, so in general I prefer to avoid complex with trees if I do not really need to use a limited scope of the patching.
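
If you do need a limited scope for both patches, a single with statement can hold more than one context manager, which keeps the nesting flat. The following is a sketch under one assumption: the method __init__ of FileInfo is not shown in this post, so the version below is a stand-in that I wrote to make the example self-contained.

```python
import os
from unittest.mock import patch


class FileInfo:
    # Stand-in for the class under test; __init__ is an assumption
    def __init__(self, path):
        self.original_path = path
        self.filename = os.path.basename(path)

    def get_info(self):
        return (
            self.filename,
            self.original_path,
            os.path.abspath(self.original_path),
            os.path.getsize(self.original_path)
        )


def test_get_info():
    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)
    test_abspath = 'some/abs/path'
    test_size = 1234

    # One with statement, two patches, a single level of indentation
    with patch('os.path.abspath') as abspath_mock, \
            patch('os.path.getsize') as getsize_mock:
        abspath_mock.return_value = test_abspath
        getsize_mock.return_value = test_size

        fi = FileInfo(original_path)
        info = fi.get_info()

    assert info == (filename, original_path, test_abspath, test_size)
```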

Checking call parameters¶

When you patch, the internal algorithm of the patched method is not executed, as the mock just returns the values it has been instructed to return. This is connected to what we said about testing external systems, so everything is good, but while we don't want to test the internals of the module os.path, we want to be sure that we are passing the correct values to the external methods.

This is why mocks provide methods like assert_called_with (and other similar methods), through which we can check the values passed to a patched method when it is called. Let's add the checks to the test

tests/test_fileinfo.py

@patch('os.path.getsize')
@patch('os.path.abspath')
def test_get_info(abspath_mock, getsize_mock):
    test_abspath = 'some/abs/path'
    abspath_mock.return_value = test_abspath

    filename = 'somefile.ext'
    original_path = '../{}'.format(filename)

    test_size = 1234
    getsize_mock.return_value = test_size

    fi = FileInfo(original_path)
    info = fi.get_info()

    abspath_mock.assert_called_with(original_path)
    getsize_mock.assert_called_with(original_path)
    assert info == (filename, original_path, test_abspath, test_size)

As you can see, I first invoke fi.get_info storing the result in the variable info, then check that the patched methods have been called with the correct parameters, and finally assert the value of the output.

The test passes, confirming that we are passing the correct values.

Git tag: addding-checks-for-input-values
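
It is also worth seeing the check fail. In the sketch below a standalone mock plays the role of getsize_mock, and the path 'another/path' is just an arbitrary wrong value that I picked for the example.

```python
from unittest.mock import Mock

getsize_mock = Mock(return_value=1234)

# This is what the code under test would do
getsize_mock('../somefile.ext')

# The check passes when the arguments match the recorded call
getsize_mock.assert_called_with('../somefile.ext')

# With different arguments the same check raises an AssertionError
try:
    getsize_mock.assert_called_with('another/path')
except AssertionError:
    mismatch_detected = True
else:
    mismatch_detected = False
```

In a real test you would not catch the exception: pytest reports the AssertionError as a test failure, showing the expected and the actual call.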

Patching immutable objects¶

The most widespread version of Python is CPython, which is written, as the name suggests, in C. Part of the standard library is also written in C, while the rest is written in Python itself.

The objects (classes, modules, functions, etc.) that are implemented in C are shared between interpreters, and this requires those objects to be immutable, so that you cannot alter them at runtime from a single interpreter.

An example of this immutability can be given easily using a Python console

>>> a = 1
>>> a.conjugate = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object attribute 'conjugate' is read-only

Here I'm trying to replace a method with an integer, which is pointless per se, but clearly shows the issue we are facing.

What has this immutability to do with patching? What patch does is actually to temporarily replace an attribute of an object (method of a class, class of a module, etc.), which also means that if we try to replace an attribute in an immutable object the patching action will fail.

A typical example of this problem is the module datetime, which is also one of the best candidates for patching, since the output of time functions is by definition time-varying.
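
Since patch ultimately performs an attribute assignment, the failure can be reproduced without mocks. The following sketch assumes CPython, where datetime.datetime is implemented in C. The exact wording of the error message varies between Python versions, so the code only checks the type of the exception.

```python
import datetime

try:
    # This is the kind of assignment patch would have to perform
    datetime.datetime.now = lambda: 123
except TypeError:
    assignment_failed = True
else:
    assignment_failed = False
```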

Let me show the problem with a simple class that logs operations. I will temporarily break the TDD methodology writing first the class and then the tests, so that you can appreciate the problem.

Create a file called logger.py and put there the following code

fileinfo/logger.py

import datetime


class Logger:
    def __init__(self):
        self.messages = []

    def log(self, message):
        self.messages.append((datetime.datetime.now(), message))

This is pretty simple, but testing this code is problematic, because the method log produces results that depend on the actual execution time. The call to datetime.datetime.now is however an outgoing query, and as such it can be replaced by a mock with patch.

If we try to do it, however, we will have a bitter surprise. This is the test code, that you can put in tests/test_logger.py

tests/test_logger.py

from unittest.mock import patch

from fileinfo.logger import Logger


@patch('datetime.datetime.now')
def test_log(mock_now):
    test_now = 123
    test_message = "A test message"
    mock_now.return_value = test_now

    test_logger = Logger()
    test_logger.log(test_message)
    assert test_logger.messages == [(test_now, test_message)]

When you try to execute this test you will get the following error

TypeError: can't set attributes of built-in/extension type 'datetime.datetime'

which is raised because patching tries to replace the function now in datetime.datetime with a mock, and since the class datetime.datetime is implemented in C, and is thus immutable, this operation fails.

Git tag: initial-logger-not-working

There are several ways to address this problem. All of them, however, start from the fact that importing or subclassing an immutable object gives you a mutable "copy" of that object.

The easiest example in this case is the module datetime itself. In the function test_log we tried to patch directly the object datetime.datetime.now, affecting the builtin module datetime. The file logger.py, however, does import datetime, so this latter becomes a local symbol in the module logger. This is exactly the key for our patching. Let us change the code to

tests/test_logger.py

@patch('fileinfo.logger.datetime.datetime')
def test_log(mock_datetime):
    test_now = 123
    test_message = "A test message"
    mock_datetime.now.return_value = test_now

    test_logger = Logger()
    test_logger.log(test_message)
    assert test_logger.messages == [(test_now, test_message)]

Git tag: correct-patching

If you run the test now, you can see that the patching works. What we did was to inject our mock in fileinfo.logger.datetime.datetime instead of datetime.datetime.now. Two things changed, thus, in our test. First, we are reaching the module datetime through the name imported in the file logger.py instead of addressing the builtin module directly. Second, we are replacing the whole class datetime.datetime with a mock and configuring now on that mock, because the attributes of the class itself cannot be set. If you try to patch fileinfo.logger.datetime.datetime.now you will find that it is still immutable.

Another possible solution to this problem is to create a function that invokes the immutable object and returns its value. This function can be easily patched, because it is written in Python and thus is not immutable. This solution, however, requires changing the source code to allow testing, which is far from optimal. Obviously it is better to introduce a small change in the code and have it tested than to leave it untested, but whenever possible I try to avoid solutions that introduce code which wouldn't be required without tests.
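
A minimal sketch of this second approach follows. The method name _now is hypothetical, and I use patch.object on the class instead of a dotted path so that the example works in a single file; in the project layout used in this post you would patch the equivalent name inside fileinfo/logger.py.

```python
import datetime
from unittest.mock import patch


class Logger:
    def __init__(self):
        self.messages = []

    def _now(self):
        # Thin wrapper around the immutable object; it exists only
        # to give the tests something that can be replaced
        return datetime.datetime.now()

    def log(self, message):
        self.messages.append((self._now(), message))


def test_log():
    test_now = 123
    test_message = "A test message"

    # The wrapper is plain Python, so replacing it is not a problem
    with patch.object(Logger, '_now', return_value=test_now):
        test_logger = Logger()
        test_logger.log(test_message)

    assert test_logger.messages == [(test_now, test_message)]
```

The price is the extra method, which is there only for the benefit of the tests.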

Mocks and proper TDD¶

Following a strict TDD methodology means writing a test before writing the code that passes that test. This can be done because we use the object under test as a black box, interacting with it through its API, and thus not knowing anything of its internal structure.

When we mock systems we break this assumption. In particular we need to open the black box every time we need to patch a hardcoded external system. Let's say, for example, that the object under test creates a temporary directory to perform some data processing. This is a detail of the implementation and we are not supposed to know it while testing the object, but since we need to mock the directory creation to avoid interaction with the external system (storage) we need to become aware of what happens internally.

This also means that writing a test for the object before writing the implementation of the object itself is difficult. Pretty often, thus, such objects are built with TDD but iteratively, where mocks are introduced after the code has been written.

While this is a violation of the strict TDD methodology, I don't consider it a bad practice. TDD helps us to write better code consistently, but good code can be written even without tests. The real outcome of TDD is a test suite that is capable of detecting regressions or the removal of important features in the future. This means that breaking strict TDD for a small part of the code (patching objects) will not affect the real result of the process, only change the way we achieve it.

A warning¶

Mocks are a good way to approach parts of the system that are not under test but that are still part of the code that we are running. This is particularly true for parts of the code that we wrote, whose internal structure is ultimately known. When the external system is complex and completely detached from our code, mocking starts to become complicated and the risk is that we spend more time faking parts of the system than actually writing code.

In these cases we have definitely crossed the barrier between unit testing and integration testing. You may see mocks as the bridge between the two, as they allow you to keep unit-testing parts that are naturally connected ("integrated") with external systems, but there is a point where you need to recognise that you need to change your approach.

This threshold is not fixed, and I can't give you a rule to recognise it, but I can give you some advice. First of all keep an eye on how many things you need to mock to make a test run, as an increasing number of mocks in a single test is definitely a sign of something wrong in the testing approach. My rule of thumb is that when I have to create more than 3 mocks, an alarm goes off in my mind and I start questioning what I am doing.

The second piece of advice is to always consider the complexity of the mocks. You may find yourself patching a class but then having to create monsters like cls_mock().func1().func2().func3.assert_called_with(x=42), which is a sign that the part of the system that you are mocking is deep into some code that you cannot really access, because you don't know its internal mechanisms.
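
To make this concrete, here is a sketch of how such a chain arises. The names cls_mock, func1, func2, and func3 come from the expression above, while the function process is a hypothetical piece of code under test that I made up for the example.

```python
from unittest.mock import Mock


def process(cls):
    # Hypothetical code under test: it digs three levels deep
    # into the object it was given
    cls().func1().func2().func3(x=42)


cls_mock = Mock()
process(cls_mock)

# Every call in the chain returns the return_value of the previous
# mock, so the test has to walk the same path as the code
cls_mock().func1().func2().func3.assert_called_with(x=42)
```

The assertion works, but it mirrors the implementation step by step, so any change in the way process reaches func3 breaks the test even if the behaviour is the same.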

The third piece of advice is to consider mocks as "hooks" that you throw at the external system, and that break its hull to reach its internal structure. These hooks are obviously against the assumption that we can interact with a system knowing only its external behaviour, or its API. As such, you should keep in mind that each mock you create is a step back from this perfect assumption, thus "breaking the spell" of the decoupled interaction. Doing this makes it increasingly complex to create mocks, and this will help keep you aware of what you are doing (or overdoing).

Final words¶

Mocks are a very powerful tool that allows us to test code that contains outgoing messages. In particular they allow us to test the arguments of outgoing commands. Patching is a good way to overcome the fact that some external components are hardcoded in our code and are thus unreachable through the arguments passed to the classes or the methods under analysis.

Updates¶

2021-03-06 GitHub user 4myhw spotted an inconsistency between the code on GitHub and the code in the post. Thanks!

2022-11-19 GitHub user rioj7 found and corrected a typo. Thanks!

Feedback¶

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

TDD in Python with pytest - Part 4

2020-09-17T11:30:00+02:00

This is the fourth post in the series "TDD in Python with pytest" where I develop a simple project following a strict TDD methodology. The posts come from my book Clean Architectures in Python and have been reviewed to get rid of some bad naming choices of the version published in the book.

You can find the first post here.

In this post I will discuss a very interesting and useful testing tool: mocks.

Basic concepts¶

As we saw in the previous post the relationship between the component that we are testing and other components of the system can be complex. Sometimes idempotency and isolation are not easy to achieve, and testing outgoing commands requires checking the parameters sent to the external component, which is not trivial.

The main difficulty comes from the fact that your code is actually using the external system. When you run it in production the external system will provide the data that your code needs and the whole process can work as intended. During testing, however, you don't want to be bound to the external system, for the reasons explained in the previous post, but at the same time you need it to make your code work.

So, you face a complex issue. On the one hand your code is connected to the external system (be it hardcoded or chosen programmatically), but on the other hand you want it to run without the external system being active (or even present).

This problem can be solved with the use of mocks. A mock, in the testing jargon, is an object that simulates the behaviour of another (more complex) object. Wherever your code connects to an external system, during testing you can replace the latter with a mock, pretending the external system is there and properly checking that your component behaves as intended.

First steps¶

Let us try and work with a mock in Python and see what it can do. First of all fire up a Python shell and import the library

>>> from unittest import mock

The main object that the library provides is Mock and you can instantiate it without any argument

>>> m = mock.Mock()

This object has the peculiar property of creating methods and attributes on the fly when you require them. Let us first look inside the object to get an idea of what it provides

>>> dir(m)
[
    'assert_any_call', 'assert_called_once_with',
    'assert_called_with', 'assert_has_calls',
    'attach_mock', 'call_args', 'call_args_list',
    'call_count', 'called', 'configure_mock',
    'method_calls', 'mock_add_spec', 'mock_calls',
    'reset_mock', 'return_value', 'side_effect'
]

As you can see there are some methods which are already defined in the object Mock. Let's try to read a non-existent attribute

>>> m.some_attribute
<Mock name='mock.some_attribute' id='140222043808432'>
>>> dir(m)
[
    'assert_any_call', 'assert_called_once_with',
    'assert_called_with', 'assert_has_calls',
    'attach_mock', 'call_args', 'call_args_list',
    'call_count', 'called', 'configure_mock',
    'method_calls', 'mock_add_spec', 'mock_calls',
    'reset_mock', 'return_value', 'side_effect',
    'some_attribute'
]

As you can see this class is somehow different from what you are used to. First of all, its instances do not raise an AttributeError when asked for a non-existent attribute, but they happily return another instance of Mock itself. Second, the attribute you tried to access has now been created inside the object and accessing it returns the same mock object as before.

>>> m.some_attribute
<Mock name='mock.some_attribute' id='140222043808432'>

Mock objects are callables, which means that they may act both as attributes and as methods. If you try to call the mock, it just returns another mock with a name that includes parentheses to signal its callable nature

>>> m.some_attribute()
<Mock name='mock.some_attribute()' id='140247621475856'>

As you can understand, such objects are the perfect tool to mimic other objects or systems, since they may expose any API without raising exceptions. To use them in tests, however, we need them to behave just like the original, which implies returning sensible values or performing real operations.

Simple return values¶

The simplest thing a mock can do for you is to return a given value every time you call one of its methods. This is configured by setting the attribute return_value of a mock object

>>> m.some_attribute.return_value = 42
>>> m.some_attribute()
42

Now, as you can see, the object does not return a mock object any more; instead it just returns the static value stored in the attribute return_value. Since in Python everything is an object you can return here any type of value: simple types like an integer or a string, more complex structures like dictionaries or lists, classes that you defined, instances of those, or functions.

Pay attention that what the mock returns is exactly the object that it is instructed to use as return value. If the return value is a callable such as a function, calling the mock will return the function itself and not the result of the function. Let me give you an example

>>> def print_answer():
...     print("42")
...
>>> m.some_attribute.return_value = print_answer
>>> m.some_attribute()
<function print_answer at 0x7f8df1e3f400>

As you can see calling some_attribute just returns the value stored in return_value, that is the function itself. This is not exactly what we were aiming for. To make the mock call the object that we use as a return value we have to use a slightly more complex attribute called side_effect.

Complex return values¶

The side_effect parameter of mock objects is a very powerful tool. It accepts three different flavours of objects: callables, iterables, and exceptions, and changes its behaviour accordingly.

If you pass an exception the mock will raise it

>>> m.some_attribute.side_effect = ValueError('A custom value error')
>>> m.some_attribute()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/unittest/mock.py", line 939, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/usr/lib/python3.6/unittest/mock.py", line 995, in _mock_call
    raise effect
ValueError: A custom value error

If you pass an iterable, such as for example a generator, a plain list, tuple, or similar objects, the mock will yield the values of that iterable, i.e. return every value contained in the iterable on subsequent calls of the mock.

>>> m.some_attribute.side_effect = range(3)
>>> m.some_attribute()
0
>>> m.some_attribute()
1
>>> m.some_attribute()
2
>>> m.some_attribute()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/unittest/mock.py", line 939, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/usr/lib/python3.6/unittest/mock.py", line 998, in _mock_call
    result = next(effect)
StopIteration

As promised, the mock just returns every object found in the iterable (in this case a range object) one at a time until the iterable is exhausted. According to the iterator protocol, once every item has been returned the object raises the StopIteration exception, which means that you can safely use it in a loop.

Last, if you feed side_effect a callable, the latter will be executed with the parameters passed when calling the attribute. Let's consider again the simple example given in the previous section

>>> def print_answer():
...     print("42")
...
>>> m.some_attribute.side_effect = print_answer
>>> m.some_attribute()
42

A slightly more complex example is that of a function with arguments

>>> def print_number(num):
...     print("Number:", num)
... 
>>> m.some_attribute.side_effect = print_number
>>> m.some_attribute(5)
Number: 5

As you can see the arguments passed to the attribute are directly used as arguments for the stored function. This is very powerful, especially if you stop thinking about "functions" and start considering "callables". Indeed, given the nature of Python objects we know that instantiating an object is not different from calling a function, which means that side_effect can be given a class and return an instance of it

>>> class Number:
...     def __init__(self, value):
...         self._value = value
...     def print_value(self):
...         print("Value:", self._value)
... 
>>> m.some_attribute.side_effect = Number
>>> n = m.some_attribute(26)
>>> n
<__main__.Number object at 0x7f8df1aa4470>
>>> n.print_value()
Value: 26

Asserting calls¶

As I explained in the previous post outgoing commands shall be tested checking the correctness of the message argument. This can be easily done with mocks, as these objects record every call that they receive and the arguments passed to it.

Let's see a practical example

from unittest import mock
import myobj


def test_connect():
    external_obj = mock.Mock()

    myobj.MyObj(external_obj)

    external_obj.connect.assert_called_with()

Here, the class myobj.MyObj needs to connect to an external object, for example a remote repository or a database. The only thing we need to know for testing purposes is if the class called the method connect of the external object without any parameter.

So the first thing we do in this test is to instantiate the mock object. This is a fake version of the external object, and its only purpose is to accept calls from the object MyObj under test and possibly return sensible values. Then we instantiate the class MyObj passing the external object. We expect the class to call the method connect so we express this expectation calling external_obj.connect.assert_called_with.

What happens behind the scenes? The class MyObj receives the fake external object and somewhere in its initialization process calls the method connect of the mock object. This call creates the method itself as a mock object. This new mock records the parameters used to call it and the subsequent call to its method assert_called_with checks that the method was called and that no parameters were passed.

In this case an object like

class MyObj():
    def __init__(self, repo):
        repo.connect()

would pass the test, as the object passed as repo is a mock that does nothing but record the calls. As you can see, the method __init__ actually calls repo.connect, and repo is expected to be a full-featured external object that provides connect in its API. Calling repo.connect when repo is a mock object, instead, silently creates the method (as another mock object) and records that the method has been called once without arguments.

The method assert_called_with allows us to also check the parameters we passed when calling. To show this let us pretend that we expect the method MyObj.setup to call setup(cache=True, max_connections=256) on the external object. Remember that this is an outgoing command, so we are interested in checking the parameters and not the result.

The new test can be something like

def test_setup():
    external_obj = mock.Mock()
    obj = myobj.MyObj(external_obj)
    obj.setup()
    external_obj.setup.assert_called_with(cache=True, max_connections=256)

In this case an object that passes the test can be

class MyObj():
    def __init__(self, repo):
        self._repo = repo
        repo.connect()

    def setup(self):
        self._repo.setup(cache=True, max_connections=256)

If we change the method setup to

    def setup(self):
        self._repo.setup(cache=True)

the test will fail with the following error

E           AssertionError: Expected call: setup(cache=True, max_connections=256)
E           Actual call: setup(cache=True)

Which I consider a very clear explanation of what went wrong during the test execution.

As you can read in the official documentation, the object Mock provides other methods and attributes, like assert_called_once_with, assert_any_call, assert_has_calls, assert_not_called, called, call_count, and many others. Each of those explores a different aspect of the mock behaviour concerning calls. Make sure to read their description and go through the examples.
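
As a quick tour of some of them, the sketch below records a few calls on a mock and then checks them in different ways. The calls to setup(cache=False) and the attribute disconnect are invented for the sake of the example.

```python
from unittest import mock

external_obj = mock.Mock()

# These calls stand for what the code under test would do
external_obj.connect()
external_obj.setup(cache=True, max_connections=256)
external_obj.setup(cache=False)

# connect was called exactly once, without arguments
external_obj.connect.assert_called_once_with()

# assert_called_with only looks at the most recent call
external_obj.setup.assert_called_with(cache=False)

# assert_any_call looks at every recorded call
external_obj.setup.assert_any_call(cache=True, max_connections=256)

# assert_has_calls checks a sequence of calls, in order
external_obj.setup.assert_has_calls([
    mock.call(cache=True, max_connections=256),
    mock.call(cache=False),
])

# disconnect was created by the attribute access but never called
external_obj.disconnect.assert_not_called()

setup_calls = external_obj.setup.call_count
connect_called = external_obj.connect.called
```

Here setup_calls is 2 and connect_called is True.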

A simple example¶

To learn how to use mocks in a practical case, let's work together on a new module in the simple_calculator package. The target is to write a class that downloads a JSON file with data on meteorites and computes some statistics on the dataset using the class SimpleCalculator. The file is provided by NASA at this URL.

The class contains a method get_data that queries the remote server and returns the data, and a method average_mass that uses the method SimpleCalculator.avg to compute the average mass of the meteorites and return it. In a real world case, like for example in a scientific application, I would probably split the class in two. One class manages the data, updating it whenever it is necessary, and another one manages the statistics. For the sake of simplicity, however, I will keep the two functionalities together in this example.

Let's see a quick example of what is supposed to happen inside our code. An excerpt of the file provided from the server is

[
    {
        "fall": "Fell",
        "geolocation": {
            "type": "Point",
            "coordinates": [6.08333, 50.775]
        },
        "id":"1",
        "mass":"21",
        "name":"Aachen",
        "nametype":"Valid",
        "recclass":"L5",
        "reclat":"50.775000",
        "reclong":"6.083330",
        "year":"1880-01-01T00:00:00.000"
    },
    {
        "fall": "Fell",
        "geolocation": {
            "type": "Point",
            "coordinates": [10.23333, 56.18333]
        },
        "id":"2",
        "mass":"720",
        "name":"Aarhus",
        "nametype":"Valid",
        "recclass":"H6",
        "reclat":"56.183330",
        "reclong":"10.233330",
        "year":"1951-01-01T00:00:00.000"
    }
]

So a good way to compute the average mass of the meteorites is

import urllib.request
import json

from simple_calculator.main import SimpleCalculator

URL = ("https://data.nasa.gov/resource/y77d-th95.json")

with urllib.request.urlopen(URL) as url:
    data = json.loads(url.read().decode())

masses = [float(d['mass']) for d in data if 'mass' in d]

print(masses)

calculator = SimpleCalculator()

avg_mass = calculator.avg(masses)

print(avg_mass)

Here the list comprehension filters out those elements which do not have a mass key. This code prints the value 50190.19568930039, so that is the average mass of the meteorites contained in the file.
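
As a small check of the filtering step, the sketch below runs the same list comprehension on three hand-made records. The first two reuse names and masses from the excerpt above, while the third, which has no mass, is invented for illustration. The average is computed with sum and len instead of SimpleCalculator.avg to keep the example self-contained.

```python
data = [
    {"name": "Aachen", "mass": "21"},
    {"name": "Aarhus", "mass": "720"},
    {"name": "Unweighed"},  # invented record without a mass
]

# Records without the key 'mass' are skipped, the others converted
masses = [float(d['mass']) for d in data if 'mass' in d]

# (21 + 720) / 2
avg_mass = sum(masses) / len(masses)
```

The record without a mass does not take part in the computation, so the average is taken over two values and is 370.5.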

Now we have a proof of concept of the algorithm, so we can start writing the tests. We might initially come up with a simple solution like

def test_average_mass():
    metstats = MeteoriteStats()

    data = metstats.get_data()

    assert metstats.average_mass(data) == 50190.19568930039

This little test contains, however, two big issues. First of all the method get_data is supposed to use the Internet connection to get the data from the server. This is a typical example of an outgoing query, as we are not trying to change the state of the web server providing the data. You already know that you should not test the return value of an outgoing query, but you can see here why you shouldn't use real data when testing either. The data coming from the server can change over time, and this can invalidate your tests.

Testing such a case becomes very simple with mocks. Since the class has a public method get_data that interacts with the external component, it is enough to temporarily replace it with a mock that provides sensible values. Create the file tests/test_meteorites.py and put this code in it

tests/test_meteorites.py

from unittest import mock

from simple_calculator.meteorites import MeteoriteStats


def test_average_mass():
    metstats = MeteoriteStats()

    metstats.get_data = mock.Mock()
    metstats.get_data.return_value = [
        {
            "fall": "Fell",
            "geolocation": {
                "type": "Point",
                "coordinates": [6.08333, 50.775]
            },
            "id":"1",
            "mass":"21",
            "name":"Aachen",
            "nametype":"Valid",
            "recclass":"L5",
            "reclat":"50.775000",
            "reclong":"6.083330",
            "year":"1880-01-01T00:00:00.000"},
        {
            "fall": "Fell",
            "geolocation": {
                "type": "Point",
                "coordinates": [10.23333, 56.18333]
            },
            "id":"2",
            "mass":"720",
            "name":"Aarhus",
            "nametype":"Valid",
            "recclass":"H6",
            "reclat":"56.183330",
            "reclong":"10.233330",
            "year":"1951-01-01T00:00:00.000"
        }
    ]

    result = metstats.average_mass(metstats.get_data())

    assert result == 370.5

When we run this test we are not testing that the external server provides the correct data. We are testing the process implemented by average_mass, feeding the algorithm some known input. This is not different from the first tests that we implemented: in that case we were testing an addition, here we are testing a more complex algorithm, but the concept is the same.

We can now write a class that passes this test. Put the following code in simple_calculator/meteorites.py alongside main.py

simple_calculator/meteorites.py

import urllib.request
import json

from simple_calculator.main import SimpleCalculator

URL = ("https://data.nasa.gov/resource/y77d-th95.json")


class MeteoriteStats:
    def get_data(self):
        with urllib.request.urlopen(URL) as url:
            return json.loads(url.read().decode())

    def average_mass(self, data):
        calculator = SimpleCalculator()

        masses = [float(d['mass']) for d in data if 'mass' in d]

        return calculator.avg(masses)

As you can see the class contains the code we wrote as a proof of concept, slightly reworked to match the methods we used in the test. Run the test suite now, and you will see that the latest test we wrote passes.

Please note that we are not testing the method get_data. That method uses the function urllib.request.urlopen, which opens an Internet connection without passing through any other public object that we can replace at run time during the test. We then need a tool to replace internal parts of our objects when we run them, and this is provided by patching, which will be the topic of the next post.

Git tag: meteoritestats-class-added

Final words

Mocks are very important, and as a Python programmer you need to know the subtleties of their implementation. Aside from the technical details, however, I believe it is mandatory to master the different types of tests that I discussed in the previous post, and to learn when to use simple assertions and when to pull a bigger gun like a mock object.

Feedback

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

TDD in Python with pytest - Part 3

2020-09-15T08:00:00+02:00

This is the third post in the series "TDD in Python with pytest" where I develop a simple project following a strict TDD methodology. The posts come from my book Clean Architectures in Python and have been reviewed to get rid of some bad naming choices of the version published in the book.

What I introduced in the previous two posts is commonly called "unit testing", since it focuses on testing a single and very small unit of code. As simple as it may seem, the TDD process has some caveats that are worth discussing. In this chapter I discuss some aspects of TDD and unit testing that I consider extremely important.

Tests should be fast

You will run your tests many times; ideally you should run them every time you save your code. Your tests are the watchdogs of your code, the dashboard warning lights that signal a correct status or some malfunction. This means that your testing suite should be fast. If you have to wait minutes for each execution to finish, chances are that you will end up running your tests only after some long coding session, which means that you are not using them as guides.

It's true however that some tests may be intrinsically slow, or that the test suite might be so big that running it would take an amount of time which makes continuous testing uncomfortable. In this case you should identify a subset of tests that run quickly and that can show you if something is not working properly, the so-called "smoke tests", and leave the rest of the suite for longer executions that you run less frequently. Typically, the library part of your project has tests that run very quickly, as testing functions does not require specific set-ups, while the user interface tests (be it a CLI or a GUI) are usually slower. If your tests are well-structured you can also run just the tests that are connected with the subsystem that you are dealing with.

Tests should be idempotent

Idempotency in mathematics and computer science identifies processes that can be run multiple times without changing the status of the system. Since the status doesn't change, the tests can be run in any order without changing their results. If a test interacts with an external system and leaves it in a different state, you will have random failures depending on the execution order.

The typical example is when you interact with the filesystem in your tests. A test may create a file and not remove it, and this makes another test fail because the file already exists, or because the directory is not empty. Whatever you do while interacting with external systems has to be reverted after the test. If you run your tests concurrently, however, even this precaution is not enough.
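A minimal sketch of the clean-up idea follows. It assumes a hypothetical function save_report that is not part of the project and that refuses to overwrite an existing file. The test works inside a temporary directory that is deleted when the block ends, so running the test twice in a row gives the same result.

```python
import os
import tempfile


def save_report(directory, text):
    # Hypothetical function under test: it creates a file and fails
    # if the file already exists (mode "x")
    path = os.path.join(directory, "report.txt")
    with open(path, "x") as f:
        f.write(text)
    return path


def test_save_report_creates_file():
    # The temporary directory is removed when the block ends,
    # so the test leaves nothing behind on the filesystem
    with tempfile.TemporaryDirectory() as tmpdir:
        path = save_report(tmpdir, "some text")

        assert os.path.exists(path)

    assert not os.path.exists(path)
```

Each run gets its own directory, so the test doesn't depend on what previous runs or other tests left on the disk. Pytest offers the fixture tmp_path for the same purpose.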

This poses a big problem, as interacting with external systems is definitely to be considered dangerous. Mocks, introduced in the next chapter, are a very good tool to deal with this aspect of testing.

Tests should be isolated

In computer science isolation means that a component shall not change its behaviour depending on something that happens externally. In particular it shouldn't be affected by the execution of other components in the system (spatial isolation) and by the previous execution of the component itself (temporal isolation). Each test should run as much as possible in an isolated universe.

While this is easy to achieve for small components, like we did with the class SimpleCalculator, it might be almost impossible to do in more complex cases. Whenever you write a routine that deals with time, for example, be it the current date or a time interval, you are faced with something that flows incessantly and that cannot be stopped or slowed down. This is also true in other cases, for example if you are testing a routine that accesses an external service like a website. If the website is not reachable the test will fail, but this failure comes from an external source, not from the code under test.
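One possible way to tame time, sketched here under the assumption of a hypothetical routine is_weekend, is to let the caller pass the date instead of reading the clock inside the routine. The test can then use fixed dates and doesn't depend on the day it runs.

```python
import datetime


def is_weekend(today=None):
    # Hypothetical routine: the date can be injected, and the real
    # clock is used only when no date is given
    if today is None:
        today = datetime.date.today()

    # weekday() returns 5 for Saturday and 6 for Sunday
    return today.weekday() >= 5


def test_is_weekend_on_saturday():
    # 2020-09-12 was a Saturday
    assert is_weekend(datetime.date(2020, 9, 12)) is True


def test_is_weekend_on_monday():
    # 2020-09-14 was a Monday
    assert is_weekend(datetime.date(2020, 9, 14)) is False
```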

Mocks or fake objects are a good tool to enforce isolation in tests that need to communicate with external actors in the system.

External systems

It is important to understand that the above definitions (idempotency, isolation) depend on the scope of the test. You should consider external whatever part of the system is not directly involved in the test, even though you need to use it to run the test itself. You should also try to reduce the scope of the test as much as possible.

Let me give you an example. Consider a web application and imagine a test that checks that a user can log in. The login process involves many layers: the user inputs the username and the password in a GUI and submits the form, the GUI communicates with the core of the application that finds the user in the DB and checks the password hash against the one stored there, then sends back a message that grants access to the user, and the GUI stores a cookie to keep the user logged in. Suppose now that the test fails. Where is the error? Is it in the query that retrieves the user from the DB? Or in the routine that hashes the password? Or is it just an issue in the connectivity between the application and the database?

As you can see there are too many possible points of failure. While this is a perfectly valid integration test, it is definitely not a unit test. Unit tests try to test the smallest possible units of code in your system, usually simple routines like functions or object methods. Integration tests, instead, put together whole systems that have already been tested and test that they can work together.

Too many times developers confuse integration tests with unit tests. One simple example: every time a web framework makes you test your models against a real database you are mixing a unit test (the methods of the model object work) with an integration one (the model object connects with the database and can store/retrieve data). You have to learn how to properly identify what is external to your system in the scope of a given test, so your tests can be focused and small.

Focus on messages

I cannot recommend enough Sandi Metz's talk "The Magic Tricks of Testing" where she considers the different messages that a software component has to deal with. She comes up with 3 different origins for messages (incoming, sent to self, and outgoing) and 2 types (query and command). The very interesting conclusion she reaches is that you should only test half of them, and I believe this is one of the most useful results you can learn as a software developer. In this section I will shamelessly start from Sandi Metz's categorisations and give a personal view of the matter. I absolutely recommend watching the original talk as it is both short and very effective.

Testing is all about the behaviour of a component when it is used, i.e. when it is connected to other components that interact with it. This interaction is well represented by the word "message", which has hereafter the simple meaning of "data exchanged between two actors".

We can then classify the interactions happening in our system, and thus to our components, by flow and by type (Sandi Metz speaks of origin and type).

Message flow

The flow is defined as the tuple (source, destination), that is where the message comes from and what is its destination. There are three different combinations that we are interested in: (outside, self), (self, self), and (self, outside), where self is the object we are testing, and outside is a generic object that lives in the system. There is a fourth combination, (outside, outside) that is not relevant for the testing, since it doesn't involve the object under analysis.

So (outside, self) contains all the messages that other parts of the system send to our component. These messages correspond to the public API of the component, that is the set of entry points the component makes available to interact with it. Notable examples are the public methods of an object in an object-oriented programming language or the HTTP endpoints of a Web application. This flow represents the incoming messages.

On the opposite side of the spectrum there is (self, outside), which is the set of messages that the component under test sends to other parts of the system. These are for example the external calls that an object makes to a library or to other objects, or the API of other applications we rely on, like databases or Web applications. This flow describes all the outgoing messages.

Between the two there is (self, self), which identifies the messages that the component sends to itself, i.e. the use that the component makes of its own internal API. This can be the set of private methods of an object or the business logic inside a Web application. The important thing about this last case is that while the component is seen as a black box by the rest of the system it actually has an internal structure and it uses it to run. This flow contains all the private messages.

Message type

Messages can be further divided according to the interaction the source requires to have with the target: queries and commands. Queries are messages that do not change the status of the component, they just extract information. The class SimpleCalculator that we developed in the previous section is a typical example of an object that exposes query methods. Adding two numbers doesn't change the status of the object, and you will receive the same answer every time you call the method add.

Commands are the opposite. They do not extract any information, but they change the status of the object. A method of an object that increases an internal counter or a method that adds values to an array are perfect examples of commands.

It's perfectly normal to combine a query and a command in a single message, as long as you are aware that your message is changing the status of the component. Remember that changing the status is something that can have concrete secondary effects.
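The difference between the two types can be shown with a tiny example. The class Counter below is hypothetical and exists only to illustrate the idea: value is a query, increment is a command.

```python
class Counter:
    # Hypothetical class used only to illustrate the two message types
    def __init__(self):
        self._count = 0

    def value(self):
        # Query: returns information and leaves the state untouched
        return self._count

    def increment(self):
        # Command: changes the state and returns nothing
        self._count += 1


counter = Counter()

# Calling a query repeatedly always gives the same answer
assert counter.value() == 0
assert counter.value() == 0

# After a command the state, and thus the answer, is different
counter.increment()
assert counter.value() == 1
```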

The testing grid

Combining 3 flows and 2 message types we get 6 different message cases that involve the component under test. For each one of these cases we have to decide how to test the interaction represented by that flow and message type.

Incoming queries

An incoming query is a message that an external actor sends to get a value from your component. Testing this behaviour is straightforward, as you just need to write a test that sends the message and makes an assertion on the returned value. A concrete example of this is what we did to test the method add of SimpleCalculator.

Incoming commands

An incoming command comes from an external actor that wants to change the status of the system. There should be a way for an external actor to check the status, which translates into the need of having either a companion incoming query message that allows you to extract the status (or at least the part of the status affected by the command), or the knowledge that the change is going to affect the behaviour of another query. A simple example might be a method that sets the precision (number of digits) of the division in the object SimpleCalculator. Setting that value changes the result of a query, which can be used to test the effect of the incoming command.
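A sketch of the precision example might look like the following. The class PrecisionCalculator and its method set_precision are assumptions made up for this illustration, they are not part of the project. The test sends the command and then checks its effect through a query.

```python
class PrecisionCalculator:
    # Hypothetical class, not the SimpleCalculator of the project
    def __init__(self):
        self._precision = 2

    def set_precision(self, digits):
        # Incoming command: changes the internal state
        self._precision = digits

    def div(self, a, b):
        # Incoming query: its result depends on the state
        return round(a / b, self._precision)


def test_set_precision_affects_division():
    calculator = PrecisionCalculator()

    calculator.set_precision(3)

    # The effect of the command is observed through the query
    assert calculator.div(1, 3) == 0.333
```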

Private queries

A private query is a message that the component sends to self to get a value without affecting its own state, and it is basically nothing more than an explicit use of some internal logic. This happens often in object-oriented languages because you extracted some common logic from one or more methods of an object and created a private method to avoid duplication.

Since private queries use the internal logic you shouldn't test them. This might be surprising, as private methods are code, and code should be tested, but remember that other methods are calling them, so the effects of that code are not invisible, they are tested by the tests of the public entry points, although indirectly. The only effect you would achieve by testing private methods is to lock the tests to the internal implementation of the component, which by definition shouldn't be used by anyone outside of the component itself. This, in turn, makes refactoring painful, because you have to keep redundant tests in sync with the changes that you make, instead of using them as a guide for the code changes like TDD wants you to do.

As Sandi Metz says, however, this is not an inflexible rule. Whenever you see that testing an internal method makes the structure more robust feel free to do it. Be aware that you are locking the implementation, so do it only where it makes a real difference businesswise.
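As an illustration, consider the hypothetical class PriceList below, which I made up for this example. The private method _with_tax has no test of its own, but it is exercised every time the test calls the public method total.

```python
class PriceList:
    # Hypothetical class, used only to illustrate a private query
    def __init__(self, prices_in_cents):
        self._prices = prices_in_cents

    def _with_tax(self, price):
        # Private query: adds a 20% tax using integer arithmetic
        return price + price // 5

    def total(self):
        # Public query that relies on the private one
        return sum(self._with_tax(p) for p in self._prices)


def test_total_includes_tax():
    price_list = PriceList([100, 200])

    # The private method is tested indirectly through the public entry point
    assert price_list.total() == 360
```

If the tax computation is later moved or renamed, this test keeps working as long as total returns the same value.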

Private commands

Private commands shouldn't be treated differently than private queries. They change the status of the component, but this is again part of the internal logic of the component itself, so you shouldn't test private commands either. As stated for private queries, feel free to do it if this makes a real difference.

Outgoing queries and commands

An outgoing query is a message that the component under test sends to an external actor asking for a value, without changing the status of the actor itself. The correctness of the returned value, given the inputs, is not part of what you want to test, because that is an incoming query for the external actor. Let me repeat this: you don't want to test that the external actor returns the correct value given some inputs.

This is perhaps one of the biggest mistakes that programmers make when they test their applications. Definitely it is a mistake that I made many times. We tend to introduce tests that, starting from the code of our component, end up testing different components.

Outgoing commands are messages sent to external actors in order to change their state. Since our component sends such messages to cause an effect in another part of the system we have to be sure that the sent values are correct. We do not want to test that the state of the external actor changes accordingly, as this is part of the testing suite of the external actor itself (incoming command).

From this consideration it is evident that you shouldn't test the results of any outgoing query or command. Possibly, you should avoid running them at all, otherwise you will need the external system to be up and running when you run the test suite.

We want to be sure, however, that our component uses the API of the external actor in a proper way and the standard technique to test this is to use mocks, that is components that simulate other components. Mocks are an important tool in the TDD methodology and for this reason they are the topic of the next chapter.
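To give a first taste of the technique, here is a minimal sketch of a test for an outgoing command. The class OrderProcessor and the mailer with its method send are hypothetical names invented for the example. The mock replaces the external actor, and the test checks only the values that our component sends.

```python
from unittest import mock


class OrderProcessor:
    # Hypothetical component that sends an outgoing command
    def __init__(self, mailer):
        self._mailer = mailer

    def confirm(self, address, order_id):
        self._mailer.send(address, f"Order {order_id} confirmed")


def test_confirm_sends_email():
    # The mock stands in for the external actor
    mailer = mock.Mock()
    processor = OrderProcessor(mailer)

    processor.confirm("user@example.com", 42)

    # We check the message that was sent, not what the mailer does with it
    mailer.send.assert_called_once_with(
        "user@example.com", "Order 42 confirmed"
    )
```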

| Flow     | Type    | Test? |
|----------|---------|-------|
| Incoming | Query   | Yes   |
| Incoming | Command | Yes   |
| Private  | Query   | Maybe |
| Private  | Command | Maybe |
| Outgoing | Query   | Mock  |
| Outgoing | Command | Mock  |

Final words

Since the discovery of TDD few things changed the way I write code more than these considerations on what I am supposed to test. Out of 6 different types of tests we discovered that 2 shouldn't be tested, 2 of them require a very simple technique based on assertions, and the last 2 are the only ones that require an advanced technique (mocks). This should cheer you up, as for once a good methodology doesn't add new rules and further worries, but removes one third of them, even forbidding you to implement them!

In the next two posts I will discuss mocks and patches, two very important testing tools to have in your belt.


TDD in Python with pytest - Part 2

2020-09-11T10:30:00+02:00

This is the second post in the series TDD in Python with pytest where I develop a simple project following a strict TDD methodology. The posts come from my book Clean Architectures in Python and have been reviewed to get rid of some bad naming choices of the version published in the book.

You can find the first post here.

Step 7 - Division

The requirements state that there shall be a division function, and that it has to return a float value. This is a simple condition to test, as it is sufficient to divide two numbers that do not give an integer result

tests/test_main.py

def test_div_two_numbers_float():
    calculator = SimpleCalculator()

    result = calculator.div(13, 2)

    assert result == 6.5

The test suite fails with the usual error that signals a missing method. The implementation of this function is very simple as the operator / in Python performs a float division

simple_calculator/main.py

class SimpleCalculator:
    [...]

    def div(self, a, b):
        return a / b

Git tag: step-7-float-division

If you run the test suite again all the tests should pass. There is a second requirement about this operation, however, that states that division by zero shall return inf.

I already mentioned in the previous post that this is not a good requirement, and please don't go around telling people that I told you to create functions that return either floats or strings. This is a simple requirement that I will use to show you how to deal with exceptions.

The test that comes from the requirement is simple

tests/test_main.py

def test_div_by_zero_returns_inf():
    calculator = SimpleCalculator()

    result = calculator.div(5, 0)

    assert result == float('inf')

And the test suite fails now with this message

__________________________ test_div_by_zero_returns_inf ___________________________

    def test_div_by_zero_returns_inf():
        calculator = SimpleCalculator()

>       result = calculator.div(5, 0)

tests/test_main.py:70:  
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <simple_calculator.main.SimpleCalculator object at 0x7f0b0b733990>, a = 5, b = 0

    def div(self, a, b):
>       return a / b
E       ZeroDivisionError: division by zero

simple_calculator/main.py:17: ZeroDivisionError

Note that when an exception happens in the code and not in the test, the pytest output changes slightly. The first part of the message shows where the test fails, but then there is a second part that shows the internal code that raised the exception and provides information about the value of local variables on its first line (the one that starts with self =).

We might implement two different solutions to satisfy this requirement and its test. The first one is to prevent b from being 0

simple_calculator/main.py

    def div(self, a, b):
        if not b:
            return float('inf')

        return a / b

and the second one is to intercept the exception with a try/except block

simple_calculator/main.py

    def div(self, a, b):
        try:
            return a / b
        except ZeroDivisionError:
            return float('inf')

Both solutions make the test suite pass, so both are correct. I leave to you the decision about which is the best one, syntactically speaking.

Git tag: step-7-float-division

Step 8 - Testing exceptions

A further requirement is that multiplication by zero must raise a ValueError exception. This means that we need a way to test if our code raises an exception, which is the opposite of what we did until now. In the previous tests, the condition to pass was that there was no exception in the code, while in this test the condition will be that an exception has been raised.

Again, this is a requirement I made up just for the sake of showing you how to deal with exceptions, so if you think this is a silly behaviour for a multiplication function you are probably right.

Pytest provides a context manager named raises that runs the code contained in it and passes only if the given exception is produced by that code.

tests/test_main.py

import pytest

[...]

def test_mul_by_zero_raises_exception():
    calculator = SimpleCalculator()

    with pytest.raises(ValueError):
        calculator.mul(3, 0)

In this case, thus, pytest runs the line calculator.mul(3, 0). If the method doesn't raise the exception ValueError the test will fail. Indeed, if you run the test suite now, you will get the following failure

________________________ test_mul_by_zero_raises_exception ________________________

    def test_mul_by_zero_raises_exception():
        calculator = SimpleCalculator()

        with pytest.raises(ValueError):
>           calculator.mul(3, 0)
E           Failed: DID NOT RAISE <class 'ValueError'>

tests/test_main.py:81: Failed

which signals that the code didn't raise the expected exception.
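To get a feeling of what such a context manager does, here is a simplified stand-in written with the standard library. This is only a sketch of the idea and not the way pytest implements raises; the names expect_exception and mul_strict are made up for the example.

```python
from contextlib import contextmanager


@contextmanager
def expect_exception(exception_class):
    # Simplified stand-in for pytest.raises, for illustration only
    try:
        yield
    except exception_class:
        # The expected exception was raised: the check passes
        return
    # The block finished without raising: the check fails
    raise AssertionError(f"DID NOT RAISE {exception_class}")


def mul_strict(a, b):
    # Hypothetical function that refuses zeros
    if a == 0 or b == 0:
        raise ValueError
    return a * b


# Passes silently, because ValueError is raised inside the block
with expect_exception(ValueError):
    mul_strict(3, 0)
```

If the block contains mul_strict(3, 2) instead, no exception is raised and the context manager fails with an AssertionError, which mirrors the DID NOT RAISE failure shown above.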

The code that makes the test pass needs to test if one of the inputs of the function mul is 0. This can be done with the help of the built-in function all, which accepts an iterable and returns True only if all the values contained in it are True. Since in Python the value 0 is not true, we may write

simple_calculator/main.py

    def mul(self, *args):
        if not all(args):
            raise ValueError
        return reduce(lambda x, y: x*y, args)

and make the test suite pass. The condition checks that there are no false values in the tuple args, that is there are no zeros.
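A few quick checks show how all behaves with the kind of values that the method mul receives.

```python
# all() returns True only when every element is truthy
assert all((3, 2)) is True

# A single zero, integer or float, makes the result False
assert all((3, 0)) is False
assert all((0.0, 5)) is False

# With an empty iterable all() returns True
assert all(()) is True
```

The last case means that a call without arguments is not stopped by this check.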

Git tag: step-8-multiply-by-zero

Step 9 - A more complex set of requirements

Until now the requirements were pretty simple, and it was easy to map each of them directly into tests. It's time to try to tackle a more complex problem. The remaining requirements say that the class has to provide a function to compute the average of an iterable, and that this function shall accept two optional upper and lower thresholds to remove outliers.

Let's break these two requirements into a set of simpler ones

1. The function accepts an iterable and computes the average, i.e. avg([2, 5, 12, 98]) == 29.25
2. The function accepts an optional upper threshold. It must remove all the values that are greater than the threshold before computing the average, i.e. avg([2, 5, 12, 98], ut=90) == avg([2, 5, 12])
3. The function accepts an optional lower threshold. It must remove all the values that are less than the threshold before computing the average, i.e. avg([2, 5, 12, 98], lt=10) == avg([12, 98])
4. The upper threshold is not included when removing data, i.e. avg([2, 5, 12, 98], ut=12) == avg([2, 5, 12])
5. The lower threshold is not included when removing data, i.e. avg([2, 5, 12, 98], lt=5) == avg([5, 12, 98])
6. The function works with an empty list, returning 0, i.e. avg([]) == 0
7. The function works if the list is empty after outlier removal, i.e. avg([12, 98], lt=15, ut=90) == 0
8. The outlier removal works if the list is empty, i.e. avg([], lt=15, ut=90) == 0

As you can see a requirement can produce multiple tests. Some of these are clearly expressed by the requirement (numbers 1, 2, 3), some of these are choices that we make (numbers 4, 5, 6) and can be discussed, some are boundary cases that we have to discover thinking about the problem (numbers 6, 7, 8).

There is a fourth category of tests, which are the ones that come from bugs that you discover. We will discuss those later in this chapter.

Now, if you followed the posts coding along it is time to try to tackle a problem on your own. Why don't you try to go on and implement these features? Each of the eight requirements can be directly mapped into a test, and you know how to write tests and code that passes them. The next steps show my personal solution, which is just one of the possible ones, so you can compare what you did with what I came up with to solve the tests.

Step 9.1 - Average of an iterable

Let's start adding a test for requirement number 1

tests/test_main.py

def test_avg_correct_average():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98])

    assert result == 29.25

We feed the function avg a list of generic numbers, whose average we calculated with an external tool. The first run of the test suite fails with the usual complaint about a missing function, and we can make the test pass with a simple use of sum and len, as both built-in functions work on sequences like lists

simple_calculator/main.py

class SimpleCalculator:
    [...]

    def avg(self, it):
        return sum(it)/len(it)

Here, it stands for iterable. Strictly speaking, because of len the function works with any iterable that also has a length, such as a list or a tuple, but not with a generator.
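A quick experiment shows which inputs this implementation accepts. The function below is a standalone copy of the method, written only for this check.

```python
def avg(it):
    # Standalone copy of the method, for illustration
    return sum(it) / len(it)


# Lists and tuples have a length, so they work
assert avg([2, 5, 12, 98]) == 29.25
assert avg((2, 5, 12, 98)) == 29.25

# A generator can be looped over but has no length
try:
    avg(x for x in [2, 5, 12, 98])
except TypeError:
    generator_supported = False
else:
    generator_supported = True

assert generator_supported is False
```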

Git tag: step-9-1-average-of-an-iterable

Step 9.2 - Upper threshold

The second requirement mentions an upper threshold, but we are free with regards to the API, i.e. the requirement doesn't specify how the threshold is supposed to be specified or named. I decided to call the upper threshold parameter ut, so the test becomes

tests/test_main.py

def test_avg_removes_upper_outliers():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98], ut=90)

    assert result == pytest.approx(6.333333)

As you can see the parameter ut=90 is supposed to remove the element 98 from the list and then compute the average of the remaining elements. Since the result has an infinite number of digits I used the function pytest.approx to check the result.
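To see why a plain comparison is not suitable here, the following sketch uses math.isclose from the standard library as a stand-in for pytest.approx. The tolerance 1e-6 is a value I chose for the example.

```python
import math

# The average of the values left after removing 98
result = (2 + 5 + 12) / 3

# A plain comparison with a rounded literal fails
assert result != 6.333333

# A comparison with a tolerance succeeds
assert math.isclose(result, 6.333333, rel_tol=1e-6)
```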

The test suite fails because the function avg doesn't accept the parameter ut

_________________________ test_avg_removes_upper_outliers _________________________

    def test_avg_removes_upper_outliers():
        calculator = SimpleCalculator()

>       result = calculator.avg([2, 5, 12, 98], ut=90)
E       TypeError: avg() got an unexpected keyword argument 'ut'

tests/test_main.py:95: TypeError

There are two problems now that we have to solve, as it happened for the second test we wrote in this project. The new ut argument needs a default value, so we have to manage that case, and then we have to make the upper threshold work. My solution is

simple_calculator/main.py

    def avg(self, it, ut=None):
        if not ut:
            ut = max(it)

        _it = [x for x in it if x <= ut]

        return sum(_it)/len(_it)

The idea here is that ut is used to filter the iterable keeping all the elements that are less than or equal to the threshold. This means that the default value for the threshold has to be neutral with regards to this filtering operation. Using the maximum value of the iterable makes the whole algorithm work in every case, while for example using a big fixed value like 9999 would introduce a bug, as one of the elements of the iterable might be bigger than that value.
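The difference between the two defaults can be checked with a short experiment. Both functions below are standalone sketches written for this comparison; avg_fixed_default is the hypothetical buggy variant with the fixed value 9999.

```python
def avg_fixed_default(it, ut=9999):
    # Hypothetical variant with a big fixed default threshold
    _it = [x for x in it if x <= ut]
    return sum(_it) / len(_it)


def avg_max_default(it, ut=None):
    # Variant that uses the maximum of the data as default
    if not ut:
        ut = max(it)
    _it = [x for x in it if x <= ut]
    return sum(_it) / len(_it)


data = [1000, 20000]

# The fixed default silently removes 20000 from the data
assert avg_fixed_default(data) == 1000

# The maximum of the data never removes anything
assert avg_max_default(data) == 10500
```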

Git tag: step-9-2-upper-threshold

Step 9.3 - Lower threshold

The lower threshold is the mirror of the upper threshold, so it doesn't require many explanations. The test is

tests/test_main.py

def test_avg_removes_lower_outliers():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98], lt=10)

    assert result == pytest.approx(55)

and the code of the function avg now becomes

simple_calculator/main.py

    def avg(self, it, lt=None, ut=None):
        if not lt:
            lt = min(it)

        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

        return sum(_it)/len(_it)

Git tag: step-9-3-lower-threshold

Step 9.4 and 9.5 - Boundary inclusion

As you can see from the code of the function avg, the upper and lower thresholds are included in the comparison, so we might consider the requirements as already satisfied. TDD, however, pushes you to write a test for each requirement (as we saw it's not unusual to actually have multiple tests per requirement), and this is what we are going to do.

The reason behind this is that you might get the expected behaviour for free, like in this case, because some other code that you wrote to pass a different test provides that feature as a side effect. You don't know, however, what will happen to that code in the future, so if you don't have tests that show that all your requirements are satisfied you might lose features without knowing it.

The test for the fourth requirement is

tests/test_main.py

def test_avg_upper_threshold_is_included():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98], ut=98)

    assert result == 29.25

Git tag: step-9-4-upper-threshold-is-included

while the test for the fifth one is

tests/test_main.py

def test_avg_lower_threshold_is_included():
    calculator = SimpleCalculator()

    result = calculator.avg([2, 5, 12, 98], lt=2)

    assert result == 29.25

Git tag: step-9-5-lower-threshold-is-included

And, as expected, both pass without any change in the code. Do you remember rule number 5? You should ask yourself why the tests don't fail. In this case we reasoned about that before, so we can accept that the new tests don't require any code change to pass.

Step 9.6 - Empty list

Requirement number 6 is something that wasn't clearly specified in the project description so we decided to return 0 as the average of an empty list. You are free to change the requirement and decide to raise an exception, for example.

The test that implements this requirement is

tests/test_main.py

def test_avg_empty_list():
    calculator = SimpleCalculator()

    result = calculator.avg([])

    assert result == 0

and the test suite fails with the following error

_______________________________ test_avg_empty_list _______________________________

    def test_avg_empty_list():
        calculator = SimpleCalculator()

>       result = calculator.avg([])

tests/test_main.py:127:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <simple_calculator.main.SimpleCalculator object at 0x7feeb7098a10>, it = [], lt = None, ut = None

    def avg(self, it, lt=None, ut=None):
        if not lt:
>           lt = min(it)
E           ValueError: min() arg is an empty sequence

simple_calculator/main.py:26: ValueError

The function min that we used to compute the default lower threshold doesn't work with an empty list, so the code raises an exception. The simplest solution is to check for the length of the iterable before computing the default thresholds

simple_calculator/main.py

    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0

        if not lt:
            lt = min(it)

        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

        return sum(_it)/len(_it)

Git tag: step-9-6-empty-list

As you can see the function avg is already pretty rich, but at the same time it is well structured and understandable. This obviously happens because the example is trivial, but cleaner code is definitely among the benefits of TDD.

Step 9.7 - Empty list after applying the thresholds

The next requirement deals with the case in which the outlier removal process empties the list. The test is the following

tests/test_main.py

def test_avg_manages_empty_list_after_outlier_removal():
    calculator = SimpleCalculator()

    result = calculator.avg([12, 98], lt=15, ut=90)

    assert result == 0

and the test suite fails with a ZeroDivisionError, because the length of the iterable is now 0.

________________ test_avg_manages_empty_list_after_outlier_removal ________________

    def test_avg_manages_empty_list_after_outlier_removal():
        calculator = SimpleCalculator()

>       result = calculator.avg([12, 98], lt=15, ut=90)

tests/test_main.py:135:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <simple_calculator.main.SimpleCalculator object at 0x7f9e60c3ba90>, it = [12, 98], lt = 15, ut = 90

    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0

        if not lt:
            lt = min(it)

        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

>       return sum(_it)/len(_it)
E       ZeroDivisionError: division by zero

simple_calculator/main.py:36: ZeroDivisionError

The easiest solution is to introduce a new check on the length of the iterable

simple_calculator/main.py

    def avg(self, it, lt=None, ut=None):
        if not len(it):
            return 0

        if not lt:
            lt = min(it)

        if not ut:
            ut = max(it)

        _it = [x for x in it if x >= lt and x <= ut]

        if not len(_it):
            return 0

        return sum(_it)/len(_it)

And this code makes the test suite pass. As I stated before, code that makes the tests pass is considered correct, but you are always allowed to improve it. In this case I don't really like the repetition of the length check, so I might try to refactor the function to get a cleaner solution. Since I have all the tests that show that the requirements are satisfied, I am free to try to change the code of the function.

After some attempts I found this solution

simple_calculator/main.py

    def avg(self, it, lt=None, ut=None):
        _it = it[:]

        if lt:
            _it = [x for x in _it if x >= lt]

        if ut:
            _it = [x for x in _it if x <= ut]

        if not len(_it):
            return 0

        return sum(_it)/len(_it)

which looks reasonably clean, and makes the whole test suite pass.

Git tag: step-9-7-empty-list-after-thresholds

Step 9.8 - Empty list before applying the thresholds

The last requirement checks another boundary case, which happens when the list is empty and we specify one of or both the thresholds. This test will check that the outlier removal code doesn't assume the list contains elements.

tests/test_main.py

def test_avg_manages_empty_list_before_outlier_removal():
    calculator = SimpleCalculator()

    result = calculator.avg([], lt=15, ut=90)

    assert result == 0

This test doesn't fail. So, according to the TDD methodology, we should provide a reason why this happens and decide if we want to keep the test. The reason is that the two list comprehensions used to filter the elements work perfectly with empty lists. As for the test, it comes directly from a corner case, and it checks a behaviour which is not already covered by other tests. This makes me decide to keep the test.
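A small sketch of that reasoning, outside of the class, might look like this. The variable names are mine and the thresholds are the ones used in the test.

```python
empty = []

# Filtering an empty list never evaluates the condition and just returns []
filtered = [x for x in empty if x >= 15]
filtered = [x for x in filtered if x <= 90]

# The final length check in avg then protects the division
result = 0 if not len(filtered) else sum(filtered) / len(filtered)
```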

Git tag: step-9-8-empty-list-before-thresholds

Step 9.9 - Zero as lower/upper threshold

This is perhaps the most important step of the whole chapter, for two reasons.

First of all, the test added in this step was added by two readers of my book about clean architectures (Faust Gertz and Michael O'Neill), and this shows a real TDD workflow. After you have published your package (or your book, in this case) someone notices a wrong behaviour in some use case. This might be a big flaw or a tiny corner case, but in any case they can come up with a test that exposes the bug, and maybe even with a patch to the code, but the most important part is the test.

Whoever discovers the bug has a clear way to show it, and you, as an author/maintainer/developer, can add that test to your suite and work on the code until it passes. The rest of the test suite will block any change in the code that disrupts the behaviour you already tested. As I already stressed multiple times, we could do the same without TDD, but if we need to change a substantial amount of code there is nothing like a test suite that can guarantee we are not re-introducing bugs (also called regressions).

Second, this step shows an important part of the TDD workflow: checking corner cases. In general you should pay a lot of attention to the boundaries of a domain, and test the behaviour of the code in those cases.

This test shows that the code doesn't manage zero-valued lower thresholds correctly

tests/test_main.py

def test_avg_manages_zero_value_lower_outlier():
    calculator = SimpleCalculator()

    result = calculator.avg([-1, 0, 1], lt=0)

    assert result == 0.5

The reason is that the function avg contains a check like if lt:, which fails when lt is 0, as that is a false value. The check should be if lt is not None:, so that part of the function avg becomes

simple_calculator/main.py

        if lt is not None:
            _it = [x for x in _it if x >= lt]
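To see the difference between the two checks in isolation, here is a small sketch that applies both of them to the values used in the test. The variable names are illustrative only.

```python
values = [-1, 0, 1]
lt = 0

# `if lt:` treats 0 like None, so the filter is silently skipped
if lt:
    with_truthiness = [x for x in values if x >= lt]
else:
    with_truthiness = values[:]

# `if lt is not None:` skips the filter only when no threshold is given
if lt is not None:
    with_none_check = [x for x in values if x >= lt]
else:
    with_none_check = values[:]

average_truthiness = sum(with_truthiness) / len(with_truthiness)
average_none_check = sum(with_none_check) / len(with_none_check)
```

The first average is computed on all three values, while the second one excludes -1 as the requirement demands.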

It is immediately clear that the upper threshold has the same issue, so the two tests I added are

tests/test_main.py

def test_avg_manages_zero_value_lower_outlier():
    calculator = SimpleCalculator()

    result = calculator.avg([-1, 0, 1], lt=0)

    assert result == 0.5


def test_avg_manages_zero_value_upper_outlier():
    calculator = SimpleCalculator()

    result = calculator.avg([-1, 0, 1], ut=0)

    assert result == -0.5

and the final version of avg is

simple_calculator/main.py

    def avg(self, it, lt=None, ut=None):
        _it = it[:]

        if lt is not None:
            _it = [x for x in _it if x >= lt]

        if ut is not None:
            _it = [x for x in _it if x <= ut]

        if not len(_it):
            return 0

        return sum(_it)/len(_it)

Git tag: step-9-9-zero-as-lower-upper-threshold

Step 9.10 - Refactoring for generators

One of the readers of this series, Dmitry Labazkin, noticed that the final implementation has some drawbacks, namely:

According to the requirements, this method should accept any iterable, but the implementation can't process generators (which are iterators and also iterables). For example, the function len() cannot be used with generators.
The iterable is copied, which is something we try to avoid to reduce memory usage.
Globally, the iterator is read 4 times, which affects performance.

These are interesting points, and he provides an implementation that solves them all. It's important to mention that the first point is closely related to requirements, so it should be represented by a unit test, while the other two are connected with performance and cannot be tested with pytest. However, any refactoring that produces code we consider better (for example from the performance point of view) can be tested by the existing tests. In other words, we can provide an alternative implementation and still make sure it works correctly.
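The first point can be observed directly in a few lines. This is just a sketch with illustrative names: it shows that a generator has no length and that it can be read only once, which is why an implementation that reads the input multiple times cannot work with it.

```python
numbers = (i for i in [2, 5, 12, 98])

# Generators have no length, so len() raises TypeError
try:
    len(numbers)
    len_supported = True
except TypeError:
    len_supported = False

# A generator can be consumed only once: the second pass finds nothing
first_pass = list(numbers)
second_pass = list(numbers)
```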

Dmitry adds a test to check that generators are supported

tests/test_main.py

def test_avg_accepts_generators():
    calculator = SimpleCalculator()
    result = calculator.avg(i for i in [2, 5, 12, 98])
    assert result == 29.25

His implementation of the function avg() passes that test and the previous ones we wrote

simple_calculator/main.py

    def avg(self, it, lt=None, ut=None):
        count = 0
        total = 0

        for number in it:
            if lt is not None and number < lt:
                continue
            if ut is not None and number > ut:
                continue
            count += 1
            total += number

        if count == 0:
            return 0

        return total / count

One might argue that this implementation is less pythonic as it doesn't use fancy list comprehensions, but again, that is a matter of style (and performance). The point about generators is correct, but if that wasn't included in the requirements we might accept either implementation. I personally believe this new implementation is much better than the previous one, as I like to keep a low memory footprint, but if we were sure the calculator is used only on small sequences the concern might be overkill.

Git tag: step-9-10-refactoring-for-generators

Recap of the TDD rules

Through this very simple example we learned 6 important rules of the TDD methodology. Let us review them, now that we have some experience that can make the words meaningful

1. Test first, code later
2. Add the bare minimum amount of code you need to pass the tests
3. You shouldn't have more than one failing test at a time
4. Write code that passes the test. Then refactor it.
5. A test should fail the first time you run it. If it doesn't, ask yourself why you are adding it.
6. Never refactor without tests.

How many assertions?

I am frequently asked "How many assertions do you put in a test?", and I consider this question important enough to discuss it in a dedicated section. To answer this question I want to briefly go back to the nature of TDD and the role of the test suite that we run.

The whole point of automated tests is to run through a set of checkpoints that can quickly reveal that there is a problem in a specific area. Mind the words "quickly" and "specific". When I run the test suite and an error occurs I'd like to be able to understand as fast as possible where the problem lies. This doesn't (always) mean that the problem will have a quick resolution, but at least I can be immediately aware of which part of the system is misbehaving.

On the other hand, we don't want to have too many tests for the same condition. On the contrary, we want to avoid testing the same condition more than once, as tests have to be maintained. A test suite that is too fine-grained might result in too many tests failing because of the same problem in the code, which might be daunting and not very informative.

My advice is to group together assertions that can be executed after running the same setup, if they test the same process. For example, you might consider the two functions add and sub that we tested in this chapter. They require the same setup, which is to instantiate the class SimpleCalculator (a setup that they share with many other tests), but they are actually testing two different processes. A good sign of this is that, if you grouped their assertions in a single test, you would have to name it test_add_or_sub, and a failure in this test would require a further investigation of the test output to check which method of the class is failing.

If you have to test that a method returns positive even numbers, instead, you should consider running the method and then writing two assertions, one that checks that the number is positive, and one that checks it is even. This makes sense, as a failure in one of the two means a failure of the whole process.

As a rule of thumb, then, consider if the test is a logical AND between conditions or a logical OR. In the former case go for multiple assertions, in the latter create multiple test functions.
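The following sketch illustrates the rule of thumb. Both next_positive_even and the reduced version of SimpleCalculator are hypothetical stand-ins that I wrote only for this example, so take them as assumptions and not as part of the project.

```python
def next_positive_even(n):
    """Hypothetical function: smallest positive even number greater than n."""
    candidate = max(n + 1, 2)
    return candidate if candidate % 2 == 0 else candidate + 1


class SimpleCalculator:
    # Stand-in with just enough behaviour for the example
    def add(self, *args):
        return sum(args)

    def sub(self, a, b):
        return a - b


# Logical AND: both conditions describe the same process,
# so they live in the same test
def test_next_positive_even_returns_positive_even_number():
    result = next_positive_even(-7)

    assert result > 0
    assert result % 2 == 0


# Logical OR: two different processes, so two separate tests
def test_add_two_numbers():
    calculator = SimpleCalculator()

    assert calculator.add(4, 5) == 9


def test_subtract_two_numbers():
    calculator = SimpleCalculator()

    assert calculator.sub(10, 3) == 7
```

If the first test fails, the function is not doing its job, whichever assertion triggered the failure. If one of the other two fails, the name of the test already tells which method is misbehaving.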

How to manage bugs or missing features

In this chapter we developed the project from scratch, so the challenge was to come up with a series of small tests starting from the requirements. At a certain point in the life of your project you will have a stable version in production (this expression has many definitions, but in general it means "used by someone other than you") and you will need to maintain it. This means that people will file bug reports and feature requests, and TDD gives you a clear strategy to deal with those.

From the TDD point of view both a bug and a missing feature are cases not currently covered by a test, so I will refer to them collectively as bugs, but don't forget that I'm talking about missing features as well.

The first thing you need to do is to write one or more tests that expose the bug. This way you can easily decide when the code that you wrote is correct or good enough. For example, let's assume that a user files an issue on the project SimpleCalculator saying: "The function add doesn't work with negative numbers". You should definitely try to get a concrete example from the user that wrote the issue and some information about the execution environment (as it is always possible that the problem comes from a different source, like for example an old version of a library your package relies on), but in the meanwhile you can come up with at least 3 tests: one that involves two negative numbers, one with a negative number as the first argument, and one with a negative number as the second argument.
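A minimal sketch of those three tests might look like the following. They are listed together here only for reference, and the class is a reduced stand-in that assumes add is implemented with the built-in sum, so it is not necessarily the code you have in your repository.

```python
class SimpleCalculator:
    # Stand-in for the class developed in this chapter (assumption)
    def add(self, *args):
        return sum(args)


def test_add_two_negative_numbers():
    calculator = SimpleCalculator()

    result = calculator.add(-4, -5)

    assert result == -9


def test_add_negative_first_argument():
    calculator = SimpleCalculator()

    result = calculator.add(-4, 5)

    assert result == 1


def test_add_negative_second_argument():
    calculator = SimpleCalculator()

    result = calculator.add(4, -5)

    assert result == -1
```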

You shouldn't write down all of them at once. Write the first test that you think might expose the issue and see if it fails. If it doesn't, discard it and write a new one. From the TDD point of view, if you don't have a failing test there is no bug, so you have to come up with at least one test that exposes the issue you are trying to solve.

At this point you can move on and try to change the code. Remember that you shouldn't have more than one failing test at a time, so start doing this as soon as you discover a test case that shows there is a problem in the code.

Once you reach a point where the test suite passes without errors, stop and try to run the code in the environment where the bug was first discovered (for example sharing a branch with the user that created the ticket) and iterate the process.

The problem of types

Other than contributing to the TDD steps, Dmitry Labazkin asked some relevant questions about types, that I will summarise here. You can read his original questions in issue #11 and issue #12.

The question of type checking is thorny, and since this is an introductory series I will discuss it briefly and give some pointers. Don't get me wrong, though. As I will say later, this is one of the most important topics we can discuss in computer science.

Overall the problem Dmitry raises is that operators like addition and multiplication are valid for types other than integers (like floats) and also non-numeric ones (like strings). In Python, it is possible to multiply a string by a number and obtain a concatenation of that number of copies of the original string. At the same time, however, subtraction and division are not defined for strings, so some of the questions we can ask are:

can SimpleCalculator be used on non-integer numeric types?
can SimpleCalculator be used on non-numeric types?
shall we explicitly check in the code that the input values belong to a certain type?
shall we write tests to rule out other types?

As I said, such questions are deceptively simple, so let's tackle them step by step.

Let's assume it makes sense for our class to work with numeric types. In Python there is no way to prevent a program from calling SimpleCalculator().add("string1", "string2"), which would fail as the current implementation uses the built-in function sum that doesn't work on strings (unless you call it with a specific initial value). However, calling SimpleCalculator().mul("abc", 3) would result in "abcabcabc", as the internal implementation quietly supports strings.
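The inconsistency can be reproduced with a reduced stand-in of the class. This is a sketch under the assumption that add relies on sum and that mul multiplies its arguments one after the other. The real class also has to deal with the requirement about multiplication by zero, which I left out here.

```python
from functools import reduce


class SimpleCalculator:
    # Stand-in: add relies on sum, mul on repeated multiplication (assumed)
    def add(self, *args):
        return sum(args)

    def mul(self, *args):
        return reduce(lambda x, y: x * y, args)


calculator = SimpleCalculator()

# sum starts from the integer 0, and 0 + "string1" is not defined
try:
    calculator.add("string1", "string2")
    add_error = None
except TypeError as exc:
    add_error = exc

# "abc" * 3 is a valid operation for strings, so mul quietly accepts it
repeated = calculator.mul("abc", 3)
```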

Given the inconsistency, we might be tempted to rule out non-numeric types explicitly. In other words, we might want to add code to our calculator that actively checks if we are passing a non-numeric type. In that case we shall also add tests for those types, according to the TDD methodology, as no code can be added without tests.

The reason why this topic is thorny is that Python relies heavily on polymorphism, which means that it is more interested in the behaviour of an object than in its nature. In other words, an object can be considered a number because it is an instance of int or float, for example, but it could just be a class we made up that behaves like one of those types. Using Abstract Base Classes like the ones in the module numbers is useful to check if an object is an instance of one of the types encompassed by the hierarchy (again, types such as int and float) but doesn't automatically include everything that behaves like a number. We can create a class that behaves like int without belonging to the hierarchy of numbers.
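Here is a small sketch of this limitation. The class Money is hypothetical and deliberately minimal: it supports addition like a number does, but it is not registered in the hierarchy of the module numbers.

```python
import numbers


class Money:
    """Hypothetical class that can be added like a number."""

    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        return Money(self.amount + other.amount)

    def __eq__(self, other):
        return isinstance(other, Money) and self.amount == other.amount


# Built-in numeric types belong to the hierarchy
is_int_number = isinstance(3, numbers.Number)
is_float_number = isinstance(2.5, numbers.Number)

# Money behaves like a number but is not recognised as one
is_money_number = isinstance(Money(10), numbers.Number)
total = Money(10) + Money(5)
```

A check based on isinstance would reject Money even though the addition works without problems.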

Ultimately, this is the reason why Python programmers have to remember that the operator + can be used with types like int, string, and list, but cannot be used with dictionaries. Conversely, len can be used on dictionaries and lists, but cannot be used on integers. We need to remember it, as these operators are polymorphic (there is no operator int+ or float+) but don't make sense or are not implemented for some types.
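The examples mentioned above can be verified with a few lines. This is only a sketch, and the variable names are illustrative.

```python
# The operator + is defined for int, str, and list
int_sum = 1 + 2
str_sum = "ab" + "cd"
list_sum = [1] + [2]

# ...but not for dictionaries
try:
    {"a": 1} + {"b": 2}
    dict_sum_fails = False
except TypeError:
    dict_sum_fails = True

# len works on dictionaries and lists, but not on integers
dict_length = len({"a": 1})
list_length = len([1, 2, 3])

try:
    len(42)
    len_int_fails = False
except TypeError:
    len_int_fails = True
```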

Those basic operators and functions raise an exception when the wrong type is passed, so we might be tempted to do the same and explicitly raise an exception when the wrong type is passed to SimpleCalculator. Again, the focus is on behaviour and implementation. If our implementation doesn't work with instances of certain classes an exception will occur already, and we don't need to raise it explicitly. The aforementioned snippet SimpleCalculator().add("string1", "string2") would raise a TypeError because the underlying sum doesn't like strings.

In conclusion, my answers to the questions above are:

Can SimpleCalculator be used on non-integer numeric types? Probably, given the implementation is not specific to integers, but if we want to be sure we should add some tests to expose the functionality. So far, according to TDD, the class is certified to work with integers only. In this case, I might want to add some tests to show that it works with floats. But if someone feeds the class float-like objects that for some reason do not support the operator / some part of the calculator won't work, and there is no way to test all those conditions.

Can SimpleCalculator be used on non-numeric types? Yes, to a certain extent. mul can be used on sequences, for example. It is a calculator, though, so it doesn't make much sense to try to use it on non-numeric types. Users can feed the calculator any sort of non-numeric types and we cannot do anything to prevent it.

Shall we explicitly check in the code that the input values belong to a certain type? This goes against the nature of Python: if a certain function or method doesn't work with a specific type an exception will be raised.

Shall we write tests to rule out other types? Since it is basically impossible to write code that narrows the set of accepted types it is also impossible to write useful tests to check this. We can check that it doesn't work on strings, but what about other sequences? We can check it doesn't work with classes that inherit from Sequence, but what about classes that do not and behave the same?

In a dynamically typed language like Python, polymorphism and operator overloading are embedded in the language. I think the deeply polymorphic nature of Python is one of the most important aspects any user of this language should understand. It is an incredibly sharp double-edged sword, as it is at the same time extremely powerful and dangerous. "Everything is an object" might sound very simple at first, but it hides a degree of complexity that sooner or later has to be faced by those who want to be proficient with the language.

I wrote some posts that might help you to understand these topics. You can find them grouped here.

Final words

I hope you found the project entertaining and that you can now appreciate the power of TDD. The journey doesn't end here, though. In the next post I will discuss the practice of writing unit tests in depth, and then introduce you to another powerful tool: mocks.

Updates

2021-01-03: George fixed a typo, thanks!

2023-09-03: Dmitry Labazkin provided a new test for the method avg and a better implementation. He also asked relevant questions about type checking that I addressed in a new section. Thanks Dmitry!

Feedback

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

TDD in Python with pytest - Part 1

2020-09-10T10:30:00+02:00

This series of posts comes directly from my book Clean Architectures in Python. As I am reviewing the book to prepare a second edition, I realised that Harry Percival was right when he said that the initial part on TDD shouldn't be in the book. That's a prerequisite to follow the chapters on the clean architecture, but it is something many programmers already know and they might be surprised to find it in a book that discusses architectures.

So, I decided to move it here before I start working on a new version of the book. I also followed the advice of valorien, who pointed out that the main example had some bad naming choices, and so I reworked the code.

Introduction

Test-Driven Development (TDD) is fortunately one of the names that I can spot most frequently when people talk about methodologies. Unfortunately, many programmers still do not follow it, fearing that it will impose a further burden on the already difficult life of a developer.

In this chapter I will try to outline the basic concept of TDD and to show you how your job as a programmer can greatly benefit from it. I will develop a very simple project to show how to practically write software following this methodology.

TDD is a methodology, something that can help you to create better code. But it is not going to solve all your problems. As with all methodologies you have to pay attention not to commit blindly to it. Try to understand the reasons why certain practices are suggested by the methodology and you will also understand when and why you can or have to be flexible.

Keep also in mind that testing is a broader concept that doesn't end with TDD, which focuses a lot on unit testing, a specific type of test that helps you to develop the API of your library/package. There are other types of tests, like integration or functional ones, that are not specifically part of the TDD methodology, strictly speaking, even though the TDD approach can be extended to any testing activity.

A real-life example

Let's start with a simple example taken from a programmer's everyday life.

The programmer is in the office with other colleagues, trying to nail down an issue in some part of the software. Suddenly the boss storms into the office, and addresses the programmer:

Boss: I just met with the rest of the board. Our clients are not happy, we didn't fix enough bugs in the last two months.

Programmer: I see. How many bugs did we fix?

Boss: Well, not enough!

Programmer: OK, so how many bugs do we have to fix every month?

Boss: More!

I guess you feel very sorry for the poor programmer. Apart from the aggressive attitude of the boss, what is the real issue in this conversation? At the end of it there is no hint for the programmer and their colleagues about what to do next. They don't have any clue about what they have to change. They can definitely try to work harder, but the boss didn't refer to actual figures, so it will definitely be hard for the developers to understand if they improved "enough".

The classical sorites paradox may help to understand the issue. One of the standard formulations, taken from the Wikipedia page, is

1,000,000 grains of sand is a heap of sand (Premise 1)

A heap of sand minus one grain is still a heap. (Premise 2)

So 999,999 grains is a heap of sand.

A heap of sand minus one grain is still a heap. (Premise 2)

So 999,998 grains is a heap of sand.

So one grain is a heap of sand.

Where is the issue? The concept expressed by the word "heap" is nebulous: it is not defined clearly enough to allow the process to find a stable point, or a solution.

When you write software you face that same challenge. You cannot conceive a function and just expect it "to work", because this is not clearly defined. How do you test if the function that you wrote "works"? What do you mean by "works"? TDD forces you to clearly state your goal before you write the code. Actually, the TDD mantra is "Test first, code later", which can be translated to "Goal first, solution later". We will shortly see a practical example of this.

For the time being, consider that this is a valid practice also outside the realm of software creation. Whoever runs a business knows that you need to be able to extract some numbers (KPIs) from the activity of your company, because it is by comparing those numbers with some predefined thresholds that you can easily tell if the business is healthy or not. KPIs are a form of test, and you have to define them in advance, according to the expectations or needs that you have.

Pay attention. Nothing prevents you from changing the thresholds as a reaction to external events. You may consider that, given the incredible heat wave that hit your country, the number of coats that your company sold could not reach the goal. So, because of a specific event, you can justify a change in the test (KPI). If you didn't have the test you would have just generically recorded that you earned less money.

Going back to software and TDD, following this methodology you are forced to state clear goals like

sum(4, 5) == 9

Let me read this test for you: there will be a sum function available in the system that accepts two integers. If the two integers are 4 and 5 the function will return 9.

As you can see there are many things that are tested by this statement.

The function exists and can be imported
The function accepts two integers
Passing 4 and 5 as inputs, the output of the function will be 9.

Pay attention that at this stage there is no code that implements the function sum, so the test will fail for sure.

As we will see with a practical example in the next chapter, what I explained in this section will become a set of rules of the methodology.

A simple TDD project

The project we are going to develop is available at https://github.com/lgiordani/simple_calculator.

This project is purposefully extremely simple. You don't need to be an experienced Python programmer to follow this chapter, but you need to know the basics of the language. The goal of this series of posts is not that of making you write the best Python code, but that of allowing you to learn the TDD workflow, so don't be too worried if your code is not perfect.

Methodologies are like sports or arts: you cannot learn them just by reading their description in a book. You have to practice them. Thus, you should avoid as much as possible following this chapter by just reading the code passively. Instead, you should try to write the code and to try new solutions to the problems that I discuss. This is very important, as it actually makes you use TDD. This way, at the end of the chapter you will have a personal experience of what TDD is like.

The repository is tagged, and at the end of each section you will find a link to the relative tag that contains my working solution. Please note that it is entirely possible your solution is different from mine: there are several aspects of coding, like for example style, that are not related to unit testing and TDD.

Set up the project

Clone the project repository and move to the branch develop. The branch master contains the full solution, and I use it to maintain the repository, but if you want to code along you need to start from scratch. I recommend you fork the repository on GitHub so that you are able to commit your changes.

git clone https://github.com/YOURUSERNAME/simple_calculator
cd simple_calculator
git checkout --track origin/develop

Create a virtual environment following your preferred process and install the requirements

pip install -r requirements/dev.txt

You should at this point be able to run

pytest -svv

and get an output like

================================ test session starts ===============================
platform XXXX -- Python XXXX, pytest-XXXX, py-XXXX, pluggy-XXXX -- XXXX
cachedir: .pytest_cache
rootdir: XXXX
configfile: XXXX
plugins: XXXX
collected 0 items 

=============================== no tests ran in 0.02s ==============================

You can see here the operating system and a short list of the versions of the main packages involved in running pytest: Python, pytest itself, and some of its components and plugins. You can also see here where pytest is reading its configuration from. As this header is standard I will omit it from the output that I will show in the rest of the chapter. The specific versions of the packages are not important for this series.

Requirements

The goal of the project is to write a class SimpleCalculator that performs calculations: addition, subtraction, multiplication, and division. Addition and multiplication shall accept multiple arguments. Division shall return a float value, and division by zero shall return the string "inf". Multiplication by zero must raise a ValueError exception. The class will also provide a function to compute the average of an iterable like a list. This function gets two optional upper and lower thresholds and should remove from the computation the values that fall outside these boundaries.

As you can see the requirements are pretty simple, and a couple of them are definitely not "good" requirements, like the behaviour of division and multiplication. I added those requirements for the sake of example, to show how to deal with exceptions when developing in TDD.

An interesting topic to discuss is that of data types: shall the calculator perform addition between integers or between floats? What about complex numbers, strings, and other items that can be "added" together? And what about the other operations? I consider this an advanced topic, in particular in Python, so for now I will consider only integers as inputs and discuss the problem of different types later in the series.

Step 1 - Adding two numbers

The first test we are going to write is one that checks if the class SimpleCalculator can perform an addition. Add the following code to the file tests/test_main.py

tests/test_main.py

from simple_calculator.main import SimpleCalculator  # 1

def test_add_two_numbers():  # 2
    calculator = SimpleCalculator()

    result = calculator.add(4, 5)

    assert result == 9

As you can see the first thing we do is to import the class SimpleCalculator (1) that we are supposed to write. This class doesn't exist yet, but don't worry, you didn't skip any step.

The test is a standard function (2) (this is how pytest works), and the function name shall begin with test_ so that pytest can automatically discover all the tests. I tend to give my tests a descriptive name, so it is easier later to come back and understand what the test is about with a quick glance. You are free to follow the style you prefer but in general remember that naming components in a proper way is one of the most difficult things in programming. So better to get a handle on it as soon as possible.

The body of the test function is pretty simple. The class SimpleCalculator is instantiated, and the method add of the instance is called with two numbers, 4 and 5. The result is stored in the variable result, which is later the subject of the test itself. The statement assert result == 9 first computes result == 9 which is a boolean, with a value that is either True or False. The keyword assert, then, silently passes if the argument is True, but raises an exception if it is False.

And this is how you write tests in pytest: if your code doesn't raise any exception the test passes, otherwise it fails. The keyword assert is used to force an exception in case of wrong result. Remember that pytest doesn't consider the return value of the function, so it can detect a failure only if it raises an exception.
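The pass/fail rule described above can be sketched in a few lines. The helper run_test and the three sample functions are hypothetical names used only for illustration; real pytest does much more (discovery, reporting, fixtures), so take this as a simplified model and not as pytest's implementation.

```python
def run_test(test_function):
    """Hypothetical mini-runner: a test passes unless it raises an exception."""
    try:
        test_function()
    except Exception as exc:
        return f"FAILED ({type(exc).__name__})"
    return "PASSED"


def passing_test():
    assert 4 + 5 == 9


def failing_test():
    assert 4 + 5 == 10


def silent_test():
    # The comparison is False, but without assert nothing is raised
    4 + 5 == 10


print(run_test(passing_test))  # PASSED
print(run_test(failing_test))  # FAILED (AssertionError)
print(run_test(silent_test))   # PASSED
```

The last case shows why the keyword assert matters: a wrong result that never raises an exception goes unnoticed.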

Save the file and go back to the terminal. Execute pytest -svv and you should receive the following error message

====================================== ERRORS ======================================
_______________________ ERROR collecting tests/test_main.py _______________________

[...]

tests/test_main.py:4: in <module>
    from simple_calculator.main import SimpleCalculator
E   ImportError: cannot import name 'SimpleCalculator' from 'simple_calculator.main'
!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!
============================== 1 error in 0.20 seconds =============================

No surprise here, actually, as we just tried to use something that doesn't exist. This is good: the test is showing us that something we assumed to exist actually doesn't.

TDD rule number 1: Test first, code later

This, by the way, is not yet an error in a test. The error happens very early, during the test collection phase (as shown by the message in the bottom line Interrupted: 1 errors during collection). Given this, the methodology is still valid, as we wrote a test and it fails because of an error or a missing feature in the code.

Let's fix this issue. Open the file simple_calculator/main.py and add this code

simple_calculator/main.py

class SimpleCalculator:
    pass

But, I hear you scream, this class doesn't implement any of the requirements that are in the project. Yes, this is the hardest lesson you have to learn when you start using TDD. The development of the code is ruled by the tests, not by the requirements. The requirements are used to write the tests, the tests are used to write the code. You shouldn't worry about something that is more than one level above the current one in this workflow.

TDD rule number 2: Add the reasonably minimum amount of code you need to pass the tests

Run the test again, and this time you should receive a different error, that is

tests/test_main.py::test_add_two_numbers FAILED

===================================== FAILURES =====================================
______________________________ test_add_two_numbers _______________________________


    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       AttributeError: 'SimpleCalculator' object has no attribute 'add'

tests/test_main.py:10: AttributeError
============================= 1 failed in 0.04 seconds =============================

This is the first proper pytest failure report that we receive. You see a list of files containing tests and the result of each test

tests/test_main.py::test_add_two_numbers FAILED

Later we will see that the syntax FILENAME::TESTNAME can be given directly to pytest to run a single test. In this case we already have only one test, but later you might run a single failing test giving the name shown here on the command line. For example

pytest -svv tests/test_main.py::test_add_two_numbers

The second part of the output shows details on the failing tests, if any

______________________________ test_add_two_numbers _______________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       AttributeError: 'SimpleCalculator' object has no attribute 'add'

tests/test_main.py:10: AttributeError

For each failing test, pytest shows a header with the name of the test and the part of the code that raised the exception. At the end of each box, pytest shows the line of the test file where the error happened.

Back to the project. The new error is no surprise, as the test uses the method add that wasn't defined in the class. I bet you already guessed what I'm going to do, didn't you? This is the code that you should add to the class

simple_calculator/main.py

class SimpleCalculator:
    def add(self):
        pass

And again, as you notice, we made the smallest possible addition to the code to pass the test. Running pytest again you should receive a different error message

_______________________________ test_add_two_numbers _______________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       TypeError: add() takes 1 positional argument but 3 were given

tests/test_main.py:10: TypeError

The function we defined doesn't accept any argument other than self (def add(self)), but in the test we pass three of them (calculator.add(4, 5)). Remember that in Python self is passed implicitly when you call a method on an instance, so it counts as the first argument. Our move at this point is to change the function to accept the parameters that it is supposed to receive, namely two numbers. The code now becomes
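If the count of three arguments looks odd, the following small sketch may help. It reuses the version of the class that only accepts self and shows that calling a method on an instance is the same as calling the function on the class with the instance as first argument. The exact wording of the error message depends on the Python version, so it is captured and printed without assuming it.

```python
class SimpleCalculator:
    def add(self):
        pass


calculator = SimpleCalculator()

# These two calls are equivalent: Python passes the instance as self
calculator.add()
SimpleCalculator.add(calculator)

# Two explicit arguments plus the implicit self make three
try:
    calculator.add(4, 5)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```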

simple_calculator/main.py

class SimpleCalculator:
    def add(self, a, b):
        pass

Run the test again, and you will receive another error

______________________________ test_add_two_numbers ________________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

        result = calculator.add(4, 5)

>       assert result == 9
E       assert None == 9
E         -None
E         +9

tests/test_main.py:12: AssertionError

The function returns None, as it doesn't contain any code, while the test expects it to return 9. What do you think is the minimum code you can add to pass this test?

Well, the answer is

simple_calculator/main.py

class SimpleCalculator:
    def add(self, a, b):
        return 9

and this may surprise you (it should!). You might have been tempted to add some code that performs an addition between a and b, but this would violate the TDD principles, because you would have been driven by the requirements and not by the tests.

When you run pytest again, you will be rewarded by a success message

tests/test_main.py::test_add_two_numbers PASSED

I know this sounds weird, but think about it for a moment: if your code works (that is, it passes the tests), you don't need to change anything, as your tests should specify everything the code should do. Maybe in the future you will discover that this solution is not good enough, and at that point you will have to change it (this will happen with the next test, in this case). But for now everything works, and you shouldn't implement more than this.

Git tag: step-1-adding-two-numbers

Step 2 - Adding three numbers¶

The requirements state that "Addition and multiplication shall accept multiple arguments". This means that we should be able to execute not only add(4, 5) like we did, but also add(4, 5, 11), add(4, 5, 11, 2), and so on. We can start testing this behaviour with the following test, that you should put in tests/test_main.py, after the previous test that we wrote.

tests/test_main.py

def test_add_three_numbers():
    calculator = SimpleCalculator()

    result = calculator.add(4, 5, 6)

    assert result == 15

This test fails when we run the test suite

_____________________________ test_add_three_numbers _______________________________

    def test_add_three_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5, 6)
E       TypeError: SimpleCalculator.add() takes 3 positional arguments but 4 were given

tests/test_main.py:18: TypeError

for the obvious reason that the function we wrote in the previous section accepts only 2 arguments other than self. What is the minimum code that you can write to fix this test?

Well, the simplest solution is to add another argument, so my first attempt is

simple_calculator/main.py

class SimpleCalculator:
    def add(self, a, b, c):
        return 9

which solves the previous error, but creates a new one. If that wasn't enough, it also makes the first test fail!

______________________________ test_add_two_numbers ________________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       TypeError: SimpleCalculator.add() missing 1 required positional argument: 'c'

tests/test_main.py:10: TypeError
_____________________________ test_add_three_numbers _______________________________

    def test_add_three_numbers():
        calculator = SimpleCalculator()

        result = calculator.add(4, 5, 6)

>       assert result == 15
E       assert 9 == 15

tests/test_main.py:20: AssertionError

The first test now fails because the new add method requires three arguments and we are passing only two. The second test fails because the method add returns 9 and not 15 as expected by the test.

When multiple tests fail it's easy to feel discomforted and lost. Where are you supposed to start fixing this? Well, one possible solution is to undo the previous change and to try a different solution, but in general you should try to get to a situation in which only one test fails.

TDD rule number 3: You shouldn't have more than one failing test at a time

This is very important as it allows you to focus on one single test and thus one single problem. Clearly, we need to keep an eye on the global problem that we are trying to solve, but real test batteries can contain hundreds of tests and it is not practical to try to tackle all of them together.

Commenting out tests to make them inactive is a perfectly valid way to have only one failing test. Pytest, however, has a smarter solution: you can use the option -k that allows you to specify a matching name. That option has a lot of expressive power, but for now we can just give it the name of the test that we want to run

pytest -svv -k test_add_two_numbers

This option allows you to select multiple tests that share the same prefix, for example. If you want to run a single specific test you can also name it on the command line with the syntax we discussed previously

pytest -svv tests/test_main.py::test_add_two_numbers

Either way, pytest will run only the first test and return the same result returned before, since we didn't change the test itself

______________________________ test_add_two_numbers ________________________________

    def test_add_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.add(4, 5)
E       TypeError: SimpleCalculator.add() missing 1 required positional argument: 'c'

tests/test_main.py:10: TypeError

To fix this error we could revert the addition of the third argument, but this would mean going back to the previous solution. Tests focus on a very small part of the code, but we have to keep in mind what we are doing in terms of the big picture. A better solution is to add a default value to the third argument. The additive identity is 0, so the new code of the method add is

simple_calculator/main.py

class SimpleCalculator:
    def add(self, a, b, c=0):
        return 9

And this makes the first test pass. At this point we can run the full suite with pytest -svv and see what happens

_____________________________ test_add_three_numbers ______________________________

    def test_add_three_numbers():
        calculator = SimpleCalculator()

        result = calculator.add(4, 5, 6)

>       assert result == 15
E       assert 9 == 15

tests/test_main.py:20: AssertionError

The second test still fails, because the returned value that we hard coded doesn't match the expected one. At this point the tests show that our previous solution (return 9) is not sufficient anymore, and we have to try to implement something more complex.

I want to stress this. You should implement the minimal change in the code that makes tests pass. If that solution is not enough there will be a test that shows it. Now, as you can see, the addition of a new requirement changes the tests, adding a new one, and the old solution is not sufficient any more.

How can we solve this? We know that writing return 15 will make the first test fail (you may try, if you want), so here we have to be a bit smarter and try a better solution, that in this case is actually to implement a real sum

simple_calculator/main.py

class SimpleCalculator:
    def add(self, a, b, c=0):
        return a + b + c

This solution makes both tests pass, so the entire suite runs without errors.

Git tag: step-2-adding-three-numbers

I can see your face, you are probably frowning at the fact that it took us 10 minutes to write a method that performs the addition of two or three numbers. On the one hand, keep in mind that I'm going at a very slow pace, this being an introduction, and for these first tests it is better to take the time to properly understand every single step. Later, when you are used to TDD, some of these steps will be implicit. On the other hand, TDD is slower than untested development, but the time that you invest writing tests now is usually negligible compared to the amount of time you would spend trying to identify and fix bugs later.

Step 3 - Adding multiple numbers¶

The requirements are not yet satisfied, however, as they mention "multiple" numbers and not just three. How can we test that we can add a generic amount of numbers? We might add a test_add_four_numbers, a test_add_five_numbers, and so on, but this will cover specific cases and will never cover all of them. Sad to say, testing that generic condition is impossible or, at least in this case, so complex that it is not worth trying.

What you shall do in TDD is to test boundary cases. In general you should always try to find the so-called "corner cases" of your algorithm and write tests that show that the code covers them. For example, if you are testing some code that accepts as inputs a number from 1 to 100, you need a test that runs it with a generic number like 42 (which is far from being generic, but don't panic!), but you definitely want to have a specific test that runs the algorithm with the number 1 and one that runs with the number 100. You also want to have tests that show the algorithm doesn't work with 0 and with 101, but we will talk later about testing error conditions.
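As a sketch of what such boundary tests might look like, consider the example of inputs from 1 to 100. The function accepts is a hypothetical stand-in for the code under test, and here the rejected values simply return False, since testing error conditions properly is a topic for later.

```python
def accepts(number):
    """Hypothetical function under test: valid inputs go from 1 to 100."""
    return 1 <= number <= 100


def test_generic_value():
    assert accepts(42)


def test_lower_boundary():
    assert accepts(1)


def test_upper_boundary():
    assert accepts(100)


def test_just_below_lower_boundary():
    assert not accepts(0)


def test_just_above_upper_boundary():
    assert not accepts(101)
```

Each test states a single fact about one side of a boundary, so a failure points directly at the edge that broke.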

In our example there is no real limitation to the number of arguments that you pass to your function. Before Python 3.7 there was a limit of 255 arguments, which was removed in that version of the language, but these are limitations enforced by an external system, and they are not real boundaries of your algorithm.

The definition of "external system" obviously depends on what you are testing. If you are implementing a programming language you want to have tests that show how many arguments you can pass to a function, or that check the amount of memory used by certain language features. In this case we accept the Python language as the environment in which we work, so we don't want to test its features.

The solution, in this case, might be to test a reasonably high number of input arguments, to check that everything works. In particular, we should try to keep in mind that our goal is to devise a solution that is as generic as possible. For example, we easily realise that we cannot come up with a function like

    def add(self, a, b, c=0, d=0, e=0, f=0, g=0, h=0, i=0):

as it is not generic, it is just covering a greater amount of inputs (9, in this case, but not 10 or more).

That said, a good test might be the following

tests/test_main.py

def test_add_many_numbers():
    numbers = range(100)

    calculator = SimpleCalculator()

    result = calculator.add(*numbers)

    assert result == 4950

which creates an array (strictly speaking a range, which is an iterable) of all the numbers from 0 to 99. The sum of all those numbers is 4950, which is what the algorithm shall return.

Please note that the assertion doesn't implement any algorithm to find the solution. I calculated the answer manually and hard coded it in the test. You should try as much as possible to minimise the algorithmic complexity of tests, and instead "state the facts". The reason is simple: the more complex the code of the test is, the higher the chances of introducing a bug in the test.
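If you want to double-check a hard-coded value like this one, a one-off calculation in a Python shell is enough, and it stays out of the test suite. The sketch below uses the closed form for the sum of the integers from 0 to n - 1, which is an assumption of mine about how one might verify the number, not something the test itself should contain.

```python
n = 100

# Closed form for 0 + 1 + ... + (n - 1)
expected = n * (n - 1) // 2

print(expected)  # 4950
```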

The test suite fails because we are giving the function too many arguments

______________________________ test_add_many_numbers _______________________________

    def test_add_many_numbers():
        numbers = range(100)

        calculator = SimpleCalculator()

>       result = calculator.add(*numbers)
E       TypeError: SimpleCalculator.add() takes from 3 to 4 positional arguments but 101 were given

tests/test_main.py:28: TypeError

The minimum amount of code that we can add, this time, will not be so trivial, as we have to pass three tests. This is actually the greatest advantage of TDD: the tests that we wrote are still there and will check that the previous conditions are still satisfied. And since tests are committed with the code they will always be there.

The Python way to support a generic number of arguments (technically called variadic functions) is through the use of the syntax *args, which stores in args a tuple that contains all the arguments.

simple_calculator/main.py

class SimpleCalculator:
    def add(self, *args):
        return sum(args)

At that point we can use the built-in function sum to sum all the arguments. This solution makes the whole test suite pass without errors, so it is correct.
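To see what *args actually collects, here is a small sketch. The function show_args is a hypothetical helper that just returns what it receives.

```python
def show_args(*args):
    # args is always a tuple, even when no arguments are given
    return args


print(show_args())           # ()
print(show_args(4, 5))       # (4, 5)
print(show_args(*range(5)))  # (0, 1, 2, 3, 4)

# sum works on any iterable of numbers, including the tuple args
print(sum(show_args(4, 5, 6)))  # 15
```

The star in the call show_args(*range(5)) does the opposite job of the star in the definition: it unpacks an iterable into separate positional arguments.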

Git tag: step-3-adding-multiple-numbers

Pay attention here, please. In TDD, a solution is not correct when it is beautiful, when it is smart, or when it uses the latest feature of the language. All these things are good, but TDD wants your code to pass the tests. So, your code might be ugly, convoluted, and slow, but if it passes the test it is correct. This in turn means that TDD doesn't cover all the needs of your software project. Delivering fast routines, for example, might be part of the advantage you have on your competitors, but it is not really testable with the TDD methodology (typically, performance testing is done in a completely different way).

Part of the TDD methodology, then, deals with "refactoring", which means changing the code in a way that doesn't change the outputs, which in turn means that all your tests keep passing. Once you have a proper test suite in place, you can focus on the beauty of the code, or you can introduce smart solutions according to what the language allows you to do. We will discuss refactoring further later in this post.

TDD rule number 4: Write code that passes the test. Then refactor it.

Step 4 - Subtraction¶

From the requirements we know that we have to implement a function to subtract numbers, but this doesn't mention multiple arguments (as it would be complex to define what subtracting 3 or more numbers actually means). The test that implements this requirement is

tests/test_main.py

def test_subtract_two_numbers():
    calculator = SimpleCalculator()

    result = calculator.sub(10, 3)

    assert result == 7

which doesn't pass with the following error

____________________________ test_subtract_two_numbers ____________________________

    def test_subtract_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.sub(10, 3)
E       AttributeError: 'SimpleCalculator' object has no attribute 'sub'

tests/test_main.py:36: AttributeError

Now that you understand the TDD process, and that you know you should avoid over-engineering, you can also skip some of the steps that we went through in the previous sections. A good solution for this test is

simple_calculator/main.py

    def sub(self, a, b):
        return a - b

which makes the test suite pass.

Git tag: step-4-subtraction

Step 5 - Multiplication¶

It's time to move to multiplication, which has many similarities to addition. The requirements state that we have to provide a function to multiply numbers and that this function shall allow us to multiply multiple arguments. In TDD you should try to tackle problems one by one, possibly dividing a bigger requirement into multiple smaller ones.

In this case the first test can be the multiplication of two numbers, as it was for addition.

tests/test_main.py

def test_mul_two_numbers():
    calculator = SimpleCalculator()

    result = calculator.mul(6, 4)

    assert result == 24

And the test suite fails as expected with the following error

______________________________ test_mul_two_numbers _______________________________

    def test_mul_two_numbers():
        calculator = SimpleCalculator()

>       result = calculator.mul(6, 4)
E       AttributeError: 'SimpleCalculator' object has no attribute 'mul'

tests/test_main.py:44: AttributeError

We face now a classical TDD dilemma. Shall we implement the solution to this test as a function that multiplies two numbers, knowing that the next test will invalidate it, or shall we already consider that the target is that of implementing a variadic function and thus use *args directly?

In this case the choice is not really important, as we are dealing with very simple functions. In other cases, however, it might be worth recognising that we are facing the same issue we solved in a similar case and try to implement a smarter solution from the very beginning. In general, however, you should not implement anything that you don't plan to test in one of the next few tests that you will write.

If we decide to follow the strict TDD, that is implement the simplest first solution, the bare minimum code that passes the test would be

simple_calculator/main.py

    def mul(self, a, b):
        return a * b

Git tag: step-5-multiply-two-numbers

To show you how to deal with redundant tests I will in this case choose the second path, and implement a smarter solution for the present test. Keep in mind however that it is perfectly correct to implement the solution shown above and then move on and try to solve the problem of multiple arguments later.

The problem of multiplying a tuple of numbers can be solved in Python using the function reduce. This function implements a typical algorithm that "reduces" an array to a single number, applying a given function. The algorithm steps are the following

1. Apply the function to the first two elements
2. Remove the first two elements from the array
3. Apply the function to the result of the previous step and to the first element of the array
4. Remove the first element
5. If there are still elements in the array go back to step 3

So, suppose the function is

def mul2(a, b):
    return a * b

and the array is

a = [2, 6, 4, 8, 3]

The steps followed by the algorithm will be

1. Apply the function to 2 and 6 (first two elements). The result is 2 * 6, that is 12
2. Remove the first two elements, the array is now a = [4, 8, 3]
3. Apply the function to 12 (result of the previous step) and 4 (first element of the array). The new result is 12 * 4, that is 48
4. Remove the first element, the array is now a = [8, 3]
5. Apply the function to 48 (result of the previous step) and 8 (first element of the array). The new result is 48 * 8, that is 384
6. Remove the first element, the array is now a = [3]
7. Apply the function to 384 (result of the previous step) and 3 (first element of the array). The new result is 384 * 3, that is 1152
8. Remove the first element, the array is now empty and the procedure ends
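The steps above can be written out as a short loop. The function manual_reduce is a hypothetical name for an illustrative re-implementation; it is not how functools.reduce is written, and unlike the real function it assumes the input has at least two elements.

```python
from functools import reduce


def mul2(a, b):
    return a * b


def manual_reduce(function, items):
    """Illustrative version of the steps above (needs at least two items)."""
    remaining = list(items)

    # Apply the function to the first two elements and remove them
    result = function(remaining.pop(0), remaining.pop(0))

    # Keep combining the running result with the first remaining element
    while remaining:
        result = function(result, remaining.pop(0))

    return result


a = [2, 6, 4, 8, 3]

print(manual_reduce(mul2, a))  # 1152
print(reduce(mul2, a))         # 1152
```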

Going back to our class SimpleCalculator, we might import reduce from the module functools and use it on the array args. We need to provide a function that we can define in the function mul itself.

simple_calculator/main.py

from functools import reduce


class SimpleCalculator:
    [...]

    def mul(self, *args):
        def mul2(a, b):
            return a * b

        return reduce(mul2, args)

Git tag: step-5-multiply-two-numbers-smart

More information about the algorithm reduce can be found on the MapReduce Wikipedia page https://en.wikipedia.org/wiki/MapReduce. The Python function documentation can be found at https://docs.python.org/3.10/library/functools.html#functools.reduce.

The above code makes the test suite pass, so we can move on and address the next problem. As happened with addition we cannot properly test that the function accepts a potentially infinite number of arguments, so we can test a reasonably high number of inputs.

tests/test_main.py

def test_mul_many_numbers():
    numbers = range(1, 10)

    calculator = SimpleCalculator()

    result = calculator.mul(*numbers)

    assert result == 362880

Git tag: step-5-multiply-many-numbers

We might use 100 arguments as we did with addition, but the multiplication of all numbers from 1 to 100 gives a result with 158 digits and I don't really need to clutter the tests file with such a monstrosity. As I said, testing multiple arguments is testing a boundary, and the idea is that if the algorithm works for 2 numbers and for 9 it will work for 10 thousand arguments as well.
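The numbers involved can be checked once in a Python shell. This is only a side calculation, under the assumption that math.factorial is an acceptable way to verify the products by hand; it is not meant to go into the test.

```python
import math

# range(1, 10) yields the integers from 1 to 9, whose product is 9!
print(math.factorial(9))  # 362880

# Number of decimal digits of the product of the integers from 1 to 100
print(len(str(math.factorial(100))))
```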

If we run the test suite now all tests pass, and this should worry you.

Yes, you shouldn't be happy. When you follow TDD each new test that you add should fail. If it doesn't fail you should ask yourself if it is worth adding that test or not. This is because chances are that you are adding a useless test and we don't want to add useless code, because code has to be maintained, so the less the better.

In this case, however, we know why the test already passes. We implemented a smarter algorithm as a solution for the first test knowing that we would end up trying to solve a more generic problem. And the value of this new test is that it shows that multiple arguments can be used, while the first test doesn't.

So, after these considerations, we can be happy that the second test already passes.

TDD rule number 5: A test should fail the first time you run it. If it doesn't, ask yourself why you are adding it.

Step 6 - Refactoring¶

Previously, I introduced the concept of refactoring, which means changing the code without altering the results. How can you be sure you are not altering the behaviour of your code? Well, this is what the tests are for. If the new code keeps passing the test suite you can be sure that you didn't remove any feature.

In theory, refactoring shouldn't add any new behaviour to the code, as it should be a behaviour-preserving transformation. There is no real practical way to check this, and we will not bother with it now. You should be concerned with this if you are discussing security, as your code shouldn't add any entry point you don't want to be there. In this case you will need tests that check the absence of features instead of their presence.

This means that if you have no tests you shouldn't refactor. But, after all, if you have no tests you shouldn't have any code, either, so refactoring shouldn't be a problem you have. If you have some code without tests (I know you have it, I do), you should seriously consider writing tests for it, at least before changing it. More on this in a later section.

For the time being, let's see if we can work on the code of the class SimpleCalculator without altering the results. I do not really like the definition of the function mul2 inside the function mul. It is obviously perfectly fine and valid, but for the sake of example I will pretend we have to get rid of it.

Python provides a useful function to multiply two objects in the module operator of the standard library

simple_calculator/main.py

import operator
from functools import reduce


class SimpleCalculator:
    [...]

    def mul(self, *args):
        return reduce(operator.mul, args)

Running the test suite I can see that all the tests pass, so my refactoring is correct.
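As a side-by-side sketch of what this refactoring preserves, the two versions can be compared directly on the inputs used in this post. The names mul_before and mul_after are hypothetical, and plain functions replace the methods to keep the example short.

```python
import operator
from functools import reduce


def mul_before(*args):
    # Version with the nested helper function
    def mul2(a, b):
        return a * b

    return reduce(mul2, args)


def mul_after(*args):
    # Refactored version based on the standard library
    return reduce(operator.mul, args)


for numbers in [(6, 4), tuple(range(1, 10)), (2, 6, 4, 8, 3)]:
    assert mul_before(*numbers) == mul_after(*numbers)

print(mul_after(6, 4))  # 24
```

In the real project the test suite plays the role of this loop, which is why the refactoring needs no new tests.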

Git tag: step-6-refactoring

TDD rule number 6: Never refactor without tests.

Final words¶

Well, I think we learned a lot. We started with no knowledge of TDD and we managed to implement a fully tested class with 3 methods. We also briefly touched the topic of refactoring, which is of paramount importance in development. In the next post I will cover the remaining requirements: division, testing exceptions, and the average function.

Updates¶

2021-01-03: George fixed a typo, thanks!

2021-08-11: Andrea Mignone fixed a link. Thank you!

2023-09-03: Dmitry Labazkin and Ilaletdinov Almaz suggested using operator.mul instead of a lambda in the final refactoring. Thanks both!

Feedback¶

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

A game of tokens: write an interpreter in Python with TDD - Part 5

2020-08-09T18:00:00+01:00

Introduction¶

This is part 5 of A game of tokens, a series of posts where I build an interpreter in Python following a pure TDD methodology and engaging you in a sort of a game: I give you the tests and you have to write the code that passes them. After part 4 I had a long hiatus because I focused on other projects, but now I resurrected this series and I'm moving on.

First of all I reviewed the first 4 posts, merging the posts that contained the solutions. While this is definitely better for me, I think it might be better for the reader as well, as this way it should be easier to follow along. Remember however that you learn if you do, not if you read!

Secondly, I was wondering in which direction to go, and I decided to shamelessly follow the steps of Ruslan Spivak, who first inspired this set of posts and who set off to build a Pascal interpreter; you can find the impressive series of posts Ruslan wrote on his website. Thank you Ruslan for the great posts!

So, let's go Pascal!

Tools update¶

I introduced black into my development toolset, so I used it to reformat the code

black smallcalc/*.py tests/*.py

And added a configuration file .flake8 for Flake8 to prevent the two tools from clashing

[flake8]
# Recommend matching the black line length (default 88),
# rather than using the flake8 default of 79:
max-line-length = 100
ignore = E231 E741

Level 17 - Reserved keywords and new assignment¶

Since Pascal has reserved keywords, I need tokens that have the keyword itself as value (something similar to Erlang's atoms). For this reason I changed test_empty_token_has_length_zero into

def test_empty_token_has_the_length_of_the_type_itself():
    t = token.Token("sometype")

    assert len(t) == len("sometype")
    assert bool(t) is True

and modified the code in the class Token to pass it

    def __len__(self):
        return len(self.value) if self.value else len(self.type)
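To see why this makes the test pass, here is a simplified stand-in for the class Token. The constructor signature and the attributes type and value are assumptions made for this sketch; the real class in the project contains more than this.

```python
class Token:
    """Simplified stand-in for the project's Token class (assumed attributes)."""

    def __init__(self, _type, value=None):
        self.type = _type
        self.value = value

    def __len__(self):
        return len(self.value) if self.value else len(self.type)


keyword = Token("BEGIN")
name = Token("NAME", "abc")

print(len(keyword))   # 5, the length of the type
print(len(name))      # 3, the length of the value
print(bool(keyword))  # True
```

Since the class in this sketch doesn't define __bool__, Python falls back on __len__ to decide truthiness, so a token with a non-empty type is always true.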

The keywords I will introduce in this post are BEGIN and END, so I need a test that shows they are supported

def test_get_tokens_understands_begin_and_end():
    l = clex.CalcLexer()

    l.load("BEGIN END")

    assert l.get_tokens() == [
        token.Token(clex.BEGIN),
        token.Token(clex.END),
        token.Token(clex.EOL),
        token.Token(clex.EOF),
    ]

The block BEGIN ... END is a generic compound block in Pascal (more on this later), and a Pascal program is made of that plus a final dot. Since the dot is already used for floats I need a test that shows it is correctly lexed.

def test_get_tokens_understands_final_dot():
    l = clex.CalcLexer()

    l.load("BEGIN END.")

    assert l.get_tokens() == [
        token.Token(clex.BEGIN),
        token.Token(clex.END),
        token.Token(clex.DOT),
        token.Token(clex.EOL),
        token.Token(clex.EOF),
    ]

Last, Pascal assignments are slightly different from what we already implemented, as they use the symbol := instead of just =. We face a choice here, as we have to decide where to put the logic of our programming language: shall the lexer identify : and = separately, and let the parser deal with the two tokens in sequence, or shall we make the lexer emit an ASSIGNMENT token directly? I went for the first one, so that the lexer can be kept simple (no lookahead in it), but you are obviously free to try something different. For me the test that checks the assignment is

def test_get_tokens_understands_assignment_and_semicolon():
    l = clex.CalcLexer()

    l.load("a := 5;")

    assert l.get_tokens() == [
        token.Token(clex.NAME, "a"),
        token.Token(clex.LITERAL, ":"),
        token.Token(clex.LITERAL, "="),
        token.Token(clex.INTEGER, "5"),
        token.Token(clex.LITERAL, ";"),
        token.Token(clex.EOL),
        token.Token(clex.EOF),
    ]

You may have noticed I also decided to check for the semicolon in this test. Even here, we might discuss whether it's meaningful to test two different things together. Generally speaking I'm in favour of a high granularity in tests, which means that I try to avoid testing unrelated and complicated features together. In Pascal, the semicolon is used to separate statements, so it is likely to be found at the end of something like an assignment. For this reason, and considering that it's a small feature, I put it in a context inside this test, and will extract it if more complex requirements arise in the future.
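To make the trade-off between the two lexing strategies more concrete, here is a small sketch that is independent of CalcLexer. The two regular expressions and the function tokenize are hypothetical names invented for this example; they only differ in whether := is recognised as a single unit.

```python
import re

# Hypothetical patterns: names, integers, or any other non-blank character
SPLIT = re.compile(r"[A-Za-z_]+|\d+|\S")
# Same as above, but ":=" is tried first and kept together
MERGED = re.compile(r":=|[A-Za-z_]+|\d+|\S")


def tokenize(text, pattern):
    # findall returns the matched strings in order, skipping whitespace
    return pattern.findall(text)


print(tokenize("a := 5;", SPLIT))  # ['a', ':', '=', '5', ';']
print(tokenize("a := 5;", MERGED))  # ['a', ':=', '5', ';']
```

In this sketch the split version cannot tell "a := 5" from "a : = 5", because whitespace is dropped between tokens, so the parser receives the same sequence in both cases. The merged version avoids this at the cost of putting a bit more language knowledge into the lexer.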

The parser has to be changed to support the new assignment, and to do that we first need to change the tests. The symbol = has to be replaced with := in the following tests: test_parse_assignment, test_parse_assignment_with_expression, test_parse_assignment_expression_with_variables, and test_parse_line_supports_assigment.

Solution¶

Supporting reserved keywords is just a matter of defining specific token types for them

BEGIN = "BEGIN"
END = "END"

RESERVED_KEYWORDS = [BEGIN, END]

and changing the method _process_name in order to detect them

def _process_name(self):
    regexp = re.compile(r"[a-zA-Z_]+")

    match = regexp.match(self._text_storage.tail)

    if not match:
        return None

    token_string = match.group()

    if token_string in RESERVED_KEYWORDS:
        tok = token.Token(token_string)
    else:
        tok = token.Token(NAME, token_string)

    return self._set_current_token_and_skip(tok)

I decided to put the logic in this method because after all reserved keywords are exactly names with a specific meaning. I might have created a dedicated method _process_keyword but it would basically have been a copy of _process_name so this solution makes sense to me.

To support the final dot I added a token for it

DOT = "DOT"

and a processing method

    def _process_dot(self):
        regexp = re.compile(r"\.$")

        match = regexp.match(self._text_storage.tail)

        if match:
            return self._set_current_token_and_skip(token.Token(DOT))

which is then introduced with a high priority in get_token

    def get_token(self):
        eof = self._process_eof()
        if eof:
            return eof

        eol = self._process_eol()
        if eol:
            return eol

        dot = self._process_dot()
        if dot:
            return dot

        self._process_whitespace()

        name = self._process_name()
        if name:
            return name

        number = self._process_number()
        if number:
            return number

        literal = self._process_literal()
        if literal:
            return literal
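The regular expression used in _process_dot deserves a closer look, as it is what keeps the final dot separate from the dot of a float. The sketch below uses only the standard module re; the helper is_final_dot is a hypothetical name and the strings are sample inputs that I chose to show how the pattern behaves.

```python
import re

DOT_RE = re.compile(r"\.$")


def is_final_dot(tail):
    # match() anchors at the beginning of the string, "$" at its end
    return DOT_RE.match(tail) is not None


print(is_final_dot("."))  # True
print(is_final_dot(".5"))  # False, something follows the dot
print(is_final_dot(".\n"))  # True, "$" also matches just before a trailing newline
print(is_final_dot("END."))  # False, match() only looks at the start
```

The pattern accepts a dot only when nothing but an optional final newline follows it, so a dot followed by digits is left to the other processing methods.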

To pass the parser tests I just need to change the implementation of parse_assignment

    def parse_assignment(self):
        variable = self._parse_variable()
        self.lexer.discard(token.Token(clex.LITERAL, ":"))
        self.lexer.discard(token.Token(clex.LITERAL, "="))
        value = self.parse_expression()

Level 18 - Statements and compound statements¶

In Pascal a compound statement is a list of statements enclosed between BEGIN and END, so the final grammar we want to have in this post is

compound_statement : BEGIN statement_list END

statement_list : statement | statement SEMI statement_list

statement : compound_statement | assignment_statement | empty

assignment_statement : variable ASSIGN expr

As you can see this is a recursive definition, as the statement_list contains one or more statements, and each of them can be a compound_statement. The following is indeed a valid Pascal program

BEGIN
    BEGIN
        BEGIN
            writeln('Valid!')
        END
    END
END.
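Before moving to the tests, it might help to see the recursion of the grammar in isolation. The following is a rough sketch that works on a plain list of strings instead of real tokens, and it is not the parser developed in this post: the function names, the tuple used for assignments, and the choice of treating := as a single token (the ASSIGN of the grammar) are all assumptions made for the example.

```python
def parse_compound(tokens, pos=0):
    """compound_statement : BEGIN statement_list END"""
    if pos >= len(tokens) or tokens[pos] != "BEGIN":
        raise SyntaxError("BEGIN expected")
    statements, pos = parse_statement_list(tokens, pos + 1)
    if pos >= len(tokens) or tokens[pos] != "END":
        raise SyntaxError("END expected")
    return statements, pos + 1


def parse_statement_list(tokens, pos):
    """statement_list : statement | statement SEMI statement_list"""
    statements = []
    while True:
        statement, pos = parse_statement(tokens, pos)
        if statement is not None:
            statements.append(statement)
        if pos < len(tokens) and tokens[pos] == ";":
            pos += 1
        else:
            return statements, pos


def parse_statement(tokens, pos):
    """statement : compound_statement | assignment_statement | empty"""
    if pos < len(tokens) and tokens[pos] == "BEGIN":
        # This is where the recursion happens
        return parse_compound(tokens, pos)
    if pos + 2 < len(tokens) and tokens[pos + 1] == ":=":
        return (tokens[pos], tokens[pos + 2]), pos + 3
    return None, pos  # empty statement


source = "BEGIN x := 5 ; BEGIN y := 6 END ; z := 7 END"
tree, _ = parse_compound(source.split())
print(tree)  # [('x', '5'), [('y', '6')], ('z', '7')]
```

A nested compound statement shows up as a nested list, which mirrors the way statement refers back to compound_statement in the grammar.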

Recursive algorithms are not simple, and it takes some time to tackle them properly. Let's try to implement one small feature at a time. The first test is that parse_statement should be able to parse assignments

def test_parse_statement_assignment():
    p = cpar.CalcParser()
    p.lexer.load("x := 5")

    node = p.parse_statement()

    assert node.asdict() == {
        "type": "assignment",
        "variable": "x",
        "value": {"type": "integer", "value": 5},
    }

In future, statements will be more than just assignments, so this test is the first of many others that we will eventually have for parse_statement. The second test we need is that a compound statement can contain an empty list of statements.

def test_parse_empty_compound_statement():
    p = cpar.CalcParser()
    p.lexer.load("BEGIN END")

    node = p.parse_compound_statement()

    assert node.asdict() == {"type": "compound_statement", "statements": []}

After this is done, I want to test that the compound statement can contain a single statement

def test_parse_compound_statement_one_statement():
    p = cpar.CalcParser()
    p.lexer.load("BEGIN x:= 5 END")

    node = p.parse_compound_statement()

    assert node.asdict() == {
        "type": "compound_statement",
        "statements": [
            {
                "type": "assignment",
                "variable": "x",
                "value": {"type": "integer", "value": 5},
            }
        ],
    }

and multiple statements separated by semicolon

def test_parse_compound_statement_multiple_statements():
    p = cpar.CalcParser()
    p.lexer.load("BEGIN x:= 5; y:=6; z:=7 END")

    node = p.parse_compound_statement()

    assert node.asdict() == {
        "type": "compound_statement",
        "statements": [
            {
                "type": "assignment",
                "variable": "x",
                "value": {"type": "integer", "value": 5},
            },
            {
                "type": "assignment",
                "variable": "y",
                "value": {"type": "integer", "value": 6},
            },
            {
                "type": "assignment",
                "variable": "z",
                "value": {"type": "integer", "value": 7},
            },
        ],
    }

Solution¶

To pass the first test it is sufficient to add a method parse_statement that calls parse_assignment

    def parse_statement(self):
        with self.lexer:
            return self.parse_assignment()

The second test requires a bit more code. I need to define a method parse_compound_statement and this has to return a specific new type of node. A compound statement is a list of statements that have to be executed in order, so it's time to define a class CompoundStatementNode

class CompoundStatementNode(Node):

    node_type = "compound_statement"

    def __init__(self, statements=None):
        self.statements = statements if statements else []

    def asdict(self):
        return {
            "type": self.node_type,
            "statements": [statement.asdict() for statement in self.statements],
        }

and at this point parse_compound_statement is trivial, at least for now

    def parse_compound_statement(self):
        self.lexer.discard(token.Token(clex.BEGIN))
        self.lexer.discard(token.Token(clex.END))

        return CompoundStatementNode()

With the third test we have to add the processing of a single statement. As this is optional, it's a good use case for our lexer as a context manager

    def parse_compound_statement(self):
        nodes = []

        self.lexer.discard(token.Token(clex.BEGIN))

        with self.lexer:
            statement_node = self.parse_statement()
            if statement_node:
                nodes.append(statement_node)

        self.lexer.discard(token.Token(clex.END))

        return CompoundStatementNode(nodes)
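If you don't remember how a lexer can act as a context manager, the idea can be sketched as follows. This is a simplified illustration and not the actual CalcLexer code: the class BacktrackingCursor, its method take, and the use of ValueError are all invented for the example. The key points are that the position is saved on entry, and that on failure it is restored and the exception is suppressed.

```python
class BacktrackingCursor:
    """Hypothetical cursor over a list of items that can undo failed attempts."""

    def __init__(self, items):
        self.items = list(items)
        self.position = 0
        self._saved = []

    def take(self, expected):
        # Consume the next item or fail if it is not the expected one
        if self.position >= len(self.items) or self.items[self.position] != expected:
            raise ValueError(f"expected {expected!r}")
        self.position += 1

    def __enter__(self):
        self._saved.append(self.position)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        saved = self._saved.pop()
        if exc_type is not None and issubclass(exc_type, ValueError):
            self.position = saved  # go back to where the attempt started
            return True  # suppress the exception
        return False


cursor = BacktrackingCursor(["BEGIN", "END"])
cursor.take("BEGIN")

with cursor:
    cursor.take("x")  # fails: the position is restored and the error swallowed

print(cursor.position)  # 1
cursor.take("END")
print(cursor.position)  # 2
```

With this behaviour an optional element of the grammar can be attempted inside a with block, and the code after the block runs from a consistent position whether the attempt succeeded or not.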

And finally, for the fourth test, I have to process optional further statements separated by semicolons. For this, I make use of the method peek_token to look ahead and see if there is another statement to process

    def parse_compound_statement(self):
        nodes = []

        self.lexer.discard(token.Token(clex.BEGIN))

        with self.lexer:
            statement_node = self.parse_statement()
            if statement_node:
                nodes.append(statement_node)

            while self.lexer.peek_token() == token.Token(clex.LITERAL, ";"):
                self.lexer.discard(token.Token(clex.LITERAL, ";"))

                statement_node = self.parse_statement()

                if statement_node:
                    nodes.append(statement_node)

        self.lexer.discard(token.Token(clex.END))

        return CompoundStatementNode(nodes)

Level 19 - Recursive compound statements¶

To verify that compound statements are actually recursive, we can add this test

def test_parse_compound_statement_multiple_statements_with_compund_statement():
    p = cpar.CalcParser()
    p.lexer.load("BEGIN x:= 5; BEGIN y := 6 END ; z:=7 END")

    node = p.parse_compound_statement()

    assert node.asdict() == {
        "type": "compound_statement",
        "statements": [
            {
                "type": "assignment",
                "variable": "x",
                "value": {"type": "integer", "value": 5},
            },
            {
                "type": "compound_statement",
                "statements": [
                    {
                        "type": "assignment",
                        "variable": "y",
                        "value": {"type": "integer", "value": 6},
                    }
                ],
            },
            {
                "type": "assignment",
                "variable": "z",
                "value": {"type": "integer", "value": 7},
            },
        ],
    }

where the second statement is a compound statement itself. After this is done we can test the visitor (tests/test_calc_visitor.py) and see if we can process single statements

def test_visitor_compound_statement_one_statement():
    ast = {
        "type": "compound_statement",
        "statements": [
            {
                "type": "assignment",
                "variable": "x",
                "value": {"type": "integer", "value": 5},
            }
        ],
    }

    v = cvis.CalcVisitor()
    assert v.visit(ast) is None
    assert v.isvariable("x") is True
    assert v.valueof("x") == 5
    assert v.typeof("x") == "integer"

Multiple statements

def test_visitor_compound_statement_multiple_statements():
    ast = {
        "type": "compound_statement",
        "statements": [
            {
                "type": "assignment",
                "variable": "x",
                "value": {"type": "integer", "value": 5},
            },
            {
                "type": "assignment",
                "variable": "y",
                "value": {"type": "integer", "value": 6},
            },
            {
                "type": "assignment",
                "variable": "z",
                "value": {"type": "integer", "value": 7},
            },
        ],
    }

    v = cvis.CalcVisitor()
    assert v.visit(ast) is None

    assert v.isvariable("x") is True
    assert v.valueof("x") == 5
    assert v.typeof("x") == "integer"

    assert v.isvariable("y") is True
    assert v.valueof("y") == 6
    assert v.typeof("y") == "integer"

    assert v.isvariable("z") is True
    assert v.valueof("z") == 7
    assert v.typeof("z") == "integer"

and recursive compound statements

def test_visitor_compound_statement_multiple_statements_with_compund_statement():
    ast = {
        "type": "compound_statement",
        "statements": [
            {
                "type": "assignment",
                "variable": "x",
                "value": {"type": "integer", "value": 5},
            },
            {
                "type": "compound_statement",
                "statements": [
                    {
                        "type": "assignment",
                        "variable": "y",
                        "value": {"type": "integer", "value": 6},
                    }
                ],
            },
            {
                "type": "assignment",
                "variable": "z",
                "value": {"type": "integer", "value": 7},
            },
        ],
    }

    v = cvis.CalcVisitor()
    assert v.visit(ast) is None

    assert v.isvariable("x") is True
    assert v.valueof("x") == 5
    assert v.typeof("x") == "integer"

    assert v.isvariable("y") is True
    assert v.valueof("y") == 6
    assert v.typeof("y") == "integer"

    assert v.isvariable("z") is True
    assert v.valueof("z") == 7
    assert v.typeof("z") == "integer"

Solution¶

Before I added the first test I quickly refactored the code to follow the grammar a bit more closely, introducing parse_statement_list and calling it from parse_compound_statement. This is just a matter of isolating the part of the code that deals with the list of statements in its own method

    def parse_statement_list(self):
        nodes = []

        statement_node = self.parse_statement()
        if statement_node:
            nodes.append(statement_node)

        while self.lexer.peek_token() == token.Token(clex.LITERAL, ";"):
            self.lexer.discard(token.Token(clex.LITERAL, ";"))

            statement_node = self.parse_statement()

            if statement_node:
                nodes.append(statement_node)

        return nodes

    def parse_compound_statement(self):
        nodes = []

        self.lexer.discard(token.Token(clex.BEGIN))

        with self.lexer:
            nodes = self.parse_statement_list()

        self.lexer.discard(token.Token(clex.END))

        return CompoundStatementNode(nodes)

after this I introduce the new test, and to pass it I need to change parse_statement so that it parses either an assignment or a compound statement

    def parse_statement(self):
        with self.lexer:
            return self.parse_assignment()

        return self.parse_compound_statement()
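The control flow of this method may look odd, as there is a return inside the with block followed by a second return. It works because the context manager suppresses the exception raised by a failed attempt, so execution continues after the block. The same pattern can be reproduced with contextlib.suppress from the standard library; the function names in this sketch are hypothetical and have nothing to do with the parser.

```python
from contextlib import suppress


def parse_int(text):
    return int(text)


def parse_number(text):
    # First attempt: if it succeeds the function returns from inside the block
    with suppress(ValueError):
        return parse_int(text)

    # Reached only if the first attempt raised ValueError
    return float(text)


print(parse_number("5"))  # 5
print(parse_number("5.5"))  # 5.5
```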

Before I move to the visitor, I want to discuss a choice that I have here. The current version of the method parse_statement_list

    def parse_statement_list(self):
        nodes = []

        statement_node = self.parse_statement()
        if statement_node:
            nodes.append(statement_node)

        while self.lexer.peek_token() == token.Token(clex.LITERAL, ";"):
            self.lexer.discard(token.Token(clex.LITERAL, ";"))

            statement_node = self.parse_statement()

            if statement_node:
                nodes.append(statement_node)

        return nodes

might be easily written in a recursive way, to better match the grammar, becoming

    def parse_statement_list(self):
        nodes = []

        statement_node = self.parse_statement()
        if statement_node:
            nodes.append(statement_node)

        with self.lexer:
            self.lexer.discard(token.Token(clex.LITERAL, ";"))
            nodes.extend(self.parse_statement_list())

        return nodes

If you replace the code you can see that all the tests pass, so the solution is technically correct. While recursive algorithms are elegant and compact, in this case I will stick to the first version. Using a recursive approach introduces a limit on the depth of the calls (Python raises RecursionError when it is exceeded), and while in this little project we probably won't run into this issue, I think it is worth mentioning. Both solutions are correct, though, so feel free to choose the recursive path if you happen to like it more.
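The limit on recursive calls is easy to observe outside the parser. The two functions below are toy examples written for this purpose (they just count down from a number), and the depth is chosen as twice the current limit reported by sys.getrecursionlimit, so that the recursive version is guaranteed to exceed it.

```python
import sys


def count_recursive(n):
    # One stack frame per step
    return 0 if n == 0 else 1 + count_recursive(n - 1)


def count_iterative(n):
    # Constant stack usage
    total = 0
    while n > 0:
        total += 1
        n -= 1
    return total


depth = sys.getrecursionlimit() * 2

try:
    count_recursive(depth)
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed)  # True
print(count_iterative(depth) == depth)  # True
```

A program would need a very long list of statements to hit the limit in the recursive parse_statement_list, which is why this is more of a note than a practical concern here.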

The tests for the visitor can be passed with a minimal change, as the visitor itself just needs to be aware of compound_statement nodes and to know how to process them. So, I added a new condition to the method visit

        if node["type"] == "compound_statement":
            for statement in node["statements"]:
                self.visit(statement)

which passes all three new tests added for the visitor.
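To show why such a small change is enough, here is a reduced visitor that only knows integers, assignments, and compound statements. It is a sketch written for this explanation and not the CalcVisitor of the project: the class name MiniVisitor and the way variables are stored are assumptions. Since visit calls itself for each statement, nested compound statements are handled without any additional code.

```python
class MiniVisitor:
    """Hypothetical reduced visitor working on dictionary-based ASTs."""

    def __init__(self):
        self.variables = {}

    def visit(self, node):
        if node["type"] == "integer":
            return node["value"], node["type"]

        if node["type"] == "assignment":
            value, type_name = self.visit(node["value"])
            self.variables[node["variable"]] = {"value": value, "type": type_name}
            return None

        if node["type"] == "compound_statement":
            # Each statement can be a compound statement itself
            for statement in node["statements"]:
                self.visit(statement)
            return None

        raise ValueError(f"Unknown node type {node['type']}")


def assignment(name, value):
    return {
        "type": "assignment",
        "variable": name,
        "value": {"type": "integer", "value": value},
    }


ast = {
    "type": "compound_statement",
    "statements": [
        assignment("x", 5),
        {"type": "compound_statement", "statements": [assignment("y", 6)]},
        assignment("z", 7),
    ],
}

v = MiniVisitor()
v.visit(ast)
print(sorted(v.variables))  # ['x', 'y', 'z']
```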

Level 20 - Pascal programs and case insensitive names¶

A Pascal program ends with a dot, so we should introduce a new endpoint parse_program and test that it works. The first test verifies that we can parse an empty program

def test_parse_empty_program():
    p = cpar.CalcParser()
    p.lexer.load("BEGIN END.")

    node = p.parse_program()

    assert node.asdict() == {"type": "compound_statement", "statements": []}

and the second tests that the final dot can't be missing

import pytest

from smallcalc.calc_lexer import TokenError


def test_parse_program_requires_the_final_dot():
    p = cpar.CalcParser()
    p.lexer.load("BEGIN END")

    with pytest.raises(TokenError):
        p.parse_program()

Notice that I imported pytest and the TokenError exception to build a negative test (i.e. to test something that fails). The last test verifies a non-empty program can be parsed

def test_parse_program_with_nested_statements():
    p = cpar.CalcParser()
    p.lexer.load("BEGIN x:= 5; BEGIN y := 6 END ; z:=7 END.")

    node = p.parse_program()

    assert node.asdict() == {
        "type": "compound_statement",
        "statements": [
            {
                "type": "assignment",
                "variable": "x",
                "value": {"type": "integer", "value": 5},
            },
            {
                "type": "compound_statement",
                "statements": [
                    {
                        "type": "assignment",
                        "variable": "y",
                        "value": {"type": "integer", "value": 6},
                    }
                ],
            },
            {
                "type": "assignment",
                "variable": "z",
                "value": {"type": "integer", "value": 7},
            },
        ],
    }

When all these tests pass we are almost done for this post, and we just need to make the parser treat names in a case insensitive way. In Pascal, both variables and keywords are case-insensitive, so BEGIN and begin are the same keyword (or BeGiN, though I think this might be a misinterpretation of the concept of "snake case" =) ), and the same is valid for variables: you can define MYVAR and use myvar.

To test this behaviour I changed the test test_get_tokens_understands_uppercase_letters into test_get_tokens_is_case_insensitive

def test_get_tokens_is_case_insensitive():
    l = clex.CalcLexer()

    l.load("SomeVar")

    assert l.get_tokens() == [
        token.Token(clex.NAME, "somevar"),
        token.Token(clex.EOL),
        token.Token(clex.EOF),
    ]

and added the test for the two keywords we defined so far

def test_get_tokens_understands_begin_and_end_case_insensitive():
    l = clex.CalcLexer()

    l.load("begin end")

    assert l.get_tokens() == [
        token.Token(clex.BEGIN),
        token.Token(clex.END),
        token.Token(clex.EOL),
        token.Token(clex.EOF),
    ]

Solution¶

To parse a program we need to introduce the aptly named endpoint parse_program, which just parses a compound statement (the program) and the final dot.

    def parse_program(self):
        compound_statement = self.parse_compound_statement()
        self.lexer.discard(token.Token(clex.DOT))

        return compound_statement

As for the case insensitive names, it's just a matter of changing the method _process_name

    def _process_name(self):
        regexp = re.compile(r"[a-zA-Z_]+")

        match = regexp.match(self._text_storage.tail)

        if not match:
            return None

        token_string = match.group()

        if token_string.upper() in RESERVED_KEYWORDS:
            tok = token.Token(token_string.upper())
        else:
            tok = token.Token(NAME, token_string.lower())

        return self._set_current_token_and_skip(tok)

Note that I decided to store keywords internally with uppercase names and variables with lowercase ones. This is really just a matter of personal taste at this point of the project (and probably will always be), so feel free to follow the structure you like the most.
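The normalisation rule can be isolated in a few lines. The function classify_name below is a hypothetical helper that mimics the decision taken in _process_name and returns a plain tuple instead of a Token, just to make the rule easy to try out.

```python
RESERVED_KEYWORDS = ["BEGIN", "END"]


def classify_name(text):
    # Keywords are stored in uppercase and have no value
    if text.upper() in RESERVED_KEYWORDS:
        return (text.upper(), None)

    # Any other name is a variable, stored in lowercase
    return ("NAME", text.lower())


print(classify_name("BeGiN"))  # ('BEGIN', None)
print(classify_name("MyVar"))  # ('NAME', 'myvar')
```

A consequence of this rule is that no spelling of a keyword can be used as a variable name, which is what you would expect from reserved keywords.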

Final words¶

That was something! I was honestly impressed by how easily I could introduce changes in the language and add new features, which shows that the TDD methodology is a really powerful tool to have in your belt. Thanks again to Ruslan Spivak for his work and his inspiring posts!

The code I developed in this post is available on the GitHub repository tagged with part5 (link).

Feedback¶

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

Flask project setup: TDD, Docker, Postgres and more - Part 3

2020-07-07T13:00:00+01:00

In this series of posts I explore the development of a Flask project with a setup that is built with efficiency and tidiness in mind, using TDD, Docker and Postgres.

Catch-up¶

In the first and second posts I created a Flask project with a tidy setup, using Docker to run the development environment and the tests, and mapping important commands in a management script, so that the configuration can be in a single file and drive the whole system.

In this post I will show you how to easily create scenarios, that is databases created on the fly with custom data, so that it is possible to test queries in isolation, either with the Flask application or with the command line. I will also show you how to define a configuration for production and give some hints for the deployment.

Step 1 - Creating scenarios¶

The idea of scenarios is simple. Sometimes you need to investigate specific use cases for bugs, or maybe improve the performance of some database queries, and you might need to do this on a customised database. This is a scenario, a Python file that populates the database with a specific set of data and that allows you to run the application or the database shell on it.

Often the development database is a copy of the production one, maybe with sensitive data stripped to avoid leaking private information, and while this gives us a realistic case in which to test queries (e.g. how does the query perform on 1 million rows?) it might not help during the initial investigations, where you need to have all the data in front of you to properly understand what happens. Anyone who has learned how joins work in relational databases understands what I mean here.

In principle, to create a scenario we just need to spin up an empty database and to run the scenario code against it. In practice, things are not much more complicated, but there are a couple of minor issues that we need to solve.

First, I am already running a database for the development and one for the testing. The second is ephemeral, but I decided to set up the project so that I can run the tests while the development database is up, and the way I did it was using port 5432 (the standard Postgres one) for development and 5433 for testing. Spinning up scenarios adds more databases to the equation. Clearly I do not expect to run 5 scenarios at the same time while running the development and the test databases, but I make myself a rule to make something generic as soon as I do it for the third time.

This means that I won't create a database for a scenario on port 5434 and will instead look for a more generic solution. This is offered to me by the Docker networking model, where I can map a container port to the host without assigning the destination port, which is then chosen randomly by Docker itself among the unprivileged ones. This means that I can create a Postgres container mapping port 5432 (the port in the container) and having Docker connect it to port 32838 in the host (for example). As long as the application knows which port to use this is absolutely the same as using port 5432.

Unfortunately the Docker interface is not extremely script-friendly when it comes to providing information and I have to parse the output a bit. Practically speaking, after I spin up the containers, I will run the command docker-compose port db 5432 which will return a string like 0.0.0.0:32838, and I will extract the port from it. Nothing major, but these are the (sometimes many) issues you face when you orchestrate different systems together.
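The string processing involved can be sketched in isolation. The function extract_port is a hypothetical helper, not part of the management script, and the sample strings are made up for the example. It splits on the last colon and reads only the first line, under the assumption that the output might at some point contain an IPv6-style address or more than one line; with an output like 0.0.0.0:32838 it gives the same result as a plain split on the colon.

```python
def extract_port(output):
    # Use the first non-empty line and take what follows the last colon
    first_line = output.strip().splitlines()[0]
    return first_line.rsplit(":", 1)[1]


print(extract_port("0.0.0.0:32838\n"))  # 32838
print(extract_port("[::]:32838\n"))  # 32838
```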

The new management script is

manage.py

#! /usr/bin/env python

import os
import json
import signal
import subprocess
import time
import shutil

import click
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT


# Ensure an environment variable exists and has a value
def setenv(variable, default):
    os.environ[variable] = os.getenv(variable, default)


setenv("APPLICATION_CONFIG", "development")

APPLICATION_CONFIG_PATH = "config"
DOCKER_PATH = "docker"


def app_config_file(config):
    return os.path.join(APPLICATION_CONFIG_PATH, f"{config}.json")


def docker_compose_file(config):
    return os.path.join(DOCKER_PATH, f"{config}.yml")


def configure_app(config):
    # Read configuration from the relative JSON file
    with open(app_config_file(config)) as f:
        config_data = json.load(f)

    # Convert the config into a usable Python dictionary
    config_data = dict((i["name"], i["value"]) for i in config_data)

    for key, value in config_data.items():
        setenv(key, value)


@click.group()
def cli():
    pass


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def flask(subcommand):
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = ["flask"] + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def docker_compose_cmdline(commands_string=None):
    config = os.getenv("APPLICATION_CONFIG")
    configure_app(config)

    compose_file = docker_compose_file(config)

    if not os.path.isfile(compose_file):
        raise ValueError(f"The file {compose_file} does not exist")

    command_line = [
        "docker-compose",
        "-p",
        config,
        "-f",
        compose_file,
    ]

    if commands_string:
        command_line.extend(commands_string.split(" "))

    return command_line


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def compose(subcommand):
    cmdline = docker_compose_cmdline() + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def run_sql(statements):
    conn = psycopg2.connect(
        dbname=os.getenv("POSTGRES_DB"),
        user=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PASSWORD"),
        host=os.getenv("POSTGRES_HOSTNAME"),
        port=os.getenv("POSTGRES_PORT"),
    )

    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    cursor = conn.cursor()
    for statement in statements:
        cursor.execute(statement)

    cursor.close()
    conn.close()


def wait_for_logs(cmdline, message):
    logs = subprocess.check_output(cmdline)
    while message not in logs.decode("utf-8"):
        time.sleep(0.1)
        logs = subprocess.check_output(cmdline)


@cli.command()
def create_initial_db():
    configure_app(os.getenv("APPLICATION_CONFIG"))

    try:
        run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])
    except psycopg2.errors.DuplicateDatabase:
        print(
            f"The database {os.getenv('APPLICATION_DB')} already exists and will not be recreated"
        )


@cli.command()
@click.argument("filenames", nargs=-1)
def test(filenames):
    os.environ["APPLICATION_CONFIG"] = "testing"
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = docker_compose_cmdline("up -d")
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline("logs db")
    wait_for_logs(cmdline, "ready to accept connections")

    run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])

    cmdline = ["pytest", "-svv", "--cov=application", "--cov-report=term-missing"]
    cmdline.extend(filenames)
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline("down")
    subprocess.call(cmdline)


@cli.group()
def scenario():
    pass


@scenario.command()
@click.argument("name")
def up(name):  # (1)
    os.environ["APPLICATION_CONFIG"] = f"scenario_{name}"
    config = os.getenv("APPLICATION_CONFIG")

    scenario_config_source_file = app_config_file("scenario")
    scenario_config_file = app_config_file(config)

    if not os.path.isfile(scenario_config_source_file):
        raise ValueError(f"File {scenario_config_source_file} doesn't exist")
    shutil.copy(scenario_config_source_file, scenario_config_file)  # (3)

    scenario_docker_source_file = docker_compose_file("scenario")
    scenario_docker_file = docker_compose_file(config)

    if not os.path.isfile(scenario_docker_source_file):
        raise ValueError(f"File {scenario_docker_source_file} doesn't exist")
    shutil.copy(docker_compose_file("scenario"), scenario_docker_file)  # (4)

    configure_app(f"scenario_{name}")

    cmdline = docker_compose_cmdline("up -d")  # (5)
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline("logs db")
    wait_for_logs(cmdline, "ready to accept connections")

    cmdline = docker_compose_cmdline("port db 5432")  # (6)
    out = subprocess.check_output(cmdline)
    port = out.decode("utf-8").replace("\n", "").split(":")[1]
    os.environ["POSTGRES_PORT"] = port

    run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])

    scenario_module = f"scenarios.{name}"
    scenario_file = os.path.join("scenarios", f"{name}.py")
    if os.path.isfile(scenario_file):  # (7)
        import importlib

        os.environ["APPLICATION_SCENARIO_NAME"] = name

        scenario = importlib.import_module(scenario_module)
        scenario.run()

    cmdline = " ".join(  # (8)
        docker_compose_cmdline(
            "exec db psql -U {} -d {}".format(
                os.getenv("POSTGRES_USER"), os.getenv("APPLICATION_DB")
            )
        )
    )
    print("Your scenario is ready. If you want to open a SQL shell run")
    print(cmdline)


@scenario.command()
@click.argument("name")
def down(name):  # (2)
    os.environ["APPLICATION_CONFIG"] = f"scenario_{name}"
    config = os.getenv("APPLICATION_CONFIG")

    cmdline = docker_compose_cmdline("down")
    subprocess.call(cmdline)

    scenario_config_file = app_config_file(config)
    os.remove(scenario_config_file)

    scenario_docker_file = docker_compose_file(config)
    os.remove(scenario_docker_file)


if __name__ == "__main__":
    cli()

where I added the commands scenario up (1) and scenario down (2). As you can see the function up first copies the files config/scenario.json (3) and docker/scenario.yml (4) (that I still have to create) into files named after the scenario.

Then I run the command up -d (5) and wait for the database to be ready, as I already do for tests. After that, it's time to extract the port of the container with some very simple Python string processing (6) and to initialise the correct environment variable.
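As written, wait_for_logs keeps polling until the message shows up, which means that it never returns if the database fails to start. A possible variation, shown here as a generic sketch and not as part of the script, adds a timeout. The function wait_for, its parameters, and the default values are all hypothetical; the check is any callable that returns True when the condition is met.

```python
import time


def wait_for(check, timeout=30.0, interval=0.1):
    # Poll the condition until it holds or the time budget is exhausted
    deadline = time.monotonic() + timeout
    while not check():
        if time.monotonic() >= deadline:
            raise TimeoutError("The condition was not met in time")
        time.sleep(interval)


# Simulated condition that becomes true at the third check
attempts = []


def ready():
    attempts.append(1)
    return len(attempts) >= 3


wait_for(ready, timeout=5.0, interval=0.01)
print(len(attempts))  # 3
```

In the management script the check could be a function that runs the logs command and looks for the message, but that wiring is left out here.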

Last, I import and execute the Python file (7) containing the code of the scenario itself and print a friendly message with the command line to run psql (8) to have a Postgres shell into the newly created database.
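The dynamic import of the scenario can be tried in isolation with a temporary directory. The sketch below is an approximation of what happens in the function up: the helper run_scenario, the package name demo_scenarios, and the fact that run returns a string are assumptions made to keep the example self-contained. Like the script, it silently skips scenarios that do not have a Python file.

```python
import importlib
import os
import sys
import tempfile


def run_scenario(base_dir, package, name):
    scenario_file = os.path.join(base_dir, package, f"{name}.py")
    if not os.path.isfile(scenario_file):
        return None  # no code for this scenario, nothing to run

    sys.path.insert(0, base_dir)
    try:
        # Make sure that files created at runtime are visible to the import system
        importlib.invalidate_caches()
        module = importlib.import_module(f"{package}.{name}")
    finally:
        sys.path.remove(base_dir)

    return module.run()


with tempfile.TemporaryDirectory() as base_dir:
    os.mkdir(os.path.join(base_dir, "demo_scenarios"))
    with open(os.path.join(base_dir, "demo_scenarios", "foo.py"), "w") as f:
        f.write("def run():\n    return 'scenario foo ran'\n")

    result = run_scenario(base_dir, "demo_scenarios", "foo")
    missing = run_scenario(base_dir, "demo_scenarios", "bar")

print(result)  # scenario foo ran
print(missing)  # None
```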

The function down simply tears down the containers and removes the scenario configuration files.

The two missing config files are pretty simple. The docker compose configuration is

docker/scenario.yml

version: '3.4'

services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "5432"  # (1)
  web:
    build:
      context: ${PWD}
      dockerfile: docker/Dockerfile
    environment:
      FLASK_ENV: ${FLASK_ENV}
      FLASK_CONFIG: ${FLASK_CONFIG}
      APPLICATION_DB: ${APPLICATION_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_HOSTNAME: "db"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    command: flask run --host 0.0.0.0
    volumes:
      - ${PWD}:/opt/code
    ports:
      - "5000"

Here you can see that the database is ephemeral, that the port on the host is automatically assigned 1, and that I also spin up the application (mapping it to a random port as well to avoid clashing with the development one).

The configuration file is

config/scenario.json

[
  {
    "name": "FLASK_ENV",
    "value": "development"
  },
  {
    "name": "FLASK_CONFIG",
    "value": "development"
  },
  {
    "name": "POSTGRES_DB",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_USER",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_HOSTNAME",
    "value": "localhost"
  },
  {
    "name": "POSTGRES_PASSWORD",
    "value": "postgres"
  },
  {
    "name": "APPLICATION_DB",
    "value": "application"
  }
]

which doesn't add anything new to what I already did for development and testing.
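As a reminder of how a file with this shape can become environment variables, here is a minimal sketch. The function name load_config is hypothetical, and the choice of not overwriting variables that are already set is an assumption, not necessarily what the management script does.

```python
import json
import os


def load_config(path, environ=None):
    """Copy the name/value pairs of a JSON config file into `environ`.

    Hypothetical helper. Values already present in `environ` are kept,
    so that they can be overridden from the command line (an assumption).
    """
    if environ is None:
        environ = os.environ

    with open(path) as f:
        entries = json.load(f)

    for entry in entries:
        # Environment variables are always strings
        environ.setdefault(entry["name"], str(entry["value"]))

    return environ
```

Passing a plain dictionary as environ makes the function easy to test without touching the real environment.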

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Resources

Expose ports in docker-compose
Docker Compose port command - A command to print the port exposed by a container
psql - PostgreSQL interactive terminal

Scenario example 1

Let's have a look at a very simple scenario that doesn't do anything on the database, just to understand the system. The code for the scenario is

scenarios/foo.py

import os


def run():
    print("HEY! This is scenario", os.environ["APPLICATION_SCENARIO_NAME"])

When I run the scenario I get the following output

$ ./manage.py scenario up foo
Creating network "scenario_foo_default" with the default driver
Creating scenario_foo_db_1  ... done
Creating scenario_foo_web_1 ... done
HEY! This is scenario foo
Your scenario is ready. If you want to open a SQL shell run
docker-compose -p scenario_foo -f docker/scenario_foo.yml exec db psql -U postgres -d application

The command docker ps shows that my development environment is happily running alongside the scenario

$ docker ps
CONTAINER ID  IMAGE             COMMAND                 [...]  PORTS                    NAMES
85258892a2df  scenario_foo_web  "flask run --host 0.…"  [...]  0.0.0.0:32826->5000/tcp  scenario_foo_web_1
a031b6429e07  postgres          "docker-entrypoint.s…"  [...]  0.0.0.0:32827->5432/tcp  scenario_foo_db_1
1a449d23da01  development_web   "flask run --host 0.…"  [...]  0.0.0.0:5000->5000/tcp   development_web_1
28aa566321b5  postgres          "docker-entrypoint.s…"  [...]  0.0.0.0:5432->5432/tcp   development_db_1

And the output of the command scenario up foo contains the string HEY! This is scenario foo that was printed by the file foo.py. We can also successfully run the suggested command

$ docker-compose -p scenario_foo -f docker/scenario_foo.yml exec db psql -U postgres -d application
psql (12.3 (Debian 12.3-1.pgdg100+1))
Type "help" for help.

application=# \l
                                  List of databases
    Name     |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-------------+----------+----------+------------+------------+-----------------------
 application | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 postgres    | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0   | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
             |          |          |            |            | postgres=CTc/postgres
 template1   | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
             |          |          |            |            | postgres=CTc/postgres
(4 rows)

application=#

Inside the Postgres instance we find the database application, created explicitly for the scenario (the name is specified in config/scenario.json). If you don't know psql you can exit with \q or Ctrl-d.

Before tearing down the scenario have a look at the two files config/scenario_foo.json and docker/scenario_foo.yml. They are just copies of config/scenario.json and docker/scenario.yml but I think seeing them there might help to understand how the whole thing works. When you are done run ./manage.py scenario down foo.
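The life cycle of the two scenario files can be sketched as follows. The function names are hypothetical and the paths are built from the file names mentioned in the text. The real script uses its own helpers to compute them.

```python
import shutil
from pathlib import Path


def scenario_files(base_dir, name):
    """Map each template file to its copy for the scenario `name`."""
    base = Path(base_dir)
    return {
        base / "config" / "scenario.json": base / "config" / f"scenario_{name}.json",
        base / "docker" / "scenario.yml": base / "docker" / f"scenario_{name}.yml",
    }


def create_scenario_files(base_dir, name):
    """Copy the templates into files named after the scenario."""
    for template, target in scenario_files(base_dir, name).items():
        shutil.copy(template, target)


def remove_scenario_files(base_dir, name):
    """Remove the copies, leaving the templates untouched."""
    for target in scenario_files(base_dir, name).values():
        target.unlink()
```

Since the copies are named after the scenario, several scenarios can be up at the same time without stepping on each other's files.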

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Scenario example 2

Let's do something a bit more interesting. The new scenario is contained in scenarios/users.py

scenarios/users.py

from application.app import create_app
from application.models import db, User


app = create_app("development") 1


def run():
    with app.app_context():
        db.drop_all()
        db.create_all()

        # Administrator
        admin = User(email="admin@server.com")
        db.session.add(admin) 2

        # First user
        user1 = User(email="user1@server.com")
        db.session.add(user1)

        # Second user
        user2 = User(email="user2@server.com")
        db.session.add(user2)

        db.session.commit()

I decided to be as agnostic as possible in the scenarios, to avoid creating something too specific that eventually would not give me enough flexibility to test what I need. This means that the scenario has to create the app 1 and to use the database session explicitly 2, as I do in this example. The application is created with the configuration "development" 1. Remember that this is the Flask configuration that you find in application/config.py, not the one that is in config/development.json.

I can run the scenario with

$ ./manage.py scenario up users

and then connect to the database to find my users

$ docker-compose -p scenario_users -f docker/scenario_users.yml exec db psql -U postgres -d application
psql (12.3 (Debian 12.3-1.pgdg100+1))
Type "help" for help.

application=# \dt
         List of relations
 Schema | Name  | Type  |  Owner
--------+-------+-------+----------
 public | users | table | postgres
(1 row)

application=# select * from users;
 id |      email
----+------------------
  1 | admin@server.com
  2 | user1@server.com
  3 | user2@server.com
(3 rows)

application=# \q

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Step 2 - Simulating the production environment

As I stated at the very beginning of this mini series of posts, one of my goals was to run in development the same database that I run in production, and for this reason I went through the configuration steps that allowed me to have a Postgres container running both in development and during tests. In a real production scenario Postgres would probably run in a separate instance, for example on the RDS service in AWS, but as long as you have the connection parameters nothing changes in the configuration.

Docker actually allows us to easily simulate the production environment. If our notebook was connected 24/7 we might as well host the production there directly. Not that I recommend this nowadays, but this is how many important companies began many years ago, when cloud computing was not here yet. Instead of installing a LAMP stack we configure containers, but the idea doesn't change.

I will then create a configuration that simulates a production environment and then give some hints on how to translate this into a proper production infrastructure. If you want to have a clear picture of the components of a web application in production read my post Dissecting a web stack that analyses them one by one.

The first component that we have to change here is the HTTP server. In development we use Flask's development server, and the first message that server prints is WARNING: This is a development server. Do not use it in a production deployment. Got it, Flask! A good choice to replace it is Gunicorn, so first of all I add it in the requirements

requirements/production.txt

Flask
flask-sqlalchemy
psycopg2
flask-migrate
gunicorn

Then I need to create a docker-compose configuration for production

docker/production.yml

version: '3.4'

services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  web:
    build:
      context: ${PWD}
      dockerfile: docker/Dockerfile.production
    environment:
      FLASK_ENV: ${FLASK_ENV}
      FLASK_CONFIG: ${FLASK_CONFIG}
      APPLICATION_DB: ${APPLICATION_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_HOSTNAME: "db"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_PORT: ${POSTGRES_PORT}
    command: gunicorn -w 4 -b 0.0.0.0 wsgi:app 1
    volumes:
      - ${PWD}:/opt/code
    ports:
      - "8000:8000" 2

volumes:
  pgdata:

As you can see here the command that runs the application is slightly different 1. It starts 4 worker processes (-w 4) bound to the container's address 0.0.0.0, loading the object app from the file wsgi.py (wsgi:app). As Gunicorn listens on port 8000 by default, I mapped that 2 to the same port in the host.
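The notation wsgi:app means "the object app in the module wsgi", and that object has to be a WSGI callable. A Flask application is one, but to show what Gunicorn expects here is a plain stand-in written without Flask. This is not the wsgi.py of the project, just a minimal sketch of the interface.

```python
def app(environ, start_response):
    """A bare WSGI callable: receives the request environment and a
    function to start the response, returns an iterable of bytes."""
    body = b"Hello, World!"
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

If this code was saved in a file called wsgi.py, the same command gunicorn -w 4 -b 0.0.0.0 wsgi:app would serve it, which is why the HTTP server can be replaced without touching the application.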

Then I created the file Dockerfile.production that defines the production image of the web application

docker/Dockerfile.production

FROM python:3

ENV PYTHONUNBUFFERED 1

RUN mkdir /opt/code
RUN mkdir /opt/requirements
WORKDIR /opt/code

ADD requirements /opt/requirements
RUN pip install -r /opt/requirements/production.txt

The last thing I need is a configuration file

config/production.json

[
  {
    "name": "FLASK_ENV",
    "value": "production"
  },
  {
    "name": "FLASK_CONFIG",
    "value": "production"
  },
  {
    "name": "POSTGRES_DB",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_USER",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_HOSTNAME",
    "value": "localhost"
  },
  {
    "name": "POSTGRES_PORT",
    "value": "5432"
  },
  {
    "name": "POSTGRES_PASSWORD",
    "value": "postgres"
  },
  {
    "name": "APPLICATION_DB",
    "value": "application"
  }
]

As you can see, this is not very different from the development one, as I just changed the values of FLASK_ENV and FLASK_CONFIG. Clearly this contains a secret that shouldn't be written in plain text, POSTGRES_PASSWORD, but after all this is a simulation of production. In a real environment secrets should be kept in an encrypted secrets manager such as AWS Secrets Manager.

Remember that FLASK_ENV changes the internal settings of Flask, most notably disabling the debugger, and that FLASK_CONFIG=production loads the object ProductionConfig from application/config.py. That object is empty for the moment, but it might contain public configuration for the production server.
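The link between the value of FLASK_CONFIG and the configuration class comes from the string built in the application factory. Isolated in a function (the name config_object_path is hypothetical) it looks like this

```python
def config_object_path(config_name):
    """Return the dotted path of the configuration class for
    `config_name`, using the naming scheme of application/config.py."""
    return f"application.config.{config_name.capitalize()}Config"


print(config_object_path("production"))  # application.config.ProductionConfig
```

Note that capitalize() also lowercases the rest of the string, so a value like PRODUCTION ends up pointing at the same class.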

I can now build the image with

$ APPLICATION_CONFIG="production" ./manage.py compose build web

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Resources

Gunicorn - A Python WSGI HTTP Server

Step 3 - Scale up

Mapping the container port to the host is not a great idea, though, as it makes it impossible to scale up and down to serve more load, which is the main point of running containers in production. This might be solved in many ways in the cloud, for example in AWS you might run the containers in AWS Fargate and register them in an Application Load Balancer. Another way to do it on a single host is to run a web server in front of your HTTP server, and this might be easily implemented with Docker Compose.

I will add nginx and serve HTTP from there, reverse proxying the application containers through docker-compose networking. First of all the new configuration for docker-compose

docker/production.yml

version: '3.4'

services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  web:
    build:
      context: ${PWD}
      dockerfile: docker/Dockerfile.production
    environment:
      FLASK_ENV: ${FLASK_ENV}
      FLASK_CONFIG: ${FLASK_CONFIG}
      APPLICATION_DB: ${APPLICATION_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_HOSTNAME: "db"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_PORT: ${POSTGRES_PORT}
    command: gunicorn -w 4 -b 0.0.0.0 wsgi:app
    volumes:
      - ${PWD}:/opt/code
  nginx:
    image: nginx
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
    ports:
      - "8080:8080"

volumes:
  pgdata:

As you can see I added a service nginx that runs the default Nginx image, mapping a custom configuration file that I will create in a minute. The application container doesn't need any port mapping, as I won't access it directly from the host anymore. The Nginx configuration file is

docker/nginx/nginx.conf

worker_processes 1;

events { worker_connections 1024; }

http {

    sendfile on;

    upstream app { 1
        server web:8000;
    }

    server {
        listen 8080; 2

        location / {
            proxy_pass         http://app;
            proxy_redirect     off;
            proxy_set_header   Host $host;
            proxy_set_header   X-Real-IP $remote_addr;
            proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header   X-Forwarded-Host $server_name;
        }
    }
}

This is a pretty standard configuration, and in a real production environment I would add many other configuration values (most notably serving HTTPS instead of HTTP). The section upstream 1 leverages docker-compose networking referring to web, which in the internal DNS directly maps to the IPs of the service with the same name. The port 8000 comes from the default Gunicorn port that I already mentioned before. I won't run the nginx container as root on my notebook, so I expose port 8080 2 instead of the traditional 80 for HTTP, and this is also something that might be different in a real production environment.

I can at this point run

$ APPLICATION_CONFIG="production" ./manage.py compose up -d
Starting production_db_1    ... done
Starting production_nginx_1 ... done
Starting production_web_1   ... done

It's interesting to have a look at the logs of the nginx container, as Nginx by default prints all the incoming requests

$ APPLICATION_CONFIG="production" ./manage.py compose logs -f nginx
Attaching to production_nginx_1
[...]
nginx_1  | 172.30.0.1 - - [05/Jul/2020:10:40:44 +0000] "GET / HTTP/1.1" 200 13 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"

The last line is what I get when I visit localhost:8080 while the production setup is up and running.

Scaling the service up and down is now a breeze

$ APPLICATION_CONFIG="production" ./manage.py compose up -d --scale web=3
production_db_1 is up-to-date
Starting production_web_1 ... 
Starting production_web_1 ... done
Creating production_web_2 ... done
Creating production_web_3 ... done

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Resources

Nginx - An HTTP and reverse proxy server (and more)
Docker nginx - the official nginx Docker image
Docker Compose logs command - A command to print container logs
Docker Compose up command - The new way to scale containers in docker-compose

Bonus step - A closer look at Docker networking

I mentioned that Docker Compose creates a connection between services, and used that in the configuration of the nginx container, but I understand that this might look like black magic to some people. While I believe that this is actually black magic, I also think that we can investigate it a bit, so let's open the grimoire and reveal (some of) the dark secrets of Docker networking.

While the production setup is running we can connect to the nginx container and see what is happening in real time, so first of all I run a bash shell on it

$ APPLICATION_CONFIG="production" ./manage.py compose exec nginx bash

Once inside I can see my configuration file at /etc/nginx/nginx.conf, but this has not changed. Remember that Docker networking doesn't work as a templating engine, but with a local DNS. This means that if we try to resolve web from inside the container we should see multiple IPs. The command dig is a good tool to investigate the DNS, but it doesn't come preinstalled in the nginx container, so I need to run

root@33cbaea369be:/# apt update && apt install dnsutils

and at this point I can run it

root@33cbaea369be:/# dig web

; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> web
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30539
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;web.                           IN      A

;; ANSWER SECTION:
web.                    600     IN      A       172.30.0.4
web.                    600     IN      A       172.30.0.6
web.                    600     IN      A       172.30.0.5

;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Sun Jul 05 10:58:18 UTC 2020
;; MSG SIZE  rcvd: 78

root@33cbaea369be:/#

The command outputs 3 IPs, which correspond to the 3 containers of the service web that I am currently running. If I scale down (from outside the container)

$ APPLICATION_CONFIG="production" ./manage.py compose up -d --scale web=1

then the output of dig becomes

root@33cbaea369be:/# dig web

; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> web
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13146
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;web.                           IN      A

;; ANSWER SECTION:
web.                    600     IN      A       172.30.0.4

;; Query time: 0 msec
;; SERVER: 127.0.0.11#53(127.0.0.11)
;; WHEN: Sun Jul 05 11:01:46 UTC 2020
;; MSG SIZE  rcvd: 40

root@33cbaea369be:/#
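The same lookup can be done from Python, which might be handy in a container that has an interpreter but not dig (for example the one of the service web). This is a minimal sketch based on the standard library function socket.getaddrinfo. The function name resolve_ipv4 is hypothetical, and I assume the resolver returns all the A records of the name, as dig does above.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )

    # Each entry is (family, type, proto, canonname, sockaddr) and for
    # IPv4 sockaddr is the tuple (address, port)
    return sorted({info[4][0] for info in infos})
```

Calling resolve_ipv4("web") from a container attached to the network of the Compose project should list one address per running container of the service, while from the host the name web is not expected to resolve at all.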

How to create the production infrastructure

This will be a very short section, as creating infrastructure and deploying in production are complex topics, so I want to just give some hints to stimulate your research.

AWS ECS is basically Docker in the cloud, and the whole structure can map almost 1 to 1 to the docker-compose setup, so it is worth learning. ECS can work on explicit EC2 instances that you manage, or in Fargate, which means that the EC2 instances running the containers are transparently managed by AWS itself.

Terraform is a good tool to create infrastructure. It has many limitations, mostly coming from its custom HCL language, but it's slowly becoming better (version 0.13 will finally allow us to run for loops on modules, for example). Despite its shortcomings, it's a great tool to create static infrastructure, so I recommend working on it.

Terraform is not the right tool to deploy your code, though, as that requires a dynamic interaction with the system, so you need to set up a good Continuous Integration system. Jenkins is a very well known open source CI, but I personally ended up dropping it because it doesn't seem to be designed for large scale systems. For example, it is very complicated to automate the deployment of a Jenkins server, and dynamic large scale systems should require zero manual intervention to be created. Anyway, Jenkins is a good tool to start with, but you might want to have a look at other products like CircleCI or Buildkite.

When you create your deploy pipeline you need to do much more than just creating the image and running it, at least for real applications. You need to decide when to apply database migrations and if you have a web front-end you will also need to compile and install the JavaScript assets. Since you don't want to have downtime when you deploy you will need to look into blue/green deployments, and in general to strategies that allow you to run different versions of the application at the same time, at least for short periods of time. Or for longer periods, if you want to perform A/B testing or zonal deployments.

Final words

This is the last post of this short series. I hope you learned something useful, and that it encouraged you to properly set up your projects and to investigate technologies like Docker. As always, feel free to send me feedback or questions, and if you find my posts useful please share them with whoever you think might be interested.

Updates

2020-12-22 I reviewed the whole tutorial and corrected several typos

Feedback

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.

Flask project setup: TDD, Docker, Postgres and more - Part 2

2020-07-06T13:00:00+01:00

In this series of posts I explore the development of a Flask project with a setup that is built with efficiency and tidiness in mind, using TDD, Docker and Postgres.

Catch-up

In the previous post we started from an empty project and learned how to add the minimal code to run a Flask project. Then we created a static configuration file and a management script that wraps the commands flask and docker-compose to run the application with a specific configuration.

In this post I will show you how to run a production-ready database alongside your code in a Docker container, both in your development setup and for the tests.

Step 1 - Adding a database container

A database is an integral part of a web application, so in this step I will add my database of choice, Postgres, to the project setup. To do this I need to add a service in the docker-compose configuration file

docker/development.yml

version: '3.4'

services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB} 1
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data 2
  web:
    build:
      context: ${PWD}
      dockerfile: docker/Dockerfile
    environment:
      FLASK_ENV: ${FLASK_ENV}
      FLASK_CONFIG: ${FLASK_CONFIG}
    command: flask run --host 0.0.0.0
    volumes:
      - ${PWD}:/opt/code
    ports:
      - "5000:5000"

volumes:
  pgdata: 3

The variables starting with POSTGRES_ are required by the PostgreSQL Docker image. In particular, remember that POSTGRES_DB 1 is the database that gets created by default when the container starts for the first time, and it is also the one that contains data about the other databases, so for the application we usually want to use a different one.

Notice also that I'm creating a persistent volume for the service db 2 3, so that the content of the database is not lost when we tear down the container. For this service I'm using the default image, so no build step is needed.

To orchestrate this setup we need to add those variables to the JSON configuration

config/development.json

[
  {
    "name": "FLASK_ENV",
    "value": "development"
  },
  {
    "name": "FLASK_CONFIG",
    "value": "development"
  },
  {
    "name": "POSTGRES_DB",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_USER",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_HOSTNAME", 1
    "value": "localhost"
  },
  {
    "name": "POSTGRES_PORT",
    "value": "5432"
  },
  {
    "name": "POSTGRES_PASSWORD",
    "value": "postgres"
  }
]

These are all development variables so there are no secrets. In production we will need a way to keep the secrets in a safe place and convert them into environment variables. The AWS Secrets Manager for example can directly map secrets into environment variables passed to the containers, saving you from having to explicitly connect to the service with the API.

You may have noticed the variable POSTGRES_HOSTNAME 1, which is not used in the Docker Compose file. Generally we want the database to be accessible by utility scripts, so we want to record the host name where the database is running. As we will see shortly, other Docker containers do not need this, but migrations will.

We can run the commands ./manage.py compose up -d and ./manage.py compose down here to check that the database container works properly. Please note that the first time you run the command compose up -d Docker will create the volume and pull the Postgres image, and this might take some time.

CONTAINER ID  IMAGE       COMMAND                 ...  PORTS                   NAMES
9b5828dccd1c  docker_web  "flask run --host 0.…"  ...  0.0.0.0:5000->5000/tcp  docker_web_1
4440a18a1527  postgres    "docker-entrypoint.s…"  ...  0.0.0.0:5432->5432/tcp  docker_db_1

Now we need to connect the application to the database and to do this we can leverage flask-sqlalchemy. As we will use this at every stage of the life of the application, the requirement goes among the production ones. We also need psycopg2 as it is the library used to connect to Postgres.

requirements/production.txt

Flask
flask-sqlalchemy
psycopg2

Remember to run pip install -r requirements/development.txt to install the requirements locally and ./manage.py compose build web to rebuild the image.

At this point I need to create a connection string in the configuration of the application. The connection string parameters come from the same environment variables used to spin up the container db

application/config.py

import os

class Config(object):
    """Base configuration"""

    user = os.environ["POSTGRES_USER"]
    password = os.environ["POSTGRES_PASSWORD"]
    hostname = os.environ["POSTGRES_HOSTNAME"]
    port = os.environ["POSTGRES_PORT"]
    database = os.environ["APPLICATION_DB"] 1

    SQLALCHEMY_DATABASE_URI = (
        f"postgresql+psycopg2://{user}:{password}@{hostname}:{port}/{database}"
    )
    SQLALCHEMY_TRACK_MODIFICATIONS = False


class ProductionConfig(Config):
    """Production configuration"""


class DevelopmentConfig(Config):
    """Development configuration"""


class TestingConfig(Config):
    """Testing configuration"""

    TESTING = True

As you can see, here I use the variable APPLICATION_DB 1 and not POSTGRES_DB, so I need to specify that as well in the config file. The reason, as I mentioned before, is that we prefer to separate the default database, used by Postgres to manage all other databases, from the one used specifically by our application.

config/development.json

[
  {
    "name": "FLASK_ENV",
    "value": "development"
  },
  {
    "name": "FLASK_CONFIG",
    "value": "development"
  },
  {
    "name": "POSTGRES_DB",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_USER",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_HOSTNAME",
    "value": "localhost"
  },
  {
    "name": "POSTGRES_PORT",
    "value": "5432"
  },
  {
    "name": "POSTGRES_PASSWORD",
    "value": "postgres"
  },
  {
    "name": "APPLICATION_DB",
    "value": "application"
  }
]
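The connection string built in application/config.py can also be sketched as a function of a dictionary of variables, which makes it easy to check what comes out of a given configuration. The function name database_uri is hypothetical, and quoting user and password with quote_plus is an addition of this sketch (not part of the configuration above) that matters only when they contain characters such as @ or /.

```python
from urllib.parse import quote_plus


def database_uri(env):
    """Build the SQLAlchemy URI from a mapping of environment variables."""
    # Quoting is an extra precaution of this sketch: without it a
    # password like "p@ss" would break the parsing of the URI
    user = quote_plus(env["POSTGRES_USER"])
    password = quote_plus(env["POSTGRES_PASSWORD"])
    hostname = env["POSTGRES_HOSTNAME"]
    port = env["POSTGRES_PORT"]
    database = env["APPLICATION_DB"]

    return f"postgresql+psycopg2://{user}:{password}@{hostname}:{port}/{database}"
```

With the values of config/development.json the result is postgresql+psycopg2://postgres:postgres@localhost:5432/application, where the last component is the value of APPLICATION_DB and not that of POSTGRES_DB.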

At this point the application container needs to access some of the Postgres environment variables and the APPLICATION_DB one

docker/development.yml

version: '3.4'

services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  web:
    build:
      context: ${PWD}
      dockerfile: docker/Dockerfile
    environment:
      FLASK_ENV: ${FLASK_ENV}
      FLASK_CONFIG: ${FLASK_CONFIG}
      APPLICATION_DB: ${APPLICATION_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_HOSTNAME: "db"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_PORT: ${POSTGRES_PORT}
    command: flask run --host 0.0.0.0
    volumes:
      - ${PWD}:/opt/code
    ports:
      - "5000:5000"

volumes:
  pgdata:

Please note that the container web receives the environment variables we pass to the Postgres container because it requires them to connect to the db. The variable POSTGRES_HOSTNAME is passed to give the application the address of the database, and thanks to Docker Compose internal DNS we can simply pass the name of the container. We could not pass the value localhost, as the application, which is running in a container, cannot access the host through that address (unless we use other network modes, which is not ideal).

Running compose now spins up both Flask and Postgres, but the application is not properly connected to the database yet.

Let's have a look inside the DB to see what our configuration created. First run ./manage.py compose up -d to spin up the containers, then connect to the Postgres DB with ./manage.py compose exec db psql -U postgres. Please note that we have to specify the user with -U. The default value is root, but we changed it to postgres with the variable POSTGRES_USER.

You should see a command line like

$ ./manage.py compose exec db psql -U postgres
psql (13.0 (Debian 13.0-1.pgdg100+1))
Type "help" for help.

postgres=#

Also note that by default we are logging into the database called postgres, which was configured by the variable POSTGRES_DB. We can list the databases with \l

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
(3 rows)

postgres=#

Last, note that the application database configured with APPLICATION_DB is not present because we haven't created it yet. All the environment variables prefixed by POSTGRES_ are used automatically by the Docker image to perform the initial configuration, which is why the database postgres is already there.

You can exit psql with Ctrl-D or exit.

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Resources

Postgres Docker image
Flask-SQLAlchemy - A Flask extension to work with SQLAlchemy
SQLAlchemy and PostgreSQL - The Python SQL Toolkit and ORM

Step 2 - Connecting the application and the database

To connect the Flask application with the database running in the container we need to initialise a SQLAlchemy object and add it to the application factory.

application/models.py

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

application/app.py

from flask import Flask


def create_app(config_name):

    app = Flask(__name__)

    config_module = f"application.config.{config_name.capitalize()}Config"

    app.config.from_object(config_module)

    from application.models import db

    db.init_app(app)

    @app.route("/")
    def hello_world():
        return "Hello, World!"

    return app

A pretty standard way to manage the database in Flask is to use flask-migrate, which adds some commands that allow us to create migrations and apply them.

With flask-migrate you have to create the migrations folder once and for all with flask db init and then, every time you change your models, run flask db migrate -m "Some message" and flask db upgrade. As both db init and db migrate create files in the current directory we now face a problem that every Docker-based setup has to face: file permissions.

The situation is the following: the application is running in the Docker container as root, and there is no connection between the user namespace in the container and that of the host. The result is that if the Docker container creates files in a directory that is mounted from the host (like the one that contains the application code in our example), those files will belong to root. While this doesn't make it impossible to work (we usually can become root on our development machines), it is annoying to say the least. The solution is to run those commands from outside the container, but this requires the Flask application to be configured.
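If you suspect that a container already left some files behind, a few lines of Python can find them. This is a minimal sketch for POSIX systems, where root has user ID 0. The function name files_owned_by is hypothetical.

```python
import os


def files_owned_by(directory, uid):
    """Return the sorted paths of the files under `directory` whose
    owner has the user ID `uid`."""
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            # lstat looks at the link itself instead of following it
            if os.lstat(path).st_uid == uid:
                matches.append(path)

    return sorted(matches)
```

Calling files_owned_by("migrations", 0) on the host lists the files that belong to root, which is what you would get if the directory had been created from inside the container.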

Fortunately I wrapped the command flask in the script manage.py, which loads all the required environment variables. Let's add flask-migrate to the production requirements

requirements/production.txt

Flask
flask-sqlalchemy
psycopg2
flask-migrate

Remember to run pip install -r requirements/development.txt to install the requirements locally and ./manage.py compose build web to rebuild the image. Please note that you need the executable pg_config and some other development tools installed in your system. If you get an error message from pip please check the documentation of your operating system to find out what to do to install the required packages. For Ubuntu Linux I had to run sudo apt install build-essential python3-dev libpq-dev.
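
Before running pip you can check whether the required executable is visible on the PATH. The following is a minimal sketch that uses only the standard library; missing_executables is a hypothetical helper name.

```python
import shutil


def missing_executables(names):
    """Return the executables from `names` that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_executables(["pg_config"])
if missing:
    print(f"Missing build tools: {', '.join(missing)}")
else:
    print("pg_config is available")
```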

Now we can initialise a Migrate object and add it to the application factory

application/models.py

from flask_sqlalchemy import SQLAlchemy
from flask_migrate import Migrate

db = SQLAlchemy()
migrate = Migrate()

application/app.py

from flask import Flask


def create_app(config_name):

    app = Flask(__name__)

    config_module = f"application.config.{config_name.capitalize()}Config"

    app.config.from_object(config_module)

    from application.models import db, migrate

    db.init_app(app)
    migrate.init_app(app, db)

    @app.route("/")
    def hello_world():
        return "Hello, World!"

    return app

I can now run the database initialisation script

$ ./manage.py flask db init
  Creating directory /home/leo/devel/flask-tutorial/migrations ...  done
  Creating directory /home/leo/devel/flask-tutorial/migrations/versions ...  done
  Generating /home/leo/devel/flask-tutorial/migrations/env.py ...  done
  Generating /home/leo/devel/flask-tutorial/migrations/README ...  done
  Generating /home/leo/devel/flask-tutorial/migrations/script.py.mako ...  done
  Generating /home/leo/devel/flask-tutorial/migrations/alembic.ini ...  done
  Please edit configuration/connection/logging settings in '/home/leo/devel/flask-tutorial/migrations/alembic.ini' before proceeding.

When we start creating models we will use the commands ./manage.py flask db migrate and ./manage.py flask db upgrade. You will find a complete example at the end of this post.

For the time being let's have a brief look at what was created here. The command db init created the directory migrations and inside it some default configuration files and templates. The migration scripts will be created in the directory migrations/versions but at the moment that directory is empty, as we have no models and we ran no migrations (only the initialisation of the system). No changes have been made to the database. The command db init can be run even without running containers (you can remove the directory migrations and try it).

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Resources

Flask-SQLAlchemy - A Flask extension to work with SQLAlchemy
Flask-Migrate - A Flask extension to handle database migrations with Alembic.

Step 3 - Testing setup

I want to use a TDD approach as much as possible when developing my applications, so I need to set up a good testing environment upfront, and it has to be as ephemeral as possible. It is not unusual in big projects to create (or scale up) infrastructure components explicitly to run tests, and through Docker and docker-compose we can easily do the same. Namely, I will:

Spin up a test database in a container without permanent volumes
Initialise it
Run all the tests against it
Tear down the container

This approach has one big advantage, which is that it requires no previous setup and can thus be executed on infrastructure created on the fly. It also has disadvantages, however, as it can slow down the testing part of the application, which should be as fast as possible in a TDD setup. Tests that involve the database should be considered integration tests and not run continuously in a TDD process. Keeping them separate, however, is impossible (or very hard) when using a framework that merges the concept of entity and database model. If you want to know more about this read the book that I wrote on the subject.

Another advantage of this setup is that we might need other things during the tests, e.g. Celery, other databases, other servers. They can all be created through the docker-compose file.

Generally speaking testing is an umbrella under which many different things can happen. As I will use pytest I can run the full suite, but I might want to select specific tests, mentioning a single file or using the powerful option -k that allows me to select tests by pattern-matching their name. For this reason I want to map the management command line to that of pytest.
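
To make the idea concrete, here is a minimal sketch of a helper that builds a pytest command line from a list of files and an optional -k expression. The function pytest_cmdline and its parameters are hypothetical names for this illustration, and the fixed flags are the ones used by the command test in manage.py.

```python
def pytest_cmdline(selectors=(), keyword=None):
    """Build a pytest command line as a list suitable for subprocess."""
    cmdline = ["pytest", "-svv", "--cov=application", "--cov-report=term-missing"]
    if keyword:
        # -k selects the tests whose names match the given expression
        cmdline.extend(["-k", keyword])
    cmdline.extend(selectors)
    return cmdline


print(pytest_cmdline(["tests/test_user.py"]))
print(pytest_cmdline(keyword="create_user"))
```

Returning a list instead of a string means the result can be passed directly to subprocess.call without going through a shell.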

Let's add pytest to the testing requirements, along with a couple of packages to monitor the test coverage

requirements/testing.txt

-r production.txt

pytest
coverage
pytest-cov

As you can see I also use the coverage plugin to keep an eye on how well I cover the code with the tests. Remember to run pip install -r requirements/development.txt to install the requirements locally and ./manage.py compose build web to rebuild the image.

Warning: before you change the script manage.py make sure you terminate all the running containers by running ./manage.py compose down. The next version will change the naming convention for containers and you might end up with some stale containers and run into issues with the database.

manage.py

#! /usr/bin/env python

import os
import json
import signal
import subprocess
import time

import click


# Ensure an environment variable exists and has a value
def setenv(variable, default):
    os.environ[variable] = os.getenv(variable, default)


setenv("APPLICATION_CONFIG", "development")  # 2


def configure_app(config):  # 1
    # Read configuration from the relative JSON file
    with open(os.path.join("config", f"{config}.json")) as f:
        config_data = json.load(f)

    # Convert the config into a usable Python dictionary
    config_data = dict((i["name"], i["value"]) for i in config_data)

    for key, value in config_data.items():
        setenv(key, value)


@click.group()
def cli():
    pass


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def flask(subcommand):
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = ["flask"] + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def docker_compose_cmdline(config):  # 3
    configure_app(os.getenv("APPLICATION_CONFIG"))

    docker_compose_file = os.path.join("docker", f"{config}.yml")

    if not os.path.isfile(docker_compose_file):
        raise ValueError(f"The file {docker_compose_file} does not exist")

    return [
        "docker-compose",
        "-p",  # 4
        config,
        "-f",
        docker_compose_file,
    ]


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def compose(subcommand):
    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()

@cli.command()
@click.argument("filenames", nargs=-1)
def test(filenames):
    os.environ["APPLICATION_CONFIG"] = "testing"  # 5
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["up", "-d"]
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["logs", "db"]
    logs = subprocess.check_output(cmdline)
    while "ready to accept connections" not in logs.decode("utf-8"):  # 6
        time.sleep(0.1)
        logs = subprocess.check_output(cmdline)

    cmdline = ["pytest", "-svv", "--cov=application", "--cov-report=term-missing"]
    cmdline.extend(filenames)
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["down"]
    subprocess.call(cmdline)


if __name__ == "__main__":
    cli()

Notable changes are

The environment configuration code is now in the function configure_app 1. This allows me to force the variable APPLICATION_CONFIG inside the script 2 and then configure the environment, which saves me from having to call tests with APPLICATION_CONFIG=testing flask test.
Both commands flask and compose use the configuration development. Since that is the default value of the variable APPLICATION_CONFIG they just have to call the function configure_app.
The docker-compose command line is needed in both the commands compose and test, so I isolated some code into a function called docker_compose_cmdline 3 which returns a list as needed by subprocess functions. The command line now also uses the option -p (project name) 4 to give a prefix to the containers. This way we can run tests while running the development server.
The command test forces APPLICATION_CONFIG to be testing 5, which loads the file config/testing.json, then runs docker-compose using the file docker/testing.yml (neither file has been created yet), runs the pytest command line, and tears down the testing database container. Before running the tests the script waits for the service to be available 6, as Postgres doesn't allow connections until the database is ready to accept them.
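
The reason why the command test assigns the variable directly instead of calling setenv can be seen in isolation. The following sketch copies the helper setenv from manage.py and shows that it never overrides a value that is already set:

```python
import os


def setenv(variable, default):
    # Same helper as in manage.py: keep the current value if there is one
    os.environ[variable] = os.getenv(variable, default)


# Start from a clean state for the demonstration
os.environ.pop("APPLICATION_CONFIG", None)

setenv("APPLICATION_CONFIG", "development")
print(os.environ["APPLICATION_CONFIG"])  # development

# A second call with a different default does not change the value...
setenv("APPLICATION_CONFIG", "testing")
print(os.environ["APPLICATION_CONFIG"])  # development

# ...so the command test has to assign the variable explicitly
os.environ["APPLICATION_CONFIG"] = "testing"
print(os.environ["APPLICATION_CONFIG"])  # testing
```

The same rule applies to the values loaded by configure_app: a variable that is already exported in the shell wins over the one written in the JSON file.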

config/testing.json

[
  {
    "name": "FLASK_ENV",
    "value": "production"
  },
  {
    "name": "FLASK_CONFIG",
    "value": "testing"
  },
  {
    "name": "POSTGRES_DB",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_USER",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_HOSTNAME",
    "value": "localhost" 2
  },
  {
    "name": "POSTGRES_PORT",
    "value": "5433" 1
  },
  {
    "name": "POSTGRES_PASSWORD",
    "value": "postgres"
  },
  {
    "name": "APPLICATION_DB",
    "value": "test"
  }
]

Note that here I specified the value 5433 for POSTGRES_PORT 1. This allows us to spin up the test database container while the development one is running, as that will use port 5432 and you can't have two different containers using the same port on the host. A more general solution could be to let Docker pick a random host port for the container and then use that, but this requires a bit more code to be properly implemented, so I will come back to this problem when setting up the scenarios.
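
Just to give an idea of what the more general solution involves, here is a sketch of the parsing step only. It assumes that the Compose file publishes the container port without fixing the host one, and that the command docker-compose port db 5432 prints an address such as 0.0.0.0:49153; the function parse_host_port is a hypothetical helper and is not part of manage.py.

```python
def parse_host_port(port_output):
    """Extract the port number from output such as '0.0.0.0:49153'."""
    address = port_output.strip()
    # Split on the last colon so that addresses containing colons still work
    _host, _sep, port = address.rpartition(":")
    if not port.isdigit():
        raise ValueError(f"Cannot find a port in {port_output!r}")
    return int(port)


print(parse_host_port("0.0.0.0:49153\n"))  # 49153
```

The resulting number would then have to be stored in POSTGRES_PORT before the tests connect to the database.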

Also note that I set the variable POSTGRES_HOSTNAME to localhost in this file 2. We will run the tests on the local machine and not in a container, so we can't use the DNS provided by Docker Compose.
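
How these variables end up in a connection string depends on the file application/config.py. As a rough sketch, and assuming a SQLAlchemy-style PostgreSQL URI, the values in config/testing.json would combine as follows (database_uri is a hypothetical function name, the project may build the URI differently):

```python
import os


def database_uri(env=os.environ):
    """Build a SQLAlchemy-style PostgreSQL URI from the configuration variables."""
    return (
        f"postgresql+psycopg2://{env['POSTGRES_USER']}:{env['POSTGRES_PASSWORD']}"
        f"@{env['POSTGRES_HOSTNAME']}:{env['POSTGRES_PORT']}/{env['APPLICATION_DB']}"
    )


# Values taken from config/testing.json
testing_env = {
    "POSTGRES_USER": "postgres",
    "POSTGRES_PASSWORD": "postgres",
    "POSTGRES_HOSTNAME": "localhost",
    "POSTGRES_PORT": "5433",
    "APPLICATION_DB": "test",
}
print(database_uri(testing_env))
# postgresql+psycopg2://postgres:postgres@localhost:5433/test
```

With the hostname localhost and the port 5433 the tests reach the container through the port published on the host, without relying on the Docker Compose DNS.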

The last piece of setup that we need is the orchestration configuration for Docker Compose

docker/testing.yml

version: '3.4'

services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "${POSTGRES_PORT}:5432"

Now we can run ./manage.py test and get

Creating network "testing_default" with the default driver
Creating testing_db_1 ... done
========================== test session starts =========================
platform linux -- Python 3.7.5, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
-- /home/leo/devel/flask-tutorial/venv3/bin/python3
cachedir: .pytest_cache
rootdir: /home/leo/devel/flask-tutorial
plugins: cov-2.10.0
collected 0 items
Coverage.py warning: No data was collected. (no-data-collected)


----------- coverage: platform linux, python 3.7.5-final-0 -----------
Name                    Stmts   Miss  Cover   Missing
-----------------------------------------------------
application/app.py         11     11     0%   1-21
application/config.py      13     13     0%   1-31
application/models.py       4      4     0%   1-5
-----------------------------------------------------
TOTAL                      28     28     0%

======================== no tests ran in 0.07s =======================
Stopping testing_db_1 ... done
Removing testing_db_1 ... done
Removing network testing_default

Note that the command first creates the testing database container testing_db_1, then runs pytest, and finally stops and removes the container. This is exactly what we wanted to achieve to run tests in isolation. At the moment, however, there are no tests, and the testing database is empty.
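
In the script the three phases are simply executed one after the other, so an exception raised between the start and the teardown (for example pressing Ctrl-C while waiting for the database) would leave the container running. A possible refinement, sketched here with plain callables instead of the real docker-compose commands, is to wrap the phases in a context manager. The name ephemeral_environment is hypothetical.

```python
import contextlib


@contextlib.contextmanager
def ephemeral_environment(start, stop):
    """Run `start`, hand control to the caller, and always run `stop` afterwards."""
    start()
    try:
        yield
    finally:
        # Runs even if the body raises or is interrupted with Ctrl-C
        stop()


events = []


def run_tests():
    events.append("pytest")
    raise RuntimeError("pytest crashed")


try:
    with ephemeral_environment(lambda: events.append("up"), lambda: events.append("down")):
        run_tests()
except RuntimeError:
    pass

print(events)  # ['up', 'pytest', 'down']
```

In manage.py the two callables would be the subprocess calls that run docker-compose up -d and docker-compose down.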

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Resources

pytest - A full-featured Python testing framework
Useful pytest command line options

Step 4 - Initialise the testing database

When you develop a web application and then run it in production, you typically create the database once and then upgrade it through migrations. When running tests we need to create the database every time, so I need to add a way to run SQL commands on the testing database before I run pytest.

As running SQL commands directly on the database is often useful I will create a function that wraps the boilerplate for the connection. At that point the command that creates the initial database will be trivial.

manage.py

#! /usr/bin/env python

import os
import json
import signal
import subprocess
import time

import click
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT


# Ensure an environment variable exists and has a value
def setenv(variable, default):
    os.environ[variable] = os.getenv(variable, default)


setenv("APPLICATION_CONFIG", "development")


def configure_app(config):
    # Read configuration from the relative JSON file
    with open(os.path.join("config", f"{config}.json")) as f:
        config_data = json.load(f)

    # Convert the config into a usable Python dictionary
    config_data = dict((i["name"], i["value"]) for i in config_data)

    for key, value in config_data.items():
        setenv(key, value)


@click.group()
def cli():
    pass


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def flask(subcommand):
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = ["flask"] + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def docker_compose_cmdline(config):
    configure_app(os.getenv("APPLICATION_CONFIG"))

    docker_compose_file = os.path.join("docker", f"{config}.yml")

    if not os.path.isfile(docker_compose_file):
        raise ValueError(f"The file {docker_compose_file} does not exist")

    return [
        "docker-compose",
        "-p",
        config,
        "-f",
        docker_compose_file,
    ]


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def compose(subcommand):
    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def run_sql(statements):
    conn = psycopg2.connect(
        dbname=os.getenv("POSTGRES_DB"),
        user=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PASSWORD"),
        host=os.getenv("POSTGRES_HOSTNAME"),
        port=os.getenv("POSTGRES_PORT"),
    )

    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    cursor = conn.cursor()
    for statement in statements:
        cursor.execute(statement)

    cursor.close()
    conn.close()


@cli.command()
def create_initial_db():  # 1
    configure_app(os.getenv("APPLICATION_CONFIG"))

    try:
        run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])
    except psycopg2.errors.DuplicateDatabase:
        print(
            f"The database {os.getenv('APPLICATION_DB')} already exists and will not be recreated"
        )


@cli.command()
@click.argument("filenames", nargs=-1)
def test(filenames):
    os.environ["APPLICATION_CONFIG"] = "testing"
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["up", "-d"]
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["logs", "db"]
    logs = subprocess.check_output(cmdline)
    while "ready to accept connections" not in logs.decode("utf-8"):
        time.sleep(0.1)
        logs = subprocess.check_output(cmdline)

    run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])

    cmdline = ["pytest", "-svv", "--cov=application", "--cov-report=term-missing"]
    cmdline.extend(filenames)
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["down"]
    subprocess.call(cmdline)


if __name__ == "__main__":
    cli()

As you can see I took the opportunity to write the command create_initial_db 1 as well. It runs the very same SQL command that creates the testing database, but it works with whatever configuration I am using.
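
Since the database name is interpolated into the SQL statement with an f-string, it might be worth validating it first. The following is a sketch of one possible check, which accepts only plain identifiers; create_database_statement is a hypothetical helper, and psycopg2 also offers the module psycopg2.sql to compose statements with quoted identifiers.

```python
import re

# Letters, digits and underscores, not starting with a digit
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_database_statement(name):
    """Return a CREATE DATABASE statement, refusing names that are not plain identifiers."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"Invalid database name: {name!r}")
    return f"CREATE DATABASE {name}"


print(create_database_statement("test"))  # CREATE DATABASE test
```

The value comes from a configuration file that we control, so this is more a safety net against typos than a security measure.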

Before moving on I think it's time to refactor the file manage.py. Refactoring is not mandatory, but I feel like some parts of the script are not generic enough, and when I add the scenarios I will definitely need my functions to be flexible.

The new script is

manage.py

#! /usr/bin/env python

import os
import json
import signal
import subprocess
import time

import click
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT


# Ensure an environment variable exists and has a value
def setenv(variable, default):
    os.environ[variable] = os.getenv(variable, default)


setenv("APPLICATION_CONFIG", "development")

APPLICATION_CONFIG_PATH = "config"
DOCKER_PATH = "docker"


def app_config_file(config):  # 1
    return os.path.join(APPLICATION_CONFIG_PATH, f"{config}.json")


def docker_compose_file(config):  # 2
    return os.path.join(DOCKER_PATH, f"{config}.yml")


def configure_app(config):
    # Read configuration from the relative JSON file
    with open(app_config_file(config)) as f:
        config_data = json.load(f)

    # Convert the config into a usable Python dictionary
    config_data = dict((i["name"], i["value"]) for i in config_data)

    for key, value in config_data.items():
        setenv(key, value)


@click.group()
def cli():
    pass


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def flask(subcommand):
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = ["flask"] + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def docker_compose_cmdline(commands_string=None):  # 4
    config = os.getenv("APPLICATION_CONFIG")
    configure_app(config)

    compose_file = docker_compose_file(config)

    if not os.path.isfile(compose_file):
        raise ValueError(f"The file {compose_file} does not exist")

    command_line = [
        "docker-compose",
        "-p",
        config,
        "-f",
        compose_file,
    ]

    if commands_string:
        command_line.extend(commands_string.split(" "))

    return command_line


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def compose(subcommand):
    cmdline = docker_compose_cmdline() + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def run_sql(statements):
    conn = psycopg2.connect(
        dbname=os.getenv("POSTGRES_DB"),
        user=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PASSWORD"),
        host=os.getenv("POSTGRES_HOSTNAME"),
        port=os.getenv("POSTGRES_PORT"),
    )

    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    cursor = conn.cursor()
    for statement in statements:
        cursor.execute(statement)

    cursor.close()
    conn.close()


def wait_for_logs(cmdline, message):  # 3
    logs = subprocess.check_output(cmdline)
    while message not in logs.decode("utf-8"):
        time.sleep(0.1)
        logs = subprocess.check_output(cmdline)


@cli.command()
def create_initial_db():
    configure_app(os.getenv("APPLICATION_CONFIG"))

    try:
        run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])
    except psycopg2.errors.DuplicateDatabase:
        print(
            f"The database {os.getenv('APPLICATION_DB')} already exists and will not be recreated"
        )


@cli.command()
@click.argument("filenames", nargs=-1)
def test(filenames):
    os.environ["APPLICATION_CONFIG"] = "testing"
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = docker_compose_cmdline("up -d")
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline("logs db")
    wait_for_logs(cmdline, "ready to accept connections")

    run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])

    cmdline = ["pytest", "-svv", "--cov=application", "--cov-report=term-missing"]
    cmdline.extend(filenames)
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline("down")
    subprocess.call(cmdline)


if __name__ == "__main__":
    cli()

Notable changes:

I created two new functions app_config_file 1 and docker_compose_file 2 that encapsulate the creation of the file paths.
I isolated the code that waits for a message in the database container logs, creating the function wait_for_logs 3.
The function docker_compose_cmdline 4 now receives a string and converts it into a list internally. This way expressing commands is more natural, as it doesn't require the ugly list syntax that subprocess works with.
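
The function wait_for_logs loops until the message appears, so it never returns if the container fails to start. A variant with a timeout could look like the following sketch, where the logs come from a callable so that the example runs without Docker. The name wait_for_message and its parameters are hypothetical.

```python
import time


def wait_for_message(fetch_logs, message, timeout=30.0, interval=0.1):
    """Poll `fetch_logs()` until `message` appears or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if message in fetch_logs():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{message!r} not found in the logs after {timeout} seconds")
        time.sleep(interval)


# Simulated logs: the message shows up on the third poll
polls = iter(["", "starting", "database system is ready to accept connections"])
wait_for_message(lambda: next(polls), "ready to accept connections", timeout=5, interval=0.01)
print("database ready")
```

In manage.py the callable would wrap subprocess.check_output(cmdline) and decode its output as UTF-8.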

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Resources

Psycopg – PostgreSQL database adapter for Python

Step 5 - Fixtures for tests

Pytest uses fixtures for tests, so we should prepare some basic ones that will be generally useful. First let's include pytest-flask, which already provides some basic fixtures

requirements/testing.txt

-r production.txt

pytest
coverage
pytest-cov
pytest-flask

Then add the fixtures app and database to the file tests/conftest.py. The first is required by pytest-flask itself (it's used by other fixtures) and the second one is useful every time you need to interact with the database itself.

tests/conftest.py

import pytest

from application.app import create_app
from application.models import db


@pytest.fixture
def app():
    app = create_app("testing")

    return app


@pytest.fixture(scope="function")
def database(app):
    with app.app_context():
        db.drop_all()
        db.create_all()

    yield db

Remember to create the empty file tests/__init__.py to make pytest correctly load the code.

As you can see, the fixture database uses the methods drop_all and create_all to reset the database. The reason is that this fixture is recreated for each function, and we can't be sure a previous function left the database clean. As a matter of fact, we might be almost sure of the opposite.
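
The effect of resetting the schema before each test can be reproduced outside Flask. The sketch below uses the standard library module sqlite3 as a stand-in for the Postgres container (an assumption made only to keep the example self-contained), and the function reset_database plays the role of drop_all followed by create_all.

```python
import sqlite3

SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, email VARCHAR NOT NULL UNIQUE)"


def reset_database(conn):
    """Drop and recreate the schema, like drop_all() followed by create_all()."""
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute(SCHEMA)
    conn.commit()


def count_users(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


conn = sqlite3.connect(":memory:")

# First "test": starts from a clean schema and leaves a row behind
reset_database(conn)
conn.execute("INSERT INTO users (email) VALUES ('some.email@server.com')")
conn.commit()
print(count_users(conn))  # 1

# Second "test": the reset removes whatever the first one left
reset_database(conn)
print(count_users(conn))  # 0
```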

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Resources

pytest fixtures - One of the most powerful features of pytest
pytest-flask - A plugin for pytest that simplifies testing Flask applications
Flask-SQLAlchemy API

Bonus step - A full TDD example

Before wrapping up this post, I want to give you a full example of the TDD process that I would follow given the current state of the setup, which is already complete enough to start the development of an application. Let's pretend my goal is to add a User model that can be created with an id (primary key) and an email field.

First of all I write a test that creates a user in the database and then retrieves it, checking its attributes

tests/test_user.py

from application.models import User


def test__create_user(database):
    email = "some.email@server.com"
    user = User(email=email)
    database.session.add(user)
    database.session.commit()

    user = User.query.first()

    assert user.email == email

Running this test results in an error, because the model User does not exist

$ ./manage.py test
Creating network "testing_default" with the default driver
Creating testing_db_1 ... done
======================================= test session starts ======================================
platform linux -- Python 3.7.5, pytest-5.4.3, py-1.9.0, pluggy-0.13.1 --
/home/leo/devel/flask-tutorial/venv3/bin/python3
cachedir: .pytest_cache
rootdir: /home/leo/devel/flask-tutorial
plugins: flask-1.0.0, cov-2.10.0
collected 0 items / 1 error
============================================= ERRORS =============================================
___________________________ ERROR collecting tests/tests/test_user.py ___________________________
ImportError while importing test module '/home/leo/devel/flask-tutorial/tests/tests/test_user.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
venv3/lib/python3.7/site-packages/_pytest/python.py:511: in _importtestmodule
    mod = self.fspath.pyimport(ensuresyspath=importmode)
venv3/lib/python3.7/site-packages/py/_path/local.py:704: in pyimport
    __import__(modname)
venv3/lib/python3.7/site-packages/_pytest/assertion/rewrite.py:152: in exec_module
    exec(co, module.__dict__)
tests/tests/test_user.py:1: in <module>
    from application.models import User
E   ImportError: cannot import name 'User' from 'application.models'
	(/home/leo/devel/flask-tutorial/application/models.py)

----------- coverage: platform linux, python 3.7.5-final-0 -----------
Name                    Stmts   Miss  Cover   Missing
-----------------------------------------------------
application/app.py         11      9    18%   6-21
application/config.py      14     14     0%   1-32
application/models.py       4      0   100%
-----------------------------------------------------
TOTAL                      29     23    21%

==================================== short test summary info ===================================
ERROR tests/tests/test_user.py
!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================= 1 error in 0.20s =======================================
Stopping testing_db_1 ... done
Removing testing_db_1 ... done
Removing network testing_default
$

I won't show all the steps of the strict TDD methodology here, and will directly implement the final solution, which is

application/models.py

from flask_sqlalchemy import SQLAlchemy
from flask_migrate import Migrate

db = SQLAlchemy()
migrate = Migrate()


class User(db.Model):
    __tablename__ = "users"
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String, unique=True, nullable=False)

With this model the test passes

$ ./manage.py test
Creating network "testing_default" with the default driver
Creating testing_db_1 ... done
=================================== test session starts ==================================
platform linux -- Python 3.7.5, pytest-5.4.3, py-1.9.0, pluggy-0.13.1 --
/home/leo/devel/flask-tutorial/venv3/bin/python3
cachedir: .pytest_cache
rootdir: /home/leo/devel/flask-tutorial
plugins: flask-1.0.0, cov-2.10.0
collected 1 item

tests/test_user.py::test__create_user PASSED

----------- coverage: platform linux, python 3.7.5-final-0 -----------
Name                    Stmts   Miss  Cover   Missing
-----------------------------------------------------
application/app.py         11      1    91%   19
application/config.py      14      0   100%
application/models.py       8      0   100%
-----------------------------------------------------
TOTAL                      33      1    97%


==================================== 1 passed in 0.14s ===================================
Stopping testing_db_1 ... done
Removing testing_db_1 ... done
Removing network testing_default
$

Please note that this is a very simple example and that in a real case I would add some other tests before accepting this code. In particular we should check that the field email cannot be empty, and maybe also test some validation on that field.
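
To give an idea of the behaviour such tests would assert, the following sketch reproduces the two constraints of the model (nullable=False and unique=True) on an in-memory SQLite table. SQLite is used here only as a stand-in to keep the example self-contained, and insert_is_rejected is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email VARCHAR NOT NULL UNIQUE)")


def insert_is_rejected(conn, email):
    """Return True if inserting a user with this email violates a constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    except sqlite3.IntegrityError:
        return True
    return False


print(insert_is_rejected(conn, "some.email@server.com"))  # False
print(insert_is_rejected(conn, None))                     # True (NOT NULL)
print(insert_is_rejected(conn, "some.email@server.com"))  # True (UNIQUE)
```

Note that an empty string is not NULL, so the NOT NULL constraint alone accepts it. Rejecting empty emails requires the validation mentioned above.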

Let's add a very simple route to use the newly created model

application/app.py

from flask import Flask
from application.models import User


def create_app(config_name):

    app = Flask(__name__)

    config_module = f"application.config.{config_name.capitalize()}Config"

    app.config.from_object(config_module)

    from application.models import db, migrate

    db.init_app(app)
    migrate.init_app(app, db)

    @app.route("/")
    def hello_world():
        return "Hello, World!"

    @app.route("/users")
    def users():
        num_users = User.query.count()
        return f"Number of users: {num_users}"

    return app

As you can see I didn't introduce anything too complicated. I import the model User and count the number of entries in its table. We will create the table in a minute with the migration that flask db migrate will create for us, so we expect this to just return a page that says "Number of users: 0", but it's a good demonstration that the connection with the database is working.

So, let's generate the migration in the database. Spin up the development environment with

$ ./manage.py compose up -d

If this is the first time I spin up the environment I have to create the application database and to initialise the migrations, so I run

$ ./manage.py create-initial-db

As we already initialised Alembic before we don't need to run the command db init. If you do, it will return Error: Directory migrations already exists and is not empty. Now I can create the migration with

$ ./manage.py flask db migrate -m "Initial user model"
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.autogenerate.compare] Detected added table 'users'
  Generating /home/leo/devel/flask-tutorial/migrations/versions/7a09d7f8a8fa_initial_user_model.py ...  done

As you can see from the output, this created the file migrations/versions/7a09d7f8a8fa_initial_user_model.py. The number 7a09d7f8a8fa is just a hex version of a UUID, so it will be different for you, while the name comes from the message passed with the option -m. The file itself contains Alembic and SQLAlchemy code that changes the DB according to the code that we wrote in the application.
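
An identifier with the same shape can be produced with the standard library. This is only a sketch: taking the last 12 hexadecimal digits of a random UUID is an assumption about the format, and revision_id is a hypothetical function name.

```python
import uuid


def revision_id():
    # Last 12 hexadecimal digits of a random UUID (assumed format)
    return uuid.uuid4().hex[-12:]


print(revision_id())  # a different value at every run
```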

Finally I can apply the migration with

$ ./manage.py flask db upgrade
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 7a09d7f8a8fa, Initial user model

At this point we can run ./manage.py compose exec db psql -U postgres again and see what happened to the database.

$ ./manage.py compose exec db psql -U postgres
psql (13.0 (Debian 13.0-1.pgdg100+1))
Type "help" for help.

postgres=# \l
                                  List of databases
    Name     |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-------------+----------+----------+------------+------------+-----------------------
 application | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 postgres    | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0   | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
             |          |          |            |            | postgres=CTc/postgres
 template1   | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
             |          |          |            |            | postgres=CTc/postgres
(4 rows)

You see here that the database application configured with APPLICATION_DB has been created. You can now connect to it and list the tables

postgres=# \c application
You are now connected to database "application" as user "postgres".
application=# \dt
              List of relations
 Schema |      Name       | Type  |  Owner   
--------+-----------------+-------+----------
 public | alembic_version | table | postgres
 public | users           | table | postgres
(2 rows)

The content of the table alembic_version shouldn't be surprising, as it's the revision ID of the migration that we just applied

application=# select * from alembic_version;
 version_num  
--------------
 7a09d7f8a8fa
(1 row)

The table users contains the fields id and email according to the model that we wrote in Python

application=# \d users
                                 Table "public.users"
 Column |       Type        | Collation | Nullable |              Default              
--------+-------------------+-----------+----------+-----------------------------------
 id     | integer           |           | not null | nextval('users_id_seq'::regclass)
 email  | character varying |           | not null | 
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "users_email_key" UNIQUE CONSTRAINT, btree (email)
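
If you want to get a feeling for what the not null and unique constraints listed above mean in practice, the following is a minimal sketch that recreates a similar table in an in-memory SQLite database using only the Python standard library. It is a stand-in and not the PostgreSQL setup of this tutorial: the CREATE TABLE statement is my own approximation of the schema, and add_user is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the Postgres container.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email VARCHAR NOT NULL UNIQUE
    )
    """
)


def add_user(email):
    """Try to insert a user, return True if the database accepted it."""
    try:
        # The connection context manager commits on success
        # and rolls back if an exception is raised.
        with conn:
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # Raised for both NOT NULL and UNIQUE violations.
        return False


print(add_user("user@example.com"))  # accepted
print(add_user("user@example.com"))  # rejected, duplicate email
print(add_user(None))                # rejected, email is required
```

The same rules are enforced by PostgreSQL on the real table, although the exception you would see through SQLAlchemy is a different one.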

You can also open your browser and head to http://localhost:5000/users to see the new route in action. After this, I can safely commit my code and move on to the next requirement.

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Final words

I hope this post already showed you why a good setup can make the difference. The project is clean, and wrapping the commands in the management script together with the centralised config proved to be a good choice, as it allowed me to solve the problem of migrations and testing in (what I think is) an elegant way. In the next post I'll show you how to easily create scenarios where you can test queries with only specific data in the database. If you find my posts useful, please share them with whoever you think might be interested.

Updates

2020-07-13 Vlad Pavlichek found and fixed a typo in the post, where manage.py was missing the extension .py. Thanks Vlad!

2020-12-22 I reviewed the whole tutorial and corrected several typos.

Feedback

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.