In this series of posts I explore the development of a Flask project with a setup that is built with efficiency and tidiness in mind, using TDD, Docker and Postgres.

Catch-up

In the previous post we started from an empty project and learned how to add the minimal code to run a Flask project. Then we created a static configuration file and a management script that wraps the commands flask and docker-compose to run the application with a specific configuration.

In this post I will show you how to run a production-ready database alongside your code in a Docker container, both in your development setup and for the tests.

Step 1 - Adding a database container

A database is an integral part of a web application, so in this step I will add my database of choice, Postgres, to the project setup. To do this I need to add a service in the docker-compose configuration file

docker/development.yml
version: '3.4'

services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB} 1
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data 2
  web:
    build:
      context: ${PWD}
      dockerfile: docker/Dockerfile
    environment:
      FLASK_ENV: ${FLASK_ENV}
      FLASK_CONFIG: ${FLASK_CONFIG}
    command: flask run --host 0.0.0.0
    volumes:
      - ${PWD}:/opt/code
    ports:
      - "5000:5000"

volumes:
  pgdata: 3

The variables starting with POSTGRES_ are used by the PostgreSQL Docker image to perform the initial configuration. In particular, remember that POSTGRES_DB 1 is the database that gets created by default when the container is first created, and also the one that Postgres uses to manage all the other databases, so for the application we usually want to use a different one.

Notice also that I'm creating a persistent volume for the service db 2 3, so that the content of the database is not lost when we tear down the container. For this service I'm using the default image, so no build step is needed.
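As a side note, docker-compose down leaves named volumes in place, so the data survives across runs; if you ever want to start from a pristine database you can pass the flag -v (--volumes) to the command down to remove the volume as well.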

To orchestrate this setup we need to add those variables to the JSON configuration

config/development.json
[
  {
    "name": "FLASK_ENV",
    "value": "development"
  },
  {
    "name": "FLASK_CONFIG",
    "value": "development"
  },
  {
    "name": "POSTGRES_DB",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_USER",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_HOSTNAME", 1
    "value": "localhost"
  },
  {
    "name": "POSTGRES_PORT",
    "value": "5432"
  },
  {
    "name": "POSTGRES_PASSWORD",
    "value": "postgres"
  }
]

These are all development variables so there are no secrets. In production we will need a way to keep the secrets in a safe place and convert them into environment variables. The AWS Secrets Manager for example can directly map secrets into environment variables passed to the containers, saving you from having to explicitly connect to the service with the API.

You may have noticed the variable POSTGRES_HOSTNAME 1, which is not used in the Docker Compose file. Generally we want the database to be accessible by utility scripts running on the host, so we need to record the host name where the database can be reached. As we will see shortly, other Docker containers do not need this value, but migrations will.

We can run the commands ./manage.py compose up -d and ./manage.py compose down here to check that the database container works properly. Please note that the first time you run compose up -d Docker will create the volume and pull the Postgres image, and this might take some time. Running docker ps should then show both containers

CONTAINER ID  IMAGE       COMMAND                 ...  PORTS                   NAMES
9b5828dccd1c  docker_web  "flask run --host 0.…"  ...  0.0.0.0:5000->5000/tcp  docker_web_1
4440a18a1527  postgres    "docker-entrypoint.s…"  ...  0.0.0.0:5432->5432/tcp  docker_db_1

Now we need to connect the application to the database, and to do this we can leverage flask-sqlalchemy. As we will use it at every stage of the application's life, the requirement belongs with the production ones. We also need psycopg2, the library used to connect to Postgres.

requirements/production.txt
Flask
flask-sqlalchemy
psycopg2

Remember to run pip install -r requirements/development.txt to install the requirements locally and ./manage.py compose build web to rebuild the image.

At this point I need to create a connection string in the configuration of the application. The connection string parameters come from the same environment variables used to spin up the container db

application/config.py
import os

class Config(object):
    """Base configuration"""

    user = os.environ["POSTGRES_USER"]
    password = os.environ["POSTGRES_PASSWORD"]
    hostname = os.environ["POSTGRES_HOSTNAME"]
    port = os.environ["POSTGRES_PORT"]
    database = os.environ["APPLICATION_DB"] 1

    SQLALCHEMY_DATABASE_URI = (
        f"postgresql+psycopg2://{user}:{password}@{hostname}:{port}/{database}"
    )
    SQLALCHEMY_TRACK_MODIFICATIONS = False


class ProductionConfig(Config):
    """Production configuration"""


class DevelopmentConfig(Config):
    """Development configuration"""


class TestingConfig(Config):
    """Testing configuration"""

    TESTING = True

As you can see, here I use the variable APPLICATION_DB 1 and not POSTGRES_DB, so I need to specify that as well in the config file. The reason, as I mentioned before, is that we prefer to separate the default database, used by Postgres to manage all other databases, from the one used specifically by our application.

config/development.json
[
  {
    "name": "FLASK_ENV",
    "value": "development"
  },
  {
    "name": "FLASK_CONFIG",
    "value": "development"
  },
  {
    "name": "POSTGRES_DB",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_USER",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_HOSTNAME",
    "value": "localhost"
  },
  {
    "name": "POSTGRES_PORT",
    "value": "5432"
  },
  {
    "name": "POSTGRES_PASSWORD",
    "value": "postgres"
  },
  {
    "name": "APPLICATION_DB",
    "value": "application"
  }
]

At this point the application container needs to access some of the Postgres environment variables and the APPLICATION_DB one

docker/development.yml
version: '3.4'

services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "${POSTGRES_PORT}:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  web:
    build:
      context: ${PWD}
      dockerfile: docker/Dockerfile
    environment:
      FLASK_ENV: ${FLASK_ENV}
      FLASK_CONFIG: ${FLASK_CONFIG}
      APPLICATION_DB: ${APPLICATION_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_HOSTNAME: "db"
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_PORT: ${POSTGRES_PORT}
    command: flask run --host 0.0.0.0
    volumes:
      - ${PWD}:/opt/code
    ports:
      - "5000:5000"

volumes:
  pgdata:

Please note that the container web receives the environment variables we pass to the Postgres container because it requires them to connect to the db. The variable POSTGRES_HOSTNAME is passed to give the application the address of the database, and thanks to Docker Compose internal DNS we can simply pass the name of the container. We could not pass the value localhost, as the application, which is running in a container, cannot access the host through that address (unless we use other network modes, which is not ideal).

Running compose now spins up both Flask and Postgres, but the application is not properly connected to the database yet.
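Still, we can already verify that the container web can reach the database through the name db with a throwaway script like the one below, run inside the container with something like ./manage.py compose exec web python check_db.py. This is just a sanity check and not part of the project, so both the file name check_db.py and the script itself are my own invention. Note that it connects to the default database postgres, as the application database doesn't exist yet.

check_db.py
import os

import psycopg2

# Connect using the same environment variables the application will use.
# The application database does not exist yet, so we target the default one.
conn = psycopg2.connect(
    dbname="postgres",
    user=os.environ["POSTGRES_USER"],
    password=os.environ["POSTGRES_PASSWORD"],
    host=os.environ["POSTGRES_HOSTNAME"],
    port=os.environ["POSTGRES_PORT"],
)
print("Connection established")
conn.close()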

Let's have a look inside the DB to see what our configuration created. First run ./manage.py compose up -d to spin up the containers, then connect to the Postgres DB with ./manage.py compose exec db psql -U postgres. Please note that we have to specify the user with -U: psql defaults to the name of the current OS user (root inside the container), while we created the user postgres through the variable POSTGRES_USER.

You should see a command line like

$ ./manage.py compose exec db psql -U postgres
psql (13.0 (Debian 13.0-1.pgdg100+1))
Type "help" for help.

postgres=#

Also note that by default we are logging into the database called postgres, which was configured by the variable POSTGRES_DB. We can list the databases with \l

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
(3 rows)

postgres=# 

Last, note that the application database configured with APPLICATION_DB is not present because we haven't created it yet. All the environment variables prefixed by POSTGRES_ are used automatically by the Docker image to perform the initial configuration, which is why the database postgres is already there.
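If you wanted to create the application database manually you could run CREATE DATABASE application; at this psql prompt, but later in this post we will automate this step through the management script.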

You can exit psql with Ctrl-D or exit.

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Step 2 - Connecting the application and the database

To connect the Flask application with the database running in the container we need to initialise a SQLAlchemy object and add it to the application factory.

application/models.py
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

application/app.py
from flask import Flask


def create_app(config_name):

    app = Flask(__name__)

    config_module = f"application.config.{config_name.capitalize()}Config"

    app.config.from_object(config_module)

    from application.models import db

    db.init_app(app)

    @app.route("/")
    def hello_world():
        return "Hello, World!"

    return app

A pretty standard way to manage the database in Flask is to use flask-migrate, which adds some commands that allow us to create and apply migrations.

With flask-migrate you have to create the migrations folder once and for all with flask db init and then, every time you change your models, run flask db migrate -m "Some message" and flask db upgrade. As both db init and db migrate create files in the current directory, we now run into a problem that every Docker-based setup has to deal with: file permissions.

The situation is the following: the application runs in the Docker container as root, and there is no connection between the user namespace in the container and that of the host. The result is that if the Docker container creates files in a directory mounted from the host (like the one that contains the application code in our example), those files end up belonging to root. While this doesn't make it impossible to work (we usually can become root on our development machines), it is annoying to say the least. The solution is to run those commands from outside the container, but this requires the Flask application to be configured.

Fortunately I wrapped the command flask in the script manage.py, which loads all the required environment variables. Let's add flask-migrate to the production requirements

requirements/production.txt
Flask
flask-sqlalchemy
psycopg2
flask-migrate

Remember to run pip install -r requirements/development.txt to install the requirements locally and ./manage.py compose build web to rebuild the image. Please note that you need the executable pg_config and some other development tools installed in your system. If you get an error message from pip please check the documentation of your operating system to find out what to do to install the required packages. For Ubuntu Linux I had to run sudo apt install build-essential python3-dev libpq-dev.

Now we can initialise a Migrate object and add it to the application factory

application/models.py
from flask_sqlalchemy import SQLAlchemy
from flask_migrate import Migrate

db = SQLAlchemy()
migrate = Migrate()

application/app.py
from flask import Flask


def create_app(config_name):

    app = Flask(__name__)

    config_module = f"application.config.{config_name.capitalize()}Config"

    app.config.from_object(config_module)

    from application.models import db, migrate

    db.init_app(app)
    migrate.init_app(app, db)

    @app.route("/")
    def hello_world():
        return "Hello, World!"

    return app

I can now run the database initialisation script

$ ./manage.py flask db init
  Creating directory /home/leo/devel/flask-tutorial/migrations ...  done
  Creating directory /home/leo/devel/flask-tutorial/migrations/versions ...  done
  Generating /home/leo/devel/flask-tutorial/migrations/env.py ...  done
  Generating /home/leo/devel/flask-tutorial/migrations/README ...  done
  Generating /home/leo/devel/flask-tutorial/migrations/script.py.mako ...  done
  Generating /home/leo/devel/flask-tutorial/migrations/alembic.ini ...  done
  Please edit configuration/connection/logging settings in '/home/leo/devel/flask-tutorial/migrations/alembic.ini' before proceeding.

When we start creating models we will use the commands ./manage.py flask db migrate and ./manage.py flask db upgrade. You will find a complete example at the end of this post.

For the time being let's have a brief look at what was created here. The command db init created the directory migrations and, inside it, some default configuration files and templates. The migration scripts will be created in the directory migrations/versions, but at the moment that directory is empty, as we have no models and have run no migrations (only the initialisation of the system). No changes have been made to the database. The command db init can be run even without running containers (you can remove the directory migrations and try it).

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Step 3 - Testing setup

I want to use a TDD approach as much as possible when developing my applications, so I need to set up a good testing environment upfront, and it has to be as ephemeral as possible. It is not unusual in big projects to create (or scale up) infrastructure components explicitly to run tests, and through Docker and docker-compose we can easily do the same. Namely, I will:

  1. Spin up a test database in a container without permanent volumes
  2. Initialise it
  3. Run all the tests against it
  4. Tear down the container

This approach has one big advantage: it requires no previous setup and can thus be executed on infrastructure created on the fly. It also has disadvantages, however, as it can slow down the test suite, which should be as fast as possible in a TDD setup. Tests that involve the database, though, should be considered integration tests and not run continuously in the TDD process; keeping the two apart is impossible (or very hard) when using a framework that merges the concepts of entity and database model. If you want to know more about this, read the book that I wrote on the subject.

Another advantage of this setup is that we might need other components during the tests, e.g. Celery or other databases and servers, and they can all be created through the docker-compose file.

Generally speaking testing is an umbrella under which many different things can happen. As I will use pytest I can run the full suite, but I might want to select specific tests, mentioning a single file or using the powerful option -k that allows me to select tests by pattern-matching their name. For this reason I want to map the management command line to that of pytest.
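For example, once the command test is in place (see below), ./manage.py test will run the whole suite, while ./manage.py test tests/test_user.py will run a single file.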

Let's add pytest to the testing requirements, along with a couple of packages to monitor the test coverage

requirements/testing.txt
-r production.txt

pytest
coverage
pytest-cov

As you can see I also use the coverage plugin to keep an eye on how well I cover the code with the tests. Remember to run pip install -r requirements/development.txt to install the requirements locally and ./manage.py compose build web to rebuild the image.

Warning: before you change the script manage.py make sure you terminate all the running containers with ./manage.py compose down. The next version will change the naming convention for containers, and you might end up with some stale containers and run into issues with the database.

manage.py
#! /usr/bin/env python

import os
import json
import signal
import subprocess
import time

import click


# Ensure an environment variable exists and has a value
def setenv(variable, default):
    os.environ[variable] = os.getenv(variable, default)


setenv("APPLICATION_CONFIG", "development") 2


def configure_app(config): 1
    # Read configuration from the corresponding JSON file
    with open(os.path.join("config", f"{config}.json")) as f:
        config_data = json.load(f)

    # Convert the config into a usable Python dictionary
    config_data = dict((i["name"], i["value"]) for i in config_data)

    for key, value in config_data.items():
        setenv(key, value)


@click.group()
def cli():
    pass


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def flask(subcommand):
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = ["flask"] + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def docker_compose_cmdline(config): 3
    configure_app(os.getenv("APPLICATION_CONFIG"))

    docker_compose_file = os.path.join("docker", f"{config}.yml")

    if not os.path.isfile(docker_compose_file):
        raise ValueError(f"The file {docker_compose_file} does not exist")

    return [
        "docker-compose",
        "-p", 4
        config,
        "-f",
        docker_compose_file,
    ]


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def compose(subcommand):
    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()

@cli.command()
@click.argument("filenames", nargs=-1)
def test(filenames):
    os.environ["APPLICATION_CONFIG"] = "testing" 5
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["up", "-d"]
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["logs", "db"]
    logs = subprocess.check_output(cmdline)
    while "ready to accept connections" not in logs.decode("utf-8"): 6
        time.sleep(0.1)
        logs = subprocess.check_output(cmdline)

    cmdline = ["pytest", "-svv", "--cov=application", "--cov-report=term-missing"]
    cmdline.extend(filenames)
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["down"]
    subprocess.call(cmdline)


if __name__ == "__main__":
    cli()

Notable changes are

  • The environment configuration code is now in the function configure_app 1. This allows me to force the variable APPLICATION_CONFIG inside the script 2 and then configure the environment, which saves me from having to run tests with APPLICATION_CONFIG=testing ./manage.py test.
  • Both commands flask and compose use the configuration development. Since that is the default value of the variable APPLICATION_CONFIG they just have to call the function configure_app.
  • The docker-compose command line is needed both in the command compose and in test, so I isolated some code into a function called docker_compose_cmdline 3, which returns a list as needed by subprocess functions. The command line now also uses the option -p (project name) 4 to give a prefix to the containers. This way we can run tests while the development server is running.
  • The command test forces APPLICATION_CONFIG to be testing 5, which loads the file config/testing.json, then runs docker-compose using the file docker/testing.yml (neither file has been created yet), runs the pytest command line, and tears down the testing database container. Before running the tests the script waits for the service to be available by polling the logs 6 (see the sketch below for an alternative approach), as Postgres doesn't accept connections until the database is ready.
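As a side note, polling the logs is not the only way to wait for the database. An alternative sketch (not what the script above does) is to retry the connection itself until Postgres accepts it, for example with psycopg2, which is already among the production requirements

import time

import psycopg2


def wait_for_postgres(**connection_parameters):
    # Retry until Postgres accepts the connection
    while True:
        try:
            psycopg2.connect(**connection_parameters).close()
            return
        except psycopg2.OperationalError:
            time.sleep(0.1)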
config/testing.json
[
  {
    "name": "FLASK_ENV",
    "value": "production"
  },
  {
    "name": "FLASK_CONFIG",
    "value": "testing"
  },
  {
    "name": "POSTGRES_DB",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_USER",
    "value": "postgres"
  },
  {
    "name": "POSTGRES_HOSTNAME",
    "value": "localhost" 2
  },
  {
    "name": "POSTGRES_PORT",
    "value": "5433" 1
  },
  {
    "name": "POSTGRES_PASSWORD",
    "value": "postgres"
  },
  {
    "name": "APPLICATION_DB",
    "value": "test"
  }
]

Note that here I specified the value 5433 for POSTGRES_PORT 1. This allows us to spin up the test database container while the development one is running, as that uses port 5432 and you can't have two different containers bound to the same port on the host. A more general solution could be to let Docker pick a random host port for the container and then use that, but this requires a bit more code to be properly implemented, so I will come back to this problem when setting up the scenarios.
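Just to sketch that more general solution: if the service db exposed its port as "5432" alone (without the host part) Docker would pick a free host port, and the command docker-compose port can tell us which one. A hypothetical helper (not part of the final script) might look like

import os
import subprocess


def get_postgres_host_port():
    # Ask Docker Compose which host port has been mapped onto the container port 5432
    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["port", "db", "5432"]
    output = subprocess.check_output(cmdline).decode("utf-8")

    # The output has the form "0.0.0.0:32768"
    return output.strip().split(":")[-1]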

Also note that I set the variable POSTGRES_HOSTNAME to localhost in this file 2. We will run the tests on the local machine and not in a container, so we can't use the DNS provided by Docker Compose.

The last piece of setup that we need is the orchestration configuration for Docker Compose

docker/testing.yml
version: '3.4'

services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "${POSTGRES_PORT}:5432"

Now we can run ./manage.py test and get

Creating network "testing_default" with the default driver
Creating testing_db_1 ... done
========================== test session starts =========================
platform linux -- Python 3.7.5, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
-- /home/leo/devel/flask-tutorial/venv3/bin/python3
cachedir: .pytest_cache
rootdir: /home/leo/devel/flask-tutorial
plugins: cov-2.10.0
collected 0 items
Coverage.py warning: No data was collected. (no-data-collected)


----------- coverage: platform linux, python 3.7.5-final-0 -----------
Name                    Stmts   Miss  Cover   Missing
-----------------------------------------------------
application/app.py         11     11     0%   1-21
application/config.py      13     13     0%   1-31
application/models.py       4      4     0%   1-5
-----------------------------------------------------
TOTAL                      28     28     0%

======================== no tests ran in 0.07s =======================
Stopping testing_db_1 ... done
Removing testing_db_1 ... done
Removing network testing_default

Note that the command first creates the testing database container testing_db_1, then runs pytest, and finally stops and removes the container. This is exactly what we wanted to achieve to run tests in isolation. At the moment, however, there are no tests, and the testing database is empty.

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Step 4 - Initialise the testing database

When you develop a web application and then run it in production, you typically create the database once and then upgrade it through migrations. When running tests we need to create the database every time, so I need to add a way to run SQL commands on the testing database before I run pytest.

As running SQL commands directly on the database is often useful, I will create a function that wraps the connection boilerplate. At that point the command that creates the initial database will be trivial.

manage.py
#! /usr/bin/env python

import os
import json
import signal
import subprocess
import time

import click
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT


# Ensure an environment variable exists and has a value
def setenv(variable, default):
    os.environ[variable] = os.getenv(variable, default)


setenv("APPLICATION_CONFIG", "development")


def configure_app(config):
    # Read configuration from the corresponding JSON file
    with open(os.path.join("config", f"{config}.json")) as f:
        config_data = json.load(f)

    # Convert the config into a usable Python dictionary
    config_data = dict((i["name"], i["value"]) for i in config_data)

    for key, value in config_data.items():
        setenv(key, value)


@click.group()
def cli():
    pass


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def flask(subcommand):
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = ["flask"] + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def docker_compose_cmdline(config):
    configure_app(os.getenv("APPLICATION_CONFIG"))

    docker_compose_file = os.path.join("docker", f"{config}.yml")

    if not os.path.isfile(docker_compose_file):
        raise ValueError(f"The file {docker_compose_file} does not exist")

    return [
        "docker-compose",
        "-p",
        config,
        "-f",
        docker_compose_file,
    ]


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def compose(subcommand):
    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def run_sql(statements):
    conn = psycopg2.connect(
        dbname=os.getenv("POSTGRES_DB"),
        user=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PASSWORD"),
        host=os.getenv("POSTGRES_HOSTNAME"),
        port=os.getenv("POSTGRES_PORT"),
    )

    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    cursor = conn.cursor()
    for statement in statements:
        cursor.execute(statement)

    cursor.close()
    conn.close()


@cli.command()
def create_initial_db(): 1
    configure_app(os.getenv("APPLICATION_CONFIG"))

    try:
        run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])
    except psycopg2.errors.DuplicateDatabase:
        print(
            f"The database {os.getenv('APPLICATION_DB')} already exists and will not be recreated"
        )


@cli.command()
@click.argument("filenames", nargs=-1)
def test(filenames):
    os.environ["APPLICATION_CONFIG"] = "testing"
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["up", "-d"]
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["logs", "db"]
    logs = subprocess.check_output(cmdline)
    while "ready to accept connections" not in logs.decode("utf-8"):
        time.sleep(0.1)
        logs = subprocess.check_output(cmdline)

    run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])

    cmdline = ["pytest", "-svv", "--cov=application", "--cov-report=term-missing"]
    cmdline.extend(filenames)
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline(os.getenv("APPLICATION_CONFIG")) + ["down"]
    subprocess.call(cmdline)


if __name__ == "__main__":
    cli()

As you can see, I took the opportunity to write the command create_initial_db 1 as well, which runs the very same SQL command used to create the testing database, but in whatever configuration I am using.

Before moving on I think it's time to refactor the file manage.py. Refactoring is not mandatory, but I feel like some parts of the script are not generic enough, and when I add the scenarios I will definitely need my functions to be flexible.

The new script is

manage.py
#! /usr/bin/env python

import os
import json
import signal
import subprocess
import time

import click
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT


# Ensure an environment variable exists and has a value
def setenv(variable, default):
    os.environ[variable] = os.getenv(variable, default)


setenv("APPLICATION_CONFIG", "development")

APPLICATION_CONFIG_PATH = "config"
DOCKER_PATH = "docker"


def app_config_file(config): 1
    return os.path.join(APPLICATION_CONFIG_PATH, f"{config}.json")


def docker_compose_file(config): 2
    return os.path.join(DOCKER_PATH, f"{config}.yml")


def configure_app(config):
    # Read configuration from the corresponding JSON file
    with open(app_config_file(config)) as f:
        config_data = json.load(f)

    # Convert the config into a usable Python dictionary
    config_data = dict((i["name"], i["value"]) for i in config_data)

    for key, value in config_data.items():
        setenv(key, value)


@click.group()
def cli():
    pass


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def flask(subcommand):
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = ["flask"] + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def docker_compose_cmdline(commands_string=None): 4
    config = os.getenv("APPLICATION_CONFIG")
    configure_app(config)

    compose_file = docker_compose_file(config)

    if not os.path.isfile(compose_file):
        raise ValueError(f"The file {compose_file} does not exist")

    command_line = [
        "docker-compose",
        "-p",
        config,
        "-f",
        compose_file,
    ]

    if commands_string:
        command_line.extend(commands_string.split(" "))

    return command_line


@cli.command(context_settings={"ignore_unknown_options": True})
@click.argument("subcommand", nargs=-1, type=click.Path())
def compose(subcommand):
    cmdline = docker_compose_cmdline() + list(subcommand)

    try:
        p = subprocess.Popen(cmdline)
        p.wait()
    except KeyboardInterrupt:
        p.send_signal(signal.SIGINT)
        p.wait()


def run_sql(statements):
    conn = psycopg2.connect(
        dbname=os.getenv("POSTGRES_DB"),
        user=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PASSWORD"),
        host=os.getenv("POSTGRES_HOSTNAME"),
        port=os.getenv("POSTGRES_PORT"),
    )

    conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    cursor = conn.cursor()
    for statement in statements:
        cursor.execute(statement)

    cursor.close()
    conn.close()


def wait_for_logs(cmdline, message): 3
    logs = subprocess.check_output(cmdline)
    while message not in logs.decode("utf-8"):
        time.sleep(0.1)
        logs = subprocess.check_output(cmdline)


@cli.command()
def create_initial_db():
    configure_app(os.getenv("APPLICATION_CONFIG"))

    try:
        run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])
    except psycopg2.errors.DuplicateDatabase:
        print(
            f"The database {os.getenv('APPLICATION_DB')} already exists and will not be recreated"
        )


@cli.command()
@click.argument("filenames", nargs=-1)
def test(filenames):
    os.environ["APPLICATION_CONFIG"] = "testing"
    configure_app(os.getenv("APPLICATION_CONFIG"))

    cmdline = docker_compose_cmdline("up -d")
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline("logs db")
    wait_for_logs(cmdline, "ready to accept connections")

    run_sql([f"CREATE DATABASE {os.getenv('APPLICATION_DB')}"])

    cmdline = ["pytest", "-svv", "--cov=application", "--cov-report=term-missing"]
    cmdline.extend(filenames)
    subprocess.call(cmdline)

    cmdline = docker_compose_cmdline("down")
    subprocess.call(cmdline)


if __name__ == "__main__":
    cli()

Notable changes:

  • I created two new functions app_config_file 1 and docker_compose_file 2 that encapsulate the creation of the file paths.
  • I isolated the code that waits for a message in the database container logs, creating the function wait_for_logs 3.
  • The function docker_compose_cmdline 4 now receives a string and converts it into a list internally. This way commands read more naturally, without the verbose list syntax that subprocess requires.

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Resources

  • Psycopg – PostgreSQL database adapter for Python

Step 5 - Fixtures for tests

Pytest uses fixtures for tests, so we should prepare some basic ones that will be generally useful. First let's include pytest-flask, which already provides some basic fixtures

requirements/testing.txt
-r production.txt

pytest
coverage
pytest-cov
pytest-flask

Then add the fixtures app and database to the file tests/conftest.py. The first is required by pytest-flask itself (other fixtures are built on top of it), and the second is useful every time you need to interact with the database.

tests/conftest.py
import pytest

from application.app import create_app
from application.models import db


@pytest.fixture
def app():
    app = create_app("testing")

    return app


@pytest.fixture(scope="function")
def database(app):
    with app.app_context():
        db.drop_all()
        db.create_all()

    yield db

Remember to create the empty file tests/__init__.py to make pytest correctly load the code.

As you can see, the fixture database uses the methods drop_all and create_all to reset the database. The reason is that this fixture is recreated for each function, and we can't be sure a previous function left the database clean. As a matter of fact, we might be almost sure of the opposite.
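With these fixtures in place we can already write a first smoke test. pytest-flask provides a client fixture built on top of app, so a minimal test of the endpoint / might look like this (my example, not part of the repository)

tests/test_app.py
def test__hello_world(client):
    response = client.get("/")

    assert response.data == b"Hello, World!"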

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Bonus step - A full TDD example

Before wrapping up this post, I want to give you a full example of the TDD process that I would follow given the current state of the setup, which is already complete enough to start the development of an application. Let's pretend my goal is to add a User model that can be created with id (primary key) and email fields.

First of all I write a test that creates a user in the database and then retrieves it, checking its attributes

tests/test_user.py
from application.models import User


def test__create_user(database):
    email = "some.email@server.com"
    user = User(email=email)
    database.session.add(user)
    database.session.commit()

    user = User.query.first()

    assert user.email == email

Running this test results in an error, because the model User does not exist

$ ./manage.py test
Creating network "testing_default" with the default driver
Creating testing_db_1 ... done
======================================= test session starts ======================================
platform linux -- Python 3.7.5, pytest-5.4.3, py-1.9.0, pluggy-0.13.1 --
/home/leo/devel/flask-tutorial/venv3/bin/python3
cachedir: .pytest_cache
rootdir: /home/leo/devel/flask-tutorial
plugins: flask-1.0.0, cov-2.10.0
collected 0 items / 1 error
============================================= ERRORS =============================================
___________________________ ERROR collecting tests/test_user.py ___________________________
ImportError while importing test module '/home/leo/devel/flask-tutorial/tests/test_user.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
venv3/lib/python3.7/site-packages/_pytest/python.py:511: in _importtestmodule
    mod = self.fspath.pyimport(ensuresyspath=importmode)
venv3/lib/python3.7/site-packages/py/_path/local.py:704: in pyimport
    __import__(modname)
venv3/lib/python3.7/site-packages/_pytest/assertion/rewrite.py:152: in exec_module
    exec(co, module.__dict__)
tests/test_user.py:1: in <module>
    from application.models import User
E   ImportError: cannot import name 'User' from 'application.models'
	(/home/leo/devel/flask-tutorial/application/models.py)

----------- coverage: platform linux, python 3.7.5-final-0 -----------
Name                    Stmts   Miss  Cover   Missing
-----------------------------------------------------
application/app.py         11      9    18%   6-21
application/config.py      14     14     0%   1-32
application/models.py       4      0   100%
-----------------------------------------------------
TOTAL                      29     23    21%

==================================== short test summary info ===================================
ERROR tests/test_user.py
!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================= 1 error in 0.20s =======================================
Stopping testing_db_1 ... done
Removing testing_db_1 ... done
Removing network testing_default
$ 

I won't show all the steps of the strict TDD methodology here; I will implement the final solution directly, which is

application/models.py
from flask_sqlalchemy import SQLAlchemy
from flask_migrate import Migrate

db = SQLAlchemy()
migrate = Migrate()


class User(db.Model):
    __tablename__ = "users"
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String, unique=True, nullable=False)

With this model the test passes

$ ./manage.py test
Creating network "testing_default" with the default driver
Creating testing_db_1 ... done
=================================== test session starts ==================================
platform linux -- Python 3.7.5, pytest-5.4.3, py-1.9.0, pluggy-0.13.1 --
/home/leo/devel/flask-tutorial/venv3/bin/python3
cachedir: .pytest_cache
rootdir: /home/leo/devel/flask-tutorial
plugins: flask-1.0.0, cov-2.10.0
collected 1 item

tests/test_user.py::test__create_user PASSED

----------- coverage: platform linux, python 3.7.5-final-0 -----------
Name                    Stmts   Miss  Cover   Missing
-----------------------------------------------------
application/app.py         11      1    91%   19
application/config.py      14      0   100%
application/models.py       8      0   100%
-----------------------------------------------------
TOTAL                      33      1    97%


==================================== 1 passed in 0.14s ===================================
Stopping testing_db_1 ... done
Removing testing_db_1 ... done
Removing network testing_default
$ 

Please note that this is a very simple example and that in a real case I would add some other tests before accepting this code. In particular, we should check that the field email cannot be empty, and maybe also test some validation on that field.
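For example, a test verifying that the field email cannot be null might look like the following sketch, which relies on the constraint nullable=False that we put in the model

import pytest
from sqlalchemy.exc import IntegrityError

from application.models import User


def test__create_user_without_email_fails(database):
    user = User(email=None)
    database.session.add(user)

    with pytest.raises(IntegrityError):
        database.session.commit()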

Let's add a very simple route to use the newly created model

application/app.py
from flask import Flask
from application.models import User


def create_app(config_name):

    app = Flask(__name__)

    config_module = f"application.config.{config_name.capitalize()}Config"

    app.config.from_object(config_module)

    from application.models import db, migrate

    db.init_app(app)
    migrate.init_app(app, db)

    @app.route("/")
    def hello_world():
        return "Hello, World!"

    @app.route("/users")
    def users():
        num_users = User.query.count()
        return f"Number of users: {num_users}"

    return app

As you can see I didn't introduce anything too complicated. I import the model User and count the number of entries in its table. We will create the table in a minute with a migration generated by flask db migrate, so we expect this to just return a page that says "Number of users: 0", but it's a good demonstration that the connection with the database is working.

So, let's generate the migration. Spin up the development environment with

$ ./manage.py compose up -d

If this is the first time I spin up the environment I have to create the application database and initialise the migrations, so I run

$ ./manage.py create-initial-db

As we already initialised Alembic before we don't need to run the command db init. If you do, it will return Error: Directory migrations already exists and is not empty. Now I can create the migration with

$ ./manage.py flask db migrate -m "Initial user model"
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.autogenerate.compare] Detected added table 'users'
  Generating /home/leo/devel/flask-tutorial/migrations/versions/7a09d7f8a8fa_initial_user_model.py ...  done

As you can see from the output, this created the file migrations/versions/7a09d7f8a8fa_initial_user_model.py. The number 7a09d7f8a8fa is just a hex version of a UUID, so it will be different for you, while the rest of the name comes from the message we passed to db migrate. The file itself contains SQLAlchemy code that changes the DB according to the models we wrote in the application.
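To give you an idea, the generated file looks roughly like this (trimmed and simplified, and your revision identifiers will differ)

from alembic import op
import sqlalchemy as sa

revision = "7a09d7f8a8fa"
down_revision = None


def upgrade():
    op.create_table(
        "users",
        sa.Column("id", sa.Integer(), nullable=False),
        sa.Column("email", sa.String(), nullable=False),
        sa.PrimaryKeyConstraint("id"),
        sa.UniqueConstraint("email"),
    )


def downgrade():
    op.drop_table("users")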

Finally I can apply the migration with

$ ./manage.py flask db upgrade
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 7a09d7f8a8fa, Initial user model

At this point we can run ./manage.py compose exec db psql -U postgres again and see what happened to the database.

$ ./manage.py compose exec db psql -U postgres
psql (13.0 (Debian 13.0-1.pgdg100+1))
Type "help" for help.

postgres=# \l
                                  List of databases
    Name     |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges   
-------------+----------+----------+------------+------------+-----------------------
 application | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 postgres    | postgres | UTF8     | en_US.utf8 | en_US.utf8 | 
 template0   | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
             |          |          |            |            | postgres=CTc/postgres
 template1   | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
             |          |          |            |            | postgres=CTc/postgres
(4 rows)

You see here that the database application configured with APPLICATION_DB has been created. You can now connect to it and list the tables

postgres=# \c application
You are now connected to database "application" as user "postgres".
application=# \dt
              List of relations
 Schema |      Name       | Type  |  Owner   
--------+-----------------+-------+----------
 public | alembic_version | table | postgres
 public | users           | table | postgres
(2 rows)

The content of the table alembic_version shouldn't be surprising, as it contains the version of the migration we just applied

application=# select * from alembic_version;
 version_num  
--------------
 7a09d7f8a8fa
(1 row)

The table users contains the fields id and email according to the model that we wrote in Python

application=# \d users
                                 Table "public.users"
 Column |       Type        | Collation | Nullable |              Default              
--------+-------------------+-----------+----------+-----------------------------------
 id     | integer           |           | not null | nextval('users_id_seq'::regclass)
 email  | character varying |           | not null | 
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "users_email_key" UNIQUE CONSTRAINT, btree (email)

You can also open your browser and head to http://localhost:5000/users to see the new route in action. After this we can safely commit the code and move on to the next requirement.

Git commit

You can see the changes made in this step through this Git commit or browse the files.

Final words

I hope this post already showed you why a good setup can make the difference. The project is clean, and wrapping the commands in the management script plus the centralised configuration proved to be a good choice, as it allowed me to solve the problems of migrations and testing in (what I think is) an elegant way. In the next post I'll show you how to easily create scenarios where you can test queries with only specific data in the database. If you find my posts useful please share them with whoever you think might be interested.

Updates

2020-07-13 Vlad Pavlichek found and fixed a typo in the post, where manage.py was missing the extension .py. Thanks Vlad!

2020-12-22 I reviewed the whole tutorial and corrected several typos

Feedback

Feel free to reach me on Twitter if you have questions. The GitHub issues page is the best place to submit corrections.