In 2015 I was introduced by my friend Roberto Ciatti to the concept of Clean Architecture, as it is called by Robert Martin. The well-known Uncle Bob talks a lot about this concept at conferences and wrote some very interesting posts about it. What he calls "Clean Architecture" is a way of structuring a software system, a set of considerations (more than strict rules) about the different layers and the role of the actors in it.
As he clearly states in a post aptly titled The Clean Architecture, the idea behind this design is not new. As a matter of fact, it is a set of concepts that have been pushed by many software engineers over the last 3 decades. One of the first implementations may be found in the Boundary-Control-Entity model proposed by Ivar Jacobson in his masterpiece Object-Oriented Software Engineering: A Use Case Driven Approach published in 1992, but Martin lists other more recent versions of this architecture.
I will not repeat here what he has already explained better than I can, so I will just point out some resources you may check to start exploring these concepts:
- The Clean Architecture, a post by Robert Martin that concisely describes the goals of the architecture. It also lists resources that describe similar architectures.
- The Open Closed Principle, a post by Robert Martin not strictly correlated with the Clean Architecture concept but important for the separation concept.
- Hakka Labs: Robert "Uncle Bob" Martin - Architecture: The Lost Years, a video of Robert Martin from Hakka Labs.
- DDD & Testing Strategy, by Lauri Taimila.
- Clean Architecture, by Robert Martin, published by Prentice Hall.
A day in the life of a clean system
I will introduce here a (very simple) system designed with a clean architecture. The purpose of this section is to familiarise you with the main concepts, like separation of concerns and inversion of control, which are paramount in system design. While I describe how data flows in the system, I will purposefully omit details, so that we can focus on the global idea and not worry too much about the implementation.
The data flow
In the rest of the book, we will design together part of a simple web application that provides a room renting system. So, let's consider that our "Rent-o-Matic" application (inspired by the Sludge-O-Matic™ from Day of the Tentacle) is running at https://www.rentomatic.com, and that a user wants to see the available rooms. They open the browser, type the address, and then click through menus and buttons until they reach the page with the list of all the rooms that our company rents.
Let's assume that this URL is /rooms?status=available. When the user's browser accesses that URL, an HTTP request reaches our system, where there is a component that is waiting for HTTP connections. Let's call this component "web framework".
The purpose of the web framework is to understand the HTTP request and to retrieve the data that we need to provide a response. In this simple case there are two important parts of the request, namely the endpoint itself (/rooms), and a single query string parameter, status=available. Endpoints are like commands for our system, so when a user accesses one of them, they signal to the system that a specific service has been requested, which in this case is the list of all the rooms that are available for rent.
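To make this decoding step concrete, here is a minimal sketch that uses only the Python standard library. The function name decode_request is my own invention for illustration; a real web framework does much more than this, but the idea of splitting the request into an endpoint and its parameters is the same.

```python
from urllib.parse import parse_qs, urlsplit


def decode_request(url: str) -> tuple[str, dict[str, str]]:
    """Split a request URL into its endpoint and its query string parameters.

    Hypothetical helper: it only illustrates what the web framework extracts.
    """
    parts = urlsplit(url)
    # parse_qs maps each key to a list of values; we keep the first one
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.path, params


endpoint, params = decode_request("https://www.rentomatic.com/rooms?status=available")
print(endpoint)  # /rooms
print(params)    # {'status': 'available'}
```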
The domain in which the web framework operates is that of the HTTP protocol, so when the web framework has decoded the request it should pass the relevant information to another component that will process it. This other component is called the use case, and it is the most important component of the whole clean system, as it implements the business logic.
The business logic is an important concept in system design. You are creating a system because you have some knowledge that you think might be useful to the world, or at the very least marketable. This knowledge is, at the end of the day, a way to process data, a way to extract or present data that maybe others don't have. A search engine can find all the web pages that are related to the terms in a query, a social network shows you the posts of people you follow and sorts them according to a specific algorithm, a travel company finds the best options for your journey between two locations, and so on. All these are good examples of business logic.
Business logic is the specific algorithm or process that you want to implement, the way you transform data to provide a service. It is the most important part of the system.
The use case implements a very specific part of the whole business logic. In this case we have a use case to search for rooms with a given value of the parameter status. This means that the use case has to extract all the rooms that are managed by our company and filter them to show only the ones that are available.
Why can't the web framework do it? Well, the main purpose of a good system architecture is to separate concerns, that is to keep different responsibilities and domains separated. The web framework is there to process the HTTP protocol, and is maintained by programmers who are concerned with that specific part of the system. Adding the business logic to it mixes two very different fields.
Different parts of a system should manage different parts of the process. Whenever two separate parts of a system work on the same data or the same part of a process they are coupled. While coupling is unavoidable, the higher the coupling between two components the harder it is to change one without affecting the other.
As we will see, separating layers allows us to maintain the system with less effort, making single parts of it more testable and easily replaceable.
In the example that we are discussing here, the use case needs to fetch all the rooms that are in an available state, extracting them from a source of data. This is the business logic, and in this case it is very straightforward, as it will probably consist of a simple filtering on the value of an attribute. This might however not be the case. An example of a more advanced business logic might be an ordering based on a recommendation system, which might require the use case to connect with more components than just the data source.
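As a rough idea of how small this business logic can be, the following sketch filters a list of rooms on the value of one attribute. The function name and the room fields (code, status) are assumptions made for this example only, and rooms are represented as plain dictionaries.

```python
def filter_rooms_by_status(rooms: list[dict], status: str) -> list[dict]:
    """Keep only the rooms whose 'status' attribute matches the requested one."""
    return [room for room in rooms if room["status"] == status]


# Hypothetical data: the fields 'code' and 'status' are invented for this sketch
rooms = [
    {"code": "A1", "status": "available"},
    {"code": "B2", "status": "rented"},
    {"code": "C3", "status": "available"},
]

print(filter_rooms_by_status(rooms, "available"))
# [{'code': 'A1', 'status': 'available'}, {'code': 'C3', 'status': 'available'}]
```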
So, the information that the use case wants to process is stored somewhere. Let's call this component storage system. Many of you probably already pictured a database in your mind, maybe a relational one, but that is just one of the possible data sources. The abstraction represented by the storage system is: anything that the use case can access and that can provide data is a source. It might be a file, a database (either relational or not), a network endpoint, or a remote sensor.
When designing a system, it is paramount to think in terms of abstractions, or building blocks. A component has a role in the system, regardless of the specific implementation of that component. The higher the level of the abstraction, the less detailed the components are. Clearly, high-level abstractions don't consider practical problems, which is why the abstract design then has to be implemented using specific solutions or technologies.
For simplicity's sake, let's use a relational database like Postgres in this example, as it is likely to be familiar to the majority of readers, but keep in mind the more generic case.
How does the use case connect with the storage system? Clearly, if we hard code into the use case the calls to a specific system (e.g. using SQL) the two components will be strongly coupled, which is something we try to avoid in system design. Coupled components are not independent, they are tightly connected, and changes occurring in one of the two force changes in the second one (and vice versa). This also means that testing components is more difficult, as one component cannot live without the other, and when the second component is a complex system like a database this can severely slow down development.
For example, let's assume the use case directly called a specific Python library to access PostgreSQL, such as psycopg. This would couple the use case with that specific source, and a change of database would result in a change of its code. This is far from ideal, as the use case contains the business logic, which has not changed moving from one database system to the other. Parts of the system that do not contain the business logic should be treated like implementation details.
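The following sketch shows what such a tightly coupled use case might look like. To keep it runnable without a database server I swapped psycopg for sqlite3 from the standard library, so this is not PostgreSQL code, and the table and column names are invented. The point is only that the SQL lives inside the business logic.

```python
import sqlite3


def room_list_use_case(connection: sqlite3.Connection, status: str) -> list[dict]:
    """A use case that talks SQL directly: this is the coupling we want to avoid."""
    # The query is hard coded here, so changing the storage system
    # means changing the code that contains the business logic
    cursor = connection.execute(
        "SELECT code, status FROM rooms WHERE status = ?", (status,)
    )
    return [{"code": code, "status": room_status} for code, room_status in cursor]


# Hypothetical schema and data, created in memory just for this sketch
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE rooms (code TEXT, status TEXT)")
connection.executemany(
    "INSERT INTO rooms VALUES (?, ?)", [("A1", "available"), ("B2", "rented")]
)

print(room_list_use_case(connection, "available"))
# [{'code': 'A1', 'status': 'available'}]
```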
A specific solution or technology is called a detail when it is not central to the design as a whole. The word doesn't refer to the inherent complexity of the subject, which might be greater than that of more central parts.
A relational database is hundreds of times richer and more complex than an HTTP endpoint, and this in turn is more complex than ordering a list of objects, but the core of the application is the use case, not the way we store data or the way we provide access to that. Usually, implementation details are mostly connected with performance or usability, while the core parts implement the pure business logic.
How can we avoid strong coupling? A simple solution is called inversion of control, and I will briefly sketch it here, and show a proper implementation in a later section of the book, when we implement this very example.
Inversion of control happens in two phases. First, the called object (the database in this case) is wrapped with a standard interface. This is a set of functionalities shared by every implementation of the target, and each wrapper translates those functionalities into calls in the specific language of the wrapped implementation.
Inversion of control is a technique used to avoid strong coupling between components of a system. It involves wrapping them so that they expose a certain interface. A component expecting that interface can then connect to them without knowing the details of the specific implementation, and is thus coupled to the interface instead of the specific implementation.
A real world example of this is that of power plugs: electric appliances are designed to be connected not to one specific power plug, but to any power plug that is built according to the specification (size, number of poles, etc). When you buy a TV in the UK, you expect it to come with a UK plug (BS 1363). If it doesn't, you need an adapter that allows you to plug electronic devices into sockets of a foreign nation. In this case, we need to connect the use case (TV) to a database (power system) that has not been designed to match a common interface.
In the example we are discussing, the use case needs to extract all rooms with a given status, so the database wrapper needs to provide a single entry point that we might call list_rooms_with_status.
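One possible way to write down such an interface is sketched here. I use typing.Protocol to spell out the single entry point, which is my own choice and purely optional, and I wrap the simplest storage I can think of, a Python list. The class names and the room fields are hypothetical.

```python
from typing import Protocol


class RoomStorage(Protocol):
    """The interface every storage wrapper is expected to expose."""

    def list_rooms_with_status(self, status: str) -> list[dict]:
        ...


class MemoryRoomStorage:
    """A wrapper around a plain Python list, used here as a stand-in storage."""

    def __init__(self, rooms: list[dict]) -> None:
        self._rooms = rooms

    def list_rooms_with_status(self, status: str) -> list[dict]:
        # Translate the generic request into the "language" of a list
        return [room for room in self._rooms if room["status"] == status]


# Hypothetical data: 'code' and 'status' are invented field names
storage: RoomStorage = MemoryRoomStorage(
    [{"code": "A1", "status": "available"}, {"code": "B2", "status": "rented"}]
)
print(storage.list_rooms_with_status("available"))
# [{'code': 'A1', 'status': 'available'}]
```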
In the second phase of inversion of control the caller (the use case) is modified to avoid hard coding the call to the specific implementation, as this would again couple the two. The use case accepts an incoming object as a parameter of its constructor, and receives a concrete instance of the adapter at creation time. The specific technique used to implement this depends greatly on the programming language we use. Python doesn't have an explicit syntax for interfaces, so we will just assume the object we pass implements the required methods.
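A minimal sketch of this second phase might look like the following. The class names and the execute method are assumptions of mine, and the stub storage just returns canned data, but it shows the use case receiving the adapter in its constructor and relying only on the presence of the expected method.

```python
class RoomListUseCase:
    """A use case that receives its storage adapter at creation time."""

    def __init__(self, storage):
        # Any object works, as long as it provides list_rooms_with_status
        self.storage = storage

    def execute(self, status):
        return self.storage.list_rooms_with_status(status)


class StubStorage:
    """Hypothetical adapter that returns canned data instead of querying anything."""

    def list_rooms_with_status(self, status):
        return [{"code": "A1", "status": status}]


use_case = RoomListUseCase(StubStorage())
print(use_case.execute("available"))
# [{'code': 'A1', 'status': 'available'}]
```

Since the use case never names a concrete storage class, replacing StubStorage with any other adapter requires no change to RoomListUseCase.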
Now the use case is connected with the adapter and knows the interface, and it can call the entry point list_rooms_with_status passing the status available. The adapter knows the details of the storage system, so it converts the method call and the parameter into a specific call (or set of calls) that extracts the requested data, and then converts the data into the format expected by the use case. For example, it might return a Python list of dictionaries that represent rooms.
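Here is a sketch of an adapter that performs both conversions. Again I use sqlite3 from the standard library as a stand-in for Postgres so that the example runs anywhere, and the schema is invented. The adapter turns the method call into a query, and the rows into a list of dictionaries.

```python
import sqlite3


class SqliteRoomStorage:
    """Hypothetical adapter that hides the SQL details behind the common interface."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def list_rooms_with_status(self, status: str) -> list[dict]:
        # First conversion: method call and parameter become a SQL query
        cursor = self._connection.execute(
            "SELECT code, status FROM rooms WHERE status = ?", (status,)
        )
        # Second conversion: database rows become plain dictionaries
        columns = [description[0] for description in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Invented schema and data, created in memory for the sake of the example
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE rooms (code TEXT, status TEXT)")
connection.executemany(
    "INSERT INTO rooms VALUES (?, ?)", [("A1", "available"), ("B2", "rented")]
)

print(SqliteRoomStorage(connection).list_rooms_with_status("available"))
# [{'code': 'A1', 'status': 'available'}]
```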
At this point, the use case has to apply the rest of the business logic, if needed, and return the result to the web framework.
Advantages of a layered architecture
As you can see, the stages of this process are clearly separated, and there is a great deal of data transformation between them. Using common data formats is one of the ways we achieve independence, or loose coupling, between components of a computer system.
To better understand what loose coupling means for a programmer, let's consider the last picture. In the previous paragraphs I gave an example of a system that uses a web framework for the user interface and a relational database for the data source, but what would change if the front-end part was a command-line interface? And what would change if, instead of a relational database, there was another type of data source, for example a set of text files?
As you can see, both changes would require the replacement of some components. After all, we need different code to manage a command line instead of a web page. But the external shape of the system doesn't change, nor does the way data flows. We created a system in which the user interface (web framework, command-line interface) and the data source (relational database, text files) are details of the implementation, and not core parts of it.
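To give an idea of what replacing both ends might involve, the sketch below reads rooms from a CSV text file and exposes them through a tiny command-line style function. Everything here is hypothetical (class names, file layout, the way arguments are passed), but notice that the use case in the middle is not aware of either change.

```python
import csv
import tempfile
from pathlib import Path


class CsvRoomStorage:
    """Hypothetical adapter that reads rooms from a CSV text file."""

    def __init__(self, path) -> None:
        self._path = Path(path)

    def list_rooms_with_status(self, status: str) -> list[dict]:
        with self._path.open(newline="") as handle:
            return [row for row in csv.DictReader(handle) if row["status"] == status]


class RoomListUseCase:
    """The same kind of use case as before: it only knows the interface."""

    def __init__(self, storage) -> None:
        self.storage = storage

    def execute(self, status: str) -> list[dict]:
        return self.storage.list_rooms_with_status(status)


def command_line_interface(use_case, argv: list[str]) -> str:
    """A stand-in for a CLI front-end: arguments in, text out."""
    status = argv[0] if argv else "available"
    return "\n".join(room["code"] for room in use_case.execute(status))


with tempfile.TemporaryDirectory() as directory:
    path = Path(directory) / "rooms.csv"
    # Invented file layout: one room per line, with a header row
    path.write_text("code,status\nA1,available\nB2,rented\n")
    use_case = RoomListUseCase(CsvRoomStorage(path))
    print(command_line_interface(use_case, ["available"]))  # A1
```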
The main immediate advantage of a layered architecture, however, is testability. When you clearly separate components you clearly establish the data each of them has to receive and produce, so you can ideally disconnect a single component and test it in isolation. Let's take the web framework component that we added and consider it for a moment, forgetting the rest of the architecture. We can ideally connect a tester to its inputs and outputs, as you can see in the figure.
We know that the web framework receives an HTTP request (1) with a specific target and a specific query string, and that it has to call (2) a method on the use case passing specific parameters. When the use case returns data (3), the web framework has to convert that into an HTTP response (4). Since this is a test we can have a fake use case, that is an object that just mimics what the use case does without really implementing the business logic. We will then test that the web framework calls the method (2) with the correct parameters, and that the HTTP response (4) contains the correct data in the proper format, and all this will happen without involving any other part of the system.
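A test of this kind could be sketched as follows, using Mock from the standard library as the fake use case. The handle_request function is a deliberately naive stand-in for the web framework layer, and its name, its signature, and the JSON response format are all assumptions made for this example.

```python
import json
from unittest.mock import Mock
from urllib.parse import parse_qs, urlsplit


def handle_request(url, use_case):
    """A naive stand-in for the web framework layer."""
    # (1) decode the incoming HTTP request
    params = parse_qs(urlsplit(url).query)
    status = params["status"][0]
    # (2) call the use case, (3) receive its data
    rooms = use_case.execute(status)
    # (4) convert the data into an HTTP response (status code and JSON body)
    return 200, json.dumps(rooms)


# The fake use case mimics the real one without any business logic
fake_use_case = Mock()
fake_use_case.execute.return_value = [{"code": "A1", "status": "available"}]

status_code, body = handle_request("/rooms?status=available", fake_use_case)

# Check the call (2) and the response (4) without touching the rest of the system
fake_use_case.execute.assert_called_once_with("available")
assert status_code == 200
assert json.loads(body) == [{"code": "A1", "status": "available"}]
```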
Clean Architectures in Python: the book
I hope you found this introduction useful. What you have read so far is the first chapter of the book "Clean Architectures in Python", which you can read online at The Digital Cat Books. The book is available as PDF and ebook on Leanpub.
Chapter 2 of the book briefly discusses the components and the ideas behind this software architecture. Chapter 3 runs through a concrete example of clean architecture and chapter 4 expands the example, adding a web application on top of it. Chapter 5 discusses error management and improvements to the Python code developed in the previous chapters. Chapters 6 and 7 show how to plug different database systems into the web service created previously, and chapter 8 wraps up the example, showing how to run the application with a production-ready configuration.