The optimal python project structure
In this post, I will describe a python project structure that I have found extremely useful over a wide variety of projects. We’re going to build this structure from the ground up so that you can better understand the ideas that have led me to this optimal layout. I will only include what I consider to be absolutely necessary for any python project. You can find the full project structure in my GitHub repository.
Let’s start with a directory called «my_project». This is the project directory. As you can see, it’s totally empty.
The project vs the package
The python project is everything in the base directory. All files related to your python application will be in the project directory.
The package, on the other hand, is a subdirectory inside the project with the same name as the project itself. This package contains the source code of your application. The reason for having this package directory is to separate source code from other files. When we pip install our project, we will tell pip to only include the files contained in the package directory.
Many people get confused by the distinction between project and package. As a reminder: the project goes to source control, the package gets installed.
Since the source code is the most important part of our project, let’s start by adding a package to our project structure.
Why __init__.py?
The package needs to contain at least an __init__.py file. This tells python that this directory is indeed a package. When python loads this package, it automatically runs the __init__.py. Therefore, it can be useful to include initialisation steps for the package. In all my projects, I add at least two things to this __init__.py:
- A variable called ROOT_DIR containing the absolute path to the location of the package. I have always found it useful for a package to know its absolute location. For instance, this can be used to load non-source files contained in the package. Relying on the current working directory is not a good idea: anyone can modify it, and your application won’t always be launched from the same location.
- The configuration of the logger for my package. Python logging is a broad topic and a post for another time. It’s enough to say that the logger for a package must be initialised once for the whole package. It therefore makes sense to have it inside the __init__.py.
Here is an example of what my __init__.py might look like:
from os.path import dirname, abspath
ROOT_DIR = dirname(abspath(__file__))
###Logging initialisation code would go here###
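To make the two ideas above a little more concrete, here is a minimal sketch of how ROOT_DIR and a package logger could be put to work. The helper names (package_root, resource_path, configure_logger) and the log format are my own assumptions for illustration, not part of the original project.

```python
import logging
from pathlib import Path


def package_root(init_file):
    """Return the absolute directory that contains the given __init__.py."""
    return Path(init_file).resolve().parent


def resource_path(root_dir, *parts):
    """Build a path to a non-source file stored inside the package."""
    return Path(root_dir).joinpath(*parts)


def configure_logger(name, level=logging.INFO):
    """Attach one stream handler to the named logger; safe to call twice."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


# Inside my_project/__init__.py this could read (hypothetical usage):
#   ROOT_DIR = package_root(__file__)
#   LOGGER = configure_logger(__name__)
#   CONFIG_FILE = resource_path(ROOT_DIR, "data", "config.json")
```

Because every path is derived from the location of __init__.py, the result does not depend on the directory the application was launched from.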
Our project now looks like this:
Helping git with a .gitignore
Any software project should be version controlled. By far the most widely used version control system is Git. A .gitignore file is a text file describing some files that should not be included in version control. There are many reasons for not wanting to source control certain files. For instance, the file could contain sensitive data such as passwords. You might also want to exclude large data files such as images.
Some examples of .gitignore files can be found in this github repository.
I usually start with a small number of files and build gradually. My basic .gitignore might resemble this:
# Pycache and compiled python files.
__pycache__/
*.py[cod]
# Jupyter notebook checkpoints
.ipynb_checkpoints
# Egg info files produced by pip installation
*egg-info
Helping yourself and others with a README.md
A readme is a documentation file often written in markdown.
It contains useful information so that others can understand what your project is about. A readme should at least contain a simple description of your project and instructions for how to install and use the package.
# My Project
A simple project providing a useful base structure for python packages.
## Installation
To install this project's package run:
pip install /path/to/my_project
To install the package in editable mode, use:
pip install --editable /path/to/my_project
Helping pip with a setup.py
A setup.py is a python file that contains information about the package you are installing.
A very minimal setup.py contains the following:
import setuptools
setuptools.setup(name='my_project', packages=['my_project'])
Here we are saying that the name of our package should be «my_project». This name will be used in the package metadata stored by pip. The «packages» parameter takes a list of the package directories to install. Earlier, I said that only the package part of our project structure would be installed; this «packages» parameter is why.
At this point, your project contains a perfectly valid, pip installable package. The huge advantages that come with having a pip installable package might not be immediately apparent. But trust me, making a project pip installable isn’t just about shipping your application to other users. It’s also extremely useful for development reasons. But that’s a discussion for another post.
Tracking requirements with requirements.txt
As your project grows, it will likely include more and more dependencies. A good way of tracking dependencies is through a requirements.txt.
This requirements.txt contains all the packages that your project needs and that are not part of the standard library. We will use this requirements.txt to make pip automatically download and install requirements for us.
Let’s make a very simple requirements.txt and add numpy to it (numpy is a package for scientific computing in python).
numpy==1.18.2
This file alone is not very useful; we need to tell pip about these requirements. For that, we must slightly update our setup.py.
import setuptools
with open('requirements.txt', 'r') as f:
    install_requires = f.read().splitlines()
setuptools.setup(name='my_project',
                 packages=['my_project'],
                 install_requires=install_requires)
As you can see, we have added the packages contained in the requirements.txt to our package setup. Therefore, when pip installs our package, it will search for that version of numpy. If it does not find it, it will download it for us.
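The setup.py above passes every line of requirements.txt straight to install_requires, which works for a file like ours but not once the file contains blank lines, comments or pip options such as «-r other.txt». As a hedged sketch, a slightly more defensive reader could look like this; the function names parse_requirements and read_requirements are hypothetical and not part of setuptools or pip.

```python
from pathlib import Path


def parse_requirements(text):
    """Return requirement specifiers, skipping blanks, comments and pip options."""
    requirements = []
    for line in text.splitlines():
        # Drop inline comments, which pip recognises as whitespace followed by '#'.
        line = line.split(" #", 1)[0].strip()
        # Skip empty lines, full-line comments and options such as '-r' or '-e'.
        if not line or line.startswith(("#", "-")):
            continue
        requirements.append(line)
    return requirements


def read_requirements(path="requirements.txt"):
    """Read a requirements file and return its parsed specifiers."""
    return parse_requirements(Path(path).read_text(encoding="utf-8"))
```

In setup.py, the result of read_requirements() could then be passed as install_requires in place of the raw list of lines.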
There are many more things to say about package dependencies and requirements.txt. For instance, your requirements.txt does not have to specify exact versions, and there are tools for automatically generating requirements files based on source code (e.g. pipreqs).
Package installation and dependency management can be hugely improved through the use of an environment manager. Check out my article on python anaconda for more information about using package managers.
The License
Including a license in your python project structure is important, especially if it is going to be deployed publicly. Others should know what they are entitled to do with your software; without a license, they might be unable to use it. If you don’t know which license to choose, take a look at choosealicense.com.
Last but not least: tests
Testing is often overlooked in software development. In this post I will not go into the details of testing. However, I always keep my tests in a separate directory from the package source code.
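For illustration only, here is a rough sketch of what a first test file in that separate directory might contain, using the standard unittest module. The package_root helper repeats the ROOT_DIR computation from the __init__.py example so that the snippet runs on its own; in a real project the test would more likely import ROOT_DIR from my_project. The class and file names are assumptions.

```python
import os
import tempfile
import unittest


def package_root(init_file):
    """Same computation as ROOT_DIR in the __init__.py example."""
    return os.path.dirname(os.path.abspath(init_file))


class TestPackageRoot(unittest.TestCase):
    def test_root_is_absolute_and_contains_init(self):
        # Build a throwaway package directory so the test is self-contained.
        with tempfile.TemporaryDirectory() as tmp:
            pkg_dir = os.path.join(tmp, "my_project")
            os.mkdir(pkg_dir)
            init_file = os.path.join(pkg_dir, "__init__.py")
            open(init_file, "w").close()

            root = package_root(init_file)

            self.assertTrue(os.path.isabs(root))
            self.assertEqual(os.path.basename(root), "my_project")
            self.assertTrue(os.path.isfile(os.path.join(root, "__init__.py")))
```

Saved as, say, tests/test_root.py, a file like this can be picked up by running «python -m unittest discover -s tests» from the project directory.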
Conclusion
I have just shown you what I consider to be the absolute minimal python project structure. There are many benefits to keeping your python projects well structured. Whether it’s for professional or personal projects, a systematic approach to organising your code will speed up development and bring clarity to your work.
Let’s take a look at our final python project structure:
There are many ways to extend this basic structure. This sample project is stored on GitHub, so don’t hesitate to fork or clone the repository and play around with it yourself. For potential ideas on extending your base python project, take a look at this repository from Neuraxio. It contains a more detailed version of a setup.py and a small test example.
This post is included in a series on python development fundamentals. Please check out the other posts in the series for more information.
I hope you enjoyed reading. Have a great day.