$ python
>>> import this
The Zen of Python, by Tim Peters
....
Namespaces are one honking great idea -- let's do more of those!
We are going to concern ourselves with the bottom item (I have omitted the rest; if you have never read the Zen of Python then go and read it):
Namespaces are one honking great idea -- let's do more of those!
I think this is something we can all agree on. Split code into namespaces, packages and modules appropriately.
One worry that I had was that when you reach for a namespace it is because the codebase is large enough to require splitting up. Large namespaces mean that I would end up with a big repository, a prospect that I wanted to avoid. Avoiding it is rather easy in Java/.NET enterprise land, but something I was altogether unsure about in the Python world. Turns out it is rather straightforward, something I learnt at my current day job, then explored and extended, where all of our code is heavily namespaced and split into tiny packages.
I am going to just go straight into an example and try and convince you of why you might want to do this towards the end.
Building Namespace Packages
Our first example namespace package looks something like this:
├── companyname
│   ├── __init__.py
│   └── auth
│       └── __init__.py
└── setup.py
In our example we are going to nest everything under our company name namespace and only have one package in this namespace, auth.
Let’s look at a cut-down setup.py for this structure:
from setuptools import setup

setup(
    name='companyname-auth',
    packages=(
        'companyname.auth',
    ),
    namespace_packages=(
        'companyname',
    )
)
I will refer you to the setuptools docs for more information on building packages. For this example we only need to concern ourselves with the above three attributes within the setup() call.
By convention we have named the package companyname-auth, and we also name the repository companyname-auth for consistency. This is the name used when doing a pip install.
We then have packages, a list of all module paths underneath our namespace that contain code. Normally we could use setuptools.find_packages() to do the hard work for us but, annoyingly enough, this doesn’t work with namespace packages.
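If you would rather not maintain the packages list by hand, something along these lines could build it. This is only a sketch: find_leaf_packages is a made-up helper, not part of setuptools, and it assumes that every directory containing an __init__.py is a package and that you pass in the namespace packages to skip.

```python
import os
import tempfile


def find_leaf_packages(root, namespace_packages):
    """Return dotted names of packages under root, skipping namespace packages."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if dirpath == root:
            continue  # the project root itself is not a package
        if "__init__.py" not in filenames:
            dirnames[:] = []  # not a package, so do not descend further
            continue
        dotted = os.path.relpath(dirpath, root).replace(os.sep, ".")
        if dotted not in namespace_packages:
            found.append(dotted)
    return found


def build_example(root):
    """Recreate the companyname/auth layout from this post (illustration only)."""
    for parts in (("companyname",), ("companyname", "auth")):
        os.makedirs(os.path.join(root, *parts))
        open(os.path.join(root, *parts, "__init__.py"), "w").close()


with tempfile.TemporaryDirectory() as project_root:
    build_example(project_root)
    packages = find_leaf_packages(project_root, {"companyname"})
    print(packages)
```

In a real setup.py you would pass the directory holding setup.py as root and feed the result to the packages argument.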
Finally we have namespace_packages. As the name suggests, this lists all packages that are namespace packages. There are a few caveats for a package to be considered a namespace package:
- The package’s __init__.py must only include code that is the equivalent of:
  __import__('pkg_resources').declare_namespace(__name__)
- The package cannot include any other files.
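Both caveats can be checked mechanically. The following is a rough sketch, not a setuptools feature: namespace_problems is a made-up helper, and it assumes a crude text comparison against the declaration line that the setuptools docs give for pkg_resources-style namespaces. It never imports pkg_resources itself.

```python
import os
import tempfile

# The declaration line documented by setuptools for pkg_resources-style namespaces.
DECLARE = "__import__('pkg_resources').declare_namespace(__name__)"


def namespace_problems(package_dir):
    """Return a list of problems; an empty list means both caveats hold."""
    init_path = os.path.join(package_dir, "__init__.py")
    if not os.path.isfile(init_path):
        return ["missing __init__.py"]
    problems = []
    with open(init_path) as handle:
        lines = [line.strip() for line in handle]
    # Ignore blank lines and comments, then expect exactly one statement.
    code = [line for line in lines if line and not line.startswith("#")]
    if code != [DECLARE]:
        problems.append("__init__.py is not exactly the namespace declaration")
    for name in sorted(os.listdir(package_dir)):
        path = os.path.join(package_dir, name)
        if os.path.isfile(path) and name != "__init__.py":
            problems.append("unexpected file: " + name)
    return problems


with tempfile.TemporaryDirectory() as tmp:
    ns_dir = os.path.join(tmp, "companyname")
    os.makedirs(ns_dir)
    with open(os.path.join(ns_dir, "__init__.py"), "w") as handle:
        handle.write(DECLARE + "\n")
    clean = namespace_problems(ns_dir)
    # Adding any other file to the namespace directory breaks the second caveat.
    open(os.path.join(ns_dir, "utils.py"), "w").close()
    dirty = namespace_problems(ns_dir)
    print(clean, dirty)
```

Sub-directories are deliberately allowed, since that is where the real packages live.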
In older Python versions, namespaces are not a language feature. Instead the functionality is provided by setuptools and pkg_resources. PEP-420 adds namespace packages as a language feature to Python 3.3. Unfortunately for us, we’re still pre-Python 3.
Installing our example
Now when you python setup.py install this package it will end up in site-packages/companyname/auth. A feat that is altogether underwhelming. But here it is:
└── companyname
    └── auth
        └── __init__.py
from companyname.auth import ....
Let us have another package called companyname-api. If we were to create a new companyname-api package with the same structure, do a bit of find-replace and python setup.py install it, it would end up within the environment at site-packages/companyname/api, right alongside our existing auth package, looking something like this:
└── companyname
    ├── api
    │   └── __init__.py
    └── auth
        └── __init__.py
Taking the example even further, we might decide that companyname.api is a rather large namespace. It has multiple components that make up an API, and some of these may even be optional. So we simply make api a namespace package. companyname-api does have some shared functionality for all companyname-api-* packages, so we also create a core package that can nicely house this functionality. The project structure may look like this:
├── companyname
│   ├── __init__.py
│   └── api
│       ├── __init__.py
│       └── core
│           └── __init__.py
└── setup.py
Note that to satisfy the second caveat of namespace packages (no other files in the package), our api package code is nested in core.
With the below setup.py:
from setuptools import setup

setup(
    name='companyname-api',
    packages=(
        'companyname.api.core',
    ),
    namespace_packages=(
        'companyname.api',
        'companyname',
    ),
)
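With nested namespaces it is easy to forget one level in namespace_packages. In the layout used here every parent of a code package is a namespace package, so the list can be derived from packages. A small sketch under that assumption (namespaces_for is a hypothetical helper, and the assumption does not hold for ordinary nested packages):

```python
def namespaces_for(packages):
    """Return every parent of the given dotted package names, shortest first."""
    parents = set()
    for package in packages:
        parts = package.split(".")
        # Every proper prefix of the dotted name is treated as a namespace.
        for depth in range(1, len(parts)):
            parents.add(".".join(parts[:depth]))
    return sorted(parents, key=lambda name: (name.count("."), name))


api_namespaces = namespaces_for(["companyname.api.core"])
auth_namespaces = namespaces_for(["companyname.auth"])
print(api_namespaces, auth_namespaces)
```

The result could then be passed straight to the namespace_packages argument of setup().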
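Further packages in the companyname.api namespace, here contactus and posts, each live in their own project with the same layout:

├── companyname
│   ├── __init__.py
│   └── api
│       ├── __init__.py
│       └── contactus
│           └── __init__.py
└── setup.py

├── companyname
│   ├── __init__.py
│   └── api
│       ├── __init__.py
│       └── posts
│           └── __init__.py
└── setup.py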
Once installed we end up with the following in our site-packages directory:
└── companyname
    ├── api
    │   ├── contactus
    │   │   └── __init__.py
    │   ├── core
    │   │   └── __init__.py
    │   └── posts
    │       └── __init__.py
    └── auth
        └── __init__.py
Pretty cool, right?
So there are quite a few benefits to using packages, as well as a few specific to namespace packages.
Sub-packages within namespaces can be self-contained in their own source code repository. To me this is the issue I set out to solve. Great, I can treat each component as its own thing.
Each separate repository is independently versioned. Now this does come with its own set of problems, but I see it as a huge advantage. You end up in a situation where you can form well-reasoned opinions purely based on version numbers. I warn you that if you go from working in a world where version numbers are somewhat arbitrary to one of lots of packages and numbers with meaning, it is easy to end up in either dependency hell or version promiscuity or both. I advise that you follow semver or another formalised approach to versioning.
Test suites are likely to remain fast with a minimal amount of effort. Historically, using a framework like Django where you want to test against a real database, it doesn’t take that many tests to end up with a test suite that runs longer than a couple of seconds, and to some this is just too slow. Now suddenly you have less code per package and fewer tests are required to test everything, so you end up with faster test suites.
Changelogs and versions remain focused. Each individual package can have its own changelog, issue tracker, etc. There is less to think about when working with that individual package.
Extra optional packages (similar to django.contrib.* packages) can be namespaced under the namespace without being bundled.
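To make the point about version numbers with meaning a little more concrete, here is a minimal sketch of the kind of reasoning semver allows. Both function names are made up for illustration, and it assumes plain MAJOR.MINOR.PATCH strings with no pre-release or build metadata.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible_upgrade(installed, candidate):
    """True if candidate is not older and keeps the same major version.

    Under semver a major bump signals a breaking change. Note that for
    0.x versions semver makes no stability promise at all, which this
    sketch does not model.
    """
    old, new = parse_version(installed), parse_version(candidate)
    return new[0] == old[0] and new >= old


minor_bump = is_compatible_upgrade("1.4.2", "1.5.0")
major_bump = is_compatible_upgrade("1.4.2", "2.0.0")
downgrade = is_compatible_upgrade("1.4.2", "1.4.1")
print(minor_bump, major_bump, downgrade)
```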
A few warnings
pip doesn’t do real dependency resolution, so you can only rely on semver to take care of you to some extent.
If you use any service that is licensed per repository, things can get needlessly expensive because the pricing model doesn’t match your reality. For example, GitHub and Coveralls both follow this model.
This approach may not make sense in your situation. We build highly customised client sites and APIs that all use a varying degree of functionality from our toolbox. There are some core packages that pretty much all clients need, and then there are those that only a few use. A client project only pulls in the packages that are necessary; we add client-specific code to the project and keep our libraries generic enough for re-use. This approach allows us to iterate and evolve different parts of our code base in isolation and has worked for us so far.
You can check out the full example on GitHub.
When working on a local namespace package there are a few gotchas. Returning to our example above, companyname-api-posts will in all likelihood require companyname-api and will declare it as a dependency in its setup.py. To develop locally the advised approach is to pip install -e /path/to/companyname-api.
Performing local development on companyname-api-posts may go something like this:
$ cd /path/to/companyname-api-posts
$ pip install -e .
$ ./runtests.py
...
ImportError: Could not import 'companyname.api.posts...`
You now have two directories on your path matching companyname/api, one in site-packages containing the companyname-api package and another in your development copy of companyname-api-posts. But for some reason Python cannot find companyname.api.posts on the path. This is because pkg_resources is not a language feature: nothing joins the two directories into one namespace until it has been imported. You need to import pkg_resources somewhere before Python attempts to find the second directory. This is not an issue when everything is installed in the Python environment, as the namespace is then not spread out over multiple directories and behaves as expected.
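The underlying behaviour can be reproduced without pkg_resources at all. The sketch below is a simplified analogy rather than what pkg_resources actually does internally: it uses ordinary packages with empty __init__.py files, two temporary directories standing in for site-packages and the development checkout (both names are made up), and widens __path__ by hand where a namespace declaration would normally do it.

```python
import importlib
import os
import sys
import tempfile


def make_package(root, dotted):
    """Create a dotted package under root with an empty __init__.py per level."""
    path = root
    for part in dotted.split("."):
        path = os.path.join(path, part)
        os.makedirs(path, exist_ok=True)
        open(os.path.join(path, "__init__.py"), "a").close()


with tempfile.TemporaryDirectory() as tmp:
    installed = os.path.join(tmp, "site-packages")  # stands in for the environment
    develop = os.path.join(tmp, "checkout")  # stands in for the development copy
    make_package(installed, "companyname.api.core")
    make_package(develop, "companyname.api.posts")
    sys.path[:0] = [installed, develop]
    try:
        importlib.import_module("companyname.api.core")  # found in the first directory
        try:
            importlib.import_module("companyname.api.posts")
            found_before = True
        except ImportError:
            # Only the first companyname/api directory is searched.
            found_before = False
        # Widen the search path of both levels to include the second directory.
        sys.modules["companyname"].__path__.append(
            os.path.join(develop, "companyname"))
        sys.modules["companyname.api"].__path__.append(
            os.path.join(develop, "companyname", "api"))
        importlib.import_module("companyname.api.posts")
        found_after = True
    finally:
        # Leave the interpreter as we found it.
        sys.path.remove(installed)
        sys.path.remove(develop)
        for name in [n for n in sys.modules
                     if n == "companyname" or n.startswith("companyname.")]:
            del sys.modules[name]

print(found_before, found_after)
```

The first attempt fails because a package only searches the directories listed in its __path__, which is why something has to extend that list before the development copy becomes importable.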
This issue evaded me for longer than I care to admit. On a few rare occasions I had issues getting tests to run when I had a bare project structure, but then magically it would all start working once I had written some tests and Django model code. Weird. Turns out pytz imports pkg_resources and, because I do Django development and Django does quite a bit of initialisation work, pytz was in most cases importing pkg_resources for me before it attempted to import the development package.
PEP-420 formalises namespace packages in Python 3.3 as a language feature. My understanding is that pkg_resources continues to work alongside PEP-420. I haven’t done any research into migrating away from pkg_resources, as $DAY_JOB is on Python 2.7 for the foreseeable future.
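For the curious, this is roughly what the PEP-420 flavour looks like on Python 3.3 or later. It is a sketch I put together for illustration, not a migration guide: the two temporary directories stand in for two separately installed distributions, make_distribution is a made-up helper, and the key difference is that companyname has no __init__.py at all.

```python
import importlib
import os
import sys
import tempfile


def make_distribution(root, leaf):
    """Create root/companyname/<leaf>/__init__.py, with no companyname/__init__.py."""
    package_dir = os.path.join(root, "companyname", leaf)
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
        handle.write("NAME = %r\n" % leaf)


with tempfile.TemporaryDirectory() as tmp:
    roots = [os.path.join(tmp, "companyname-auth"),
             os.path.join(tmp, "companyname-api")]
    make_distribution(roots[0], "auth")
    make_distribution(roots[1], "api")
    sys.path[:0] = roots
    try:
        auth = importlib.import_module("companyname.auth")
        api = importlib.import_module("companyname.api")
        names = (auth.NAME, api.NAME)
        # The implicit namespace collects one path entry per directory.
        portions = len(list(sys.modules["companyname"].__path__))
    finally:
        # Leave the interpreter as we found it.
        for root in roots:
            sys.path.remove(root)
        for name in [n for n in sys.modules
                     if n == "companyname" or n.startswith("companyname.")]:
            del sys.modules[name]

print(names, portions)
```

Both sub-packages import even though they live in different directories, and nothing needs to be imported up front for that to happen.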