Python 2.7, Django and namespace packages

Apr 9, 2015 23:50 · 1493 words · 8 minutes read packaging namespaces python

Introducing import this

$ python
>>> import this
The Zen of Python, by Tim Peters
....
Namespaces are one honking great idea -- let's do more of those!

We are going to concern ourselves with the bottom item (I have omitted the rest. If you have never read the Zen of Python then go and import this).

Namespaces are one honking great idea -- let's do more of those!

I think this is something we can all agree on. Split code into namespaces, packages and modules appropriately.

One worry I had was that by the time you reach for a namespace, the codebase is already large enough to need splitting up. A large namespace would mean a big repository, a prospect that I wanted to avoid. Splitting things up is rather easy to do in Java/.NET enterprise land, but I was altogether unsure about it in the Python world. It turns out to be rather straightforward. I learnt the approach at my current day job, where all of our code is heavily namespaced and split into tiny packages, and have since explored and extended it.

I am going to just go straight into an example and try and convince you of why you might want to do this towards the end.

Building Namespace Packages

Our first example namespace package looks something like this:

├── companyname
│   ├── __init__.py
│   └── auth
│       └── __init__.py
└── setup.py

In our example we are going to nest everything under our company name namespace and only have one package in this namespace, auth.

Let's look at a cut-down setup.py for this structure:

from setuptools import setup

setup(
    name='companyname-auth',
    packages=(
        'companyname.auth',
    ),
    namespace_packages=(
        'companyname',
    )
)

I will refer you to the setuptools docs for more information on building packages. For this example we only need to concern ourselves with the above three attributes within the setup(...) call.

By convention we have named the package companyname-auth, and we give the repository the same name for consistency. This is the name used when doing a pip install.

We then have packages, a list of all the packages underneath our namespace that contain code. Normally we could use setuptools.find_packages() to do the hard work for us. Annoyingly enough, this doesn’t work with namespace packages.

Finally we have namespace_packages. As the name suggests, this lists all packages that are namespace packages. There are a few caveats for a package to be considered a namespace package:

  1. The __init__.py must only include code that is the equivalent of: __import__('pkg_resources').declare_namespace(__name__)
  2. The package cannot include any other files.

In older Python versions, namespace packages are not a language feature. Instead the functionality is provided by setuptools and pkg_resources. PEP 420 adds namespace packages as a language feature in Python 3.3. Unfortunately for us, we’re still pre-Python 3.
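To make the idea concrete, here is a minimal, self-contained sketch of what a namespace declaration achieves: a single companyname package whose contents are spread over two directories. To stay free of third-party imports it uses the standard library's pkgutil.extend_path as a stand-in for pkg_resources.declare_namespace. The temporary directories and the NAME attribute are made up for illustration.

```python
import os
import sys
import tempfile

# Stand-in for __import__('pkg_resources').declare_namespace(__name__);
# pkgutil.extend_path is the standard library's take on the same idea.
NAMESPACE_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_distribution(root, dist_name, subpackage):
    """Create <root>/<dist_name>/companyname/<subpackage>/ and return the dist dir."""
    dist_dir = os.path.join(root, dist_name)
    namespace_dir = os.path.join(dist_dir, 'companyname')
    package_dir = os.path.join(namespace_dir, subpackage)
    os.makedirs(package_dir)
    with open(os.path.join(namespace_dir, '__init__.py'), 'w') as handle:
        handle.write(NAMESPACE_INIT)
    with open(os.path.join(package_dir, '__init__.py'), 'w') as handle:
        handle.write('NAME = %r\n' % subpackage)
    return dist_dir


# Forget any companyname modules imported earlier in this process.
for loaded in [m for m in sys.modules if m.split('.')[0] == 'companyname']:
    del sys.modules[loaded]

root = tempfile.mkdtemp()
auth_dir = make_distribution(root, 'companyname-auth', 'auth')
api_dir = make_distribution(root, 'companyname-api', 'api')
sys.path[:0] = [auth_dir, api_dir]

import companyname
from companyname import api, auth

# Both sub-packages import, although they live in different directories.
imported_names = (auth.NAME, api.NAME)
namespace_path = list(companyname.__path__)
```

After running this, namespace_path contains the companyname directory of both source trees, which is the merging that the namespace declaration exists to perform.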

Installing our example

Now when you python setup.py install this package it will end up in site-packages/companyname/auth. A feat that is altogether underwhelming, but here it is:

└── companyname
    └── auth
        └── __init__.py    

We can now use from companyname.auth import ... in our code.

Now let us add another package called companyname-api. If we create it with the same structure, do a bit of find-replace and python setup.py install it, it will end up in the environment at site-packages/companyname/api, right alongside our existing auth package, looking something like this:

└── companyname
    ├── api
    │   └── __init__.py
    └── auth
        └── __init__.py

Taking the example even further, we might decide that companyname.api is a rather large namespace: it has multiple components that make up an API, some of which may even be optional. So we simply make api a namespace package. companyname-api does have some shared functionality for all companyname-api-* packages, so we also create a core package that can nicely house this functionality. The project structure may look like this:

companyname-api

├── companyname
│   ├── __init__.py
│   └── api
│       ├── __init__.py
│       └── core
│           └── __init__.py
└── setup.py

Note that to satisfy caveat (2) of namespace packages, our api package code is nested in core.

With the below setup.py:

from setuptools import setup
setup(
    name='companyname-api',
    packages=(
        'companyname.api.core',
    ),
    namespace_packages=(
        'companyname.api',
        'companyname',
    ),
)

companyname-api-contactus

├── companyname
│   ├── __init__.py
│   └── api
│       ├── __init__.py
│       └── contactus
│           └── __init__.py
└── setup.py

companyname-api-posts

├── companyname
│   ├── __init__.py
│   └── api
│       ├── __init__.py
│       └── posts
│           └── __init__.py
└── setup.py
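The setup.py files for these two packages are not shown, but presumably they mirror the one for companyname-api. As a rough sketch of the "find-replace" involved, a hypothetical helper (api_component_setup_kwargs is not part of setuptools) could build the keyword arguments for any companyname-api-* package. The install_requires entry assumes each component depends on companyname-api.

```python
def api_component_setup_kwargs(component):
    """Build setup() keyword arguments for a companyname-api-<component> package."""
    return dict(
        name='companyname-api-%s' % component,
        # Only the package that actually contains code is listed here.
        packages=['companyname.api.%s' % component],
        # Every level above it is declared as a namespace package.
        namespace_packages=['companyname.api', 'companyname'],
        # Assumption: components rely on the shared code in companyname-api.
        install_requires=['companyname-api'],
    )


posts_kwargs = api_component_setup_kwargs('posts')
contactus_kwargs = api_component_setup_kwargs('contactus')

# In a real setup.py these would be passed straight to setuptools:
#     from setuptools import setup
#     setup(**api_component_setup_kwargs('posts'))
```

Only the name and the packages entry differ between components; the namespace declarations stay the same for every package under companyname.api.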

Once installed we end up with the following in our site-packages directory:

└── companyname
    ├── api
    │   ├── contactus
    │   │   └── __init__.py
    │   ├── core
    │   │   └── __init__.py
    │   └── posts
    │       └── __init__.py
    └── auth
        └── __init__.py

Pretty cool, right?

The Benefits

So there are quite a few benefits to using packages, as well as a few specific to namespace packages.

Sub-packages within a namespace can each be self-contained in their own source code repository. To me this is the issue I set out to solve. Great, I can treat each component as its own thing.

Each separate repository is independently versioned. Now this does come with its own set of problems, but I see it as a huge advantage. You end up in a situation where you can form well-reasoned opinions purely based on version numbers. I warn you that if you go from working in a world where version numbers are somewhat arbitrary to one of lots of packages and numbers with meaning, it is easy to end up in dependency hell, version promiscuity, or both. I advise that you follow semver or another formalised approach to versioning.
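As a sketch of what "numbers with meaning" buys you: under semver a release is expected to stay backwards compatible within a major version, so a dependency can be pinned to a range rather than to an exact version. The helper names and version numbers below are hypothetical, and the sketch ignores pre-release tags and the looser rules semver applies to 0.x versions.

```python
def parse_version(text):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints, e.g. '1.4.2' -> (1, 4, 2)."""
    return tuple(int(part) for part in text.split('.'))


def compatible_range(minimum):
    """Return a version specifier accepting `minimum` up to the next major release."""
    major = parse_version(minimum)[0]
    return '>=%s,<%d.0.0' % (minimum, major + 1)


def is_compatible(installed, minimum):
    """True if `installed` shares the major version of `minimum` and is not older."""
    installed_parts = parse_version(installed)
    minimum_parts = parse_version(minimum)
    return installed_parts[0] == minimum_parts[0] and installed_parts >= minimum_parts


# A requirement string of the kind that could go into install_requires.
requirement = 'companyname-api' + compatible_range('1.4.2')
```

Here requirement ends up as 'companyname-api>=1.4.2,<2.0.0': any bug fix or feature release is accepted, while the next major version is not picked up by accident.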

Test suites are likely to remain fast with a minimal amount of effort. Historically, using a framework like Django where you want to test against a real database, it doesn’t take that many tests to end up with a test suite that takes longer than a couple of seconds to run. To some this is just too slow. Now suddenly you have less code per package and fewer tests are required to test everything, so you end up with a faster test suite.

Changelogs and versions remain focused. Each individual package can have its own changelog, issue tracker, etc. There is less to think about when working with that individual package.

Extra optional packages (similar to django.contrib.* packages) can live under the namespace without being bundled.

A few warnings

pip doesn’t really do real dependency resolution, so you can only rely on semver to take care of you to some extent.

If you use any service that is priced per repository, things can get needlessly expensive because its pricing model doesn’t match your reality. For example, GitHub and Coveralls both follow this model.

This approach may not make sense in your situation. We build highly customised client sites and APIs that all use varying degrees of functionality from our toolbox. There are some core packages that pretty much all clients need, and then there are those that only a few use. A client project only pulls in the packages that are necessary; we add client-specific code to the project and keep our libraries generic enough for re-use. This approach allows us to iterate and evolve different parts of our code base in isolation and has worked for us so far.

You can check out the full example on GitHub.

Local Development

When working on a local namespace package there are a few gotchas. Returning to our example above, companyname-api-posts will in all likelihood require companyname-api and will declare install_requires=('companyname-api',) in setup.py (note the trailing comma, without which this is a plain string rather than a tuple). To develop locally the advised approach is to pip install -e /path/to/companyname-api.

Performing local development on companyname-api-posts may go something like this:

$ cd /path/to/companyname-api-posts
$ pip install -e .
$ ./runtests.py
...
ImportError: Could not import 'companyname.api.posts...'

You now have two directories on your path matching companyname/api: one in site-packages containing the companyname-api package and another in your development copy of companyname-api-posts. But for some reason Python cannot find companyname.api.posts on the path. This is because namespace support comes from pkg_resources and is not a language feature. You need to import pkg_resources somewhere before Python attempts to find the second directory. This is not an issue when everything is installed in the Python environment, as the namespace is then not spread out over multiple directories and behaves as expected.

This issue evaded me for longer than I care to admit. On a few rare occasions I had issues with getting tests to run when I had a bare project structure, but then magically it would all start working once I had written some tests and Django model code. Weird. It turns out pytz imports pkg_resources, and because I do Django development, and Django does quite a bit of initialisation work, pytz was in most cases importing pkg_resources for me before Python attempted to import the development package.
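One way to apply that fix is to import pkg_resources explicitly at the top of whatever script starts your tests. The following is a minimal sketch under that assumption: the helper names are hypothetical, and the contents of the actual runtests.py are not shown in this post.

```python
import importlib


def preload_pkg_resources():
    """Import pkg_resources for its side effects; return True if it is available."""
    try:
        import pkg_resources  # noqa: F401
    except ImportError:
        # setuptools is not installed, so there is nothing to preload.
        return False
    return True


def import_after_preload(module_name):
    """Import module_name only after pkg_resources has had a chance to load."""
    preload_pkg_resources()
    return importlib.import_module(module_name)


# In a test runner this would be a development package such as
# 'companyname.api.posts'; 'json' is used here so the sketch runs anywhere.
module = import_after_preload('json')
```

The point is only the ordering: pkg_resources is imported before the first companyname.* import rather than by accident somewhere later.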

Python 3.3+

PEP 420 formalises namespace packages as a language feature in Python 3.3. My understanding is that pkg_resources continues to work alongside PEP 420. I haven’t done any research into migrating away from pkg_resources, as $DAY_JOB is on Python 2.7 for the foreseeable future.
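For comparison, here is a small sketch of the PEP 420 behaviour itself, which needs Python 3.3 or later. The namespace directory simply has no __init__.py, and the interpreter merges every companyname directory it finds on sys.path. The temporary directory layout and the NAME attribute are made up for illustration.

```python
import importlib
import os
import sys
import tempfile


def make_portion(root, dist_name, subpackage):
    """Create <root>/<dist_name>/companyname/<subpackage>/__init__.py only."""
    dist_dir = os.path.join(root, dist_name)
    package_dir = os.path.join(dist_dir, 'companyname', subpackage)
    os.makedirs(package_dir)
    # Deliberately no companyname/__init__.py: that makes the namespace implicit.
    with open(os.path.join(package_dir, '__init__.py'), 'w') as handle:
        handle.write('NAME = %r\n' % subpackage)
    return dist_dir


# Forget any companyname modules imported earlier in this process.
for loaded in [m for m in sys.modules if m.split('.')[0] == 'companyname']:
    del sys.modules[loaded]

root = tempfile.mkdtemp()
auth_dir = make_portion(root, 'companyname-auth', 'auth')
api_dir = make_portion(root, 'companyname-api', 'api')
sys.path[:0] = [auth_dir, api_dir]
importlib.invalidate_caches()

import companyname.api
import companyname.auth

imported_names = (companyname.auth.NAME, companyname.api.NAME)
portion_paths = list(companyname.__path__)
```

No pkg_resources import is involved here. Note that a regular companyname package (one with an __init__.py) found anywhere on sys.path takes precedence over these implicit portions.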