Using Protobuf With PyPy

In case you're not familiar with PyPy yet, it's a Python interpreter with a JIT compiler.

PyPy has many advantages. It can run Python code faster than any existing interpreter, it has a lower memory footprint than CPython, and it's 99% compatible with CPython. It also lets you choose the right level of language abstraction depending on how fast your code needs to be, since it supports Python and RPython and allows you to plug in C and C++ extensions.

One of the major differences, and what makes PyPy adoption a non-trivial process, is the way PyPy interfaces with CPython extensions. If you're interfacing with C code you need to make sure you use the right bindings.

CPython extensions use Python's C API, which PyPy does not implement natively. PyPy can run some CPython extensions through cpyext, but those extensions will run slower on PyPy because it has to emulate reference counting. That also means that Cython-generated extensions will be slower.

PyPy, on the other hand, supports the less commonly used ctypes and CFFI, both of which run on both interpreters. CFFI runs much faster on PyPy than ctypes, so if you're targeting PyPy it's the better choice. I was told that CFFI has been known to be a bit slower than C extensions on CPython because it wraps the C API rather than accessing it directly. I could not find evidence for that claim, although it makes sense.

If you're interfacing with C++ code, there has been some discussion about porting boost::python to PyPy, but I don't think it has been done yet.

PyPy provides cppyy, which is written in RPython and is thus able to JIT out most of the binding-related code. cppyy has multiple backends it can use. In this post I'm going to cover Reflex, since it's the default and currently the most stable backend.

In this post I'm going to demonstrate how to run Protobuf 2.5.1 on PyPy. Protobuf's compiler generates Python code which relies on a CPython extension to interface with the Protobuf implementation, and as I mentioned before, this is not going to be as fast as it should be on PyPy. Google provides a pure Python implementation of Protobuf in version 3.0, which is not released yet and breaks compatibility with Protobuf 2.x in some respects.

In order to use Protobuf with PyPy we have two options: CFFI or cppyy. Using CFFI requires us to be familiar with Protobuf's C API and to write the bindings manually, which is not something we wanted since we needed to start working on our product as soon as possible. Fortunately, Protobuf also generates C++ code, so we can use cppyy to create a C++ binding to our Protobuf messages.

Installing the dependencies

Installing Protobuf

In order to install Protobuf type:
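Roughly the standard autotools build, assuming you've downloaded and unpacked the Protobuf 2.5.x source tarball (the version number and paths are just what I'd expect; adjust them to your setup):

    cd protobuf-2.5.0
    ./configure
    make
    sudo make install
    sudo ldconfig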

Installing Reflex

Reflex has some dependencies that are required in order to compile and use it.
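On a Debian-based system something like the following should cover them; the package names are my assumption and vary between distributions, but what you need is a C++ compiler, GCCXML (which genreflex uses to parse headers) and a Python 2 interpreter to run genreflex itself:

    sudo apt-get install g++ gccxml python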

In order to compile Reflex type:
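The commands below follow the standalone Reflex build procedure described in the PyPy/cppyy documentation; the tarball name and the $REFLEXHOME install location are assumptions, so substitute whatever you downloaded and wherever you want it installed:

    export REFLEXHOME=$HOME/reflex
    tar jxf reflex-2013-08-14.tar.bz2
    cd reflex-2013-08-14
    ./build/autogen
    ./configure --prefix=$REFLEXHOME
    make && make install
    export PATH=$REFLEXHOME/bin:$PATH
    export LD_LIBRARY_PATH=$REFLEXHOME/lib:$LD_LIBRARY_PATH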

In case anyone runs into trouble, the full output of a successful installation can be found here.

If you want to always have Reflex available, add the export statements above to your .bashrc file and restart your shell.

In order to test that the process succeeded type:
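For example, ask genreflex for its usage text; after the exports above it should be on your PATH, so this should print help rather than "command not found":

    genreflex --help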

You'll get the usage information of the genreflex command line program.

In order to check if PyPy recognizes the Reflex backend type:
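The simplest check I know of is importing the cppyy module, which is PyPy's Python-side entry point to the Reflex backend:

    pypy -c "import cppyy"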

If something went wrong you'll see the following traceback:
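The exact message depends on what's missing; if the cppyy module isn't available in your PyPy build at all, it's a plain ImportError along these lines:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ImportError: No module named cppyy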

Compiling the Protobuf messages

For this post we're going to use the following Protobuf messages:
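Assume a minimal proto2 file named person.proto; the package, message and field names here are just placeholders, but any message with a string field will exercise the same code paths:

    package tutorial;

    message Person {
      required string name = 1;
      required int32 id = 2;
      optional string email = 3;
    }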

In order to compile the messages to C++ type:
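protoc prints nothing on success, so the trailing echo is just there to confirm it returned cleanly (file name as per the placeholder above); this produces person.pb.h and person.pb.cc in the current directory:

    protoc --cpp_out=. person.proto && echo OK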

If the compilation succeeded you'll see OK printed to your screen.

Generating the bindings

This is where things get interesting. The first time we ran genreflex we got some errors:
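With the placeholder file name from above, the invocation was simply genreflex person.pb.h, and the errors, coming out of GCCXML while it parses the installed Protobuf headers, were along these lines (the exact wording and file locations will differ on your system):

    genreflex person.pb.h
    ...: error: 'ptrdiff_t' does not name a type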

This error is caused by what is probably a bug in GCCXML, since the same code compiles fine with g++. Googling reveals that ptrdiff_t is defined by the standard C++ library.

The solution is to include the std namespace prefix before each usage of the ptrdiff_t type.

For your convenience, here's a patch that does exactly that. Make sure to back up the original header.

Apply it by typing:
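Assuming you saved the patch as ptrdiff.patch and Protobuf's headers are installed under /usr/local/include (both of which are just defaults I'm assuming; adjust the -p level to match the paths inside the patch):

    cd /usr/local/include
    sudo patch -p0 < ~/ptrdiff.patch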

After that, generate the bindings again and you should not get any errors.

Don't rush into compiling this just yet. Since we're using strings and we haven't provided any reflection information on std::string, cppyy will segfault.

In order to resolve that we need to specify the --deep argument when invoking genreflex. Generally speaking, it's a good idea to always specify this argument if you want to avoid this kind of problem later on, as I found out.
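With the placeholder file name from above, the generation step becomes:

    genreflex person.pb.h --deep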

After running genreflex with the --deep command line argument, compile the generated bindings:
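The flags below follow the pattern the PyPy/cppyy documentation uses for building Reflex dictionaries; the file and library names are the placeholders from above, and note that you also need to compile the generated person.pb.cc and link against the Protobuf runtime:

    g++ -fPIC -rdynamic -O2 -shared -I$REFLEXHOME/include \
        person_rflx.cpp person.pb.cc \
        -o libPersonDict.so -L$REFLEXHOME/lib -lReflex -lprotobuf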

After that you may load the resulting shared object into PyPy and start working with it.
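Here's a short session as a sketch, again using the placeholder names from above; note that on the C++ side you go through the generated setters and getters (set_name(), name()) rather than plain attribute access as in the Python implementation:

    import cppyy

    cppyy.load_reflection_info("libPersonDict.so")

    person = cppyy.gbl.tutorial.Person()
    person.set_name("Alice")
    person.set_id(42)

    data = person.SerializeAsString()

    other = cppyy.gbl.tutorial.Person()
    other.ParseFromString(data)
    print other.name(), other.id()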

It's not all roses, though. There are still issues with the Reflex backend. For example, exceptions from C++ are just printed instead of being raised on the Python side, so test this carefully to see if it works for you too.

As you can see, there are API differences between the Python Protobuf implementation and the C++ Protobuf implementation. I leave it as an exercise for the reader to implement the appropriate adapters. If you do so, please leave a comment.

There's also a good chance that we could create a Makefile that automates all these tasks. If you have written one, please also leave a comment.