Python is a high-level programming language whose simple, elegant, yet very powerful and expressive syntax has earned it enormous popularity in most programming contexts. Scientific applications are no exception in this respect, which may look strange at first glance: Python’s runtime is painfully slow and unsuitable for high-performance computing, and even elementary parallelism is basically out of reach in pure Python (or, at least, in its reference implementation, CPython).

While Python’s mainstream popularity is due to its high-level minimalism, the key to its success in contexts where performance matters, such as most scientific applications, is the ease of invoking native code from Python through the low-level Python C API, which makes it possible to write arbitrary C libraries (known as Python C extension modules) whose members can be called directly from Python and behave mostly like pure Python packages.

This post presents a brief overview of Python C extensions and the most common tools available for their development, and it shows how to build a C extension using CMake to generate the build configuration in a cross-platform setting, using a concrete example from my work.

Calling native functions from Python

As an example, let us assume we want to call the C function printf directly from within the Python REPL. We can do so using the ctypes module, with the following three lines of code

>>> from ctypes import CDLL
>>> libc = CDLL('libc.so.6') # The exact filename may vary on your system
>>> libc.printf(b"We compute %d + %d = %d!\n", 1, 2, 1 + 2)

whose output, unsurprisingly, is

We compute 1 + 2 = 3!
22

where 22 is the return value of the function, corresponding to the number of characters printed.
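
Note that ctypes converts only a handful of Python types on its own (integers, bytes objects, strings, and None); other arguments must be wrapped explicitly in a ctypes type, and functions returning anything other than a C int need their return type declared through the restype attribute. Here is a minimal sketch reusing the libc handle loaded above (strtod is picked just as a convenient libc function that returns a double):

>>> from ctypes import c_double
>>> # Floating point varargs must be wrapped explicitly
>>> libc.printf(b"Pi is roughly %f\n", c_double(3.14159))
Pi is roughly 3.141590
23
>>> # Declare the return type when it is not a C int
>>> libc.strtod.restype = c_double
>>> libc.strtod(b"1.5e3", None)
1500.0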

Writing a Python C extension

While ctypes is a great tool for individual low-level function calls, it is not a practical solution for systematically wrapping a large API. Here the Python C API comes to the rescue, allowing us to implement an arbitrarily complex module directly in C. This of course makes it possible to implement components in C++ as well, as long as C linkage is used for the functions actually exposed through the module.

The Python API can be accessed through the header Python.h, which ships with a CPython installation (some Linux distributions provide it in a separate development package). The official documentation contains a tutorial showing how to structure the basics of a C extension. Here I am writing a very minimal example, whose code should be pretty self-explanatory, implementing a module that exports a single function to perform integer division.

#include <Python.h>

// This is the definition of a method
static PyObject* division(PyObject *self, PyObject *args) {
    long dividend, divisor;
    if (!PyArg_ParseTuple(args, "ll", &dividend, &divisor)) {
        return NULL;
    }
    if (0 == divisor) {
        PyErr_Format(PyExc_ZeroDivisionError, "Dividing %ld by zero!", dividend);
        return NULL;
    }
    return PyLong_FromLong(dividend / divisor);
}

// Exported methods are collected in a table
PyMethodDef method_table[] = {
    {"division", (PyCFunction) division, METH_VARARGS, "Method docstring"},
    {NULL, NULL, 0, NULL} // Sentinel value ending the table
};

// A struct contains the definition of a module
PyModuleDef mymath_module = {
    PyModuleDef_HEAD_INIT,
    "mymath", // Module name
    "This is the module docstring",
    -1,   // Size of the per-interpreter module state (-1 keeps the state in globals)
    method_table,
    NULL, // Optional slot definitions
    NULL, // Optional traversal function
    NULL, // Optional clear function
    NULL  // Optional module deallocation function
};

// The module init function
PyMODINIT_FUNC PyInit_mymath(void) {
    return PyModule_Create(&mymath_module);
}

The module is defined in a standard setup.py script, here in a very minimal form. Calling the setup script allows building, packaging, or installing a Python module (regardless of whether it includes a C extension or not).

from setuptools import setup, Extension

setup(name = "mymath",
      version = "0.1",
      ext_modules = [Extension("mymath", ["mymath.c"])]
      )

If a C extension module is included, the source files specified in the call to setup() are compiled automatically when calling python setup.py build: the Python interpreter takes care of invoking the C compiler with the proper flags and of linking against the proper libraries. The result is a shared library: on Linux the file is called mymath.cpython-37m-x86_64-linux-gnu.so, with a very self-explanatory name; on Windows the extension is usually .pyd. This shared object can be treated mostly like a pure Python module, and it can be imported and used, for example from the Python REPL:

>>> import mymath
>>> mymath.division(4, 2)
2
>>> mymath.division(4, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: Dividing 4 by zero!

So far, everything is implemented in plain C. Libraries such as pybind11 and Boost.Python simplify the creation of a Python extension module from scratch in C++, handling the boilerplate code required to integrate with the Python interpreter.

Digression: Wrapping a library

While the Python C API is straightforward to use, the lack of high-level facilities in C can make it tedious to write some extensions by hand, especially when it comes down to writing middleware to glue a pre-existing library to an extension module. However, many frameworks and automated tools can help. In particular, creating Python bindings for an existing C/C++ library is straightforward thanks to the Simplified Wrapper and Interface Generator (SWIG), a tool that can generate a Python API for an existing library interface in a mostly automatic fashion. As a bonus, once SWIG is set up and in place it can also generate bindings for a multitude of other languages, such as R, Perl, Java, C#, Ruby, and others.

Building a Python extension

Python includes in its standard library the distutils package, which handles the creation of Python modules and provides a portable API to build native C extensions in a cross-platform setup. However, distutils is usually not accessed directly: most packagers use an extended toolkit, setuptools, which provides a consistent interface for configuration, dependency handling, and other simple and advanced features.

As seen before, distutils (and setuptools) can automatically build a Python extension, taking care of invoking the compiler with suitable flags. This works well for simple extensions without external dependencies, while for more elaborate projects it may be necessary to use a custom extension builder, which allows tinkering with the compiler settings and build options. This can be done by subclassing build_ext from setuptools.command.build_ext and passing the subclass to the setup() function through the cmdclass argument.

from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext

class my_build_ext(build_ext):
    def build_extensions(self):
        # Bail out on compilers we do not intend to support
        if self.compiler.compiler_type == 'msvc':
            raise RuntimeError("Visual Studio is not supported")

        # Tweak compile and link flags for each extension
        for e in self.extensions:
            e.extra_compile_args = ['-pedantic']
            e.extra_link_args = ['-lgomp']

        build_ext.build_extensions(self)

setup(name = "mymath",
      version = "0.1",
      ext_modules = [Extension("mymath", ["mymath.c"])],
      cmdclass = {'build_ext': my_build_ext},
      )

Building with CMake

For complex projects, however, this may not be enough, especially when large libraries are involved. For instance, setting robust build options for a cross-platform build of a CUDA- or ITK-based application or library can be a challenging task if done manually. This is where CMake comes into play. CMake is a cross-platform build configuration generator, originally designed to build ITK itself and quickly adopted as one of the most popular build systems in the open-source ecosystem. It makes it easy to handle configuration, it automatically discovers build settings for most common tools and libraries on different platforms, and it allows seamlessly integrating into a project any other build dependency that uses CMake for its configuration.

Let us assume we are building a C library with non-trivial dependencies, and that we want to turn this library into a Python C extension module. As we have seen so far, there is a wide variety of tools at our disposal for this purpose. If our library is already configured with CMake, one option is to let CMake handle the build of the Python extension module itself. After all, an extension module is just a shared library that exports some specific symbols.

First of all, we ask CMake to find the Python interpreter and libs. We can specify a minimum version if we want, which is 3.5 in this example.

cmake_minimum_required(VERSION 3.10)
project(mymath)

find_package(PythonInterp 3.5 REQUIRED)

# This goes after, since it uses PythonInterp as a hint
find_package(PythonLibs 3.5 REQUIRED)

If we need to pass arrays back and forth between C and Python, the NumPy C API is probably the best option. Once Python is found, it is easy to locate the required NumPy headers:

# This comes in handy if we also need to use the NumPy C API
exec_program(${PYTHON_EXECUTABLE}
             ARGS "-c \"import numpy; print(numpy.get_include())\""
             OUTPUT_VARIABLE NUMPY_INCLUDE_DIR
             RETURN_VALUE NUMPY_NOT_FOUND
            )
if(NUMPY_NOT_FOUND)
    message(FATAL_ERROR "NumPy headers not found")
endif()

Next we define a target for the extension itself. As said before, the extension module is a shared library; here ${SRCS} is the list of source files. The Python (and NumPy) include directories located above must be added to the target, and it is important to specify C linkage among the target properties.

add_library(mymath SHARED ${SRCS})

target_include_directories(mymath PRIVATE ${PYTHON_INCLUDE_DIRS} ${NUMPY_INCLUDE_DIR})

set_target_properties(
    mymath
    PROPERTIES
        PREFIX ""
        OUTPUT_NAME "mymath"
        LINKER_LANGUAGE C
    )

At this point the extension is modeled as a regular CMake target, which makes it possible to integrate it freely with the other targets of the project.

We may still want to let Python launch the build and take care of the installation or packaging of our extension. To do this, we can write a custom build_ext that launches the CMake build. For the sake of clarity, we define a custom extension class that allows specifying the root folder of the CMake project (cmake_lists_dir). Moreover, we set the sources parameter to an empty list: it is not optional in the base class, but we do not want setuptools to compile any file directly.

import os, platform, subprocess, sys
from setuptools import setup, Extension
from setuptools.command.build_ext import build_ext

class CMakeExtension(Extension):
    def __init__(self, name, cmake_lists_dir='.', **kwa):
        Extension.__init__(self, name, sources=[], **kwa)
        self.cmake_lists_dir = os.path.abspath(cmake_lists_dir)

We can then proceed to define the actual build_ext subclass, which is in charge of launching CMake.

class cmake_build_ext(build_ext):
    def build_extensions(self):
        # Ensure that CMake is present and working
        try:
            out = subprocess.check_output(['cmake', '--version'])
        except OSError:
            raise RuntimeError('Cannot find CMake executable')

        for ext in self.extensions:

            extdir = os.path.abspath(os.path.dirname(self.get_ext_fullpath(ext.name)))
            cfg = 'Debug' if self.debug else 'Release'

            cmake_args = [
                '-DCMAKE_BUILD_TYPE=%s' % cfg,
                # Ask CMake to place the resulting library in the directory
                # containing the extension
                '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{}={}'.format(cfg.upper(), extdir),
                # Other intermediate static libraries are placed in a
                # temporary build directory instead
                '-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY_{}={}'.format(cfg.upper(), self.build_temp),
                # Hint CMake to use the same Python executable that
                # is launching the build; this prevents possible mismatches
                # when multiple versions of Python are installed
                '-DPYTHON_EXECUTABLE={}'.format(sys.executable),
                # Add other project-specific CMake arguments if needed
                # ...
            ]

            # We can handle some platform-specific settings at our discretion
            if platform.system() == 'Windows':
                plat = ('x64' if platform.architecture()[0] == '64bit' else 'Win32')
                cmake_args += [
                    # These options are likely to be needed under Windows
                    '-DCMAKE_WINDOWS_EXPORT_ALL_SYMBOLS=TRUE',
                    '-DCMAKE_RUNTIME_OUTPUT_DIRECTORY_{}={}'.format(cfg.upper(), extdir),
                ]
                # Assuming that Visual Studio and MinGW are supported compilers
                if self.compiler.compiler_type == 'msvc':
                    cmake_args += [
                        '-DCMAKE_GENERATOR_PLATFORM=%s' % plat,
                    ]
                else:
                    cmake_args += [
                        '-G', 'MinGW Makefiles',
                    ]

            # Extra CMake arguments collected elsewhere (e.g. from the command line) could be appended here

            if not os.path.exists(self.build_temp):
                os.makedirs(self.build_temp)

            # Config
            subprocess.check_call(['cmake', ext.cmake_lists_dir] + cmake_args,
                                  cwd=self.build_temp)

            # Build
            subprocess.check_call(['cmake', '--build', '.', '--config', cfg],
                                  cwd=self.build_temp)
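
Finally, the two classes defined above can be wired together in the setup() call. This is a minimal sketch, assuming the top-level CMakeLists.txt sits next to setup.py (the default value of cmake_lists_dir):

setup(name = "mymath",
      version = "0.1",
      ext_modules = [CMakeExtension("mymath")],
      cmdclass = {'build_ext': cmake_build_ext},
      )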

A real-life example of a Python extension built this way is disptools (displacement tools), a small library for the generation of displacement fields with known volume changes, which I implemented and made available on GitHub.

Automating further

Additional support for scientific computing extensions is provided by scikit-build, a Python package providing an alternative build tool to setuptools that simplifies the build of extensions written in C, C++, Cython, and Fortran. Scikit-build offers a bridge between setup.py and CMake, and it provides CMake modules to automatically find Cython, NumPy, and F2PY.
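
As a rough sketch of how this looks in practice, the setup script simply swaps the setuptools entry point for the scikit-build one, while the native build is described by the project's CMakeLists.txt (the package name and layout below are hypothetical):

from skbuild import setup  # drop-in replacement for setuptools.setup

setup(name = "mymath",
      version = "0.1",
      packages = ["mymath"], # hypothetical pure-Python package shipped alongside the extension
      )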