Easy dynamic dispatch using GLIBC Hardware Capabilities

TL;DR: With GLIBC 2.33+, you can build a shared library multiple times targeting various optimization levels, and the dynamic linker/loader will pick the highest version supported by the current CPU. For example, with the layout below, on a Ryzen 9 5900X, x86-64-v3/libfoo0.so would be loaded:

/usr/lib/glibc-hwcaps/x86-64-v4/libfoo0.so
/usr/lib/glibc-hwcaps/x86-64-v3/libfoo0.so
/usr/lib/glibc-hwcaps/x86-64-v2/libfoo0.so
/usr/lib/libfoo0.so

Longer Version

GLIBC Hardware Capabilities, or "hwcaps", are an easy, almost trivial way to add a simple form of dynamic dispatch to any amd64 or POWER build, provided that the library can take advantage of certain CPU extensions, whether through its build target or through compiler optimizations.

Mo Zhou pointed me towards this when I was faced with the challenge of creating a performant Debian package for ggml, the tensor library behind llama.cpp and whisper.cpp.

The Challenge

A performant yet universally loadable library needs some form of dynamic dispatch to leverage the most effective SIMD extensions available on whatever CPU it ends up running on. Last January, when I first started packaging ggml for Debian, ggml already had support for this through its GGML_CPU_ALL_VARIANTS=ON option, but only on amd64. On all the other architectures that Debian supports, I would have had to target some ancient baseline, effectively crippling the package there.

Dynamic Dispatch using hwcaps

hwcaps were introduced in GLIBC 2.33 and replace the (now) Legacy Hardware Capabilities, which were removed in 2.37. The way hwcaps work is delightfully simple: the dynamic linker/loader looks for a shared library not just in the standard library paths, but also in subdirectories thereof of the form glibc-hwcaps/<level>, starting with the highest <level> that the current CPU supports. The levels are predefined; I'm using the amd64 levels below.
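On a recent GLIBC, you can ask the dynamic linker/loader itself which levels it knows about and which of them the current CPU supports. On a machine supporting up to x86-64-v3, this looks roughly as follows (the loader path and the exact output vary by distribution and GLIBC version):

$ /lib64/ld-linux-x86-64.so.2 --help | grep -A4 glibc-hwcaps
Subdirectories of glibc-hwcaps directories, in priority order:
  x86-64-v4
  x86-64-v3 (supported, searched)
  x86-64-v2 (supported, searched)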

For ggml, this meant that I could simply build the library in multiple passes, each pass targeting a different <level>, and install the results into the corresponding subdirectories, which produced the following layout (reduced to libggml.so for brevity; a build sketch follows the listing):

/usr/lib/x86_64-linux-gnu/ggml/glibc-hwcaps/x86-64-v4/libggml.so
/usr/lib/x86_64-linux-gnu/ggml/glibc-hwcaps/x86-64-v3/libggml.so
/usr/lib/x86_64-linux-gnu/ggml/glibc-hwcaps/x86-64-v2/libggml.so
/usr/lib/x86_64-linux-gnu/ggml/libggml.so
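The multi-pass build itself boils down to a loop. Here is a minimal sketch assuming a CMake-based build; the flags and paths are illustrative, not the actual Debian packaging, which drives this from debian/rules:

# Sketch: one pass per level, installed into the matching
# glibc-hwcaps subdirectory. GCC and Clang accept the levels
# directly as -march targets.
for level in x86-64-v2 x86-64-v3 x86-64-v4; do
    cmake -S . -B "build-${level}" \
        -DCMAKE_INSTALL_PREFIX=/usr \
        -DCMAKE_C_FLAGS="-march=${level}" \
        -DCMAKE_CXX_FLAGS="-march=${level}" \
        -DCMAKE_INSTALL_LIBDIR="lib/x86_64-linux-gnu/ggml/glibc-hwcaps/${level}"
    cmake --build "build-${level}"
    DESTDIR="$PWD/debian/tmp" cmake --install "build-${level}"
done
# The baseline (x86-64-v1) pass installs to the regular path:
cmake -S . -B build-baseline \
    -DCMAKE_INSTALL_PREFIX=/usr \
    -DCMAKE_INSTALL_LIBDIR=lib/x86_64-linux-gnu/ggml
cmake --build build-baseline
DESTDIR="$PWD/debian/tmp" cmake --install build-baseline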

In practice, this means that on a CPU supporting AVX-512, the linker/loader would load x86-64-v4/libggml.so if it existed, and otherwise continue looking through the lower levels, all the way down to the lowest one. On a CPU supporting only SSE4.2, the lookup process would be the same, ending with x86-64-v2/libggml.so. All of this was quickly verified with QEMU.
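For example, with QEMU's user-mode emulation you can restrict the CPU features visible to the loader and watch its search with LD_DEBUG. The binary name here is hypothetical; Nehalem is a v2-class CPU model and Skylake-Client a v3-class one, so the first run should end up loading the v2 build and the second the v3 build:

$ LD_DEBUG=libs qemu-x86_64 -cpu Nehalem ./my-ggml-app 2>&1 | grep libggml.so
$ LD_DEBUG=libs qemu-x86_64 -cpu Skylake-Client ./my-ggml-app 2>&1 | grep libggml.so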

Note that the lowest-level library, targeting x86-64-v1, is not installed to a subdirectory, but to the path where the library would normally have been installed. This has the nice property that on systems not using GLIBC, and thus without hwcaps, package installation still results in a loadable library, albeit the worst-performing one. A careful observer might also have noticed that in the example above, the library is installed to a private ggml/ directory, so this mechanism also works when using RUNPATH or LD_LIBRARY_PATH.
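To illustrate the latter, consider a (hypothetical) program linked against the private copy with a RUNPATH:

$ gcc -o my-ggml-app main.c \
      -L/usr/lib/x86_64-linux-gnu/ggml -lggml \
      -Wl,-rpath,/usr/lib/x86_64-linux-gnu/ggml

At run time, the loader applies the glibc-hwcaps search to every directory on its search path, so it will check /usr/lib/x86_64-linux-gnu/ggml/glibc-hwcaps/<level>/ for each supported level before falling back to /usr/lib/x86_64-linux-gnu/ggml/libggml.so.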

Debian's ggml package will soon switch to the GGML_CPU_ALL_VARIANTS=ON option mentioned above, but hwcaps was still quite the useful feature to discover.

15th Anniversary of My First Debian Upload

Time flies! 15 years ago, on 2010-03-18, my first upload to the Debian archive was accepted. Debian had replaced Windows as my primary OS in 2005, but it was only when I saw that the package zd1211-firmware had been orphaned that I thought of becoming a contributor. I owned a Zyxel G-202 USB WiFi fob that needed said firmware, and as is so often the case with open-source software, I was going to scratch my own itch. Bart Martens thankfully helped me adopt the package and sponsored my upload.

I then joined Javier Fernández-Sanguino Peña as a cron maintainer and upstream, and also worked within the Debian Python Applications, Debian Python Modules, and Debian Science Teams, where Jakub Wilk and Yaroslav Halchenko were kind enough to mentor me and eventually support my application to become a Debian Maintainer.

Life intervened, and I was mostly inactive in Debian for the next two years. Upon my return in 2014, I had Vincent Cheng to thank for sponsoring most of my newer work, and for eventually supporting my application to become a Debian Developer. It was around that time that I also attended my first DebConf, in Portland, which remains one of my fondest memories. I had never been to an open-source software conference before, and DebConf14 really knocked it out of the park in so many ways.

After another break, I returned in 2019 to work mostly on Python and machine learning libraries. In 2020, I finally completed a process that I had first started in 2012 but had never managed to finish: converting cron from source format 1.0 (one big diff) to source format 3.0 (quilt) (a series of patches). This meant distilling 25 years' worth of organic growth into a minimal series of logically grouped changes (more here). This was my white whale.

In early 2023, shortly after the launch of ChatGPT triggered an unprecedented AI boom, I started contributing to the Debian ROCm Team, where over the following year I bootstrapped our CI at ci.rocm.debian.net. Debian's current tooling lacks a way to express dependencies on specific hardware other than the CPU ISA, and it has no means of running autopkgtests on such hardware. To get autopkgtests to make use of AMD GPUs in QEMU VMs and in containers, I had to fork autopkgtest, debci, and a few other components, as well as create a fair share of new tooling of our own. This worked out pretty well, and the CI has grown to support 17 different AMD GPU architectures. I will share more on this in upcoming posts.

I have mentioned a few contributors by name, but I have countless others to thank for collaborations over the years. It has been a wonderful experience, and I look forward to many years more.