Why it’s so hard to write good libraries

Over the course of my career as a software engineer, and even earlier as a hobbyist and enthusiast, one thing I’ve found I enjoy is extracting good-looking pieces of software from existing systems and turning them into useful libraries. Some of these were released as open source while some were kept in-house.

To list a few, here are the ones I’ve worked on over the years:

  • Memcache++
  • Cpp-Netlib
  • LLVM XRay

I’m writing about my experiences in library design as a response to a request from someone who follows me on LinkedIn who suggested I write about it someday. I thought it was a fair request, so here we are.

So, what then is library design and why is it important to try and do it well?

Introduction

Before anything else, let’s dive into story time!

The year was 2005.

It was early in my career as a software engineer when I joined a mobile content provider in the Philippines. This was a local enterprise which provided a pretty popular service that bridged email to SMS and MMS. In those days, writing highly scalable and performance-sensitive systems wasn’t something lots of people came across, nor was there a lot of literature available to someone like me who had only been in the industry for a year or so.

The system interfaced with a telecom provider’s system through a protocol called SMPP. There were existing solutions for this, but they didn’t scale well enough to handle the volume that this service needed. The system they had available was OK but it was complex and nobody else on the team could maintain it.

Management determined that this needed a rewrite and we had to make the transition happen on an aggressive timeline (we had a year to do it before the system collapsed under the sheer load it needed to handle).

This is when one of my early mentors showed me the ACE framework, a collection of C++ classes that worked together to implement some common communication and concurrency patterns. At the time, if you were building high performance systems in C++, there weren’t really many choices outside of the ACE framework. It was clearly formidable, which made it a popular choice.

What it had in comprehensiveness it lacked in… elegance, if I can use that word. I didn’t know it at the time, but the framework approach works well when the framework addresses the problem you’re solving, and frameworks in general can get in your way when you attempt something they don’t support.

One of the requirements that came up for us was the need to persist messages that are in transit between different parts of the system. I didn’t know it yet, but what we were building was a service-oriented architecture to allow different parts of the system to scale according to load. We were building microservices before they were called that. We were looking around for mature message broker solutions that could handle the scale we were looking at (in the thousands of messages per second at peak times) but those required complicated protocols or expensive licenses to use – this was way before RabbitMQ or ZeroMQ were available.

What did we do?

Like any self-respecting software engineers, we decided to make one ourselves. This time though it would be different, because we built one that was backed by Berkeley DB, way before it became part of Oracle. Think of Berkeley DB (affectionately called BDB) as a high performance in-process database with persistence support. Its main function is as a key-value store, which was perfect for our requirement of having fast access to in-flight data while also having it persisted for resilience purposes.

What followed from this story were lessons early in my career that informed the way I think about libraries and what good ones look like.

A bad library…

My first attempt at creating a library which provided the functionality was informed by my experiences while I was still studying in university: take some inputs and return some things.

At first reading, that doesn’t quite sound like a bad way of doing things… until you start thinking about context.

A bad library will assume or enforce its context upon its users.

What does this look like in practice?

template <class T>
void sort(T& container) {
  // …
}

Let’s take the (simple) example above of a sort function in C++, just looking at the function signature. On the surface it looks like a harmless and good interface and I agree, it’s nice and succinct. We can quibble about whether we need to use concepts to enforce the requirements on the type, but that’s largely in the margins.

If this was the only function in the library though, it would make it a bad library. Why?

The function unnecessarily makes assumptions and imposes its context on the user.

Let’s think about what a sort implementation will typically need to do:

  • Compare elements (requiring a strict weak ordering).
  • Swap elements.

This definition of sort does not let the user provide their own comparison function, nor a different way of swapping elements.

This lack of extension points makes a seemingly simple library bad, compared to one that looks good. If we supported that mechanism, our sort function might look better this way:

#include <utility>  // std::swap

class DefaultCompare {
public:
  template <class T>
  bool operator()(const T& left, const T& right) const {
    return left < right;
  }
};

class DefaultSwap {
public:
  template <class T>
  void operator()(T& left, T& right) const {
    using std::swap;  // lets argument-dependent lookup find user-defined swaps
    swap(left, right);
  }
};

template <class R, class C = DefaultCompare, class S = DefaultSwap>
void sort(
    R&& range,
    C compare = C(),
    S swap = S()) {
  // …
}

This then means that we provide a default comparison implementation, a default swap implementation, and allow users of the library to customise this as they need.
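To make the difference concrete, here is a minimal sketch of how the extension points might be used. The name sort_range, the helper functions, and the simple exchange sort inside it are all my own assumptions for illustration (the body of sort is elided above), and the sketch assumes the range offers size() and operator[].

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical defaults, mirroring the ones described in the text.
struct DefaultCompare {
  template <class T>
  bool operator()(const T& left, const T& right) const {
    return left < right;
  }
};

struct DefaultSwap {
  template <class T>
  void operator()(T& left, T& right) const {
    using std::swap;  // lets argument-dependent lookup find user-defined swaps
    swap(left, right);
  }
};

// A deliberately simple O(n^2) exchange sort. The point is not the
// algorithm: it is that elements are only ever touched through the
// caller-supplied `compare` and `swap_fn`.
template <class R, class C = DefaultCompare, class S = DefaultSwap>
void sort_range(R& range, C compare = C(), S swap_fn = S()) {
  const std::size_t n = range.size();
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      if (compare(range[j], range[i])) swap_fn(range[i], range[j]);
    }
  }
}

// Extension point 1: the caller decides what "ordered" means.
std::vector<int> sorted_descending(std::vector<int> values) {
  sort_range(values, [](int a, int b) { return a > b; });
  return values;
}

// Extension point 2: the caller decides how elements are exchanged,
// here by counting every swap the algorithm performs.
std::size_t count_swaps(std::vector<int> values) {
  std::size_t swaps = 0;
  sort_range(values, DefaultCompare(), [&swaps](int& a, int& b) {
    std::swap(a, b);
    ++swaps;
  });
  return swaps;
}
```

Neither helper needed a change to sort_range itself, which is the whole argument for building the extension points in from the start.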

What you might notice here is that a bad library is hard to spot if you’re looking at it from an implementer’s perspective.

That’s a key lesson I learned early on – if you have other humans on the other side of the library you’re writing (which there almost certainly are, especially if you consider that other human to be “you some time later”) then you need to put the user’s perspective front and centre if you hope to create a good library.

Design Principles

There are some rule-of-thumb design principles for making good libraries. Let’s go through some of the basic ones and dive deeper with examples for each.

  • Modularity: Components of the library should be self-contained, interchangeable, and independent.
  • Reusability: Make it generic enough to be useful for a wide set of contexts but not too generic to be practically unusable.
  • Extensibility: Build in extension points or design it such that future extensions are possible without too much disruption.

Modularity

A library can be considered modular when you can identify components (or modules) that serve well-defined purposes that do not overlap unless necessary.

This applies a lot to Object Oriented Programming (OOP) approaches, where there might be a set of interfaces defined that components in the system might implement, but the components themselves are independent. This also applies in Functional Programming (FP) approaches where the types or type signatures for functions serve as the interfaces and specific implementations (or compositions) are the components that can fit with each other.

In a seminal paper by David Parnas titled “On the Criteria To Be Used in Decomposing Systems into Modules” one key observation made is that good modules typically hide as much information as possible from other modules and that modules are delineated based not on a flowchart/dataflow pattern but on design decisions and clear information boundaries. Once we start thinking of modules as building blocks that allow us to express solutions rather than things that model the program flow, then we can make modules that handle constrained and well-defined design spaces.
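As a rough illustration of information hiding, here is a sketch of a module whose interface says nothing about how messages are stored. MessageStore and its member names are hypothetical, loosely inspired by the in-flight message store from the story earlier. The in-memory map is just one possible design decision hidden behind the interface, not what we actually built.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// The public interface exposes only what callers need: put, get, erase.
// The storage decision (an in-memory hash map here) is private, so it
// could be replaced by an on-disk key-value store without touching callers.
class MessageStore {
 public:
  void put(const std::string& id, std::string payload) {
    messages_[id] = std::move(payload);
  }

  // Returns the payload if the message is known, or an empty optional.
  std::optional<std::string> get(const std::string& id) const {
    auto it = messages_.find(id);
    if (it == messages_.end()) return std::nullopt;
    return it->second;
  }

  // Returns true if a message with this id was present and removed.
  bool erase(const std::string& id) { return messages_.erase(id) > 0; }

 private:
  std::unordered_map<std::string, std::string> messages_;  // hidden decision
};
```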

Most good libraries follow this model, where independent modules can be used in well-defined design spaces with thoughtful extension points built-in to the module’s interface.

A good example of this is the C++ standard library, which keeps containers separate from algorithms while letting them work synergistically with each other through Generic Programming.
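Here is a small sketch of that separation. The function largest is a hypothetical helper of mine, but std::max_element is the standard algorithm, and it works with any container that hands out iterators.

```cpp
#include <algorithm>
#include <deque>
#include <list>
#include <vector>

// Written once against the iterator interface, with no knowledge of the
// container's layout. Precondition: `values` must not be empty.
template <class Container>
int largest(const Container& values) {
  return *std::max_element(values.begin(), values.end());
}

// The same algorithm serves three unrelated container implementations.
int largest_in_vector() { return largest(std::vector<int>{4, 9, 2}); }
int largest_in_list() { return largest(std::list<int>{7, 1, 3}); }
int largest_in_deque() { return largest(std::deque<int>{5, 6, 8}); }
```

The containers know nothing about max_element and max_element knows nothing about the containers. Iterators are the only shared vocabulary.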

Reusability

A good library will typically be useful in many different programs that need to solve the same specific problem in a consistent manner. This is why most successful mainstream programming languages that have come out in the past couple of decades have a built-in standard library for common data structures and algorithms.

A good way to gauge whether a library is reusable is to analyse whether its interface and implementation impose more or fewer requirements on the user.

Let’s see two examples of a container which may have either 0 or 1 element:

// Example 1:
template <class T>
class Maybe {
public:
explicit Maybe(T&& value) : ...
...
};

// Example 2:
class Object;
class MaybeObject {
public:
explicit MaybeObject(Object* object) : ...
...
};

In Example 1, Maybe<T> only requires that T can be treated as a value.

In Example 2, MaybeObject requires that the value being held is an Object and that there’s a pointer to an existing one already present.

Because Maybe<T> can apply to many more types compared to MaybeObject, Maybe<T> can be considered more reusable.
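For the curious, here is one way Example 1 might be filled in. This is only a sketch: the members has_value and value_or, the default constructor, and the choice to delegate storage to std::optional are my own assumptions, since the original snippet elides everything past the constructor. It also takes the value by copy rather than by T&&, to keep the sketch short.

```cpp
#include <optional>
#include <string>
#include <utility>

template <class T>
class Maybe {
 public:
  Maybe() = default;                                      // holds 0 elements
  explicit Maybe(T value) : value_(std::move(value)) {}   // holds 1 element

  bool has_value() const { return value_.has_value(); }

  // Returns the held value, or `fallback` when there is none.
  T value_or(T fallback) const {
    return value_.value_or(std::move(fallback));
  }

 private:
  std::optional<T> value_;  // assumption: storage delegated to std::optional
};
```

The same template works unchanged for int, std::string, or any user-defined value type, whereas MaybeObject would need every held type to derive from Object and to already exist somewhere else.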

Extensibility

On balance, libraries will typically have a specific target purpose.

However, libraries that can be extended or support a wider set of extension or customisation points will have more utility.

Examples of these extension points will come in the form of:

  • Providing alternate implementations in generic contexts (e.g., a custom comparator, a custom allocator, a function object/callback, etc.)
  • Allowing for explicit plugins or extensions (e.g., middleware functions in a web service request processing pipeline, custom encoders/decoders, error handling, etc.)
  • Customisation points to support new states or inputs (e.g., error handling, serialisation, etc.)
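The second kind of extension point, explicit plugins, can be sketched as a tiny request-processing pipeline. Pipeline, use, and handle are hypothetical names of mine, and treating a request as a plain std::string is a simplification for the sake of the example.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

class Pipeline {
 public:
  // A middleware step takes a request and returns the transformed request.
  using Middleware = std::function<std::string(std::string)>;

  // Registers a step. Returns *this so registrations can be chained.
  Pipeline& use(Middleware step) {
    steps_.push_back(std::move(step));
    return *this;
  }

  // Runs the request through every registered step, in registration order.
  std::string handle(std::string request) const {
    for (const auto& step : steps_) request = step(std::move(request));
    return request;
  }

 private:
  std::vector<Middleware> steps_;
};
```

A user could then write something like pipeline.use(add_trace_id).use(log_request), where both functions are their own code, without the library having to know about either of them in advance.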

Examples

I have encountered some great libraries in different programming languages that influenced the way I think about writing libraries or at least well-designed modules. Some of these are:

  • MediatR, which is an implementation of the Mediator Pattern in C#.
  • Boost.Graph, which has incredibly well designed interfaces for working with graphs (in the math/computer science sense, not the charts/visualisation sense).
  • C++ Ranges, which brought composable design to the masses of C++ programmers.

I’m sure I’m missing more examples and I know there are more well-written libraries out there that took a lot of care and attention to get right.

Conclusion

If you find yourself in a position where you can come up with a library that could be widely useful, consider the ergonomics and the user experience first.

Consider the way you decompose the modules, ensure that your library is reusable, and plan for extension points where they would be required.

By admin
