Chapter 6 - The ZeroMQ Community #

People sometimes ask me what’s so special about ZeroMQ. My standard answer is that ZeroMQ is arguably the best answer we have to the vexing question of “How do we make the distributed software that the 21st century demands?” But more than that, ZeroMQ is special because of its community. This is ultimately what separates the wolves from the sheep.

There are three main open source patterns. The first is the large firm dumping code to break the market for others. This is the Apache Foundation model. The second is tiny teams or small firms building their dream. This is the most common open source model, which can be very successful commercially. The last is aggressive and diverse communities that swarm over a problem landscape. This is the Linux model, and the one to which we aspire with ZeroMQ.

It’s hard to overemphasize the power and persistence of a working open source community. There really does not seem to be a better way of making software for the long term. Not only does the community choose the best problems to solve, it solves them minimally, carefully, and it then looks after these answers for years, decades, until they’re no longer relevant, and then it quietly puts them away.

To really benefit from ZeroMQ, you need to understand the community. At some point down the road you’ll want to submit a patch, an issue, or an add-on. You might want to ask someone for help. You will probably want to bet a part of your business on ZeroMQ, and when I tell you that the community is much, much more important than the company that backs the product, even though I’m CEO of that company, this should be significant.

In this chapter I’m going to look at our community from several angles and conclude by explaining in detail our contract for collaboration, which we call “C4”. You should find the discussion useful for your own work. We’ve also adapted the ZeroMQ C4 process for closed source projects with good success.

We’ll cover:

The rough structure of ZeroMQ as a set of projects
What “software architecture” is really about
Why we use the LGPL and not the BSD license
How we designed and grew the ZeroMQ community
The business that backs ZeroMQ
Who owns the ZeroMQ source code
How to make and submit a patch to ZeroMQ
Who controls what patches actually go into ZeroMQ
How we guarantee compatibility with old code
Why we don’t use public git branches
Who decides on the ZeroMQ road map
A worked example of a change to libzmq

Architecture of the ZeroMQ Community #

You know that ZeroMQ is an LGPL-licensed project. In fact it’s a collection of projects, built around the core library, libzmq. I’ll visualize these projects as an expanding galaxy:

At the core, libzmq is the ZeroMQ core library. It’s written in C++, with a low-level C API. The code is nasty, mainly because it’s highly optimized but also because it’s written in C++, a language that lends itself to subtle and deep nastiness. Martin Sustrik wrote the bulk of this code. Today it has dozens of people who maintain different parts of it.
Around libzmq, there are about 50 bindings. These are individual projects that create higher-level APIs for ZeroMQ, or at least map the low-level API into other languages. The bindings vary in quality from experimental to utterly awesome. Probably the most impressive binding is PyZMQ, which was one of the first community projects on top of ZeroMQ. If you are a binding author, you should really study PyZMQ and aspire to making your code and community as great.
A lot of languages have multiple bindings (Erlang, Ruby, C#, at least) written by different people over time, or taking varying approaches. We don’t regulate these in any way. There are no “official” bindings. You vote by using one or the other, contributing to it, or ignoring it.
There are a series of reimplementations of libzmq, starting with JeroMQ, a full Java translation of the library, which is now the basis for NetMQ, a C# stack. These native stacks offer similar or identical APIs, and speak the same protocol (ZMTP) as libzmq.
On top of the bindings are a lot of projects that use ZeroMQ or build on it. See the “Labs” page on the wiki for a long list of projects and proto-projects that use ZeroMQ in some way. There are frameworks, web servers like Mongrel2, brokers like Majordomo, and enterprise open source tools like Storm.

Libzmq, most of the bindings, and some of the outer projects sit in the ZeroMQ community “organization” on GitHub. This organization is “run” by a group consisting of the most senior binding authors. There’s very little to run as it’s almost all self-managing and there’s zero conflict these days.

iMatix, my firm, plays a specific role in the community. We own the trademarks and enforce them discretely in order to make sure that if you download a package calling itself “ZeroMQ”, you can trust what you are getting. People have on rare occasion tried to hijack the name, maybe believing that “free software” means there is no property at stake and no one willing to defend it. One thing you’ll understand from this chapter is how seriously we take the process behind our software (and I mean “us” as a community, not a company). iMatix backs the community by enforcing that process on anything calling itself “ZeroMQ” or “ZeroMQ”. We also put money and time into the software and packaging for reasons I’ll explain later.

It is not a charity exercise. ZeroMQ is a for-profit project, and a very profitable one. The profits are widely distributed among all those who invest in it. It’s really that simple: take the time to become an expert in ZeroMQ, or build something useful on top of ZeroMQ, and you’ll find your value as an individual, or team, or company increasing. iMatix enjoys the same benefits as everyone else in the community. It’s win-win to everyone except our competitors, who find themselves facing a threat they can’t beat and can’t really escape. ZeroMQ dominates the future world of massively distributed software.

My firm doesn’t just have the community’s back–we also built the community. This was deliberate work; in the original ZeroMQ white paper from 2007, there were two projects. One was technical, how to make a better messaging system. The second was how to build a community that could take the software to dominant success. Software dies, but community survives.

How to Make Really Large Architectures #

There are, it has been said (at least by people reading this sentence out loud), two ways to make really large-scale software. Option One is to throw massive amounts of money and problems at empires of smart people, and hope that what emerges is not yet another career killer. If you’re very lucky and are building on lots of experience, have kept your teams solid, and are not aiming for technical brilliance, and are furthermore incredibly lucky, it works.

But gambling with hundreds of millions of others’ money isn’t for everyone. For the rest of us who want to build large-scale software, there’s Option Two, which is open source, and more specifically, free software. If you’re asking how the choice of software license is relevant to the scale of the software you build, that’s the right question.

The brilliant and visionary Eben Moglen once said, roughly, that a free software license is the contract on which a community builds. When I heard this, about ten years ago, the idea came to me–Can we deliberately grow free software communities?

Ten years later, the answer is “yes”, and there is almost a science to it. I say “almost” because we don’t yet have enough evidence of people doing this deliberately with a documented, reproducible process. It is what I’m trying to do with Social Architecture. ZeroMQ came after Wikidot, after the Digital Standards Organization (Digistan) and after the Foundation for a Free Information Infrastructure (aka the FFII, an NGO that fights against software patents). This all came after a lot of less successful community projects like Xitami and Libero. My main takeaway from a long career of projects of every conceivable format is: if you want to build truly large-scale and long-lasting software, aim to build a free software community.

Psychology of Software Architecture #

Dirkjan Ochtman pointed me to Wikipedia’s definition of Software Architecture as “the set of structures needed to reason about the system, which comprise software elements, relations among them, and properties of both”. For me this vapid and circular jargon is a good example of how miserably little we understand what actually makes a successful large scale software architecture.

Architecture is the art and science of making large artificial structures for human use. If there is one thing I’ve learned and applied successfully in 30 years of making larger and larger software systems, it is this: software is about people. Large structures in themselves are meaningless. It’s how they function for human use that matters. And in software, human use starts with the programmers who make the software itself.

The core problems in software architecture are driven by human psychology, not technology. There are many ways our psychology affects our work. I could point to the way teams seem to get stupider as they get larger or when they have to work across larger distances. Does that mean the smaller the team, the more effective? How then does a large global community like ZeroMQ manage to work successfully?

The ZeroMQ community wasn’t accidental. It was a deliberate design, my contribution to the early days when the code came out of a cellar in Bratislava. The design was based on my pet science of “Social Architecture”, which Wikipedia defines as “the conscious design of an environment that encourages a desired range of social behaviors leading towards some goal or set of goals.” I define this as more specifically as “the process, and the product, of planning, designing, and growing an online community.”

One of the tenets of Social Architecture is that how we organize is more significant than who we are. The same group, organized differently, can produce wholly different results. We are like peers in a ZeroMQ network, and our communication patterns have a dramatic impact on our performance. Ordinary people, well connected, can far outperform a team of experts using poor patterns. If you’re the architect of a larger ZeroMQ application, you’re going to have to help others find the right patterns for working together. Do this right, and your project can succeed. Do it wrong, and your project will fail.

The two most important psychological elements are that we’re really bad at understanding complexity and that we are so good at working together to divide and conquer large problems. We’re highly social apes, and kind of smart, but only in the right kind of crowd.

So here is my short list of the Psychological Elements of Software Architecture:

Stupidity: our mental bandwidth is limited, so we’re all stupid at some point. The architecture has to be simple to understand. This is the number one rule: simplicity beats functionality, every single time. If you can’t understand an architecture on a cold gray Monday morning before coffee, it is too complex.
Selfishness: we act only out of self-interest, so the architecture must create space and opportunity for selfish acts that benefit the whole. Selfishness is often indirect and subtle. For example, I’ll spend hours helping someone else understand something because that could be worth days to me later.
Laziness: we make lots of assumptions, many of which are wrong. We are happiest when we can spend the least effort to get a result or to test an assumption quickly, so the architecture has to make this possible. Specifically, that means it must be simple.
Jealousy: we’re jealous of others, which means we’ll overcome our stupidity and laziness to prove others wrong and beat them in competition. The architecture thus has to create space for public competition based on fair rules that anyone can understand.
Fear: we’re unwilling to take risks, especially if it makes us look stupid. Fear of failure is a major reason people conform and follow the group in mass stupidity. The architecture should make silent experimentation easy and cheap, giving people opportunity for success without punishing failure.
Reciprocity: we’ll pay extra in terms of hard work, even money, to punish cheats and enforce fair rules. The architecture should be heavily rule-based, telling people how to work together, but not what to work on.
Conformity: we’re happiest to conform, out of fear and laziness, which means if the patterns are good, clearly explained and documented, and fairly enforced, we’ll naturally choose the right path every time.
Pride: we’re intensely aware of our social status, and we’ll work hard to avoid looking stupid or incompetent in public. The architecture has to make sure every piece we make has our name on it, so we’ll have sleepless nights stressing about what others will say about our work.
Greed: we’re ultimately economic animals (see selfishness), so the architecture has to give us economic incentive to invest in making it happen. Maybe it’s polishing our reputation as experts, maybe it’s literally making money from some skill or component. It doesn’t matter what it is, but there must be economic incentive. Think of architecture as a market place, not an engineering design.

These strategies work on a large scale but also on a small scale, within an organization or team.

The Importance of Contracts #

Let me discuss a contentious but important area, which is what license to choose. I’ll say “BSD” to cover MIT, X11, BSD, Apache, and similar licenses, and “GPL” to cover GPLv3, LGPLv3, and AGPLv3. The significant difference is the obligation to share back any forked versions, which prevents any entity from capturing the software, and thus keeps it “free”.

A software license isn’t technically a contract since you don’t sign anything. But broadly, calling it a contract is useful since it takes the obligations of each party, and makes them legally enforceable in court, under copyright law.

You might ask, why do we need contracts at all to make open source? Surely it’s all about decency, goodwill, people working together for selfless motives. Surely the principle of “less is more” applies here of all places? Don’t more rules mean less freedom? Do we really need lawyers to tell us how to work together? It seems cynical and even counter-productive to force a restrictive set of rules on the happy communes of free and open source software.

But the truth about human nature is not that pretty. We’re not really angels, nor devils, just self-interested winners descended from a billion-year unbroken line of winners. In business, marriage, and collective works, sooner or later, we either stop caring, or we fight and we argue.

Put this another way: a collective work has two extreme outcomes. Either it’s a failure, irrelevant, and worthless, in which case every sane person walks away, without a fight. Or, it’s a success, relevant, and valuable, in which case we start jockeying for power, control, and often, money.

What a well-written contract does is to protect those valuable relationships from conflict. A marriage where the terms of divorce are clearly agreed up-front is much less likely to end in divorce. A business deal where both parties agree how to resolve various classic conflicts–such as one party stealing the others’ clients or staff–is much less likely to end in conflict.

Similarly, a software project that has a well-written contract that defines the terms of breakup clearly is much less likely to end in breakup. The alternative seems to be to immerse the project into a larger organization that can assert pressure on teams to work together (or lose the backing and branding of the organization). This is for example how the Apache Foundation works. In my experience organization building has its own costs, and ends up favoring wealthier participants (who can afford those sometimes huge costs).

In an open source or free software project, breakup usually takes the form of a fork, where the community splits into two or more groups, each with different visions of the future. During the honeymoon period of a project, which can last years, there’s no question of a breakup. It is as a project begins to be worth money, or as the main authors start to burn out, that the goodwill and generosity tends to dry up.

So when discussing software licenses, for the code you write or the code you use, a little cynicism helps. Ask yourself, not “which license will attract more contributors?” because the answer to that lies in the mission statement and contribution process. Ask yourself, “if this project had a big fight, and split three ways, which license would save us?” Or, “if the whole team was bought by a hostile firm that wanted to turn this code into a proprietary product, which license would save us?”

Long-term survival means enduring the bad times, as well as enjoying the good ones.

When BSD projects fork, they cannot easily merge again. Indeed, one-way forking of BSD projects is quite systematic: every time BSD code ends up in a commercial project, this is what’s happened. When GPL projects fork, however, re-merging is trivial.

The GPL’s story is relevant here. Though communities of programmers sharing their code openly were already significant by the 1980’s, they tended to use minimal licenses that worked as long as no real money got involved. There was an important language stack called Emacs, originally built in Lisp by Richard Stallman. Another programmer, James Gosling (who later gave us Java), rewrote Emacs in C with the help of many contributors, on the assumption that it would be open. Stallman got that code and used it as the basis for his own C version. Gosling then sold the code to a firm which turned around and blocked anyone distributing a competing product. Stallman found this sale of the common work hugely unethical, and began developing a reusable license that would protect communities from this.

What eventually emerged was the GNU General Public License, which used traditional copyright to force remixability. It was a neat hack that spread to other domains, for instance the Creative Commons for photography and music. In 2007, we saw version 3 of the license, which was a response to belated attacks from Microsoft and others on the concept. It has become a long and complex document but corporate copyright lawyers have become familiar with it and in my experience, few companies mind using GPL software and libraries, so long as the boundaries are clearly defined.

Thus, a good contract–and I consider the modern GPL to be the best for software–lets programmers work together without upfront agreements, organizations, or assumptions of decency and goodwill. It makes it cheaper to collaborate, and turns conflict into healthy competition. GPL doesn’t just define what happens with a fork, it actively encourages forks as a tool for experimentation and learning. Whereas a fork can kill a project with a “more liberal” license, GPL projects thrive on forks since successful experiments can, by contract, be remixed back into the mainstream.

Yes, there are many thriving BSD projects and many dead GPL ones. It’s always wrong to generalize. A project will thrive or die for many reasons. However, in a competitive sport, one needs every advantage.

The other important part of the BSD vs. GPL story is what I call “leakage”, which is the effect of pouring water into a pot with a small but real hole in the bottom.

Eat Me #

Here is a story. It happened to the eldest brother-in-law of the cousin of a friend of mine’s colleague at work. His name was, and still is, Patrick.

Patrick was a computer scientist with a PhD in advanced network topologies. He spent two years and his savings building a new product, and choose the BSD license because he believed that would get him more adoption. He worked in his attic, at great personal cost, and proudly published his work. People applauded, for it was truly fantastic, and his mailing lists were soon abuzz with activity and patches and happy chatter. Many companies told him how they were saving millions using his work. Some of them even paid him for consultancy and training. He was invited to speak at conferences and started collecting badges with his name on them. He started a small business, hired a friend to work with him, and dreamed of making it big.

Then one day, someone pointed him to a new project, GPL licensed, which had forked his work and was improving on it. He was irritated and upset, and asked how people–fellow open sourcers, no less!–would so shamelessly steal his code. There were long arguments on the list about whether it was even legal to relicense their BSD code as GPL code. Turned out, it was. He tried to ignore the new project, but then he soon realized that new patches coming from that project couldn’t even be merged back into his work!

Worse, the GPL project got popular and some of his core contributors made first small, and then larger patches to it. Again, he couldn’t use those changes, and he felt abandoned. Patrick went into a depression, his girlfriend left him for an international currency dealer called, weirdly, Patrice, and he stopped all work on the project. He felt betrayed, and utterly miserable. He fired his friend, who took it rather badly and told everyone that Patrick was a closet banjo player. Finally, Patrick took a job as a project manager for a cloud company, and by the age of forty, he had stopped programming even for fun.

Poor Patrick. I almost felt sorry for him. Then I asked him, “Why didn’t you choose the GPL?” “Because it’s a restrictive viral license”, he replied. I told him, “You may have a PhD, and you may be the eldest brother-in-law of the cousin of a friend of my colleague, but you are an idiot and Monique was smart to leave you. You published your work inviting people to please steal your code as long as they kept this ‘please steal my code’ statement in the resulting work”, and when people did exactly that, you got upset. Worse, you were a hypocrite because when they did it in secret, you were happy, but when they did it openly, you felt betrayed.”

Seeing your hard work captured by a smarter team and then used against you is enormously painful, so why even make that possible? Every proprietary project that uses BSD code is capturing it. A public GPL fork is perhaps more humiliating, but it’s fully self-inflicted.

BSD is like food. It literally (and I mean that metaphorically) whispers “eat me” in the little voice one imagines a cube of cheese might use when it’s sitting next to an empty bottle of the best beer in the world, which is of course Orval, brewed by an ancient and almost extinct order of silent Belgian monks called Les Gars Labas Qui Fabrique l’Orval. The BSD license, like its near clone MIT/X11, was designed specifically by a university (Berkeley) with no profit motive to leak work and effort. It is a way to push subsidized technology at below its cost price, a dumping of under-priced code in the hope that it will break the market for others. BSD is an excellent strategic tool, but only if you’re a large well-funded institution that can afford to use Option One. The Apache license is BSD in a suit.

For us small businesses who aim our investments like precious bullets, leaking work and effort is unacceptable. Breaking the market is great, but we cannot afford to subsidize our competitors. The BSD networking stack ended up putting Windows on the Internet. We cannot afford battles with those we should naturally be allies with. We cannot afford to make fundamental business errors because in the end, that means we have to fire people.

It comes down to behavioral economics and game theory. The license we choose modifies the economics of those who use our work. In the software industry, there are friends, foes, and food. BSD makes most people see us as lunch. Closed source makes most people see us as enemies (do you like paying people for software?) GPL, however, makes most people, with the exception of the Patricks of the world, our allies. Any fork of ZeroMQ is license compatible with ZeroMQ, to the point where we encourage forks as a valuable tool for experimentation. Yes, it can be weird to see someone try to run off with the ball but here’s the secret, I can get it back any time I want.

The Process #

If you’ve accepted my thesis up to now, great! Now, I’ll explain the rough process by which we actually build an open source community. This was how we built or grew or gently steered the ZeroMQ community into existence.

Your goal as leader of a community is to motivate people to get out there and explore; to ensure they can do so safely and without disturbing others; to reward them when they make successful discoveries; and to ensure they share their knowledge with everyone else (and not because we ask them, not because they feel generous, but because it’s The Law).

It is an iterative process. You make a small product, at your own cost, but in public view. You then build a small community around that product. If you have a small but real hit, the community then helps design and build the next version, and grows larger. And then that community builds the next version, and so on. It’s evident that you remain part of the community, maybe even a majority contributor, but the more control you try to assert over the material results, the less people will want to participate. Plan your own retirement well before someone decides you are their next problem.

Crazy, Beautiful, and Easy #

You need a goal that’s crazy and simple enough to get people out of bed in the morning. Your community has to attract the very best people and that demands something special. With ZeroMQ, we said we were going to make “the Fastest. Messaging. Ever.”, which qualifies as a good motivator. If we’d said, we’re going to make “a smart transport layer that’ll connect your moving pieces cheaply and flexibly across your enterprise”, we’d have failed.

Then your work must be beautiful, immediately useful, and attractive. Your contributors are users who want to explore just a little beyond where they are now. Make it simple, elegant, and brutally clean. The experience when people run or use your work should be an emotional one. They should feel something, and if you accurately solved even just one big problem that until then they didn’t quite realize they faced, you’ll have a small part of their soul.

It must be easy to understand, use, and join. Too many projects have barriers to access: put yourself in the other person’s mind and see all the reasons they come to your site, thinking “Um, interesting project, but…” and then leave. You want them to stay and try it, just once. Use GitHub and put the issue tracker right there.

If you do these things well, your community will be smart but more importantly, it will be intellectually and geographically diverse. This is really important. A group of like-minded experts cannot explore the problem landscape well. They tend to make big mistakes. Diversity beats education any time.

Stranger, Meet Stranger #

How much up-front agreement do two people need to work together on something? In most organizations, a lot. But you can bring this cost down to near-zero, and then people can collaborate without having ever met, done a phone conference, meeting, or business trip to discuss Roles and Responsibilities over way too many bottles of cheap Korean rice wine.

You need well-written rules that are designed by cynical people like me to force strangers into mutually beneficial collaboration instead of conflict. The GPL is a good start. GitHub and its fork/merge strategy is a good follow-up. And then you want something like our C4 rulebook to control how work actually happens.

C4 (which I now use for every new open source project) has detailed and tested answers to a lot of common mistakes people make, such as the sin of working offline in a corner with others “because it’s faster”. Transparency is essential to get trust, which is essential to get scale. By forcing every single change through a single transparent process, you build real trust in the results.

Another cardinal sin that many open source developers make is to place themselves above others. “I founded this project thus my intellect is superior to that of others”. It’s not just immodest and rude, and usually inaccurate, it’s also poor business. The rules must apply equally to everyone, without distinction. You are part of the community. Your job, as founder of a project, is not to impose your vision of the product over others, but to make sure the rules are good, honest, and enforced.

Infinite Property #

One of the saddest myths of the knowledge business is that ideas are a sensible form of property. It’s medieval nonsense that should have been junked along with slavery, but sadly it’s still making too many powerful people too much money.

Ideas are cheap. What does work sensibly as property is the hard work we do in building a market. “You eat what you kill” is the right model for encouraging people to work hard. Whether it’s moral authority over a project, money from consulting, or the sale of a trademark to some large, rich firm: if you make it, you own it. But what you really own is “footfall”, participants in your project, which ultimately defines your power.

To do this requires infinite free space. Thankfully, GitHub solved this problem for us, for which I will die a grateful person (there are many reasons to be grateful in life, which I won’t list here because we only have a hundred or so pages left, but this is one of them).

You cannot scale a single project with many owners like you can scale a collection of many small projects, each with fewer owners. When we embrace forks, a person can become an “owner” with a single click. Now they just have to convince others to join by demonstrating their unique value.

So in ZeroMQ, we aimed to make it easy to write bindings on top of the core library, and we stopped trying to make those bindings ourselves. This created space for others to make those, become their owners, and get that credit.

Care and Feeding #

I wish a community could be 100% self-steering, and perhaps one day this will work, but today it’s not the case. We’re very close with ZeroMQ, but from my experience a community needs four types of care and feeding:

First, simply because most people are too nice, we need some kind of symbolic leadership or owners who provide ultimate authority in case of conflict. Usually it’s the founders of the community. I’ve seen it work with self-elected groups of “elders”, but old men like to talk a lot. I’ve seen communities split over the question “who is in charge?”, and setting up legal entities with boards and such seems to make arguments over control worse, not better. Maybe because there seems to be more to fight over. One of the real benefits of free software is that it’s always remixable, so instead of fighting over a pie, one simply forks the pie.
Second, communities need living rules, and thus they need a lawyer able to formulate and write these down. Rules are critical; when done right, they remove friction. When done wrong, or neglected, we see real friction and argument that can drive away the nice majority, leaving the argumentative core in charge of the burning house. One thing I’ve tried to do with the ZeroMQ and previous communities is create reusable rules, which perhaps means we don’t need lawyers as much.
Thirdly, communities need some kind of financial backing. This is the jagged rock that breaks most ships. If you starve a community, it becomes more creative but the core contributors burn out. If you pour too much money into it, you attract the professionals, who never say “no”, and the community loses its diversity and creativity. If you create a fund for people to share, they will fight (bitterly) over it. With ZeroMQ, we (iMatix) spend our time and money on marketing and packaging (like this book), and the basic care, like bug fixes, releases, and websites.
Lastly, sales and commercial mediation are important. There is a natural market between expert contributors and customers, but both are somewhat incompetent at talking to each other. Customers assume that support is free or very cheap because the software is free. Contributors are shy at asking a fair rate for their work. It makes for a difficult market. A growing part of my work and my firm’s profits is simply connecting ZeroMQ users who want help with experts from the community able to provide it, and ensuring both sides are happy with the results.

I’ve seen communities of brilliant people with noble goals dying because the founders got some or all of these four things wrong. The core problem is that you can’t expect consistently great leadership from any one company, person, or group. What works today often won’t work tomorrow, yet structures become more solid, not more flexible, over time.

The best answer I can find is a mix of two things. One, the GPL and its guarantee of remixability. No matter how bad the authority, no matter how much they try to privatize and capture the community’s work, if it’s GPL licensed, that work can walk away and find a better authority. Before you say, “all open source offers this,” think it through. I can kill a BSD-licensed project by hiring the core contributors and not releasing any new patches. But even with a billion of dollars, I cannot kill a GPL-licensed project. Two, the philosophical anarchist model of authority, which is that we choose it, it does not own us.

The ZeroMQ Process: C4 #

When we say ZeroMQ we sometimes mean libzmq, the core library. In early 2012, we synthesized the libzmq process into a formal protocol for collaboration that we called the Collective Code Construction Contract, or C4. You can see this as a layer above the GPL. These are our rules, and I’ll explain the reasoning behind each one.

C4 is an evolution of the GitHub Fork + Pull Model. You may get the feeling I’m a fan of git and GitHub. This would be accurate: these two tools have made such a positive impact on our work over the last years, especially when it comes to building community.

Language #

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

By starting with the RFC 2119 language, the C4 text makes very clear its intention to act as a protocol rather than a randomly written set of recommendations. A protocol is a contract between parties that defines the rights and obligations of each party. These can be peers in a network or they can be strangers working in the same project.

I think C4 is the first time anyone has attempted to codify a community’s rulebook as a formal and reusable protocol spec. Previously, our rules were spread out over several wiki pages, and were quite specific to libzmq in many ways. But experience teaches us that the more formal, accurate, and reusable the rules, the easier it is for strangers to collaborate up-front. And less friction means a more scalable community. At the time of C4, we also had some disagreement in the libzmq project over precisely what process we were using. Not everyone felt bound by the same rules. Let’s just say some people felt they had a special status, which created friction with the rest of the community. So codification made things clear.

It’s easy to use C4: just host your project on GitHub, get one other person to join, and open the floor to pull requests. In your README, put a link to C4 and that’s it. We’ve done this in quite a few projects and it does seem to work. I’ve been pleasantly surprised a few times just applying these rules to my own work, like CZMQ. None of us are so amazing that we can work without others.

Goals #

C4 is meant to provide a reusable optimal collaboration model for open source software projects.

The short term reason for writing C4 was to end arguments over the libzmq contribution process. The dissenters went off elsewhere. The ZeroMQ community blossomed smoothly and easily, as I’d predicted. Most people were surprised, but gratified. There’s been no real criticisms of C4 except its branching policy, which I’ll come to later as it deserves its own discussion.

There’s a reason I’m reviewing history here: as founder of a community, you are asking people to invest in your property, trademark, and branding. In return, and this is what we do with ZeroMQ, you can use that branding to set a bar for quality. When you download a product labeled “ZeroMQ”, you know that it’s been produced to certain standards. It’s a basic rule of quality: write down your process; otherwise you cannot improve it. Our processes aren’t perfect, nor can they ever be. But any flaw in them can be fixed, and tested.

Making C4 reusable is therefore really important. To learn more about the best possible process, we need to get results from the widest range of projects.

It has these specific goals: To maximize the scale of the community around a project, by reducing the friction for new Contributors and creating a scaled participation model with strong positive feedbacks;

The number one goal is size and health of the community–not technical quality, not profits, not performance, not market share. The goal is simply the number of people who contribute to the project. The science here is simple: the larger the community, the more accurate the results.

To relieve dependencies on key individuals by separating different skill sets so that there is a larger pool of competence in any required domain;

Perhaps the worst problem we faced in libzmq was dependence on people who could understand the code, manage GitHub branches, and make clean releases–all at the same time. It’s like looking for athletes who can run marathons and sprint, swim, and also lift weights. We humans are really good at specialization. Asking us to be really good at two contradictory things reduces the number of candidates sharply, which is a Bad Thing for any project. We had this problem severely in libzmq in 2009 or so, and fixed it by splitting the role of maintainer into two: one person makes patches and another makes releases.

To allow the project to develop faster and more accurately, by increasing the diversity of the decision making process;

This is theory–not fully proven, but not falsified. The diversity of the community and the number of people who can weigh in on discussions, without fear of being criticized or dismissed, the faster and more accurately the software develops. Speed is quite subjective here. Going very fast in the wrong direction is not just useless, it’s actively damaging (and we suffered a lot of that in libzmq before we switched to C4).

To support the natural life cycle of project versions from experimental through to stable, by allowing safe experimentation, rapid failure, and isolation of stable code;

To be honest, this goal seems to be fading into irrelevance. It’s quite an interesting effect of the process: the git master is almost always perfectly stable. This has to do with the size of changes and their latency, i.e., the time between someone writing the code and someone actually using it fully. However, people still expect “stable” releases, so we’ll keep this goal there for a while.

To reduce the internal complexity of project repositories, thus making it easier for Contributors to participate and reducing the scope for error;

Curious observation: people who thrive in complex situations like to create complexity because it keeps their value high. It’s the Cobra Effect (Google it). Git made branches easy and left us with the all too common syndrome of “git is easy once you understand that a git branch is just a folded five-dimensional lepton space that has a detached history with no intervening cache”. Developers should not be made to feel stupid by their tools. I’ve seen too many top-class developers confused by repository structures to accept conventional wisdom on git branches. We’ll come back to dispose of git branches shortly, dear reader.

To enforce collective ownership of the project, which increases economic incentive to Contributors and reduces the risk of hijack by hostile entities.

Ultimately, we’re economic creatures, and the sense that “we own this, and our work can never be used against us” makes it much easier for people to invest in an open source project like ZeroMQ. And it can’t be just a feeling, it has to be real. There are a number of aspects to making collective ownership work, we’ll see these one-by-one as we go through C4.

Preliminaries #

The project SHALL use the git distributed revision control system.

Git has its faults. Its command-line API is horribly inconsistent, and it has a complex, messy internal model that it shoves in your face at the slightest provocation. But despite doing its best to make its users feel stupid, git does its job really, really well. More pragmatically, I’ve found that if you stay away from certain areas (branches!), people learn git rapidly and don’t make many mistakes. That works for me.

The project SHALL be hosted on github.com or equivalent, herein called the “Platform”.

I’m sure one day some large firm will buy GitHub and break it, and another platform will rise in its place. Until then, Github serves up a near-perfect set of minimal, fast, simple tools. I’ve thrown hundreds of people at it, and they all stick like flies stuck in a dish of honey.

The project SHALL use the Platform issue tracker.

We made the mistake in libzmq of switching to Jira because we hadn’t learned yet how to properly use the GitHub issue tracker. Jira is a great example of how to turn something useful into a complex mess because the business depends on selling more “features”. But even without criticizing Jira, keeping the issue tracker on the same platform means one less UI to learn, one less login, and smooth integration between issues and patches.

The project SHOULD have clearly documented guidelines for code style.

This is a protocol plug-in: insert code style guidelines here. If you don’t document the code style you use, you have no basis except prejudice to reject patches.

A “Contributor” is a person who wishes to provide a patch, being a set of commits that solve some clearly identified problem. A “Maintainer” is a person who merge patches to the project. Maintainers are not developers; their job is to enforce process.

Now we move on to definitions of the parties, and the splitting of roles that saved us from the sin of structural dependency on rare individuals. This worked well in libzmq, but as you will see it depends on the rest of the process. C4 isn’t a buffet; you will need the whole process (or something very like it), or it won’t hold together.

Contributors SHALL NOT have commit access to the repository unless they are also Maintainers. Maintainers SHALL have commit access to the repository.

What we wanted to avoid was people pushing their changes directly to master. This was the biggest source of trouble in libzmq historically: large masses of raw code that took months or years to fully stabilize. We eventually followed other ZeroMQ projects like PyZMQ in using pull requests. We went further, and stipulated that all changes had to follow the same path. No exceptions for “special people”.

Everyone, without distinction or discrimination, SHALL have an equal right to become a Contributor under the terms of this contract.

We had to state this explicitly. It used to be that the libzmq maintainers would reject patches simply because they didn’t like them. Now, that may sound reasonable to the author of a library (though libzmq was not written by any one person), but let’s remember our goal of creating a work that is owned by as many people as possible. Saying “I don’t like your patch so I’m going to reject it” is equivalent to saying, “I claim to own this and I think I’m better than you, and I don’t trust you”. Those are toxic messages to give to others who are thinking of becoming your co-investors.

I think this fight between individual expertise and collective intelligence plays out in other areas. It defined Wikipedia, and still does, a decade after that work surpassed anything built by small groups of experts. For me, we make software by slowly synthesizing the most accurate knowledge, much as we make Wikipedia articles.

Licensing and Ownership #

The project SHALL use the GPLv3 or a variant thereof (LGPL, AGPL).

I’ve already explained how full remixability creates better scale and why the GPL and its variants seems the optimal contract for remixable software. If you’re a large business aiming to dump code on the market, you won’t want C4, but then you won’t really care about community either.

All contributions to the project source code (“patches”) SHALL use the same license as the project.

This removes the need for any specific license or contribution agreement for patches. You fork the GPL code, you publish your remixed version on GitHub, and you or anyone else can then submit that as a patch to the original code. BSD doesn’t allow this. Any work that contains BSD code may also contain unlicensed proprietary code so you need explicit action from the author of the code before you can remix it.

All patches are owned by their authors. There SHALL NOT be any copyright assignment process.

Here we come to the key reason people trust their investments in ZeroMQ: it’s logistically impossible to buy the copyrights to create a closed source competitor to ZeroMQ. iMatix can’t do this either. And the more people that send patches, the harder it becomes. ZeroMQ isn’t just free and open today–this specific rule means it will remain so forever. Note that it’s not the case in all GPL projects, many of which still ask for copyright transfer back to the maintainers.

The project SHALL be owned collectively by all its Contributors.

This is perhaps redundant, but worth saying: if everyone owns their patches, then the resulting whole is also owned by every contributor. There’s no legal concept of owning lines of code: the “work” is at least a source file.

Each Contributor SHALL be responsible for identifying themselves in the project Contributor list.

In other words, the maintainers are not karma accountants. Anyone who wants credit has to claim it themselves.

Patch Requirements #

In this section, we define the obligations of the contributor: specifically, what constitutes a “valid” patch, so that maintainers have rules they can use to accept or reject patches.

Maintainers and Contributors MUST have a Platform account and SHOULD use their real names or a well-known alias.

In the worst case scenario, where someone has submitted toxic code (patented, or owned by someone else), we need to be able to trace who and when, so we can remove the code. Asking for real names or a well-known alias is a theoretical strategy for reducing the risk of bogus patches. We don’t know if this actually works because we haven’t had the problem yet.

A patch SHOULD be a minimal and accurate answer to exactly one identified and agreed problem.

This implements the Simplicity Oriented Design process that I’ll come to later in this chapter. One clear problem, one minimal solution, apply, test, repeat.

A patch MUST adhere to the code style guidelines of the project if these are defined.

This is just sanity. I’ve spent time cleaning up other peoples’ patches because they insisted on putting the else beside the if instead of just below as Nature intended. Consistent code is healthier.

A patch MUST adhere to the “Evolution of Public Contracts” guidelines defined below.

Ah, the pain, the pain. I’m not speaking of the time at age eight when I stepped on a plank with a 4-inch nail protruding from it. That was relatively OK. I’m speaking of 2010-2011 when we had multiple parallel releases of ZeroMQ, each with different incompatible APIs or wire protocols. It was an exercise in bad rules, pointlessly enforced, that still hurts us today. The rule was, “If you change the API or protocol, you SHALL create a new major version”. Give me the nail through the foot; that hurt less.

One of the big changes we made with C4 was simply to ban, outright, this kind of sanctioned sabotage. Amazingly, it’s not even hard. We just don’t allow the breaking of existing public contracts, period, unless everyone agrees, in which case no period. As Linus Torvalds famously put it on 23 December 2012, “WE DO NOT BREAK USERSPACE!”

A patch SHALL NOT include nontrivial code from other projects unless the Contributor is the original author of that code.

This rule has two effects. The first is that it forces people to make minimal solutions because they cannot simply import swathes of existing code. In the cases where I’ve seen this happen to projects, it’s always bad unless the imported code is very cleanly separated. The second is that it avoids license arguments. You write the patch, you are allowed to publish it as LGPL, and we can merge it back in. But you find a 200-line code fragment on the web, and try to paste that, we’ll refuse.

A patch MUST compile cleanly and pass project self-tests on at least the principle target platform.

For cross-platform projects, it is fair to ask that the patch works on the development box used by the contributor.

A patch commit message SHOULD consist of a single short (less than 50 character) line summarizing the change, optionally followed by a blank line and then a more thorough description.

This is a good format for commit messages that fits into email (the first line becomes the subject, and the rest becomes the email body).

A “Correct Patch” is one that satisfies the above requirements.

Just in case it wasn’t clear, we’re back to legalese and definitions.

Development Process #

In this section, we aim to describe the actual development process, step-by-step.

Change on the project SHALL be governed by the pattern of accurately identifying problems and applying minimal, accurate solutions to these problems.

This is a unapologetic ramming through of thirty years’ software design experience. It’s a profoundly simple approach to design: make minimal, accurate solutions to real problems, nothing more or less. In ZeroMQ, we don’t have feature requests. Treating new features the same as bugs confuses some newcomers. But this process works, and not just in open source. Enunciating the problem we’re trying to solve, with every single change, is key to deciding whether the change is worth making or not.

To initiate changes, a user SHALL log an issue on the project Platform issue tracker.

This is meant to stop us from going offline and working in a ghetto, either by ourselves or with others. Although we tend to accept pull requests that have clear argumentation, this rule lets us say “stop” to confused or too-large patches.

The user SHOULD write the issue by describing the problem they face or observe.

“Problem: we need feature X. Solution: make it” is not a good issue. “Problem: user cannot do common tasks A or B except by using a complex workaround. Solution: make feature X” is a decent explanation. Because everyone I’ve ever worked with has needed to learn this, it seems worth restating: document the real problem first, solution second.

The user SHOULD seek consensus on the accuracy of their observation, and the value of solving the problem.

And because many apparent problems are illusionary, by stating the problem explicitly we give others a chance to correct our logic. “You’re only using A and B a lot because function C is unreliable. Solution: make function C work properly.”

Users SHALL NOT log feature requests, ideas, suggestions, or any solutions to problems that are not explicitly documented and provable.

There are several reasons for not logging ideas, suggestions, or feature requests. In our experience, these just accumulate in the issue tracker until someone deletes them. But more profoundly, when we treat all change as problem solutions, we can prioritize trivially. Either the problem is real and someone wants to solve it now, or it’s not on the table. Thus, wish lists are off the table.

Thus, the release history of the project SHALL be a list of meaningful issues logged and solved.

I’d love the GitHub issue tracker to simply list all the issues we solved in each release. Today we still have to write that by hand. If one puts the issue number in each commit, and if one uses the GitHub issue tracker, which we sadly don’t yet do for ZeroMQ, this release history is easier to produce mechanically.

To work on an issue, a Contributor SHALL fork the project repository and then work on their forked repository.

Here we explain the GitHub fork + pull request model so that newcomers only have to learn one process (C4) in order to contribute.

To submit a patch, a Contributor SHALL create a Platform pull request back to the project.

GitHub has made this so simple that we don’t need to learn git commands to do it, for which I’m deeply grateful. Sometimes, I’ll tell people who I don’t particularly like that command-line git is awesome and all they need to do is learn git’s internal model in detail before trying to use it on real work. When I see them several months later they look… changed.

A Contributor SHALL NOT commit changes directly to the project.

Anyone who submits a patch is a contributor, and all contributors follow the same rules. No special privileges to the original authors, because otherwise we’re not building a community, only boosting our egos.

To discuss a patch, people MAY comment on the Platform pull request, on the commit, or elsewhere.

Randomly distributed discussions may be confusing if you’re walking up for the first time, but GitHub solves this for all current participants by sending emails to those who need to follow what’s going on. We had the same experience and the same solution in Wikidot, and it works. There’s no evidence that discussing in different places has any negative effect.

To accept or reject a patch, a Maintainer SHALL use the Platform interface.

Working via the GitHub web user interface means pull requests are logged as issues, with workflow and discussion. I’m sure there are more complex ways to work. Complexity is easy; it’s simplicity that’s incredibly hard.

Maintainers SHALL NOT accept their own patches.

There was a rule we defined in the FFII years ago to stop people burning out: no less than two people on any project. One-person projects tend to end in tears, or at least bitter silence. We have quite a lot of data on burnout, why it happens, and how to prevent it (even cure it). I’ll explore this later in the chapter, because if you work with or on open source you need to be aware of the risks. The “no merging your own patch” rule has two goals. First, if you want your project to be C4-certified, you have to get at least one other person to help. If no one wants to help you, perhaps you need to rethink your project. Second, having a control for every patch makes it much more satisfying, keeps us more focused, and stops us breaking the rules because we’re in a hurry, or just feeling lazy.

Maintainers SHALL NOT make value judgments on correct patches.

We already said this but it’s worth repeating: the role of Maintainer is not to judge a patch’s substance, only its technical quality. The substantive worth of a patch only emerges over time: people use it, and like it, or they do not. And if no one is using a patch, eventually it’ll annoy someone else who will remove it, and no one will complain.

Maintainers SHALL merge correct patches rapidly.

There is a criteria I call change latency, which is the round-trip time from identifying a problem to testing a solution. The faster the better. If maintainers cannot respond to pull requests as rapidly as people expect, they’re not doing their job (or they need more hands).

The Contributor MAY tag an issue as “Ready” after making a pull request for the issue.

By default, GitHub offers the usual variety of issues, but with C4 we don’t use them. Instead, we need just two labels, “Urgent” and “Ready”. A contributor who wants another user to test an issue can then label it as “Ready”.

The user who created an issue SHOULD close the issue after checking the patch is successful.

When one person opens an issue, and another works on it, it’s best to allow the original person to close the issue. That acts as a double-check that the issue was properly resolved.

Maintainers SHOULD ask for improvements to incorrect patches and SHOULD reject incorrect patches if the Contributor does not respond constructively.

Initially, I felt it was worth merging all patches, no matter how poor. There’s an element of trolling involved. Accepting even obviously bogus patches could, I felt, pull in more contributors. But people were uncomfortable with this so we defined the “correct patch” rules, and the Maintainer’s role in checking for quality. On the negative side, I think we didn’t take some interesting risks, which could have paid off with more participants. On the positive side, this has led to libzmq master (and in all projects that use C4) being practically production quality, practically all the time.

Any Contributor who has value judgments on a correct patch SHOULD express these via their own patches.

In essence, the goal here is to allow users to try patches rather than to spend time arguing pros and cons. As easy as it is to make a patch, it’s as easy to revert it with another patch. You might think this would lead to “patch wars”, but that hasn’t happened. We’ve had a handful of cases in libzmq where patches by one contributor were killed by another person who felt the experimentation wasn’t going in the right direction. It is easier than seeking up-front consensus.

Maintainers MAY commit changes to non-source documentation directly to the project.

This exit allows maintainers who are making release notes to push those without having to create an issue which would then affect the release notes, leading to stress on the space time fabric and possibly involuntary rerouting backwards in the fourth dimension to before the invention of cold beer. Shudder. It is simpler to agree that release notes aren’t technically software.

Creating Stable Releases #

We want some guarantee of stability for a production system. In the past, this meant taking unstable code and then over months hammering out the bugs and faults until it was safe to trust. iMatix’s job, for years, has been to do this to libzmq, turning raw code into packages by allowing only bug fixes and no new code into a “stabilization branch”. It’s surprisingly not as thankless as it sounds.

Now, since we went full speed with C4, we’ve found that git master of libzmq is mostly perfect, most of the time. This frees our time to do more interesting things, such as building new open source layers on top of libzmq. However, people still want that guarantee: many users will simply not install except from an “official” release. So a stable release today means two things. First, a snapshot of the master taken at a time when there were no new changes for a while, and no dramatic open bugs. Second, a way to fine tune that snapshot to fix the critical issues remaining in it.

This is the process we explain in this section.

The project SHALL have one branch (“master”) that always holds the latest in-progress version and SHOULD always build.

This is redundant because every patch always builds but it’s worth restating. If the master doesn’t build (and pass its tests), someone needs waking up.

The project SHALL NOT use topic branches for any reason. Personal forks MAY use topic branches.

I’ll come to branches soon. In short (or “tl;dr”, as they say on the webs), branches make the repository too complex and fragile, and require up-front agreement, all of which are expensive and avoidable.

To make a stable release someone SHALL fork the repository by copying it and thus become maintainer of this repository. Forking a project for stabilization MAY be done unilaterally and without agreement of project maintainers.

It’s free software. No one has a monopoly on it. If you think the maintainers aren’t producing stable releases right, fork the repository and do it yourself. Forking isn’t a failure, it’s an essential tool for competition. You can’t do this with branches, which means a branch-based release policy gives the project maintainers a monopoly. And that’s bad because they’ll become lazier and more arrogant than if real competition is chasing their heels.

A stabilization project SHOULD be maintained by the same process as the main project.

Stabilization projects have maintainers and contributors like any project. In practice we usually cherry pick patches from the main project to the stabilization project, but that’s just a convenience.

A patch to a repository declared “stable” SHALL be accompanied by a reproducible test case.

Beware of a one-size-fits-all process. New code does not need the same paranoia as code that people are trusting for production use. In the normal development process, we did not mention test cases. There’s a reason for this. While I love testable patches, many changes aren’t easily or at all testable. However, to stabilize a code base you want to fix only serious bugs, and you want to be 100% sure every change is accurate. This means before and after tests for every change.

Evolution of Public Contracts #

By “public contracts”, I mean APIs and protocols. Up until the end of 2011, libzmq's naturally happy state was marred by broken promises and broken contracts. We stopped making promises (aka “road maps”) for libzmq completely, and our dominant theory of change is now that it emerges carefully and accurately over time. At a 2012 Chicago meetup, Garrett Smith and Chuck Remes called this the “drunken stumble to greatness”, which is how I think of it now.

We stopped breaking public contracts simply by banning the practice. Before then it had been “OK” (as in we did it and everyone complained bitterly, and we ignored them) to break the API or protocol so long as we changed the major version number. Sounds fine, until you get ZeroMQ v2.0, v3.0, and v4.0 all in development at the same time, and not speaking to each other.

All Public Contracts (APIs or protocols) SHOULD be documented.

You’d think this was a given for professional software engineers but no, it’s not. So, it’s a rule. You want C4 certification for your project, you make sure your public contracts are documented. No “It’s specified in the code” excuses. Code is not a contract. (Yes, I intend at some point to create a C4 certification process to act as a quality indicator for open source projects.)

All Public Contracts SHALL use Semantic Versioning.

This rule is mainly here because people asked for it. I’ve no real love for it, as Semantic Versioning is what led to the so-called “Why does ZeroMQ not speak to itself?!” debacle. I’ve never seen the problem that this solved. Something about runtime validation of library versions, or some-such.

All Public Contracts SHOULD have space for extensibility and experimentation.

Now, the real thing is that public contracts do change. It’s not about not changing them. It’s about changing them safely. This means educating (especially protocol) designers to create that space up-front.

A patch that modifies a stable Public Contract SHOULD not break existing applications unless there is overriding consensus on the value of doing this.

Sometimes the patch is fixing a bad API that no one is using. It’s a freedom we need, but it should be based on consensus, not one person’s dogma. However, making random changes “just because” is not good. In ZeroMQ v3.x, did we benefit from renaming ZMQ_NOBLOCK to ZMQ_DONTWAIT? Sure, it’s closer to the POSIX socket recv() call, but is that worth breaking thousands of applications? No one ever reported it as an issue. To misquote Stallman: “your freedom to create an ideal world stops one inch from my application.”

A patch that introduces new features to a Public Contract SHOULD do so using new names.

We had the experience in ZeroMQ once or twice of new features using old names (or worse, using names that were still in use elsewhere). ZeroMQ v3.0 had a newly introduced “ROUTER” socket that was totally different from the existing ROUTER socket in 2.x. Dear lord, you should be face-palming, why? The reason: apparently, even smart people sometimes need regulation to stop them doing silly things.

Old names SHOULD be deprecated in a systematic fashion by marking new names as “experimental” until they are stable, then marking the old names as “deprecated”.

This life cycle notation has the great benefit of actually telling users what is going on with a consistent direction. “Experimental” means “we have introduced this and intend to make it stable if it works”. It does not mean, “we have introduced this and will remove it at any time if we feel like it”. One assumes that code that survives more than one patch cycle is meant to be there. “Deprecated” means “we have replaced this and intend to remove it”.

When sufficient time has passed, old deprecated names SHOULD be marked “legacy” and eventually removed.

In theory this gives applications time to move onto stable new contracts without risk. You can upgrade first, make sure things work, and then, over time, fix things up to remove dependencies on deprecated and legacy APIs and protocols.

Old names SHALL NOT be reused by new features.

Ah, yes, the joy when ZeroMQ v3.x renamed the top-used API functions (zmq_send() and zmq_recv()) and then recycled the old names for new methods that were utterly incompatible (and which I suspect few people actually use). You should be slapping yourself in confusion again, but really, this is what happened and I was as guilty as anyone. After all, we did change the version number! The only benefit of that experience was to get this rule.

When old names are removed, their implementations MUST provoke an exception (assertion) if used by applications.

I’ve not tested this rule to be certain it makes sense. Perhaps what it means is “if you can’t provoke a compile error because the API is dynamic, provoke an assertion”.

Project Administration #

The project founders SHALL act as Administrators to manage the set of project Maintainers.

Someone needs to administer the project, and it makes sense that the original founders start this ball rolling.

The Administrators SHALL ensure their own succession over time by promoting the most effective Maintainers.

At the same time, as founder of a project you really want to get out of the way before you become over-attached to it. Promoting the most active and consistent maintainers is good for everyone.

A new Contributor who makes a correct patch SHALL be invited to become a Maintainer.

I met Felix Geisendörfer in Lyons in 2012 at the Mix-IT conference where I presented Social Architecture and one thing that came out of this was Felix’s now famous Pull Request Hack. It fits elegantly into C4 and solves the problem of maintainers dropping out over time.

Administrators MAY remove Maintainers who are inactive for an extended period of time, or who repeatedly fail to apply this process accurately.

This was Ian Barber’s suggestion: we need a way to crop inactive maintainers. Originally maintainers were self-elected but that makes it hard to drop troublemakers (who are rare, but not unknown).

C4 is not perfect. Few things are. The process for changing it (Digistan’s COSS) is a little outdated now: it relies on a single-editor workflow with the ability to fork, but not merge. This seems to work but it could be better to use C4 for protocols like C4.

A Real-Life Example #

In this email thread, Dan Goes asks how to make a publisher that knows when a new client subscribes, and sends out previous matching messages. It’s a standard pub-sub technique called “last value caching”. Now over a 1-way transport like pgm (where subscribers literally send no packets back to publishers), this can’t be done. But over TCP, it can, if we use an XPUB socket and if that socket didn’t cleverly filter out duplicate subscriptions to reduce upstream traffic.

Though I’m not an expert contributor to libzmq, this seems like a fun problem to solve. How hard could it be? I start by forking the libzmq repository to my own GitHub account and then clone it to my laptop, where I build it:

git clone git@github.com:hintjens/libzmq.git
cd libzmq
./autogen.sh
./configure
make

Because the libzmq code is neat and well-organized, it was quite easy to find the main files to change (xpub.cpp and xpub.hpp). Each socket type has its own source file and class. They inherit from socket_base.cpp, which has this hook for socket-specific options:

//  First, check whether specific socket type overloads the option.
int rc = xsetsockopt (option_, optval_, optvallen_);
if (rc == 0 || errno != EINVAL)
    return rc;

//  If the socket type doesn't support the option, pass it to
//  the generic option parser.
return options.setsockopt (option_, optval_, optvallen_);

Then I check where the XPUB socket filters out duplicate subscriptions, in its xread_activated method:

bool unique;
if (*data == 0)
    unique = subscriptions.rm (data + 1, size - 1, pipe_);
else
    unique = subscriptions.add (data + 1, size - 1, pipe_);

//  If the subscription is not a duplicate store it so that it can be
//  passed to used on next recv call.
if (unique && options.type != ZMQ_PUB)
    pending.push_back (blob_t (data, size));

At this stage, I’m not too concerned with the details of how subscriptions.rm and subscriptions.add work. The code seems obvious except that “subscription” also includes unsubscription, which confused me for a few seconds. If there’s anything else weird in the rm and add methods, that’s a separate issue to fix later. Time to make an issue for this change. I head over to the zeromq.jira.com site, log in, and create a new entry.

Jira kindly offers me the traditional choice between “bug” and “new feature” and I spend thirty seconds wondering where this counterproductive historical distinction came from. Presumably, the “we’ll fix bugs for free, but you pay for new features” commercial proposal, which stems from the “you tell us what you want and we’ll make it for $X” model of software development, and which generally leads to “we spent three times $X and we got what?!” email Fists of Fury.

Putting such thoughts aside, I create an issue #443 and described the problem and plausible solution:

Problem: XPUB socket filters out duplicate subscriptions (deliberate design). However this makes it impossible to do subscription-based intelligence. See http://lists.zeromq.org/pipermail/zeromq-dev/2012-October/018838.html for a use case. Solution: make this behavior configurable with a socket option.

It’s naming time. The API sits in include/zmq.h, so this is where I added the option name. When you invent a concept in an API or anywhere, please take a moment to choose a name that is explicit and short and obvious. Don’t fall back on generic names that need additional context to understand. You have one chance to tell the reader what your concept is and does. A name like ZMQ_SUBSCRIPTION_FORWARDING_FLAG is terrible. It technically kind of aims in the right direction, but is miserably long and obscure. I chose ZMQ_XPUB_VERBOSE: short and explicit and clearly an on/off switch with “off” being the default setting.

So, it’s time to add a private property to the xpub class definition in xpub.hpp:

// If true, send all subscription messages upstream, not just
// unique ones
bool verbose;

And then lift some code from router.cpp to implement the xsetsockopt method. Finally, change the xread_activated method to use this new option, and while at it, make that test on socket type more explicit too:

//  If the subscription is not a duplicate store it so that it can be
//  passed to used on next recv call.
if (options.type == ZMQ_XPUB && (unique || verbose))
    pending.push_back (blob_t (data, size));

The thing builds nicely the first time. This makes me a little suspicious, but being lazy and jet-lagged I don’t immediately make a test case to actually try out the change. The process doesn’t demand that, even if usually I’d do it just to catch that inevitable 10% of mistakes we all make. I do however document this new option on the doc/zmq_setsockopt.txt man page. In the worst case, I added a patch that wasn’t really useful. But I certainly didn’t break anything.

I don’t implement a matching zmq_getsockopt because “minimal” means what it says. There’s no obvious use case for getting the value of an option that you presumably just set, in code. Symmetry isn’t a valid reason to double the size of a patch. I did have to document the new option because the process says, “All Public Contracts SHOULD be documented.”

Committing the code, I push the patch to my forked repository (the “origin”):

git commit -a -m "Fixed issue #443"
git push origin master

Switching to the GitHub web interface, I go to my libzmq fork, and press the big “Pull Request” button at the top. GitHub asks me for a title, so I enter “Added ZMQ_XPUB_VERBOSE option”. I’m not sure why it asks this as I made a neat commit message but hey, let’s go with the flow here.

This makes a nice little pull request with two commits; the one I’d made a month ago on the release notes to prepare for the v3.2.1 release (a month passes so quickly when you spend most of it in airports), and my fix for issue #443 (37 new lines of code). GitHub lets you continue to make commits after you’ve kicked off a pull request. They get queued up and merged in one go. That is easy, but the maintainer may refuse the whole bundle based on one patch that doesn’t look valid.

Because Dan is waiting (at least in my highly optimistic imagination) for this fix, I go back to the zeromq-dev list and tell him I’ve made the patch, with a link to the commit. The faster I get feedback, the better. It’s 1 a.m. in South Korea as I make this patch, so early evening in Europe, and morning in the States. You learn to count timezones when you work with people across the world. Ian is in a conference, Mikko is getting on a plane, and Chuck is probably in the office, but three hours later, Ian merges the pull request.

After Ian merges the pull request, I resynchronize my fork with the upstream libzmq repository. First, I add a remote that tells git where this repository sits (I do this just once in the directory where I’m working):

git remote add upstream git://github.com/zeromq/libzmq.git

And then I pull changes back from the upstream master and check the git log to double-check:

git pull --rebase upstream master
git log

And that is pretty much it, in terms of how much git one needs to learn and use to contribute patches to libzmq. Six git commands and some clicking on web pages. Most importantly to me as a naturally lazy, stupid, and easily confused developer, I don’t have to learn git’s internal models, and never have to do anything involving those infernal engines of structural complexity we call “git branches”. Next up, the attempted assassination of git branches. Let’s live dangerously!

Git Branches Considered Harmful #

One of git’s most popular features is its branches. Almost all projects that use git use branches, and the selection of the “best” branching strategy is like a rite of passage for an open source project. Vincent Driessen’s git-flow may be the best known. It has base branches (master, develop), feature branches, release branches, hotfix branches, and support branches. Many teams have adopted git-flow, which even has git extensions to support it. I’m a great believer in popular wisdom, but sometimes you have to recognize mass delusion for what it is.

Here is a section of C4 that might have shocked you when you first read it:

The project SHALL NOT use topic branches for any reason. Personal forks MAY use topic branches.

To be clear, it’s public branches in shared repositories that I’m talking about. Using branches for private work, e.g., to work on different issues, appears to work well enough, though it’s more complexity than I personally enjoy. To channel Stallman again: “your freedom to create complexity ends one inch from our shared workspace.”

Like the rest of C4, the rules on branches are not accidental. They came from our experience making ZeroMQ, starting when Martin Sustrik and I rethought how to make stable releases. We both love and appreciate simplicity (some people seem to have a remarkable tolerance for complexity). We chatted for a while… I asked him, “I’m going to start making a stable release. Would it be OK for me to make a branch in the git you’re working in?” Martin didn’t like the idea. “OK, if I fork the repository, I can move patches from your repo to that one”. That felt much better to both of us.

The response from many in the ZeroMQ community was shock and horror. People felt we were being lazy and making contributors work harder to find the “right” repository. Still, this seemed simple, and indeed it worked smoothly. The best part was that we each worked as we wanted to. Whereas before, the ZeroMQ repository had felt horribly complex (and it wasn’t even anything like git-flow), this felt simple. And it worked. The only downside was that we lost a single unified history. Now, perhaps historians will feel robbed, but I honestly can’t see that the historical minutiae of who changed what, when, including every branch and experiment, are worth any significant pain or friction.

People have gotten used to the “multiple repositories” approach in ZeroMQ and we’ve started using that in other projects quite successfully. My own opinion is that history will judge git branches and patterns like git-flow as a complex solution to imaginary problems inherited from the days of Subversion and monolithic repositories.

More profoundly, and perhaps this is why the majority seems to be “wrong”: I think the branches versus forks argument is really a deeper design versus evolve argument about how to make software optimally. I’ll address that deeper argument in the next section. For now, I’ll try to be scientific about my irrational hatred of branches, by looking at a number of criteria, and comparing branches and forks in each one.

Simplicity Versus Complexity #

The simpler, the better.

There is no inherent reason why branches are more complex than forks. However, git-flow uses five types of branch, whereas C4 uses two types of fork (development, and stable) and one branch (master). Circumstantial evidence is thus that branches lead to more complexity than forks. For new users, it is definitely, and we’ve measured this in practice, easier to learn to work with many repositories and no branches except master.

Change Latency #

The smaller and more rapid the delivery, the better.

Development branches seem to correlate strongly with large, slow, risky deliveries. “Sorry, I have to merge this branch before we can test the new version” signals a breakdown in process. It’s certainly not how C4 works, which is by focusing tightly on individual problems and their minimal solutions. Allowing branches in development raises change latency. Forks have a different outcome: it’s up to the forker to ensure that his changes merge cleanly, and to keep them simple so they won’t be rejected.

Learning Curve #

The smoother the learning curve, the better.

Evidence definitely shows that learning to use git branches is complex. For some people, this is OK. For most developers, every cycle spent learning git is a cycle lost on more productive things. I’ve been told several times, by different people that I do not like branches because I “never properly learned git”. That is fair, but it is a criticism of the tool, not the human.

Cost of Failure #

The lower the cost of failure, the better.

Branches demand more perfection from developers because mistakes potentially affect others. This raises the cost of failure. Forks make failure extremely cheap because literally nothing that happens in a fork can affect others not using that fork.

Up-front Coordination #

The less need for up-front coordination, the better.

You can do a hostile fork. You cannot do a hostile branch. Branches depend on up-front coordination, which is expensive and fragile. One person can veto the desires of a whole group. For example in the ZeroMQ community we were unable to agree on a git branching model for a year. We solved that by using forking instead. The problem went away.

Scalability #

The more you can scale a project, the better.

The strong assumption in all branch strategies is that the repository is the project. But there is a limit to how many people you can get to agree to work together in one repository. As I explained, the cost of up-front coordination can become fatal. A more realistic project scales by allowing anyone to start their own repositories, and ensuring these can work together. A project like ZeroMQ has dozens of repositories. Forking looks more scalable than branching.

Surprise and Expectations #

The less surprising, the better.

People expect branches and find forks to be uncommon and thus confusing. This is the one aspect where branches win. If you use branches, a single patch will have the same commit hash tag, whereas across forks the patch will have different hash tags. That makes it harder to track patches as they cross forks, true. But seriously, having to track hexadecimal hash tags is not a feature. It’s a bug. Sometimes better ways of working are surprising at first.

Economics of Participation #

The more tangible the rewards, the better.

People like to own their work and get credit for it. This is much easier with forks than with branches. Forks create more competition in a healthy way, while branches suppress competition and force people to collaborate and share credit. This sounds positive but in my experience it demotivates people. A branch isn’t a product you can “own”, whereas a fork can be.

Robustness in Conflict #

The more a model can survive conflict, the better.

Like it or not, people fight over ego, status, beliefs, and theories of the world. Challenge is a necessary part of science. If your organizational model depends on agreement, you won’t survive the first real fight. Branches do not survive real arguments and fights, whereas forks can be hostile, and still benefit all parties. And this is indeed how free software works.

Guarantees of Isolation #

The stronger the isolation between production code and experiment, the better.

People make mistakes. I’ve seen experimental code pushed to mainline production by error. I’ve seen people make bad panic changes under stress. But the real fault is in allowing two entirely separate generations of product to exist in the same protected space. If you can push to random-branch-x, you can push to master. Branches do not guarantee isolation of production critical code. Forks do.

Visibility #

The more visible our work, the better.

Forks have watchers, issues, a README, and a wiki. Branches have none of these. People try forks, build them, break them, patch them. Branches sit there until someone remembers to work on them. Forks have downloads and tarballs. Branches do not. When we look for self-organization, the more visible and declarative the problems, the faster and more accurately we can work.

Conclusions #

In this section, I’ve listed a series of arguments, most of which came from fellow team members. Here’s how it seems to break down: git veterans insist that branches are the way to work, whereas newcomers tend to feel intimidated when asked to navigate git branches. Git is not an easy tool to master. What we’ve discovered, accidentally, is that when you stop using branches at all, git becomes trivial to use. It literally comes down to six commands (clone, remote, commit, log, push, and pull). Furthermore, a branch-free process actually works, we’ve used it for a couple of years now, and no visible downside except surprise to the veterans and growth of “single” projects over multiple repositories.

If you can’t use forks, perhaps because your firm doesn’t trust GitHub’s private repositories, then you can perhaps use topic branches, one per issue. You’ll still suffer the costs of getting up-front consensus, low competitiveness, and risk of human error.

Designing for Innovation #

Let’s look at innovation, which Wikipedia defines as, “the development of new values through solutions that meet new requirements, inarticulate needs, or old customer and market needs in value adding new ways.” This really just means solving problems more cheaply. It sounds straight-forward, but the history of collapsed tech giants proves that it’s not. I’ll try to explain how teams so often get it wrong, and suggest a way for doing innovation right.

The Tale of Two Bridges #

Two old engineers were talking of their lives and boasting of their greatest projects. One of the engineers explained how he had designed one of the greatest bridges ever made.

“We built it across a river gorge,” he told his friend. “It was wide and deep. We spent two years studying the land, and choosing designs and materials. We hired the best engineers and designed the bridge, which took another five years. We contracted the largest engineering firms to build the structures, the towers, the tollbooths, and the roads that would connect the bridge to the main highways. Dozens died during the construction. Under the road level we had trains, and a special path for cyclists. That bridge represented years of my life.”

The second man reflected for a while, then spoke. “One evening me and a friend got drunk on vodka, and we threw a rope across a gorge,” he said. “Just a rope, tied to two trees. There were two villages, one at each side. At first, people pulled packages across that rope with a pulley and string. Then someone threw a second rope, and built a foot walk. It was dangerous, but the kids loved it. A group of men then rebuilt that, made it solid, and women started to cross, everyday, with their produce. A market grew up on one side of the bridge, and slowly that became a large town, because there was a lot of space for houses. The rope bridge got replaced with a wooden bridge, to allow horses and carts to cross. Then the town built a real stone bridge, with metal beams. Later, they replaced the stone part with steel, and today there’s a suspension bridge standing in that same spot.”

The first engineer was silent. “Funny thing,” he said, “my bridge was demolished about ten years after we built it. Turns out it was built in the wrong place and no one wanted to use it. Some guys had thrown a rope across the gorge, a few miles further downstream, and that’s where everyone went.”

How ZeroMQ Lost Its Road Map #

Presenting ZeroMQ at the Mix-IT conference in Lyon in early 2012, I was asked several times for the “road map”. My answer was: there is no road map any longer. We had road maps, and we deleted them. Instead of a few experts trying to lay out the next steps, we were allowing this to happen organically. The audience didn’t really like my answer. So un-French.

However, the history of ZeroMQ makes it quite clear why road maps were problematic. In the beginning, we had a small team making the library, with few contributors, and no documented road map. As ZeroMQ grew more popular and we switched to more contributors, users asked for road maps. So we collected our plans together and tried to organize them into releases. Here, we wrote, is what will come in the next release.

As we rolled out releases, we hit the problem that it’s very easy to promise stuff, and rather harder to make it as planned. For one thing, much of the work was voluntary, and it’s not clear how you force volunteers to commit to a road map. But also, priorities can shift dramatically over time. So we were making promises we could not keep, and the real deliveries didn’t match the road maps.

The second problem was that by defining the road map, we in effect claimed territory, making it harder for others to participate. People do prefer to contribute to changes they believe were their idea. Writing down a list of things to do turns contribution into a chore rather than an opportunity.

Finally, we saw changes in ZeroMQ that were quite traumatic, and the road maps didn’t help with this, despite a lot of discussion and effort to “do it right”. Examples of this were incompatible changes in APIs and protocols. It was quite clear that we needed a different approach for defining the change process.

Software engineers don’t like the notion that powerful, effective solutions can come into existence without an intelligent designer actively thinking things through. And yet no one in that room in Lyon would have questioned evolution. A strange irony, and one I wanted to explore further as it underpins the direction the ZeroMQ community has taken since the start of 2012.

In the dominant theory of innovation, brilliant individuals reflect on large problem sets and then carefully and precisely create a solution. Sometimes they will have “eureka” moments where they “get” brilliantly simple answers to whole large problem sets. The inventor, and the process of invention are rare, precious, and can command a monopoly. History is full of such heroic individuals. We owe them our modern world.

Looking more closely, however, and you will see that the facts don’t match. History doesn’t show lone inventors. It shows lucky people who steal or claim ownership of ideas that are being worked on by many. It shows brilliant people striking lucky once, and then spending decades on fruitless and pointless quests. The best known large-scale inventors like Thomas Edison were in fact just very good at systematic broad research done by large teams. It’s like claiming that Steve Jobs invented every device made by Apple. It is a nice myth, good for marketing, but utterly useless as practical science.

Recent history, much better documented and less easy to manipulate, shows this well. The Internet is surely one of the most innovative and fast-moving areas of technology, and one of the best documented. It has no inventor. Instead, it has a massive economy of people who have carefully and progressively solved a long series of immediate problems, documented their answers, and made those available to all. The innovative nature of the Internet comes not from a small, select band of Einsteins. It comes from RFCs anyone can use and improve, made by hundreds and thousands of smart, but not uniquely smart, individuals. It comes from open source software anyone can use and improve. It comes from sharing, scale of community, and the continuous accretion of good solutions and disposal of bad ones.

Here thus is an alternative theory of innovation:

There is an infinite problem/solution terrain.
This terrain changes over time according to external conditions.
We can only accurately perceive problems to which we are close.
We can rank the cost/benefit economics of problems using a market for solutions.
There is an optimal solution to any solvable problem.
We can approach this optimal solution heuristically, and mechanically.
Our intelligence can make this process faster, but does not replace it.

There are a few corollaries to this:

Individual creativity matters less than process. Smarter people may work faster, but they may also work in the wrong direction. It’s the collective vision of reality that keeps us honest and relevant.
We don’t need road maps if we have a good process. Functionality will emerge and evolve over time as solutions compete for market share.
We don’t invent solutions so much as discover them. All sympathies to the creative soul. It’s just an information processing machine that likes to polish its own ego and collect karma.
Intelligence is a social effect, though it feels personal. A person cut off from others eventually stops thinking. We can neither collect problems nor measure solutions without other people.
The size and diversity of the community is a key factor. Larger, more diverse communities collect more relevant problems, and solve them more accurately, and do this faster, than a small expert group.

So, when we trust the solitary experts, they make classic mistakes. They focus on ideas, not problems. They focus on the wrong problems. They make misjudgments about the value of solving problems. They don’t use their own work.

Can we turn the above theory into a reusable process? In late 2011, I started documenting C4 and similar contracts, and using them both in ZeroMQ and in closed source projects. The underlying process is something I call “Simplicity Oriented Design”, or SOD. This is a reproducible way of developing simple and elegant products. It organizes people into flexible supply chains that are able to navigate a problem landscape rapidly and cheaply. They do this by building, testing, and keeping or discarding minimal plausible solutions, called “patches”. Living products consist of long series of patches, applied one atop the other.

SOD is relevant first because it’s how we evolve ZeroMQ. It’s also the basis for the design process we will use in Chapter 7 - Advanced Architecture using ZeroMQ to develop larger-scale ZeroMQ applications. Of course, you can use any software architecture methodology with ZeroMQ.

To best understand how we ended up with SOD, let’s look at the alternatives.

Trash-Oriented Design #

The most popular design process in large businesses seems to be Trash-Oriented Design, or TOD. TOD feeds off the belief that all we need to make money are great ideas. It’s tenacious nonsense, but a powerful crutch for people who lack imagination. The theory goes that ideas are rare, so the trick is to capture them. It’s like non-musicians being awed by a guitar player, not realizing that great talent is so cheap it literally plays on the streets for coins.

The main output of TODs is expensive “ideation”: concepts, design documents, and products that go straight into the trash can. It works as follows:

The Creative People come up with long lists of “we could do X and Y”. I’ve seen endlessly detailed lists of everything amazing a product could do. We’ve all been guilty of this. Once the creative work of idea generation has happened, it’s just a matter of execution, of course.
So the managers and their consultants pass their brilliant ideas to designers who create acres of preciously refined design documents. The designers take the tens of ideas the managers came up with, and turn them into hundreds of world-changing designs.
These designs get given to engineers who scratch their heads and wonder who the heck came up with such nonsense. They start to argue back, but the designs come from up high, and really, it’s not up to engineers to argue with creative people and expensive consultants.
So the engineers creep back to their cubicles, humiliated and threatened into building the gigantic but oh-so-elegant junk heap. It is bone-breaking work because the designs take no account of practical costs. Minor whims might take weeks of work to build. As the project gets delayed, the managers bully the engineers into giving up their evenings and weekends.
Eventually, something resembling a working product makes it out of the door. It’s creaky and fragile, complex and ugly. The designers curse the engineers for their incompetence and pay more consultants to put lipstick onto the pig, and slowly the product starts to look a little nicer.
By this time, the managers have started to try to sell the product and they find, shockingly, that no one wants it. Undaunted, they courageously build million-dollar web sites and ad campaigns to explain to the public why they absolutely need this product. They do deals with other businesses to force the product on the lazy, stupid, and ungrateful market.
After twelve months of intense marketing, the product still isn’t making profits. Worse, it suffers dramatic failures and gets branded in the press as a disaster. The company quietly shelves it, fires the consultants, buys a competing product from a small startup and rebrands that as its own Version 2. Hundreds of millions of dollars end up in the trash.
Meanwhile, another visionary manager somewhere in the organization drinks a little too much tequila with some marketing people and has a Brilliant Idea.

Trash-Oriented Design would be a caricature if it wasn’t so common. Something like 19 out of 20 market-ready products built by large firms are failures (yes, 87% of statistics are made up on the spot). The remaining 1 in 20 probably only succeeds because the competitors are so bad and the marketing is so aggressive.

The main lessons of TOD are quite straightforward but hard to swallow. They are:

Ideas are cheap. No exceptions. There are no brilliant ideas. Anyone who tries to start a discussion with “oooh, we can do this too!” should be beaten down with all the passion one reserves for traveling evangelists. It is like sitting in a cafe at the foot of a mountain, drinking a hot chocolate and telling others, “Hey, I have a great idea, we can climb that mountain! And build a chalet on top! With two saunas! And a garden! Hey, and we can make it solar powered! Dude, that’s awesome! What color should we paint it? Green! No, blue! OK, go and make it, I’ll stay here and make spreadsheets and graphics!”
The starting point for a good design process is to collect real problems that confront real people. The second step is to evaluate these problems with the basic question, “How much is it worth to solve this problem?” Having done that, we can collect that set of problems that are worth solving.
Good solutions to real problems will succeed as products. Their success will depend on how good and cheap the solution is, and how important the problem is (and sadly, how big the marketing budgets are). But their success will also depend on how much they demand in effort to use–in other words, how simple they are.

Now, after slaying the dragon of utter irrelevance, we attack the demon of complexity.

Complexity-Oriented Design #

Really good engineering teams and small firms can usually build decent products. But the vast majority of products still end up being too complex and less successful than they might be. This is because specialist teams, even the best, often stubbornly apply a process I call Complexity-Oriented Design, or COD, which works as follows:

Management correctly identifies some interesting and difficult problem with economic value. In doing so, they already leapfrog over any TOD team.
The team with enthusiasm starts to build prototypes and core layers. These work as designed and thus encouraged, the team go off into intense design and architecture discussions, coming up with elegant schemas that look beautiful and solid.
Management comes back and challenges the team with yet more difficult problems. We tend to equate cost with value, so the harder and more expensive to solve, the more the solution should be worth, in their minds.
The team, being engineers and thus loving to build stuff, build stuff. They build and build and build and end up with massive, perfectly-designed complexity.
The products go to market, and the market scratches its head and asks, “Seriously, is this the best you can do?” People do use the products, especially if they aren’t spending their own money in climbing the learning curve.
Management gets positive feedback from its larger customers, who share the same idea that high cost (in training and use) means high value, and so continues to push the process.
Meanwhile somewhere across the world, a small team is solving the same problem using a better process, and a year later smashes the market to little pieces.

COD is characterized by a team obsessively solving the wrong problems in a form of collective delusion. COD products tend to be large, ambitious, complex, and unpopular. Much open source software is the output of COD processes. It is insanely hard for engineers to stop extending a design to cover more potential problems. They argue, “What if someone wants to do X?” but never ask themselves, “What is the real value of solving X?”

A good example of COD in practice is Bluetooth, a complex, over-designed set of protocols that users hate. It continues to exist only because in a massively-patented industry there are no real alternatives. Bluetooth is perfectly secure, which is close to pointless for a proximity protocol. At the same time, it lacks a standard API for developers, meaning it’s really costly to use Bluetooth in applications.

On the #zeromq IRC channel, Wintre once wrote of how enraged he was many years ago when he “found that XMMS 2 had a working plugin system, but could not actually play music.”

COD is a form of large-scale “rabbit-holing”, in which designers and engineers cannot distance themselves from the technical details of their work. They add more and more features, utterly misreading the economics of their work.

The main lessons of COD are also simple, but hard for experts to swallow. They are:

Making stuff that you don’t immediately have a need for is pointless. Doesn’t matter how talented or brilliant you are, if you just sit down and make stuff people are not actually asking for, you are most likely wasting your time.
Problems are not equal. Some are simple, and some are complex. Ironically, solving the simpler problems often has more value to more people than solving the really hard ones. So if you allow engineers to just work on random things, they’ll mostly focus on the most interesting but least worthwhile things.
Engineers and designers love to make stuff and decoration, and this inevitably leads to complexity. It is crucial to have a “stop mechanism”, a way to set short, hard deadlines that force people to make smaller, simpler answers to just the most crucial problems.

Simplicity Oriented Design #

Finally, we come to the rare but precious Simplicity Oriented Design, or SOD. This process starts with a realization: we do not know what we have to make until after we start making it. Coming up with ideas or large-scale designs isn’t just wasteful, it’s a direct hindrance to designing the truly accurate solutions. The really juicy problems are hidden like far valleys, and any activity except active scouting creates a fog that hides those distant valleys. You need to keep mobile, pack light, and move fast.

SOD works as follows:

We collect a set of interesting problems (by looking at how people use technology or other products) and we line these up from simple to complex, looking for and identifying patterns of use.
We take the simplest, most dramatic problem and we solve this with a minimal plausible solution, or “patch”. Each patch solves exactly a genuine and agreed-upon problem in a brutally minimal fashion.
We apply one measure of quality to patches, namely “Can this be done any simpler while still solving the stated problem?” We can measure complexity in terms of concepts and models that the user has to learn or guess in order to use the patch. The fewer, the better. A perfect patch solves a problem with zero learning required by the user.
Our product development consists of a patch that solves the problem “we need a proof of concept” and then evolves in an unbroken line to a mature series of products, through hundreds or thousands of patches piled on top of each other.
We do not do anything that is not a patch. We enforce this rule with formal processes that demand that every activity or task is tied to a genuine and agreed-upon problem, explicitly enunciated and documented.
We build our projects into a supply chain where each project can provide problems to its “suppliers” and receive patches in return. The supply chain creates the “stop mechanism” because when people are impatiently waiting for an answer, we necessarily cut our work short.
Individuals are free to work on any projects, and provide patches at any place they feel it’s worthwhile. No individuals “own” any project, except to enforce the formal processes. A single project can have many variations, each a collection of different, competing patches.
Projects export formal and documented interfaces so that upstream (client) projects are unaware of change happening in supplier projects. Thus multiple supplier projects can compete for client projects, in effect creating a free and competitive market.
We tie our supply chain to real users and external clients and we drive the whole process by rapid cycles so that a problem received from outside users can be analyzed, evaluated, and solved with a patch in a few hours.
At every moment from the very first patch, our product is shippable. This is essential, because a large proportion of patches will be wrong (10-30%) and only by giving the product to users can we know which patches have become problems that need solving.

SOD is a hill-climbing algorithm, a reliable way of finding optimal solutions to the most significant problems in an unknown landscape. You don’t need to be a genius to use SOD successfully, you just need to be able to see the difference between the fog of activity and the progress towards new real problems.

People have pointed out that hill-climbing algorithms have known limitations. One gets stuck on local peaks, mainly. But this is nonetheless how life itself works: collecting tiny incremental improvements over long periods of time. There is no intelligent designer. We reduce the risk of local peaks by spreading out widely across the landscape, but it is somewhat moot. The limitations aren’t optional, they are physical laws. The theory says, this is how innovation really works, so better embrace it and work with it than try to work on the basis of magical thinking.

And in fact once you see all innovation as more or less successful hill-climbing, you realize why some teams and companies and products get stuck in a never-never land of diminishing prospects. They simply don’t have the diversity and collective intelligence to find better hills to climb. When Nokia killed their open source projects, they cut their own throat.

A really good designer with a good team can use SOD to build world-class products, rapidly and accurately. To get the most out of SOD the designer has to use the product continuously, from day one, and develop his or her ability to smell out problems such as inconsistency, surprising behavior, and other forms of friction. We naturally overlook many annoyances, but a good designer picks these up and thinks about how to patch them. Design is about removing friction in the use of a product.

In an open source setting, we do this work in public. There’s no “let’s open the code” moment. Projects that do this are in my view missing the point of open source, which is to engage your users in your exploration, and to build community around the seed of the architecture.

Burnout #

The ZeroMQ community has been and still is heavily dependent on pro bono individual efforts. I’d like to think that everyone was compensated in some way for their contributions, and I believe that with ZeroMQ, contributing means gaining expertise in an extraordinarily valuable technology, which leads to improved professional options.

However, not all projects will be so lucky and if you work with or in open source, you should understand the risk of burnout that volunteers face. This applies to all pro bono communities. In this section, I’ll explain what causes burnout, how to recognize it, how to prevent it, and (if it happens) how to try to treat it. Disclaimer: I’m not a psychiatrist and this article is based on my own experiences of working in pro bono contexts for the last 20 years, including free software projects, and NGOs such as the FFII.

In a pro bono context, we’re expected to work without direct or obvious economic incentive. That is, we sacrifice family life, professional advancement, free time, and health in order to accomplish some goal we have decided to accomplish. In any project, we need some kind of reward to make it worth continuing each day. In most pro bono projects the rewards are very indirect, superficially not economical at all. Mostly, we do things because people say, “Hey, great!” Karma is a powerful motivator.

However, we are economic beings, and sooner or later, if a project costs us a great deal and does not bring economic rewards of some kind (money, fame, a new job), we start to suffer. At a certain stage, it seems our subconscious simply gets disgusted and says, “Enough is enough!” and refuses to go any further. If we try to force ourselves, we can literally get sick.

This is what I call “burnout”, though the term is also used for other kinds of exhaustion. Too much investment on a project with too little economic reward, for too long. We are great at manipulating ourselves and others, and this is often part of the process that leads to burnout. We tell ourselves that it’s for a good cause and that the other guy is doing OK, so we should be able to as well.

When I got burned out on open source projects like Xitami, I remember clearly how I felt. I simply stopped working on it, refused to answer any more emails, and told people to forget about it. You can tell when someone’s burned out. They go offline, and everyone starts saying, “He’s acting strange… depressed, or tired…”

Diagnosis is simple. Has someone worked a lot on a project that was not paying back in any way? Did she make exceptional sacrifices? Did he lose or abandon his job or studies to do the project? If you’re answering “yes”, it’s burnout.

There are three simple techniques I’ve developed over the years to reduce the risk of burnout in the teams I work with:

No one is irreplaceable. Working solo on a critical or popular project–the concentration of responsibility on one person who cannot set their own limits–is probably the main factor. It’s a management truism: if someone in your organization is irreplaceable, get rid of him or her.
We need day jobs to pay the bills. This can be hard, but seems necessary. Getting money from somewhere else makes it much easier to sustain a sacrificial project.
Teach people about burnout. This should be a basic course in colleges and universities, as pro bono work becomes a more common way for young people to experiment professionally.

When someone is working alone on a critical project, you know they are going blow their fuses sooner or later. It’s actually fairly predictable: something like 18-36 months depending on the individual and how much economic stress they face in their private lives. I’ve not seen anyone burn-out after half a year, nor last five years in a unrewarding project.

There is a simple cure for burnout that works in at least some cases: get paid decently for your work. However, this pretty much destroys the freedom of movement (across that infinite problem landscape) that the volunteer enjoys.

Patterns for Success #

I’ll end this code-free chapter with a series of patterns for success in software engineering. They aim to capture the essence of what divides glorious success from tragic failure. They were described as “religious maniacal dogma” by a manager, and “anything else would be effing insane” by a colleague, in a single day. For me, they are science. But treat the Lazy Perfectionist and others as tools to use, sharpen, and throw away if something better comes along.

The Lazy Perfectionist #

Never design anything that’s not a precise minimal answer to a problem we can identify and have to solve.

The Lazy Perfectionist spends his idle time observing others and identifying problems that are worth solving. He looks for agreement on those problems, always asking, “What is the real problem”. Then he moves, precisely and minimally, to build, or get others to build, a usable answer to one problem. He uses, or gets others to use those solutions. And he repeats this until there are no problems left to solve, or time or money runs out.

The Benevolent Tyrant #

The control of a large force is the same principle as the control of a few men: it is merely a question of dividing up their numbers. – Sun Tzu

The Benevolent Tyrant divides large problems into smaller ones and throws them at groups to focus on. She brokers contracts between these groups, in the form of APIs and the “unprotocols” we’ll read about in the next chapter. The Benevolent Tyrant constructs a supply chain that starts with problems, and results in usable solutions. She is ruthless about how the supply chain works, but does not tell people what to work on, nor how to do their work.

The Earth and Sky #

The ideal team consists of two sides: one writing code, and one providing feedback.

The Earth and Sky work together as a whole, in close proximity, but they communicate formally through issue tracking. Sky seeks out problems from others and from their own use of the product and feeds these to Earth. Earth rapidly answers with testable solutions. Earth and Sky can work through dozens of issues in a day. Sky talks to other users, and Earth talks to other developers. Earth and Sky may be two people, or two small groups.

The Open Door #

The accuracy of knowledge comes from diversity.

The Open Door accepts contributions from almost anyone. She does not argue quality or direction, instead allowing others to argue that and get more engaged. She calculates that even a troll will bring more diverse opinion to the group. She lets the group form its opinion about what goes into stable code, and she enforces this opinion with help of a Benevolent Tyrant.

The Laughing Clown #

Perfection precludes participation.

The Laughing Clown, often acting as the Happy Failure, makes no claim to high competence. Instead his antics and bumbling attempts provoke others into rescuing him from his own tragedy. Somehow however, he always identifies the right problems to solve. People are so busy proving him wrong they don’t realize they’re doing valuable work.

The Mindful General #

Make no plans. Set goals, develop strategies and tactics.

The Mindful General operates in unknown territory, solving problems that are hidden until they are nearby. Thus she makes no plans, but seeks opportunities, then exploits them rapidly and accurately. She develops tactics and strategies in the field, and teaches these to her soldiers so they can move independently, and together.

If you know the enemy and know yourself, you need not fear the result of a hundred battles. – Sun Tzu

The Social Engineer reads the hearts and minds of those he works with and for. He asks, of everyone, “What makes this person angry, insecure, argumentative, calm, happy?” He studies their moods and dispositions. With this knowledge he can encourage those who are useful, and discourage those who are not. The Social Engineer never acts on his own emotions.

The Constant Gardener #

He will win whose army is animated by the same spirit throughout all its ranks. – Sun Tzu

The Constant Gardener grows a process from a small seed, step-by-step as more people come into the project. She makes every change for a precise reason, with agreement from everyone. She never imposes a process from above but lets others come to consensus, and then he enforces that consensus. In this way, everyone owns the process together and by owning it, they are attached to it.

The Rolling Stone #

After crossing a river, you should get far away from it. – Sun Tzu

The Rolling Stone accepts his own mortality and transience. He has no attachment to his past work. He accepts that all that we make is destined for the trash can, it is just a matter of time. With precise, minimal investments, he can move rapidly away from the past and stay focused on the present and near future. Above all, he has no ego and no pride to be hurt by the actions of others.

The Pirate Gang #

Code, like all knowledge, works best as collective–not private–property.

The Pirate Gang organizes freely around problems. It accepts authority insofar as authority provides goals and resources. The Pirate Gang owns and shares all it makes: every work is fully remixable by others in the Pirate Gang. The gang moves rapidly as new problems emerge, and is quick to abandon old solutions if those stop being relevant. No persons or groups can monopolize any part of the supply chain.

The Flash Mob #

Water shapes its course according to the nature of the ground over which it flows. – Sun Tzu

The Flash Mob comes together in space and time as needed, then disperses as soon as they can. Physical closeness is essential for high-bandwidth communications. But over time it creates technical ghettos, where Earth gets separated from Sky. The Flash Mob tends to collect a lot of frequent flier miles.

The Canary Watcher #

Pain is not, generally, a Good Sign.

The Canary Watcher measures the quality of an organization by their own pain level, and the observed pain levels of those with whom he works. He brings new participants into existing organizations so they can express the raw pain of the innocent. He may use alcohol to get others to verbalize their pain points. He asks others, and himself, “Are you happy in this process, and if not, why not?” When an organization causes pain in himself or others, he treats that as a problem to be fixed. People should feel joy in their work.

The Hangman #

Never interrupt others when they are making mistakes.

The Hangman knows that we learn only by making mistakes, and she gives others copious rope with which to learn. She only pulls the rope gently, when it’s time. A little tug to remind the other of their precarious position. Allowing others to learn by failure gives the good reason to stay, and the bad excuse to leave. The Hangman is endlessly patient, because there is no shortcut to the learning process.

The Historian #

Keeping the public record may be tedious, but it’s the only way to prevent collusion.

The Historian forces discussion into the public view, to prevent collusion to own areas of work. The Pirate Gang depends on full and equal communications that do not depend on momentary presence. No one really reads the archives, but the simply possibility stops most abuses. The Historian encourages the right tool for the job: email for transient discussions, IRC for chatter, wikis for knowledge, issue tracking for recording opportunities.

The Provocateur #

When a man knows he is to be hanged in a fortnight, it concentrates his mind wonderfully. – Samuel Johnson

The Provocateur creates deadlines, enemies, and the occasional impossibility. Teams work best when they don’t have time for the crap. Deadlines bring people together and focus the collective mind. An external enemy can move a passive team into action. The Provocateur never takes the deadline too seriously. The product is always ready to ship. But she gently reminds the team of the stakes: fail, and we all look for other jobs.

The Mystic #

When people argue or complain, just write them a Sun Tzu quotation – Mikko Koppanen

The Mystic never argues directly. He knows that to argue with an emotional person only creates more emotion. Instead he side-steps the discussion. It’s hard to be angry at a Chinese general, especially when he has been dead for 2,400 years. The Mystic plays Hangman when people insist on the right to get it wrong.