Conway's Law and Microservices

Keeping software architecture robust and aligned with business incentives is a tough problem faced by most software companies. As the size of an engineering team grows, its communication cost increases quadratically. This is a result of the growing number of communication links of people as described by Metcalfe’s Law - the number of unique possible links in a network is proportional to the square of the number of participants. Without a clear understanding of how the organization structure affects software development, a company is likely to experience difficulty scaling and iterating once they reach a certain size.

The presented issue does not become significant until launching features or fixing bugs requires a non-negligible amount of overhead. Assume an e-commerce clothing startup is planning to change the pricing of off-season products, but the engineering department is structured so that such an adjustment involves the UI team, the Database team, and the QA team. Once the project is complete, it just happens that they cannot launch the update because the Integration team is attempting to resolve an unrelated email notification issue that is currently blocking the release. Similar scenarios can occur regularly and lead to downstream consequences. In a world where startups succeed by optimizing iteration speed, especially for venture-backed startups, a company is at risk of losing market position if it tolerates expensive development lifecycle.

This leads me to believe that Conway’s Law is perhaps the most important law for modern software businesses.[1] The interrelation between system architecture and organization structure was formalized by Melvin Conway in his paper “How Do Committees Invent” back in 1967. The law states that any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure. This generalization on the sociological behavior of system designers has a profound implication: loosely coupled organizations are more likely to produce modular systems with autonomous components bounded by both business and technical context.[2] To empower engineers to develop and maintain software with high productivity, it almost seems inevitable for a company to move towards a paradigm where a team (defined broadly as well) is responsible for the entire lifecycle of a system component, ranging from development to testing to deployment.

It does not mean there is not place for horizontally integrated teams to exist, but tradeoffs are to be made. There is occasionally pressure to horizontalize the teams to ensure consistency across different parts of the system. In a layered architecture, as mentioned in the above example of owning UI and Database teams, you end up with low velocity but high efficiency in terms of code and functionality duplication. This is, by and large, beneficial for maintaining shared utility. The bottom line is a company needs to prioritize iteration speed in a way for loosely coupled teams to succeed, such as creating a hybrid environment to have vertical and horizontal teams coexist in their bounded contexts.

Decision makers at a software company should always keep Conway’s Law in mind. This is to say that a top priority of the leadership team is to create a collaborative structure and low-friction processes that allow its engineers to operate in alignment with business goals. Many of the most renowned tech companies are aware of this issue and invest heavily into refactoring system architecture and restructuring engineering organizations for long-term benefits. But among all standard practices, this investment is most commonly manifested as service-oriented architecture migration. It signals an org-wide shift towards building microservices expected and necessary. Leveraging Docker and Kubernetes to create a flexible and cloud-agnostic service-oriented infrastructure almost becomes an industry-standard practice. In a sense it is like cargo cult science, where some companies do not understand the benefits of microservices but do it anyway (aka. microservice cargo cult). However, knowing that Airbnb, Uber, Tinder, and more all made this transition and it seems to work quite well with their distributed workforce, it is tempting to insert microservice architecture as part of the equation for building fast-growth startups.

In fact, the so-called “microservice cargo cult” actually makes sense theoretically for building an ecosystem of distributed ownership and high scalability for a software company. Conway’s Law validates the reasoning of leveraging microservices to ensure minimal communication friction for rapid prototyping and experimentation. Certain communication limitations of cross-regional engineering teams are also lifted as a result of granting full-cycle service ownership. It is not hard to see why this architecture style becomes the go-to scaling strategy for many San Francisco/Silicon Valley’s hottest companies, but it is also important to understand its costs than keeping more coarse-grained systems.

Applying Conway’s Law and microservices appropriately can create a powerful mechanism for achieving fast and scalable success. Although the duo has their limitations, learning, experimenting, then understanding them is extremely valuable to any software businesses. They are worth serious attention.


Footnotes:

  1. There are a couple of laws that greatly influenced the advancement of the tech industry: Moore’s Law, Kryder’s Law, Linus’s Law, and so on. Each offers substantial value in their respective domains for evolving a system and making business decisions, but many relevant questions can be neglected or abstracted away with the abundance of cloud services, open-source frameworks, developer tools available to software businesses nowadays.
  2. Sam Newman discusses a few studies on the topics of organizational structure and software modularity in Chapter 10 of his book Building Microservices, supporting the claim that the organizational structure has significant impact on the system it produces with empirical evidence.


Thanks to Eugene Marinelli, Shahid Cohan, Rohan Varma, and Saad Syed for reviewing the draft.