Economic concepts applied in software development, reliability engineering, and technical leadership

Date: June 22, 2017
Last modified: June 22, 2017

One of my favorite subjects as a layperson is economics. I have found many useful applications of its methods and concepts in my work in software development, technical leadership, and reliability engineering roles.

This is a post to share some of these.

A few ideas from economics: an overview

Below is an overview of ideas from economics that have resonated with me at work:

Opportunity cost

Looking at what you give up by choosing one activity over the alternatives.

Causal inference with natural experiments

This uses observations from conditions that arose outside the experimenter's control to answer questions about causality. It has enormous potential for distributed systems.

Comparative advantage

This is not an intuitive idea. It is typically used in international trade policy to determine which trade partner is better placed to produce a good, even when that partner isn't the best producer of it globally. It has implications for technical leadership concerning the best person to assign work to, given other constraints.

Thinking on the margins

This asks when the marginal cost of doing a little more outweighs the marginal benefit. It has implications for my work when scoping and specifying infrastructure and system architectures.

Lump of labor fallacy

This is a common misconception in society at large: people think there is a fixed amount of work, so automation or immigration must cause job losses for native manual workers. There are deeper insights here too. This idea has implications in operations, "DevOps" culture, etc.

Applying the ideas

Opportunity Cost

Opportunity cost is the idea that when we engage in some activity, the people tasked with it are unavailable for other activities for as long as the work takes. The cost is the value of the best alternative we give up.

Paired with the idea of core competencies, which is our distilled understanding of what makes our company, product, and/or team special, we can strategically pick which parts of our application or platform our in-house product team builds versus buys.
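To make the build-versus-buy trade-off concrete, here is a minimal sketch that prices the team's time at an assumed value per engineer-week, so the opportunity cost of building sits alongside the cash cost of buying. The option names, effort estimates, prices, and the `economic_cost` helper are all hypothetical placeholders, not real data or a standard formula from a library.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    engineer_weeks: float  # team time the option consumes (hypothetical)
    cash_cost: float       # vendor fees, licences, etc. (hypothetical)


def economic_cost(option: Option, value_per_engineer_week: float) -> float:
    """Cash cost plus the opportunity cost of the team's time.

    The engineer-week term stands for the value of the best alternative
    use of that time, e.g. work on our core competencies.
    """
    opportunity_cost = option.engineer_weeks * value_per_engineer_week
    return opportunity_cost + option.cash_cost


# Assumption: one engineer-week spent on core-competency work is worth 10,000.
VALUE_PER_WEEK = 10_000.0

build = Option("build in-house auth service", engineer_weeks=12, cash_cost=0)
buy = Option("buy hosted auth service", engineer_weeks=2, cash_cost=60_000)

for opt in (build, buy):
    print(f"{opt.name}: {economic_cost(opt, VALUE_PER_WEEK):,.0f}")

cheaper = min((build, buy), key=lambda o: economic_cost(o, VALUE_PER_WEEK))
print("lower total cost:", cheaper.name)
```

With these made-up numbers, building costs 120,000 and buying 80,000, so buying wins even though it carries the only cash outlay. The answer flips if the team's time is worth less, which is exactly where the core-competency question comes in.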

Causal inference with natural experiments

To determine causality, engineers of all stripes design experiments, collect observations, and analyze the results derived from those observations. Today most designed experiments are randomized.

However, when it is too expensive to design a randomized experiment, or difficult to replicate the conditions of the real world, another category of experiment can be used effectively. These are known as natural experiments: observations recorded in the real-world environment, under conditions the experimenter did not set up, that can still be analyzed in a meaningful way.

Consider the early 1990s, when New Jersey increased its minimum wage from $4.25 to $5.05 per hour. Adjacent Pennsylvania kept its minimum wage at $4.25 (the rate before the increase in New Jersey), so economists surveyed a large number of fast-food restaurants across both states to detect whether raising the minimum wage would negatively impact employment growth in this industry. The paper (Card and Krueger, 1994) explains its conclusions better than I can, but the relevant part for our discussion is that state legislation changed outside of the economists' control. The economists didn't set up the conditions in New Jersey or Pennsylvania so they could run the experiment; they merely exploited naturally occurring conditions to extract observations that are meaningful to an interesting question of causality, in this case: did raising the minimum wage lower employment growth? Read the paper to find out. :)

As a distributed systems engineer who has had to be on-call, I've found myself wondering how a change in one part of the system will impact the behavior of the rest. This is much like an economist studying factors that could impact employment growth. When production is being stressed in a specific way, I look for ways to treat it as an experiment that answers questions about causality. These periods are not as long-lived as most economic natural experiments, so our infrastructure must allow for dynamic experimentation. We need to be able to facilitate experimentation through a variety of tools, such as:

  • queryable structured logging

  • metrics instrumentation with good coverage

  • visualization

  • distributed tracing

  • traffic mirroring

However, the crucial element is the mindset: conducting natural experiments, and knowing when it is acceptable to do so in a production setting, lets us learn more about the real system. Simulated performance or stress tests are often not adequate to depict the ways the system is actually stressed in production.
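The New Jersey and Pennsylvania comparison is a classic difference-in-differences design: compare the before/after change in the "treated" group with the before/after change in a comparable group that was not treated. Here is a minimal sketch of the same estimator applied to production, assuming a change (say, a new connection-pool setting) happened to reach one cluster while a comparable cluster kept the old configuration. The clusters, the change, and the p99 latency samples are all hypothetical.

```python
from statistics import mean

# Hypothetical p99 latency samples (ms) before and after the change.
treated_before = [212, 208, 215, 210]  # cluster that received the change
treated_after = [181, 185, 179, 183]
control_before = [205, 209, 207, 203]  # comparable cluster that did not
control_after = [199, 201, 197, 203]


def diff_in_diff(t_before, t_after, c_before, c_after):
    """Change in the treated group minus change in the control group.

    The control group's change stands in for what would have happened to
    the treated group without the change (the parallel-trends assumption).
    """
    treated_delta = mean(t_after) - mean(t_before)
    control_delta = mean(c_after) - mean(c_before)
    return treated_delta - control_delta


effect = diff_in_diff(treated_before, treated_after, control_before, control_after)
print(f"estimated effect on p99 latency: {effect:.2f} ms")
```

With these made-up numbers the treated cluster improves by 29.25 ms, but the control cluster also improved by 6 ms (perhaps traffic dropped off), so only about 23.25 ms is attributable to the change. The estimate is only as good as the assumption that both clusters would otherwise have trended alike, which is why the queryable logs, metrics, and tracing listed above matter.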

A closely related practice in reliability engineering is what Netflix calls chaos engineering.

Comparative advantage

Comparative advantage has to do with choosing the trade partner whose opportunity cost of producing a good is lower than that of other possible trade partners, even if that partner is not the most efficient producer in absolute terms.
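A minimal sketch of the calculation, assuming two hypothetical developers, `alice` and `bob`, and two kinds of work with made-up effort estimates in hours. Alice is faster at both tasks (an absolute advantage), yet each developer has a comparative advantage in one of them.

```python
# Hypothetical hours each developer needs per unit of work.
hours = {
    "alice": {"upgrade": 10, "feature": 20},
    "bob": {"upgrade": 30, "feature": 30},
}


def opportunity_cost(person: str, task: str, other_task: str) -> float:
    """Units of other_task the person gives up to complete one unit of task."""
    return hours[person][task] / hours[person][other_task]


def comparative_advantage(task: str, other_task: str) -> str:
    """The person with the lowest opportunity cost for the task."""
    return min(hours, key=lambda p: opportunity_cost(p, task, other_task))


for person in hours:
    oc = opportunity_cost(person, "upgrade", "feature")
    print(f"{person}: one upgrade costs {oc:.2f} features")

print("upgrades ->", comparative_advantage("upgrade", "feature"))
print("features ->", comparative_advantage("feature", "upgrade"))
```

An upgrade costs Alice half a feature but costs Bob a whole one, so Alice has the comparative advantage in upgrades; a feature costs Alice two upgrades but Bob only one, so Bob has it in features, despite being slower at everything.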

In technical leadership and engineering management, we can use this idea when trying to determine the best person to work on a project or do a task at a specific time.

Let's say that it is the end of July, when many developers are on vacation. We need to upgrade an application dependency before August 15th so that other projects that depend on it can start. Three developers are available in July, and we want to be sure we finish before the middle of August. One of them, a mid-level developer, has some experience upgrading application dependencies, although this upgrade is more complex than anything they have handled before. Neither of the other two developers has done this kind of work. A more experienced developer with direct experience of upgrades at this scope and complexity is on vacation until August 11, leaving only two workdays before August 15th to work on it. Given these constraints, we should prefer to assign this work to the mid-level developer: even if the more experienced developer could complete the upgrade within two days, the opportunity cost of missing the deadline exceeds the cost of the extra time the mid-level developer needs to do the upgrade.
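One way to make the reasoning in this scenario explicit is to compare expected costs. The sketch below is only an illustration under assumed numbers: the probabilities of missing the deadline, the extra effort, and the cost of a missed deadline (expressed in developer-days of blocked downstream work) are hypothetical guesses, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    who: str
    p_miss_deadline: float  # assumed probability the upgrade ships late
    extra_days: float       # assumed extra effort vs. the fastest possible option


# Assumption: a missed deadline blocks dependent projects, costing ~40 developer-days.
COST_OF_MISSED_DEADLINE = 40.0


def expected_cost(a: Assignment) -> float:
    """Expected cost of missing the deadline plus the extra effort spent."""
    return a.p_miss_deadline * COST_OF_MISSED_DEADLINE + a.extra_days


options = [
    Assignment("experienced developer, two days left", p_miss_deadline=0.5, extra_days=0),
    Assignment("mid-level developer, starts now", p_miss_deadline=0.1, extra_days=5),
]
for a in options:
    print(f"{a.who}: expected cost {expected_cost(a):.1f} developer-days")

best = min(options, key=expected_cost)
print("assign to:", best.who)
```

Under these assumptions the experienced developer's expected cost is 20 developer-days against 9 for the mid-level developer. The point is not the specific numbers but that the deadline risk, not raw speed, dominates the decision.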

Thinking on the margins

Sometimes a speedup, cost saving, or optimization of another kind can cost more than the benefits it offers. This is where thinking on the margins helps.

In infrastructure engineering, there are many scenarios where we might be able to take advantage of new technology, such as a new cloud provider resource type. Yet because it doesn't support caching or compression (or whatever else we rely on), we would have to implement that in our application code, and the overall speedup would no longer give us the net benefit we sought. In fact, it might yield a net negative benefit for our specific needs, even if the cloud provider's marketing blurb reads very convincingly to the CTO or VP of Engineering.

By thinking on the margin, we avoid investing time and effort in migrating to this new resource type when no net benefit exists.
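A minimal sketch of the marginal comparison for this migration, assuming every quantity has been converted to engineer-days per year. The `marginal_net_benefit` helper and all figures are hypothetical; the point is that the work to reimplement a missing feature belongs on the cost side of the margin.

```python
def marginal_net_benefit(
    latency_gain_value: float,     # value of the raw speedup from the new resource type
    migration_cost: float,         # one-off migration effort, amortized over a year
    reimplementation_cost: float,  # building caching/compression in application code
    upkeep_cost: float,            # maintaining that extra application code
) -> float:
    """Extra benefit of migrating minus the extra cost, relative to staying put."""
    marginal_benefit = latency_gain_value
    marginal_cost = migration_cost + reimplementation_cost + upkeep_cost
    return marginal_benefit - marginal_cost


# Hypothetical: the new resource type lacks caching/compression.
net = marginal_net_benefit(
    latency_gain_value=30, migration_cost=10, reimplementation_cost=15, upkeep_cost=12
)
print("migrate" if net > 0 else "stay", net)

# Hypothetical: the same resource type with native caching/compression support.
with_native_support = marginal_net_benefit(30, 10, 0, 0)
print("migrate" if with_native_support > 0 else "stay", with_native_support)
```

The same headline speedup gives a net benefit of -7 engineer-days when we must build and maintain the missing feature ourselves, and +20 when the platform provides it.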

Lump of labor fallacy

When the term DevOps was coined in the late 2000s, automation was one of its key ideas. In economics, the general notion of automation is associated with a well-known misconception called the 'lump of labor fallacy'. The premise of 'lump of labor' is that there is a fixed amount of work available in the economy at large, so automating a task would permanently eliminate jobs, and no new jobs would be created as a byproduct of the automation.

In 1891, the economist Schloss rejected the idea that reducing the number of working hours in a day would reduce unemployment, and termed this the 'lump of labor fallacy.'

Today many economists use the 'lump of labor fallacy' to rebut the idea that increasing the productivity of existing labor, growing the labor pool (e.g. via immigration), or increasing automation will increase unemployment.

The original idea of 'lump of labor' (not the fallacy) suggests the economy is a zero-sum game, which most economists do not consider to be true today.

In some countries, business leaders and politicians created programs to offer older workers early retirement, with the idea that it would make "room" for younger workers, who had higher unemployment rates. In reality this was counterproductive: a smaller number of workers had to bear the brunt of a more heavily loaded pension system, unemployment among younger workers did not fall as much as hoped, and productive, experienced workers were taken out of the workforce at the same time.

In a Software-as-a-Service engineering organization, you might be able to change your application codebase so that your developers spend less time on each change, in theory reducing the time-to-delivery of a new feature. As developers push more changes per day to production, the underlying infrastructure may become more unstable, requiring more manual troubleshooting and correction from a different team, which in turn restricts the number of available time slots for application developers to push their changes. This is similar to the early retirement example: the load has merely been shifted within the overall system, putting more strain on a part that cannot easily absorb the shock.
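A minimal capacity model shows why the speedup can evaporate. It assumes, hypothetically, that each deploy generates a fixed amount of operations work and that the operations team has a fixed number of hours per day; deploys beyond what that team can absorb simply cannot ship. The `shipped_deploys_per_day` helper and every rate are illustrative assumptions, not measurements.

```python
def shipped_deploys_per_day(
    requested_deploys: float,
    incident_hours_per_deploy: float,
    ops_hours_per_day: float,
) -> float:
    """Deploys that actually ship once operations capacity becomes the constraint."""
    # The operations team can absorb at most this many deploys' worth of work per day.
    ops_limited_deploys = ops_hours_per_day / incident_hours_per_deploy
    return min(requested_deploys, ops_limited_deploys)


# Hypothetical: each deploy costs 0.5 ops-hours; ops has 8 hours per day.
for requested in (5, 10, 20, 40):
    shipped = shipped_deploys_per_day(
        requested, incident_hours_per_deploy=0.5, ops_hours_per_day=8
    )
    print(f"requested {requested:>2}/day -> shipped {shipped:g}/day")
```

Under these assumptions, throughput tops out at 16 deploys per day no matter how fast developers work. The only ways past the ceiling are to reduce the operations work per deploy or to add capacity there, that is, to move the load onto a wall that can bear it.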

Also keep in mind that as you increase the number of changes pushed to production per day, someone in your engineering organization will need to think about how to decrease deploy latency without increasing risk too much, which means more automation. In other words, automating yourself out of a job in software development can only happen when your organization is very badly managed or the viability of the business is in question.

It is worth keeping these dynamics in mind when you sequence your strategic goals, so that load shifts onto load-bearing walls.

Concluding thoughts

Transferring and applying concepts from other disciplines (including but not limited to economics) has helped me rethink my approaches to software development, delivery, system design, and management. Hopefully this post motivates you to read the background on the ideas you found most interesting, or inspires you to borrow from other disciplines you are familiar with.

You could also look to safety and reliability engineering in fields such as aviation and construction for ideas applicable to SRE.

If you are interested in transitioning your engineering organization to better methods and processes with your existing team, through my customized team training and interim transformational leadership services, I would be happy to hear from you via email. :)