Susan Potter

Software Optimization Mental Models for Software Engineers

[Software] Optimization is one of the most intellectually rewarding parts of software development, but it can also be one of the most convoluted and insidious. Many software engineers find themselves diving deep into optimization without fully understanding where to focus their efforts or without a higher-level model of the system they are tuning. As a result, we often waste time on marginal gains while missing much bigger opportunities for improvement.

Many of us (myself included) have fallen into the trap of diving into optimization without proper preparation. It’s terribly tempting, isn’t it? But let me share some mental models that have transformed my approach to optimization - except for one, which I describe in a separate post on software tuning through property-based thinking.

Measure First, Optimize Second

It’s tempting to jump into optimization as soon as we notice that something isn’t running as quickly as we’d like. Yet optimization without measurement is a classic pitfall. Without measurement data, we’re just guessing which parts of our program are the most expensive.

The key insight is that measurement is the foundation of all good optimization. It’s rather like being a detective - we must gather evidence before making accusations about which parts of our code are the culprits of poor performance.

Two common measurement tools available to us are profilers, which observe our running program (often by sampling) and show where time is actually being spent, and benchmarks, which give us a repeatable baseline for a specific operation so we can compare results before and after a change.
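
As a rough illustration, here is a minimal sketch (Python, standard library only) of both tools in action: cProfile shows where a program spends its time, and timeit records a baseline we can compare against after any change. The process_records function is a hypothetical stand-in for our own code.

```python
import cProfile
import pstats
import timeit


def process_records(records):
    """Hypothetical stand-in for the code we suspect is slow."""
    return sorted(set(records))


records = list(range(10_000)) * 5

# Profile: where does the time actually go?
profiler = cProfile.Profile()
profiler.enable()
process_records(records)
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)

# Benchmark: a repeatable baseline to compare against later.
baseline = timeit.timeit(lambda: process_records(records), number=100)
print(f"baseline: {baseline:.3f}s for 100 runs")
```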

A counter-concern is ensuring we do not sacrifice the compositionality or modularity of our software in ways that make it much harder to maintain (unless it is throwaway code and we have made that decision consciously). We still want to keep our units of work measurable going forward and not create larger and more brittle monolithic computations.

By measuring first - and knowing what we are measuring and why it is useful to us - we ensure that every optimization we make has a tangible and positive impact.

Takeaway: Always profile or create a baseline benchmark before optimizing anything. Focus our efforts on the biggest bottlenecks, not where we think (guess) they might be. Also be aware that some benchmarks measure contrived scenarios and sample-based profilers can be misleading, so think deeply about what we are measuring and how it aligns with the real-world environment of our running software.

Review Our Data Structures & Algorithms

The choice of data structures and algorithms often has a far greater impact on performance than any low-level optimization we might attempt.

Think about searching. If we’re using a linear search on a sorted array, we might spend ages trying to optimize the comparison operation. But by simply switching to a binary search, we’d achieve a dramatic improvement that no amount of low-level optimization could match! The difference between O(n) and O(log n) is profound for non-trivial input sizes, isn’t it?
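
Here is a minimal sketch of that contrast in Python, using the standard bisect module for the binary search; the data size and target value are arbitrary.

```python
import bisect
import timeit

data = list(range(1_000_000))  # sorted array
target = 987_654

def linear_search(xs, x):
    # O(n): inspect elements one by one.
    for i, v in enumerate(xs):
        if v == x:
            return i
    return -1

def binary_search(xs, x):
    # O(log n): repeatedly halve the search interval.
    i = bisect.bisect_left(xs, x)
    return i if i < len(xs) and xs[i] == x else -1

print(timeit.timeit(lambda: linear_search(data, target), number=10))
print(timeit.timeit(lambda: binary_search(data, target), number=10))
```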

Here are some delightful examples: swapping a linear scan for a hash-table lookup, replacing repeated sorting with a heap, or using a set to deduplicate instead of comparing every pair of items.

What’s particularly exciting is how these choices compound. Consider a program that frequently needs to look up items by key and also maintain them in sorted order. We might choose a balanced binary search tree (like a Red-Black tree) instead of maintaining both a hash table and a sorted array. One data structure instead of two - isn’t that elegant?
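
As a hedged sketch of that idea, the example below assumes the third-party sortedcontainers package (any ordered map or balanced search tree would play the same role): one structure gives us key lookups, sorted iteration, and range queries.

```python
# Sketch only: assumes `sortedcontainers` is installed
# (pip install sortedcontainers); a Red-Black tree or any
# ordered map offers the same interface idea.
from sortedcontainers import SortedDict

prices = SortedDict()
prices["apple"] = 120
prices["banana"] = 45
prices["cherry"] = 300

# Lookup by key:
print(prices["banana"])

# Iterate in sorted key order without a second structure:
for name, price in prices.items():
    print(name, price)

# Range queries come along for free, too:
print(list(prices.irange("apple", "banana")))
```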

Takeaway: The best optimization often comes not from clever micro-optimizations, but from choosing the right algorithms and data structures for our specific needs. It’s rather like choosing the right tool for a job - a hammer might be a fantastic tool, but it’s not much use when you need to turn a screw!

Reduce Work By Precomputing and Caching (but not too much)

A central idea in optimization is reducing the amount of work the system needs to do. This can be achieved through techniques like precomputation and caching.

Precomputation involves performing calculations ahead of time so that they don’t need to be repeated during runtime. The classic example is computing constants, or partial results, at compile time.

Caching stores the results of expensive computations so that when the same inputs appear again, the system can reuse the result rather than recalculating it.

Both techniques minimize redundant work.
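
A small Python sketch of both ideas: a lookup table precomputed once at import time, and functools.lru_cache memoizing a function whose cost here is purely illustrative.

```python
import functools
import math

# Precomputation: build the table once, before any requests arrive,
# instead of calling math.sin on every use.
SIN_TABLE = [math.sin(math.radians(d)) for d in range(360)]

def fast_sin_degrees(d: int) -> float:
    return SIN_TABLE[d % 360]

# Caching: remember results for inputs we have already seen.
@functools.lru_cache(maxsize=1024)
def expensive_lookup(key: str) -> str:
    # Stand-in for a slow computation or I/O call.
    return key.upper() * 1000

expensive_lookup("abc")   # computed
expensive_lookup("abc")   # served from the cache
print(expensive_lookup.cache_info())
```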

One related low-level strategy is strength reduction, which replaces costly operations with cheaper ones. I include migrating computations to SIMD units or GPUs in this genre, which becomes more relevant as these capabilities become available to wider runtimes and operating environments. Today our applications consume and produce more data than ever before, and we are building longer processing pipelines, especially in ML use cases.
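
As a loose illustration of the SIMD point, the sketch below assumes NumPy is available: the same dot product written as an interpreted loop versus a single vectorized call that the library can execute with optimized (often SIMD) kernels under the hood.

```python
import timeit
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

def dot_loop(x, y):
    # Interpreted loop: one multiply-add at a time.
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total

def dot_vectorized(x, y):
    # Delegates to the library's optimized kernels.
    return float(np.dot(x, y))

print(timeit.timeit(lambda: dot_loop(a, b), number=1))
print(timeit.timeit(lambda: dot_vectorized(a, b), number=1))
```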

Another related strategy is reviewing the data structures and algorithms we use for our particular use cases such that caching and precomputation become more viable options in more scenarios.

Takeaway: Look for opportunities to precompute and cache results (when relevant) and to replace costly operations with cheaper alternatives wherever possible (assuming we have measured the speedup and the added complexity is worth the benefit).

Parallelism and Span

Parallelism is a well-known approach to improving performance. Modern hardware is designed to take advantage of multiple cores, and parallel execution can lead to massive speedups—if applied properly.

The key concept here is span, which is the length of the longest path through our computation graph. The shorter our span, the more parallelism we can achieve on capable hardware because more tasks can be executed at the same time. Reducing span is crucial for unlocking the full potential of our system’s concurrency capabilities.
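
Here is a rough sketch (Python standard library, arbitrary sizes) of shortening the span of a large aggregation: split the input into independent chunks, process them in parallel, and combine the partial results, so the critical path is one chunk plus the final combine rather than the whole serial loop.

```python
from concurrent.futures import ProcessPoolExecutor

def chunk_sum(chunk):
    # Independent partial work: no dependency on other chunks.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # The span is now one chunk plus the final combine,
    # not the full serial loop over all elements.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, chunks))

if __name__ == "__main__":
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data))
```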

However, not every algorithm parallelizes well. Some are inherently serial and can only use speculative parallelism to achieve speedup. Speculative parallelism involves executing tasks in parallel, even when we’re unsure if all will be needed, and throwing away the results of unnecessary computations later.

This approach can work well for some algorithms but may introduce overhead in others. Understanding when and where to apply parallelism effectively is key to avoiding the trap of over-parallelizing or introducing unnecessary complexity.
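
A minimal sketch of speculative parallelism, again with the standard library: launch two placeholder strategies for the same query, keep whichever finishes first, and discard the loser's work.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
import time

def strategy_a(query):
    time.sleep(0.5)             # placeholder for real work
    return f"A:{query}"

def strategy_b(query):
    time.sleep(0.1)             # placeholder for real work
    return f"B:{query}"

def speculative_answer(query):
    pool = ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(strategy_a, query), pool.submit(strategy_b, query)]
    done, not_done = wait(futures, return_when=FIRST_COMPLETED)
    for f in not_done:
        f.cancel()              # best effort: running work is simply ignored
    pool.shutdown(wait=False)   # don't block on the speculative loser
    return next(iter(done)).result()

print(speculative_answer("q"))
```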

Takeaway: Focus on reducing span to maximize parallelism. Be aware that some algorithms are serial by nature and require thoughtful approaches like speculative parallelism.

The Importance of Domain-Specific Optimizations

Compilers do a great job of optimizing general-purpose code, but they are inherently limited by their generality. This is where domain-specific knowledge can give us the edge. When we understand the problem we’re solving, we can often apply optimizations that a compiler can’t see.

For example, if we’re working in the domain of image processing, languages like Halide allow us to express optimizations specifically tailored to this type of work. Similarly, if we’re working with graph algorithms, a tool like GraphIt can help us optimize better than a general-purpose compiler ever could.

These tools allow us to leverage our domain expertise to write high-performance code that would be difficult, if not impossible, to optimize to the same level using standard techniques.

Takeaway: Use domain-specific knowledge and tools to optimize more effectively when possible. Sometimes a general-purpose compiler just won’t cut it.

General Case versus Pathological Cases

One common challenge in optimization is balancing the needs of the general case with the demands of pathological special cases.

For most software systems, optimizing for the majority of use cases leads to broad, system-wide improvements. However, there are often edge cases or rare scenarios that can have pathological runtimes—situations where performance degrades drastically due to specific inputs or conditions.

Addressing these pathological cases can sometimes lead to disproportionate complexity, introducing maintenance burdens or negatively impacting the general case. However, ignoring them can cause our system to fail in high-priority or high-stakes situations.

The trick is to balance the two. First, ensure that the general case is well-optimized—this is where most of our users will benefit. But don’t leave the special cases untouched. Find ways to isolate their impact and optimize them without over-complicating our general architecture.
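
A classic concrete instance is quicksort: fast in the general case, but a naive first-element pivot degrades to quadratic time on already-sorted input. The sketch below (iterative to avoid deep recursion, sizes arbitrary) shows how a randomized pivot contains that pathological case without complicating the general algorithm.

```python
import random
import timeit

def quicksort(xs, randomized_pivot=False):
    xs = list(xs)
    stack = [(0, len(xs) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        # Pathological choice: first element; robust choice: random element.
        p = random.randint(lo, hi) if randomized_pivot else lo
        xs[lo], xs[p] = xs[p], xs[lo]
        pivot, i = xs[lo], lo
        for j in range(lo + 1, hi + 1):   # Lomuto partition
            if xs[j] < pivot:
                i += 1
                xs[i], xs[j] = xs[j], xs[i]
        xs[lo], xs[i] = xs[i], xs[lo]
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return xs

already_sorted = list(range(2_000))
print(timeit.timeit(lambda: quicksort(already_sorted), number=1))
print(timeit.timeit(lambda: quicksort(already_sorted, randomized_pivot=True), number=1))
```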

Takeaway: Optimize the general case first, but don’t ignore special cases with pathological runtimes. Look for ways to isolate and improve them without adding unnecessary complexity.

Locality, Parallelism, Concurrency and Redundant Work

Optimization isn’t just about raw speed; it’s about making intelligent trade-offs. Three key concepts often pull in different directions: locality (keeping the data a computation needs close to where it runs, so memory access stays cheap), parallelism (splitting work into independent pieces that can execute at the same time), and redundant work (recomputing values instead of sharing or synchronizing them, trading extra computation for less communication).

The challenge lies in finding the right balance for our particular application. Improving one aspect—say, by increasing parallelism—may hurt another, such as locality. Intelligent optimization requires recognizing these trade-offs and managing them carefully.
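
As one small illustration of the locality side (assuming NumPy), summing a row-major matrix row by row touches contiguous memory, while summing it column by column strides across memory; the arithmetic is identical, only the access pattern changes.

```python
import timeit
import numpy as np

a = np.random.rand(2_000, 2_000)  # row-major (C-ordered) by default

def sum_by_rows(m):
    # Each row is contiguous in memory: cache-friendly.
    return sum(float(m[i].sum()) for i in range(m.shape[0]))

def sum_by_cols(m):
    # Each column is strided across memory: more cache misses.
    return sum(float(m[:, j].sum()) for j in range(m.shape[1]))

print(timeit.timeit(lambda: sum_by_rows(a), number=5))
print(timeit.timeit(lambda: sum_by_cols(a), number=5))
```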

Takeaway: Balance locality, parallelism, and redundant work. Each has its trade-offs, and optimizing one often comes at the expense of another.

Strategic Optimization Requires Clear Thinking

Effective software optimization is not about clever tricks or premature tweaks. It’s about developing a clear understanding of our program’s bottlenecks, making informed trade-offs, and using domain knowledge where general-purpose tools fall short.

The most important skill we can develop as an engineer is the ability to think critically about where to focus our optimization efforts. Measure first, reduce unnecessary work, leverage parallelism intelligently, and always keep trade-offs in mind.

By building strong mental models, we’ll be able to optimize more effectively and avoid the traps of complexity and over-optimization.

If you made it to the end and seek more on this topic, check out my post on optimizing software using properties.

If you enjoyed this content, please consider sharing this link with a friend, following my GitHub, Twitter/X or LinkedIn accounts, or subscribing to my RSS feed.