Is software robustness at odds with performance?
Robustness and correctness are certainly desirable properties of a piece of software. So is performance. Certain mainstream approaches to robustness, such as dynamic checks, carry a runtime cost, which may create the impression that robustness or correctness is fundamentally at odds with performance. But is that really the case?
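To make the cost of a dynamic check concrete, here is a small sketch in Rust (chosen simply because it lets me write both variants side by side; the functions are invented for illustration): slice indexing is guarded by a runtime bounds check, which the optimizer may or may not be able to eliminate, whereas the iterator version is in bounds by construction.

```rust
/// Dot product with explicit indexing: each `a[i]` and `b[i]` access is
/// guarded by a runtime bounds check, which the optimizer may or may not
/// manage to eliminate.
fn dot_indexed(a: &[f64], b: &[f64]) -> f64 {
    let n = a.len().min(b.len());
    let mut acc = 0.0;
    for i in 0..n {
        acc += a[i] * b[i];
    }
    acc
}

/// The same computation with iterators: `zip` stops at the shorter slice,
/// so every access is in bounds by construction and no check is needed.
fn dot_zipped(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];
    assert_eq!(dot_indexed(&a, &b), dot_zipped(&a, &b));
}
```

The two functions compute the same thing; the difference is only in whether safety is re-verified at runtime or guaranteed up front.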
Let me mention two reasons why I am skeptical about this commonly believed claim. The first is based on the properties of one particular programming language ecosystem that is currently gaining popularity.
Type systems
One widely-known counter-example to this impression is the Rust programming language. Rust uses a powerful type system – an invention of programming language theorists – to rule out a large class of runtime errors, thus improving the robustness of software written in the language, while also enabling clever compiler optimizations, thus improving the runtime performance of such software.
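Here is a minimal sketch of that combination (the function and names are mine, invented for illustration): Rust’s `Option` type makes the absence of a value explicit, so the null-dereference class of errors cannot occur, and thanks to the compiler’s niche layout optimization the extra safety costs nothing at runtime.

```rust
/// Look up a user's nickname. The return type makes "no nickname"
/// explicit: there is no null pointer to dereference by accident,
/// so this class of runtime error is ruled out statically.
fn nickname(id: u32) -> Option<&'static str> {
    match id {
        1 => Some("alice"),
        2 => Some("bob"),
        _ => None, // absence is a value, not a crash waiting to happen
    }
}

fn main() {
    // The type system forces callers to handle both cases;
    // forgetting the `None` arm would be a compile-time error.
    match nickname(3) {
        Some(name) => println!("nickname: {name}"),
        None => println!("no nickname on file"),
    }

    // `Option<&T>` has the same size as `&T`: the compiler reuses the
    // never-null niche of references, so the safety is free at runtime.
    assert_eq!(
        std::mem::size_of::<Option<&'static str>>(),
        std::mem::size_of::<&'static str>()
    );
}
```

The last assertion is the performance half of the story: the safer type has the same memory layout as a bare reference.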
The second reason is based on my own research.
Theorem proving
A few years ago, I found myself to be a heavy user of one piece of software - K-Framework. This software promised to ease the work of developers building programming language tooling by employing mathematical models of programming languages. Moreover, K could generate certificates for the tasks it performed, which could be independently checked to confirm that the task had been performed correctly. The work behind this feature was peer-reviewed and published as a research paper at a major conference (CAV).
I liked the ideas and concepts behind K, but I was also aware of its shortcomings - both practical and theoretical. After some time, I decided to design and implement a different tool based on the same ideas - Minuska. The main improvement of Minuska over K was supposed to be its rigorous mathematical foundations and an end-to-end, computer-verified formal proof of the correctness of Minuska’s behavior. I also wanted to write a scientific paper about the effort.
My PhD advisor, who also became the co-author of the paper, was skeptical about the practical aspects of the new tool: specifically, about its performance. What if it turned out to be much slower than K? I conjectured that the new tool would be a bit faster, but I could not know until I had finished the implementation (and all the proofs) and benchmarked Minuska.
At first, I thought something was wrong with my setup. In the initial prototypes, Minuska solved a simple problem 10x larger than the corresponding problem given to K, and did so 10x faster. Surely this was just because the problem was too easy?
So I conducted more experiments. The performance gain of Minuska over K varied, but stayed consistently between 10x and 100x, depending on what exactly I measured on K’s side (the task itself, or the construction or checking of the certificate). Moreover, the numbers reported for Minuska were obtained while Minuska was being interpreted inside a theorem prover (Coq). When compiled to native (unverified) code, the speedup was even greater.
Such a speedup was possible because theorem proving enabled me to provide stronger correctness guarantees than K does, which made the whole business of generating and checking certificates (for each individual task separately) redundant. However, it is also important to mention that I didn’t design the tool with performance in mind. I used very simple (some might even say: naive) algorithms and aimed at reducing the trusted base of the tool. My main instrument for that purpose was the mathematical clarity of the tool’s specification.
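The “establish the guarantee once, instead of checking it on every run” pattern shows up outside of theorem provers too. A hedged Rust analogy (my own illustration, not how Minuska actually works): a value’s invariant is checked a single time when the value is constructed, and every later operation relies on it without re-checking.

```rust
use std::num::NonZeroU64;

/// The non-zero invariant is checked exactly once, at construction time.
fn make_divisor(n: u64) -> Option<NonZeroU64> {
    NonZeroU64::new(n) // None if n == 0; the only place the check runs
}

/// Dividing a `u64` by a `NonZeroU64` cannot fail: the standard library
/// provides this operator precisely because no zero check is needed here.
fn divide_all(xs: &[u64], d: NonZeroU64) -> Vec<u64> {
    xs.iter().map(|&x| x / d).collect()
}

fn main() {
    let d = make_divisor(4).expect("divisor must be non-zero");
    assert_eq!(divide_all(&[4, 8, 12], d), vec![1, 2, 3]);
}
```

Proving an interpreter correct once plays the same role as constructing the `NonZeroU64` once: after that, no per-task certificate (or per-division zero check) is needed.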
Conclusion
The takeaway? Correctness and performance are not at odds with each other. Type systems and theorem proving, which belong to the general category of so-called “formal methods”, are useful tools for the development of robust and performant systems - all this despite their relatively low popularity among modern software practitioners.
By the way, the peer-reviewed paper about Minuska was published at SEFM'24.