As I transitioned from school to academia to industry, one idea has kept recurring: the verification of a solution to a problem. In this article I explore this idea.
When I was growing up, I happened to get hold of a book called The Trachtenberg Speed System of Basic Mathematics, written by Jakow Trachtenberg, who developed his system while incarcerated in a Nazi concentration camp. It contains tricks for speedy computation in special cases, like multiplying two numbers which are close to a power of ten (say 98 × 97), or multiplying a number by 11. Later on I also got to study a book on Vedic Mathematics which was similar in spirit.
While the book taught several tricks, there was a peripheral lesson that proved more useful for me. At one point the author says that "a problem is not really finished until we have proved that we have the right answer". The author's point was that once we solve a problem, we should double-check, via some quick trick, that we have not committed a calculation mistake.
The neat trick that the author described was double-checking multiplication with the divisibility-by-9 test, also known as "casting out nines". It hinges on two properties of numbers:

1. A number and the sum of its digits leave the same remainder when divided by 9.
2. Remainders modulo 9 respect arithmetic: the remainder of a product is the remainder of the product of the remainders.

So you can reduce each factor to a single digit, multiply those single digits, and compare the result against the similarly reduced claimed answer.
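Here is a minimal sketch of the check in Python (the function names are mine, not the book's):

```python
def digit_root(n: int) -> int:
    """Sum the digits of n repeatedly until one digit remains.
    The result has the same remainder modulo 9 as n itself."""
    while n > 9:
        n = sum(int(d) for d in str(n))
    return n

def check_product(a: int, b: int, claimed: int) -> bool:
    """Casting-out-nines check: if the digit roots disagree, the
    claimed product is definitely wrong; if they agree, it is only
    probably right (errors that shift the value by a multiple of 9,
    such as swapping two digits, slip through)."""
    return digit_root(digit_root(a) * digit_root(b)) == digit_root(claimed)

print(check_product(98, 97, 9506))  # True  (98 * 97 is indeed 9506)
print(check_product(98, 97, 9606))  # False (a miscalculation, caught)
```

Note that the check is one-sided: a mismatch proves an error, while a match only makes the answer likely correct. That imperfect-but-cheap flavor is exactly the theme of the rest of this article.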
I fruitfully employed this method to double-check my computations throughout my academic career.
Then, during my undergraduate studies in Computer Science, the same idea sprang up again in various contexts.
Given a large number, e.g. 1752315833709922122701709407, it is difficult to determine whether it is a prime number or a composite number. But here is a proof that it is composite: 1752315833709922122701709407 = 13472900573921 × 130062255272767. The proof is easily verifiable, since multiplication is easy. The fact that factorizing a product of two large prime numbers is a hard problem while multiplying two large prime numbers is easy is the basis of public-key cryptography.
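The asymmetry is easy to see concretely; with arbitrary-precision integers, checking the certificate is a one-line computation:

```python
p = 13472900573921
q = 130062255272767
n = 1752315833709922122701709407

# Finding p and q given only n is hard; checking them is one multiplication.
print(p * q == n)  # True
```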
Similarly, while devising an efficient algorithm to sort a list of numbers is moderately difficult, checking if a given list of numbers is sorted is straightforward.
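The checker is a single linear pass; a minimal sketch (a complete verification would also confirm that the output is a permutation of the input, which is likewise cheap):

```python
def is_sorted(xs: list[int]) -> bool:
    """Verify in one O(n) pass that xs is in non-decreasing order."""
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

print(is_sorted([1, 1, 2, 3, 5, 8]))  # True
print(is_sorted([1, 3, 2]))           # False
```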
As I advanced in my studies, this relationship between solving a problem and checking whether a given solution to it is correct kept recurring in many ways.
For instance, one way to interpret the complexity class NP is that it is the set of those yes/no problems for which, if the answer is "yes", there exists a "short" (i.e. polynomially verifiable) proof that the answer is indeed yes. For instance, consider the problem "Does a given graph G have a Hamiltonian cycle?". This problem is in NP because, if G has a Hamiltonian cycle, you can prove it by simply exhibiting the cycle. Whether the purported Hamiltonian cycle indeed visits every node of G exactly once is verifiable in polynomial time.
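Such a verifier fits in a few lines; a sketch, with the graph represented as adjacency sets (my own toy encoding):

```python
def is_hamiltonian_cycle(graph: dict[str, set[str]], cycle: list[str]) -> bool:
    """Check that `cycle` visits every node of `graph` exactly once
    and that each consecutive pair (wrapping around) is an edge."""
    if len(cycle) != len(graph) or set(cycle) != set(graph):
        return False
    return all(cycle[(i + 1) % len(cycle)] in graph[cycle[i]]
               for i in range(len(cycle)))

# The square a-b-c-d has the obvious Hamiltonian cycle:
g = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(is_hamiltonian_cycle(g, ["a", "b", "c", "d"]))  # True
print(is_hamiltonian_cycle(g, ["a", "c", "b", "d"]))  # False (a-c is not an edge)
```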
The idea of a "proof" of the correctness of a solution recurs throughout complexity theory. For instance, there is a complexity class IP, which is the set of problems solvable by an "interactive proof system". And there is a surprising result that IP = PSPACE, which connects a complexity class defined by interactive proof systems (IP) with the class of problems solvable in polynomial space (PSPACE).
Finally, there is the PCP (probabilistically checkable proofs) theorem, which states that every NP problem has a "probabilistically checkable proof". I never understood this result, but I could see that theoretical computer scientists were very delighted by it.
As I moved from academia to industry and started developing production systems, the theme of quickly checkable proofs recurred in a very different way.
In any technology company beyond a minimum scale, there are upwards of dozens or hundreds of components (hundreds of thousands, if you are Google): microservices, ETL jobs, cron jobs and so on. Developers should know whether or not their components are running well. Over time, the way I have developed to monitor these disparate systems is to have each system output some essential metrics, and then to monitor those metrics on a daily basis. Here are a few examples:

- For a microservice: the number of requests served and the error rate.
- For an ETL job: the number of rows read and written in each run.
- For a cron job: the time it last completed successfully.
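Mechanically, the emission side can be as simple as appending one timestamped record per run for a dashboard to plot. A hypothetical sketch (the component name, the metrics, and the log path are all made up for illustration):

```python
import json
from datetime import datetime, timezone

def emit_metrics(component: str, metrics: dict) -> None:
    """Append one timestamped metrics record per run; a dashboard
    then just plots these counters over time."""
    record = {"ts": datetime.now(timezone.utc).isoformat(),
              "component": component, **metrics}
    with open(f"{component}.metrics.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

# e.g. at the end of a (hypothetical) ETL run:
emit_metrics("orders_etl", {"rows_read": 120431, "rows_written": 120431, "errors": 0})
```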
I distinctly remember how I started this practice. In 2012 I inherited a real-time bidding server. At one point there was a problem in some of the bidding servers which went unnoticed for several days, only to be caught when business people noticed something was amiss. It was an embarrassing error for me, though the business impact was small. Never again, I decided, and I systematically built various dashboards which updated themselves every 15 minutes. Over the years, these dashboards gave us good visibility into the functioning of our systems.
A few more notes related to this approach are pertinent. Sometimes, when I suggest creating a dashboard, people propose setting up alarms instead. Alarms indeed have their place: they notify you quickly when something goes drastically wrong. However, creating a dashboard and observing it periodically helps you understand the system. It helps you develop an understanding around questions like: what does normal load look like? How does it vary across the day and the week? Which errors are routine and which are new?
In the dashboards that we build, we don't try to be exhaustive. We just try to ensure that the dashboard provides us a way to confirm that the service is probably working. As we observe the dashboard on a daily basis, we discover the need for more metrics, and we incorporate them step by step.
So, this theme of quickly checkable proofs, with the proof being not perfect but good in practice, runs all the way from basic arithmetic to production systems. It is perhaps more elementary than that: to check if the rice is cooked, you just check a few grains of rice! Happy system building!