Observability and controllability are terms frequently used in system validation. To validate a design, you must eliminate all of its functional bugs. But how do you remove a bug from a design? You need to know the cause and the location of the bug (or fault). This means you should be able to see what is going on inside the system while it is running an application, so that you can locate the source of the fault whenever you find an error on any of the system’s outputs. In other words, debugging demands good observability of the system’s internal states. Often, a debug engineer also wants to control a few of the system’s internal states to understand the fault better and localize it quickly. However, during post-silicon debug, the observability and controllability of the internal states are severely restricted, because we have access only to the input and output pins of the chip.
Let’s understand this with an analogy: a doctor diagnosing a patient, which we can compare with a debug engineer diagnosing a fault in a design. Let’s say the patient has a fever and visits a doctor. The doctor will first check the body temperature and blood pressure. This is equivalent to the debug engineer checking the values at the output pins. If, for example, the body temperature is very high and the blood pressure is very low, the doctor will order a few blood tests to diagnose the actual cause of the illness. Similarly, if the debug engineer finds wrong values at the output pins, they will check the internal signals relevant to the fault during the debug phase to figure out exactly where the fault occurred. The blood test reports give the doctor observability of the patient’s internal condition. Similarly, the debug process generates run-time traces that give the debug engineer observability of the system’s internal states. Doctors often ask for controlled tests, for example a fasting blood sugar test, to diagnose the illness correctly and quickly. Similarly, debug engineers sometimes check the values of some internal signals while applying controlled values to other internal signals of the system, to better understand the fault. We can understand this from a very simple example, as illustrated below.
In the above example circuit, A, B, C, and D are the primary input pins, and K is the primary output pin. Let’s say the expected output is K=1. However, due to a fault in the design, the actual output is K=0. To locate the fault, we need to back trace from the output until we reach the point where the fault actually exists. Back tracing means observing the values of the internal signals e, f, g, h, i, j, and so on. However, it is not always possible to trace all the internal signals because of several design constraints, which we will discuss in a separate article.
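The back trace described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signal names come from the example, but the gate types and connections in `NETLIST`, the stuck-at-0 fault on h, and the helpers `simulate` and `back_trace` are all hypothetical. It also assumes that every internal signal can be observed, which is rarely true on real silicon.

```python
# Hypothetical netlist: signal -> (gate type, input signals).
# The gate types and wiring are assumptions made for illustration.
NETLIST = {
    "e": ("AND", ("A", "B")),
    "f": ("OR", ("A", "B")),
    "i": ("AND", ("e", "f")),
    "g": ("AND", ("C", "D")),
    "h": ("OR", ("C", "D")),
    "j": ("AND", ("g", "h")),
    "K": ("AND", ("i", "j")),
}
GATES = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b}


def simulate(inputs, stuck_at=None):
    """Evaluate every signal; stuck_at maps a signal name to a fixed faulty value."""
    stuck_at = stuck_at or {}
    values = dict(inputs)
    # NETLIST is written in topological order, so inputs are ready when needed.
    for name, (gate, (x, y)) in NETLIST.items():
        values[name] = stuck_at.get(name, GATES[gate](values[x], values[y]))
    return values


def back_trace(observed, expected, signal="K"):
    """Walk from a wrong signal toward the inputs and return the earliest wrong one."""
    while signal in NETLIST:
        wrong = [s for s in NETLIST[signal][1] if observed[s] != expected[s]]
        if not wrong:
            return signal  # inputs are correct but the output is wrong: fault is here
        signal = wrong[0]
    return signal


inputs = {"A": 1, "B": 1, "C": 1, "D": 1}
expected = simulate(inputs)                      # fault-free reference values
observed = simulate(inputs, stuck_at={"h": 0})   # assumed fault: h stuck at 0
print(observed["K"], back_trace(observed, expected))  # prints: 0 h
```

In this sketch the trace goes K, then j, then h, and never visits the top half of the circuit, because i matches its expected value.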
To understand exactly where the fault is coming from, we need a certain level of controllability over the internals of the circuit. In this case, K=0 means the last AND gate is getting a permanent 0 from either i or j. Let’s assume we cannot observe i and j. If we can somehow find out which of the two signals is stuck at 0 (or is getting a continuous 0 from its upstream circuitry), the fault search space is cut in half, which effectively reduces the debug time. We then need to back trace only the top half or only the bottom half of the circuit, depending on whether i or j is 0. For this, we need controllability over i and j, so that we can force one of them to 1 and figure out whether the other is 0. For example, if j is 0, then even after forcing i to 1, the final AND gate still produces 0. However, if i is the signal at 0, then after forcing i to 1 the output K depends only on the value of j, so K becomes 1 when j is 1. Such controllability can help us find the fault location quickly and correctly.
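The forcing experiment can be tried out in a small simulation. The sketch below is a minimal illustration, not the actual circuit: the gates feeding i and j are assumed, the `stuck_at` argument models a single hypothetical stuck-at fault, and `force` models the debug engineer overriding an internal signal. Only the output K is read, which mirrors the case where i and j cannot be observed.

```python
def output_k(a, b, c, d, stuck_at=None, force=None):
    """Return K for an assumed circuit in which K = i AND j."""
    stuck_at = stuck_at or {}
    force = force or {}

    def sig(name, value):
        # A debug force overrides everything; a stuck-at fault overrides the driven value.
        return force.get(name, stuck_at.get(name, value))

    # Top half (gate types are assumptions for illustration).
    e = sig("e", a & b)
    f = sig("f", a | b)
    i = sig("i", e & f)
    # Bottom half (gate types are assumptions for illustration).
    g = sig("g", c & d)
    h = sig("h", c | d)
    j = sig("j", g & h)
    return i & j


def localize(stuck_at):
    """Force i to 1, read only K, and pick the half of the circuit to back trace.

    Assumes a single fault and that K was already seen to be 0 without forcing.
    """
    k_forced = output_k(1, 1, 1, 1, stuck_at=stuck_at, force={"i": 1})
    return "bottom half (j is 0)" if k_forced == 0 else "top half (i was 0)"


print(localize({"h": 0}))  # fault below j: K stays 0 even with i forced to 1
print(localize({"e": 0}))  # fault above i: K becomes 1 once i is forced to 1
```

One forced run is enough here because of the single-fault assumption. If both halves could be faulty at once, K would stay 0 with i forced to 1, and a second run forcing j to 1 would be needed to check the top half.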