The Fiction of the Root Cause: Why Post-Mortems Are Often Rituals

When failure is a hurricane, we insist on blaming a single drop of rain.

The Corporate Seance

The fluorescent light above the conference table is humming at a frequency that feels like it’s trying to drill into my premotor cortex, and I’m oscillating between professional dread and personal humiliation. I just sent my lead developer a text meant for my therapist, one expressing my deep, existential resentment toward ‘process for the sake of process.’ He hasn’t replied yet, but the 37 seconds of silence have already aged me 7 years. I am sitting in a room with 7 other people, and we are here to perform the corporate equivalent of a seance: the Root Cause Analysis (RCA). We are looking for the ‘ghost’ that killed the production server last Friday at 4:37 PM, but I already know what the outcome will be. We aren’t here to find a truth; we are here to find a sacrifice.

The Anatomy of Simplification

Root: Dave missed step 97 on a 137-item checklist.

Systemic Truth: Culture forced deployment despite low capacity & VP mandate.

At the end of the chain of causality, there is a box containing a name. That name is Dave. The narrative being woven is that Dave, a senior DevOps engineer with 17 years of experience, missed a single step in a 137-item deployment checklist. Therefore, Dave is the root cause. We have a pathological need for simple narratives. Our brains are wired to find a ‘who’ rather than a ‘why.’ If we admit that the failure was the result of complex, non-linear interactions between 77 different variables (some technical, some social, some purely accidental), then we have to admit that we aren’t in control. And in a corporate environment, admitting lack of control is the ultimate sin.

The Ligature and the White Space

“If the ‘f’ and the ‘i’ in a specific font look cramped, it’s usually because the 147 characters surrounding them have established a rhythm that the ‘f’ and ‘i’ can’t sustain. You don’t fix the ligature by just shaving off a pixel; you have to look at the entire weight of the typeface.”

– Zephyr R.-M. (Typeface Designer)

Zephyr spends 7 hours a day looking at the gaps between things. He understands that the ‘white space’ (the things that aren’t there) is often more important than the things that are. Corporate RCA ignores the white space. It ignores the missing documentation, the unspoken pressures, the fatigue that doesn’t show up on a Jira ticket, and the subtle erosion of safety margins that happens over 770 days of ‘doing more with less.’

The Cost of Agreeing to the Lie

I’m looking at the Director of Engineering now. He’s leaning forward, his eyes narrowed. He’s doing the ‘accountability’ face. ‘So, the root cause was the manual error on step 97?’ he asks, knowing the answer will simplify his report to the board. Everyone nods. It’s a collective hallucination. We are all agreeing to ignore the 37 other contributing factors because addressing them would require a fundamental restructuring of how we work. It would require admitting that our system is fragile by design. If we blame Dave, we only have to change Dave. If we blame the system, we have to change ourselves.

Systemic Accountability vs. Blame Capital

Blame: Political Capital, Free Cost

vs.

System Change: Intrinsic Quality, High Cost

Building Resilience, Not Rituals

This is why I find the approach of certain organizations so refreshing. Instead of waiting for the house to burn down and then blaming the person who left the toaster plugged in, they focus on the intrinsic quality of the materials from the very beginning. Companies like Benzo labs understand that you can’t inspect quality into a product at the end of the line, and you certainly can’t ‘blame’ quality into a system after a failure.

Systemic Quality Vectors (Simulated Data)

Inspection Phase: 42% Coverage

Intrinsic Guarantee: 87% Built-in

The Five Whys We Won’t Write Down

If we were honest, the ‘Five Whys’ for last Friday’s crash would look something like this:

  1. Why 1: Database connection pool exhausted.

  2. Why 2: New microservice made 177 redundant calls/sec.

  3. Why 3: Developer rushed to meet arbitrary Friday 7 PM deadline.

  4. Why 4: PM felt pressure to justify 27% spend increase to stakeholders.

  5. Why 5: Our culture values the appearance of speed over the reality of stability.

But we won’t write that down. It’s too messy. It points the finger at the people in the room, not the person who isn’t.

The Human Algorithm

Dave is currently at his desk, probably feeling a mix of shame and confusion, unaware that his name is being written into the ledger of corporate sins. I feel for him. Not just because I’ve been Dave, but because I’m currently ‘The Person Who Sent An Insulting Text To His Lead Developer.’ I am currently a ‘root cause’ in someone else’s narrative.

We treat software and organizations like machines that can be perfected, but they are more like gardens that need to be tended. If the tomatoes aren’t growing, you don’t put the dirt on a performance improvement plan. You check the pH, the sunlight, the water, and the surrounding flora. You look at the system.

Blame is free. In fact, blame is more than free; it’s a form of political capital. By identifying a root cause, the Director can signal to his superiors that he is ‘on top of things.’

The Cycle Continues (and the Coffee Purchase)

New Item on Checklist: 138

As the meeting winds down, the Director asks if there are any final questions. I want to say something about the 77% increase in technical debt we’ve accumulated this year. I want to mention that Zephyr’s typeface design philosophy could save us more money than any RCA ever will. Instead, I just nod. I watch as the ‘Five Whys’ diagram is saved to a PDF and emailed to the executive suite…

We walk out of the room, and I catch Dave in the hallway. I tell him it’s not his fault. He smiles, a small, 7-watt flicker of relief. We both know that tomorrow, we’ll be back in the same system, waiting for the next ‘root cause’ to reveal itself. My lead developer sent me a text back: ‘Don’t worry. I’ve sent worse to my mom by accident. Also, the checklist is now 138 items. See you at the 4 PM stand-up.’

[The root cause is a ghost we conjure to avoid looking in the mirror]

We are all just parts of a machine that is designed to fail and then find a way to pretend it didn’t. Maybe the real root cause is just our refusal to be human in a space that demands we be algorithms. Or maybe, it’s just that we’ve forgotten how to look at the white space between the letters. Either way, I’m going to go buy Dave a coffee. It won’t fix the server, but it might fix the afternoon.
