Architecture

Stop Blaming the Tech

Feb 12, 2025

“The first principle is that you must not fool yourself—and you are the easiest person to fool.” —Richard Feynman

A while back, I sat in another migration post-mortem. Three sprints of careful work rolled back in less than an hour. The team had done everything “right”—picked the leading cloud platform, used infrastructure as code, and followed the migration playbook. Our architects had reviewed everything. Yet our carefully planned customer authentication service migration had failed spectacularly, with latency spikes and timeout cascades taking down critical user flows.

The technical discussion was thorough—connection pooling, caching strategies, network topology—but something felt off. In the room that day, no one mentioned the obvious: we had entirely missed the real-time requirements and mindlessly assumed our monitoring would catch any problems.

Having been both the engineer defending decisions and the architect reviewing failures, I’ve noticed something: technical initiatives fail all the time, but rarely because of the technology itself. We keep asking, “Which tools should we use?” when we should be asking, “How are we thinking about this problem?”

A Framework for Better Decisions

After enough painful migrations, I started looking for a better way to evaluate technical decisions. The result was this decision matrix—not because I love frameworks, but because I needed a way to force discussions about assumptions we were making:

| Factor               | Weight (%) | Score (1–10) | Weighted |
|----------------------|------------|--------------|----------|
| Tech Debt            | 20%        | 6            | 1.20     |
| Scalability          | 25%        | 8            | 2.00     |
| Team Capability      | 15%        | 7            | 1.05     |
| Business Impact      | 25%        | 8            | 2.00     |
| Operational Overhead | 15%        | 4            | 0.60     |
| Final Score          | 100%       |              | 6.85     |
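If it helps to see the arithmetic, here is a minimal sketch of how the weighted score in the table falls out. The factor names, weights, and scores are just the example values above, not a recommendation; swap in whatever your team actually argues about.

```python
# Minimal sketch of the decision matrix arithmetic (illustrative values only).
factors = {
    "Tech Debt":            {"weight": 0.20, "score": 6},
    "Scalability":          {"weight": 0.25, "score": 8},
    "Team Capability":      {"weight": 0.15, "score": 7},
    "Business Impact":      {"weight": 0.25, "score": 8},
    "Operational Overhead": {"weight": 0.15, "score": 4},
}

# Weights must total 100%, or the "final score" means nothing.
assert abs(sum(f["weight"] for f in factors.values()) - 1.0) < 1e-9

for name, f in factors.items():
    print(f"{name:<22} {f['weight']:>5.0%} {f['score']:>3} {f['weight'] * f['score']:>6.2f}")

total = sum(f["weight"] * f["score"] for f in factors.values())
print(f"{'Final Score':<22} {'100%':>5} {'':>3} {total:>6.2f}")  # 6.85 for this example
```

The point is not the final number; it is that choosing the weights forces everyone to say out loud what they think matters before anyone argues about scores.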

I first used this when my team pushed hard for a microservices migration. On paper, we looked ready: strong engineers, a modern tech stack, and a clear business case. But the matrix discussion revealed our blind spot: while we scored our technical capabilities high (7.8/10), our operational readiness was much lower (4.7/10). We had great developers but limited experience with distributed systems monitoring and incident response.

This realization changed our approach entirely. Instead of diving into the migration, we spent the next two months building our observability infrastructure and incident response processes. The matrix didn’t make our decision, but it probably saved us from another post-mortem.

Where Technical and Organizational Problems Meet

The patterns I saw in our post-mortems began to show up when I looked at larger industry failures. Take Equifax’s 2017 breach—the story everyone knows is about a missed patch. But dig into their internal audits, and you’ll find something more interesting: their security teams had flagged the vulnerability. However, the organization’s priorities conflicted, and its escalation paths were unclear, so those warnings went nowhere. 

The Boeing 737 MAX failure offers another stark example. Early reporting focused on the MCAS system’s technical flaws. However, the deeper story revealed engineers who saw the problems coming yet couldn’t effectively raise their concerns through layers of program management focused on delivery dates. In both cases, the technical failures were, in fact, organizational failures in disguise.

These aren’t just cautionary tales—they’re patterns I started seeing in our work at different scales. Watching for these patterns has helped our team spot problems earlier:

Architecture Warning Signs

When API latencies keep climbing or endpoint complexity keeps growing, you’re often looking at Conway’s Law in action: your architecture is reflecting the communication problems in your organization. I learned this while watching a “microservices transformation” turn into a distributed monolith because team boundaries didn’t match service boundaries.

Data Strategy Red Flags

The classic sign here is teams arguing about conflicting metrics in business reviews. It usually means you’ve got data silos and unclear ownership. I watched one project spiral into three months of data reconciliation because nobody had defined who owned the “source of truth” for customer status.

Platform Evolution Alerts

Watch your deployment times and incident response patterns. When they start trending up, look for gaps between your platform team’s assumptions and your developers’ real needs. I’ve seen platform teams build sophisticated CI/CD pipelines that teams couldn’t use because they didn’t match actual development workflows.
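If you want something more concrete than a gut feel for “trending up,” a rough check like the sketch below can flag when recent deployments run noticeably longer than your baseline. The window size and threshold here are arbitrary assumptions for illustration, not values any particular team used.

```python
from statistics import median

def deployments_trending_up(durations_minutes, window=10, threshold=1.25):
    """Flag when the median of the last `window` deployments exceeds the
    earlier baseline median by `threshold`. Both knobs are illustrative."""
    if len(durations_minutes) < 2 * window:
        return False  # not enough history to compare
    baseline = median(durations_minutes[:-window])
    recent = median(durations_minutes[-window:])
    return recent > baseline * threshold

# Example: deploy times crept from roughly 12 minutes to roughly 20 minutes.
history = [12, 11, 13, 12, 14, 12, 13, 15, 14, 13,
           16, 17, 18, 19, 20, 21, 19, 22, 20, 21]
print(deployments_trending_up(history))  # True
```

The alert itself is cheap; the useful part is the conversation about why the platform and the teams using it have drifted apart.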

Making This Real

This isn’t just theory. Start with your next technical decision. Use the matrix, but focus on the conversations it sparks. When team members score things differently, that’s not a scoring error—it’s where your risks and assumptions are hiding.
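One way to make those disagreements visible is to collect everyone’s raw scores before averaging them and look at the spread per factor. The names and numbers below are hypothetical, purely to show the idea:

```python
from statistics import pstdev

# Hypothetical per-person scores (1-10) for a few matrix factors.
# A wide spread is a prompt for discussion, not noise to average away.
scores = {
    "Scalability":          [8, 8, 7, 9],
    "Operational Overhead": [7, 3, 8, 2],
    "Team Capability":      [7, 6, 7, 8],
}

for factor, s in sorted(scores.items(), key=lambda kv: pstdev(kv[1]), reverse=True):
    print(f"{factor:<22} spread={pstdev(s):.1f}  scores={s}")
# "Operational Overhead" surfaces first: that's where assumptions differ most.
```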

Pick one area to watch—your architecture, data, or platform. Look for warning signs, but more importantly, look for what they reveal about how your organization works. Rising API latency might be a technical metric, but it often exposes where teams aren’t communicating well.

Remember: Your systems will do exactly what you tell them to do. The challenge isn’t picking better tools—it’s thinking more clearly about what we’re trying to build and why. The best technical solutions come from better questions, not just different answers.

Let’s talk about your platform challenge.

If your organization is navigating scale under regulatory complexity—or making the shift from reactive delivery to resilient platform engineering—I’d welcome the conversation.
