Architecture

When Fraud Finds Your Platform

Jun 1, 2026


Three users in one morning. Transactions they hadn’t made. Money gone. By afternoon, a dozen. On a platform processing real financial transactions, this wasn’t a support queue problem but a structural one. By the time you name it fraud, the clock is already running.

The platform wasn’t built with this threat model in mind. When the transaction surface expanded, we attracted bad actors who understood it better than we did. Account takeover fraud (compromised credentials, VPN-masked access, funds moved to temporary accounts before anyone noticed) is a well-worn playbook, but we had never had to defend against it. We had the logs and incident reports but lacked a system that could evaluate signals fast enough to act before the damage landed.

So we made the call that’s never clean: stop shipping features and fix this. There’s always a roadmap, commitments, and a product org with quarterly priorities and stakeholders who aren’t watching the same fraud queue. But the math was simple once we said it out loud: every release shipped while the fraud vector was open made the problem bigger. Velocity was the accelerant. We paused.

Build vs. Buy Under Fire

Account takeover fraud was a different discipline—we had no in-house fraud detection expertise. We deployed hotfixes while the system still bled. Engineers pulled from roadmap work were manually investigating incidents, buying time while we found a real solution.

Building from scratch wasn’t the answer. A fraud model requires a calibration cycle measured in quarters and a large enough historical dataset to train on. You can’t label fraud you haven’t instrumented. We needed something production-ready in weeks.

We needed a service that we could instrument quickly, that covered the signal types relevant to account takeover, and that could operate inside a regulated environment on day one. The first vendor we evaluated demonstrated all three. Under other circumstances, we might have run a longer comparison.

What mattered about its design was that it separated signal from verdict. It evaluated risk across four signal types: IP address, device fingerprint, email, and phone number. Each returned a score plus a set of attributes such as connection type, bot indicators, and proxy flags. The vendor’s models handled the pattern recognition. Our job was to decide what those signals meant for our specific users.
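To make the signal/verdict split concrete, here is a minimal sketch of how one signal’s response might be normalized into a typed result. The vendor is not named, so the payload keys (`fraud_score`, `connection_type`, `bot`, `proxy`) and the 0-to-1 score scale are hypothetical assumptions. What matters is the shape: the result carries a score and attributes, and nothing in it is a decision.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class SignalType(Enum):
    IP = "ip"
    DEVICE = "device"
    EMAIL = "email"
    PHONE = "phone"


@dataclass(frozen=True)
class SignalResult:
    """One signal's assessment: a score plus attributes, deliberately no verdict."""
    signal_type: SignalType
    score: float  # assumed fraud probability in [0.0, 1.0]
    connection_type: str | None = None  # e.g. "residential", "datacenter"
    is_bot: bool = False
    is_proxy: bool = False


def parse_signal(signal_type: SignalType, payload: dict) -> SignalResult:
    """Normalize a raw (hypothetical) vendor payload into a SignalResult."""
    score = float(payload["fraud_score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return SignalResult(
        signal_type=signal_type,
        score=score,
        connection_type=payload.get("connection_type"),
        is_bot=bool(payload.get("bot", False)),
        is_proxy=bool(payload.get("proxy", False)),
    )


result = parse_signal(
    SignalType.IP,
    {"fraud_score": 0.72, "connection_type": "datacenter", "proxy": True},
)
print(result.score, result.connection_type, result.is_proxy)  # 0.72 datacenter True
```

Keeping the parsed result free of any verdict field means every interpretation of the score has to live in the rule layer, where it can be tuned.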

A fraud score is an input, not a decision. A datacenter IP with an elevated score might be a legitimate enterprise user on a corporate VPN. A residential IP with a proxy flag might be a privacy-conscious user who’s never committed fraud. On a financial platform, a false positive is a trust event—sometimes more damaging than the fraud itself.

Building the Judgment Layer

We didn’t wire scores to decisions. We built a rule evaluation layer on top—one handler per signal type, each consuming the service’s response, applying a configurable threshold, and returning a binary determination.
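A minimal sketch of such a layer, assuming a simplified response shape and a plain dict standing in for the configuration source. `ThresholdConfig`, `RuleLayer`, and the signal keys are hypothetical names for illustration. Each handler is registered independently and returns only a boolean, and the threshold is looked up on every evaluation rather than captured when the handler is created.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SignalResult:
    signal_type: str  # "ip", "device", "email", or "phone"
    score: float      # assumed fraud probability in [0.0, 1.0]


class ThresholdConfig:
    """Hypothetical runtime config source; a dict stands in for a real config service."""

    def __init__(self, values: dict[str, float]) -> None:
        self._values = values

    def threshold(self, signal_type: str) -> float:
        return self._values[signal_type]

    def update(self, signal_type: str, value: float) -> None:
        self._values[signal_type] = value


# A handler turns one signal's response into a binary determination.
Handler = Callable[[SignalResult, ThresholdConfig], bool]


def score_exceeds_threshold(result: SignalResult, config: ThresholdConfig) -> bool:
    # The threshold is read at evaluation time, so a config change applies on the next call.
    return result.score > config.threshold(result.signal_type)


@dataclass
class RuleLayer:
    config: ThresholdConfig
    handlers: dict[str, Handler] = field(default_factory=dict)

    def register(self, signal_type: str, handler: Handler) -> None:
        self.handlers[signal_type] = handler

    def evaluate(self, result: SignalResult) -> bool:
        handler = self.handlers.get(result.signal_type)
        # An unregistered signal type yields no determination: treat as not flagged.
        return handler(result, self.config) if handler else False


config = ThresholdConfig({"email": 0.8, "phone": 0.8, "device": 0.85})
layer = RuleLayer(config)
for signal in ("email", "phone", "device"):
    layer.register(signal, score_exceeds_threshold)

print(layer.evaluate(SignalResult("email", 0.82)))  # True
config.update("email", 0.9)  # recalibrate without redeploying
print(layer.evaluate(SignalResult("email", 0.82)))  # False
```

Because handlers are keyed by signal type, adding, replacing, or omitting one does not touch the others.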

The thresholds weren’t hardcoded. They were pulled from configuration at runtime, allowing us to adjust what “fraud” meant for our platform without deploying new code. We knew the thresholds would need tuning as real traffic patterns emerged and wanted to avoid a deployment cycle for calibration changes. The device fingerprint check showed how the layered system operated in practice.

First gate: if the device fingerprint in the request didn't match the one on record, we triggered step-up verification. On mobile, reinstalls and upgrades produce legitimate mismatches too frequently to justify a hard block.

Second gate: if fraud probability exceeded the configured threshold, we blocked—email and phone signals provided sufficient corroboration.

Third gate: if the score was elevated but below the threshold, we evaluated connection type alongside bot status. A datacenter connection with an active bot flag read differently than a residential connection using a proxy.

Same score, different context, different outcome.
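The three device-fingerprint gates above can be sketched as a single function, under some stated assumptions. The gates are checked in the order described and the first one that fires wins. Thresholds are read from a config mapping passed in on each call (the keys `device_block_threshold` and `device_elevated_floor` are hypothetical). The outcome of the third gate is not specified in the text, so stepping up a datacenter connection with a bot flag and allowing a residential proxy are illustrative choices only.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    STEP_UP = "step_up"
    BLOCK = "block"


@dataclass(frozen=True)
class DeviceSignal:
    fingerprint: str
    score: float          # assumed fraud probability in [0.0, 1.0]
    connection_type: str  # e.g. "residential", "datacenter"
    is_bot: bool
    is_proxy: bool


def evaluate_device(
    signal: DeviceSignal,
    fingerprint_on_record: str | None,
    config: dict[str, float],
) -> Action:
    """Apply the device gates in order; the first gate that fires decides."""
    # Gate 1: a mismatch triggers step-up verification, not a block, because
    # reinstalls and upgrades cause legitimate mismatches on mobile.
    # With no fingerprint on record, this gate is skipped (an assumption).
    if fingerprint_on_record is not None and signal.fingerprint != fingerprint_on_record:
        return Action.STEP_UP

    # Gate 2: above the configured threshold, block.
    if signal.score > config["device_block_threshold"]:
        return Action.BLOCK

    # Gate 3: elevated but below the threshold, so context decides.
    if signal.score >= config["device_elevated_floor"]:
        if signal.connection_type == "datacenter" and signal.is_bot:
            return Action.STEP_UP  # hypothetical outcome for the riskier combination
        # A residential connection behind a proxy falls through to ALLOW (hypothetical).
    return Action.ALLOW


config = {"device_block_threshold": 0.85, "device_elevated_floor": 0.6}
on_record = "fp-abc"
same_score = 0.7

bot_in_datacenter = DeviceSignal("fp-abc", same_score, "datacenter", is_bot=True, is_proxy=False)
proxy_at_home = DeviceSignal("fp-abc", same_score, "residential", is_bot=False, is_proxy=True)

print(evaluate_device(bot_in_datacenter, on_record, config).name)  # STEP_UP
print(evaluate_device(proxy_at_home, on_record, config).name)      # ALLOW
```

Both requests carry the same score of 0.7; only the connection attributes separate them.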

The Signal We Kept Dark

We deferred IP address evaluation entirely in the first release.

The account-takeover pattern made IP the most tempting signal and the most dangerous to miscalibrate. Bad actors masked behind VPNs, but so did legitimate users. Blocking on IP characteristics alone would have caught both. Everyone who looked at the false-positive rate reached the same conclusion: IP addresses are the noisiest of the four signal types, and we hadn’t characterized our legitimate user population well enough to set a trustworthy threshold. We had the integration ready but chose not to enable it.

Because each signal type was independently scoped, deferring IP didn’t require touching anything else. Email, phone, and device fingerprint went live. IP went dark but stayed instrumented—we could watch the signal without acting on it, which meant we were building the dataset we’d need to calibrate it later. When we eventually brought it online, it slotted in. Nothing else changed.
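One way to keep a signal instrumented but dark is a per-rule enforcement flag, often called shadow mode. The sketch below is a minimal illustration of that idea. `SignalRule`, `RuleEngine`, the `enforced` flag, and the would-flag rate are hypothetical names, and the thresholds are made up. The deferred rule is still evaluated on live traffic and its outcomes are recorded, but it never changes the decision. On traffic believed to be legitimate, the would-flag rate roughly approximates the false-positive rate the rule would have produced if enforced.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("fraud.rules")


@dataclass
class SignalRule:
    threshold: float
    enforced: bool = True  # False = shadow mode: evaluate and record, never act


@dataclass
class ShadowRecorder:
    evaluated: int = 0
    would_flag: int = 0

    def record(self, flagged: bool) -> None:
        self.evaluated += 1
        self.would_flag += int(flagged)

    @property
    def would_flag_rate(self) -> float:
        return self.would_flag / self.evaluated if self.evaluated else 0.0


@dataclass
class RuleEngine:
    rules: dict[str, SignalRule]
    shadow: dict[str, ShadowRecorder] = field(default_factory=dict)

    def evaluate(self, scores: dict[str, float]) -> bool:
        """Return True if any enforced rule flags the request."""
        flagged = False
        for name, score in scores.items():
            rule = self.rules.get(name)
            if rule is None:
                continue
            hit = score > rule.threshold
            if rule.enforced:
                flagged = flagged or hit
            else:
                # Shadow mode: keep the signal instrumented without acting on it.
                self.shadow.setdefault(name, ShadowRecorder()).record(hit)
                if hit:
                    log.info("shadow rule %s would have flagged score %.2f", name, score)
        return flagged


engine = RuleEngine({
    "email": SignalRule(0.8),
    "phone": SignalRule(0.8),
    "device": SignalRule(0.85),
    "ip": SignalRule(0.5, enforced=False),  # deferred: evaluated and recorded, never enforced
})

print(engine.evaluate({"email": 0.2, "phone": 0.1, "device": 0.3, "ip": 0.9}))  # False
print(engine.evaluate({"email": 0.9, "phone": 0.1, "device": 0.3, "ip": 0.2}))  # True
print(engine.shadow["ip"].would_flag_rate)  # 0.5
```

Bringing the signal online later is a change to one rule’s `enforced` flag and threshold; the other rules and the evaluation loop stay as they are.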

The Lever We Didn’t Pull

Within weeks of shipping, successful fraud all but stopped. The clusters of support tickets disappeared. Unreconciled transaction reports dropped to near zero. The on-call rotation was no longer pulled into fraud incidents at scale. Engineers returned to the roadmap.

The IP deferral seemed like a half-measure at the time. In retrospect, it was the most disciplined call in the project. Shipping an uncalibrated block rule would’ve caused a different kind of harm: legitimate transactions declined, accounts locked, and support volume climbed for the wrong reason. A lever you don’t understand is not yours to pull.

This is half the conversation

The other half happens in my network⁠—engineering, product, and operations leaders working the same problems out loud. Agree, disagree, or somewhere in between, that’s where the second draft happens.

Connect on LinkedIn