Architecture
When Fraud Finds Your Platform

Three users in one morning. Transactions they hadn’t made. Money gone. By afternoon, a dozen. On a platform processing real financial transactions, this wasn’t a support queue problem but a structural one. The clock starts once you name it.
The platform wasn’t built with this threat model in mind. When the transaction surface expanded, we attracted bad actors who understood it better than we did. Account takeover fraud (compromised credentials, VPN-masked access, funds moved to temporary accounts before anyone noticed) is a well-worn playbook, but we had never had to defend against it. We had the logs and incident reports but lacked a system to evaluate signals fast enough to act before the damage landed.
So we made the call that’s never clean: stop shipping features and fix this. There’s always a roadmap, commitments, and a product org with quarterly priorities and stakeholders who aren’t watching the same fraud queue. But the math was simple once we said it out loud: every release shipped while the fraud vector was open made the problem bigger. Velocity was the accelerant. We paused.
Build vs. Buy Under Fire
We were in over our heads on detection. The team was strong, but account takeover fraud was a different discipline and we had no in-house expertise to build and calibrate a detection model under production pressure. We deployed hotfixes while the system still leaked. That’s what “in over our heads” looked like in practice—not chaos but engineers pulled from roadmap work to manually investigate incidents while we found a real solution.
Building from scratch wasn’t the answer. A fraud model requires a calibration cycle measured in quarters and a large enough historical dataset to train on. You can’t label fraud you haven’t instrumented. We needed something production-ready in weeks.
The evaluation wasn’t rigorous, and it couldn’t be. We needed a platform we could instrument fast, that covered the signal types relevant to account takeover, and that could operate inside a regulated environment on day one. The first vendor we evaluated demonstrated all three. Under other circumstances, we might have run a longer comparison, but the timeline left no room for one.
What mattered about the platform’s design was that it separated signal from verdict. It evaluated risk across four signal types: IP address, device fingerprint, email, and phone number. Each evaluation returned a score plus a set of attributes such as connection type, bot indicators, and proxy flags. The vendor’s models handled the pattern recognition. Our job was deciding what those signals meant for our specific users.
That distinction matters more than it sounds. A fraud score is an input, not a decision. A datacenter IP with an elevated score might be a legitimate enterprise user on a corporate VPN. A residential IP with a proxy flag might be a privacy-conscious user who has never committed fraud. On a financial platform, a false positive is a trust event—sometimes more damaging than the fraud you were trying to stop.
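To make the signal-versus-verdict split concrete, here is a minimal sketch of what one evaluation result might look like. The class names, field names, and the 0-100 score scale are assumptions for illustration, not the vendor's actual response format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class SignalType(Enum):
    IP = "ip"
    DEVICE_FINGERPRINT = "device_fingerprint"
    EMAIL = "email"
    PHONE = "phone"


@dataclass(frozen=True)
class SignalResult:
    """One vendor evaluation: a score and attributes, deliberately no verdict."""
    signal_type: SignalType
    score: float  # assumed 0-100 scale; the real scale is vendor-specific
    attributes: Dict[str, Any] = field(default_factory=dict)


# Two hypothetical results with the same score but different context.
corporate_vpn = SignalResult(
    SignalType.IP, 78.0,
    {"connection_type": "datacenter", "is_bot": False, "is_proxy": False},
)
privacy_user = SignalResult(
    SignalType.IP, 78.0,
    {"connection_type": "residential", "is_bot": False, "is_proxy": True},
)
```

Neither object carries an allow or block field. Any decision has to come from a layer that reads both the score and the attributes.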
Building the Judgment Layer
We didn’t wire scores directly to access decisions. We built a rule evaluation layer on top—one handler per signal type, each consuming the platform’s response, applying a configurable threshold, and returning a yes-or-no determination.
The thresholds weren’t hardcoded. They were pulled from configuration at runtime, allowing us to adjust what “fraud” meant for our platform without deploying new code. We knew the thresholds would need tuning as real traffic patterns emerged and wanted to avoid a deployment cycle for calibration changes.
The device fingerprint check showed how the layering worked in practice. First gate: if the device ID in the request didn’t match the device ID the platform returned, that was fraud, full stop. Second gate: if the fraud probability exceeded our configured threshold, block. Third gate: if the score was elevated but below the threshold, we looked at the connection type and bot status together. A datacenter connection with an active bot flag read differently than a residential connection with a proxy. Same score, different context, different result.
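The three gates can be sketched as a single function. The field names, the 0.0-1.0 probability scale, and the `elevated_floor` parameter are assumptions. So is the third gate's outcome: the sketch treats a datacenter connection with a bot flag as fraud and a residential proxy as acceptable, which is one plausible reading of the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceResponse:
    """Hypothetical shape of the vendor's device fingerprint response."""
    device_id: str
    fraud_probability: float  # assumed 0.0-1.0
    connection_type: str      # e.g. "datacenter" or "residential"
    is_bot: bool
    is_proxy: bool


def is_device_fraud(
    request_device_id: str,
    resp: DeviceResponse,
    block_threshold: float,
    elevated_floor: float,
) -> bool:
    """Return True if the request should be treated as fraudulent."""
    # Gate 1: a device ID mismatch is fraud regardless of score.
    if request_device_id != resp.device_id:
        return True
    # Gate 2: probability above the configured threshold is a block.
    if resp.fraud_probability > block_threshold:
        return True
    # Gate 3: elevated but below threshold, so context decides.
    if resp.fraud_probability >= elevated_floor:
        return resp.connection_type == "datacenter" and resp.is_bot
    return False
```

With `block_threshold=0.9` and `elevated_floor=0.6`, two responses that both score 0.7 diverge at the third gate: the datacenter bot is flagged and the residential proxy passes.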
The Signal We Kept Dark
We deferred IP address evaluation entirely in the first release.
The account-takeover pattern made IP the most tempting signal and the most dangerous to miscalibrate. Bad actors hid behind VPNs, but so did legitimate users. Blocking on IP characteristics alone would have caught both. Everyone who looked at the false-positive rate reached the same conclusion: IP addresses are the noisiest of the four signal types, and we hadn’t characterized our legitimate user population well enough to set a trustworthy threshold. We had the integration ready but chose not to enable it.
Because each signal type was independently scoped, deferring IP didn’t require touching anything else. Email, phone, and device fingerprint went live. IP went dark but stayed instrumented—we could watch the signal without acting on it, which meant we were building the dataset we’d need to calibrate it later. When we eventually brought it online, it slotted in. Nothing else changed.
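One way to get "dark but instrumented" behavior is a per-signal enforcement flag: every check runs and is recorded, but only enforced checks can block. This is a minimal sketch under that assumption. The `SignalCheck` type, the `enforce` flag, and the list used as an observation log are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class SignalCheck:
    name: str
    evaluate: Callable[[Dict[str, Any]], bool]  # True means "looks like fraud"
    enforce: bool                               # False means observe only


def decide(
    request: Dict[str, Any],
    checks: List[SignalCheck],
    observations: List[Dict[str, Any]],
) -> bool:
    """Return True if the request should be blocked."""
    blocked = False
    for check in checks:
        flagged = check.evaluate(request)
        # Record every result, enforced or not, to build calibration data.
        observations.append(
            {"signal": check.name, "flagged": flagged, "enforced": check.enforce}
        )
        if flagged and check.enforce:
            blocked = True
    return blocked


checks = [
    SignalCheck("email", lambda r: r["email_score"] > 85, enforce=True),
    SignalCheck("ip", lambda r: r["ip_score"] > 80, enforce=False),
]
```

In this arrangement, bringing IP online later means flipping `enforce` to `True` for that one check. The other checks and the decision loop stay as they are.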
The Lever We Didn’t Pull
Fraudulent transaction attempts fell by 82% from our pre-deployment baseline. The support ticket clusters stopped. Unreconciled transaction reports dropped to near zero. The on-call rotation stopped getting pulled into fraud incidents at scale. Engineers who’d been spending cycles on manual investigation regained that capacity.
The IP deferral seemed like a half-measure at the time. In retrospect, it was the most disciplined call in the project. Shipping an uncalibrated block rule would have caused a different kind of harm: legitimate transactions declined, accounts locked, and support volume climbed for the wrong reason. Knowing which lever not to pull until you understand what it’s connected to is harder than it sounds.








