CuprBot Labs: Double Your Capacity. Same Payroll.

Situation

This is enterprise work at large scale. At Index Exchange, a real-time advertising exchange, I worked on the performance of a fleet of roughly 5,000 servers handling auction traffic where latency and throughput translate directly into revenue. Small percentage changes at that scale are large absolute numbers.

Problem

The fleet was carrying more load than it comfortably should, and the simple answer, adding more servers, meant adding cost linearly, forever. The harder and cheaper answer was to make the existing fleet do more. The constraint was that the same traffic carried the revenue: any rate-limiting that shed load risked shedding income with it.

Approach

I built an ML-enhanced rate-limiting model in Go, Perl, and Python that decided which requests to shed and which to keep based on each request's revenue value rather than a flat threshold. The model was tuned continuously through a scalable ML pipeline I built with a six-engineer team, using decision-tree modeling to keep the trade-off calibrated as traffic patterns shifted. In parallel I analyzed the fleet’s Prometheus and InfluxDB telemetry in Jupyter to find where resources were being spent without return.

Architecture and key decisions

Rate limiting by revenue value, not by volume. The model protected the requests that paid and shed the ones that did not, which is how a 30% performance gain cost only 5% of revenue.
Continuous tuning over a one-time model. Traffic on an exchange changes constantly, so the decision-tree model was retuned through a pipeline rather than trained once and left to drift.
Telemetry read for cost. Prometheus and InfluxDB data usually get watched for outages. Reading the same data for cost surfaced a $10,000-a-month reduction that informed how resources were allocated.
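
The first decision above, shedding by revenue value rather than by volume, can be illustrated with a minimal Python sketch. This is not the production model. The names Request, expected_revenue, shed_by_value, and shed_flat are hypothetical, the per-request value is assumed to come from some upstream estimate (in the real system, the decision-tree model), and the numbers are made up for illustration rather than taken from the fleet.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    expected_revenue: float  # hypothetical per-request value estimate


def shed_flat(requests, capacity):
    """Flat threshold: keep the first `capacity` requests in arrival order."""
    return requests[:capacity], requests[capacity:]


def shed_by_value(requests, capacity):
    """Value-aware: keep the `capacity` highest-value requests, shed the rest."""
    ranked = sorted(requests, key=lambda r: r.expected_revenue, reverse=True)
    return ranked[:capacity], ranked[capacity:]


def revenue_retained(kept, shed):
    """Fraction of total expected revenue carried by the kept requests."""
    total = sum(r.expected_revenue for r in kept + shed)
    if total == 0:
        return 1.0
    return sum(r.expected_revenue for r in kept) / total


if __name__ == "__main__":
    # Illustrative values only, not figures from the fleet.
    incoming = [
        Request("a", 4.0),
        Request("b", 0.25),
        Request("c", 2.0),
        Request("d", 0.5),
        Request("e", 1.25),
    ]
    for policy in (shed_flat, shed_by_value):
        kept, shed = policy(incoming, capacity=3)
        print(policy.__name__, round(revenue_retained(kept, shed), 3))
```

Both policies shed the same number of requests, so the load relief is identical. The only difference is which requests are dropped, and that is where the revenue retention comes from.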

What shipped

The production rate-limiting model running across the 5,000-server fleet, the ML pipeline that kept it tuned, and an infrastructure-cost analysis that turned monitoring data into a concrete monthly saving.

Outcome

Server performance rose 30% while 95% of revenue was retained across the 5,000-server fleet, and the telemetry analysis surfaced a $10,000-a-month cost reduction that was acted on. The same hardware did substantially more work.

What this demonstrates

Getting more out of infrastructure you already pay for is one of the cleanest forms of margin there is, and it is mostly a measurement-and-modeling problem. When I tell an operator their systems are leaving capacity or money on the table, the method is the one used here: instrument it, model the trade-off that matters, and tune it as reality moves.