30% More Performance Across a 5,000-Server Fleet
At Index Exchange I built an ML-enhanced rate-limiting model that lifted server performance 30% across a 5,000-server global fleet while holding 95% of revenue, and found a $10,000-a-month cost reduction in the infrastructure data.
Status
Enterprise Employment
Domain
Infrastructure
Headline result
30% performance gain at 95% revenue retention across 5,000 servers; $10,000/month cost reduction identified and acted on
Demonstrates
Representative stack
Fleet
- 5,000 global servers
- Real-time ad auction traffic
Model
- ML-enhanced rate limiting (Go, Perl, Python)
- Decision-tree tuning
- 6-engineer ML pipeline
Measurement
- Prometheus + InfluxDB analysis
- $10K/month cost reduction surfaced
Situation
This is enterprise work at large scale. At Index Exchange, a real-time advertising exchange, I worked on the performance of a fleet of roughly 5,000 servers handling auction traffic where latency and throughput translate directly into revenue. Small percentage changes at that scale are large absolute numbers.
Problem
The fleet was carrying more load than it comfortably should, and the simple answer, add more servers, meant adding cost linearly forever. The harder and cheaper answer was to make the existing fleet do more. The constraint was that the same traffic carried the revenue: any rate-limiting that shed load risked shedding income with it.
Approach
I built an ML-enhanced rate-limiting model in Go, Perl, and Python that decided what to shed and what to keep based on each request’s revenue value, not on a flat threshold. The model was tuned continuously through a scalable ML pipeline I built with a six-engineer team, using decision-tree modeling to keep the trade-off calibrated as traffic patterns shifted. In parallel I analyzed the fleet’s Prometheus and InfluxDB telemetry in Jupyter to find where resources were being spent without return.
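To make the idea of shedding by value concrete, here is a minimal sketch in Python. It is an illustration, not the production model: it assumes each request already carries a predicted revenue score, and the names `Request`, `predicted_revenue`, `calibrate_threshold`, and `should_admit` are hypothetical. It picks the highest revenue cutoff that still keeps a target share of revenue in a traffic sample, then admits only requests at or above that cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    request_id: str
    predicted_revenue: float  # hypothetical model score, in currency units


def calibrate_threshold(sample: list[Request], min_revenue_kept: float) -> float:
    """Return the highest revenue cutoff that still keeps the target revenue share.

    `min_revenue_kept` is a fraction such as 0.95. This is a simple cutoff
    search over a traffic sample, standing in for a real tuning pipeline.
    """
    total = sum(r.predicted_revenue for r in sample)
    if total <= 0:
        return 0.0  # nothing to protect, so admit everything

    kept = 0.0
    # Walk from the most valuable request down, accumulating kept revenue.
    for r in sorted(sample, key=lambda r: r.predicted_revenue, reverse=True):
        kept += r.predicted_revenue
        if kept / total >= min_revenue_kept:
            return r.predicted_revenue
    return 0.0


def should_admit(request: Request, threshold: float) -> bool:
    """Admit a request only if its predicted revenue meets the cutoff."""
    return request.predicted_revenue >= threshold
```

With toy scores of 60, 25, 12, 2, and 1 and a 95% target, the cutoff lands at 12: the two lowest-value requests are shed while 97% of predicted revenue is kept. Rerunning `calibrate_threshold` on fresh samples is the simplest stand-in for retuning as traffic shifts.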
Architecture and key decisions
- Rate limiting by revenue value, not by volume. The model protected the requests that paid and shed the ones that did not, which is how a 30% performance gain cost only 5% of revenue.
- Continuous tuning over a one-time model. Traffic on an exchange changes constantly, so the decision-tree model was retuned through a pipeline rather than being trained once and left to drift.
- Telemetry read for cost. Prometheus and InfluxDB data usually get watched for outages. Reading the same data for cost surfaced a $10,000-a-month reduction that informed how resources were allocated.
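Reading telemetry for cost can be sketched in a few lines of Python. This is a hedged illustration, not the original Jupyter analysis: it assumes per-host average utilization and a monthly cost figure have already been exported from the metrics store, and the names `HostUsage`, `underused_hosts`, and `monthly_cost_at_stake` are hypothetical. The utilization floor is likewise an assumed parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostUsage:
    host: str
    avg_cpu_utilization: float  # 0.0 to 1.0, assumed pre-aggregated from metrics
    monthly_cost: float         # hypothetical per-host cost figure


def underused_hosts(usage: list[HostUsage], utilization_floor: float) -> list[HostUsage]:
    """Return hosts below the utilization floor, most expensive first."""
    flagged = [h for h in usage if h.avg_cpu_utilization < utilization_floor]
    return sorted(flagged, key=lambda h: h.monthly_cost, reverse=True)


def monthly_cost_at_stake(usage: list[HostUsage], utilization_floor: float) -> float:
    """Sum the monthly cost of every host below the utilization floor."""
    return sum(h.monthly_cost for h in underused_hosts(usage, utilization_floor))
```

The output is a ranked list of candidates and a monthly figure to investigate, not a decision. Whether a flagged host can actually be reclaimed depends on context that utilization alone does not show, such as failover headroom.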
What shipped
The production rate-limiting model running across the 5,000-server fleet, the ML pipeline that kept it tuned, and an infrastructure-cost analysis that turned monitoring data into a concrete monthly saving.
Outcome
Server performance rose 30% while 95% of revenue was retained across the 5,000-server fleet, and the telemetry analysis surfaced a $10,000-a-month cost reduction that was acted on. The same hardware did substantially more work.
What this demonstrates
Getting more out of infrastructure you already pay for is one of the cleanest forms of margin there is, and it is mostly a measurement-and-modeling problem. When I tell an operator their systems are leaving capacity or money on the table, the method is the one used here: instrument it, model the trade-off that matters, and tune it as reality moves.