The Situation

A B2B programmatic advertising platform was operating with a fundamentally compromised audience reach forecasting system.

Impression data used to generate forecasts contained significant ISP and non-human traffic contamination, making commercial forecasts unreliable and undermining advertiser confidence.

The Task

Architect and deliver a ground-up rebuild of the forecasting data pipeline: identify and quantify contamination, design a clean classification system, and restore forecast integrity to production standard.

The Action / Approach

Conducted systematic analysis across hundreds of millions of impressions to identify and quantify contamination sources.

Designed and implemented a domain classification system covering more than 37,000 domains to isolate ISP traffic from genuine programmatic inventory.
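As a rough illustration of how a domain lookup of this kind might work, the sketch below labels each impression's domain and computes the ISP share. The domain names, the label values ("isp", "programmatic", "unknown"), and the parent-domain fallback are all assumptions made for the example; the production classifier and its rules are not described in this write-up.

```python
from collections import Counter

# Hypothetical labelled domains; the real list covered more than 37,000 entries.
DOMAIN_LABELS = {
    "example-isp.net": "isp",
    "example-publisher.com": "programmatic",
}


def classify_domain(domain, labels=DOMAIN_LABELS):
    """Return the label for a domain, falling back to its parent domains."""
    parts = domain.lower().strip(".").split(".")
    # Try the full host first, then progressively shorter parent domains,
    # stopping before the bare top-level label (e.g. "net").
    for i in range(max(1, len(parts) - 1)):
        candidate = ".".join(parts[i:])
        if candidate in labels:
            return labels[candidate]
    return "unknown"


def isp_share(impression_domains, labels=DOMAIN_LABELS):
    """Fraction of impressions whose domain is labelled 'isp'."""
    counts = Counter(classify_domain(d, labels) for d in impression_domains)
    total = sum(counts.values())
    return counts["isp"] / total if total else 0.0
```

Keeping an explicit "unknown" label means unclassified domains are counted separately rather than silently treated as clean inventory.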

Rebuilt the AWS data pipeline architecture across S3, Glue, and Athena to enforce clean data separation at source.

Implemented pre/post-regime data separation to account for structural breaks in the data timeline.
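One simple way to picture the pre/post-regime separation is to tag every record by which side of a cutover date it falls on, so the two periods are never pooled. The cutover date, the "day" field, and the function names below are hypothetical; the actual break point and schema are not given here.

```python
from datetime import date

# Hypothetical cutover date; the source does not state when the regime changed.
REGIME_CUTOVER = date(2024, 1, 1)


def regime_of(day, cutover=REGIME_CUTOVER):
    """Label a date 'pre' if it falls before the cutover, otherwise 'post'."""
    return "pre" if day < cutover else "post"


def split_by_regime(rows, cutover=REGIME_CUTOVER):
    """Partition rows (dicts with a 'day' date) into pre and post regimes."""
    partitions = {"pre": [], "post": []}
    for row in rows:
        partitions[regime_of(row["day"], cutover)].append(row)
    return partitions
```

Each partition can then be analysed separately, so a structural break is not mistaken for a trend.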

The Result

ISP contamination was quantified at 73 to 79% of impressions across the dataset, a finding that fundamentally reframed the platform’s commercial forecasting model.

Clean data pipeline delivered to production, unblocking accurate audience reach forecasting and providing an evidence base for advertiser-facing reporting.
