AdRoll Puts the “P” in Big Data: Processing Petabytes

Written by Valentino Volonghi, November 24, 2014

Originally published on the AdRoll Blog on November 12, 2014.

The advertising industry has undeniably become a data play, as consumers generate valuable data with every digital interaction. We hear buzzwords like “big data,” “machine learning,” and “real-time algorithms,” but little about how these puzzle pieces fit together to help marketers achieve their business objectives. Over the last few years, the ad tech industry has led the way in turning big data concepts into solutions that solve real business challenges.

In its most basic form, data science is the extraction of knowledge from data, and machine learning powers this process, making programmatic buying possible. In performance advertising, the predictive power of your model grows as the amount of data flowing through the system increases. And AdRoll has a lot of data to work with.

Customer intent data is the biggest competitive advantage at a company’s disposal, if it can collect, analyze, and use that data in real time. A few months ago our systems generated and ingested 40–50 terabytes of data each day, but recently we’ve reached new heights in data volume at AdRoll, processing 130TB of data (about 30TB compressed) every single day. Basically, we operate at a data volume roughly 150 times that of all the US stock exchanges combined, two orders of magnitude bigger, and handle over 10 times their volume of events. In roughly three days we generate as much data as the US stock exchanges generate in one year.
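For the curious, those figures hang together arithmetically. Below is a minimal back-of-the-envelope sketch in Python; note that the exchange volume is derived from the 150x claim above, not measured independently:

```python
# Back-of-the-envelope check of the volumes quoted above.
TB_PER_DAY = 130                 # raw event data processed daily
COMPRESSED_TB_PER_DAY = 30       # the same data after compression

# Implied by the "150 times" claim, not an independent measurement.
exchanges_tb_per_day = TB_PER_DAY / 150
exchanges_tb_per_year = exchanges_tb_per_day * 365

print(f"Compression ratio: {TB_PER_DAY / COMPRESSED_TB_PER_DAY:.1f}x")
print(f"Implied exchange volume: {exchanges_tb_per_year:.0f} TB/year")
print(f"Days to match a year of it: {exchanges_tb_per_year / TB_PER_DAY:.1f}")
# -> about 2.4 days, i.e. "roughly three days" after rounding
```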

In fact, this year we hit the petabyte mark, processing over 10 petabytes (1 PB = 1,000,000,000,000,000 bytes, or 1,000 terabytes), a 1,200% increase year over year. To put that in perspective, in order to accommodate the storage and processing capacity of 10PB, we would need a space at least the size of AT&T Park. Given San Francisco real estate prices, it’s a good thing we’ve been able to utilize globally distributed, cloud-based data warehouses thanks to our friends at Amazon AWS.
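The same arithmetic scales up to the annual total. In this sketch the prior-year volume is back-derived from the quoted 1,200% growth, so treat it as an implication of the numbers above rather than a reported figure:

```python
PB = 10 ** 15                    # 1 petabyte in bytes
processed_bytes = 10 * PB        # over 10 PB processed this year

# A 1,200% year-over-year increase means 13x the prior year's volume.
prior_year_pb = 10 / 13

print(f"10 PB = {processed_bytes:,} bytes = {processed_bytes / 1e12:,.0f} TB")
print(f"Implied prior-year volume: {prior_year_pb:.2f} PB")
print(f"Days of processing at 130 TB/day: {10_000 / 130:.0f}")  # ~77 days
```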

Our partnership with Amazon allows us to grow our inventory sources and securely store, process, and scale our data volume. We can keep our data science and engineering talent focused on product innovation without the time and management resources required by legacy infrastructure. This has allowed us to serve our customers better while providing a steady revenue stream for publishers, which subsidizes the free web consumers have come to expect.

Consumers are spending more time online and on mobile devices, and we are incorporating more and more signals and digital interactions (time on site, quality of ad space, emails opened) into our RTB algorithm for better targeting and optimization. Our machines now process over 60 billion events on a daily basis.
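To make that concrete, here is a rough sketch of the steady-state event rate those numbers imply, plus a toy example of folding normalized signals into a single score. The signal names and weights are hypothetical, for illustration only; they are not our actual bidding model:

```python
EVENTS_PER_DAY = 60_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# 60 billion events/day averages out to roughly 694,000 events/second.
print(f"Average rate: {EVENTS_PER_DAY / SECONDS_PER_DAY:,.0f} events/sec")

def bid_score(signals: dict) -> float:
    """Combine normalized signals (0..1) into one score. The names and
    weights here are hypothetical, purely for illustration."""
    weights = {"time_on_site": 0.5, "placement_quality": 0.3, "emails_opened": 0.2}
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())

print(bid_score({"time_on_site": 0.8, "placement_quality": 0.6, "emails_opened": 0.4}))
```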

So what does this massive increase in data volume indicate for the industry as a whole? It’s simple: predictive analytics are the future of RTB, and data will drive the next generation of advertising.

Retargeting has become a powerful tool, providing the masses with real-time predictive capabilities that were once reserved for the likes of Google. More inventory sources are popping up, more digital interactions are being captured, and more intelligent algorithms are being developed as a result.

Retargeting was one of the first ways marketers could leverage the customer data available for collection on their websites, and it continues to outpace the industry in RTB innovation. Big data has moved from buzzword to advertising staple, and its importance and profitability will only grow.

Who knows how many baseball fields of data we’ll be processing in 2015.