Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka
Date: May 12, 2021
Time: 02:30 PM - 03:00 PM

At Stripe, we operate a general ledger modeled as double-entry bookkeeping for all financial transactions. Warehousing this data is challenging because of its high volume and the high cardinality of unique accounts. Apache Pinot, paired with Apache Kafka, handles this workload well. Using Pinot's exactly-once consumption from Kafka, intelligent data layout, and index optimization, we have built a real-time financial OLAP cube that serves lookups with under 100 ms p95 latency on tables half a petabyte in size. Furthermore, it is financially critical to get up-to-date, accurate analytics over all records. Because real-time transactions keep changing, the analytics cannot be pre-computed as a fixed time series. We overcame this challenge by building a real-time key-value store inside Pinot that sustains half a million QPS across all financial transactions. We will discuss the details of our solution and the interesting technical challenges we faced.
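
To make the double-entry model concrete, here is a minimal Python sketch. It is not Stripe's schema and does not use Pinot; the `Entry` fields, the `post`, `balance`, and `is_balanced` helpers, the account names, and the sign convention (positive for debits, negative for credits) are all assumptions made for illustration. It shows why an account balance is an aggregation over every entry seen so far, so a late-arriving transaction changes the answer and a fixed pre-computed series would go stale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One side of a double-entry transaction (hypothetical layout)."""
    txn_id: str
    account: str
    amount_cents: int  # assumed convention: positive = debit, negative = credit


def post(ledger, txn_id, debit_account, credit_account, amount_cents):
    """Record a transaction as two entries that sum to zero."""
    ledger.append(Entry(txn_id, debit_account, amount_cents))
    ledger.append(Entry(txn_id, credit_account, -amount_cents))


def balance(ledger, account):
    """Aggregate all entries for one account at query time."""
    return sum(e.amount_cents for e in ledger if e.account == account)


def is_balanced(ledger):
    """Double-entry invariant: debits and credits cancel across the ledger."""
    return sum(e.amount_cents for e in ledger) == 0


ledger = []
post(ledger, "t1", "merchant_a", "customer_x", 500)
post(ledger, "t2", "merchant_a", "customer_y", 250)
before = balance(ledger, "merchant_a")

# A later transaction (for example a refund) changes the balance after the fact.
post(ledger, "t3", "customer_x", "merchant_a", 500)
after = balance(ledger, "merchant_a")
```

In this toy version the aggregation is a linear scan; the talk's point is that doing the equivalent over a very large, high-cardinality ledger requires a store with suitable data layout and indexing.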

Speakers
Xiaoman Dong
Software Engineer, Stripe
Joey Pereira
Software Engineer, Stripe