Implementing Real-Time Analytics with Kafka Streams
Date: May 16, 2023
Time: 1:00 PM - 1:45 PM

Real-time analytics requires a wide variety of query types. Some perform custom aggregations over groups of events, such as ordered sequences of selected events whose keys or values meet certain criteria. In practice, several of these query types are not yet supported out of the box. Kafka Streams provides state stores, which can maintain arbitrary state and hold data ready for real-time analytics. Interactive Queries make this application state accessible from outside the streams application, so you can retrieve individual events or ranges of events as input for aggregation operations in real-time analytics. As an example, we focus on order-preserving range queries. According to the documentation, Kafka Streams’ Interactive Queries on ranges do not provide ordering guarantees. This may lead to unreliable analytics results, and sorting large sets of events in main memory is not always feasible. In our talk, we discuss different approaches and highlight an indexing strategy that guarantees the order of a range query. We weigh the pros and cons and finally demonstrate a real-world example of our solution. Furthermore, we showcase how our approach can also be applied to other implementations of custom analytics queries.
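
To make the ordering problem concrete, here is a minimal sketch of one possible indexing idea. It is not necessarily the approach presented in the talk. It assumes the store orders keys by their serialized bytes, so a range scan comes back in logical order only if the key encoding preserves that order. The class `OrderedRangeIndexSketch`, the method `indexKey`, and the key layout are hypothetical, and a plain `TreeMap` with an unsigned byte comparator stands in for a Kafka Streams state store; no Kafka Streams API is used. Requires Java 9 or later for `Arrays.compareUnsigned`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class OrderedRangeIndexSketch {

    // Stand-in for a byte-ordered state store (assumption): keys are sorted
    // by unsigned lexicographic comparison of their serialized bytes.
    private final TreeMap<byte[], String> store =
            new TreeMap<byte[], String>((a, b) -> Arrays.compareUnsigned(a, b));

    // Hypothetical index key layout:
    //   UTF-8 group id | 0x00 separator | 8-byte big-endian timestamp, sign bit flipped
    // Flipping the sign bit makes negative values sort before positive ones
    // under unsigned byte comparison. Assumes the group id contains no 0x00 byte.
    public static byte[] indexKey(String groupId, long timestamp) {
        byte[] prefix = groupId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + 1 + Long.BYTES)
                .put(prefix)
                .put((byte) 0)
                .putLong(timestamp ^ Long.MIN_VALUE)
                .array();
    }

    public void put(String groupId, long timestamp, String value) {
        store.put(indexKey(groupId, timestamp), value);
    }

    // Inclusive range scan over one group; values come back in timestamp order
    // because the byte order of the index keys matches the timestamp order.
    public List<String> range(String groupId, long from, long to) {
        return new ArrayList<>(
                store.subMap(indexKey(groupId, from), true,
                             indexKey(groupId, to), true).values());
    }
}
```

With this encoding, events inserted out of order for one group are returned by `range` sorted by timestamp, and keys of a group such as `sensor-10` do not leak into a scan over `sensor-1`. The trade-off is a second, purpose-built key format that has to be kept in sync with the primary data.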

Speakers
Ramin Gharib
Software Engineer, bakdata GmbH