KAFKA SUMMIT LONDON

April 25 - 26, 2022

Streaming Updates through Complex Operations in Kafka Streams at Scale

Date : April 25, 2022

Time : 1:00 PM - 1:45 PM

With Kafka Streams, you can build complex stream processing topologies, including joins and aggregations over data streams. Making these pipelines aware of updates (including deletions) is often difficult, especially for data that has already been joined and aggregated. Given the sheer volume of data, re-aggregating or re-joining from scratch to handle updates correctly is not desirable. In practice, these updates play an important role in stream processing, for example to continuously improve data quality, to ensure data privacy, or to handle late-arriving data. This talk explores how we efficiently handle stream updates and deletions in consecutive joins with Kafka Streams. Furthermore, we present an optimization for the aggregate operation in Kafka Streams that leverages state stores to handle updates in complex aggregates. We discuss the challenges we encountered running complex stream processing topologies on Kubernetes and explore the solution with hands-on experiments. Finally, we demonstrate how splitting the stream processing topologies gives us finer-grained control over the resource allocation and scalability of consecutive processing steps. We show how this improves cost-efficiency through autoscaling, as well as the overall manageability of such streaming pipelines.
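
To make the idea of update-aware aggregation concrete, here is a minimal Python sketch. It is not the optimization presented in the talk and does not use the Kafka Streams API. It is a toy model of the general pattern: a state store remembers each record's last contribution, so an update or deletion adjusts the aggregate without recomputing it from scratch. The class name `IncrementalSum`, the record layout `(group, amount)`, and the use of `None` as a deletion marker are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


class IncrementalSum:
    """Toy model of an update-aware sum per group (illustrative only)."""

    def __init__(self) -> None:
        # Hypothetical "state store": last seen (group, amount) per record key.
        self.last_seen: Dict[str, Tuple[str, int]] = {}
        # Current aggregate per group.
        self.totals: Dict[str, int] = {}

    def process(self, key: str, value: Optional[Tuple[str, int]]) -> None:
        """Apply an upsert, or a deletion when value is None."""
        old = self.last_seen.pop(key, None)
        if old is not None:
            # Remove the record's previous contribution from its old group.
            old_group, old_amount = old
            self.totals[old_group] -= old_amount
        if value is not None:
            # Add the new contribution and remember it for later updates.
            group, amount = value
            self.totals[group] = self.totals.get(group, 0) + amount
            self.last_seen[key] = value


agg = IncrementalSum()
agg.process("order-1", ("alice", 10))
agg.process("order-2", ("alice", 5))
agg.process("order-1", ("alice", 7))  # update: replaces the earlier 10
agg.process("order-2", None)          # deletion: removes the 5
print(agg.totals)
```

Each event costs a constant amount of work regardless of how many records were aggregated before it, which is the property that makes updates and deletions affordable at scale in this simplified model.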

Speakers

Victor Künstler

Software Engineer, bakdata GmbH

