Apache Kafka’s Transactions in the Wild! Developing an exactly-once KafkaSink in Apache Flink
Date : April 26, 2022
Time : 11:00 AM - 11:45 AM

Apache Kafka is one of the most commonly used connectors with Apache Flink for exactly-once streaming use cases. The combination of both systems allows you to build mission-critical systems that require low end-to-end latency and exactly-once processing eg. banks processing transactions. In Apache Flink 1.14, we released a new KafkaSink based on Apache Flink’s unified Sink interface that natively supports streaming and batch executions. However, we needed to stretch Kafka’s transactions API to fully support exactly-once processing in Flink. In this talk, we will start with a quick recap of Apache Kafka’s transactions and Flink’s checkpointing mechanism. Then, we describe the two-phase commit protocol implemented in KafkaSink in-depth and emphasize the difficulties we have overcome when applying Kafka’s transaction API to longer-lasting transactions. We explain how we ensure performant writing to Apache Kafka and how the KafkaSink recovery works. In summary, this talk should give users a deep dive into how Apache Flink leverages Apache Kafka’s transactions and developers an overview of what they have to consider when using Apache Kafka’s transactions.

Speakers
speakerimage
Fabian Paul
Software Engineer, Ververica