Building a Next-Generation, Advanced Kafka Connector with the Beam Splittable DoFn API
Date: September 15, 2021
Time: 12:00 PM - 1:00 PM

Apache Beam is an open-source, unified model for defining both batch and streaming data-parallel processing pipelines. Pipelines built with Apache Beam can be executed on any of Beam’s supported distributed processing back-ends, including Apache Flink, Apache Spark, and Google Cloud Dataflow. KafkaIO is one of the key IO connectors in Apache Beam. In this talk, we will describe how we built a more powerful KafkaIO on top of Beam Splittable DoFn, Beam's new-generation IO framework. By combining the Kafka Consumer APIs with Splittable DoFn, KafkaIO can start reading from any given TopicPartition on the fly while the pipeline is running. In addition, because Splittable DoFn work is splittable, the processing back-ends can redistribute work more evenly by checkpointing at a regular frequency. We will also discuss the future of KafkaIO in Beam, including our plans to add support for dynamic splitting.

Boyuan Zhang
Software Engineer, Google