KAFKA SUMMIT AMERICAS

September 14 - 15, 2021

Streaming Data Lakes Using Kafka Connect + Apache Hudi

Date : September 14, 2021

Time : 12:00 PM - 01:00 PM

Apache Hudi is a data lake platform that provides streaming primitives (upserts, deletes, and change streams) on top of data lake storage. Hudi powers very large data lakes at Uber, Robinhood, and other companies, and comes pre-installed on four major cloud platforms. Hudi supports exactly-once, near-real-time data ingestion from Apache Kafka to cloud storage, and this ingestion path is typically used in place of an S3/HDFS sink connector to gain transactions and mutability. While this approach is scalable and battle-tested, it can only ingest data in mini-batches, which limits data freshness. In this talk, we introduce a Kafka Connect sink connector for Apache Hudi that writes data directly into Hudi's log format, making it immediately queryable, while Hudi's table services, such as indexing, compaction, and clustering, work behind the scenes to reorganize the data for better query performance.
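To make the "write to a log now, reorganize later" idea more concrete, here is a toy model in Python of the general merge-on-read pattern the abstract alludes to. Every name in it (`ToyMergeOnReadTable`, `upsert`, `delete`, `read`, `compact`) is hypothetical and chosen for illustration only; it is not Hudi's API, file format, or connector code, and real Hudi log files, indexing, and commit timelines are far more involved. The sketch only shows why appended writes can be queried immediately and how a background compaction step can fold them into a new base snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMergeOnReadTable:
    """Hypothetical toy model of a merge-on-read table (NOT Hudi's API or format)."""
    base: dict = field(default_factory=dict)  # compacted snapshot: key -> record
    log: list = field(default_factory=list)   # appended (op, key, record) entries

    def upsert(self, key, record):
        # Writes only append to the log, so they are cheap and visible at once.
        self.log.append(("upsert", key, record))

    def delete(self, key):
        # Deletes are also just log entries until compaction runs.
        self.log.append(("delete", key, None))

    def read(self):
        # Snapshot query: start from the base and replay the log on top of it.
        view = dict(self.base)
        for op, key, record in self.log:
            if op == "upsert":
                view[key] = record
            else:
                view.pop(key, None)
        return view

    def compact(self):
        # Background "table service": fold the log into a new base, then clear it.
        self.base = self.read()
        self.log.clear()


table = ToyMergeOnReadTable()
table.upsert("order-1", {"status": "created"})
table.upsert("order-2", {"status": "created"})
table.upsert("order-1", {"status": "shipped"})
table.delete("order-2")
print(table.read())  # {'order-1': {'status': 'shipped'}}
table.compact()
print(table.base, table.log)  # {'order-1': {'status': 'shipped'}} []
```

The design choice being illustrated is the trade-off between write latency and read cost: appending to the log keeps ingestion fast and results queryable right away, while compaction (in this toy, a single method call) pays the merge cost later so that reads do not have to replay an ever-growing log.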

Speakers

Vinoth Chandar

PMC Chair, Apache Hudi, Apache Software Foundation

Balaji Varadarajan

Sr. Staff Engineer, Robinhood

Apache, Apache Kafka, Kafka, Apache Flink, Flink and associated open source project names are trademarks of the Apache Software Foundation.
The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event.
Copyright © Confluent, Inc. 2016 - 2024
