KAFKA SUMMIT EUROPE

May 11 - 12, 2021

Bayer Document Stream Pipelines

Date : May 12, 2021

Time : 04:00 PM - 04:30 PM

Bayer selected Apache Kafka as the primary layer for a variety of document streams flowing through several text processing and enrichment steps. Every day, Bayer analyzes numerous documents including clinical trials, patents, reports, news, literature, etc. We will give an idea about the strategic importance, peek into future challenges and we will provide an end-to-end technical overview. Throughout the discussion, we will look at challenges we handle in the platform and discuss respective solutions. Among others, we discuss our approach to continuously pull in data from a variety of external sources and how we harmonize different formats and schemas. We discuss large document processing and error handling, which allows efficient debugging while not blocking the pipeline. Then, we take on the user’s perspective and demo the platform. One will learn how users create new document processing pipelines and how Bayer keeps track of the many running Kafka pipelines.

Speakers

Astrid Rheinländer

Computational Scientist, Bayer AG

Christoph Böhm

Data Engineer, bakdata GmbH

Privacy Policy | Terms & Conditions,
Apache, Apache Kafka, Kafka, Apache Flink, Flink and associated open source project names are trademarks of the Apache Software Foundation.
The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event
Copyright © Confluent, Inc. 2016 - 2024

#kafkasummit