Building More Reliable Data Pipelines for Nearmap's Deep Learning Models: An Evolutionary Case Study

Continual learning on a continually evolving dataset is the norm for the AI team at Nearmap. For several years, we have operated a software system and data pipelines to manage this ever-growing dataset. During that time, both our needs and the system have evolved: we improvised and learned from early limitations and challenges. One of the biggest challenges of MLOps is building data systems right! Reliable, fault-tolerant, and continually flowing pipelines are the foundation, with additional capabilities needed for data quality control, reconciliation, and lineage tracking. Based on what we learned, we have built a new generation of our system (based on Kafka) with one aim: the much-discussed "operation vacation", meaning full automation and zero manual intervention. In this session, we will go into detail on the challenges we encountered, the lessons we learned, what we improved, and lastly: are we on vacation yet?

Speakers
Suneeta Mall
Principal Machine Learning Engineer, Nearmap
Samanvay Karambhe
Data Scientist, Nearmap