Handling failures is always important, but it is a must-have when a product handles sensitive data. It also becomes much harder in the world of microservices, since a failure can happen in any of the services and even in their dependencies. One of the go-to solutions when handling errors is to simply “retry”. However, in a complex system, a retrying mechanism must be smart, customisable, pluggable, and persistent. In this talk, we'll discuss what capabilities a “good” retrying mechanism should have, how I implemented a Kafka-based retrying solution that provides those capabilities and can handle a large and diverse set of errors, and why Kafka is a good match for such a solution.