Repost 🔃 Kafka Streams, co-partitioning requirements illustrated

Publication: Jun 20, 2020

Back in 2020, I wrote an article about data exchange between Kafka Streams instances. I was not working with Kafka Streams at that time, but my head was full of ideas I wanted to put on paper after using the library for quite a while. The way joins happen was one of those ideas.

Like many people, when I started working with this stream processing library, I had prior experience with Apache Spark and other data processing frameworks. There, shuffling was an important concept for stateful operations such as grouping, joining or windowing. The idea was not easy to grasp, and you had to avoid it as much as possible. So when I had to switch mental models to use Kafka Streams, thinking about repartitioning instead of shuffling felt far more intuitive. I tried to use my knowledge of Apache Kafka to describe it by creating a demo and a colourful blog post.

Note on Repost Blog: blog.loicmdivad.com is a place to collect all the work I'm doing in software engineering. To avoid content duplication when an article is published somewhere else (e.g. Medium, a corporate blog), I just create a Repost Blog. This type of post briefly tells the story of the publication and includes a link to it. Repost blogs can be identified by the tag repost. They also have the fr-link or en-link tag to indicate the language of the initial publication.