Back in 2020, I wrote an article about data exchange between Kafka Streams instances. I was not actively working with Kafka Streams at the time, but my head was full of ideas I wanted to put on paper after using the library for quite a while. The way joins happen was one of those ideas.
Like many people, when I started working with this stream processing library, I had prior experience with Apache Spark and other data processing frameworks. There, shuffling was a central concept for stateful operations such as grouping, joining, or windowing. It was not easy to grasp, and you were told to avoid it as much as possible. So when I had to switch mental models for Kafka Streams, thinking in terms of repartitioning instead of shuffling felt far more intuitive. I drew on my knowledge of Apache Kafka to describe it, putting together a demo and a colourful blog post.