[ https://issues.apache.org/jira/browse/KAFKA-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955805#comment-16955805 ]
Mark Tinsley commented on KAFKA-4609:
-------------------------------------

Also seeing this issue in version 2.2.1. Sadly it is not easy to reproduce; I found it after running some stress tests on the system. I tracked it down to a group-by-key and aggregate followed by a join to another table. The aggregate function is quite simple: it just builds a list of all the messages it sees. The aggregate value ended up with a duplicate entry in its list, even though I checked the topic it was consuming from and can confirm there is no duplicate message on that topic.

> KTable/KTable join followed by groupBy and aggregate/count can result in
> duplicated results
> -------------------------------------------------------------------------------------------
>
> Key: KAFKA-4609
> URL: https://issues.apache.org/jira/browse/KAFKA-4609
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 0.10.1.1, 0.10.2.0
> Reporter: Damian Guy
> Priority: Major
> Labels: architecture
>
> When caching is enabled, KTable/KTable joins can result in duplicate values
> being emitted. This will occur if there were updates to the same key in both
> tables. Each table is flushed independently, and each table will trigger the
> join, so you get two results for the same key.
> If we subsequently perform a groupBy and then an aggregate operation, we will
> process these duplicates, resulting in incorrect aggregated values. For
> example, count will be double the value it should be.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
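The mechanism in the issue description can be illustrated with a toy model in plain Java. This is a hypothetical sketch, not Kafka Streams code: `DuplicateJoinDemo`, `ListAggregate` and `innerJoin` are invented names. It assumes, as the description states, that the same key was updated in both tables and that each table's cache is flushed separately, with each flush triggering the join against the other table's already-updated value. It also assumes the downstream aggregate applies each change by removing the old value and adding the new one, in the spirit of an adder/subtractor pair. With a list aggregate like the one in the comment, one logical join result ends up in the list twice, so a count derived from it is doubled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of KAFKA-4609; none of these names are Kafka Streams API.
public class DuplicateJoinDemo {

    // Downstream aggregate that collects join results into a list, like the
    // aggregate in the comment. An update removes the old value and adds the new one.
    static final class ListAggregate {
        final List<String> values = new ArrayList<>();

        void apply(String newValue, String oldValue) {
            if (oldValue != null) {
                values.remove(oldValue);
            }
            if (newValue != null) {
                values.add(newValue);
            }
        }
    }

    // Inner join: no result unless both sides have a value.
    static String innerJoin(String left, String right) {
        return (left == null || right == null) ? null : left + "+" + right;
    }

    public static List<String> run() {
        // Assumption: a join lookup sees the other table's cached (already updated) value.
        Map<String, String> leftStore = new HashMap<>();
        Map<String, String> rightStore = new HashMap<>();
        ListAggregate aggregate = new ListAggregate();

        // The same key is updated in both tables before either cache is flushed.
        String key = "k";
        String leftOld = leftStore.put(key, "L1");
        String rightOld = rightStore.put(key, "R1");

        // Left cache flush triggers the join against the right table's current value.
        aggregate.apply(innerJoin(leftStore.get(key), rightStore.get(key)),
                        innerJoin(leftOld, rightStore.get(key)));

        // Right cache flush independently triggers the join again for the same key.
        aggregate.apply(innerJoin(leftStore.get(key), rightStore.get(key)),
                        innerJoin(leftStore.get(key), rightOld));

        return aggregate.values;
    }

    public static void main(String[] args) {
        List<String> result = run();
        // One logical join result, but it was added to the aggregate twice.
        System.out.println(result + " count=" + result.size());
    }
}
```

Running `main` prints `[L1+R1, L1+R1] count=2`: both flushes see no previous join result (each side's old value joins against a missing value), so each one adds the same result, where a single emission would have given a count of 1.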