[ https://issues.apache.org/jira/browse/KAFKA-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924238#comment-15924238 ]
Damian Guy edited comment on KAFKA-4609 at 3/14/17 2:07 PM:
------------------------------------------------------------

[~miguno] It is because the caches are flushed independently and both KTables trigger the join. For example, assume you have {{table1.join(table2)}} and within a single commit interval you receive:

table1 A:1
table2 A:A

When the stores are flushed on the commit interval, we first flush the store for table1; this triggers the join and produces A:1:A. We then flush the store for table2; this triggers the join again and also produces A:1:A.

> KTable/KTable join followed by groupBy and aggregate/count can result in
> incorrect results
> ------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4609
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4609
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.1, 0.10.2.0
>            Reporter: Damian Guy
>            Assignee: Damian Guy
>              Labels: architecture
>
> When caching is enabled, KTable/KTable joins can result in duplicate values
> being emitted. This occurs when the same key has been updated in both
> tables. Each table is flushed independently, and each flush triggers the
> join, so you get two results for the same key.
> If we subsequently perform a groupBy followed by an aggregate operation, these
> duplicates are processed as well, resulting in incorrect aggregated values. For
> example, a count will be double the value it should be.
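
The sequence described above can be reproduced with a small standalone model. This is only a sketch in plain Java, not Kafka Streams code: the class JoinFlushSketch and the methods flushBoth and countByKey are hypothetical names invented for illustration, the two maps stand in for the cached stores, and the downstream count is assumed to treat every emitted join result as a new addition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the behaviour described in this issue.
// It does NOT use the Kafka Streams API; all names here are hypothetical.
class JoinFlushSketch {

    /** Join results emitted when both cached tables are flushed one after the other. */
    static List<String> flushBoth(Map<String, String> table1, Map<String, String> table2) {
        List<String> emitted = new ArrayList<>();
        // Flush table1: each dirty key triggers the join against table2.
        for (Map.Entry<String, String> e : table1.entrySet()) {
            String other = table2.get(e.getKey());
            if (other != null) {
                emitted.add(e.getKey() + ":" + e.getValue() + ":" + other);
            }
        }
        // Flush table2: each dirty key triggers the join again, against table1.
        for (Map.Entry<String, String> e : table2.entrySet()) {
            String other = table1.get(e.getKey());
            if (other != null) {
                emitted.add(e.getKey() + ":" + other + ":" + e.getValue());
            }
        }
        return emitted;
    }

    /** Assumed downstream count: every emitted record is counted as a new addition. */
    static Map<String, Long> countByKey(List<String> emitted) {
        Map<String, Long> counts = new HashMap<>();
        for (String record : emitted) {
            String key = record.substring(0, record.indexOf(':'));
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> table1 = new HashMap<>();
        Map<String, String> table2 = new HashMap<>();
        table1.put("A", "1"); // table1 A:1
        table2.put("A", "A"); // table2 A:A

        List<String> emitted = flushBoth(table1, table2);
        System.out.println(emitted);             // prints [A:1:A, A:1:A]
        System.out.println(countByKey(emitted)); // prints {A=2}
    }
}
```

In this model a single logical join result for key A is emitted twice, so the count comes out as 2 where 1 would be expected, which matches the doubling described in the issue.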