[ https://issues.apache.org/jira/browse/KAFKA-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093842#comment-17093842 ]
Sophie Blee-Goldman edited comment on KAFKA-9923 at 4/27/20, 7:00 PM:
----------------------------------------------------------------------
The root cause is the same, but the resulting problems are different: caching with duplicates doesn't really seem to make sense (should we even allow that combination?), whereas changelogging with duplicates definitely does. Unfortunately, the latter appears to be broken in a way that has correctness implications, since we may lose records during compaction.

was (Author: ableegoldman):
The root cause is the same, but the implications are different: caching with duplicates doesn't really seem to make sense, whereas changelogging with duplicates definitely does. However, this seems to have correctness implications, since we may lose records during compaction.

> Join window store duplicates can be compacted in changelog
> -----------------------------------------------------------
>
>                 Key: KAFKA-9923
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9923
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Priority: Critical
>
> Stream-stream joins use the regular `WindowStore` implementation, but with `retainDuplicates` set to true. To allow for duplicates while reusing the same unique-key underlying stores, we wrap the key with an incrementing sequence number before inserting it.
>
> This wrapping occurs at the innermost layer of the store hierarchy, which means the duplicates must first pass through the changelogging layer. At that point the keys are still identical, so we end up sending the records to the changelog without distinct keys and may therefore lose the older of the duplicates during compaction.
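To make the layering concrete, here is a minimal, self-contained sketch of the mechanism described above. The class names (`InnerWindowStore`, `ChangeLoggingWindowStore`) and the in-memory maps are illustrative stand-ins, not the actual Kafka Streams internals: the inner layer appends an incrementing sequence number to the store key, while the changelogging layer above it logs the record under the original, un-wrapped key, so two duplicates collapse onto a single key in a compacted changelog topic.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch (not the real Kafka Streams classes) of why duplicates
// are distinct in the window store but not in the changelog.
public class DuplicateKeySketch {

    // Innermost layer: wraps the key with an incrementing sequence number
    // so that duplicates map to distinct store keys.
    static class InnerWindowStore {
        private int seqnum = 0;
        final Map<ByteBuffer, byte[]> store = new HashMap<>();

        void put(byte[] key, byte[] value, long windowStart) {
            seqnum++; // retainDuplicates == true: bump the sequence number per put
            ByteBuffer storeKey = ByteBuffer.allocate(key.length + Long.BYTES + Integer.BYTES);
            storeKey.put(key).putLong(windowStart).putInt(seqnum);
            storeKey.flip();
            store.put(storeKey, value);
        }
    }

    // Changelogging layer: sits *above* the inner store, so it logs the
    // original key before any sequence number has been attached.
    static class ChangeLoggingWindowStore {
        final InnerWindowStore inner = new InnerWindowStore();
        final Map<String, byte[]> changelog = new HashMap<>(); // stands in for a compacted topic

        void put(byte[] key, byte[] value, long windowStart) {
            inner.put(key, value, windowStart);
            // Both duplicates are logged under the same key; a compacted
            // changelog topic keeps only the latest record per key.
            changelog.put(new String(key) + "@" + windowStart, value);
        }
    }

    public static void main(String[] args) {
        ChangeLoggingWindowStore store = new ChangeLoggingWindowStore();
        store.put("k".getBytes(), "v1".getBytes(), 100L);
        store.put("k".getBytes(), "v2".getBytes(), 100L); // duplicate key, same window

        System.out.println("inner store entries:     " + store.inner.store.size()); // 2
        System.out.println("distinct changelog keys: " + store.changelog.size());   // 1
    }
}
```

Running the sketch shows two entries in the inner store but only one distinct changelog key, which is the record loss the ticket describes once the changelog topic is compacted.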