[ https://issues.apache.org/jira/browse/KAFKA-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093842#comment-17093842 ]

Sophie Blee-Goldman edited comment on KAFKA-9923 at 4/27/20, 7:00 PM:
----------------------------------------------------------------------

The root cause is the same but the resulting problems are different: caching 
with duplicates doesn't really seem to make sense (should we even allow that 
combination?), whereas changelogging + duplicates definitely does. But this 
seems to be broken in a way that has correctness implications, as we may be 
losing records during compaction.
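For context, the combination in question is what backs stream-stream joins in 
the DSL. Below is a minimal sketch of a topology that ends up using window 
stores with retainDuplicates under the hood; topic names, the window size, and 
config values are purely illustrative:

```java
// Illustrative only: a stream-stream join like this buffers both sides in
// window stores that retain duplicates. Topic names, serdes, and the window
// size are made up for the example.
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

public class JoinExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> left = builder.stream("left-topic");
        KStream<String, String> right = builder.stream("right-topic");

        // Records on either side with the same key and timestamp are
        // legitimate duplicates that the join must retain.
        left.join(right,
                  (l, r) -> l + "+" + r,
                  JoinWindows.of(Duration.ofMinutes(5)))
            .to("joined-topic");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "join-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}
```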


was (Author: ableegoldman):
The root cause is the same but the implications are different: caching with 
duplicates doesn't seem to really make sense; however, changelogging + 
duplicates definitely does, but this seems to have correctness implications as 
we may be losing records during compaction.

> Join window store duplicates can be compacted in changelog 
> -----------------------------------------------------------
>
>                 Key: KAFKA-9923
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9923
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Sophie Blee-Goldman
>            Priority: Critical
>
> Stream-stream joins use the regular `WindowStore` implementation but with 
> `retainDuplicates` set to true. To allow for duplicates while reusing the same 
> unique-key underlying stores, we simply wrap the key with an incrementing 
> sequence number before inserting it.
> This wrapping occurs at the innermost layer of the store hierarchy, which 
> means the duplicates must first pass through the changelogging layer. At that 
> point the keys are still identical, so we end up sending the records to the 
> changelog without distinct keys and may therefore lose the older of the 
> duplicates during compaction.
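
To illustrate the key-wrapping described above, here is a minimal, 
self-contained sketch; the class and helper names are hypothetical, not the 
actual Kafka Streams internals. The inner layer makes each duplicate's key 
unique with a sequence number, but the changelogging layer above it only ever 
sees the raw windowed key, so compaction keeps only the latest duplicate.

```java
// Hypothetical sketch, not the actual Kafka Streams classes: shows why
// identical changelog keys lose all but the latest duplicate under compaction.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateCompactionSketch {

    private static int seqnum = 0;

    // Innermost store layer: appends an incrementing sequence number, so every
    // duplicate gets a distinct key and all of them are retained locally.
    static String innerStoreKey(String key, long windowStart) {
        return key + "@" + windowStart + "#" + seqnum++;
    }

    // Changelogging layer: sits above the inner layer and never sees the
    // sequence number, so every duplicate is logged under the same key.
    static String changelogKey(String key, long windowStart) {
        return key + "@" + windowStart;
    }

    public static void main(String[] args) {
        List<String> innerStore = new ArrayList<>();
        // A LinkedHashMap stands in for a compacted changelog topic: for each
        // key, only the latest value survives compaction.
        Map<String, String> compactedChangelog = new LinkedHashMap<>();

        // Two records with the same key and window start: legitimate join duplicates.
        for (String value : new String[] {"v1", "v2"}) {
            innerStore.add(innerStoreKey("A", 100L) + "=" + value);
            compactedChangelog.put(changelogKey("A", 100L), value);
        }

        System.out.println("Inner store:         " + innerStore);          // [A@100#0=v1, A@100#1=v2]
        System.out.println("Compacted changelog: " + compactedChangelog);  // {A@100=v2} -> v1 is lost
    }
}
```

Restoring the store from such a compacted changelog would rebuild it with only 
one of the two duplicates, which is the correctness concern raised in the 
comment above.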



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
