[ https://issues.apache.org/jira/browse/KAFKA-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang updated KAFKA-3658:
---------------------------------
    Description:
As [~h...@pinterest.com] found out, the current validation check of {{KStreamJoinWindow}} requires the retention period to be at least twice the join window size. This check was originally meant to make the segment interval larger than the join window size, but for windowed stream-stream joins this is not necessary.

More specifically, for example, with a window size of 6, a retention period of 12, and 5 segments, the segment size will be set to 3. This means that after time 12 the first segment, [0, 3), will be dropped; then at time 13 a late record with timestamp 1 will not be accepted into the window store and will not participate in the join either.

The proposed change is to only require the retention period to be > window size, not window size * 2.

cc [~ymatsuda]

  was:
As [~h...@pinterest.com] found out, the current implementation of {{RocksDBWindowStore}} does not guarantee that a single window is located completely within one segment, and hence expiring a segment can result in partial window expiration (i.e. some records of the window are dropped, while others are still available for queries).

We need to fix this issue by taking the window size into account when setting the segment size.

Another minor issue is that the retention size should be validated correctly to be no less than the window size.


> Incorrect validation check on maintenance period with join window size
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-3658
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3658
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>              Labels: architecture
>             Fix For: 0.10.0.1
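
To make the numbers in the description's example concrete, here is a minimal Java sketch of the arithmetic and of the two validation rules being compared. It is not the actual Kafka Streams code: the class and method names are hypothetical, and the formula segment size = retention period / (number of segments - 1) is an assumption chosen because it reproduces the example's 12 / (5 - 1) = 3.

```java
// Illustrative sketch only; names are hypothetical and this is not the Kafka Streams source.
final class RetentionCheckSketch {

    // Assumption: segment size = retention / (numSegments - 1),
    // which reproduces the example in the issue: 12 / (5 - 1) = 3.
    static long segmentSize(long retentionPeriod, int numSegments) {
        if (numSegments < 2) {
            throw new IllegalArgumentException("numSegments must be at least 2");
        }
        return retentionPeriod / (numSegments - 1);
    }

    // Current rule as described in the issue: retention must be at least twice the window size.
    static boolean passesCurrentCheck(long windowSize, long retentionPeriod) {
        return retentionPeriod >= 2 * windowSize;
    }

    // Proposed rule as described in the issue: retention only needs to exceed the window size.
    static boolean passesProposedCheck(long windowSize, long retentionPeriod) {
        return retentionPeriod > windowSize;
    }

    public static void main(String[] args) {
        // Example from the issue: window size 6, retention period 12, 5 segments.
        System.out.println("segment size = " + segmentSize(12, 5));
        System.out.println("current check (6, 12): " + passesCurrentCheck(6, 12));
        // A retention period of 8 is rejected by the current rule but allowed by the proposed one.
        System.out.println("current check (6, 8): " + passesCurrentCheck(6, 8));
        System.out.println("proposed check (6, 8): " + passesProposedCheck(6, 8));
    }
}
```

With these definitions, a window size of 6 and a retention period of 8 fails the current rule but passes the proposed one, while a retention period equal to the window size is rejected by both.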