Thanks for your example, Maciej.

I can explain more about the design.

> Let's have events.
> S1, id1, v1, 1
> S2, id1, v2, 1
>
> Nothing is happening as none of the streams have reached the watermark.
> Now let's add
> S2, id2, v2, 101
> This should trigger join for id1 because we have all the knowledge to
> perform this join (we know that the watermark for id1 record was
> reached).

Based on this example, suppose the left stream events are:

S1, id1, v1, 1
S1, id1, v2, 101
S1, id1, v3, 102
S1, id1, v4, 99   // assume the watermark is defined as `ts - 3`

Should out-of-order data such as the late event v4 (99) be ignored or not? The versioned table's watermark is based only on the versioned table's data, not on both sides, so it cannot represent the progress of the left stream. That is why we currently use the left stream's watermark to handle out-of-order data.

Continuing with your example, assume the right stream events are:

S2, id2, v1, 101
S2, id3, v1, 101
S2, id3, v2, 102
S2, id3, v3, 103
S2, id3, v4, 104

How do we clean up the old versions, e.g. id3 (v1~v3)? If you clean them up based only on the versioned table's watermark (e.g. the versioned table's watermark is 105), then the rows (id3, 101) and (id3, 102) from the left stream cannot find the correct version, right? Currently the left stream's watermark is used to clean up outdated data, which ensures that every row in the left stream can find its correct version.

Best,
Leonard
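
To make the two points above concrete, here is a rough Python sketch. It is not the actual operator code: the function names (`is_late`, `lookup_version`, `cleanup`) and the list-of-tuples state layout are hypothetical, and it assumes that an event is late when its timestamp is less than or equal to the current watermark.

```python
from bisect import bisect_right

def is_late(ts, watermark):
    # Assumption: a watermark w promises no more events with timestamp <= w.
    return ts <= watermark

def lookup_version(versions, ts):
    # versions: list of (version_ts, value), sorted by version_ts.
    # Returns the value of the latest version with version_ts <= ts, else None.
    idx = bisect_right([t for t, _ in versions], ts)
    return versions[idx - 1][1] if idx > 0 else None

def cleanup(versions, watermark):
    # Keep the latest version with version_ts <= watermark plus all newer
    # versions; anything older can no longer be matched by a row whose
    # timestamp is greater than the given watermark.
    idx = bisect_right([t for t, _ in versions], watermark)
    return versions[max(idx - 1, 0):]

# Versions of id3 from the right (versioned table) stream in the example.
ID3_VERSIONS = [(101, "v1"), (102, "v2"), (103, "v3"), (104, "v4")]

# Left stream: the max timestamp seen is 102, so the watermark is 102 - 3 = 99.
LEFT_WATERMARK = 102 - 3
v4_is_late = is_late(99, LEFT_WATERMARK)

# Cleaning with the versioned table's watermark (105) drops v1~v3 too early.
cleaned_by_right = cleanup(ID3_VERSIONS, 105)
# Cleaning with the left stream's watermark (99) keeps every version.
cleaned_by_left = cleanup(ID3_VERSIONS, LEFT_WATERMARK)
```

With the right-side watermark only v4 survives, so a left row (id3, 101) finds no version, while with the left-side watermark the same row still resolves to v1.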