Re: Watermarks in Event Time Temporal Join

Leonard Xu Wed, 28 Apr 2021 19:37:34 -0700

Thanks for your example, Maciej

I can explain more about the design.
> Let's have events.
> S1, id1, v1, 1
> S2, id1, v2, 1
> 
> Nothing is happening as none of the streams have reached the watermark.
> Now let's add
> S2, id2, v2, 101
> This should trigger join for id1 because we have all the knowledge to
> perform this join (we know that the watermark for id1 record was
> reached).


Based on this example, suppose the left stream events are
S1, id1, v1, 1
S1, id1, v2, 101
S1, id1, v3, 102
S1, id1, v4, 99 // assume the watermark is defined as `ts - 3`
Should an out-of-order event such as the late v4(99) be ignored or not? The 
versioned table's watermark is based only on the versioned table's data, not 
on both sides, so it cannot represent the left stream. That is why we now use 
the left stream's watermark to handle out-of-order data.
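
To make the lateness of v4(99) concrete, here is a minimal Python sketch (not Flink code) that replays the left stream events above. It assumes the watermark is the maximum `ts - 3` seen so far and that an event counts as late when its timestamp is less than or equal to the current watermark; the function name `classify_left_events` is made up for this illustration.

```python
def classify_left_events(events, delay=3):
    """Return (value, ts, is_late) per event, using only the left stream's watermark.

    Assumption: watermark = max(ts - delay) over the events seen so far, and an
    event is late when its ts is <= the watermark at the time it arrives.
    """
    watermark = float("-inf")
    classified = []
    for value, ts in events:
        is_late = ts <= watermark
        classified.append((value, ts, is_late))
        # Advance the watermark only after classifying the current event.
        watermark = max(watermark, ts - delay)
    return classified


# Left stream events from the example, in arrival order.
left_events = [("v1", 1), ("v2", 101), ("v3", 102), ("v4", 99)]
result = classify_left_events(left_events)
for value, ts, is_late in result:
    print(value, ts, "late" if is_late else "on time")
```

After v3(102) the left watermark is 99, so v4(99) is the only event classified as late. The versioned table's watermark plays no part in this decision, which is the point made above.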
 
Continuing with your example, assume the right stream events are 
S2, id2, v1, 101
S2, id3, v1, 101
S2, id3, v2, 102
S2, id3, v3, 103
S2, id3, v4, 104
How do we clean up the old versions, e.g. id3 (v1~v3)? If you clean them up 
based only on the versioned table's watermark (e.g. the versioned table's 
watermark is 105), the rows (id3, 101) and (id3, 102) from the left stream 
cannot find the correct version, right?
Now the left stream's watermark is used to clean up the outdated data and to 
ensure that every row in the left stream can find the correct version.
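
The cleanup problem can be sketched the same way. The Python below is an illustration only, not the Flink operator: it keeps the versions of id3 as a sorted list, prunes them against a given watermark, and then looks up the version for each left row. The helper names `prune` and `lookup` and the left watermark value of 100 are assumptions made for this sketch.

```python
from bisect import bisect_right


def prune(versions, watermark):
    """Keep the newest version with ts <= watermark plus every newer version.

    `versions` is a list of (ts, value) pairs sorted by ts.
    """
    ts_list = [ts for ts, _ in versions]
    idx = bisect_right(ts_list, watermark)  # number of versions with ts <= watermark
    return versions[max(idx - 1, 0):]


def lookup(versions, row_ts):
    """Return the value of the newest version with ts <= row_ts, or None."""
    ts_list = [ts for ts, _ in versions]
    idx = bisect_right(ts_list, row_ts)
    return versions[idx - 1][1] if idx > 0 else None


# Versions of id3 in the right stream (versioned table).
id3_versions = [(101, "v1"), (102, "v2"), (103, "v3"), (104, "v4")]
left_row_ts = [101, 102]  # left rows (id3, 101) and (id3, 102)

# Cleanup driven only by the versioned table's watermark (105).
by_right = prune(id3_versions, 105)
right_matches = [lookup(by_right, ts) for ts in left_row_ts]

# Cleanup driven by the left stream's watermark (assumed to be 100 here).
by_left = prune(id3_versions, 100)
left_matches = [lookup(by_left, ts) for ts in left_row_ts]

print(right_matches)
print(left_matches)
```

With the versioned table's watermark of 105 only v4 survives, so neither left row finds a version. With the assumed left watermark of 100 nothing is pruned yet, and the rows match v1 and v2 respectively.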

Best,
Leonard
