Hi, Taras

> But checkpoint data size for join task is permanently increasing despite the 
> watermarks on the tables and "Low watermark" mark in UI. 
> As far as I understand outdated records from both tables must be dropped from 
> checkpoint after 2 hours, but looks like it holds all job state since the 
> beginning.

What kind of join do you use?

If you’re using interval join [1], the outdated data in join operator state 
will be cleaned after interval join time period + watermark interval,and your 
understanding is wright in this case.

If you’re using regular join, Flink regular join will keep all data in state to 
ensure join semantics is same with classical DB which means the operator state 
will continuous to increase,  you can set the option table.exec.state.ttl [2] 
to clean the outdated data in state according your business. 


Best,
Leonard
[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/streaming/joins.html#interval-joins
 
<https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/streaming/joins.html#interval-joins>
[2] 
https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/config.html#table-exec-state-ttl
 
<https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/config.html#table-exec-state-ttl>

Reply via email to