[ 
https://issues.apache.org/jira/browse/FLINK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064864#comment-17064864
 ] 

Jark Wu commented on FLINK-16581:
---------------------------------

Hi [~lsy], thanks for the contribution. I glanced at the pull request you 
submitted, but I would like to discuss the approach here. 

Currently, there are 2 ways to clean up state. 
1) registering a processing-time timer, and cleaning up entries when the 
timer fires.
  - pros: can clean up multiple states at the same time (state stays 
consistent)
  - cons: timer space depends on the key size, which may lead to OOM (heap 
timer). 
  - used in Group Aggregation, Over Aggregation, TopN
2) using the {{StateTtlConfig}} provided by DataStream.
  - pros: decouples the state TTL logic from the record processing, easy to 
program (take a look at the old planner's NonWindowJoin, which bundles the TTL 
timestamp with records in MapState).
  - cons: can't clean up multiple states at the same time.
  - used in Stream-Stream Joins.
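
To make the trade-off between the two approaches concrete, here is a toy model 
in plain Java. It is only a sketch and not Flink code: the classes 
{{TimerCleanup}} and {{TtlState}} are hypothetical, time is passed in as a 
plain long, and nothing here reflects how the operators or the pull request 
are actually implemented.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Flink code): timer-based cleanup vs. per-entry TTL. */
public class StateCleanupSketch {

    /** Approach 1: one cleanup timer per key; firing it clears every state of that key. */
    static class TimerCleanup {
        final Map<String, Long> count = new HashMap<>();      // first state
        final Map<String, String> lastRow = new HashMap<>();  // second state
        final Map<String, Long> timers = new HashMap<>();     // one timer entry per key

        void update(String key, String row, long now, long retention) {
            count.merge(key, 1L, Long::sum);
            lastRow.put(key, row);
            timers.put(key, now + retention); // push the cleanup time forward
        }

        /** Fires every timer that is due and drops both states of its key together. */
        void advanceTo(long now) {
            timers.entrySet().removeIf(timer -> {
                if (timer.getValue() > now) {
                    return false; // not due yet
                }
                count.remove(timer.getKey());
                lastRow.remove(timer.getKey());
                return true;
            });
        }
    }

    /** Approach 2: every entry carries its own expiry, so each state expires on its own. */
    static class TtlState<V> {
        private final long ttl;
        private final Map<String, V> values = new HashMap<>();
        private final Map<String, Long> expiry = new HashMap<>();

        TtlState(long ttl) {
            this.ttl = ttl;
        }

        void put(String key, V value, long now) {
            values.put(key, value);
            expiry.put(key, now + ttl); // TTL is refreshed on write only
        }

        /** Returns null for missing or expired entries and drops expired ones lazily. */
        V get(String key, long now) {
            Long deadline = expiry.get(key);
            if (deadline == null || deadline <= now) {
                values.remove(key);
                expiry.remove(key);
                return null;
            }
            return values.get(key);
        }
    }

    public static void main(String[] args) {
        // Timer approach: both states of key "k" disappear in the same step.
        TimerCleanup timer = new TimerCleanup();
        timer.update("k", "row-1", 0L, 10L);
        timer.advanceTo(10L);
        System.out.println(timer.count.containsKey("k") + " " + timer.lastRow.containsKey("k"));

        // TTL approach: the two states were written at different times,
        // so one can be expired while the other is still readable.
        TtlState<Long> count = new TtlState<>(10L);
        TtlState<String> lastRow = new TtlState<>(10L);
        count.put("k", 1L, 0L);
        lastRow.put("k", "row-1", 5L);
        System.out.println(count.get("k", 12L) + " " + lastRow.get("k", 12L));
    }
}
```

In this model the timer variant keeps an extra map entry per key (the memory 
cost mentioned above) but clears all states of a key in one step, while the 
TTL variant needs no timers but can leave two states of the same key out of 
sync for a while.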

Personally, I prefer using {{StateTtlConfig}}, which leverages the ability of 
DataStream instead of reinventing the same thing. Besides, it can help to 
improve the readability of the code (and reduce bugs). What do you think 
[~lzljs3620320] [~lsy]?






> Minibatch deduplication lack state TTL
> --------------------------------------
>
>                 Key: FLINK-16581
>                 URL: https://issues.apache.org/jira/browse/FLINK-16581
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.9.2, 1.10.0
>            Reporter: Jingsong Lee
>            Assignee: dalongliu
>            Priority: Critical
>             Fix For: 1.9.3, 1.10.1, 1.11.0
>
>
> This leads to OOM in long-running streaming jobs.
> We should check all unbounded operations; none of them should lack state TTL.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
