[ https://issues.apache.org/jira/browse/FLINK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064864#comment-17064864 ]
Jark Wu commented on FLINK-16581:
---------------------------------

Hi [~lsy], thanks for the contribution. I glanced at the pull request you submitted, but I would like to discuss the approach here first.

Currently, there are two ways to clean up state:

1) Registering a processing-time timer and cleaning up entries when the timer fires.
 - pros: can clean up multiple states at the same time (the states stay consistent).
 - cons: timer space depends on the key size, which may lead to OOM (with heap timers).
 - used in Group Aggregation, Over Aggregation, TopN.

2) Using the {{StateTtlConfig}} provided by DataStream.
 - pros: decouples the state TTL logic from the record processing and is easy to program (take a look at the old planner's NonWindowJoin, which bundles the TTL timestamp with records in MapState).
 - cons: can't clean up multiple states at the same time.
 - used in Stream-Stream Joins.

Personally, I prefer using {{StateTtlConfig}}, which leverages the ability of DataStream instead of reinventing the same thing. Besides, it can help improve the readability of the code (and reduce bugs).

What do you think [~lzljs3620320] [~lsy]?

> Minibatch deduplication lack state TTL
> --------------------------------------
>
>                 Key: FLINK-16581
>                 URL: https://issues.apache.org/jira/browse/FLINK-16581
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.9.2, 1.10.0
>            Reporter: Jingsong Lee
>            Assignee: dalongliu
>            Priority: Critical
>             Fix For: 1.9.3, 1.10.1, 1.11.0
>
>
> This leads to OOM in long-running streaming jobs.
> We should check all unbounded operations; none of them should lack state TTL.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
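
To make the trade-off of approach 2) in the comment above concrete, here is a minimal plain-Java sketch of per-state TTL. It is not Flink code: `TtlValueStore` and its methods are hypothetical names invented for illustration, and the caller passes the clock in explicitly so the behavior is deterministic. The semantics are only assumed to resemble a {{StateTtlConfig}} that refreshes the timestamp on write and never returns expired values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for one keyed state with a time-to-live; not a Flink class. */
class TtlValueStore<K, V> {

    /** A stored value together with the time of its last write. */
    private static final class Entry<V> {
        final V value;
        final long lastWriteMillis;

        Entry(V value, long lastWriteMillis) {
            this.value = value;
            this.lastWriteMillis = lastWriteMillis;
        }
    }

    private final long ttlMillis;
    private final Map<K, Entry<V>> entries = new HashMap<>();

    TtlValueStore(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Writes a value and refreshes its TTL timestamp. */
    void put(K key, V value, long nowMillis) {
        entries.put(key, new Entry<>(value, nowMillis));
    }

    /** Returns the value, or null if it is absent or expired; expired entries are dropped lazily. */
    V get(K key, long nowMillis) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis - entry.lastWriteMillis >= ttlMillis) {
            entries.remove(key); // cleanup happens on access; no timer is registered
            return null;
        }
        return entry.value;
    }

    /** Number of entries currently held, including expired ones not yet accessed. */
    int size() {
        return entries.size();
    }
}
```

With two stores sharing a TTL of 1000 ms, a key written to the first at t=0 and to the second at t=600 is gone from the first at t=1200 but still visible in the second. This is the "can't clean up multiple states at the same time" drawback: each state expires on its own schedule, whereas a single timer could clear both together. In exchange, the expiry logic lives entirely inside the store and the record-processing code never handles timestamps.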