[ https://issues.apache.org/jira/browse/FLINK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065782#comment-17065782 ]
Jark Wu commented on FLINK-16581:
---------------------------------

Hi [~lzljs3620320],

It's true that a lot of operators still use timers. That's because {{StateTtlConfig}} was introduced only in recent releases, and we haven't had much time to refactor the existing operators. We use {{StateTtlConfig}} because it simplifies the implementation a lot.

> can't cleanup multiple states at the same time

For example, in COUNT DISTINCT there are two states: the {{MapState}} stores all distinct values, and {{count}} stores the size of the {{MapState}}. If we use {{StateTtlConfig}}, some entries of the {{MapState}} may expire while {{count}} does not. If an expired value arrives again, it is treated as new and {{count}} is incremented by mistake. If we use a timer, the {{MapState}} and {{count}} are reset together. But I don't think that's a big problem, because the result is not correct anyway once the TTL takes effect.

In {{RetractableTopNFunction}} there are also multiple states: the {{dataState}}, which stores all input data, and the {{treeMap}}, which stores the TopN elements in order.

> Minibatch deduplication lack state TTL
> --------------------------------------
>
>                 Key: FLINK-16581
>                 URL: https://issues.apache.org/jira/browse/FLINK-16581
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.9.2, 1.10.0
>            Reporter: Jingsong Lee
>            Assignee: dalongliu
>            Priority: Critical
>             Fix For: 1.9.3, 1.10.1, 1.11.0
>
>
> This leads to OOM in long-running streaming jobs.
> We should check all unbounded operations; none of them should lack state TTL.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
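
The COUNT DISTINCT case described in the comment can be illustrated with a toy simulation. This is only a sketch in plain Python, not Flink code: the class names `TtlCountDistinct` and `TimerCountDistinct`, the `replay` helper, and the sample events are all hypothetical. It assumes that every arrival rewrites the touched map entry and the count (refreshing their TTL), and that the cleanup timer is pushed back on every arrival.

```python
class TtlCountDistinct:
    """Toy model of per-state TTL: every map entry and the count expire independently."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.seen = {}             # value -> last write time (stands in for MapState)
        self.count = 0             # stands in for the separate count state
        self.count_written = None  # last write time of the count

    def add(self, value, now):
        # The count expires only if nothing has been written for a full TTL.
        if self.count_written is not None and now - self.count_written >= self.ttl:
            self.count = 0
        last = self.seen.get(value)
        # An expired entry looks exactly like a value that was never seen.
        if last is None or now - last >= self.ttl:
            self.count += 1
        self.seen[value] = now
        self.count_written = now
        return self.count


class TimerCountDistinct:
    """Toy model of timer-based cleanup: one timer clears both states together."""

    def __init__(self, retention):
        self.retention = retention
        self.seen = set()
        self.count = 0
        self.cleanup_at = None

    def add(self, value, now):
        # The timer fired while the key was idle: reset map and count at once.
        if self.cleanup_at is not None and now >= self.cleanup_at:
            self.seen.clear()
            self.count = 0
        if value not in self.seen:
            self.seen.add(value)
            self.count += 1
        # Each arrival pushes the cleanup time back.
        self.cleanup_at = now + self.retention
        return self.count


def replay(agg, events):
    """Feed (value, timestamp) pairs to an aggregator and collect the counts."""
    return [agg.add(value, ts) for value, ts in events]


# "a" goes quiet for longer than the TTL while "b" keeps the key active.
EVENTS = [("a", 0), ("b", 5), ("b", 9), ("a", 12)]

print(replay(TtlCountDistinct(10), EVENTS))    # [1, 2, 2, 3]
print(replay(TimerCountDistinct(10), EVENTS))  # [1, 2, 2, 2]
```

In the per-state TTL model the entry for "a" expires while the count stays alive, so the returning "a" is counted a second time. In the timer model nothing is cleared until the key as a whole has been idle, and then both states are cleared together.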