[ https://issues.apache.org/jira/browse/FLINK-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065782#comment-17065782 ]

Jark Wu commented on FLINK-16581:
---------------------------------

Hi [~lzljs3620320], that's true: a lot of operators still use timers.
That's because {{StateTtlConfig}} was only introduced in recent releases and we
haven't had much time to refactor the existing operators. We use
{{StateTtlConfig}} because it simplifies the implementation A LOT.

> can't cleanup multiple states at the same time
For example, COUNT DISTINCT has 2 states: the {{MapState}} stores all
distinct values, and {{count}} stores the size of the MapState. If we use
{{StateTtlConfig}}, some entries of the MapState may expire while {{count}}
does not. If an expired value comes in again, it is treated as new and
{{count}} is incremented by mistake. If we use a timer, the MapState and
{{count}} are reset together.
But I think that's not a big problem, because the result is not correct
anyway once TTL expiration happens.
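To make the COUNT DISTINCT case concrete, here is a toy simulation in plain Java of one key's state under both strategies. It is not Flink code; the class and method names are hypothetical. It assumes each piece of state refreshes its TTL timestamp only on writes (like the default {{OnCreateAndWrite}} update type of {{StateTtlConfig}}) and that the cleanup timer is re-registered on every record.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy simulation (hypothetical names, not Flink code) of one key's COUNT DISTINCT state.
// Time is in abstract units; TTL = 10.
public class CountDistinctCleanupSketch {
    static final long TTL = 10;

    // Per-state TTL: every map entry and the count expire independently.
    // Assumes timestamps are refreshed only on writes (like OnCreateAndWrite).
    static final class PerStateTtl {
        final Map<String, Long> seen = new HashMap<>(); // value -> last write time
        long count = 0;
        long countWrittenAt = 0;

        long add(String value, long now) {
            // Expired entries become invisible, as with TTL-filtered reads.
            seen.values().removeIf(writtenAt -> now - writtenAt > TTL);
            if (now - countWrittenAt > TTL) {
                count = 0;
            }
            if (!seen.containsKey(value)) {
                seen.put(value, now);
                count++;
                countWrittenAt = now;
            }
            return count;
        }
    }

    // Timer-based cleanup: one deadline per key clears the set and the count together.
    static final class TimerCleanup {
        final Set<String> seen = new HashSet<>();
        long count = 0;
        long cleanupAt = Long.MAX_VALUE;

        long add(String value, long now) {
            if (now >= cleanupAt) { // the cleanup timer fired before this record arrived
                seen.clear();
                count = 0;
            }
            cleanupAt = now + TTL; // re-register the timer on every record
            if (seen.add(value)) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) {
        PerStateTtl ttl = new PerStateTtl();
        TimerCleanup timer = new TimerCleanup();
        String[] values = {"a", "b", "a"};
        long[] times = {0, 8, 15};
        for (int i = 0; i < values.length; i++) {
            System.out.printf("t=%d value=%s perStateTtl=%d timer=%d%n",
                times[i], values[i], ttl.add(values[i], times[i]), timer.add(values[i], times[i]));
        }
    }
}
```

For the last record the per-state TTL variant reports 3 while the timer variant reports 2: "a" expired from the map but {{count}} did not, so "a" was counted twice.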

In {{RetractableTopNFunction}}, there are also multiple states: the
{{dataState}}, which stores all input data, and the {{treeMap}}, which stores
the TopN elements in order.
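A similar toy model shows how two states that must agree can drift apart when each one has its own TTL. This is hypothetical and not the actual {{RetractableTopNFunction}} code. It assumes {{dataState}} maps a sort key to its rows and expires per entry, while {{treeMap}} is a single value that expires as a whole.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, not Flink code) of two per-key states that must agree:
// dataState (sort key -> rows) and treeMap (sort key -> row count, kept in order).
public class TopNStateSketch {
    static final long TTL = 10;

    final Map<Integer, List<String>> dataState = new HashMap<>();
    final Map<Integer, Long> dataWrittenAt = new HashMap<>(); // per-entry TTL timestamps
    final TreeMap<Integer, Integer> treeMap = new TreeMap<>();
    long treeMapWrittenAt = 0; // single TTL timestamp for the whole treeMap value

    void expire(long now) {
        // Each dataState entry expires on its own...
        dataWrittenAt.entrySet().removeIf(e -> {
            boolean expired = now - e.getValue() > TTL;
            if (expired) {
                dataState.remove(e.getKey());
            }
            return expired;
        });
        // ...while treeMap expires as a whole.
        if (now - treeMapWrittenAt > TTL) {
            treeMap.clear();
        }
    }

    void insert(int sortKey, String row, long now) {
        expire(now);
        dataState.computeIfAbsent(sortKey, k -> new ArrayList<>()).add(row);
        dataWrittenAt.put(sortKey, now);
        treeMap.merge(sortKey, 1, Integer::sum);
        treeMapWrittenAt = now;
    }

    // First row of each of the first n sort keys in treeMap order;
    // a placeholder marks keys that treeMap lists but dataState no longer has.
    List<String> topN(int n, long now) {
        expire(now);
        List<String> out = new ArrayList<>();
        for (int key : treeMap.keySet()) {
            if (out.size() >= n) {
                break;
            }
            List<String> rows = dataState.get(key);
            out.add(rows == null ? "<missing row for key " + key + ">" : rows.get(0));
        }
        return out;
    }
}
```

For example, inserting sort key 1 at t=0 and sort key 2 at t=8, then calling {{topN(2, 15)}}, yields a placeholder for key 1: its {{dataState}} entry has expired, but {{treeMap}} still lists it.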

> Minibatch deduplication lack state TTL
> --------------------------------------
>
>                 Key: FLINK-16581
>                 URL: https://issues.apache.org/jira/browse/FLINK-16581
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime
>    Affects Versions: 1.9.2, 1.10.0
>            Reporter: Jingsong Lee
>            Assignee: dalongliu
>            Priority: Critical
>             Fix For: 1.9.3, 1.10.1, 1.11.0
>
>
> This leads to OOM in long-running streaming jobs.
> We should check all unbounded operations; none of them should lack state TTL.



