Hi Ajay,

>From your description, I think watermarks[1], which indicates all earlier 
>events have been arrived, might meet your requests in a way. But this means 
>you should use windows and have event-time in your stream job.

If you don't want to introduce the concept of window, I think you can use 
'KeyedStateBackend#applyToAllKeys' to manually clear the target state when you 
see the "last" element, and record the cleared state name into a pre-definied 
operator state, so that arrived late elements could be skipped. Just be careful 
to not let the list in operator state not so large, e.g. only keep a fixed size 
of expired states.

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_timestamps_watermarks.html

Best
Yun Tang
________________________________
From: Aggarwal, Ajay <[email protected]>
Sent: Tuesday, February 5, 2019 22:54
To: [email protected]
Subject: late element and expired state


Hello,



I have some questions regarding best practices to deal with ever expanding 
state with KeyBy(). In my input stream I will continue to see new keys. And I 
am using Keyed state. How do I keep the total state in limit? After reading the 
flink documentation and some blogs I am planning to use following :



  *   When I know I have seen the “last” element associated with a key, I can 
manually clear the state
  *   I can also use the TTL on state and expire it and garbage collect it 
(with next full snapshot). This is useful when I never see the “last” element.



Is that the right strategy?



Also if an element arrives late (after the state has been cleared), how do I 
detect that the state has been cleared/expired so I can skip these late 
elements ? Is there an API that will give you the hint about cleared/expired 
state?



Thanks.



Ajay

Reply via email to