Hi Ajay, >From your description, I think watermarks[1], which indicates all earlier >events have been arrived, might meet your requests in a way. But this means >you should use windows and have event-time in your stream job.
If you don't want to introduce the concept of window, I think you can use 'KeyedStateBackend#applyToAllKeys' to manually clear the target state when you see the "last" element, and record the cleared state name into a pre-definied operator state, so that arrived late elements could be skipped. Just be careful to not let the list in operator state not so large, e.g. only keep a fixed size of expired states. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_timestamps_watermarks.html Best Yun Tang ________________________________ From: Aggarwal, Ajay <[email protected]> Sent: Tuesday, February 5, 2019 22:54 To: [email protected] Subject: late element and expired state Hello, I have some questions regarding best practices to deal with ever expanding state with KeyBy(). In my input stream I will continue to see new keys. And I am using Keyed state. How do I keep the total state in limit? After reading the flink documentation and some blogs I am planning to use following : * When I know I have seen the “last” element associated with a key, I can manually clear the state * I can also use the TTL on state and expire it and garbage collect it (with next full snapshot). This is useful when I never see the “last” element. Is that the right strategy? Also if an element arrives late (after the state has been cleared), how do I detect that the state has been cleared/expired so I can skip these late elements ? Is there an API that will give you the hint about cleared/expired state? Thanks. Ajay
