Hey, As for the MATCH_RECOGNIZE clause, I highly recommend applying a time constraint[1]. The idle state retention time does not apply to the MATCH_RECOGNIZE, but you can think of the time constraint as something similar, but it is closer to the actual query logic.
If you are hitting FLINK-15160 unfortunately I don't have a good solution for it. The only thing that comes to my mind is adding a heartbeat event to the event stream to prune the partial matches, but I understand it is quite invasive. If you would be willing to help fixing the problem in FLINK, I could also help review it and give pointers how it could be done. Best, Dawid [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/streaming/match_recognize.html#time-constraint On 26/01/2021 17:39, Dcosta, Agnelo (HBO) wrote: > > Hi Dawid, thanks for the clarification and it helps a lot. > Reply to couple of points : > > what is causing the state to grow? > We are using flink SQL and have 5 pattern match queries , 3 group by > tumble windows. State growth over time is primarily coming from > pattern match queries. > > Is it ever growing keyspace? > Yes. By design our keyspace is ever growing. The expectation is that > messages for one key will come in for couple of hours, then stop > coming in. We would never see messages from that key again. New keys > are constantly coming in. > > Is it that a watermark does not progress? > Watermark on the subtask level is constantly updating and is in sync > with other subtasks. We have not seen any issues with watermark > updating as such. > > Looking through mailing list archive, our problem seems similar to > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Memory-in-CEP-queries-keep-increasing-td31045.html > https://issues.apache.org/jira/browse/FLINK-15160 : Clean up is not > applied if there are no incoming events for a key. > > By design we can have partial matched states/matches in pattern match > queries. And key space is such that no new event comes in for those > partial matches. > > thanks. > > > > *From: *Dawid Wysakowicz <dwysakow...@apache.org> > *Date: *Tuesday, January 26, 2021 at 3:14 AM > *To: *Dcosta, Agnelo (HBO) <agnelo.dco...@hbo.com>, > user@flink.apache.org <user@flink.apache.org> > *Subject: *Re: Difference between table.exec.source.idle-timeout and > setIdleStateRetentionTime ? > > **External Email received from: dwysakow...@apache.org ** > > > > Hi, > > The difference is that the *table.exec.source.idle-timeout *is used > for dealing with source idleness[1]. It is a problem that a watermark > cannot advance if some of the partition become idle (do not produce > any records). Watermark is always the minimum of watermarks of all > input partitions. The setting makes flink ignore certain partitions in > the calculation after the time threshold is reached. > > The IdleStateRetention is Table API specific. As described in the link > you provided it removes entries from a state for keys that were not > seen for a given time threshold. > > As for your issue, I'd recommend first investigating what is causing > the state to grow. Is it ever growing keyspace? Is it that a watermark > does not progress (this should manifest in results as well). Or is it > something else. > > Best, > > Dawid > > > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/event_timestamps_watermarks.html#dealing-with-idle-sources > > On 25/01/2021 20:12, Dcosta, Agnelo (HBO) wrote: > > Hi, > > What is the difference between *table.exec.source.idle-timeout* and > *setIdleStateRetentionTime* ? > > table.exec.source.idle-timeout: > https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/config.html#table-exec-source-idle-timeout > > > > setIdleStateRetentionTime: > https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/query_configuration.html#idle-state-retention-time > > > > Some context: > Hi we are using flink 1.12. > Our checkpoint size is constantly increasing once app is deployed. > After performing a restart, checkpoint size goes back to expected size. > Looking at actual checkpoint files generated, it seems our app is > holding on to state/events since the time app started up. > Based on our sql, the maximum time we would need to hold state is 10 > minutes. > > > > This e-mail is intended only for the use of the addressees. Any > copying, forwarding, printing or other use of this e-mail by persons > other than the addressees is not authorized. This e-mail may contain > information that is privileged, confidential and exempt from > disclosure. If you are not the intended recipient, please notify us > immediately by return e-mail (including the original message in your > reply) and then delete and discard all copies of the e-mail. Thank you. > HB75 >
signature.asc
Description: OpenPGP digital signature