Re: Difference between table.exec.source.idle-timeout and setIdleStateRetentionTime ?

Dawid Wysakowicz Wed, 27 Jan 2021 00:41:24 -0800

Hey,

As for the MATCH_RECOGNIZE clause, I highly recommend applying a time
constraint[1]. The idle state retention time does not apply to
MATCH_RECOGNIZE, but you can think of the time constraint as something
similar that is closer to the actual query logic.
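
To illustrate the time constraint, here is a minimal sketch that builds
such a query as a string. The table name (events), the columns
(session_id, event_time, event_type), the two-step pattern and the
class name are all hypothetical and not taken from the actual job; the
10-minute value only mirrors the maximum retention mentioned in the
original question.

```java
/** Sketch: builds a MATCH_RECOGNIZE query with a WITHIN time constraint. */
final class TimeConstrainedMatch {

    // Hypothetical table and column names; replace them with your own schema.
    static String query(int withinMinutes) {
        return String.join("\n",
            "SELECT *",
            "FROM events",
            "MATCH_RECOGNIZE (",
            "  PARTITION BY session_id",
            "  ORDER BY event_time",
            "  MEASURES",
            "    A.event_time AS start_time,",
            "    B.event_time AS end_time",
            "  ONE ROW PER MATCH",
            "  AFTER MATCH SKIP PAST LAST ROW",
            // The WITHIN clause bounds how long a partial match may stay open.
            "  PATTERN (A B) WITHIN INTERVAL '" + withinMinutes + "' MINUTE",
            "  DEFINE",
            "    A AS A.event_type = 'start',",
            "    B AS B.event_type = 'end'",
            ")");
    }
}
```

The resulting string could then be passed to
TableEnvironment.sqlQuery(...), for example
tableEnv.sqlQuery(TimeConstrainedMatch.query(10)).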


If you are hitting FLINK-15160, unfortunately I don't have a good
solution for it. The only thing that comes to mind is adding a
heartbeat event to the event stream to prune the partial matches, but I
understand that is quite invasive.
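
The heartbeat idea can be pictured with a toy model. This is only a
sketch under simplifying assumptions, not Flink's CEP operator: the
class and method names are made up, partial matches are reduced to
their start timestamps, and pruning is assumed to run only when an
event arrives for the same key (the behaviour described in
FLINK-15160).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of per-key partial matches; not Flink's actual implementation. */
final class HeartbeatPruningSketch {
    private final long withinMillis;
    // key -> start timestamps of the partial matches still open for that key
    private final Map<String, List<Long>> partialMatchStarts = new HashMap<>();

    HeartbeatPruningSketch(long withinMillis) {
        this.withinMillis = withinMillis;
    }

    /** Regular event: prunes expired matches for this key, then may open a new one. */
    void onEvent(String key, long timestamp, boolean startsMatch) {
        prune(key, timestamp);
        if (startsMatch) {
            partialMatchStarts.computeIfAbsent(key, k -> new ArrayList<>()).add(timestamp);
        }
    }

    /** Heartbeat: matches no pattern variable and only triggers pruning for the key. */
    void onHeartbeat(String key, long timestamp) {
        prune(key, timestamp);
    }

    // Drops partial matches of one key that are older than the time constraint.
    private void prune(String key, long now) {
        List<Long> starts = partialMatchStarts.get(key);
        if (starts == null) {
            return;
        }
        starts.removeIf(start -> now - start > withinMillis);
        if (starts.isEmpty()) {
            partialMatchStarts.remove(key);
        }
    }

    /** Total number of partial matches held across all keys. */
    int partialMatchCount() {
        int count = 0;
        for (List<Long> starts : partialMatchStarts.values()) {
            count += starts.size();
        }
        return count;
    }
}
```

In this model an event for key "b" never cleans up a stale partial
match for key "a"; only a heartbeat (or any other event) for "a" does,
which is why the heartbeat would have to reach every key that still
holds partial matches.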

If you would be willing to help fix the problem in Flink, I could
also help review it and give pointers on how it could be done.

Best,

Dawid

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/streaming/match_recognize.html#time-constraint

On 26/01/2021 17:39, Dcosta, Agnelo (HBO) wrote:
>
> Hi Dawid, thanks for the clarification; it helps a lot.
> Replies to a couple of points:
>
> what is causing the state to grow?
> We are using Flink SQL and have 5 pattern match queries and 3 group-by
> tumble windows. State growth over time is primarily coming from the
> pattern match queries.
>
> Is it ever growing keyspace?
> Yes. By design our keyspace is ever growing. The expectation is that
> messages for one key will come in for a couple of hours, then stop
> coming in. We would never see messages for that key again. New keys
> are constantly coming in.
>
> Is it that a watermark does not progress?
> Watermark on the subtask level is constantly updating and is in sync
> with other subtasks. We have not seen any issues with watermark
> updating as such.
>
> Looking through mailing list archive, our problem seems similar to
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Memory-in-CEP-queries-keep-increasing-td31045.html
> https://issues.apache.org/jira/browse/FLINK-15160 : Clean up is not
> applied if there are no incoming events for a key.
>
> By design we can have partially matched states in the pattern match
> queries, and the key space is such that no new event comes in for
> those partial matches.
>
> thanks.
>
>  
>
> *From: *Dawid Wysakowicz <dwysakow...@apache.org>
> *Date: *Tuesday, January 26, 2021 at 3:14 AM
> *To: *Dcosta, Agnelo (HBO) <agnelo.dco...@hbo.com>,
> user@flink.apache.org <user@flink.apache.org>
> *Subject: *Re: Difference between table.exec.source.idle-timeout and
> setIdleStateRetentionTime ?
>
>  
>
> Hi,
>
> The difference is that *table.exec.source.idle-timeout* is used for
> dealing with source idleness[1]. The problem is that a watermark
> cannot advance if some of the partitions become idle (do not produce
> any records), because the watermark is always the minimum of the
> watermarks of all input partitions. The setting makes Flink ignore
> idle partitions in the calculation once the time threshold is reached.
>
> The IdleStateRetention is Table API specific. As described in the link
> you provided, it removes entries from state for keys that were not
> seen for a given time threshold.
>
> As for your issue, I'd recommend first investigating what is causing
> the state to grow. Is it an ever-growing keyspace? Is it that a
> watermark does not progress (this should manifest in the results as
> well)? Or is it something else?
>
> Best,
>
> Dawid
>
>  
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/event_timestamps_watermarks.html#dealing-with-idle-sources
>
> On 25/01/2021 20:12, Dcosta, Agnelo (HBO) wrote:
>
> Hi,
>
> What is the difference between *table.exec.source.idle-timeout* and
> *setIdleStateRetentionTime* ?
>
> table.exec.source.idle-timeout:
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/config.html#table-exec-source-idle-timeout
>
>  
>
> setIdleStateRetentionTime:
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/query_configuration.html#idle-state-retention-time
>
>  
>
> Some context:
> We are using Flink 1.12.
> Our checkpoint size is constantly increasing once the app is deployed.
> After performing a restart, the checkpoint size goes back to the
> expected size.
> Looking at the actual checkpoint files generated, it seems our app is
> holding on to state/events since the time the app started up.
> Based on our SQL, the maximum time we would need to hold state is 10
> minutes.
>
