[ https://issues.apache.org/jira/browse/FLINK-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465949#comment-16465949 ]
ASF GitHub Bot commented on FLINK-9138:
---------------------------------------

Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/5860

    Hi @glaksh100,
    I just noticed that the bucket-closing check is only performed when a record is written. Hence, inactive buckets might not be closed in time if a large inactive bucket interval is configured. In a sense, the new feature is an extended version of the inactive-bucket-closing feature.
    How should we handle that case?
    1. Throw an exception during configuration, i.e., when `setInactiveBucketThreshold()` and `setBatchRolloverInterval()` are called.
    2. Cap the inactive bucket interval at the rollover interval in case it is configured larger, and continue. We should also make sure that the check interval is configured appropriately.
    I'm leaning towards the first approach: it would make the misconfiguration obvious to the user and fail the program before it is submitted.
    What do you think?
    Best, Fabian

> Enhance BucketingSink to also flush data by time interval
> ---------------------------------------------------------
>
>                 Key: FLINK-9138
>                 URL: https://issues.apache.org/jira/browse/FLINK-9138
>             Project: Flink
>          Issue Type: Improvement
>          Components: filesystem-connector
>    Affects Versions: 1.4.2
>            Reporter: Narayanan Arunachalam
>            Priority: Major
>
> BucketingSink currently supports flushing data to the file system when a size
> limit is reached or after a period of inactivity. It would be useful to also
> flush data after a specified time interval. That way, data is written out even
> when write throughput is low, without large time gaps between writes. This
> reduces the ETA for data in the file system and should also help checkpoints
> complete faster.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
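To illustrate the first option Fabian proposes (failing fast at configuration time), here is a minimal, hypothetical sketch. The class and defaults are assumptions for illustration, not Flink's actual `BucketingSink` implementation; only the setter names mirror the ones discussed above.

```java
// Hypothetical sketch of eager validation for the two intervals discussed
// in the comment. Not Flink code; defaults are assumed for illustration.
public class RolloverConfig {

    // Assumed defaults: a 60s inactivity threshold, rollover disabled.
    private long inactiveBucketThresholdMs = 60_000L;
    private long batchRolloverIntervalMs = Long.MAX_VALUE;

    public RolloverConfig setInactiveBucketThreshold(long thresholdMs) {
        this.inactiveBucketThresholdMs = thresholdMs;
        validate();
        return this;
    }

    public RolloverConfig setBatchRolloverInterval(long intervalMs) {
        this.batchRolloverIntervalMs = intervalMs;
        validate();
        return this;
    }

    // Option 1 from the comment: reject a contradictory configuration
    // immediately, so the program fails before it is submitted.
    private void validate() {
        if (inactiveBucketThresholdMs > batchRolloverIntervalMs) {
            throw new IllegalArgumentException(
                "Inactive bucket threshold ("
                    + inactiveBucketThresholdMs
                    + " ms) must not exceed the batch rollover interval ("
                    + batchRolloverIntervalMs
                    + " ms).");
        }
    }
}
```

With this approach a user who configures an inactivity threshold larger than the rollover interval gets an `IllegalArgumentException` at configuration time rather than buckets that silently close later than expected.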