[ https://issues.apache.org/jira/browse/FLINK-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440162#comment-16440162 ]
ASF GitHub Bot commented on FLINK-9138:
---------------------------------------

GitHub user glaksh100 opened a pull request:

    https://github.com/apache/flink/pull/5860

    [FLINK-9138][filesystem-connectors] Implement time based rollover in BucketingSink

## What is the purpose of the change

This pull request enables a time-based rollover of the part file in the BucketingSink. This is particularly useful when write throughput is low, and it helps data become available for consumption at a fixed interval.

## Brief change log

- Add a `batchRolloverInterval` field with a setter
- Track a `firstWrittenToTime` for the bucket state
- Check for `currentProcessingTime` - `firstWrittenToTime` > `batchRolloverInterval` and roll over if true

## Verifying this change

This change added tests and can be verified as follows:

- Added a `testRolloverInterval` test method to the `BucketingSinkTest`

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (yes / **no**)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**)
- The serializers: (yes / **no** / don't know)
- The runtime per-record code paths (performance sensitive): (yes / **no** / don't know)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
- The S3 file system connector: (yes / **no** / don't know)

## Documentation

- Does this pull request introduce a new feature? (**yes** / no)
- If yes, how is the feature documented? (not applicable / docs / **JavaDocs** / not documented)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/glaksh100/flink FLINK-9138.bucketingSinkRolloverInterval

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5860.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5860

----
commit fee3ba293f4db4ad2d39b4ac0f3993711da9bda6
Author: Lakshmi Gururaja Rao <lgururajarao@...>
Date:   2018-04-16T23:31:49Z

    [FLINK-9138] Implement time based rollover of part file in BucketingSink

----

> Enhance BucketingSink to also flush data by time interval
> ---------------------------------------------------------
>
>                 Key: FLINK-9138
>                 URL: https://issues.apache.org/jira/browse/FLINK-9138
>             Project: Flink
>          Issue Type: Improvement
>          Components: filesystem-connector
>    Affects Versions: 1.4.2
>            Reporter: Narayanan Arunachalam
>            Priority: Major
>
> BucketingSink now supports flushing data to the file system by size limit and
> by period of inactivity. It will be useful to also flush data by a specified
> time period. This way, the data will be written out when write throughput is
> low but there are no significant time gaps between the writes. This
> reduces ETA for the data in the file system and should help move the
> checkpoints faster as well.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
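
For readers of this thread, the rollover check listed in the change log of pull request #5860 can be sketched as below. This is a minimal standalone illustration, not the code from the pull request: the class `RolloverSketch`, the nested `BucketState`, the method `shouldRoll`, and the use of `-1` as a "nothing written yet" marker are all assumptions made for the example. Only the names `batchRolloverInterval`, `firstWrittenToTime`, and `currentProcessingTime` come from the description above.

```java
// Hypothetical sketch of a time-based rollover check; not the actual BucketingSink code.
public class RolloverSketch {

    // Assumed per-bucket state: the time at which the current part file was first written to.
    static final class BucketState {
        long firstWrittenToTime = -1L; // -1 means no record has been written to the part file yet (assumption)
    }

    // Returns true when the part file has been open longer than batchRolloverInterval (milliseconds).
    static boolean shouldRoll(long currentProcessingTime, long firstWrittenToTime, long batchRolloverInterval) {
        if (firstWrittenToTime < 0) {
            return false; // nothing written yet, so there is nothing to roll over
        }
        return currentProcessingTime - firstWrittenToTime > batchRolloverInterval;
    }

    public static void main(String[] args) {
        BucketState state = new BucketState();
        long batchRolloverInterval = 5_000L; // illustrative value: 5 seconds

        // Record the first write at t = 0 ms, then evaluate the check at two later times.
        state.firstWrittenToTime = 0L;
        System.out.println(shouldRoll(1_000L, state.firstWrittenToTime, batchRolloverInterval));
        System.out.println(shouldRoll(6_000L, state.firstWrittenToTime, batchRolloverInterval));
    }
}
```

Because the comparison is strict, a part file whose age equals the interval exactly is not rolled in this sketch. After a rollover, `firstWrittenToTime` would presumably be reset so the next part file starts a fresh interval.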