[jira] [Commented] (FLINK-9138) Enhance BucketingSink to also flush data by time interval

ASF GitHub Bot (JIRA) Tue, 24 Apr 2018 07:51:04 -0700

    [ https://issues.apache.org/jira/browse/FLINK-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449984#comment-16449984 ]


ASF GitHub Bot commented on FLINK-9138:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5860#discussion_r183753866
  
    --- Diff: flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java ---
    @@ -87,9 +87,11 @@
      * and a rolling counter. For example the file {@code "part-1-17"} contains the data from
      * {@code subtask 1} of the sink and is the {@code 17th} bucket created by that subtask. Per default
      * the part prefix is {@code "part"} but this can be configured using {@link #setPartPrefix(String)}.
    - * When a part file becomes bigger than the user-specified batch size the current part file is closed,
    - * the part counter is increased and a new part file is created. The batch size defaults to {@code 384MB},
    - * this can be configured using {@link #setBatchSize(long)}.
    + * When a part file becomes bigger than the user-specified batch size or when the part file becomes older
    + * than the user-specified roll over interval the current part file is closed,the part counter is increased
    --- End diff --
    
    Add space `closed,the` -> `closed, the`


> Enhance BucketingSink to also flush data by time interval
> ---------------------------------------------------------
>
>                 Key: FLINK-9138
>                 URL: https://issues.apache.org/jira/browse/FLINK-9138
>             Project: Flink
>          Issue Type: Improvement
>          Components: filesystem-connector
>    Affects Versions: 1.4.2
>            Reporter: Narayanan Arunachalam
>            Priority: Major
>
> BucketingSink currently supports flushing data to the file system by size limit
> and by period of inactivity. It would be useful to also flush data after a
> specified time interval. That way, data is written out even when write
> throughput is low but there are no significant time gaps between writes. This
> reduces the delay before data becomes available in the file system and should
> help checkpoints complete faster as well.
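
The three roll criteria discussed in this issue (part size, inactivity, and
the proposed part-file age) can be sketched as a single predicate. This is
a minimal illustration only: it does not use Flink's API, and the class,
field, and method names (RollPolicySketch, RollPolicy, shouldRoll) as well as
the inactivity and interval values are hypothetical. Only the 384 MB size
default is taken from the Javadoc quoted in the diff.

```java
public class RollPolicySketch {

    /** Hypothetical holder for the three thresholds; not a BucketingSink class. */
    static final class RollPolicy {
        private final long maxPartSizeBytes;       // roll when the part file grows beyond this
        private final long inactivityThresholdMs;  // roll when no write happened for this long
        private final long rolloverIntervalMs;     // roll when the part file is older than this

        RollPolicy(long maxPartSizeBytes, long inactivityThresholdMs, long rolloverIntervalMs) {
            this.maxPartSizeBytes = maxPartSizeBytes;
            this.inactivityThresholdMs = inactivityThresholdMs;
            this.rolloverIntervalMs = rolloverIntervalMs;
        }

        /** Returns true if any one of the three criteria is met. */
        boolean shouldRoll(long partSizeBytes, long createdAtMs, long lastWriteMs, long nowMs) {
            boolean tooBig = partSizeBytes > maxPartSizeBytes;
            boolean inactive = nowMs - lastWriteMs > inactivityThresholdMs;
            boolean tooOld = nowMs - createdAtMs > rolloverIntervalMs;
            return tooBig || inactive || tooOld;
        }
    }

    public static void main(String[] args) {
        // 384 MB size limit (from the quoted Javadoc); the 1 minute inactivity
        // and 5 minute rollover values are assumptions made for this example.
        RollPolicy policy = new RollPolicy(384L * 1024 * 1024, 60_000L, 300_000L);

        // A small part file that is written to every second: it never becomes
        // inactive and never reaches the size limit, so only its age can roll it.
        long createdAt = 0L;
        System.out.println(policy.shouldRoll(10_000L, createdAt, 119_000L, 120_000L));
        System.out.println(policy.shouldRoll(25_000L, createdAt, 300_000L, 300_001L));
    }
}
```

The second call covers the case the issue describes: throughput is low, yet
writes arrive often enough that the inactivity check never fires, so the part
file is closed only because of the time-based criterion.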



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
