Ying Xu created FLINK-13027:
-------------------------------

             Summary: StreamingFileSink bulk-encoded writer supports file rolling upon customized events
                 Key: FLINK-13027
                 URL: https://issues.apache.org/jira/browse/FLINK-13027
             Project: Flink
          Issue Type: New Feature
          Components: API / DataStream
            Reporter: Ying Xu

When writing in a bulk-encoded format such as Parquet, StreamingFileSink supports only OnCheckpointRollingPolicy, which rolls files at checkpoint time. In many scenarios it would be beneficial for the sink to roll files on other events, for example when the file size reaches a limit. Such a rolling policy could also alleviate some side effects of OnCheckpointRollingPolicy, e.g., that most of the heavy lifting, including file uploading, happens at checkpoint time.

Specifically, this Jira calls for a new rolling policy that rolls the file:
# whenever a customized event happens, e.g., the file size reaches a certain limit.
# whenever a checkpoint happens. This is needed to provide exactly-once guarantees when writing bulk-encoded files.

Users of this rolling policy need to be aware that a customized event and the next checkpoint may occur close together, which in the worst case yields one tiny file per checkpoint.
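
To make the two rolling conditions concrete, here is a minimal, self-contained sketch of such a policy. It does not use Flink's classes: PartFileView, SimpleRollingPolicy and SizeOrCheckpointRollingPolicy are hypothetical names introduced only for illustration, loosely modeled on the idea of a rolling policy that is consulted on every element and on every checkpoint. The size threshold is an assumed example of a "customized event".

```java
/** Hypothetical stand-in for the sink's view of an in-progress part file. */
interface PartFileView {
    long getSize(); // bytes written to the part file so far
}

/** Hypothetical policy interface; not Flink's actual API. */
interface SimpleRollingPolicy<T> {
    boolean shouldRollOnCheckpoint(PartFileView file);
    boolean shouldRollOnEvent(PartFileView file, T element);
}

/** Rolls on every checkpoint and whenever the part file reaches a size limit. */
final class SizeOrCheckpointRollingPolicy<T> implements SimpleRollingPolicy<T> {
    private final long maxPartSizeBytes;

    SizeOrCheckpointRollingPolicy(long maxPartSizeBytes) {
        if (maxPartSizeBytes <= 0) {
            throw new IllegalArgumentException("maxPartSizeBytes must be positive");
        }
        this.maxPartSizeBytes = maxPartSizeBytes;
    }

    @Override
    public boolean shouldRollOnCheckpoint(PartFileView file) {
        // Condition 2: always roll at a checkpoint, as the issue requires
        // for exactly-once guarantees with bulk-encoded files.
        return true;
    }

    @Override
    public boolean shouldRollOnEvent(PartFileView file, T element) {
        // Condition 1: the customized event, here an assumed size limit.
        return file.getSize() >= maxPartSizeBytes;
    }
}

final class RollingPolicySketch {
    public static void main(String[] args) {
        SimpleRollingPolicy<String> policy =
                new SizeOrCheckpointRollingPolicy<>(128L * 1024 * 1024);

        PartFileView smallFile = () -> 4L * 1024;          // 4 KiB written
        PartFileView largeFile = () -> 200L * 1024 * 1024; // 200 MiB written

        System.out.println("roll small file on event: "
                + policy.shouldRollOnEvent(smallFile, "record"));
        System.out.println("roll large file on event: "
                + policy.shouldRollOnEvent(largeFile, "record"));
        // A file that was just rolled by size is still rolled at the next
        // checkpoint, which is how the tiny-file case described above arises.
        System.out.println("roll small file on checkpoint: "
                + policy.shouldRollOnCheckpoint(smallFile));
    }
}
```

In this sketch the checkpoint condition is unconditional, so the size limit only adds extra roll points between checkpoints. It never delays a roll past a checkpoint.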