[jira] [Commented] (FLINK-11499) Extend StreamingFileSink BulkFormats to support arbitrary roll policies

Piotr Nowojski (Jira) Thu, 02 Apr 2020 23:59:10 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074322#comment-17074322
 ]


Piotr Nowojski commented on FLINK-11499:
----------------------------------------

{quote}
Have you considered using the checkpoint stream? I think the checkpoint state 
backend is the closest storage for the job.
{quote}
[~lzljs3620320]: Not directly, but I'm hoping the solution will be general 
enough that one could pass the same target FileSystem for the WAL stream as for 
checkpointing. Wouldn't that achieve the same result, but at a lower level 
(writing to a file directly rather than to state)?
{quote}
Some writers which come with buffer size/capacity criteria may get full even 
before the checkpoint is triggered. We have to think about this scenario as 
well, right? 
{quote}
[~zenfenan]: Maybe you are right. I was already thinking about this, as I 
suspect it is already supported today for a bulk-format file to roll in the 
middle of a checkpoint. If that's the case, it's a matter of keeping this 
feature, and it shouldn't be that difficult, as we should already have the code 
to support it.

> Extend StreamingFileSink BulkFormats to support arbitrary roll policies
> -----------------------------------------------------------------------
>
>                 Key: FLINK-11499
>                 URL: https://issues.apache.org/jira/browse/FLINK-11499
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>            Reporter: Seth Wiesman
>            Priority: Major
>              Labels: usability
>             Fix For: 1.11.0
>
>
> Currently, when using the StreamingFileSink, bulk-encoding formats can only be 
> combined with the `OnCheckpointRollingPolicy`, which rolls the in-progress 
> part file on every checkpoint.
> However, many bulk formats such as Parquet are most efficient when written as 
> large files; this is not possible when frequent checkpointing is enabled. 
> Currently the only workaround is to use long checkpoint intervals, which is 
> not ideal.
>  
> The StreamingFileSink should be enhanced to support arbitrary roll policies 
> so users may write large bulk files while retaining frequent checkpoints.
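
To illustrate the trade-off described in the issue, here is a minimal, self-contained Java sketch. It is not Flink code and does not use Flink's API: `RollPolicy`, `ON_CHECKPOINT`, `sizeBased`, and `countParts` are hypothetical names made up for this illustration, and the record size, record count, and checkpoint interval are arbitrary assumptions. It only counts how many part files each policy would produce. It deliberately ignores the hard part of this ticket, namely how an in-progress bulk file would survive a checkpoint and a later recovery.

```java
public class RollPolicySketch {

    /** Hypothetical, simplified roll policy. This is NOT Flink's RollingPolicy interface. */
    public interface RollPolicy {
        boolean rollOnCheckpoint(long partBytes);
        boolean rollOnRecord(long partBytes);
    }

    /** Rolls the in-progress part on every checkpoint and never in between. */
    public static final RollPolicy ON_CHECKPOINT = new RollPolicy() {
        @Override public boolean rollOnCheckpoint(long partBytes) { return partBytes > 0; }
        @Override public boolean rollOnRecord(long partBytes) { return false; }
    };

    /** Rolls only once the in-progress part reaches maxBytes, independent of checkpoints. */
    public static RollPolicy sizeBased(long maxBytes) {
        return new RollPolicy() {
            @Override public boolean rollOnCheckpoint(long partBytes) { return false; }
            @Override public boolean rollOnRecord(long partBytes) { return partBytes >= maxBytes; }
        };
    }

    /** Simulates writing fixed-size records and returns the number of part files produced. */
    public static int countParts(RollPolicy policy, int records, int recordBytes, int checkpointEvery) {
        int parts = 0;
        long partBytes = 0;
        for (int i = 1; i <= records; i++) {
            partBytes += recordBytes;
            if (policy.rollOnRecord(partBytes)) {
                parts++;
                partBytes = 0;
            }
            // A "checkpoint" fires after every checkpointEvery records (assumption of this sketch).
            if (i % checkpointEvery == 0 && policy.rollOnCheckpoint(partBytes)) {
                parts++;
                partBytes = 0;
            }
        }
        if (partBytes > 0) {
            parts++; // close the trailing, partially filled part
        }
        return parts;
    }

    public static void main(String[] args) {
        // 1000 records of 10 bytes each, with a checkpoint after every 5 records.
        System.out.println(countParts(ON_CHECKPOINT, 1000, 10, 5));   // prints 200
        System.out.println(countParts(sizeBased(1000), 1000, 10, 5)); // prints 10
    }
}
```

With these made-up numbers, rolling on every checkpoint yields 200 small parts of 50 bytes each, while a size-based policy yields 10 parts of 1000 bytes each at the same checkpoint frequency.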



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
