Hi Kevin,

Here are two minor tips you could try.

1. Check how severe the data skew is and try to resolve it in the job logic,
or reduce the skew by applying keyBy in multiple stages.
2. Increase the checkpoint timeout appropriately.
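
To illustrate tip 1, here is a minimal sketch of multi-stage keying ("salting"), assuming the skew comes from a hot key in a keyed aggregation. It is plain Java with no Flink dependency; the class SaltedKeySketch and its helpers are hypothetical names for illustration only. In a real job the two stages would roughly correspond to two keyBy steps, first on the salted key and then on the original key.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of two-stage ("salted") keying to spread a hot key; plain Java, not Flink API. */
public class SaltedKeySketch {

    /** Hypothetical helper: append a salt in [0, buckets) so one hot key maps to several sub-keys. */
    static String saltedKey(String key, int recordIndex, int buckets) {
        return key + "#" + Math.floorMod(recordIndex, buckets);
    }

    /** Hypothetical helper: strip the salt again before the second stage. */
    static String originalKey(String salted) {
        return salted.substring(0, salted.lastIndexOf('#'));
    }

    /** Stage 1: count per salted key, so the hot key's load is split across buckets. */
    static Map<String, Long> partialCounts(String[] keys, int buckets) {
        Map<String, Long> partial = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            partial.merge(saltedKey(keys[i], i, buckets), 1L, Long::sum);
        }
        return partial;
    }

    /** Stage 2: merge the partial counts back onto the original key. */
    static Map<String, Long> finalCounts(Map<String, Long> partial) {
        Map<String, Long> total = new HashMap<>();
        partial.forEach((salted, n) -> total.merge(originalKey(salted), n, Long::sum));
        return total;
    }

    public static void main(String[] args) {
        String[] keys = {"hot", "hot", "hot", "hot", "hot", "hot", "cold", "hot"};
        Map<String, Long> partial = partialCounts(keys, 4);
        System.out.println("partial: " + partial);
        System.out.println("final:   " + finalCounts(partial));
    }
}
```

The first stage never sees more than a fraction of the hot key's records under one sub-key, and the second stage only has to merge a handful of partial results per key. Whether this helps depends on where the skew actually sits in your job.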

Best,
Weihua


On Fri, Jul 1, 2022 at 9:29 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:

> The streaming file sink writes to S3 while processing elements, but only
> to temporary files. Only after a successful checkpoint (more precisely,
> once it receives the notification of a successful checkpoint) does it
> commit the temporary files written since the last successful checkpoint.
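
The behaviour described above can be sketched as a toy model. This is a simplified illustration in plain Java, not Flink's actual implementation; the class CheckpointCommitSketch and its method names are hypothetical, and it ignores details such as which files belong to which checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a sink that commits files only on a successful checkpoint; not Flink code. */
public class CheckpointCommitSketch {
    private final List<String> inProgress = new ArrayList<>(); // temporary files, not yet finalized
    private final List<String> committed = new ArrayList<>();  // finalized files

    /** Records are written to temporary files as they arrive, regardless of checkpoint state. */
    void write(String file) {
        inProgress.add(file);
    }

    /** A failed or timed-out checkpoint sends no success notification, so nothing is committed. */
    void onCheckpointFailed() {
        // intentionally empty: temporary files stay pending
    }

    /** On the success notification, the files written since the last success are committed. */
    void onCheckpointCompleted() {
        committed.addAll(inProgress);
        inProgress.clear();
    }

    List<String> committedFiles() {
        return new ArrayList<>(committed);
    }

    List<String> pendingFiles() {
        return new ArrayList<>(inProgress);
    }

    public static void main(String[] args) {
        CheckpointCommitSketch sink = new CheckpointCommitSketch();
        sink.write("part-0");
        sink.onCheckpointFailed();   // tolerated failure: nothing becomes visible
        sink.write("part-1");
        System.out.println("committed after failure: " + sink.committedFiles());
        sink.onCheckpointCompleted();
        System.out.println("committed after success: " + sink.committedFiles());
    }
}
```

In this model, tolerating failed checkpoints only delays the commit: the temporary files keep accumulating and are finalized at the next successful checkpoint.
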
>
> Best regards,
> Yuxia
>
> ------------------------------
> *From: *"Xin Ma" <kevin.xin...@gmail.com>
> *To: *"User" <user@flink.apache.org>
> *Sent: *Thursday, June 30, 2022 11:05:51 PM
> *Subject: *StreamingFileSink & checkpoint tuning
>
> Hi,
>
> I recently encountered an issue while using StreamingFileSink.
> I have a Flink job that consumes records from various sources and writes
> them to S3 with the streaming file sink. The job sometimes fails due to
> checkpoint timeout, and the root cause is checkpoint alignment failure
> caused by data skew between the data sources.
>
> I don't want to enable unaligned checkpoints yet and would prefer to do
> some checkpoint tuning first.
>
> My current checkpoint interval is 1 min and the timeout is also 1 min. I
> want to increase the *tolerable checkpoint failure number* to 5, as I
> believe the unaligned subtasks will definitely update their watermark
> within 5 minutes. My question is: will the streaming file sink still write
> to S3 if a checkpoint fails, or will it wait until the next successful
> checkpoint? (If we don't tolerate checkpoint failures, the job simply
> restarts from the last successful checkpoint.)
>
>
> Thanks.
>
> Best,
> Kevin
>
>
