Hi,

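1. In the case of the S3 FileSystem, Flink uses the S3 multipart upload
feature [1] to guarantee exactly-once semantics while staying efficient.
The data of in-progress and pending files is uploaded as parts of a
multipart upload, and a multipart upload does not show up as an object in
the bucket until it is completed when the checkpoint is committed. That is
why you only see the finished files. It might not be obvious from a first
look at the docs, but this is noted at the bottom of the FileSystem
connector page [2]. For more information, you can also check FLINK-9751
and FLINK-9752.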
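To make the visibility part more concrete, here is a toy in-memory model
of a multipart upload. All class and method names are hypothetical: this
is neither the AWS SDK nor Flink's S3 RecoverableWriter, just a sketch of
the idea that uploaded parts stay invisible to a bucket listing until the
upload is completed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model of S3 multipart upload visibility. Hypothetical names only;
// not the AWS SDK and not Flink's S3 RecoverableWriter.
public class MultipartUploadModel {

    // Objects that a bucket listing can see.
    private final Map<String, byte[]> visibleObjects = new HashMap<>();

    // Open uploads: key -> parts uploaded so far (never listed).
    private final Map<String, List<byte[]>> openUploads = new HashMap<>();

    public void initiate(String key) {
        openUploads.put(key, new ArrayList<>());
    }

    public void uploadPart(String key, byte[] part) {
        openUploads.get(key).add(part);
    }

    // Completing the upload concatenates the parts into one visible object,
    // roughly what happens when the sink commits a pending file.
    public void complete(String key) {
        List<byte[]> parts = openUploads.remove(key);
        int total = 0;
        for (byte[] p : parts) {
            total += p.length;
        }
        byte[] merged = new byte[total];
        int offset = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, merged, offset, p.length);
            offset += p.length;
        }
        visibleObjects.put(key, merged);
    }

    public TreeSet<String> listObjects() {
        return new TreeSet<>(visibleObjects.keySet());
    }

    public static void main(String[] args) {
        MultipartUploadModel bucket = new MultipartUploadModel();
        bucket.initiate("out/part-0-0");
        bucket.uploadPart("out/part-0-0", "hello ".getBytes());
        bucket.uploadPart("out/part-0-0", "world".getBytes());
        System.out.println("before commit: " + bucket.listObjects()); // before commit: []
        bucket.complete("out/part-0-0");
        System.out.println("after commit: " + bucket.listObjects()); // after commit: [out/part-0-0]
    }
}
```

In the real sink the completion step is driven by the checkpoint commit,
which is why the bucket only ever shows finished files.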

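2. In the case of the local FileSystem, the in-progress file name always
starts with a dot, as generated by LocalRecoverableWriter [3]. However,
this is an implementation detail of each FileSystem's RecoverableWriter
rather than a documented guarantee, so make sure to check the
RecoverableWriter implementation for the FileSystem you want to use.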
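A rough Java sketch of that naming, paraphrased from the linked
LocalRecoverableWriter code: the staging file lives next to the target
file and is named "." + target name + ".inprogress." + a random UUID. The
class and helper names below are made up for illustration and are not
Flink API.

```java
import java.io.File;
import java.util.UUID;

// Sketch of the local staging-file naming (paraphrased from [3]).
// LocalStagingName and its methods are hypothetical helpers, not Flink API.
public class LocalStagingName {

    // "." + target name + ".inprogress." + random UUID, in the same directory;
    // retries until it finds a name that does not exist yet.
    static File stagingFileFor(File targetFile) {
        File parent = targetFile.getParentFile();
        String name = targetFile.getName();
        while (true) {
            File candidate = new File(parent, "." + name + ".inprogress." + UUID.randomUUID());
            if (!candidate.exists()) {
                return candidate;
            }
        }
    }

    // True for names following the local writer's scheme above.
    static boolean isLocalStagingName(String fileName) {
        return fileName.startsWith(".") && fileName.contains(".inprogress.");
    }

    public static void main(String[] args) {
        File staging = stagingFileFor(new File("/tmp/out/part-0-0"));
        System.out.println(staging.getName()); // e.g. .part-0-0.inprogress.<random uuid>
        System.out.println(isLocalStagingName(staging.getName())); // true
    }
}
```

A downstream reader polling a local output directory could use a check
like isLocalStagingName to skip unfinished files, but as said above this
only holds for the local writer and should not be assumed for other
FileSystems.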

Regards,
Mate

[1] https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html
[2] https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/filesystem/#s3-specific
[3] https://github.com/apache/flink/blob/1e0b58aa8d962469fa9dd7b470037aeaece43500/flink-core/src/main/java/org/apache/flink/core/fs/local/LocalRecoverableWriter.java#L129

On Wed, Mar 29, 2023 at 9:07 Chirag Dewan via user <user@flink.apache.org>
wrote:

> Hi,
>
> We are trying to use Flink's FileSink to distribute files to AWS S3
> storage. We are using the Flink-provided Hadoop s3a connector as a plugin.
>
> We have some observations that we needed to clarify:
>
> 1. When using the file sink for local filesystem distribution, we can see
> that the sink creates three sets of files: in-progress, pending (on
> rolling), and finished (upon checkpointing). But with the S3 file sink we
> can see only the finished files in the S3 buckets.
>
> So we wanted to understand: where does the sink create the in-progress
> and pending files for the S3 file sink?
>
> 2. With the local file system sink, we can also see that the in-progress
> and pending file names follow this naming pattern:
> .<prefix>-<uid>-<partFileIndex>.inprogress.uid-<suffix>
>
> There is a dot at the beginning of the filename; maybe Flink is trying to
> create these files as hidden files. But this is not mentioned in the
> Flink documentation.
>
> So can we assume that the in-progress and pending filenames will always
> start with a dot?
>
> Thanks a lot in advance.
