Hi, Meghajit

1. According to the implementation [1], the order of the splits depends on
the implementation of the FileSystem.

2. According to the implementation [2], the order of the files also depends
on the implementation of the FileSystem.

Currently there is no public interface that you could extend to implement
your own ordering strategy. Could you share the specific problem you are
running into?
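
To make "an explicit order" concrete, here is a minimal sketch in plain Java. It is not a Flink API: `FileOrderSketch`, `Entry` and `orderByModificationTime` are hypothetical names, and modification time is used as an assumed stand-in for the creation time mentioned in the question. It only shows that a listing has to be sorted explicitly if a fixed order is needed, because the order returned by the FileSystem is not something to rely on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, NOT part of Flink: illustrates imposing an explicit
// order on a file listing instead of relying on the FileSystem's order.
class FileOrderSketch {

    // Minimal stand-in for one entry of a file listing (assumed fields).
    static final class Entry {
        final String path;
        final long modificationTime; // epoch milliseconds

        Entry(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    // Returns a new list ordered by modification time; ties are broken by
    // path so that the result is deterministic. The input is left untouched.
    static List<Entry> orderByModificationTime(List<Entry> listing) {
        List<Entry> sorted = new ArrayList<>(listing);
        sorted.sort(
                Comparator.comparingLong((Entry e) -> e.modificationTime)
                        .thenComparing((Entry e) -> e.path));
        return sorted;
    }

    public static void main(String[] args) {
        // The listing order here is arbitrary, as it may be from a FileSystem.
        List<Entry> listing = Arrays.asList(
                new Entry("b.parquet", 200L),
                new Entry("a.parquet", 300L),
                new Entry("c.parquet", 100L));

        for (Entry e : orderByModificationTime(listing)) {
            System.out.println(e.modificationTime + " " + e.path);
        }
    }
}
```

Since there is currently no public hook for this in `FileSource`, the sketch only describes the desired behaviour; it does not show how to plug it into the source.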

3. `FileSource` supports checkpoints. I think watermarking is a general
mechanism, so you could read the related documentation [3].
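
As a rough illustration of the general mechanism, the sketch below models a bounded-out-of-orderness watermark in plain Java, following the idea described in the documentation [3]. It is a simplified model and not Flink's own class: `BoundedOutOfOrdernessSketch` and its methods are hypothetical names, and timestamps are assumed to be epoch milliseconds. The watermark trails the largest timestamp seen so far by a fixed bound, regardless of whether the source is bounded or unbounded.

```java
// Simplified model of a bounded-out-of-orderness watermark.
// Hypothetical class for illustration only; it does not use any Flink API.
class BoundedOutOfOrdernessSketch {

    private final long maxOutOfOrdernessMillis;
    private long maxTimestamp;

    BoundedOutOfOrdernessSketch(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
        // Chosen so that the watermark before any event is Long.MIN_VALUE
        // and the subtraction in currentWatermark() cannot underflow.
        this.maxTimestamp = Long.MIN_VALUE + maxOutOfOrdernessMillis + 1;
    }

    // Called for every record; remembers the largest timestamp seen so far.
    void onEvent(long eventTimestampMillis) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestampMillis);
    }

    // The watermark asserts that no record with a timestamp less than or
    // equal to this value is expected any more.
    long currentWatermark() {
        return maxTimestamp - maxOutOfOrdernessMillis - 1;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessSketch watermarks =
                new BoundedOutOfOrdernessSketch(5_000L);

        long[] timestamps = {10_000L, 8_000L, 20_000L};
        for (long ts : timestamps) {
            watermarks.onEvent(ts);
            System.out.println(
                    "event=" + ts + " watermark=" + watermarks.currentWatermark());
        }
    }
}
```

A late record (here the one with timestamp 8000) does not move the watermark backwards; only a new maximum timestamp advances it.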

[1]
https://github.com/apache/flink/blob/355b165859aebaae29b6425023d352246caa0613/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/enumerate/BlockSplittingRecursiveEnumerator.java#L141

[2]
https://github.com/apache/flink/blob/d33c39d974f08a5ac520f220219ecb0796c9448c/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/enumerate/NonSplittingRecursiveEnumerator.java#L102

[3]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/event-time/generating_watermarks/

Best,
Guowei


On Wed, Jan 19, 2022 at 6:06 PM Meghajit Mazumdar <
meghajit.mazum...@gojek.com> wrote:

> Hello,
>
> We are using FileSource
> <https://nightlies.apache.org/flink/flink-docs-release-1.14/api/java/> to
> process Parquet Files and had a few doubts around it. Would really
> appreciate if somebody can help answer them:
>
> 1. For a given file, does FileSource read the contents inside it in order
> ? In other words, what is the order in which the file splits are generated
> from the contents of the file ?
>
> 2. We want to provide a GCS Bucket URL to the FileSource so that it can
> read parquet files from there. The bucket has multiple parquet files.
> Wanted to know, what is the order in which the files will be picked and
> processed by this FileSource ? Can we provide an order strategy ourselves,
> say, process according to creation time ?
>
> 3. Is it possible/good practice to apply checkpointing and watermarking
> for a bounded source like FileSource ?
>
> --
> *Regards,*
> *Meghajit*
>
