Hi!

FileEnumerator never reads the actual content of a file. It lives in the
job manager and reads only the metadata it needs (for example, the size of
the file) so that it can split the work across all task managers. The
corresponding file readers, on the other hand, live in the task managers
and do the actual reading. They accept the file splits assigned to them and
read the content corresponding to those splits.
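
To make the division of labour concrete, here is a minimal sketch in plain
Java. It does not use Flink's classes: Split, enumerateSplits and readSplit
are hypothetical names made up for illustration, and it reads a local file
rather than a remote one. The "enumerator" part only asks for the file size,
and the "reader" part opens the file and reads just its assigned byte range.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Hypothetical stand-in for a file split: a byte range within one file.
    static final class Split {
        final Path path;
        final long offset;
        final long length;

        Split(Path path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    // "Enumerator" side: looks only at the file size, never at the content.
    static List<Split> enumerateSplits(Path file, long maxSplitSize) throws IOException {
        if (maxSplitSize <= 0) {
            throw new IllegalArgumentException("maxSplitSize must be positive");
        }
        long size = Files.size(file); // metadata lookup only
        List<Split> splits = new ArrayList<>();
        for (long offset = 0; offset < size; offset += maxSplitSize) {
            splits.add(new Split(file, offset, Math.min(maxSplitSize, size - offset)));
        }
        return splits;
    }

    // "Reader" side: opens the file and reads only the assigned byte range.
    static byte[] readSplit(Split split) throws IOException {
        byte[] buffer = new byte[(int) split.length];
        try (RandomAccessFile raf = new RandomAccessFile(split.path.toFile(), "r")) {
            raf.seek(split.offset);
            raf.readFully(buffer);
        }
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("split-sketch", ".txt");
        Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
        // Print each split's byte range followed by the bytes a reader gets for it.
        for (Split s : enumerateSplits(file, 4)) {
            System.out.println(s.offset + "+" + s.length + " -> "
                    + new String(readSplit(s), StandardCharsets.US_ASCII));
        }
        Files.delete(file);
    }
}
```

In this sketch the content is only touched inside readSplit, which is the
part that would run on a task manager.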

On Thu, Jan 27, 2022 at 16:57, Meghajit Mazumdar <meghajit.mazum...@gojek.com> wrote:

> Hello,
>
> I had a question about the FileSource in Flink 1.14
> <https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/file/src/FileSource.html>
> .
>
> Considering FileSource is set to read from a remote GCS URL, I could read
> and understand that the FileEnumerator is actually responsible for
> discovering the files under the URL.
>
> However, how does the FileSource, and thus the FileEnumerator, generate
> the splits when a remote URL is used? Does it:
> 1. download all the files eagerly and then generate the splits, or
> 2. download the files and generate the splits only when the source reader
> asks for splits, or
> 3. not download anything, but only stream the data from the remote as required?
>
> Would be great if somebody could help me out. Thanks !
>
> *Regards,*
> *Meghajit*
>
