SplitEnumerator and SourceReader

Kirti Dhar Upadhyay K via user Thu, 20 Apr 2023 05:29:57 -0700

Hi Community,

I have recently started using the file source in Flink 1.17.x.
I was going through the FLIP-27 documentation and, as far as I understand, the
SplitEnumerator lists files (splits) and assigns them to SourceReaders. A single
instance of the SplitEnumerator runs, whereas parallelism is possible on the
SourceReader side. I have the following questions about this:

  1.  Who actually downloads the file (let's say the file is on S3)? Does the
SplitEnumerator download the files and then assign the splits to the
SourceReaders, or does it only list the files and pass each file path in a split
to a SourceReader, which then downloads and processes the file?

  2.  Is the complete file downloaded in one go, or is chunked downloading also
possible?

  3.  I understand that the SplitEnumerator can run either on the JobManager or
in a single instance on a TaskManager. How can a user configure where it runs?

  4.  Is there any impact on the memory footprint if FileSource is running in
streaming mode (continuous streaming)?


Thanks for any help!

Regards,
Kirti Dhar
