Hi Community,

I recently started using the file source in Flink 1.17.x. I went through the FLIP-27 documentation, and as far as I understand, the SplitEnumerator lists the files (splits) and assigns them to the SourceReaders. Only a single SplitEnumerator instance runs, whereas parallelism happens on the SourceReader side. I have the following questions about this:
1. Who actually downloads the file (say the file is on S3)? Does the SplitEnumerator download the files and then assign the splits to the SourceReaders, or does it only list the files and pass each file's path in the split to a SourceReader, which then downloads and processes the file?
2. Is the complete file downloaded in one go, or is chunked downloading also possible?
3. I understand that the SplitEnumerator can run either on the JobManager or on a single TaskManager instance. How can a user configure where it runs?
4. Is there any memory footprint impact when FileSource runs in streaming mode (continuous streaming)?

Thanks for any help!

Regards,
Kirti Dhar