Hi Vishal,

with readFile(), all files are first collected and then sorted before splits are handed out [1]. The same is true for the new FileSource. There you could plug in your own FileEnumerator to emit files in chunks, but then the source needs to continuously pull for more files and can't run in batch mode.
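To make that a bit more concrete, below is a rough, untested sketch of where a custom enumerator plugs into the new FileSource (file connector APIs as of roughly Flink 1.15). The class names FileSourceEnumeratorSketch and LoggingEnumerator as well as the "s3a://my-bucket/" path are made up for illustration; the enumerator just delegates to the default NonSplittingRecursiveEnumerator, so you can see that in bounded mode the whole listing happens in a single enumerateSplits() call before any split reaches the downstream operators:

    import java.io.IOException;
    import java.util.Collection;

    import org.apache.flink.api.common.RuntimeExecutionMode;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.connector.file.src.FileSource;
    import org.apache.flink.connector.file.src.FileSourceSplit;
    import org.apache.flink.connector.file.src.enumerate.FileEnumerator;
    import org.apache.flink.connector.file.src.enumerate.NonSplittingRecursiveEnumerator;
    import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class FileSourceEnumeratorSketch {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setRuntimeMode(RuntimeExecutionMode.BATCH);

            FileSource<String> source =
                    FileSource.forRecordStreamFormat(
                                    new TextLineInputFormat(), // TextLineFormat on pre-1.15 versions
                                    new Path("s3a://my-bucket/")) // placeholder bucket
                            // Swap in your own enumerator; without monitorContinuously()
                            // the source is bounded, so enumeration runs once, up front.
                            .setFileEnumerator(LoggingEnumerator::new)
                            .build();

            env.fromSource(source, WatermarkStrategy.noWatermarks(), "s3-files").print();
            env.execute();
        }

        // Made-up enumerator that delegates to the default recursive enumerator.
        // In bounded mode enumerateSplits() is invoked once with the root path(s),
        // i.e. the full S3 listing is done here before splits are assigned to readers.
        static class LoggingEnumerator implements FileEnumerator {
            private final FileEnumerator delegate = new NonSplittingRecursiveEnumerator();

            @Override
            public Collection<FileSourceSplit> enumerateSplits(Path[] paths, int minDesiredSplits)
                    throws IOException {
                Collection<FileSourceSplit> splits = delegate.enumerateSplits(paths, minDesiredSplits);
                System.out.println("Enumerated " + splits.size() + " splits in one pass");
                return splits;
            }
        }
    }

If you made the enumerator return only part of the listing, the remaining files would have to be discovered in later enumeration rounds, which is why that approach only fits the continuous monitoring mode and not batch.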
We are happy to receive any patch for that behavior (for the new source).

[1] https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileMonitoringFunction.java#L259-L261

On Mon, Apr 4, 2022 at 12:07 AM Vishal Santoshi <vishal.santo...@gmail.com> wrote:

> Folks,
> I am doing a simple batch job that uses readFile() with
> "s3a://[bucket_name]" as the path with setNestedFileEnumeration(true).
> I am a little curious about a few things.
>
> In batch mode, which I think is turned on by
> FileProcessingMode.PROCESS_ONCE, does the source list all the S3
> objects in the bucket to create input splits *before* it calls
> downstream operators?
>
> Thanks.