Thanks for the clarification. My experiments have been in line with what you have suggested.
Regards.

On Mon, Apr 4, 2022 at 5:30 AM Arvid Heise <ar...@apache.org> wrote:

> Hi Vishal,
>
> with readFile, files are first collected and then sorted [1]. The same is
> true for the new FileSource. Here, you could plug in your own Enumerator to
> output files in chunks, but then you need to continuously pull more and
> can't use batch mode.
>
> We are happy to receive any patch for that behavior (for the new source).
>
> [1]
> https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/ContinuousFileMonitoringFunction.java#L259-L261
>
> On Mon, Apr 4, 2022 at 12:07 AM Vishal Santoshi <vishal.santo...@gmail.com>
> wrote:
>
>> Folks,
>> I am doing a simple batch job that uses readFile() with
>> "s3a://[bucket_name]" as the path with setNestedFileEnumeration(true). I am
>> a little curious about a few things.
>>
>> In batch mode, which I think is turned on by
>> FileProcessingMode.PROCESS_ONCE mode, does the source list all the S3
>> objects in the bucket to create input splits *before* it calls
>> downstream operators?
>>
>> Thanks.
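For readers of the archive, here is a rough, hypothetical sketch of the "collect first, then sort" behavior Arvid describes for readFile. It is plain Java on a local directory, not Flink code: the class and method names are made up for illustration, S3 listing is not modeled, and sorting by modification time is an assumption about how the linked ContinuousFileMonitoringFunction orders files. The point it shows is that nothing reaches "downstream" until the entire (nested) listing has finished.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only: NOT Flink's implementation or API.
public class CollectThenSortSketch {

    // Step 1: enumerate every regular file under root, recursing into
    // subdirectories (loosely analogous to setNestedFileEnumeration(true)).
    // The method returns only after the whole tree has been walked.
    static List<Path> listAllFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    // Step 2: group the collected files by modification time (assumed sort key).
    // TreeMap keeps its keys in ascending order.
    static TreeMap<Long, List<Path>> sortByModTime(List<Path> files) throws IOException {
        TreeMap<Long, List<Path>> byModTime = new TreeMap<>();
        for (Path file : files) {
            long modTime = Files.getLastModifiedTime(file).toMillis();
            byModTime.computeIfAbsent(modTime, k -> new ArrayList<>()).add(file);
        }
        return byModTime;
    }

    // Step 3: only now are files handed "downstream", oldest first.
    static List<Path> forwardInOrder(Path root) throws IOException {
        List<Path> emitted = new ArrayList<>();
        for (Map.Entry<Long, List<Path>> entry : sortByModTime(listAllFiles(root)).entrySet()) {
            emitted.addAll(entry.getValue());
        }
        return emitted;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("bucket");
        Path nested = Files.createDirectories(root.resolve("a/b"));
        Path newer = Files.writeString(root.resolve("newer.txt"), "x");
        Path older = Files.writeString(nested.resolve("older.txt"), "y");
        Files.setLastModifiedTime(newer, FileTime.fromMillis(2_000_000L));
        Files.setLastModifiedTime(older, FileTime.fromMillis(1_000_000L));
        // The nested, older file is printed before newer.txt.
        for (Path p : forwardInOrder(root)) {
            System.out.println(root.relativize(p));
        }
    }
}
```

With a very large bucket, step 1 is the expensive part, which is why Arvid suggests a custom Enumerator that emits files in chunks (at the cost of batch mode) for the new FileSource.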