Hi,

in the unified stream/batch FileSource there is a processStaticFileSet() method that enumerates all splits only once and lets the source complete once those splits have been processed.
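A minimal sketch of what that looks like (written from memory, so class names may differ slightly between 1.14 and 1.15; the bucket path is a placeholder):

  import org.apache.flink.api.common.eventtime.WatermarkStrategy;
  import org.apache.flink.connector.file.src.FileSource;
  import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
  import org.apache.flink.core.fs.Path;
  import org.apache.flink.streaming.api.datastream.DataStream;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

  public class StaticFileSetJob {
      public static void main(String[] args) throws Exception {
          StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

          // Bounded source: the file set is enumerated once and the source
          // finishes after all discovered splits have been read.
          // (The text-line StreamFormat class has a slightly different name in 1.14.)
          FileSource<String> source = FileSource
                  .forRecordStreamFormat(new TextLineInputFormat(), new Path("s3a://my-bucket/"))
                  .processStaticFileSet()
                  .build();

          DataStream<String> lines =
                  env.fromSource(source, WatermarkStrategy.noWatermarks(), "s3-file-source");

          lines.print();
          env.execute("static-file-set");
      }
  }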
In my own experience using processStaticFileSet() with large S3 buckets, the enumeration happens on the JobManager, and listing a bucket with ~1B objects will likely block it for a long time. This is the behavior on 1.14; I'm not sure whether it changes in the upcoming 1.15.

with best regards,
Roman Grebennikov | g...@dfdx.me

On Sun, Apr 3, 2022, at 22:07, Vishal Santoshi wrote:
> Folks,
> I am running a simple batch job that uses readFile() with
> "s3a://[bucket_name]" as the path and setNestedFileEnumeration(true). I am a
> little curious about a few things.
>
> In batch mode, which I think is enabled by FileProcessingMode.PROCESS_ONCE,
> does the source list all the S3 objects in the bucket to create input
> splits *before* it calls downstream operators?
>
> Thanks.
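For context, the readFile() setup described in the quoted question would look roughly like the following minimal sketch (the bucket name is a placeholder, and the monitoring interval only applies to PROCESS_CONTINUOUSLY):

  import org.apache.flink.api.java.io.TextInputFormat;
  import org.apache.flink.core.fs.Path;
  import org.apache.flink.streaming.api.datastream.DataStream;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
  import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

  public class ReadFileOnceJob {
      public static void main(String[] args) throws Exception {
          StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

          TextInputFormat format = new TextInputFormat(new Path("s3a://my-bucket/"));
          // Recurse into "directories" (key prefixes) under the bucket.
          format.setNestedFileEnumeration(true);

          // PROCESS_ONCE scans the path a single time; the interval argument
          // is only relevant for PROCESS_CONTINUOUSLY.
          DataStream<String> lines = env.readFile(
                  format,
                  "s3a://my-bucket/",
                  FileProcessingMode.PROCESS_ONCE,
                  60_000L);

          lines.print();
          env.execute("read-file-once");
      }
  }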