Thanks for the clarification.
My experiments have been in line with what you have suggested.
Regards.
On Mon, Apr 4, 2022 at 5:30 AM Arvid Heise wrote:
> Hi Vishal,
>
> with readFile, files are first collected and then sorted [1]. The same is
> true for the new FileSource. Here, you could
Hi Vishal,
with readFile, files are first collected and then sorted [1]. The same is
true for the new FileSource. Here, you could plug in your own Enumerator to
output files in chunks but then you need to continuously pull more and
can't use batch mode.
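
To make that trade-off concrete, here is a minimal stand-alone sketch in plain Java. It is only an illustration under assumptions: it does not use Flink's actual enumerator interfaces, and the class and method names (ChunkedEnumeration, collectThenSort, inChunks) are made up for this example. It contrasts collecting and sorting the whole listing up front with handing out paths in fixed-size chunks that the caller has to keep pulling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-alone model; none of these names are Flink classes.
class ChunkedEnumeration {

    // Eager strategy: materialize the full listing, then sort it.
    // Memory use grows with the total number of paths.
    static List<String> collectThenSort(Iterator<String> listing) {
        List<String> all = new ArrayList<>();
        while (listing.hasNext()) {
            all.add(listing.next());
        }
        Collections.sort(all);
        return all;
    }

    // Chunked strategy: hand out at most chunkSize paths per pull.
    // Only one chunk is held at a time, but there is no global ordering,
    // and the caller must keep pulling until the listing is exhausted.
    static Iterator<List<String>> inChunks(final Iterator<String> listing, final int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return new Iterator<List<String>>() {
            @Override
            public boolean hasNext() {
                return listing.hasNext();
            }

            @Override
            public List<String> next() {
                if (!listing.hasNext()) {
                    throw new NoSuchElementException();
                }
                List<String> chunk = new ArrayList<>(chunkSize);
                while (chunk.size() < chunkSize && listing.hasNext()) {
                    chunk.add(listing.next());
                }
                return chunk;
            }
        };
    }

    public static void main(String[] args) {
        // Made-up object keys standing in for a bucket listing.
        List<String> keys = Arrays.asList("b/2", "a/1", "c/3", "a/0", "b/9");

        System.out.println(collectThenSort(keys.iterator()));

        Iterator<List<String>> chunks = inChunks(keys.iterator(), 2);
        while (chunks.hasNext()) {
            System.out.println(chunks.next());
        }
    }
}
```

The eager variant needs the complete listing before it can emit anything, which is what makes a very large bucket expensive. The chunked variant keeps memory bounded, at the cost of losing the global sort and requiring repeated pulls.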
We are happy to receive any patch for that b
Hi,
in the unified stream/batch FileSource, the builder has a processStaticFileSet()
method that enumerates all the splits only once
and lets the source complete once they have been processed.
As for my own experience using processStaticFileSet() with large S3 buckets,
the enumeration seems to happen on the JobManager
Folks,
I am doing a simple batch job that uses readFile() with
"s3a://[bucket_name]" as the path with setNestedFileEnumeration(true). I am
a little curious about a few things.
In batch mode, which I think is turned on by FileProcessingMode.PROCESS_ONCE,
does the source list all the S3 o