Hi!

Do you mean the pathsAlreadyProcessed set in ContinuousFileSplitEnumerator?

The difference is that ContinuousFileSplitEnumerator has to keep adding
newly discovered files to splitAssigner, while StaticFileSplitEnumerator
does not. The pathsAlreadyProcessed set records the paths that
ContinuousFileSplitEnumerator has already discovered, so that it does not
add the same file to splitAssigner twice. StaticFileSplitEnumerator does
not need to discover new files: all of its files are already recorded in
its splitAssigner, so it does not need the pathsAlreadyProcessed set.
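
To make the difference concrete, here is a rough sketch of the idea. It is
not the actual Flink code: the class names EnumeratorSketch, ContinuousSketch
and StaticSketch, the onDiscovery method and the plain String paths are all
made up for illustration, and a simple list stands in for splitAssigner.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified illustration only; these are NOT the real Flink classes. */
public class EnumeratorSketch {

    /** Continuous case: every discovery pass may return files seen before. */
    static class ContinuousSketch {
        // Hypothetical stand-in for the pathsAlreadyProcessed set.
        private final Set<String> pathsAlreadyProcessed = new HashSet<>();
        // Hypothetical stand-in for splitAssigner.
        private final List<String> assigned = new ArrayList<>();

        void onDiscovery(Collection<String> discoveredPaths) {
            for (String path : discoveredPaths) {
                // Set.add returns false when the path was recorded earlier,
                // so a file is handed to the assigner at most once.
                if (pathsAlreadyProcessed.add(path)) {
                    assigned.add(path);
                }
            }
        }

        List<String> assigned() {
            return assigned;
        }
    }

    /** Static case: all files are listed once up front, nothing to dedupe. */
    static class StaticSketch {
        private final List<String> assigned;

        StaticSketch(Collection<String> allPaths) {
            this.assigned = new ArrayList<>(allPaths);
        }

        List<String> assigned() {
            return assigned;
        }
    }

    public static void main(String[] args) {
        ContinuousSketch continuous = new ContinuousSketch();
        continuous.onDiscovery(List.of("/data/a", "/data/b"));
        // The second pass sees /data/b again; only /data/c is new.
        continuous.onDiscovery(List.of("/data/b", "/data/c"));
        System.out.println(continuous.assigned()); // [/data/a, /data/b, /data/c]

        StaticSketch bounded = new StaticSketch(List.of("/data/a", "/data/b"));
        System.out.println(bounded.assigned()); // [/data/a, /data/b]
    }
}
```

In the continuous sketch the set is what filters the repeated /data/b on the
second pass. The static sketch never runs a second pass, so it has nothing
to filter.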

For the detailed logic, check the callers of the constructors of both
enumerators.

Krzysztof Chmielewski <krzysiek.chmielew...@gmail.com> wrote on Thu, Jan 6, 2022 at 07:04:

> Hi,
> Why does StaticFileSplitEnumerator from FileSource not track the already
> processed files the way ContinuousFileSplitEnumerator does?
>
> I'm thinking about a scenario where we have a bounded FileSource that
> reads a lot of files and streams their content to Kafka. If there is a
> job/cluster restart, we will process the same files again.
>
> Regards,
> Krzysztof Chmielewski
>
