Hi,
in documentation for SplitEnumeratorContext::callAsync method we read that:

"(...)  When this method is invoked multiple times, The Callables may be
executed in a thread pool concurrently.
It is important to make sure that the callable does not modify any shared
state, especially the states that will be a part of the
SplitEnumerator.snapshotState(long) (...)"

The ContinuousHiveSplitEnumerator::start method exectutes below code:

enumeratorContext.callAsync(
        monitor, this::handleNewSplits, discoveryInterval, discoveryInterval);

The handleNewSplits method does this:

this.seenPartitionsSinceOffset = newSplitsAndState.seenPartitions;

My question is, on how many threads "monitor" callable will be executed?
If more than one (and this is possible accordingly to the callAsync
documentation) then
I think that this reference assignment in handleNewSplits method could lead
to bugs. Since both threads that were executing monitor callable can return
totally different collection.

For ContinuousFileSplitEnumerator this was resolved in a different way.
Filtering of already processed paths is done by Source-Coordinator thread
and there is no reference assignment.

Regards,
Krzysztof Chmielewski

Reply via email to