aho135 commented on code in PR #19372:
URL: https://github.com/apache/druid/pull/19372#discussion_r3150398648
##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java:
##########
@@ -4255,6 +4418,23 @@ private OrderedSequenceNumber<SequenceOffsetType>
getOffsetFromStorageForPartiti
}
Review Comment:
Thanks for the review @FrankChen021!
> Bounded mode should either clear/namespace old metadata for the run
I'm a bit wary of automatic cleanup of metadata. I'm thinking through the
scenario where a cluster operator has a running Supervisor. They want to
re-ingest some older data so they resubmit the exact same spec (forgetting to
update `id`) so the Bounded Supervisor succeeds but the previously committed
offset gets lost.
> explicitly prefer the configured start when initializing the bounded task
group.
This falls into the same issue as above where if the operator forgets to set
the `id` to something different than the running Supervisor then the previous
committed offset is lost forever
I'm leaning towards adding validation that if metadata already exists for
the `id` then just throw an exception and suggest the operator to resubmit with
a different `id` or reset the Supervisor. Curious to hear your thoughts on this
approach. Thanks again
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]