aho135 commented on code in PR #19372:
URL: https://github.com/apache/druid/pull/19372#discussion_r3150398648


##########
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java:
##########
@@ -4255,6 +4418,23 @@ private OrderedSequenceNumber<SequenceOffsetType> 
getOffsetFromStorageForPartiti
       }

Review Comment:
   Thanks for the review @FrankChen021!
   
   > Bounded mode should either clear/namespace old metadata for the run
   
   I'm a bit wary of automatic cleanup of metadata. I'm thinking through the 
scenario where a cluster operator has a running Supervisor. They want to 
re-ingest some older data so they resubmit the exact same spec (forgetting to 
update `id`) so the Bounded Supervisor succeeds but the previously committed 
offset gets lost.
   
   > explicitly prefer the configured start when initializing the bounded task 
group.
   
   This falls into the same issue as above where if the operator forgets to set 
the `id` to something different than the running Supervisor then the previous 
committed offset is lost forever
   
   I'm leaning towards adding validation that if metadata already exists for 
the `id` then just throw an exception and suggest the operator to resubmit with 
a different `id` or reset the Supervisor. Curious to hear your thoughts on this 
approach. Thanks again



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to