Hey Druid community,

I read an old Druid document about segment states that says "unpublished and 
unavailable isn't possible", which caught my eye. I read through some source 
code in the native batch parallel task:

1. `PartialSegmentMergeTask` constructs the final segments and pushes them to 
Deep Storage. (This is the third phase of native batch parallel ingestion.)
2. `ParallelIndexSupervisorTask` calls `publishSegments` to publish the 
segments to the metadata store. (This is a transactional operation.)

However, these two operations are two independent steps. I then found that the 
`publishSegments` method relies on reports, which are kept in an in-memory 
HashMap in `ParallelIndexPhaseRunner`. If a segment is pushed and the Overlord 
crashes right after, the reports are lost because they only live in memory, so 
those pushed segments are never published. Correct me if I am wrong.
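To make the window concrete, here is a minimal, self-contained Java sketch of 
the sequence as I read it. It does not use real Druid classes: 
`pushToDeepStorage`, `reports`, and `publishToMetadataStore` are hypothetical 
stand-ins for the `PartialSegmentMergeTask` push, the in-memory report map in 
`ParallelIndexPhaseRunner`, and the transactional publish.

```java
import java.util.HashMap;
import java.util.Map;

public class PushPublishGap
{
  // Stand-in for the in-memory report map held by ParallelIndexPhaseRunner.
  // If the process dies, this map dies with it.
  private final Map<String, String> reports = new HashMap<>();

  // Phase 3: a PartialSegmentMergeTask pushes a merged segment to Deep
  // Storage (durable) and reports its location back (volatile).
  void pushPhase(String segmentId)
  {
    String location = pushToDeepStorage(segmentId); // durable side effect
    reports.put(segmentId, location);               // in-memory bookkeeping only
  }

  // Final step: the supervisor publishes whatever is in `reports` in a
  // single metadata-store transaction.
  void publishPhase()
  {
    publishToMetadataStore(reports); // atomic, but only sees surviving reports
  }

  public static void main(String[] args)
  {
    PushPublishGap run = new PushPublishGap();
    run.pushPhase("segment-2022-01-01");
    // <-- If the Overlord crashes here, the segment file is already durable
    //     in Deep Storage, but `reports` is gone, so publishPhase() never
    //     runs for it: "unpublished and unavailable".
    run.publishPhase();
  }

  private String pushToDeepStorage(String segmentId)
  {
    return "s3://deep-storage/" + segmentId; // placeholder for the real push
  }

  private void publishToMetadataStore(Map<String, String> toPublish)
  {
    System.out.println("publishing " + toPublish.size() + " segments");
  }
}
```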

So my questions are:

1. Is it possible for a segment to be pushed to Deep Storage but never 
published? Will Druid drop these segments from Deep Storage, or will the new 
Overlord continue publishing them after an Overlord leader switch?
2. If the above is possible, how should we do disaster recovery?

Some context on why we care about the "unpublished and unavailable" state: we 
have datasources containing PII-sensitive data, which requires running the 
"KILL" task to clean up overshadowed segments. A prerequisite for our metrics 
tracking "unused" segments in the metadata store is that Deep Storage and the 
metadata store are eventually consistent. It's important that no segment is 
left in Deep Storage without being published to the metadata store.
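For reference, this is roughly how we submit those kill tasks, via the standard 
Overlord task endpoint; a minimal sketch, where the Overlord address, 
datasource, and interval are all placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SubmitKillTask
{
  public static void main(String[] args) throws Exception
  {
    // Placeholder Overlord address and datasource; substitute your own.
    String overlord = "http://overlord.example.com:8090";
    String killSpec =
        "{\"type\": \"kill\","
        + " \"dataSource\": \"pii_events\","
        + " \"interval\": \"2022-01-01/2022-02-01\"}";

    // POST the kill task spec to the Overlord's task submission endpoint.
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(overlord + "/druid/indexer/v1/task"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(killSpec))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```

A kill task like this only removes segments that are marked unused in the 
metadata store, which is exactly why a segment that never reached the metadata 
store worries us: nothing would ever clean it up.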

Thank you!

Jianbin Chen // Sr. Software Engineer @ Shopify
