Re: [PR] docs: Recommend Overlord-based auto-compaction and mark useIncrementalCache production ready (druid)

via GitHub Wed, 08 Apr 2026 14:48:34 -0700


clintropolis commented on code in PR #19252:
URL: https://github.com/apache/druid/pull/19252#discussion_r3053565307



##########
server/src/main/java/org/apache/druid/server/coordinator/stats/DruidRunStats.java:
##########
@@ -35,44 +35,44 @@
  * runtime of a single coordinator duty.
  */
 @ThreadSafe
-public class CoordinatorRunStats
+public class DruidRunStats

Review Comment:
   I can't help but wonder if there is a better name for this, maybe 
`DutyRunStats` or something that means 'thing to collect stuff to emit later 
from regularly occurring internal chores'? I guess 'duty' isn't quite right 
because that is basically only used to refer to periodic coordinator tasks, not 
supervisor stuff. The javadoc still only mentions coordinator run/duties, which 
should be fixed.
   
   Also, it is kind of weird having the name `DruidRunStats` but the thing it 
has is still called `CoordinatorStat`, it seems like that naming should be 
changed to reflect the change here.
   
   Stepping back, what exactly is the motivation for renaming, i guess that 
compaction uses and runs as a supervisor now so it isn't really specific to the 
coordinator? While this is still used quite heavily by all of the things the 
coordinator does, it seems reasonable to give it a more generic name of some 
sort, I was just wondering if this one is a bit too generic, but maybe is fine 
too as long as the javadoc clarifies its purpose?
   
   Some addtional thoughts: I believe some of us would like to eventually merge 
the coordinator and overlord into a single service. Since they both now 
basically need a heavy segment timeline and so have similar footprint 
requirements, there aren't a lot of compelling reasons to keep them separate 
anymore. In my mind 'coordinator' would be the remaining service, with all of 
the overlords functionality merged into it (though this hasn't really been 
discussed so maybe other people have other opinions), so if that were true then 
this would basically become something only used by the coordinator again heh. 
There is a lot of work to do for something like this, so it is not really a 
short term goal afaik and needs more official discussion at some point, just 
adding it here for additional stuff to think about.
   
   
   



##########
docs/data-management/cascading-reindexing.md:
##########
@@ -36,15 +36,6 @@ For example, you might want to:
 
 Cascading reindexing handles all of this automatically by generating a 
timeline of compaction intervals and applying the appropriate rules to each 
interval.
 
-## Prerequisites
-
-Before using cascading reindexing, ensure your cluster meets the following 
requirements:
-
-- **Compaction supervisors enabled**: Set `useSupervisors` to `true` in the 
[compaction dynamic 
config](../api-reference/automatic-compaction-api.md#update-cluster-level-compaction-config).
-- **MSQ compaction engine**: Set `engine` to `msq` in the compaction dynamic 
config or in the supervisor spec.
-- **Incremental segment metadata caching**: Set 
`druid.manager.segments.useIncrementalCache` to `always` or `ifSynced` in your 
Overlord and Coordinator runtime properties. See [Segment metadata 
caching](../configuration/index.md#metadata-retrieval).
-- **At least two compaction task slots**: The MSQ task engine requires at 
least two tasks (one controller, one worker).
-

Review Comment:
   i think we need to leave part of this, no? the default engine is still 
native, so people still need to set engine to msq, and you need 2 compaction 
slots since its using msq engine



##########
docs/data-management/automatic-compaction.md:
##########
@@ -83,9 +80,135 @@ maximize performance and minimize disk usage of the 
`compact` tasks launched by
 
 For more details on each of the specs in an auto-compaction configuration, see 
[Automatic compaction dynamic 
configuration](../configuration/index.md#automatic-compaction-dynamic-configuration).
 
+## Auto-compaction using compaction supervisors
+
+:::info
+For advanced time-based data lifecycle management — such as coarsening segment 
granularity, deleting old rows, or changing compression as data ages — see 
[Cascading reindexing](cascading-reindexing.md).
+:::
+
+The recommended way to run automatic compaction is using compaction 
supervisors on the Overlord. Compaction supervisors provide the following 
benefits:
+
+* Can use the supervisor framework to get information about the 
auto-compaction, such as status or state
+* More easily suspend or resume compaction for a datasource
+* Can use either the native compaction engine or the [MSQ task 
engine](#use-msq-for-auto-compaction)
+* More reactive and submits tasks as soon as a compaction slot is available
+* Tracked compaction task status to avoid re-compacting an interval repeatedly
+* Uses new Indexing State Fingerprinting mechanisms to store less data per 
segment in metadata storage

Review Comment:
   i know this isn't new, but by default we still store compaction state afaict 
(`ClusterCompactionConfig.storeCompactionStatePerSegment` defaults to true), so 
this should be reworded to be like 'can be configured to store only 
fingerprints' or whatever



##########
docs/configuration/index.md:
##########
@@ -800,14 +800,14 @@ These Coordinator static configurations can be defined in 
the `coordinator/runti
 
 ##### Metadata retrieval
 
-|Property|Description|Default|
-|--------|-----------|-------|
-|`druid.manager.config.pollDuration`|How often the manager polls the config 
table for updates.|`PT1M`|
-|`druid.manager.segments.pollDuration`|The duration between polls the 
Coordinator does for updates to the set of active segments. Generally defines 
the amount of lag time it can take for the Coordinator to notice new 
segments.|`PT1M`|
-|`druid.manager.segments.useIncrementalCache`|(Experimental) Denotes the usage 
mode of the segment metadata incremental cache. This cache provides a 
performance improvement over the polling mechanism currently employed by the 
Coordinator as it retrieves payloads of only updated segments. Possible cache 
modes are: (a) `never`: Incremental cache is disabled. (b) `always`: 
Incremental cache is enabled. Service start-up will be blocked until cache has 
synced with the metadata store at least once. (c) `ifSynced`: Cache is enabled. 
This mode does not block service start-up and is a way to retain existing 
behavior of the Coordinator. If the incremental cache is in modes `always` or 
`ifSynced`, reads from the cache will block until it has synced with the 
metadata store at least once after becoming leader. The Coordinator never 
writes to this cache.|`never`|
-|`druid.manager.rules.pollDuration`|The duration between polls the Coordinator 
does for updates to the set of active rules. Generally defines the amount of 
lag time it can take for the Coordinator to notice rules.|`PT1M`|
-|`druid.manager.rules.defaultRule`|The default rule for the cluster|`_default`|
-|`druid.manager.rules.alertThreshold`|The duration after a failed poll upon 
which an alert should be emitted.|`PT10M`|
+|Property|Description| Default    |

Review Comment:
   what's up with all the unrelated formatting changes?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] docs: Recommend Overlord-based auto-compaction and mark useIncrementalCache production ready (druid)

Reply via email to