clintropolis commented on code in PR #19252: URL: https://github.com/apache/druid/pull/19252#discussion_r3053565307
########## server/src/main/java/org/apache/druid/server/coordinator/stats/DruidRunStats.java: ########## @@ -35,44 +35,44 @@ * runtime of a single coordinator duty. */ @ThreadSafe -public class CoordinatorRunStats +public class DruidRunStats Review Comment: I can't help but wonder if there is a better name for this, maybe `DutyRunStats` or something that means 'thing to collect stuff to emit later from regularly occurring internal chores'? I guess 'duty' isn't quite right because that is basically only used to refer to periodic coordinator tasks, not supervisor stuff. The javadoc still only mentions coordinator run/duties, which should be fixed. Also, it is kind of weird having the name `DruidRunStats` but the thing it has is still called `CoordinatorStat`, it seems like that naming should be changed to reflect the change here. Stepping back, what exactly is the motivation for renaming, i guess that compaction uses and runs as a supervisor now so it isn't really specific to the coordinator? While this is still used quite heavily by all of the things the coordinator does, it seems reasonable to give it a more generic name of some sort, I was just wondering if this one is a bit too generic, but maybe is fine too as long as the javadoc clarifies its purpose? Some addtional thoughts: I believe some of us would like to eventually merge the coordinator and overlord into a single service. Since they both now basically need a heavy segment timeline and so have similar footprint requirements, there aren't a lot of compelling reasons to keep them separate anymore. In my mind 'coordinator' would be the remaining service, with all of the overlords functionality merged into it (though this hasn't really been discussed so maybe other people have other opinions), so if that were true then this would basically become something only used by the coordinator again heh. There is a lot of work to do for something like this, so it is not really a short term goal afaik and needs more official discussion at some point, just adding it here for additional stuff to think about. ########## docs/data-management/cascading-reindexing.md: ########## @@ -36,15 +36,6 @@ For example, you might want to: Cascading reindexing handles all of this automatically by generating a timeline of compaction intervals and applying the appropriate rules to each interval. -## Prerequisites - -Before using cascading reindexing, ensure your cluster meets the following requirements: - -- **Compaction supervisors enabled**: Set `useSupervisors` to `true` in the [compaction dynamic config](../api-reference/automatic-compaction-api.md#update-cluster-level-compaction-config). -- **MSQ compaction engine**: Set `engine` to `msq` in the compaction dynamic config or in the supervisor spec. -- **Incremental segment metadata caching**: Set `druid.manager.segments.useIncrementalCache` to `always` or `ifSynced` in your Overlord and Coordinator runtime properties. See [Segment metadata caching](../configuration/index.md#metadata-retrieval). -- **At least two compaction task slots**: The MSQ task engine requires at least two tasks (one controller, one worker). - Review Comment: i think we need to leave part of this, no? the default engine is still native, so people still need to set engine to msq, and you need 2 compaction slots since its using msq engine ########## docs/data-management/automatic-compaction.md: ########## @@ -83,9 +80,135 @@ maximize performance and minimize disk usage of the `compact` tasks launched by For more details on each of the specs in an auto-compaction configuration, see [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration). +## Auto-compaction using compaction supervisors + +:::info +For advanced time-based data lifecycle management — such as coarsening segment granularity, deleting old rows, or changing compression as data ages — see [Cascading reindexing](cascading-reindexing.md). +::: + +The recommended way to run automatic compaction is using compaction supervisors on the Overlord. Compaction supervisors provide the following benefits: + +* Can use the supervisor framework to get information about the auto-compaction, such as status or state +* More easily suspend or resume compaction for a datasource +* Can use either the native compaction engine or the [MSQ task engine](#use-msq-for-auto-compaction) +* More reactive and submits tasks as soon as a compaction slot is available +* Tracked compaction task status to avoid re-compacting an interval repeatedly +* Uses new Indexing State Fingerprinting mechanisms to store less data per segment in metadata storage Review Comment: i know this isn't new, but by default we still store compaction state afaict (`ClusterCompactionConfig.storeCompactionStatePerSegment` defaults to true), so this should be reworded to be like 'can be configured to store only fingerprints' or whatever ########## docs/configuration/index.md: ########## @@ -800,14 +800,14 @@ These Coordinator static configurations can be defined in the `coordinator/runti ##### Metadata retrieval -|Property|Description|Default| -|--------|-----------|-------| -|`druid.manager.config.pollDuration`|How often the manager polls the config table for updates.|`PT1M`| -|`druid.manager.segments.pollDuration`|The duration between polls the Coordinator does for updates to the set of active segments. Generally defines the amount of lag time it can take for the Coordinator to notice new segments.|`PT1M`| -|`druid.manager.segments.useIncrementalCache`|(Experimental) Denotes the usage mode of the segment metadata incremental cache. This cache provides a performance improvement over the polling mechanism currently employed by the Coordinator as it retrieves payloads of only updated segments. Possible cache modes are: (a) `never`: Incremental cache is disabled. (b) `always`: Incremental cache is enabled. Service start-up will be blocked until cache has synced with the metadata store at least once. (c) `ifSynced`: Cache is enabled. This mode does not block service start-up and is a way to retain existing behavior of the Coordinator. If the incremental cache is in modes `always` or `ifSynced`, reads from the cache will block until it has synced with the metadata store at least once after becoming leader. The Coordinator never writes to this cache.|`never`| -|`druid.manager.rules.pollDuration`|The duration between polls the Coordinator does for updates to the set of active rules. Generally defines the amount of lag time it can take for the Coordinator to notice rules.|`PT1M`| -|`druid.manager.rules.defaultRule`|The default rule for the cluster|`_default`| -|`druid.manager.rules.alertThreshold`|The duration after a failed poll upon which an alert should be emitted.|`PT10M`| +|Property|Description| Default | Review Comment: what's up with all the unrelated formatting changes? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
