Hi Jun,

Thank you for sharing your questions. Please find my answers below.

41. There can only be user partitions on `metadata.log.dir` if that log dir is also listed in `log.dirs`. `LogManager` does not specifically load contents from `metadata.log.dir`. The broker will communicate UUIDs to the controller for all log dirs configured in `log.dirs`. If the metadata directory happens to be one of those, it may also contain user partitions, so the controller will know about it. If it is a completely separate log dir, it cannot hold user partitions, so there's no need to include it.

42. I'm not sure exactly what you mean by "decommission the disk", so please let me know if I'm missing your point here. A disk can be removed from `log.dirs` and removed from the system in a single broker restart:

  1. Shut down the broker
  2. Unmount and remove the disk
  3. Update the `log.dirs` config
  4. Start the broker

Upon restart, the broker will update `directory.ids` in the `meta.properties` for the remaining configured log dirs.

Log dir identity cannot be inferred from the path, because the same storage device can be remounted under a different path. So we identify storage directories by their contents: the `directory.id` field in each one's `meta.properties`. This also means that a log dir cannot be identified while it is unavailable, and so the broker can only generate `directory.ids` if all log directories listed under `log.dirs` are available.

Consider the following example, where `log.dirs=/a,/b,/c`, with the following `meta.properties` (non-relevant values omitted):

    # /a/meta.properties
    directory.id=1
    directory.ids=1,2,3

    # /b/meta.properties
    directory.id=2
    directory.ids=1,2,3

    # /c/meta.properties
    directory.id=3
    directory.ids=1,2,3

If `log.dirs` is updated to remove `/c`, the broker will be able to determine the new value `directory.ids=1,2` by loading `/a/meta.properties` and `/b/meta.properties`. But if either `/a` or `/b` happens to be unavailable, e.g.
due to some temporary disk failure, we cannot determine `directory.ids`. For instance, if `/b` is unavailable, the broker can't tell if `directory.ids` should be `1,2`, `1,3`, or even `1,4`.

In a scenario where an operator wishes to remove a log dir from configuration while some other log dir is also offline, the operator has a few options:

  a) Bring the offline log dir back online before restarting the broker.
  b) Edit `meta.properties` to remove the UUID for the deconfigured log dir from `directory.ids` in the remaining available log dirs. This removes the need for the broker to regenerate `directory.ids`, as the entry counts for `directory.ids` and `log.dirs` will be equal.
  c) Also remove the offline log dir from `log.dirs`.

43. If the log dir had already failed at startup, then indeed, the broker will not know that. But in that case, there's no risk of a race or failure. What I meant here relates rather to log dir failures at runtime. I've updated this bit in the KIP to clarify. When executing the log directory failure handler, the broker knows which directory failed and which partitions resided there, and it can check if any of those newly failed partitions refer to a different log dir in the cluster metadata. The assignment should be correct for all of them, as the broker will be proactive in notifying the controller of any changes in log dir assignment. But in case of some race condition, the broker should nudge the controller to deal with the incorrectly assigned partitions.

44. Tom Bentley and I have discussed this previously in this thread, in emails dated Jan 10, 13, 23 and Feb 3. When upgrading a JBOD-enabled ZK cluster, we could piggyback on the ZK to KRaft upgrade (as per KIP-866) and delay bumping the version in `meta.properties` until the ZK->KRaft upgrade is finalized. After that, we do not support downgrading.
But I'm not convinced we should do this, since there's another upgrade scenario: when the change proposed in this KIP is applied to a KRaft cluster that does not yet support JBOD. In this scenario the upgrade has no sequence of steps with one that is considered final, and I'm not sure we'd want to introduce an additional step, making the upgrade process more complex, just to address this issue.

I think the best approach is to keep using `version=1` in `meta.properties`. The new properties introduced in this KIP will safely be ignored by previous versions, in either ZK or KRaft mode, and we avoid creating conflicts with unexpected declared versions. The presence of the extra fields is also innocuous in the case of a second upgrade following a downgrade. I've updated the KIP to reflect this.

Let me know what you think.

Best,

--
Igor
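
P.S. In case it helps with 42, here is a rough Python sketch of the rule described there for deciding whether `directory.ids` can be determined at startup. It is only an illustration under my own assumptions, not the broker's implementation: `resolve_directory_ids` and the dictionary that stands in for the parsed `meta.properties` files are hypothetical names.

```python
from typing import Dict, List, Optional

# Hypothetical model: each configured log dir path maps to the parsed
# contents of its meta.properties, or to None when the dir is unavailable.
MetaByDir = Dict[str, Optional[Dict[str, str]]]


def resolve_directory_ids(log_dirs: List[str], meta_by_dir: MetaByDir) -> Optional[List[str]]:
    """Return the directory.ids entries for log_dirs, or None if undeterminable."""
    metas = [meta_by_dir.get(path) for path in log_dirs]
    available = [meta for meta in metas if meta is not None]

    # Every configured dir is readable: regenerate from each directory.id.
    if len(available) == len(log_dirs):
        return [meta["directory.id"] for meta in available]

    # Some dir is offline: the stored list can only be reused if its entry
    # count already matches log.dirs, so that nothing needs regenerating.
    for meta in available:
        stored = meta["directory.ids"].split(",")
        if len(stored) == len(log_dirs):
            return stored

    # Otherwise the identity of the offline dir(s) is unknown.
    return None
```

With the example from 42, passing `["/a", "/b"]` while both dirs are readable yields `["1", "2"]`. If `/b` is offline the function returns `None`, unless `directory.ids` in `/a/meta.properties` has been edited down to `1,2` (option b) or `/b` has also been dropped from `log.dirs` (option c).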