Hi Jun,

Thank you for reviewing the KIP. Please find my replies to your comments below.
10. Thanks for pointing out this typo; it has been corrected.

11. I agree that the additional delay in switching to the future replica is undesirable. However, I see a couple of issues if we forward the request to the controller as you describe:

a) If the controller persists the change to the log directory assignment before the future replica has caught up, and there is then a failure in the original log directory, there will be no failover if the broker is the leader for the partition, and the partition will become unavailable. It is not safe to call AssignReplicasToDirectories before the replica exists in the designated log directory.

b) An existing behavior we'd like to maintain if possible is the ability to move partitions between log directories while a broker is offline, as it can be very useful for managing storage space. ZK brokers will load and accept logs in the new location after startup. Maintaining this behavior requires that the broker be able to override/correct assignments seen in the cluster metadata, i.e. the broker is the authority on log directory placement in case of mismatch. If we want to keep this feature and have the controller send log directory reassignments, we'll need a way to distinguish between a mismatch due to offline movement and a mismatch due to a controller-triggered reassignment.

To keep the delay low, instead of sending AssignReplicasToDirectories while holding the lock, the RPC can be queued elsewhere when the future replica first catches up. ReplicaAlterLogDirsThread can keep going and not switch the replicas yet. Once the broker receives confirmation of the metadata change, it can then briefly block appends to the old replica and make the switch. In the unlikely event that the source log directory fails between the moment AssignReplicasToDirectories is acknowledged by the controller and the moment the broker is able to make the switch, the broker can issue AssignReplicasToDirectories for that replica back to the offline log directory, to let the controller know that the replica is actually offline. (I've included a rough sketch of this sequencing in the P.S. below.) What do you think?

12. Indeed, the metadata.log.dir, if explicitly set to a separate directory, should not be included in the directory UUID list sent to the controller in broker registration and heartbeat requests. I have updated the KIP to make this explicit.

13. Thank you for making this suggestion. Let's address the different scenarios you enumerated:

a) When enabling JBOD for an existing KRaft cluster

In this scenario, the broker finds a single log directory configured in `log.dirs`, with an already existing `meta.properties` that is simply missing `directory.id` and `directory.ids`. It is safe to have the broker automatically generate the log dir UUID and update the `meta.properties` file. This removes the need for extra steps when upgrading and enabling this feature, so it is quite useful.

b) Adding a log dir to an existing JBOD KRaft cluster

In this scenario, the broker finds a shorter list of UUIDs in `directory.ids` in `meta.properties` than the directories configured in `log.dirs`. KIP-631 introduced the requirement to run kafka-storage.sh to format storage directories. Currently, if a broker in KRaft mode cannot find `meta.properties` in each log directory, it will fail at startup. KIP-785 proposes removing the need to run the format storage command, but it is still open. If new log directories are being added, the storage command must be run anyway, so I don't think there will be any benefit in this case.
c) Removing a log dir from an existing JBOD KRaft cluster

In this scenario, the broker finds a larger list of UUIDs in `directory.ids` in `meta.properties` than the directories configured in `log.dirs`. The removal of log directories requires an explicit update to the `log.dirs` configuration, so it is also safe to have the broker automatically update `directory.ids` in `meta.properties` to remove the extra UUIDs. It is also useful to drop the requirement to run the storage command after removing log directories from the configuration, as it reduces the operational burden.

d) Upgrading a JBOD ZK cluster to KRaft

In ZooKeeper mode brokers write `meta.properties` with `version=0`, but in KRaft mode brokers require `version=1`. According to the current discussion on KIP-866, the broker will automatically upgrade `meta.properties` from `version=0` to `version=1`. Assuming `meta.properties` gets translated to `version=1` as part of KIP-866, the broker should find `meta.properties` files that are simply missing the two new fields: `directory.id` and `directory.ids`. So it should be OK for the broker to generate UUIDs for each log directory and update the `meta.properties` file.

I have updated the KIP to remove the need to run the storage command in these scenarios - apart from b), where it should already be a requirement AFAICT - and to describe how the broker will automatically update `directory.id` and `directory.ids` in `meta.properties` if required. (A rough sketch of this reconciliation is in the P.S. below.)

14. Thank you for pointing this out. Yes, the controller should simply delegate the log directory choice back to the broker, by assigning `Uuid.ZERO` to all the replicas that were assigned to removed log directories. I have updated the KIP under "Controller" > "Broker registration".

15. Yes. Same as in "Metadata caching", replicas are considered offline if they reference a log directory which is not in the list of online log directories for the broker ID hosting the replica. This means the controller will not assign leadership to those replicas. This is described under "Controller" > "Handling log directory failures". (See also the small sketch for 14 and 15 in the P.S. below.)

16. We'll need to bump metadata.version to gate the use of the new records and RPC changes. I've added a mention of this under the "Compatibility, Deprecation, and Migration Plan" section.

Please let me know if you have any further feedback.

Best,

--
Igor
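P.S. To make some of the above a little more concrete, here are a few rough sketches. These are illustrations only, with made-up class and method names, not proposed implementation and not text from the KIP.

For 11, the sequencing I have in mind looks roughly like this:

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: hypothetical names, not actual broker code.
public class DirectorySwitchSketch {

    // Partitions whose future replica has caught up and which are waiting for the
    // controller to persist the AssignReplicasToDirectories change.
    private final Set<String> awaitingMetadataAck = ConcurrentHashMap.newKeySet();

    // Called when the future replica first catches up. The RPC is queued/sent outside
    // the fetcher lock, and ReplicaAlterLogDirsThread keeps copying in the meantime.
    void onFutureReplicaCaughtUp(String topicPartition) {
        awaitingMetadataAck.add(topicPartition);
        sendAssignReplicasToDirectories(topicPartition); // async, towards the active controller
    }

    // Called once the broker sees the new directory assignment in the metadata log.
    void onAssignmentSeenInMetadata(String topicPartition) {
        if (awaitingMetadataAck.remove(topicPartition)) {
            blockAppends(topicPartition);             // brief pause on the old replica
            try {
                promoteFutureReplica(topicPartition); // make the switch
            } finally {
                unblockAppends(topicPartition);
            }
        }
    }

    // Failure path: the source directory failed after the controller acknowledged the new
    // assignment but before the switch happened, so re-assign the replica back to the (now
    // offline) source directory to let the controller know the replica is actually offline.
    void onSourceDirectoryFailedBeforeSwitch(String topicPartition, UUID offlineDirectoryId) {
        if (awaitingMetadataAck.remove(topicPartition)) {
            sendAssignReplicasToDirectories(topicPartition, offlineDirectoryId);
        }
    }

    // Stubs so the sketch is self-contained.
    private void sendAssignReplicasToDirectories(String tp) { }
    private void sendAssignReplicasToDirectories(String tp, UUID dirId) { }
    private void blockAppends(String tp) { }
    private void unblockAppends(String tp) { }
    private void promoteFutureReplica(String tp) { }
}
```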
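For 13, the broker-side reconciliation of `directory.id`/`directory.ids` against the configured `log.dirs` could look roughly like this (again illustrative names only; the comma-separated encoding is just for the example, the actual format of `directory.ids` is whatever the KIP specifies):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Illustrative only: not the exact meta.properties handling from the KIP.
public final class DirectoryIdReconciliationSketch {

    // existingIdByLogDir: the directory.id already recorded in each configured log dir's
    // meta.properties, or absent where the field is missing.
    static Map<String, UUID> reconcile(Map<String, UUID> existingIdByLogDir,
                                       List<String> configuredLogDirs) {
        Map<String, UUID> idByLogDir = new LinkedHashMap<>();
        for (String dir : configuredLogDirs) {
            UUID existing = existingIdByLogDir.get(dir);
            // Scenarios a) and d): the field is missing, so the broker generates a new UUID.
            idByLogDir.put(dir, existing != null ? existing : UUID.randomUUID());
        }
        // Scenario c): IDs for directories no longer listed in log.dirs are simply not carried
        // over, so the directory.ids written back to meta.properties drops the extra UUIDs.
        return idByLogDir;
    }

    // directory.ids would then be rewritten from the reconciled set; a comma-separated
    // encoding is used here purely for illustration.
    static String directoryIdsValue(Map<String, UUID> idByLogDir) {
        List<String> ids = new ArrayList<>();
        idByLogDir.values().forEach(id -> ids.add(id.toString()));
        return String.join(",", ids);
    }
}
```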
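And for 14 and 15, the controller-side rules, with a made-up `UNASSIGNED` constant standing in for `Uuid.ZERO`:

```java
import java.util.Set;
import java.util.UUID;

// Illustrative only: hypothetical names for the controller-side rules.
public final class ControllerDirectoryRulesSketch {

    // Stands in for Uuid.ZERO: the controller delegates the directory choice back to the broker.
    static final UUID UNASSIGNED = new UUID(0L, 0L);

    // 14: on broker registration, a replica assigned to a log directory the broker no longer
    // reports is reset to UNASSIGNED so the broker picks a directory for it itself.
    static UUID assignmentAfterRegistration(UUID assignedDir, Set<UUID> registeredDirs) {
        return registeredDirs.contains(assignedDir) ? assignedDir : UNASSIGNED;
    }

    // 15: a replica is offline if it references a log directory that is not in the list of
    // online log directories for the broker hosting it; such replicas are not given leadership.
    // (Assumption in this sketch: an unassigned replica is not treated as offline on this basis.)
    static boolean isReplicaOffline(UUID assignedDir, Set<UUID> onlineDirsForBroker) {
        return !UNASSIGNED.equals(assignedDir) && !onlineDirsForBroker.contains(assignedDir);
    }
}
```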