Hi Colin,

> I would call #2 LOST. It was assigned in the past, but we don't know where.
> (I see that you called this OFFLINE). This is not really normal...
> it should happen only when we're migrating from ZK mode to KRaft mode,
> or going from an older KRaft release with multiple directories to a
> post-JBOD release.
What you refer to as #2 LOST is actually what I named SELECTED, as in:
a directory has already been _selected_ sometime before, we just don't
know which one yet. In the meantime this change has already been merged,
but let me know if you feel strongly about the naming here, as I'm happy
to rename SELECTED_DIR to LOST_DIR in a new PR.
https://github.com/apache/kafka/pull/14291

> As for the third state -- I'm not sure why SELECTED_DIR needs to exist.

The third state (actually it is ordered second) - OFFLINE_DIR - conveys
that a replica is assigned to an unspecified offline directory. The broker
can use it in the following ways (see the rough sketch in the P.S. below):

  * When catching up with metadata, if the broker sees that one of its
    partitions is mapped to SELECTED_DIR, it cannot find that partition in
    any of the online log directories, and at least one log dir is offline,
    then the broker sends AssignReplicasToDirs to converge the assignment
    to OFFLINE_DIR.
  * If a log directory failure happens during an intra-broker (across
    dirs) replica movement, after sending AssignReplicasToDirs with the
    new UUID and before the future replica catches up again. (There's a
    section in the KIP about this.)

We could just use a random UUID, since a replica assigned to a dir that is
not in the broker's registered set of online dirs is considered offline by
controllers and the metadata cache, but using a reserved UUID feels cleaner.

> I think we need a general mechanism for checking that replicas are
> in the directories we expect and sending an RPC to the controller
> if they are not. A mechanism like this will automatically get rid
> of the LOST replicas just as part of normal operation -- nothing
> special required.

Thanks for pointing this out, I forgot to put in the notes in my previous
email that we discussed this too. The KIP proposes this is done when
catching up with metadata, but you also suggested we extend the stray
replica detection mechanism to also check for these inconsistencies. I
think this is a good idea, and we'll look into that as well.

Best,

--
Igor
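
P.S. For illustration only, here is a rough sketch of the reserved-UUID
idea and the catch-up check described above. The class, constants, and
method names below are placeholders I made up for this email, not the
actual code in the PR:

    import java.util.Set;
    import java.util.UUID;

    // Hypothetical sketch; names do not match the real Kafka code.
    public class DirAssignmentSketch {

        // Reserved sentinel UUIDs, kept outside the space of randomly
        // generated directory IDs.
        static final UUID SELECTED_DIR = new UUID(0L, 1L); // selected before, unknown where
        static final UUID OFFLINE_DIR  = new UUID(0L, 2L); // in some offline directory

        // Should the broker send AssignReplicasToDirs to converge this
        // replica's assignment to OFFLINE_DIR while catching up with
        // metadata?
        static boolean shouldConvergeToOfflineDir(UUID assignedDir,
                                                  boolean foundInOnlineLogDirs,
                                                  Set<String> offlineLogDirs) {
            return SELECTED_DIR.equals(assignedDir)
                    && !foundInOnlineLogDirs
                    && !offlineLogDirs.isEmpty();
        }
    }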