Hi Colin,

> I would call #2 LOST. It was assigned in the past, but we don't know where.
> (I see that you called this OFFLINE). This is not really normal...
> it should happen only when we're migrating from ZK mode to KRaft mode,
> or going from an older KRaft release with multiple directories to a
> post-JBOD release.
What you refer to as #2 LOST is actually what I named SELECTED, as in:
a directory has already been _selected_ sometime before, we just don't
know which one yet. In the meantime this change has already been merged,
but let me know if you feel strongly about the naming here, as I'm happy
to rename SELECTED_DIR to LOST_DIR in a new PR.
https://github.com/apache/kafka/pull/14291

> As for the third state -- I'm not sure why SELECTED_DIR needs to exist.

The third state (actually it is ordered second) - OFFLINE_DIR - conveys
that a replica is assigned to an unspecified offline directory. The broker
can use it in the following ways (see the rough sketch in the P.S. below):

  * When catching up with metadata, if the broker sees that one of its
    partitions is mapped to SELECTED_DIR, it cannot find that partition in
    any of the online log directories, and at least one log dir is offline,
    then the broker sends AssignReplicasToDirs to converge the assignment
    to OFFLINE_DIR.
  * If a log directory failure happens during an intra-broker (across
    dirs) replica movement, after sending AssignReplicasToDirs with the
    new UUID and before the future replica catches up again. (There's a
    section in the KIP about this.)

We could just use a random UUID, since a replica assigned to a dir that is
not in the broker's registered set of online dirs is considered offline by
controllers and the metadata cache, but using a reserved UUID feels cleaner.

> I think we need a general mechanism for checking that replicas are
> in the directories we expect and sending an RPC to the controller
> if they are not. A mechanism like this will automatically get rid
> of the LOST replicas just as part of normal operation -- nothing
> special required.

Thanks for pointing this out, I forgot to put in the notes in my previous
email that we discussed this too. The KIP proposes this is done when
catching up with metadata, but you also suggested we extend the stray
replica detection mechanism to also check for these inconsistencies. I
think this is a good idea, and we'll look into that as well.

Best,

--
Igor
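
P.S. For illustration only, here is a rough sketch of the reserved-UUID
idea and the catch-up check described above. The class, constants, and
method names below are placeholders I made up for this email, not the
actual code in the PR:

    import java.util.Set;
    import java.util.UUID;

    // Hypothetical sketch; names do not match the real Kafka code.
    public class DirAssignmentSketch {

        // Reserved sentinel UUIDs, kept outside the space of randomly
        // generated directory IDs.
        static final UUID SELECTED_DIR = new UUID(0L, 1L); // selected before, unknown where
        static final UUID OFFLINE_DIR  = new UUID(0L, 2L); // in some offline directory

        // Should the broker send AssignReplicasToDirs to converge this
        // replica's assignment to OFFLINE_DIR while catching up with
        // metadata?
        static boolean shouldConvergeToOfflineDir(UUID assignedDir,
                                                  boolean foundInOnlineLogDirs,
                                                  Set<String> offlineLogDirs) {
            return SELECTED_DIR.equals(assignedDir)
                    && !foundInOnlineLogDirs
                    && !offlineLogDirs.isEmpty();
        }
    }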