Hi folks,

Recently I've been working to make the ledger metadata on the client
immutable, with the goal of making client metadata management more
understandable. The basic idea is that the metadata the client uses
should reflect what is in zookeeper. So if a client wants to modify
the metadata, it makes a copy, modifies the copy, writes it to zookeeper
and then starts using it. This gets rid of all the conflictsWith and
merge operations.
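
To make the copy/modify/write/use cycle concrete, here is a rough Java sketch. It is not the actual client code: Metadata, FakeStore and Client are hypothetical names, and FakeStore is an in-memory stand-in for zookeeper's versioned write, assumed here to reject an update made from a stale version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class ImmutableMetadataSketch {
    /** Immutable snapshot of ledger metadata (hypothetical, heavily simplified). */
    static final class Metadata {
        final int version;            // version of the stored copy this snapshot came from
        final List<String> ensemble;  // bookies in the current ensemble

        Metadata(int version, List<String> ensemble) {
            this.version = version;
            this.ensemble = Collections.unmodifiableList(new ArrayList<>(ensemble));
        }

        /** Returns a modified copy; the receiver is never changed. */
        Metadata withReplacedBookie(int index, String newBookie) {
            List<String> copy = new ArrayList<>(ensemble);
            copy.set(index, newBookie);
            return new Metadata(version, copy);
        }
    }

    /** In-memory stand-in for the metadata store (assumption: versioned compare-and-set). */
    static final class FakeStore {
        private Metadata stored;

        FakeStore(Metadata initial) { stored = initial; }

        synchronized Metadata read() { return stored; }

        /** Stores the update only if it was derived from the current version; null on conflict. */
        synchronized Metadata write(Metadata updated) {
            if (updated.version != stored.version) {
                return null;
            }
            stored = new Metadata(stored.version + 1, updated.ensemble);
            return stored;
        }
    }

    /** Client side: the snapshot in use is always one that has been written to the store. */
    static final class Client {
        private final FakeStore store;
        private final AtomicReference<Metadata> current;

        Client(FakeStore store) {
            this.store = store;
            this.current = new AtomicReference<>(store.read());
        }

        Metadata metadata() { return current.get(); }

        boolean replaceBookie(int index, String newBookie) {
            Metadata modified = current.get().withReplacedBookie(index, newBookie); // copy + modify
            Metadata written = store.write(modified);                               // write
            if (written == null) {
                current.set(store.read()); // lost the race: adopt what the store holds
                return false;
            }
            current.set(written);          // only now start using it
            return true;
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore(new Metadata(0, Arrays.asList("b1", "b2", "b3")));
        Client client = new Client(store);
        boolean ok = client.replaceBookie(0, "b4");
        System.out.println(ok + " " + client.metadata().ensemble);
    }
}
```

With this shape there is nothing to merge: a client whose write is rejected just re-reads and decides again from the stored state.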

There is only one case where this doesn't work. When we recover a
ledger, we read the LAC from all bookies, then read forward entry by
entry, rewriting the entry, until we reach the end. If a bookie fails
during the rewrite, we replace it in the ensemble, but we don't write
that back to zookeeper until the end.

I was banging my head off this yesterday, trying to find a nice way to
fit this in (there's loads of nasty ways), when I came to the
conclusion that failure recovery during recovery isn't actually
useful.

Recovery operates on a few seconds of data (from the last LAC written
to the end of the ledger, call this LLAC). Take a ledger with 3:2:2
configuration. If the writer crashes, and one bookie crashes, when we
recover we currently replace that crashed bookie, so that if another
bookie crashes the data is still available. But, and this is why I
don't think it's useful, if another bookie crashes, the recovered data
may be available, but everything before the LLAC in the ledger will
not be available.
IMO, this kind of thing should be handled by rereplication, not
ensemble change (as an aside, we should have a hint system to trigger
rereplication ASAP on this ledger).
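
To illustrate the 3:2:2 argument, here is a small Java sketch. It assumes round-robin striping, where an entry goes to writeQuorum consecutive ensemble slots starting at entryId mod ensemble size. The bookie names, the entry ids and the helper methods are all made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RecoveryAvailabilitySketch {
    /** Bookies holding an entry, assuming round-robin striping over the ensemble. */
    static Set<String> replicas(List<String> ensemble, int writeQuorum, long entryId) {
        Set<String> out = new HashSet<>();
        for (int i = 0; i < writeQuorum; i++) {
            out.add(ensemble.get((int) ((entryId + i) % ensemble.size())));
        }
        return out;
    }

    /** An entry is readable if at least one of its replicas is on a live bookie. */
    static boolean readable(Set<String> replicas, Set<String> dead) {
        for (String bookie : replicas) {
            if (!dead.contains(bookie)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Original ensemble; b1 crashes together with the writer.
        List<String> original = Arrays.asList("b1", "b2", "b3");
        // Ensemble used for entries rewritten by recovery, with b1 replaced by b4.
        List<String> recovered = Arrays.asList("b4", "b2", "b3");
        // Later a second bookie, b2, crashes as well.
        Set<String> dead = new HashSet<>(Arrays.asList("b1", "b2"));

        // Entry 0 sits before the LLAC, entry 3 is inside it (hypothetical ids).
        System.out.println("entry 0 readable: " + readable(replicas(original, 2, 0), dead));
        System.out.println("entry 3 readable: " + readable(replicas(recovered, 2, 3), dead));
    }
}
```

In this scenario entry 3 survives because recovery put a copy on b4, while entry 0 only ever lived on b1 and b2 and is gone, which is the asymmetry described above.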

Anyhow, I'd like to hear other opinions on this before proceeding.
Recovery with ensemble changes can work. Rather than modifying the
ledger metadata, we could create a shadow ensemble list and give
entries from that to the writers. But with the current entanglement in
the client, this is a bit nasty.

Cheers,
Ivan
