Re: Usefulness of ensemble change during recovery

Sijie Guo Sun, 05 Aug 2018 23:49:13 -0700

On Sun, Aug 5, 2018 at 11:46 PM Sijie Guo <guosi...@gmail.com> wrote:


>
>
> On Sat, Aug 4, 2018 at 1:49 AM Ivan Kelly <iv...@apache.org> wrote:
>
>> Hi folks,
>>
>> Recently I've been working to make the ledger metadata on the client
>> immutable, with the goal of making client metadata management more
>> understandable. The basic idea is that the metadata the client uses
>> should reflect what is in zookeeper. So if a client wants to modify
>> the metadata, it makes a copy, modifies, writes to zookeeper and then
>> starts using it. This gets rid of all the conflictsWith and merge
>> operations.
>>
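
A minimal sketch of the copy, modify, write, then use flow described above, in Python for brevity. Everything here is hypothetical: `Meta`, `FakeStore` and `replace_bookie` are illustrative names, and `FakeStore` is only an in-memory stand-in for a versioned zookeeper write, not the BookKeeper client API.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Meta:
    """Hypothetical immutable snapshot of ledger metadata."""
    state: str
    ensemble: Tuple[str, ...]


class VersionConflict(Exception):
    """Raised when the stored version is not the one the writer expected."""


class FakeStore:
    """In-memory stand-in for a versioned metadata store (assumption)."""

    def __init__(self, meta: Meta) -> None:
        self.meta = meta
        self.version = 0

    def read(self) -> Tuple[Meta, int]:
        return self.meta, self.version

    def write(self, meta: Meta, expected_version: int) -> int:
        # Compare-and-set: only succeed if nobody wrote in between.
        if expected_version != self.version:
            raise VersionConflict(f"expected {expected_version}, found {self.version}")
        self.meta = meta
        self.version += 1
        return self.version


def replace_bookie(store: FakeStore, current: Meta, version: int,
                   failed: str, replacement: str) -> Tuple[Meta, int]:
    # Copy and modify: 'current' is frozen, so build a new snapshot.
    updated = replace(
        current,
        ensemble=tuple(replacement if b == failed else b for b in current.ensemble),
    )
    # Write first; the caller only starts using 'updated' if the write succeeds.
    new_version = store.write(updated, version)
    return updated, new_version
```

With this shape the client never holds metadata that differs from what the store accepted, which is the property the immutable design is after.
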
>> There is only one case where this doesn't work. When we recover a
>> ledger, we read the LAC from all bookies, then read forward entry by
>> entry, rewriting the entry, until we reach the end. If a bookie fails
>> during the rewrite, we replace it in the ensemble, but we don't write
>> that back to zookeeper until the end.
>>
>> I was banging my head off this yesterday, trying to find a nice way to
>> fit this in (there's loads of nasty ways), when I came to the
>> conclusion that failure recovery during recovery isn't actually
>> useful.
>>
>
>
>> Recovery operates on a few seconds of data (from the last LAC written
>> to the end of the ledger, call this LLAC).
>
>
> The data written during this window can be very large if the ledger's
> traffic is high. That has been observed in production at Twitter. So when
> we talk about "a few seconds of data", we can't assume the amount of data
> is small. It means recovery can take longer than we expect, so if we
> don't handle failures during recovery, how can we ensure we have enough
> copies of the data during recovery?
>
> I am not sure "make ledger metadata immutable" == "getting rid of merging
> ledger metadata"; I don't think these are the same thing. Making ledger
> metadata immutable will make the code much clearer and simpler. Getting
> rid of merging ledger metadata is a different thing: making the ledger
> metadata immutable will help make merging it on conflicts clearer.
>
> In the ledger recovery case, it is actually okay to merge ledger metadata.
> Let's assume the LAC is L at the time of recovery, and M is the copy of
> the ledger metadata before recovery. The client that attempts to recover
> the ledger will first set the ledger to IN_RECOVERY before recovering it,
> so conflicts will only come from other clients (there can be many) that
> attempt to recover the ledger and from the AutoRecovery daemon. The
> resolution of this conflict is simple:
>
> When we fail to write the ledger metadata (version conflict), read back
> the ledger metadata. If the state has changed to CLOSED, it was updated
> by another client that also recovered the ledger, so we discard our
> ensembles; if the state has been changed, it was modified by
> AutoRecovery, and AutoRecovery doesn't add ensembles,
>

Sorry for the typo, this should read "if the state has not been changed".


> so we can simply take the ensembles before L from zookeeper and our
> ensembles after L and merge them together.
>
>
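
To make the resolution rule above concrete, here is a rough Python sketch. It reads the rule with the correction applied ("if the state has not been changed"). The names (`State`, `Metadata`, `resolve_conflict`) are made up for illustration and are not the BookKeeper classes. It also assumes ensembles are keyed by the first entry id they cover, and that "before L" means keys up to and including L.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    OPEN = "OPEN"
    IN_RECOVERY = "IN_RECOVERY"
    CLOSED = "CLOSED"


# (first entry id covered, bookies in the ensemble) -- assumed representation
Segment = Tuple[int, Tuple[str, ...]]


@dataclass(frozen=True)
class Metadata:
    """Hypothetical immutable snapshot of ledger metadata."""
    state: State
    ensembles: Tuple[Segment, ...]


def resolve_conflict(ours: Metadata, stored: Metadata, lac: int) -> Optional[Metadata]:
    """Return the metadata to retry the write with, or None to give up."""
    if stored.state is State.CLOSED:
        # Another recovering client already closed the ledger:
        # discard our ensembles.
        return None
    # State not changed (still IN_RECOVERY): the writer was AutoRecovery,
    # which does not add ensembles. Take the stored ensembles up to the LAC
    # and our own ensembles after it.
    before = tuple(s for s in stored.ensembles if s[0] <= lac)
    after = tuple(s for s in ours.ensembles if s[0] > lac)
    return Metadata(state=ours.state, ensembles=before + after)
```

For example, if AutoRecovery swapped a bookie in the segment starting at entry 0 while we added a new segment at entry 11 (with L = 10), the merged result keeps AutoRecovery's segment 0 and our segment 11.
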
>> Take a ledger with 3:2:2
>> configuration. If the writer crashes, and one bookie crashes, when we
>> recover we currently replace that crashed bookie, so that if another
>> bookie crashes the data is still available. But, and this is why I
>> don't think it's useful, if another bookie crashes, the recovered data
>> may be available, but everything before the LLAC in the ledger will
>> not be available.
>>
>> IMO, this kind of thing should be handled by rereplication, not
>> ensemble change (as an aside, we should have a hint system to trigger
>> rereplication ASAP on this ledger).
>
>
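
A small worked example of the 3:2:2 case above, in Python. It assumes entries are striped round-robin across the ensemble (entry e goes to the bookies at positions e, e+1, ... modulo the ensemble size, up to the write quorum). The function names are invented for illustration.

```python
from typing import Iterable, List, Sequence, Set


def write_set(entry_id: int, ensemble: Sequence[str], write_quorum: int) -> List[str]:
    """Bookies holding a given entry, assuming round-robin striping."""
    n = len(ensemble)
    return [ensemble[(entry_id + i) % n] for i in range(write_quorum)]


def unavailable_entries(entry_ids: Iterable[int], ensemble: Sequence[str],
                        write_quorum: int, failed: Set[str]) -> List[int]:
    """Entries whose every replica sits on a failed bookie."""
    return [
        e for e in entry_ids
        if all(b in failed for b in write_set(e, ensemble, write_quorum))
    ]


# Entries 0..5 were written before the LLAC range to ensemble [b0, b1, b2]
# with write quorum 2. b0 crashed before recovery, b1 crashes afterwards.
lost = unavailable_entries(range(6), ["b0", "b1", "b2"], 2, {"b0", "b1"})
print(lost)  # -> [0, 3]
```

Replacing b0 in the ensemble during recovery only protects the entries rewritten by recovery. Entries 0 and 3 are lost either way until rereplication copies them elsewhere.
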
>> Anyhow, I'd like to hear other opinions on this before proceeding.
>> Recovery with ensemble changes can work. Rather than modifying the
>> ledger, create a shadow ensemble list, and give entries from that to
>> the writers, but with the current entanglement in the client, this is
>> a bit nasty.
>>
>> Cheers,
>> Ivan
>>
>
