>> Recovery operates on a few seconds of data (from the last LAC written
>> to the end of the ledger, call this LLAC).
>
> the data during this duration can be very large if the traffic of the
> ledger is heavy. That has been observed in Twitter's production. So when
> we are talking about "a few seconds of data", we can't assume the amount
> of data is small. That means the recovery can take more time than

Yes, it can be large, but it is still only a few seconds' worth of
data. It is the amount of data that can be transmitted in the period
of one roundtrip, as the next roundtrip will update the LAC.

I didn't mean to imply that the data was small in absolute terms; I
meant it was small in comparison to the overall size of that ledger.
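
To make the bound concrete, here is a rough sketch of why the recovery
range is limited to the entries between the LAC and the LLAC. The helper
names are hypothetical, this is not the actual recovery code:

    // Sketch only; the helper names are made up, not BookKeeper API.
    public class RecoveryRangeSketch {
        interface LedgerView {
            long lastAddConfirmed();         // highest LAC any bookie returned
            long lastEntrySeenByAnyBookie(); // the "LLAC"
            void recoverEntry(long entryId); // re-replicate a single entry
        }

        // Only entries in (lac, llac] need to be re-read and re-written;
        // everything up to lac is already covered by the ack quorum.
        static void recover(LedgerView ledger) {
            long lac = ledger.lastAddConfirmed();
            long llac = ledger.lastEntrySeenByAnyBookie();
            for (long entryId = lac + 1; entryId <= llac; entryId++) {
                ledger.recoverEntry(entryId);
            }
            // The ledger would then be closed at llac.
        }
    }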

> what we can expect. So if we don't handle failures during recovery,
> how can we ensure we have enough copies of the data during recovery?

Consider an e3w3a2 ledger. There are two cases where you can lose a
bookie during recovery.

Case one: one bookie is lost. You can still recover, as ack=2 is still achievable.
Case two: two bookies are lost. You can't recover, but the ledger is
unavailable anyhow, since any entry in the ledger may only have been
replicated to 2.

However, with e3w3a3 I guess you wouldn't be able to recover at all,
and we have to handle that case.
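
The rule of thumb behind those cases (just an illustration of the
argument, not anything in the codebase) is that recovery can make
progress as long as the number of lost bookies does not exceed
writeQuorum - ackQuorum:

    public class RecoveryToleranceSketch {
        static boolean canRecover(int writeQuorum, int ackQuorum, int lostBookies) {
            return lostBookies <= writeQuorum - ackQuorum;
        }

        public static void main(String[] args) {
            System.out.println(canRecover(3, 2, 1)); // e3w3a2, one bookie lost  -> true
            System.out.println(canRecover(3, 2, 2)); // e3w3a2, two bookies lost -> false
            System.out.println(canRecover(3, 3, 1)); // e3w3a3, one bookie lost  -> false
        }
    }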

> I am not sure "make ledger metadata immutable" == "getting rid of merging
> ledger metadata", because I don't think these are the same thing. Making
> ledger metadata immutable will make the code much clearer and simpler.
> Getting rid of merging ledger metadata is a different thing; when you make
> ledger metadata immutable, it will help make merging ledger metadata on
> conflicts clearer.

I wouldn't call it merging in this case. Merging implies taking two
valid pieces of metadata and producing another usable, valid metadata
from them.
What happens with immutable metadata is that you take one valid piece
of metadata and apply operations to it. So in the failure-during-recovery
case, we would have a list of AddEnsemble operations which we apply
when we try to close.
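
To illustrate what I mean by applying operations rather than merging,
here's a minimal sketch. The types are made up for the example, not the
real LedgerMetadata API:

    // Sketch only: illustrative types, not the real LedgerMetadata API.
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    final class ImmutableMetadataSketch {
        record AddEnsembleOp(long firstEntryId, List<String> ensemble) {}

        record Metadata(List<AddEnsembleOp> ensembles, long lastEntryId, boolean closed) {
            // Applying an operation never mutates this instance; it returns
            // a new metadata with the operation appended.
            Metadata apply(AddEnsembleOp op) {
                List<AddEnsembleOp> next = new ArrayList<>(ensembles);
                next.add(op);
                return new Metadata(Collections.unmodifiableList(next), lastEntryId, closed);
            }

            // At close time the operations accumulated during recovery are
            // applied one by one, then the final, closed version is written.
            Metadata close(List<AddEnsembleOp> pendingOps, long lastEntry) {
                Metadata current = this;
                for (AddEnsembleOp op : pendingOps) {
                    current = current.apply(op);
                }
                return new Metadata(current.ensembles(), lastEntry, true);
            }
        }
    }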

In theory this is perfectly valid and clean. It can just look messy in
the code, due to how the PendingAddOp reaches back into the ledger
handle to get the current ensemble.

So, in conclusion, I will keep the handling. In any case, these
changes are all still blocked on
https://github.com/apache/bookkeeper/pull/1577.

-Ivan
