Thanks for the detailed response. Just one question. If the writer doesn't
fail but a bookie write fails (say, a soft failure caused by a network
problem or a GC pause), the writer will create a new fragment within the
ledger. So the same sequence of operations that happens while closing the
ledger needs to happen at the fragment level as well. Because the log entry
can be copied to the new fragment, the log on the failed bookie (or, more
precisely, the soft-failed bookie, since it only hit a network issue or GC
pause) in the previous fragment needs to be truncated.
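For reference, here is a rough Python sketch of what "a new fragment within a ledger" means. The ledger's metadata maps the first entry id of each fragment to the bookies (the ensemble) serving it, similar in spirit to BookKeeper's ledger metadata. `LedgerMetadata`, `ensemble_change` and `ensemble_for` are hypothetical names, and entry 10 as the fragment boundary is an assumption. The sketch does not model how outstanding entries are resent to the replacement bookie.

```python
import bisect
from typing import Dict, List


class LedgerMetadata:
    """Hypothetical model: each fragment is keyed by its first entry id."""

    def __init__(self, initial_ensemble: List[str]) -> None:
        # The first fragment starts at entry 0 on the initial ensemble.
        self.ensembles: Dict[int, List[str]] = {0: list(initial_ensemble)}

    def ensemble_for(self, entry_id: int) -> List[str]:
        """Return the ensemble of the fragment that contains entry_id."""
        starts = sorted(self.ensembles)
        idx = bisect.bisect_right(starts, entry_id) - 1
        return self.ensembles[starts[idx]]

    def ensemble_change(self, first_entry: int, failed: str, replacement: str) -> None:
        """Start a new fragment at first_entry with `failed` swapped out."""
        if first_entry < max(self.ensembles):
            raise ValueError("a new fragment must start after the latest one")
        current = self.ensemble_for(first_entry)
        if failed not in current:
            raise ValueError(f"{failed} is not in the current ensemble")
        self.ensembles[first_entry] = [replacement if b == failed else b for b in current]


if __name__ == "__main__":
    meta = LedgerMetadata(["b1", "b2", "b3"])
    # Hypothetical: b3 soft-fails and the writer moves on from entry 10.
    meta.ensemble_change(10, failed="b3", replacement="b4")
    print(meta.ensemble_for(9))   # ['b1', 'b2', 'b3']
    print(meta.ensemble_for(10))  # ['b1', 'b2', 'b4']
```

Entries before 10 still map to b3. The sketch says nothing about what b3 may hold past that boundary, which is the part the question is about.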

Thanks,
Unmesh

On Tue, Jan 28, 2020 at 6:32 PM Ivan Kelly <iv...@apache.org> wrote:

> > So, about log truncation, the way it's needed in leader-based systems
> > like RAFT and Kafka: the leader may have entries appended to its log
> > that are not replicated. If the leader crashes before replicating those
> > entries, another node will be elected leader. Once the previous leader
> > rejoins the cluster, it needs to truncate its own log, removing all the
> > conflicting entries. Doesn't this case happen in bookkeeper?
>
> Something similar does happen in bookkeeper. Firstly, it's important
> to keep in mind that a single ledger in bookkeeper only has a single
> writer ever. If the writer crashes, no new entries can be added to
> that ledger. In this way, you can kinda think of a ledger as a term in
> RAFT or an epoch in ZK. To build a replicated log in bookkeeper, you
> must chain a bunch of ledgers together. BK leaves that to the user.
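A minimal Python sketch of the "chain of ledgers" idea, assuming a purely in-memory model. `Ledger`, `ChainedLog`, `take_over` and `read_closed` are hypothetical names, not BookKeeper's API. Each ledger accepts appends only from its one writer, and a new writer closes the previous ledger before starting its own, loosely like a new term in RAFT. Closing at the last stored entry is a placeholder for the recovery step described below.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ledger:
    """Hypothetical ledger: one writer, append-only until closed."""
    ledger_id: int
    writer: str
    entries: List[str] = field(default_factory=list)
    last_entry: int = -1  # end of the ledger, fixed on close
    closed: bool = False

    def add(self, writer: str, payload: str) -> int:
        if writer != self.writer:
            raise PermissionError(f"{writer} is not the writer of ledger {self.ledger_id}")
        if self.closed:
            raise RuntimeError(f"ledger {self.ledger_id} is closed")
        self.entries.append(payload)
        return len(self.entries) - 1

    def close(self, last_entry: int) -> None:
        self.last_entry = last_entry
        self.closed = True


@dataclass
class ChainedLog:
    """A log built by chaining ledgers, one per writer 'term'."""
    ledgers: List[Ledger] = field(default_factory=list)

    def take_over(self, new_writer: str) -> Ledger:
        # Close the previous writer's ledger before opening our own.
        # Placeholder: real recovery decides where to close; here we
        # simply close at the last stored entry.
        if self.ledgers and not self.ledgers[-1].closed:
            prev = self.ledgers[-1]
            prev.close(len(prev.entries) - 1)
        ledger = Ledger(ledger_id=len(self.ledgers), writer=new_writer)
        self.ledgers.append(ledger)
        return ledger

    def read_closed(self) -> List[str]:
        # Only closed ledgers have a fixed end; read each up to last_entry.
        out: List[str] = []
        for ledger in self.ledgers:
            if ledger.closed:
                out.extend(ledger.entries[: ledger.last_entry + 1])
        return out


if __name__ == "__main__":
    log = ChainedLog()
    l0 = log.take_over("w1")
    l0.add("w1", "a")
    l0.add("w1", "b")
    l1 = log.take_over("w2")  # w1's ledger is now closed
    l1.add("w2", "c")
    print(log.read_closed())  # ['a', 'b'] (ledger 1 is still open)
```

Once w2 takes over, any late append from w1 to ledger 0 is rejected, which is the property that lets a ledger play the role of a term.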
>
> In the case of a writer crash, the next writer (i.e. the client adding
> the next ledger to the chain) needs to run the recovery algorithm,
> which finds the last entry that may possibly have been acknowledged
> to the writer. It uses this last entry to mark the ledger as closed.
> This "close" operation is similar to a truncate. Individual bookies in
> the ensemble may have entries past this last entry. However, these
> entries do not exist on enough bookies to have been acknowledged as
> written, so they can be ignored.
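The safety argument can be sketched as a single decision in Python, assuming a simplified ensemble where every bookie stores every entry (ensemble size == write quorum). `choose_close_point` is a hypothetical helper. BookKeeper's real recovery is more involved: among other things it also has to re-replicate what it finds, as the example below describes. The key check is that fewer than |ack quorum| bookies are unaccounted for, so any acknowledged entry must show up in at least one reply.

```python
from typing import Dict


def choose_close_point(last_entries: Dict[str, int], ensemble_size: int, ack_quorum: int) -> int:
    """Pick the entry at which to close a ledger after its writer crashed.

    last_entries maps each bookie that replied to the last entry id it
    holds (-1 if none). Assumes ensemble == write quorum (no striping).
    """
    unseen = ensemble_size - len(last_entries)
    # An acknowledged entry sits on at least ack_quorum bookies. If fewer
    # than ack_quorum bookies are unseen, one of them must have replied.
    if unseen >= ack_quorum:
        raise RuntimeError("too few replies: an acknowledged entry could be hidden")
    # Closing at the highest entry seen keeps every acknowledged entry.
    # Anything past it is on at most `unseen` bookies, so it was never
    # acknowledged and can be ignored.
    return max(last_entries.values())
```

With ensemble 2 and ack quorum 2, a single reply is enough, which matches the example below where w2 may hear from only b1 or only b2.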
>
> For example, say you have a ledger A across 2 bookies, b1 and b2,
> being written to by writer w1, with ensemble size 2, write quorum 2
> and ack quorum 2.
>
> w1 crashes when the bookies have the following entries.
>
> b1: e1
> b2: e1, e2
>
> The next writer, w2, could close this ledger at either e1 or e2. Both
> are correct.
> For e1, it would try to read the last entry from both b1 & b2, but
> only b1 would reply. w2 would see that e1 is the last entry on b1, and
> as the ack quorum is 2, no entry beyond e1 can have been acknowledged
> to w1 (to acknowledge an entry to the writer, acknowledgements must be
> received from |ack quorum| bookies, and b1 holds nothing past e1).
> For e2, it would try to read the last entry from both b1 & b2, and
> either b2 alone or both would reply. If both replied, w2 would see that
> e2 was written by the client but not acknowledged to w1, since only b2
> holds it. However, it is also possible that only b2 replied, so w2
> cannot divine whether e2 was acknowledged to w1. In both cases, it's
> safe to take e2 as the last entry. w2 ensures that e2 is replicated to
> |ack quorum| bookies, and marks it as the end of the ledger.
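Ivan's example can be replayed with a small Python sketch under the same simplifying assumptions: entry ids 1 and 2 stand for e1 and e2, and w2's copies are assumed to eventually succeed even if a bookie did not reply at first. `recover` and `STORED` are hypothetical names. The sketch shows the close point each set of replies leads to and which entries w2 must copy before closing.

```python
from typing import Dict, List, Tuple

ACK_QUORUM = 2
# What each bookie held when w1 crashed (entry ids: e1 -> 1, e2 -> 2).
STORED: Dict[str, List[int]] = {"b1": [1], "b2": [1, 2]}


def recover(responders: List[str]) -> Tuple[int, Dict[str, List[int]]]:
    """Simplified recovery by w2: returns the close point and, per bookie,
    the entries w2 must copy to it before closing the ledger."""
    if len(STORED) - len(responders) >= ACK_QUORUM:
        raise RuntimeError("too few replies to rule out a hidden acknowledged entry")
    close_at = max(max(STORED[b]) for b in responders)
    # Ensemble == write quorum == ack quorum == 2 here, so "replicated to
    # |ack quorum| bookies" means both bookies must hold e1..close_at.
    to_copy = {
        b: [e for e in range(1, close_at + 1) if e not in STORED[b]]
        for b in STORED
    }
    return close_at, to_copy


if __name__ == "__main__":
    for responders in (["b1"], ["b2"], ["b1", "b2"]):
        close_at, to_copy = recover(responders)
        print(responders, "-> close at e%d, copy %s" % (close_at, to_copy))
```

Only b1 replying leads to closing at e1 with nothing to copy. Any reply from b2 leads to closing at e2, which first has to be copied to b1.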
>
> The case where e1 was found to be the last entry can be considered
> similar to a truncate: b2 still holds e2, but it lies past the end of
> the closed ledger and is ignored.
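To make the "similar to truncate" point concrete, here is a Python sketch using the same e1/e2 state as above. The helper names `readable_entries` and `ignored_entries` are hypothetical. It shows that, in this model, closing at e1 hides e2 rather than deleting it: b2 may still hold e2, but it falls outside the ledger's recorded end.

```python
from typing import Dict, List


def readable_entries(stored: Dict[str, List[int]], close_at: int) -> Dict[str, List[int]]:
    """Entries each bookie serves as part of a ledger closed at close_at.

    In this model nothing is deleted: entries past close_at may stay on
    disk, but they lie beyond the ledger's recorded end.
    """
    return {b: [e for e in entries if e <= close_at] for b, entries in stored.items()}


def ignored_entries(stored: Dict[str, List[int]], close_at: int) -> Dict[str, List[int]]:
    """Entries left behind past the end of the ledger (the 'truncated' part)."""
    return {b: [e for e in entries if e > close_at] for b, entries in stored.items()}


if __name__ == "__main__":
    stored = {"b1": [1], "b2": [1, 2]}  # state when w1 crashed
    print(readable_entries(stored, close_at=1))  # {'b1': [1], 'b2': [1]}
    print(ignored_entries(stored, close_at=1))   # {'b1': [], 'b2': [2]}
```

Unlike a RAFT follower, b2 never has to rewrite its log here; the close point alone decides what counts as part of the ledger.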
>
> -Ivan
>
