> So... log truncation, the way it's needed in leader-based systems like Raft
> and Kafka, where the leader may have entries appended to its log which are not
> replicated. If the leader crashes before replicating those entries, another node
> will be elected leader. Once the previous leader rejoins the cluster, it
> needs to truncate its own log, removing all the conflicting entries. This
> case won't happen in BookKeeper?
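For comparison, the Raft-style truncation described in the question can be sketched in Python. This is a minimal sketch of only the log-matching step of a follower's AppendEntries handler, following the Raft rule "if an existing entry conflicts with a new one (same index, different term), delete it and all that follow". `Entry` and `append_entries` are hypothetical names; commit index, RPCs and persistence are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    term: int
    command: str

def append_entries(log, prev_index, prev_term, entries):
    """Follower side of AppendEntries, log-matching part only.

    Indices are 1-based as in the Raft paper, so log[0] holds index 1.
    Returns False if the consistency check on prev_index/prev_term fails.
    """
    # Consistency check: we must hold an entry at prev_index with prev_term.
    if prev_index > 0:
        if prev_index > len(log) or log[prev_index - 1].term != prev_term:
            return False
    for offset, new in enumerate(entries):
        pos = prev_index + offset  # 0-based slot for this new entry
        if pos < len(log):
            if log[pos].term != new.term:
                # Conflict: truncate this entry and everything after it.
                del log[pos:]
                log.append(new)
            # Same term at the same index: entry already present, keep it.
        else:
            log.append(new)
    return True

# The old leader appended x and y in term 2 without replicating them; the new
# leader (term 3) agrees with it only up to index 1.
old_leader_log = [Entry(1, "a"), Entry(2, "x"), Entry(2, "y")]
append_entries(old_leader_log, prev_index=1, prev_term=1, entries=[Entry(3, "z")])
print(old_leader_log)  # [Entry(term=1, command='a'), Entry(term=3, command='z')]
```

In Raft this truncation happens on the rejoining node's own log, which is the step the question asks about.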
Something similar does happen in BookKeeper. First, it's important to keep in mind that a single ledger in BookKeeper only ever has a single writer. If the writer crashes, no new entries can be added to that ledger. In this way, you can roughly think of a ledger as a term in Raft or an epoch in ZooKeeper (ZK). To build a replicated log in BookKeeper, you must chain a number of ledgers together; BK leaves that to the user.

In the case of a writer crash, the next writer (i.e. the client adding the next ledger to the chain) needs to run the recovery algorithm, which finds the last entry that may possibly have been acknowledged to the writer. It uses this last entry to mark the ledger as closed. This "close" operation is similar to a truncate. Individual bookies in the ensemble may have entries past this last entry. However, these entries do not exist on enough bookies to have been acknowledged as written, so they can be ignored.

For example, say you have a ledger A across 2 bookies, b1 and b2, being written to by writer w1, with ensemble size 2, write quorum 2 and ack quorum 2. w1 crashes when the bookies hold the following entries:

  b1: e1
  b2: e1, e2

The next writer, w2, could close this ledger at either e1 or e2. Both are correct.

For e1: w2 tries to read the last entry from both b1 and b2, but only b1 replies. w2 sees that e1 is the last entry on b1, and since the ack quorum is 2, no entry beyond e1 can have been acknowledged to w1 (for an entry to be acknowledged to the writer, acknowledgements must be received from |ack quorum| bookies).

For e2: w2 tries to read the last entry from both b1 and b2, and either b2 alone or both reply. If both reply, w2 sees that e2 was written by the client but not acknowledged to w1. However, it is also possible that only b2 replies, in which case w2 cannot tell whether e2 was acknowledged to w1. In both cases, it's safe to take e2 as the last entry: w2 ensures that e2 is replicated to |ack quorum| bookies and marks it as the end of the ledger.
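The reasoning in the example can be checked with a toy model in Python. It is a sketch under simplifying assumptions, not BookKeeper's client code: ensemble size equals write quorum (no striping), each bookie holds a contiguous prefix of entry ids (1 = e1, 2 = e2), and fencing and re-replication are not modeled. All function names are hypothetical. The key step is that every acknowledged entry sits on |ack quorum| bookies, so any ensemble - ack quorum + 1 responders must include at least one of them.

```python
def last_entry(log):
    # Id of the last entry a bookie holds; 0 means it holds nothing.
    return max(log, default=0)

def min_responses(ensemble_size, ack_quorum):
    # Any group of this many bookies overlaps every set of ack_quorum
    # bookies, so it sees every entry that was acknowledged to the writer.
    return ensemble_size - ack_quorum + 1

def recover_close_point(bookies, responders, ensemble_size, ack_quorum):
    """bookies: name -> list of entry ids; responders: names that replied."""
    needed = min_responses(ensemble_size, ack_quorum)
    if len(responders) < needed:
        raise RuntimeError(f"need {needed} responses, got {len(responders)}")
    # The highest entry among responders is >= any acknowledged entry (prefix
    # assumption) and was really written, so it is a safe end of the ledger.
    return max(last_entry(bookies[name]) for name in responders)

def ignored_entries(bookies, close_point):
    # Entries past the close point are never exposed to readers.
    return {name: [e for e in log if e > close_point]
            for name, log in bookies.items()}

bookies = {"b1": [1], "b2": [1, 2]}
print(recover_close_point(bookies, ["b1"], 2, 2))        # 1, closes at e1
print(recover_close_point(bookies, ["b2"], 2, 2))        # 2, closes at e2
print(recover_close_point(bookies, ["b1", "b2"], 2, 2))  # 2, closes at e2
print(ignored_entries(bookies, 1))  # {'b1': [], 'b2': [2]}
```

With ensemble size 2 and ack quorum 2 a single reply is enough, which is why w2 can proceed when only b1 or only b2 answers. Closing at e2 would additionally require w2 to copy e2 to a second bookie before marking the ledger closed, which this model leaves out.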
The case where e1 was found to be the last entry can be considered similar to a truncate.

-Ivan