On Oct 6, 2017 3:07 AM, "Ivan Kelly" <iv...@apache.org> wrote:

Hi folks,

Following up from the meeting yesterday, I said I would look into the
code to verify the behaviour because there could be a correctness
problem.

I think there could be an issue. The code is convoluted, but my
understanding of it is as follows.

We check all ledger, journal and index directories for a cookie. If it
doesn't exist, it gets added to a missingCookieDirs list. We then
iterate over this list. If any directory in missingCookieDirs
isn't listed as a ledger directory in the journal dir cookies, or
isn't empty, we fail to start.

The issue is that a journal dir could be emptied and we wouldn't
detect it. It would be great if someone else could eyeball the code
and tell me I'm wrong. The code is in Bookie#checkEnvironment.

This breaks correctness. Imagine we have a ledger on b1, b2, b3.
Writer w1 is writing to the ledger.
The state of the ledger on the bookies is:

b1: e0     Fenced: false, LAC: -
b2: e0     Fenced: false, LAC: -
b3: e0     Fenced: false, LAC: -

w1 gets partitioned from the network. w2 tries to recover the ledger, so
it tries to fence it on all bookies. The message to b3 gets lost. b1 and b2
acknowledge the fencing, so w2 continues to recover and close the
ledger with e0 as the last entry.

b1: e0     Fenced: true, LAC: e0
b2: e0     Fenced: true, LAC: e0
b3: e0     Fenced: false, LAC: -

If w1 became unpartitioned at this point, it wouldn't be able to add a
new entry to the ledger as any quorum would see fenced on b1 or b2.

However, imagine that the fenced message is only in the journal on b2,
b2 crashes, something wipes the journal directory and then b2 comes
back up.


The case you described here is "almost correct", but there is a key point:
b2 can't start up by itself if the journal disk is wiped out, because the
cookie is missing. So this is an operational or lifecycle management issue:

1) At Twitter, when we take a bookie out for repair, we typically make sure
before adding it back that no ledgers reference this bookie.
This is done by either auto or manual recovery.

2) We lack lifecycle management for taking a bookie out and adding it
back, which would automate this. It has to guarantee that when a bookie is
taken out for repair, no ledgers reference it before it is added
back.
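
The guarantee in 2) can be sketched as a simple guard. This is only an
illustration: the ledger metadata view and both function names are
hypothetical, not an existing BookKeeper API.

```python
from typing import Dict, List

# Hypothetical view of ledger metadata: ledger id -> bookies in its ensemble.
LedgerEnsembles = Dict[int, List[str]]


def ledgers_referencing(bookie: str, ledgers: LedgerEnsembles) -> List[int]:
    """Return the ids of ledgers whose ensemble still includes the bookie."""
    return sorted(lid for lid, ensemble in ledgers.items() if bookie in ensemble)


def can_add_back(bookie: str, ledgers: LedgerEnsembles) -> bool:
    """A repaired bookie may rejoin only when no ledger references it."""
    return not ledgers_referencing(bookie, ledgers)


ledgers = {1: ["b1", "b2", "b3"], 2: ["b1", "b3", "b4"]}
before_recovery = can_add_back("b2", ledgers)

# Recovery (auto or manual) re-replicates ledger 1 away from b2.
ledgers[1] = ["b1", "b3", "b4"]
after_recovery = can_add_back("b2", ledgers)
```

Here b2 is refused while ledger 1 still references it and is allowed back
once recovery has moved that ledger to other bookies.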


The good thing about this case is that it only happens if you add a bookie
back by simply removing the cookie. Otherwise the cookie should do its job.

However, it can still happen in a different case: a bit flip. In your case,
if the fence bit on b2 is already persisted on disk but then gets corrupted,
it will cause the issue you described. One problem is that we don't have a
checksum on the index file header, which is where those fence bits are stored.

So I think there are two issues we can look into:

- enforce lifecycle management for bookies.
- add a checksum for index file headers.
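
To make the second item concrete, here is a minimal sketch of a checksummed
header. The layout (magic, version, state bits, CRC32) and all names are
made up for illustration and are not the actual index file format.

```python
import struct
import zlib

# Hypothetical layout: 4-byte magic, 4-byte version, 4-byte state bits,
# followed by a CRC32 computed over those 12 bytes.
HEADER = struct.Struct(">4sII")
CRC = struct.Struct(">I")
MAGIC = b"IDXH"  # made-up magic value
FENCED_BIT = 0x1


def encode_header(version: int, state_bits: int) -> bytes:
    """Serialize the header and append its CRC32."""
    body = HEADER.pack(MAGIC, version, state_bits)
    return body + CRC.pack(zlib.crc32(body))


def decode_header(raw: bytes) -> int:
    """Return the state bits, or raise ValueError if the header is corrupt."""
    body = raw[:HEADER.size]
    stored = raw[HEADER.size:HEADER.size + CRC.size]
    if len(body) != HEADER.size or len(stored) != CRC.size:
        raise ValueError("truncated index header")
    if zlib.crc32(body) != CRC.unpack(stored)[0]:
        raise ValueError("index header checksum mismatch")
    magic, _version, state_bits = HEADER.unpack(body)
    if magic != MAGIC:
        raise ValueError("bad index header magic")
    return state_bits


good = encode_header(1, FENCED_BIT)

# Simulate a bit flip on disk that clears the fence bit.
corrupt = bytearray(good)
corrupt[HEADER.size - 1] ^= FENCED_BIT
```

With this, decode_header(good) returns the fence bit intact, while
decode_header(bytes(corrupt)) raises instead of silently reporting the
ledger as unfenced.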


The new state of the ledger on the bookies will be.

b1: e0     Fenced: true, LAC: e0
b2: e0     Fenced: false, LAC: -
b3: e0     Fenced: false, LAC: -

Now w1 can write a new entry, e1, and b2 & b3 would both acknowledge
it, even though the end of the ledger is e0.
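
The scenario can be replayed with a toy model. This is a sketch, not the
real client protocol: the Bookie class, the write helper and the ack quorum
of 2 are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ACK_QUORUM = 2  # assumption: 2 of the 3 bookies must acknowledge a write


@dataclass
class Bookie:
    name: str
    entries: Dict[int, str] = field(default_factory=dict)
    fenced: bool = False

    def add_entry(self, entry_id: int, data: str) -> bool:
        """Reject the add if the ledger is fenced on this bookie."""
        if self.fenced:
            return False
        self.entries[entry_id] = data
        return True


def write(bookies: List[Bookie], entry_id: int, data: str) -> bool:
    """Return True if enough bookies acknowledged the entry."""
    acks = sum(b.add_entry(entry_id, data) for b in bookies)
    return acks >= ACK_QUORUM


b1, b2, b3 = (Bookie(n, {0: "e0"}) for n in ("b1", "b2", "b3"))
ensemble = [b1, b2, b3]

# w2 recovers the ledger: the fence reaches b1 and b2, the message to b3 is lost.
b1.fenced = True
b2.fenced = True
blocked = write(ensemble, 1, "e1")  # w1 gets only b3's ack

# b2 crashes, its journal is wiped, and the fence is forgotten on restart.
b2.fenced = False
accepted = write(ensemble, 1, "e1")  # w1 now gets acks from b2 and b3
```

In this model blocked is False and accepted is True: once b2 forgets the
fence, w1 reaches its ack quorum even though the ledger was closed at e0.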

It requires many planets to be aligned for it to harm us, but we must fix
this.

Regards,
Ivan
