Hi folks,

I was travelling over the weekend, so I didn't have a chance to reply
to anything on this thread. First off, as Enrico said, there are a lot
of different topics being discussed at once. Perhaps each should be
broken out into a GitHub issue, so we can continue each conversation
there, as it's getting a bit unwieldy for email.

I've created a cookie monster project, which we can throw all the issues into.
https://github.com/apache/bookkeeper/projects/1

There are a few individual opinions I'd like to give here, though.

> Needing to check the instance of the bookie when auditing

While the auditor does check when bookies have disappeared, it also
periodically checks all ledgers by reading the first and last entry
of each segment. So even if a bookie has resurrected, the auditor
will find that it is missing entries it is supposed to have.
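To make this concrete, here is a rough sketch of such a boundary check. It uses a made-up in-memory model: `Segment`, `Ledger` and `check_ledger` are hypothetical names, not BookKeeper's auditor API, and for simplicity it assumes every bookie in a segment's ensemble stores every entry of that segment (no striping).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    first_entry: int
    last_entry: int
    ensemble: list  # bookie ids assumed to hold every entry in this segment


@dataclass
class Ledger:
    ledger_id: int
    segments: list


def check_ledger(ledger, bookie_contents):
    """Return (bookie, ledger_id, entry_id) for boundary entries a bookie should hold but doesn't.

    bookie_contents maps a bookie id to the set of (ledger_id, entry_id) pairs it actually stores.
    """
    missing = []
    for seg in ledger.segments:
        # Probe only the first and last entry of each segment, not every entry.
        probes = sorted({seg.first_entry, seg.last_entry})
        for bookie in seg.ensemble:
            stored = bookie_contents.get(bookie, set())
            for entry_id in probes:
                if (ledger.ledger_id, entry_id) not in stored:
                    missing.append((bookie, ledger.ledger_id, entry_id))
    return missing


# b2 was wiped and restarted: it is "up", but holds nothing.
ledger = Ledger(1, [Segment(0, 9, ["b1", "b2"]), Segment(10, 19, ["b1", "b3"])])
contents = {
    "b1": {(1, e) for e in range(20)},
    "b2": set(),
    "b3": {(1, e) for e in range(10, 20)},
}
print(check_ledger(ledger, contents))  # [('b2', 1, 0), ('b2', 1, 9)]
```

The resurrected bookie never shows up as "disappeared", but it still gets flagged because its boundary entries are gone.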

> UUID in ledger metadata

At least for the write path, I'm not sure if this is needed, but
consider the following.

Only one writer can "vote" on the entries of the ledger. Other writers
are fencing writers. A fencing writer has to hit a majority of bookies
to proceed to closing the ledger. Unless a majority have been wiped,
it will not proceed to close as an empty ledger. However, if a
majority have been wiped, the correct behaviour would be for it not to be
possible to close the ledger, as we cannot know what the end of the
ledger is.

That said, refusing to boot if any ledger refers to the bookie could solve this.
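A much simplified sketch of the fencing rule described above, to show where a wiped majority bites. The function `recover_last_entry` is hypothetical, and it assumes each fenced bookie just reports the last entry id it holds (-1 if none), with the end of the ledger taken as the highest id reported. Real recovery in BookKeeper is more involved than this.

```python
def recover_last_entry(ensemble, responses):
    """Decide the last entry id to record when a fencing writer closes a ledger.

    ensemble: bookie ids in the ledger's (last) ensemble.
    responses: bookie id -> last entry id that bookie reports once fenced
        (-1 if it holds no entries). Bookies that did not answer are absent.
    """
    majority = len(ensemble) // 2 + 1
    answered = [b for b in ensemble if b in responses]
    if len(answered) < majority:
        # Without a majority we cannot be sure where the ledger ends.
        raise RuntimeError("fencing did not reach a majority; cannot close ledger")
    # Only one writer ever added entries, so the highest id any fenced
    # bookie reports is taken as the end of the ledger.
    return max(responses[b] for b in answered)


print(recover_last_entry(["b1", "b2", "b3"], {"b1": 41, "b2": 40}))  # 41
print(recover_last_entry(["b1", "b2", "b3"], {"b1": -1, "b2": -1}))  # -1
```

The second call is the dangerous case: if b1 and b2 had been wiped, the rule would happily close the ledger as empty, which is exactly when closing should not be possible.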

> No ledgers referencing bookie? (Sijie's suggestion)

I'm resistant to this idea, because it assumes a central oracle where all
ledgers can be queried. I know we currently have this, but I don't
think it scales for each bookie to read the metadata of the whole
system.

In any case, instead of refusing to start if any ledgers reference the
bookie, why not have the bookie check on boot which ledgers it is
supposed to have and, if it doesn't have them, start pulling the data
for them itself? While doing this replication it should refuse all new
writes.
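Roughly, that boot-time behaviour could look like the sketch below. Everything in it is hypothetical: the `Bookie` class, `boot` and `fetch_from_peer` are illustrative names, the ledgers the bookie should hold are passed in rather than read from metadata, and whole ledgers are copied from any peer in the ensemble, ignoring segments and striping.

```python
class Bookie:
    """Hypothetical bookie that re-replicates its own missing ledgers on boot."""

    def __init__(self, bookie_id, local_ledgers):
        self.bookie_id = bookie_id
        self.local = dict(local_ledgers)  # ledger id -> list of entries
        self.accepting_writes = False  # closed until boot() completes

    def add_entry(self, ledger_id, entry):
        if not self.accepting_writes:
            raise RuntimeError("bookie is re-replicating; rejecting new writes")
        self.local.setdefault(ledger_id, []).append(entry)

    def boot(self, assigned_ledgers, fetch_from_peer):
        """assigned_ledgers: ledger id -> ensemble, for ledgers this bookie should hold.
        fetch_from_peer(ledger_id, peer_id) -> list of entries, or None if the peer lacks it.
        """
        self.accepting_writes = False  # refuse new writes while catching up
        for ledger_id, ensemble in sorted(assigned_ledgers.items()):
            if ledger_id in self.local:
                continue  # already have it
            for peer in ensemble:
                if peer == self.bookie_id:
                    continue
                entries = fetch_from_peer(ledger_id, peer)
                if entries is not None:
                    self.local[ledger_id] = list(entries)
                    break
            else:
                # No other replica could supply the ledger; stay read-only.
                raise RuntimeError(f"no replica available for ledger {ledger_id}")
        self.accepting_writes = True


peers = {"b2": {7: ["a", "b"]}, "b3": {7: ["a", "b"], 9: ["x"]}}

def fetch(ledger_id, peer):
    return peers.get(peer, {}).get(ledger_id)

bk = Bookie("b1", {})
bk.boot({7: ["b1", "b2"], 9: ["b1", "b3"]}, fetch)
print(bk.local)  # {7: ['a', 'b'], 9: ['x']}
```

Keeping writes closed until every assigned ledger is back means the bookie never acknowledges new entries while it is still missing old ones.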

> Storing the list of files in the cookie? (Enrico's suggestion)

I don't think this is needed. The purpose of the cookie is to protect
against stuff like a mount not coming up, or a machine being
completely wiped. We assume that on a journalled filesystem, files
don't just disappear arbitrarily. There may be corruption in
individual files, but see my first point.

Anyhow, as I said earlier, we should decide the broad topics here and
move them into issues. I've made a first pass.

Regards,
Ivan
