On Mon, 9 Oct 2017, 10:54 Ivan Kelly <iv...@apache.org> wrote:

> Hi folks,
>
> I was travelling over the weekend, so I didn't have a chance to reply
> to anything on this thread. First off, as Enrico said, there's a lot
> of different topics being discussed at once. Perhaps each should be
> broken into a github issue, and then we can continue each conversation
> there, as it's getting a bit unwieldy for email.
>
> I've created a cookie monster project, which we can throw all the issues
> into.
> https://github.com/apache/bookkeeper/projects/1
>
> There are a few individual opinions I'd like to give here though.
>
> > Needing to check the instance of the bookie when auditing
>
> While the auditor does check when bookies have disappeared, it
> also periodically checks all ledgers by reading the first and last
> entry of each segment. So even if a bookie has resurrected, the
> auditor will find that it is missing entries it is supposed to have.
>
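
A minimal sketch of that periodic check, to make the argument concrete. It is
not the auditor's actual code: the `Segment` tuple, `find_missing` and the
`has_entry` callback are hypothetical names, and it assumes every bookie in a
segment stores every entry of that segment (no striping).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: (first_entry_id, last_entry_id, bookies in the segment).
Segment = Tuple[int, int, List[str]]


def find_missing(
    segments: List[Segment],
    has_entry: Callable[[str, int], bool],
) -> Dict[str, List[int]]:
    """Return, per bookie, the boundary entries it should hold but cannot serve."""
    missing: Dict[str, List[int]] = {}
    for first, last, bookies in segments:
        for bookie in bookies:
            # Only the first and last entry of each segment are probed.
            for entry_id in sorted({first, last}):
                if not has_entry(bookie, entry_id):
                    missing.setdefault(bookie, []).append(entry_id)
    return missing


# A bookie "b3" that came back empty fails both probes for the segment 0..9.
report = find_missing([(0, 9, ["b1", "b2", "b3"])], lambda bookie, entry: bookie != "b3")
```

Under these assumptions a resurrected but wiped bookie shows up in `report`
even though it is registered as alive.
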
> > UUID in ledger metadata
>
> At least for the write path, I'm not sure if this is needed, but
> consider the following.
>
> Only one writer can "vote" on the entries of the ledger. Other writers
> are fencing writers. A fencing writer has to hit a majority of bookies
> to proceed to closing the ledger. Unless a majority have been wiped,
> it will not proceed to close as an empty ledger. However, if a
> majority have been wiped, the correct behaviour would be for it not to be
> possible to close the ledger, as we cannot know what the end of the
> ledger is.
>
> That said, refusing to boot if any ledger refers to the bookie could solve this.
>
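
The fencing argument above can be sketched as follows. This is a simplified
model that uses the "majority" wording from this thread, not the real recovery
protocol; `recover_last_entry` is a hypothetical name, and using -1 to mean
"this bookie reports no entries" is an assumption of the sketch.

```python
from typing import List, Optional


def recover_last_entry(responses: List[int], ensemble_size: int) -> Optional[int]:
    """Decide where a fencing writer would close the ledger.

    responses holds the last entry id reported by each bookie that answered
    the fence request (-1 for a bookie that reports no entries).
    Returns None when fewer than a majority answered, so the ledger stays open.
    """
    if len(responses) <= ensemble_size // 2:
        return None
    return max(responses)


# Ensemble of 3, real last entry is 41.
one_wiped = recover_last_entry([41, -1], 3)    # the surviving copy wins
two_wiped = recover_last_entry([-1, -1], 3)    # majority wiped: closes as empty
no_majority = recover_last_entry([41], 3)      # cannot proceed to close
```

The `two_wiped` case is the problematic one: the writer reaches a majority,
sees nothing, and closes the ledger as empty although entries were written.
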
> > No ledgers referencing bookie? (Sijie's suggestion)
>
> I'm resistant to this idea, because it assumes a central oracle where all
> ledgers can be queried. I know we currently have this, but I don't
> think it scales for each bookie to read the metadata of the whole
> system.
>
Makes sense

>
> In any case, why not instead of refusing to start if any ledgers
> reference the bookie, on boot the bookie checks which ledgers it is
> supposed to have,

How can you do this without querying the big oracle? You can't use the local
view as the source of truth. Maybe I am missing one piece.

> and if it doesn't have them, start pulling the data
> for them itself. While doing this replication it should avoid all new
> writes.
>
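
A minimal sketch of the boot-time check being proposed, with hypothetical
names throughout (`boot_plan`, `ledger_ensembles`, `local_ledgers`). Where the
`ledger_ensembles` map comes from is exactly the open question in this
thread; the sketch simply takes it as an input.

```python
from typing import Dict, List, Set


def boot_plan(
    bookie_id: str,
    ledger_ensembles: Dict[int, List[str]],
    local_ledgers: Set[int],
) -> dict:
    """Work out which ledgers to re-replicate before accepting new writes."""
    # Ledgers whose metadata names this bookie as a member of the ensemble.
    expected = {lid for lid, bookies in ledger_ensembles.items() if bookie_id in bookies}
    to_pull = sorted(expected - local_ledgers)
    # New writes are refused for as long as anything is still missing.
    return {"to_pull": to_pull, "accept_writes": not to_pull}


metadata = {1: ["b1", "b2"], 2: ["b2", "b3"], 3: ["b1", "b3"]}
plan = boot_plan("b1", metadata, local_ledgers={1})
```

Here `plan` says that ledger 3 has to be pulled and that writes stay disabled
until that is done.
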
> > Storing the list of files in the cookie? (Enrico's suggestion)
>
> I don't think this is needed. The purpose of the cookie is to protect
> against stuff like a mount not coming up, or a machine being
> completely wiped. We assume that on a journalled filesystem, files
> don't just disappear arbitrarily. There may be corruption in
> individual files, but see my first point.
>

I am fine with this assumption. I have never seen this type of corruption, indeed.
I just wanted to enumerate as many cases of error as possible.

>
> Anyhow, as I said earlier, we should decide the broad topics here and
> move into issues. I've made a first pass.
>
> Regards,
> Ivan
>
-- 


-- Enrico Olivelli
