I like this too. Sorry, I have no time to work on this right now. Maybe the only blocking issue is the boot with empty dirs, which Sijie pointed out.
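A minimal sketch of the boot-time rule from JV's items 1 and 2 in the quoted thread. It assumes a hypothetical `BootCheck` class in which cookies are modeled as plain strings. None of these names are BookKeeper APIs. An empty journal dir is never taken as proof of a new bookie. The bookie boots only when a cookie exists both locally and in metadata and the two match.

```java
import java.util.Optional;

// Hypothetical model of the boot-time cookie rule discussed in this thread.
// BootCheck, Decision and decide() are illustrative names, not BookKeeper APIs;
// cookies are reduced to opaque strings.
class BootCheck {

    enum Decision { BOOT, REFUSE_MISSING_COOKIE, REFUSE_MISMATCHED_COOKIE }

    // localCookie: cookie read from the bookie's own dirs (absent if the dirs are empty).
    // registeredCookie: cookie stored in metadata by an explicit format step.
    // Whether the journal dir is empty deliberately plays no part here: an empty
    // dir alone is never treated as "this is a new bookie".
    static Decision decide(Optional<String> localCookie, Optional<String> registeredCookie) {
        if (!localCookie.isPresent() || !registeredCookie.isPresent()) {
            // Missing on either side: the bookie was never formatted, or its
            // disks were wiped or not mounted. Refuse instead of re-registering.
            return Decision.REFUSE_MISSING_COOKIE;
        }
        if (!localCookie.get().equals(registeredCookie.get())) {
            // Both exist but disagree, e.g. another bookie's dirs were mounted here.
            return Decision.REFUSE_MISMATCHED_COOKIE;
        }
        return Decision.BOOT;
    }

    public static void main(String[] args) {
        // Wiped dirs, cookie still registered in metadata: refuse to boot.
        System.out.println(decide(Optional.empty(), Optional.of("cookie-A")));
        // Formatted bookie whose local and registered cookies match: boot.
        System.out.println(decide(Optional.of("cookie-A"), Optional.of("cookie-A")));
    }
}
```

The sketch refuses whenever a cookie is missing on either side. Under that assumption, a genuinely new bookie must go through an explicit format step before it can boot, which is what item 2 asks for.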
Enrico

On Mon, Oct 9, 2017, 19:08 Sijie Guo <guosi...@gmail.com> wrote:
> +1. I liked this summary.
>
> JV, is this related to what you were writing? Or does anyone else want to
> drive this?
>
> - Sijie
>
> On Mon, Oct 9, 2017 at 9:32 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com>
> wrote:
>
> > Can we have a doc to put all these things in? The thread has grown enough
> > to cause confusion.
> >
> > Immediate things:
> > 1. Don't assume a new bookie if the journal dir is empty.
> > 2. Put cookies through bookie format, and a bookie never boots on an
> > empty or mismatched cookie.
> > 3. We can live with an operations procedure to deal with the incarnation
> > issue. In fact, we run an automated bookie decommission script which runs
> > through the entire metadata and makes sure that the bookie is not part of
> > any ledger.
> >
> > For the next step:
> > 1. Establish incarnation support.
> > 2. Deal with bitrot.
> >
> > Makes sense?
> >
> > JV
> >
> > On Mon, Oct 9, 2017 at 8:55 AM, Sijie Guo <guosi...@gmail.com> wrote:
> >
> > > On Oct 9, 2017 1:54 AM, "Ivan Kelly" <iv...@apache.org> wrote:
> > >
> > > Hi folks,
> > >
> > > I was travelling over the weekend, so I didn't have a chance to reply
> > > to anything on this thread. First off, as Enrico said, there are a lot
> > > of different topics being discussed at once. Perhaps each should be
> > > broken into a GitHub issue, and then we can continue each conversation
> > > there, as it's getting a bit unwieldy for email.
> > >
> > > I've created a cookie monster project, which we can throw all the
> > > issues into.
> > > https://github.com/apache/bookkeeper/projects/1
> > >
> > > There are a few individual opinions I'd like to give here, though.
> > >
> > > > Needing to check the instance of the bookie when auditing
> > >
> > > While the auditor does check when bookies have disappeared, it also
> > > periodically checks all ledgers by reading the first and last entry of
> > > each segment. So even if a bookie has been resurrected, the auditor
> > > will find that it is missing entries it is supposed to have.
> > >
> > > > UUID in ledger metadata
> > >
> > > At least for the write path, I'm not sure if this is needed, but
> > > consider the following.
> > >
> > > Only one writer can "vote" on the entries of the ledger. Other writers
> > > are fencing writers. A fencing writer has to hit a majority of bookies
> > > to proceed to closing the ledger. Unless a majority have been wiped,
> > > it will not proceed to close it as an empty ledger. However, if a
> > > majority have been wiped, the correct behaviour would be for it not to
> > > be possible to close the ledger, as we cannot know where the ledger
> > > ends.
> > >
> > > That said, not booting if any ledger refers to the bookie could solve
> > > this.
> > >
> > > > No ledgers referencing bookie? (Sijie's suggestion)
> > >
> > > I'm resistant to this idea, because it assumes a central oracle where
> > > all ledgers can be queried. I know we currently have this, but I don't
> > > think it scales for each bookie to read the metadata of the whole
> > > system.
> > >
> > > In any case, instead of refusing to start if any ledgers reference the
> > > bookie, why not have the bookie check on boot which ledgers it is
> > > supposed to have and, if it doesn't have them, start pulling the data
> > > for them itself? While doing this replication, it should avoid all new
> > > writes.
> > >
> > > Yes, that's another thing we need to improve for auto recovery. It is
> > > not only on boot; you need to do it periodically, in the garbage
> > > collection thread. The bookie needs to scan which ledgers and entries
> > > are missing and replicate them.
> > >
> > > > Storing the list of files in the cookie? (Enrico's suggestion)
> > >
> > > I don't think this is needed. The purpose of the cookie is to protect
> > > against things like a mount not coming up, or a machine being
> > > completely wiped. We assume that on a journalled filesystem, files
> > > don't just disappear arbitrarily. There may be corruption in
> > > individual files, but see my first point.
> > >
> > > Anyhow, as I said earlier, we should decide the broad topics here and
> > > move them into issues. I've made a first pass.
> > >
> > > Regards,
> > > Ivan
> >
> > --
> > Jvrao
>
--
Enrico Olivelli