Okay, but why do you want to track the list of files? I don't get your idea
here.

- Sijie

On Sun, Oct 8, 2017 at 11:45 PM, Enrico Olivelli <eolive...@gmail.com>
wrote:

> 2017-10-09 7:52 GMT+02:00 Sijie Guo <guosi...@gmail.com>:
>
> > On Sat, Oct 7, 2017 at 9:53 AM, Enrico Olivelli <eolive...@gmail.com>
> > wrote:
> >
> > > On Sat, Oct 7, 2017, 00:27 Sijie Guo <guosi...@gmail.com> wrote:
> > >
> > > > Enrico,
> > > >
> > > > Let's try to come to a conclusion or an agreement on what we should fix and
> > > > improve, before talking about who is going to drive this.
> > > >
> > >
> > > Sure.
> > >
> > > This is my point of view:
> > > We have separate issues:
> > > - missing checksums, to protect fence bits
> > > - a bug in bookie boot: we should not allow empty directories
> > > - have a clear lifecycle for the bookie, add/remove
> > > - deal with reincarnation of bookies
> > > - ensuring the correctness of the contents of the directories of the
> > > bookie
> > >
> > > I would like to add a new point: we have the cookie inside every configured
> > > directory managed by the bookie.
> > > No cookie -> no boot
> > > This will not be enough: we have to write in that file not only the
> > > identity of the bookie but also the list of files expected to be in the
> > > directory.
> > > This way you will not boot with a corrupted directory.
> > > Config -> list of dirs -> list of files
> > >
> >
> > I am not sure why this is a new point. This is exactly what cookie is
> > doing, no?
> >
>
> Sorry, I can't find such behavior in the code on the master branch:
> https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/Cookie.java
>
> If we have a copy of the cookie inside each directory (index + data +
> journal), I mean that each copy should carry the exact list of files
> expected to be present in that directory at boot.
> So for instance when you add a new file to the set of files in a journal
> directory you must update the cookie file in that directory; the same goes
> for index and data.
>
> Maybe I am missing something.
> It seems to me that the cookie contains only a list of directories, not of
> "files".
>
> Enrico
>
>
>
>
> >
> >
> > >
> > > I agree on the fact that the bookie should be added (bookie format) only if
> > > there is no reference to it in zk.
> > > The bookie format operation should write the cookie in every configured
> > > directory so that a bookie with empty directories won't ever start.
> > >
> > > I have to think more about this, but I wanted to share my first thoughts.
> > >
> > > Enrico
> > >
> > >
> > > > - Sijie
> > > >
> > > > On Fri, Oct 6, 2017 at 1:14 PM, Enrico Olivelli <eolive...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1 for fixing the problem of missing cookie in 4.6
> > > > >
> > > > > Who drives the issue?
> > > > >
> > > > > Thank you all for the interesting points
> > > > > Enrico
> > > > >
> > > > > > On Fri, Oct 6, 2017, 21:27 Venkateswara Rao Jujjuri <jujj...@gmail.com>
> > > > > > wrote:
> > > > >
> > > > > > Thanks for the writeup Sijie, comments below.
> > > > > >
> > > > > > > On Fri, Oct 6, 2017 at 12:14 PM, Sijie Guo <guosi...@gmail.com> wrote:
> > > > > >
> > > > > > > > I think the question is mainly around "how do we recognize the bookie" or
> > > > > > > > "incarnations". And the cookie is designed for addressing
> > > > > > > > "incarnations".
> > > > > > > >
> > > > > > > > I will try to cover the following aspects, and will try to answer the
> > > > > > > > questions that Ivan and JV raised.
> > > > > > >
> > > > > > > - what is cookie?
> > > > > > > - how the behavior became bad?
> > > > > > > - how do we fix current bad behavior?
> > > > > > > - is the cookie enough?
> > > > > > >
> > > > > > >
> > > > > > > *What is Cookie?*
> > > > > > >
> > > > > > > > The cookie was originally introduced in this commit:
> > > > > > > > https://github.com/apache/bookkeeper/commit/c6cc7cca3a85603c8e935ba6d06fbf3d8d7a7eb5
> > > > > > > >
> > > > > > > > A cookie is an identifier of a bookie. A cookie is created in zookeeper when
> > > > > > > > a brand new bookie joins the cluster; the cookie represents the bookie
> > > > > > > > instance during its lifecycle. The cookie is stored on all the disks for
> > > > > > > > verification purposes, so if any of the disks is missing the cookie (e.g. the
> > > > > > > > disks were reformatted or wiped out, or are not mounted correctly), the
> > > > > > > > bookie will refuse to start.
> > > > > > >
> > > > > > > *How the behavior became bad?*
> > > > > > >
> > > > > > > > The original behavior worked as expected: it used the cookie in zookeeper as
> > > > > > > > the source of truth. See
> > > > > > > > https://github.com/apache/bookkeeper/commit/c6cc7cca3a85603c8e935ba6d06fbf3d8d7a7eb5
> > > > > > > >
> > > > > > > > The behavior was changed at
> > > > > > > > https://github.com/apache/bookkeeper/commit/19b821c63b91293960041bca7b031614a109a7b8
> > > > > > > > when trying to support both ip and hostname. That change used the journal
> > > > > > > > directory as the source of truth for verifying cookies.
> > > > > > > >
> > > > > > > > At the community meeting, I was saying a bookie should refuse to start when a
> > > > > > > > cookie file is missing locally, and that was my operational experience. It
> > > > > > > > turns out twitter's branch didn't include the change at
> > > > > > > > 19b821c63b91293960041bca7b031614a109a7b8,
> > > > > > > > so it still had the original behavior from
> > > > > > > > c6cc7cca3a85603c8e935ba6d06fbf3d8d7a7eb5.
> > > > > > >
> > > > > > > *How do we fix current bad behavior?*
> > > > > > >
> > > > > > > > We basically need to revert the current behaviour to the originally designed
> > > > > > > > behavior. The cookie in zookeeper should be the source of truth for
> > > > > > > > validation.
> > > > > > > >
> > > > > > > > If the cookie works as expected (after changing the behavior back to the
> > > > > > > > original behavior), then it is the operational or lifecycle management
> > > > > > > > issue I explained above.
> > > > > > > >
> > > > > > > > If a bookie fails with a missing cookie, it should be:
> > > > > > > >
> > > > > > > > 1. taken out of the cluster
> > > > > > > > 2. re-replicated (autorecovery or manual recovery)
> > > > > > > > 3. checked to ensure no ledgers are using this bookie any more
> > > > > > > > 4. reformatted
> > > > > > > > 5. added back
> > > > > > > >
> > > > > > > > This can be automated by hooking into a scheduler (like k8s or mesos). But
> > > > > > > > it requires some sort of lifecycle management in order to automate such
> > > > > > > > operations. BP-4 is proposed for this purpose:
> > > > > > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-4+-+BookKeeper+Lifecycle+Management
> > > > > > >
> > > > > > >
> > > > > > > *Is the cookie enough?*
> > > > > > >
> > > > > > > > The cookie (if we revert the current behavior to the original behavior)
> > > > > > > > should be able to address most of the issues related to "incarnations".
> > > > > > > >
> > > > > > > > There are still some corner cases that will violate correctness. They are
> > > > > > > > related to the "dangling writers" described in Ivan's first comment.
> > > > > > > >
> > > > > > > > How can a writer tell whether the bookies changed or the ledger changed when
> > > > > > > > it gets network partitioned?
> > > > > > > >
> > > > > > > > 1) Bookie Changed.
> > > > > > > >
> > > > > > > > A bookie can be reformatted and re-added to the cluster. Ivan and JV already
> > > > > > > > touched on this with adding a UUID.
> > > > > > > >
> > > > > > > > I think the UUID doesn't have to be part of the ledger metadata, because the
> > > > > > > > auditor and replication worker would use the lifecycle management for
> > > > > > > > managing the lifecycle of bookies.
> > > > > > >
> > > > > >
> > > > > > > You are suggesting that the 'manual/scripted' lifecycle tool comes to the
> > > > > > > rescue: a sidecar solution.
> > > > > > >
> > > > > > > But what are we saving by not keeping this info in the metadata?
> > > > > > > Metadata size? Sure, it is a huge win in a ZK environment.
> > > > > >
> > > > > > >
> > > > > > > > But the connection should carry the UUID information.
> > > > > > >
> > > > > >
> > > > > > > By this you are suggesting that the service discovery portion needs to have
> > > > > > > the UUID info but not the metadata portion. Won't it be confusing to handle a
> > > > > > > case where a write fails on a bookie because of a UUID mismatch? You may need
> > > > > > > to handle that case, and if you go back to the same bookie then there are no
> > > > > > > ensemble changes.
> > > > > > >
> > > > > > > On the other hand, if we introduce the UUID into the metadata, then we don't
> > > > > > > need to explicitly depend on the sidecar solution.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > > Basically, when any bookie client connects to a bookie, it needs to carry the
> > > > > > > > namespace uuid and the bookie uuid to ensure the client is connecting to the
> > > > > > > > right bookie. This would prevent "dangling writers" from connecting to
> > > > > > > > bookies that were reformatted and added back.
> > > > > > > >
> > > > > > > While this is an issue, the problem can only get exposed in a pathological
> > > > > > > scenario where AQ bookies have gone through this scenario, which is ~ 3
> > > > > > >
> > > > > > > > 2) Ledger Changed.
> > > > > > >
> > > > > > > > It is similar to the case that Ivan described. If a writer becomes
> > > > > > > > 'network partitioned', and the ledger is deleted during this period, then
> > > > > > > > after the writer comes back, it can still successfully write entries to
> > > > > > > > the bookies, because the ledgers are already deleted and all the fencing
> > > > > > > > bits are gone.
> > > > > > > >
> > > > > > > > This violates the expectation of "fencing", but I am not sure we need to
> > > > > > > > spend time on fixing this, because the ledger was already explicitly deleted
> > > > > > > > by the application. So I think the behavior should be categorized as
> > > > > > > > "undefined", just like "deleting a ledger when a writer is still writing
> > > > > > > > entries" is undefined behavior.
> > > > > > >
> > > > > > >
> > > > > > > > To summarize my thoughts on this:
> > > > > > > >
> > > > > > > > 1. We need to revert the cookie behaviour to the original behavior, and make
> > > > > > > > sure the cookie works as expected.
> > > > > > > > 2. Introduce a UUID or epoch in the cookie. The client connection should
> > > > > > > > carry the namespace uuid and bookie uuid when establishing the connection.
> > > > > > > > 3. Work on BP-4 to have complete lifecycle management for taking bookies out
> > > > > > > > and adding bookies back.
> > > > > > > >
> > > > > > > > 1 is the immediate fix, so that correct operations can still guarantee
> > > > > > > > correctness.
> > > > > > >
> > > > > >
> > > > > > > I agree we need to take care of #1 ASAP and have issues opened and
> > > > > > > designs for #2 and #3.
> > > > > >
> > > > > > Thanks,
> > > > > > JV
> > > > > >
> > > > > > >
> > > > > > > - Sijie
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Fri, Oct 6, 2017 at 9:35 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com>
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > > > However, imagine that the fenced message is only in the journal on b2,
> > > > > > > > > > b2 crashes, something wipes the journal directory and then b2 comes
> > > > > > > > > > back up.
> > > > > > > >
> > > > > > > > > In this case what happened?
> > > > > > > > > 1. We have WQ = 1
> > > > > > > > > 2. We had data loss (crash and come up clean)
> > > > > > > > >
> > > > > > > > > But yeah, in addition to data loss we have a fencing violation too.
> > > > > > > > > The problem is not just the wiped journal dir, but how we recognize the
> > > > > > > > > bookie. A bookie is just recognized by its ip address, not by its incarnation.
> > > > > > > > > Bookie1 at T1 (b1t1) and the same bookie1 at T2 after bookie format (b1t2)
> > > > > > > > > should be two different bookies, shouldn't they?
> > > > > > > > > This is needed for the replication worker and the auditor too.
> > > > > > > > >
> > > > > > > > > Also, the bookie needs to know whether the writer/reader intends to read
> > > > > > > > > from b1t2 and not from b1t1.
> > > > > > > > > Looks like we have a hole here? Or I may not be fully understanding the
> > > > > > > > > cookie verification mechanism.
> > > > > > > > >
> > > > > > > > > Also, as Ivan pointed out, we appear to treat the lack of a journal as
> > > > > > > > > implicitly a new bookie, but the overall cluster doesn't differentiate
> > > > > > > > > between incarnations.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > JV
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Fri, Oct 6, 2017 at 8:46 AM, Ivan Kelly <iv...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > > > The case you described here is "almost correct". But there is a key point
> > > > > > > > > > > here:
> > > > > > > > > > > B2 can't start up if its journal disk is wiped out, because the cookie is
> > > > > > > > > > > missing.
> > > > > > > > > This is what I expected to see, but isn't the case.
> > > > > > > > > <snip>
> > > > > > > > >       List<Cookie> journalCookies = Lists.newArrayList();
> > > > > > > > >             // try to read cookie from journal directory.
> > > > > > > > >             for (File journalDirectory :
> journalDirectories)
> > {
> > > > > > > > >                 try {
> > > > > > > > >                     Cookie journalCookie =
> > > > > > > > > Cookie.readFromDirectory(journalDirectory);
> > > > > > > > >                     journalCookies.add(journalCookie);
> > > > > > > > >                     if
> > > > (journalCookie.isBookieHostCreatedFromIp())
> > > > > {
> > > > > > > > >                         conf.setUseHostNameAsBookieID(
> > false);
> > > > > > > > >                     } else {
> > > > > > > > >                         conf.setUseHostNameAsBookieID(
> true);
> > > > > > > > >                     }
> > > > > > > > >                 } catch (FileNotFoundException fnf) {
> > > > > > > > >                     newEnv = true;
> > > > > > > > >                     missedCookieDirs.add(
> journalDirectory);
> > > > > > > > >                 }
> > > > > > > > >             }
> > > > > > > > > </snip>
> > > > > > > > >
> > > > > > > > > > So if a journal is missing the cookie, newEnv is set to true. This
> > > > > > > > > > disables the later checks.
> > > > > > > > >
> > > > > > > > > > > However it can still happen in a different case: bit flip. In your case,
> > > > > > > > > > > if the fence bit in b2 is already persisted on disk, but it got corrupted,
> > > > > > > > > > > then it will cause the issue you described. One problem is we don't have a
> > > > > > > > > > > checksum on the index file header when it stores those fence bits.
> > > > > > > > > Yes, this is also an issue.
> > > > > > > > >
> > > > > > > > > -Ivan
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Jvrao
> > > > > > > > ---
> > > > > > > > > First they ignore you, then they laugh at you, then they fight you, then
> > > > > > > > > you win. - Mahatma Gandhi
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Jvrao
> > > > > > ---
> > > > > > > First they ignore you, then they laugh at you, then they fight you, then
> > > > > > > you win. - Mahatma Gandhi
> > > > > >
> > > > > --
> > > > >
> > > > >
> > > > > -- Enrico Olivelli
> > > > >
> > > >
> > > --
> > >
> > >
> > > -- Enrico Olivelli
> > >
> >
>
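
For reference, here is a rough sketch of the "list of files" idea from the quoted thread: record in each bookie directory the files expected to be there, and refuse to boot when one of them is gone. Everything in it is an assumption made for illustration. The class DirectoryManifest, the MANIFEST file name and the sample journal file names are hypothetical and are not part of BookKeeper or its Cookie class.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

/** Hypothetical helper (not a BookKeeper class): tracks the files a directory is expected to hold. */
final class DirectoryManifest {
    // Hypothetical file name; in the thread's proposal this list would live inside the cookie.
    static final String MANIFEST_NAME = "MANIFEST";

    /** Records the regular files currently in dir (except the manifest itself) as the expected set. */
    static void write(Path dir) throws IOException {
        Files.write(dir.resolve(MANIFEST_NAME), listFiles(dir), StandardCharsets.UTF_8);
    }

    /** Returns the expected files that are absent from dir; a missing manifest is an error. */
    static List<String> missingFiles(Path dir) throws IOException {
        Path manifest = dir.resolve(MANIFEST_NAME);
        if (!Files.isRegularFile(manifest)) {
            // Mirrors "no cookie -> no boot": never treat this as a brand new environment.
            throw new IOException("No manifest found in " + dir);
        }
        Set<String> present = listFiles(dir);
        List<String> missing = new ArrayList<>();
        for (String expected : Files.readAllLines(manifest, StandardCharsets.UTF_8)) {
            if (!expected.isEmpty() && !present.contains(expected)) {
                missing.add(expected);
            }
        }
        return missing;
    }

    /** Lists the names of the regular files in dir, sorted, skipping the manifest. */
    private static Set<String> listFiles(Path dir) throws IOException {
        Set<String> names = new TreeSet<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .map(p -> p.getFileName().toString())
                   .filter(n -> !n.equals(MANIFEST_NAME))
                   .forEach(names::add);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("journal");
        Files.createFile(dir.resolve("1.txn"));   // illustrative file names only
        Files.createFile(dir.resolve("2.txn"));
        write(dir);                               // manifest now lists both files
        Files.delete(dir.resolve("2.txn"));       // simulate a file lost from the directory
        System.out.println("missing: " + missingFiles(dir));
    }
}
```

A boot check along these lines would call missingFiles for every configured journal, index and data directory and abort if any result is non-empty or the manifest is absent. A real implementation would also have to update the manifest safely whenever a file is added or removed, which this sketch does not attempt.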
