It is not related to ACLs: I got a report from a site without ZK ACLs.

So far, all the sites upgraded to 4.7.0 are working properly.

Enrico

On Fri, Jun 1, 2018, 19:56 Enrico Olivelli <eolive...@gmail.com> wrote:

>
>
> On Fri, Jun 1, 2018, 16:03 Venkateswara Rao Jujjuri <jujj...@gmail.com>
> wrote:
>
>> @Enrico
>> Let me understand the issue: bookies are up and running, but ZK doesn't
>> show the bookies on the list.
>>
>> Do you see whether the session expired? Or are the bookies hung in some way?
>>
>> We have seen multiple situations in this area:
>> 1. Bookie process is around, but zk session lost. Almost all the time we
>> ran into this when the bookie was hung in some sense.
>>
>
> I suspect this is my case, but dumping the stack of all threads shows a
> normal bookie without activity.
> No errors are reported in the logs.
>
>> 2. Bookies are down, but ZK still shows the bookie in the RW list. We suspect
>> this is a ZK bug which got fixed in later releases.
>>
>
> No, the bookie is not in the list of available/readonly bookies.
>
>
>> 3. Bookies are hung at the disk, but were able to keep up the ZK session.
>>
>
> Not this case, because no thread is performing I/O.
>
>>
>> Does your case fit into any of these situations? Or do you believe that the
>> bookie is healthy and up but lost its ZK session?
>>
>
> I believe the bookie is healthy; I don't remember whether we tried the
> 'bookie sanity' check.
>
>> If so, how did you validate that your bookie is healthy?
>>
>
> Normal dump of stack trace
> Port bound, listening
> No errors in logs
> Process alive
> Additional http monitoring interface (custom internal monitoring system)
> up and reporting no error
>
> But clients can't see the bookie, even with bookkeeper shell list bookies.
>
> Unfortunately these reports come from my customers' sites, so it is
> difficult to get feedback and run tests.
>
> Enrico
>
>
>> JV
>>
>>
>> On Fri, Jun 1, 2018 at 12:11 AM, Enrico Olivelli <eolive...@gmail.com>
>> wrote:
>>
>> > On Fri, Jun 1, 2018, 08:49 Sijie Guo <guosi...@gmail.com> wrote:
>> >
>> > > I don't think there are any zk changes between 4.6.2 and 4.7.0. Are you sure
>> > > the upgrade fixes the problem?
>> > >
>> >
>> > I have checked several times and it seems to me that every zk fix in 4.7.0
>> > has been cherry-picked to 4.6.2.
>> > The only fact is that with the upgrade the issue does not appear. Maybe it
>> > is too early to say that it is working.
>> >
>> > I will send news
>> >
>> > Enrico
>> >
>> >
>> >
>> > > - Sijie
>> > >
>> > > On Thu, May 31, 2018 at 11:30 PM, Enrico Olivelli <eolive...@gmail.com>
>> > > wrote:
>> > >
>> > > > It seems that all the sites reporting this kind of problem are ONLY
>> > > > on 4.6.2.
>> > > >
>> > > > After an upgrade to 4.7.0 apparently the problem disappears.
>> > > >
>> > > > I will send news next week
>> > > >
>> > > > Enrico
>> > > >
>> > > > On Sun, May 20, 2018, 19:18 Enrico Olivelli <eolive...@gmail.com> wrote:
>> > > >
>> > > > > My guess is that it is about using zk ACLs.
>> > > > > I have no evidence.
>> > > > >
>> > > > > Enrico
>> > > > >
>> > > > >
>> > > > > On Tue, May 15, 2018, 14:09 Enrico Olivelli <eolive...@gmail.com> wrote:
>> > > > >
>> > > > >> On Tue, May 15, 2018 at 14:04 Sijie Guo <guosi...@gmail.com> wrote:
>> > > > >>
>> > > > >>> On Tue, May 15, 2018 at 4:45 AM, Enrico Olivelli <eolive...@gmail.com>
>> > > > >>> wrote:
>> > > > >>>
>> > > > >>> > Hi,
>> > > > >>> > for quite some time we have been seeing bookies in staging environments
>> > > > >>> > which disappear from ZK but apparently are still up and running.
>> > > > >>> >
>> > > > >>> > I have not dug deeply into this problem, but at first glance it should be
>> > > > >>> > related to ZK session expiration: those machines are sometimes heavily
>> > > > >>> > loaded, and it is not surprising that the ZK session expires.
>> > > > >>> >
>> > > > >>>
>> > > > >>> There should already be logic for re-registration after session expiry,
>> > > > >>> no?
>> > > > >>>
>> > > > >>
>> > > > >> Yes. The fact is that over the last month we have been seeing this strange
>> > > > >> behaviour.
>> > > > >> I don't know if it could be a regression in 4.7.
>> > > > >> I have no reports from production sites, but in production we have
>> > > > >> dedicated machines for bookies.
>> > > > >>
>> > > > >>
>> > > > >>>
>> > > > >>> ZooKeeper stats should always show whether a bookie is able to connect to
>> > > > >>> zookeeper. That would probably tell you what happens.
>> > > > >>>
>> > > > >>
>> > > > >> I will check, thank you for your suggestion.
>> > > > >>
>> > > > >> Enrico
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >>>
>> > > > >>>
>> > > > >>> >
>> > > > >>> > Apart from searching for a bug, I wonder whether it would be useful to have an
>> > > > >>> > automatic self-check of the bookie, something like a periodic check which asks
>> > > > >>> > the Registration Manager whether the bookie is listed in the expected bookie
>> > > > >>> > list (readonly/available...).
>> > > > >>> >
>> > > > >>> > This would be useful even when we are not using ZK, now that we have
>> > > > >>> > this great abstraction over ZK.
>> > > > >>> >
>> > > > >>> > Thoughts ?
>> > > > >>> >
>> > > > >>> > Enrico
>> > > > >>> >
>> > > > >>>
>> > > > >> --
>> > > > >
>> > > > >
>> > > > > -- Enrico Olivelli
>> > > > >
>> > > > --
>> > > >
>> > > >
>> > > > -- Enrico Olivelli
>> > > >
>> > >
>> > --
>> >
>> >
>> > -- Enrico Olivelli
>> >
>>
>>
>>
>> --
>> Jvrao
>> ---
>> First they ignore you, then they laugh at you, then they fight you, then
>> you win. - Mahatma Gandhi
>>
> --
>
>
> -- Enrico Olivelli
>
-- 


-- Enrico Olivelli
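
The periodic self-check proposed in the first message of this thread (ask the
registration layer whether the bookie is still listed as available or
readonly, and re-register if it is not) could be sketched roughly as below.
This is only an illustration under assumptions: RegistrationView and
RegistrationSelfCheck are hypothetical names invented here, not BookKeeper's
actual RegistrationManager API, and the re-registration step stands in for
whatever the bookie really does after a session expiry.

```java
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical view of the registration backend (ZK or any other store). */
interface RegistrationView {
    Set<String> writableBookies() throws Exception;
    Set<String> readOnlyBookies() throws Exception;
    /** Hypothetical: re-creates the registration entry for this bookie. */
    void register(String bookieId) throws Exception;
}

/** Periodically checks that this bookie is still listed and re-registers it if not. */
final class RegistrationSelfCheck implements Runnable {
    private final RegistrationView view;
    private final String bookieId;
    private final AtomicInteger repairs = new AtomicInteger();

    RegistrationSelfCheck(RegistrationView view, String bookieId) {
        this.view = view;
        this.bookieId = bookieId;
    }

    /** True if the bookie appears in either the writable or the readonly list. */
    boolean isListed() throws Exception {
        return view.writableBookies().contains(bookieId)
                || view.readOnlyBookies().contains(bookieId);
    }

    @Override
    public void run() {
        try {
            if (!isListed()) {
                view.register(bookieId);
                repairs.incrementAndGet();
            }
        } catch (Exception e) {
            // Swallow and log: an exception escaping from a scheduled task
            // would suppress all subsequent executions.
            System.err.println("registration self-check failed: " + e);
        }
    }

    /** Number of times the check found the bookie missing and re-registered it. */
    int repairCount() {
        return repairs.get();
    }

    /** Runs the check on a daemon thread with a fixed delay between runs. */
    ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "registration-self-check");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleWithFixedDelay(this, period, period, unit);
        return ses;
    }
}
```

A caller might start it with something like
new RegistrationSelfCheck(view, bookieId).start(30, TimeUnit.SECONDS); the
30-second interval is an arbitrary choice. Counting repairs also gives a
metric that would make the "process healthy but not listed" situation
described above visible in monitoring.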
