I got reports from the other sites affected by this issue: after the
upgrade to 4.7.0 the problem no longer appears.

For me the case is closed.

I don't like this 'try to upgrade and hopefully it will work' approach,
but I could not debug the problem, because it is not possible to add
debugging on customer sites.
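
Because we cannot debug on customer sites, the periodic self-check proposed
at the start of this thread (15 May) would at least make the condition
visible. Below is a minimal sketch of that idea, assuming a hypothetical
RegistrationView that returns the ids of the writable and read-only bookies.
It is NOT the BookKeeper RegistrationManager API. The names is_readonly,
on_missing and InMemoryView are also hypothetical; on_missing is where an
alert or a re-registration attempt would go.

```python
import logging
import threading
from typing import Callable, Protocol, Set

log = logging.getLogger("bookie-selfcheck")


class RegistrationView(Protocol):
    """Hypothetical view of the registration service (not a BookKeeper API)."""

    def writable_bookies(self) -> Set[str]: ...

    def readonly_bookies(self) -> Set[str]: ...


class InMemoryView:
    """Trivial RegistrationView backed by two sets, for local testing only."""

    def __init__(self, writable=(), readonly=()):
        self.writable = set(writable)
        self.readonly = set(readonly)

    def writable_bookies(self) -> Set[str]:
        return set(self.writable)

    def readonly_bookies(self) -> Set[str]:
        return set(self.readonly)


class RegistrationSelfCheck:
    """Periodically checks that bookie_id is listed where it should be.

    is_readonly reports the mode the bookie believes it is in; on_missing is
    called with the bookie id whenever the bookie is not found in that list.
    """

    def __init__(self, view: RegistrationView, bookie_id: str,
                 is_readonly: Callable[[], bool],
                 on_missing: Callable[[str], None],
                 interval_s: float = 30.0) -> None:
        self._view = view
        self._bookie_id = bookie_id
        self._is_readonly = is_readonly
        self._on_missing = on_missing
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True,
                                        name="registration-self-check")

    def check_once(self) -> bool:
        """Run one check; return True if the bookie is listed as expected."""
        if self._is_readonly():
            listed = self._view.readonly_bookies()
        else:
            listed = self._view.writable_bookies()
        ok = self._bookie_id in listed
        if not ok:
            self._on_missing(self._bookie_id)
        return ok

    def _run(self) -> None:
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            try:
                self.check_once()
            except Exception:
                # A failed lookup (e.g. registry unreachable) must not kill the loop.
                log.exception("registration self-check failed")

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def is_running(self) -> bool:
        return self._thread.is_alive()
```

The check deliberately looks only at the list matching the bookie's own
mode, so a bookie that thinks it is writable but shows up only as read-only
is also reported.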

Enrico

On Wed, 6 Jun 2018, 18:51 Enrico Olivelli <eolive...@gmail.com> wrote:

> It is not related to ACLs: I got a report from a site without ZK ACLs.
>
> So far, all the sites updated to 4.7.0 are working properly.
>
> Enrico
>
>
> On Fri, 1 Jun 2018, 19:56 Enrico Olivelli <eolive...@gmail.com> wrote:
>
>>
>>
>> On Fri, 1 Jun 2018, 16:03 Venkateswara Rao Jujjuri <jujj...@gmail.com>
>> wrote:
>>
>>> @Enrico
>>> Let me make sure I understand the issue: the bookies are up and running,
>>> but ZK doesn't show them in the list.
>>>
>>> Do you see whether the session expired or not? Or are the bookies hung in
>>> some way?
>>>
>>> We have seen several situations in this area:
>>> 1. The bookie process is around, but the ZK session is lost. Almost every
>>> time we ran into this, the bookie was hung in some sense.
>>>
>>
>> I suspect this is my case, but dumping the stacks of all threads shows a
>> normal, idle bookie.
>> No errors are reported in the logs.
>>
>>> 2. The bookies are down, but ZK still shows the bookie in the RW list. We
>>> suspect this is a ZK bug that was fixed in later releases.
>>>
>>
>> No, the bookie is not in the list of available/read-only bookies.
>>
>>
>>> 3. The bookies are hung on disk, but were able to keep the ZK session alive.
>>>
>>
>> Not this case, because no thread is performing I/O.
>>
>>>
>>> Does your case fit any of these situations? Or do you believe that the
>>> bookie is healthy and up but lost its ZK session?
>>>
>>
>> I believe the bookie is healthy. I don't remember whether we tried the
>> 'bookie sanity' check.
>>
>>> If so, how did you validate that your bookie is healthy?
>>>
>>
>> Normal stack trace dump
>> Port bound and listening
>> No errors in the logs
>> Process alive
>> Additional HTTP monitoring interface (custom internal monitoring system)
>> up and reporting no errors
>>
>> But clients can't see the bookie, not even with 'bookkeeper shell
>> listbookies'.
>>
>> Unfortunately these reports come from my customers' sites, so it is
>> difficult to get feedback and perform tests.
>>
>> Enrico
>>
>>
>>> JV
>>>
>>>
>>> On Fri, Jun 1, 2018 at 12:11 AM, Enrico Olivelli <eolive...@gmail.com>
>>> wrote:
>>>
>>> > On Fri, 1 Jun 2018, 08:49 Sijie Guo <guosi...@gmail.com> wrote:
>>> >
>>> > > I don't think there are any ZK changes between 4.6.2 and 4.7.0. Are you
>>> > > sure the upgrade fixes the problem?
>>> > >
>>> >
>>> > I have checked several times, and it seems to me that every ZK fix in
>>> > 4.7.0 has been cherry-picked to 4.6.2.
>>> > It is just an observation that the issue does not appear after the
>>> > upgrade. Maybe it is too early to say that it is working.
>>> >
>>> > I will send an update.
>>> >
>>> > Enrico
>>> >
>>> >
>>> >
>>> > > - Sijie
>>> > >
>>> > > On Thu, May 31, 2018 at 11:30 PM, Enrico Olivelli <eolive...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > It seems that all the sites reporting this kind of problem are ONLY
>>> > > > on 4.6.2.
>>> > > >
>>> > > > After an upgrade to 4.7.0 the problem apparently disappears.
>>> > > >
>>> > > > I will send an update next week.
>>> > > >
>>> > > > Enrico
>>> > > >
>>> > > > On Sun, 20 May 2018, 19:18 Enrico Olivelli <eolive...@gmail.com>
>>> > > > wrote:
>>> > > >
>>> > > > > My guess is that it is related to using ZK ACLs,
>>> > > > > but I have no evidence.
>>> > > > >
>>> > > > > Enrico
>>> > > > >
>>> > > > >
>>> > > > > On Tue, 15 May 2018, 14:09 Enrico Olivelli <eolive...@gmail.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > >> On Tue, 15 May 2018 at 14:04 Sijie Guo <guosi...@gmail.com>
>>> > > > >> wrote:
>>> > > > >>
>>> > > > >>> On Tue, May 15, 2018 at 4:45 AM, Enrico Olivelli <eolive...@gmail.com>
>>> > > > >>> wrote:
>>> > > > >>>
>>> > > > >>> > Hi,
>>> > > > >>> > for quite some time we have been seeing bookies in staging
>>> > > > >>> > environments that disappear from ZK but are apparently still up
>>> > > > >>> > and running.
>>> > > > >>> >
>>> > > > >>> > I have not dug deeply into this problem, but at first glance it
>>> > > > >>> > seems related to ZK session expiration: those machines are
>>> > > > >>> > sometimes heavily loaded, and it is not surprising that the ZK
>>> > > > >>> > session expires.
>>> > > > >>> >
>>> > > > >>>
>>> > > > >>> There should already be logic to re-register the bookie after the
>>> > > > >>> session expires, no?
>>> > > > >>>
>>> > > > >>
>>> > > > >> Yes. The fact is that over the last month we have been seeing this
>>> > > > >> strange behaviour. I don't know if it could be a regression in 4.7.
>>> > > > >> I have no reports from production sites, but in production we have
>>> > > > >> dedicated machines for bookies.
>>> > > > >>
>>> > > > >>
>>> > > > >>>
>>> > > > >>> ZooKeeper stats should always show whether a bookie is able to
>>> > > > >>> connect to ZooKeeper. That would probably tell you what happens.
>>> > > > >>>
>>> > > > >>
>>> > > > >> I will check, thank you for your suggestion.
>>> > > > >>
>>> > > > >> Enrico
>>> > > > >>
>>> > > > >>
>>> > > > >>
>>> > > > >>>
>>> > > > >>>
>>> > > > >>> >
>>> > > > >>> > Apart from searching for a bug, I wonder whether an automatic
>>> > > > >>> > self-check of the bookie would be useful: something like a
>>> > > > >>> > periodic check that asks the Registration Manager whether the
>>> > > > >>> > bookie is listed in the expected bookie list
>>> > > > >>> > (readonly/available).
>>> > > > >>> >
>>> > > > >>> > This would be useful even when we are not using ZK, now that
>>> > > > >>> > we have this great abstraction over ZK.
>>> > > > >>> >
>>> > > > >>> > Thoughts?
>>> > > > >>> >
>>> > > > >>> > Enrico
>>> > > > >>> >
>>> > > > >>>
>>> > > > >> --
>>> > > > >
>>> > > > >
>>> > > > > -- Enrico Olivelli
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> Jvrao
>>>
>>
>

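For reference, the three situations JV listed on 1 June, checked against the
symptoms reported in this thread (process alive, bookie missing from the
available/read-only lists, no thread doing I/O), amount to a small decision
procedure. This is a sketch that assumes each symptom has already been
observed by hand; Observation, Situation and classify are hypothetical names,
not BookKeeper types.

```python
from dataclasses import dataclass
from enum import Enum


class Situation(Enum):
    SESSION_LOST = 1        # JV's case 1: process alive, ZK session lost
    STALE_REGISTRATION = 2  # JV's case 2: process down, ZK still lists it
    HUNG_ON_DISK = 3        # JV's case 3: hung on disk, ZK session still alive
    HEALTHY = 4             # alive, listed, no stuck I/O
    DOWN = 5                # process down and correctly unlisted


@dataclass(frozen=True)
class Observation:
    process_alive: bool          # the bookie process is still running
    listed_in_zk: bool           # bookie appears in the available or read-only list
    threads_blocked_on_io: bool  # a stack dump shows threads stuck in disk I/O


def classify(o: Observation) -> Situation:
    """Map observed symptoms to one of the situations discussed in the thread."""
    if not o.process_alive:
        # A dead process that is still listed is case 2; otherwise it is simply down.
        return Situation.STALE_REGISTRATION if o.listed_in_zk else Situation.DOWN
    if not o.listed_in_zk:
        # Alive but not listed: treated as a lost session (case 1), hung or not.
        return Situation.SESSION_LOST
    # Alive and listed: threads stuck in I/O point to case 3.
    return Situation.HUNG_ON_DISK if o.threads_blocked_on_io else Situation.HEALTHY
```

With the symptoms reported here,
classify(Observation(process_alive=True, listed_in_zk=False,
threads_blocked_on_io=False)) returns Situation.SESSION_LOST, i.e. JV's first
case, which matches the suspicion expressed in the thread.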