I got reports from other sites with this issue that, after the upgrade to 4.7.0, the problem does not appear anymore.
For me the case is closed. I don't like this 'try to upgrade and hopefully it will work' approach, but I could not debug the problem because it is not possible to add debugging to customers' sites.

Enrico

On Wed, Jun 6, 2018, 18:51 Enrico Olivelli <eolive...@gmail.com> wrote:

> It is not related to ACLs: I got a report from a site without ZK ACLs.
>
> So far, all the sites updated to 4.7.0 are working properly.
>
> Enrico
>
>
> On Fri, Jun 1, 2018, 19:56 Enrico Olivelli <eolive...@gmail.com> wrote:
>
>>
>> On Fri, Jun 1, 2018, 16:03 Venkateswara Rao Jujjuri <jujj...@gmail.com>
>> wrote:
>>
>>> @Enrico
>>> Let me understand the issue: Bookies are up and running but ZK doesn't
>>> show bookies on the list.
>>>
>>> Do you see if session expired or not? Or are bookies hung in some way?
>>>
>>> We have seen multiple situations in this area:
>>> 1. Bookie process is around, but zk session lost. Almost all the time we
>>> ran into this when bookie is hung in some sense.
>>>
>>
>> I suspect this is my case, but dumping the stack of all threads shows a
>> normal bookie without activity.
>> No error reported in the logs.
>>
>>> 2. Bookies are down, but ZK still shows the Bookie in RW list. We suspect
>>> this is a ZK bug which got fixed in later releases.
>>>
>>
>> No, the bookie is not in the list of available/readonly bookies.
>>
>>> 3. Bookies are hung at disk, but were able to keep up the ZK session.
>>>
>>
>> Not this case, because no thread is performing I/O.
>>
>>>
>>> Does your case fit into any of these situations? Or do you believe that
>>> the Bookie is healthy and up but lost ZK session?
>>>
>>
>> I believe the bookie is healthy; I don't remember whether we tried the
>> 'bookie sanity' check.
>>
>>> If so, how did you validate that your Bookie is healthy?
>>>
>>
>> Normal dump of stack trace
>> Port bound, listening
>> No errors in logs
>> Process alive
>> Additional http monitoring interface (custom internal monitoring system)
>> up and reporting no error
>>
>> But clients can't see the bookie, even with bookkeeper shell list bookies.
>>
>> Unfortunately these reports come from my customers' sites, so it is
>> difficult to get feedback and perform tests.
>>
>> Enrico
>>
>>
>>> JV
>>>
>>>
>>> On Fri, Jun 1, 2018 at 12:11 AM, Enrico Olivelli <eolive...@gmail.com>
>>> wrote:
>>>
>>> > On Fri, Jun 1, 2018, 08:49 Sijie Guo <guosi...@gmail.com> wrote:
>>> >
>>> > > I don't think there are any zk changes between 4.6.2 and 4.7.0. Are
>>> > > you sure the upgrade fixes the problem?
>>> > >
>>> >
>>> > I have checked several times and it seems to me that every zk fix in
>>> > 4.7.0 has been cherry-picked to 4.6.2.
>>> > The only fact is that with the upgrade the issue does not appear.
>>> > Maybe it is too early to say that it is working.
>>> >
>>> > I will send news
>>> >
>>> > Enrico
>>> >
>>> >
>>> > > - Sijie
>>> > >
>>> > > On Thu, May 31, 2018 at 11:30 PM, Enrico Olivelli
>>> > > <eolive...@gmail.com> wrote:
>>> > >
>>> > > > It seems that all the sites which are reporting this kind of
>>> > > > problem are ONLY on 4.6.2.
>>> > > >
>>> > > > After an upgrade to 4.7.0 apparently the problem disappears.
>>> > > >
>>> > > > I will send news next week
>>> > > >
>>> > > > Enrico
>>> > > >
>>> > > > On Sun, May 20, 2018, 19:18 Enrico Olivelli <eolive...@gmail.com>
>>> > > > wrote:
>>> > > >
>>> > > > > My guess is that it is about using zk ACLs
>>> > > > > I have no evidence
>>> > > > >
>>> > > > > Enrico
>>> > > > >
>>> > > > >
>>> > > > > On Tue, May 15, 2018, 14:09 Enrico Olivelli <eolive...@gmail.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > >> On Tue, May 15, 2018 at 14:04 Sijie Guo <guosi...@gmail.com>
>>> > > > >> wrote:
>>> > > > >>
>>> > > > >>> On Tue, May 15, 2018 at 4:45 AM, Enrico Olivelli
>>> > > > >>> <eolive...@gmail.com> wrote:
>>> > > > >>>
>>> > > > >>> > Hi,
>>> > > > >>> > for quite some time we have been seeing Bookies in staging
>>> > > > >>> > environments which disappear from ZK but apparently are still
>>> > > > >>> > up and running.
>>> > > > >>> >
>>> > > > >>> > I have not dug deeply into this problem, but at first glance it
>>> > > > >>> > should be related to ZK session expiration: those machines are
>>> > > > >>> > heavily loaded sometimes and it is not surprising that the ZK
>>> > > > >>> > session expires.
>>> > > > >>> >
>>> > > > >>>
>>> > > > >>> There should already be logic for re-registration after the
>>> > > > >>> session expires, no?
>>> > > > >>>
>>> > > > >>
>>> > > > >> Yes. The fact is that in the month we are seeing this strange
>>> > > > >> behaviour.
>>> > > > >> I don't know if it could be a regression on 4.7.
>>> > > > >> I have no reports from production sites, but in production we have
>>> > > > >> dedicated machines for bookies.
>>> > > > >>
>>> > > > >>>
>>> > > > >>> ZooKeeper stats should always show whether a bookie is able to
>>> > > > >>> connect to zookeeper. That would probably tell you what happens.
>>> > > > >>>
>>> > > > >>
>>> > > > >> I will check, thank you for your suggestion.
>>> > > > >>
>>> > > > >> Enrico
>>> > > > >>
>>> > > > >>>
>>> > > > >>> >
>>> > > > >>> > Apart from searching for a bug, I wonder whether an automatic
>>> > > > >>> > self-check of the bookie would be useful: something like a
>>> > > > >>> > periodic check which asks the Registration Manager whether the
>>> > > > >>> > bookie is listed in the expected bookie list
>>> > > > >>> > (readonly/available....)
>>> > > > >>> >
>>> > > > >>> > This would be useful even if we are not using ZK, now that we
>>> > > > >>> > have this great abstraction of ZK
>>> > > > >>> >
>>> > > > >>> > Thoughts?
>>> > > > >>> >
>>> > > > >>> > Enrico
>>> > > > >>> >
>>>
>>> --
>>> Jvrao
>>> ---
>>> First they ignore you, then they laugh at you, then they fight you, then
>>> you win. - Mahatma Gandhi
>>>

-- Enrico Olivelli
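
For reference, the periodic self-check proposed in the first message of this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: BookieSelfCheck, the two suppliers and the onMissing callback are hypothetical names, not BookKeeper APIs. The suppliers stand in for whatever the registration layer (ZK or another backend) returns as the writable and readonly bookie lists, and the callback stands in for the reaction (re-register, raise an alert, ...).

```java
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical periodic self-check; none of these names exist in BookKeeper. */
final class BookieSelfCheck implements AutoCloseable {
    private final String bookieId;
    private final Supplier<Set<String>> writableBookies; // assumed lookup of the "available" list
    private final Supplier<Set<String>> readOnlyBookies; // assumed lookup of the "readonly" list
    private final Runnable onMissing;                    // assumed reaction, e.g. re-register
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "bookie-self-check");
                t.setDaemon(true); // never keep the JVM alive just for the check
                return t;
            });

    BookieSelfCheck(String bookieId,
                    Supplier<Set<String>> writableBookies,
                    Supplier<Set<String>> readOnlyBookies,
                    Runnable onMissing) {
        this.bookieId = bookieId;
        this.writableBookies = writableBookies;
        this.readOnlyBookies = readOnlyBookies;
        this.onMissing = onMissing;
    }

    /** Runs one check: returns true if the bookie is listed, else fires onMissing. */
    boolean checkOnce() {
        boolean listed = writableBookies.get().contains(bookieId)
                || readOnlyBookies.get().contains(bookieId);
        if (!listed) {
            onMissing.run();
        }
        return listed;
    }

    /** Schedules checkOnce() repeatedly, waiting the given delay between runs. */
    void start(long delay, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so only report it.
                System.err.println("self-check failed: " + e);
            }
        }, delay, delay, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

A bookie would create one instance at startup with its own id, call start(30, TimeUnit.SECONDS) or similar, and close it on shutdown. Because the check only depends on the two lookups, it does not care whether ZK or some other registration backend is behind them, which matches the point made in the proposal.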