It is not related to ACLs: I got a report from a site without ZK ACLs. So far, all the sites upgraded to 4.7.0 are working properly.

Enrico

On Fri, Jun 1, 2018, 19:56 Enrico Olivelli <eolive...@gmail.com> wrote:

> On Fri, Jun 1, 2018, 16:03 Venkateswara Rao Jujjuri <jujj...@gmail.com> wrote:
>
>> @Enrico
>> Let me understand the issue: Bookies are up and running but ZK doesn't show bookies on the list.
>>
>> Do you see if session expired or not? or bookies are hung in some way?
>>
>> We have seen multiple situations in this area:
>> 1. Bookie process is around, but zk session lost. Almost all the time we ran into this when bookie is hung in some sense.
>
> I suspect this is my case, but dumping the stack of all threads shows a normal bookie without activity.
> No error reported in the logs.
>
>> 2. Bookies are down, but ZK still shows the Bookie in RW list. We suspect this is a ZK bug which got fixed in later releases.
>
> No, the bookie is not in the list of available/readonly bookies.
>
>> 3. Bookies are hung at disk, but was able to keep up ZK session.
>
> Not this case, because no thread is performing I/O.
>
>> Does your case fit into any of these situations? Or do you believe that the Bookie is healthy and up but lost ZK session?
>
> I believe the bookie is healthy; I don't remember whether we tried the 'bookie sanity' check.
>
>> If so how did you validate that your Bookie is healthy?
>
> Normal dump of stack trace
> Port bound, listening
> No errors in logs
> Process alive
> Additional http monitoring interface (custom internal monitoring system) up and reporting no error
>
> But clients can't see the bookie, even with bookkeeper shell list bookies.
>
> Unfortunately I have such reports from sites of my customers, so it is difficult to get feedback and perform tests.
>
> Enrico
>
>> JV
>>
>> On Fri, Jun 1, 2018 at 12:11 AM, Enrico Olivelli <eolive...@gmail.com> wrote:
>>
>> > On Fri, Jun 1, 2018, 08:49 Sijie Guo <guosi...@gmail.com> wrote:
>> >
>> > > I don't think there is any zk changes between 4.6.2 and 4.7.0. Are you sure the upgrade fixes the problem?
>> >
>> > I have checked several times and it seems to me that every zk fix in 4.7.0 has been cherry picked to 4.6.2.
>> > It is only a fact that with the upgrade the issue does not appear. Maybe it is too early to say that it is working.
>> >
>> > I will send news
>> >
>> > Enrico
>> >
>> > > - Sijie
>> > >
>> > > On Thu, May 31, 2018 at 11:30 PM, Enrico Olivelli <eolive...@gmail.com> wrote:
>> > >
>> > > > Seems that all the sites which are reporting this kind of problem are ONLY on 4.6.2.
>> > > >
>> > > > After an upgrade to 4.7.0 apparently the problem disappears.
>> > > >
>> > > > I will send news next week
>> > > >
>> > > > Enrico
>> > > >
>> > > > On Sun, May 20, 2018, 19:18 Enrico Olivelli <eolive...@gmail.com> wrote:
>> > > >
>> > > > > My guess is that it is about using zk ACLs
>> > > > > I have no evidence
>> > > > >
>> > > > > Enrico
>> > > > >
>> > > > > On Tue, May 15, 2018, 14:09 Enrico Olivelli <eolive...@gmail.com> wrote:
>> > > > >
>> > > > >> On Tue, May 15, 2018 at 14:04 Sijie Guo <guosi...@gmail.com> wrote:
>> > > > >>
>> > > > >>> On Tue, May 15, 2018 at 4:45 AM, Enrico Olivelli <eolive...@gmail.com> wrote:
>> > > > >>>
>> > > > >>> > Hi,
>> > > > >>> > for quite some time we have been seeing Bookies in staging environments which disappear from ZK but apparently are still up and running.
>> > > > >>> >
>> > > > >>> > I have not dug deeply into this problem, but at first glance it should be related to ZK session expiration: those machines are heavily loaded sometimes and it is not surprising that the ZK session expires.
>> > > > >>>
>> > > > >>> There should be already a logic on re-registration after session expired, no?
>> > > > >>
>> > > > >> Yes. The fact is that in the month we are seeing this strange behaviour.
>> > > > >> I don't know if it could be a regression on 4.7.
>> > > > >> I have no reports from production sites, but in production we have dedicated machines for bookies.
>> > > > >>
>> > > > >>> ZooKeeper stats should always show whether a bookie is able to connect to zookeeper. That would probably tell you what happens.
>> > > > >>
>> > > > >> I will check, thank you for your suggestion.
>> > > > >>
>> > > > >> Enrico
>> > > > >>
>> > > > >>> > Apart from searching for a bug, I wonder whether an automatic self check of the bookie would be useful, something like a periodic check which asks the Registration Manager whether the bookie is listed in the expected bookie list (readonly/available...)
>> > > > >>> >
>> > > > >>> > This will be useful even if we are not using ZK as well, now that we have this great abstraction of ZK
>> > > > >>> >
>> > > > >>> > Thoughts ?
>> > > > >>> >
>> > > > >>> > Enrico
>>
>> --
>> Jvrao
>> ---
>> First they ignore you, then they laugh at you, then they fight you, then you win. - Mahatma Gandhi

-- Enrico Olivelli
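
As a rough illustration of the periodic self check proposed in the first message of this thread, here is a minimal sketch in Python (chosen only for brevity; the bookie itself is Java). The names `RegistrationSelfCheck`, `list_bookies` and `register` are hypothetical and are not BookKeeper APIs: the two callables stand in for whatever the Registration Manager abstraction exposes for listing the available/readonly bookies and for registering one. Whether re-registering is the right reaction, rather than only raising an alert, is also an assumption.

```python
import threading
from typing import Callable, Optional, Set


class RegistrationSelfCheck:
    """Periodically verifies that this bookie is still listed; re-registers if not.

    Hypothetical sketch: list_bookies and register are placeholders for the
    registration backend (ZK or anything else), not real BookKeeper calls.
    """

    def __init__(
        self,
        bookie_id: str,
        list_bookies: Callable[[], Set[str]],
        register: Callable[[str], None],
        interval_seconds: float = 30.0,
    ) -> None:
        self.bookie_id = bookie_id
        self.list_bookies = list_bookies
        self.register = register
        self.interval_seconds = interval_seconds
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def check_once(self) -> bool:
        """Return True if the bookie was listed, False if it had to re-register."""
        if self.bookie_id in self.list_bookies():
            return True
        # Not listed (for example after a missed session expiration): register again.
        self.register(self.bookie_id)
        return False

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.interval_seconds):
            try:
                self.check_once()
            except Exception as exc:
                # A failing check must not kill the loop; report and retry later.
                print(f"registration self-check failed: {exc}")

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Because the check only depends on two callables, the same loop would work with a non-ZK registration backend, which is the point made in the original proposal.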