My guess is that it is related to using ZK ACLs, but I have no evidence.

Enrico

On Tue, 15 May 2018, 14:09 Enrico Olivelli <eolive...@gmail.com> wrote:

> On Tue, 15 May 2018 at 14:04, Sijie Guo <guosi...@gmail.com> wrote:
>
>> On Tue, May 15, 2018 at 4:45 AM, Enrico Olivelli <eolive...@gmail.com>
>> wrote:
>>
>> > Hi,
>> > for quite some time we have been seeing Bookies in staging environments
>> > which disappear from ZK but apparently are still up and running.
>> >
>> > I have not dug deeply into this problem, but at first glance it should
>> > be related to ZK session expiration; those machines are sometimes
>> > heavily loaded, so it is not surprising that the ZK session expires.
>> >
>>
>> There should already be logic for re-registration after session
>> expiration, no?
>>
>
> Yes. The fact is that over the last month we have been seeing this
> strange behaviour.
> I don't know if it could be a regression in 4.7.
> I have no reports from production sites, but in production we have
> dedicated machines for bookies.
>
>>
>> ZooKeeper stats should always show whether a bookie is able to connect to
>> zookeeper. That would probably tell you what happens.
>>
>
> I will check, thank you for your suggestion.
>
> Enrico
>
>>
>> >
>> > Apart from searching for a bug, I wonder whether an automatic self
>> > check of the bookie would be useful, something like a periodic check
>> > which asks the Registration Manager whether the bookie is listed in
>> > the expected bookie list (readonly/available....)
>> >
>> > This would be useful even if we are not using ZK, now that we have
>> > this great abstraction of ZK
>> >
>> > Thoughts?
>> >
>> > Enrico
>> >
>
> --

-- Enrico Olivelli
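
To make the self-check idea from the original message a bit more concrete, here is a rough sketch of what such a periodic check could look like. It is only an illustration under assumptions: the class name BookieSelfCheck, the Supplier that returns the currently listed bookie ids (available plus readonly) and the re-registration callback are all hypothetical and are not the actual Registration Manager API.

```java
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch: periodically verifies that this bookie is still listed. */
class BookieSelfCheck {
    private final String bookieId;
    // Assumption: returns the ids of all bookies currently listed
    // (available + readonly) by whatever registration backend is in use.
    private final Supplier<Set<String>> listedBookies;
    // Assumption: callback that registers this bookie again.
    private final Runnable reRegister;

    BookieSelfCheck(String bookieId,
                    Supplier<Set<String>> listedBookies,
                    Runnable reRegister) {
        this.bookieId = bookieId;
        this.listedBookies = listedBookies;
        this.reRegister = reRegister;
    }

    /** Runs one check; returns true if listed, false if re-registration was triggered. */
    boolean checkOnce() {
        if (listedBookies.get().contains(bookieId)) {
            return true;
        }
        reRegister.run();
        return false;
    }

    /** Schedules the check at a fixed period; the caller shuts the executor down. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // Swallow and retry on the next tick: an uncaught exception
                // would cancel all future runs of this scheduled task.
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Since the check only depends on "am I in the list?", it would not care whether the backend is ZK or something else, which is the point of doing it through the registration abstraction.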