That value seems to be the ZooKeeper session id. I've found https://issues.apache.org/jira/browse/ZOOKEEPER-1622 that might be related (but I guess you would have seen the error a while ago so it's likely not that).
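For reference, here is a rough sketch of the session id arithmetic as I understand it from that JIRA. It is a standalone illustration, not the ZooKeeper source: the class and method names are made up, and the clock value is passed in as a parameter so the result is reproducible. The point is that the server id is shifted into the top byte of the long, so both the kind of right shift used and the server id itself decide the sign.

```java
// Standalone sketch, NOT the ZooKeeper source. Names are hypothetical.
final class SessionIdSketch {

    // Older form as I read it in the JIRA: the signed shift (>>) sign-extends,
    // so once bit 39 of the clock value is set the result goes negative
    // regardless of the server id.
    static long seedSignedShift(long serverId, long nowMillis) {
        long sid = (nowMillis << 24) >> 8;
        return sid | (serverId << 56);
    }

    // Later form: the unsigned shift (>>>) clears the top byte, so only the
    // server id, which lands in bits 56..63, decides the sign.
    static long seedUnsignedShift(long serverId, long nowMillis) {
        long sid = (nowMillis << 24) >>> 8;
        return sid | (serverId << 56);
    }

    public static void main(String[] args) {
        long now = 1733250000000L; // epoch millis, roughly early December 2024
        for (long serverId : new long[] {1, 127, 128}) {
            System.out.printf("serverId=%d signedShift=%d unsignedShift=%d%n",
                    serverId,
                    seedSignedShift(serverId, now),
                    seedUnsignedShift(serverId, now));
        }
    }
}
```

With the unsigned shift, server ids up to 127 give a positive seed and 128 or above sets the sign bit. With the signed shift, a present-day wall clock value already gives a negative seed.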
Also, looking at the session ID generation code (see the code in the JIRA above, SessionTrackerImpl.initializeNextSessionId()), if the server id is bigger than 127 the resulting session id will be negative (if my bit shift analysis skills are still ok). Has anything changed there?

This doesn't seem to be something that can be reset; it is decided by this method. Solr code should be fixed to do a better job of parsing that string.

Ilan

On Tue, Dec 3, 2024 at 6:56 PM Patrick Lok <patrick....@salesforce.com.invalid> wrote:
>
> That's what I think is happening too. The problem is the code is not
> expecting it to happen and not handling it correctly. I'm wondering if
> there's a way to reset it.
>
> On Tue, Dec 3, 2024 at 3:28 AM Ilan Ginzburg <ilans...@gmail.com> wrote:
>
> > Didn’t look at the code but from the number of digits wouldn’t it be a long
> > wrapping around into negative territory?
> >
> > On Tue 3 Dec 2024 at 02:55, Patrick Lok <patrick....@salesforce.com.invalid>
> > wrote:
> >
> > > Hi,
> > >
> > > We are seeing some weird issues with the Overseer ID which cause some
> > > overseer election problems in our cluster.
> > >
> > > Recently we have noticed that one of our Solr 8 clusters is having trouble
> > > electing dedicated overseer hosts as leader. After some investigation, we
> > > noticed that we have "negative" Overseer IDs (Overseer IDs with a leading
> > > dash):
> > >
> > > [zk: localhost:2181(CONNECTED) 0] ls /overseer_elect/election
> > > [-5188057493699159958-1.1.1.15:8983_solr-n_0000192189,
> > > -5260098076001480373-1.1.1.19:8983_solr-n_0000192192,
> > > -5548288611309897871-1.1.1.28:8983_solr-n_0000192191,
> > > -6124715353171356222-1.1.1.18:8983_solr-n_0000192188,
> > > -6412935227404643144-1.1.1.22:8983_solr-n_0000192186,
> > > -6412935227404648050-1.1.1.89:8983_solr-n_0000192181,
> > > -6557083032988176767-1.1.1.105:8983_solr-n_0000192190,
> > > -6701159159471144532-1.1.1.219:8983_solr-n_0000192183]
> > >
> > > (the actual IP addresses are different from what is pasted above)
> > >
> > > The leading dash in the Overseer ID causes LeaderElector.getNodeName()
> > > to return "5188057493699159958-1.1.1.15:8983_solr" instead of
> > > "1.1.1.15:8983_solr", causing quite a few issues.
> > >
> > > Does anyone know why we started seeing a leading dash with the initial set
> > > of digits in the Overseer ID? Who's generating that set of digits? Solr or
> > > ZooKeeper? Is there a way to fix it?
> > >
> > > A simple change to LeaderElector.NODE_NAME seems to be an easy fix. But
> > > since there's no unit test around it, I'm a bit worried that it might break
> > > somewhere else in the code.
> > >
> > > Thanks,
> > > Patrick
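
PS: on the parsing side, here is a minimal sketch of a pattern that tolerates a negative session id. It is a hypothetical helper, not Solr's LeaderElector code or its NODE_NAME pattern. It assumes the election node format is <sessionId>-<nodeName>-n_<sequence>, which is only inferred from the listing quoted above, and the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT Solr's LeaderElector.
final class ElectionNodeParser {

    // Optional parent path, then a session id that may start with '-',
    // then the node name, then the "-n_<sequence>" suffix.
    // Assumed format: <sessionId>-<nodeName>-n_<sequence>
    private static final Pattern ELECTION_NODE =
            Pattern.compile("(?:.*/)?(-?\\d+)-(.*)-n_(\\d+)");

    private static Matcher match(String electionNode) {
        Matcher m = ELECTION_NODE.matcher(electionNode);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected election node: " + electionNode);
        }
        return m;
    }

    // Node name without the session id prefix or the sequence suffix.
    static String nodeName(String electionNode) {
        return match(electionNode).group(2);
    }

    // Session id as a signed long, so negative values survive the round trip.
    static long sessionId(String electionNode) {
        return Long.parseLong(match(electionNode).group(1));
    }

    public static void main(String[] args) {
        String node = "-5188057493699159958-1.1.1.15:8983_solr-n_0000192189";
        System.out.println(nodeName(node));  // 1.1.1.15:8983_solr
        System.out.println(sessionId(node)); // -5188057493699159958
    }
}
```

Treating the optional minus sign as part of the session id group, rather than splitting on the first dash, is what keeps the node name intact for both positive and negative ids. A unit test over both cases would address the worry about changing the pattern blind.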