I haven’t looked at the code, but judging by the number of digits, wouldn’t that be a long wrapping around into negative territory?
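To make that guess concrete, here is a minimal sketch. It assumes, without my having checked the source, that the leading digits are the ZooKeeper session ID printed as a signed 64-bit Java long, and that the top byte of that ID carries the ZooKeeper server ID (myid). If both assumptions hold, any server ID of 128 or higher sets the sign bit and the decimal form gets a leading dash. The class and method names are hypothetical and exist only for this illustration.

```java
class SessionIdSign {
    // Hypothetical helper: build a 64-bit ID whose top byte is a server ID.
    // This mimics the ASSUMED session ID layout; it is not ZooKeeper code.
    static long withHighByte(int serverId, long low56) {
        return ((long) serverId << 56) | (low56 & 0x00FFFFFFFFFFFFFFL);
    }

    // Extract the top byte of an ID as an unsigned value in 0..255.
    static int topByte(long id) {
        return (int) (id >>> 56);
    }

    public static void main(String[] args) {
        // First ID from the election listing in the quoted message.
        long observed = -5188057493699159958L;

        // Same 64 bits, three different renderings.
        System.out.println("signed:   " + observed);
        System.out.println("unsigned: " + Long.toUnsignedString(observed));
        System.out.println("hex:      0x" + Long.toHexString(observed));

        // A top byte of 128 or more means the sign bit is set.
        System.out.println("top byte: " + topByte(observed));

        // Small server IDs stay positive, large ones go negative.
        System.out.println(withHighByte(1, 42L) > 0);
        System.out.println(withHighByte(200, 42L) < 0);
    }
}
```

If the top byte printed for your IDs lines up with the myid values of your ZooKeeper servers, that would point at ZooKeeper as the source of the digits rather than Solr.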

On Tue 3 Dec 2024 at 02:55, Patrick Lok <patrick....@salesforce.com.invalid> wrote:

> Hi,
>
> We are seeing some weird issues with the Overseer ID, which cause
> overseer election problems in our cluster.
>
> Recently we noticed that one of our Solr 8 clusters is having trouble
> electing dedicated overseer hosts as leader. After some investigation, we
> found that we have "negative" Overseer IDs (Overseer IDs with a leading
> dash):
>
> [zk: localhost:2181(CONNECTED) 0] ls /overseer_elect/election
> [-5188057493699159958-1.1.1.15:8983_solr-n_0000192189,
> -5260098076001480373-1.1.1.19:8983_solr-n_0000192192,
> -5548288611309897871-1.1.1.28:8983_solr-n_0000192191,
> -6124715353171356222-1.1.1.18:8983_solr-n_0000192188,
> -6412935227404643144-1.1.1.22:8983_solr-n_0000192186,
> -6412935227404648050-1.1.1.89:8983_solr-n_0000192181,
> -6557083032988176767-1.1.1.105:8983_solr-n_0000192190,
> -6701159159471144532-1.1.1.219:8983_solr-n_0000192183]
>
> (the actual IP addresses are different from what is pasted above)
>
> Because of the leading dash in the Overseer ID,
> LeaderElector.getNodeName() returns
> "5188057493699159958-1.1.1.15:8983_solr" instead of
> "1.1.1.15:8983_solr", which causes quite a few issues.
>
> Does anyone know why we started seeing a leading dash before the initial set
> of digits in the Overseer ID? Who's generating that set of digits? Solr or
> ZooKeeper? Is there a way to fix it?
>
> A simple change to LeaderElector.NODE_NAME seems to be an easy fix. But
> since there's no unit test around it, I'm a bit worried that it might break
> somewhere else in the code.
>
> Thanks,
> Patrick
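
The parsing symptom in the quoted message can be reproduced outside Solr with a short sketch. The first pattern below is only my assumption of what LeaderElector.NODE_NAME looks like (a reluctant match up to the first dash); please check it against the constant in your Solr version. The second pattern is a hypothetical variant that accepts an optional minus sign in front of the digits. It is not a tested patch, and other patterns in the same class may deserve the same look.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NodeNameParsing {
    // ASSUMED shape of the existing pattern: group 1 ends at the first dash.
    static final Pattern LAZY = Pattern.compile(".*?/?(.*?-)(.*?)-n_\\d+");

    // Hypothetical variant: group 1 is an optionally negative number plus a dash.
    static final Pattern SIGNED = Pattern.compile(".*?/?(-?\\d+-)(.*?)-n_\\d+");

    // Return group 2 (the node name) or fail loudly if the input does not match.
    static String nodeName(Pattern pattern, String electionNode) {
        Matcher m = pattern.matcher(electionNode);
        if (!m.matches()) {
            throw new IllegalStateException("No match for: " + electionNode);
        }
        return m.group(2);
    }

    public static void main(String[] args) {
        String negative = "-5188057493699159958-1.1.1.15:8983_solr-n_0000192189";

        // With the assumed pattern, group 1 is just "-", so the digits leak
        // into the node name, as described in the quoted message.
        System.out.println(nodeName(LAZY, negative));

        // With the signed variant, the whole negative number stays in group 1.
        System.out.println(nodeName(SIGNED, negative));
    }
}
```

Because group 1 in the variant only accepts digits, a host name that itself contains hyphens should still end up whole in group 2. A unit test with one positive ID, one negative ID, and one hyphenated host name would cover the cases Patrick is worried about.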