Hi,

We are seeing odd behavior with the Overseer ID that is causing overseer
election problems in our cluster.

Recently we noticed that one of our Solr 8 clusters is having trouble
electing the dedicated overseer hosts as leader. After some investigation,
we found that the Overseer IDs are "negative" (they start with a leading
dash):

[zk: localhost:2181(CONNECTED) 0] ls /overseer_elect/election
[-5188057493699159958-1.1.1.15:8983_solr-n_0000192189,
 -5260098076001480373-1.1.1.19:8983_solr-n_0000192192,
 -5548288611309897871-1.1.1.28:8983_solr-n_0000192191,
 -6124715353171356222-1.1.1.18:8983_solr-n_0000192188,
 -6412935227404643144-1.1.1.22:8983_solr-n_0000192186,
 -6412935227404648050-1.1.1.89:8983_solr-n_0000192181,
 -6557083032988176767-1.1.1.105:8983_solr-n_0000192190,
 -6701159159471144532-1.1.1.219:8983_solr-n_0000192183]


(the actual IP addresses are different from what is pasted above)

Because of the leading dash in the Overseer ID,
LeaderElector.getNodeName() returns
"5188057493699159958-1.1.1.15:8983_solr" instead of "1.1.1.15:8983_solr",
which causes quite a few issues.
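
To illustrate the parsing problem, here is a minimal standalone sketch. The
regex is an assumption modeled on what LeaderElector.NODE_NAME looks like in
the Solr 8 source, so please check it against your version. The class and
method names (NodeNameDemo, nodeName) are hypothetical and are not Solr APIs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NodeNameDemo {
    // ASSUMPTION: copy of LeaderElector.NODE_NAME; verify against your Solr version.
    // Group 1 is the shortest prefix ending in '-', group 2 is the node name.
    static final Pattern NODE_NAME = Pattern.compile(".*?/?(.*?-)(.*?)-n_\\d+");

    // Hypothetical helper that returns group 2 of the pattern above.
    static String nodeName(String electionNode) {
        Matcher m = NODE_NAME.matcher(electionNode);
        if (!m.matches()) {
            throw new IllegalStateException("No match for: " + electionNode);
        }
        return m.group(2);
    }

    public static void main(String[] args) {
        // Non-negative ID: group 1 consumes "<digits>-", so group 2 is the node name.
        System.out.println(nodeName("5188057493699159958-1.1.1.15:8983_solr-n_0000192189"));
        // prints 1.1.1.15:8983_solr

        // Negative ID: group 1 stops at the leading dash, so the digits leak into group 2.
        System.out.println(nodeName("-5188057493699159958-1.1.1.15:8983_solr-n_0000192189"));
        // prints 5188057493699159958-1.1.1.15:8983_solr
    }
}
```

With this pattern the reluctant group `(.*?-)` is satisfied by the leading
dash alone, which would explain the node name we are seeing.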

Does anyone know why we started seeing a leading dash with the initial set
of digits in the Overseer ID? Who's generating that set of digits? Solr or
ZooKeeper? Is there a way to fix it?

A simple change to LeaderElector.NODE_NAME seems to be an easy fix. But
since there's no unit test around it, I'm a bit worried that the change
might break something elsewhere in the code.
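
As a rough sketch of the kind of change and check I have in mind, the
pattern below makes the ID prefix explicit as an optional dash followed by
digits. This is a candidate only, not a tested Solr patch. CANDIDATE,
NodeNameFixSketch, and nodeName are hypothetical names, and the sample
inputs are the masked values from the listing above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NodeNameFixSketch {
    // HYPOTHETICAL candidate pattern: group 1 is an optionally negative
    // numeric ID followed by '-', group 2 is the node name.
    static final Pattern CANDIDATE = Pattern.compile(".*?/?(-?\\d+-)(.*?)-n_\\d+");

    // Hypothetical helper that returns group 2 of the candidate pattern.
    static String nodeName(String electionNode) {
        Matcher m = CANDIDATE.matcher(electionNode);
        if (!m.matches()) {
            throw new IllegalStateException("No match for: " + electionNode);
        }
        return m.group(2);
    }

    public static void main(String[] args) {
        String expected = "1.1.1.15:8983_solr";
        String[] inputs = {
            "5188057493699159958-1.1.1.15:8983_solr-n_0000192189",   // non-negative ID
            "-5188057493699159958-1.1.1.15:8983_solr-n_0000192189",  // negative ID
            "/overseer_elect/election/-5188057493699159958-1.1.1.15:8983_solr-n_0000192189"
        };
        for (String input : inputs) {
            String actual = nodeName(input);
            if (!expected.equals(actual)) {
                throw new AssertionError(input + " -> " + actual);
            }
        }
        System.out.println("all inputs parsed to " + expected);
    }
}
```

A real fix would also need to cover host names that themselves contain
digits and dashes, plus the other patterns in LeaderElector that parse the
same election node string.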

Thanks,
Patrick
