I was only able to get this working by leaving a 30-second gap between starting ZooKeeper and starting Solr.
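A fixed 30-second sleep works, but it may be more robust to poll for the condition that actually blocks startup. The sketch below rests on an assumption, not a confirmed diagnosis. The quoted log shows Solr failing with NodeExists while registering /live_nodes/server03:8983_solr. A plausible cause is a leftover ephemeral znode from the previous ZooKeeper session, which ZooKeeper removes only once that session expires. `wait_until` and `FakeZk` are hypothetical names. `FakeZk` simulates a ZooKeeper client so the example runs standalone. In a real script, `live_node_gone` would query ZooKeeper for that path.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float = 60.0,
               interval_s: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `check` every `interval_s` seconds until it returns True.

    Returns True as soon as the condition holds, or False once
    `timeout_s` has elapsed without it holding.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


class FakeZk:
    """Hypothetical stand-in for a ZooKeeper client: the stale ephemeral
    live node disappears after `expires_after` simulated seconds."""

    def __init__(self, expires_after: float):
        self.now = 0.0
        self.expires_after = expires_after

    def clock(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        # Advance simulated time instead of really sleeping.
        self.now += seconds

    def live_node_gone(self) -> bool:
        # A real check would ask ZooKeeper whether
        # /live_nodes/<host>:8983_solr still exists (assumption).
        return self.now >= self.expires_after


if __name__ == "__main__":
    zk = FakeZk(expires_after=30.0)
    ok = wait_until(zk.live_node_gone, timeout_s=90.0, interval_s=5.0,
                    clock=zk.clock, sleep=zk.sleep)
    print(ok, zk.now)  # prints "True 30.0"
```

The start script would then launch Solr only when `wait_until` returns True, and fail loudly otherwise. This avoids guessing a fixed delay.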
Regards,
Sergio Maroto

On Thu, 20 Jun 2024 at 15:08, Sergio García Maroto <marot...@gmail.com> wrote:
> Hi All.
>
> I am facing a weird issue while upgrading Solr 8.11 to Solr 9.
> I have everything up and running, passing all kinds of unit and
> integration tests in my current CD process.
>
> I have a cluster of 3 machines on SolrCloud and it's all good and working.
> The problem happens when the machines are restarted. Either 1 or 2 servers of
> the cluster can't connect to ZooKeeper, even when ZooKeeper reports as healthy
> and stable.
> If I restart Solr, the server can connect back to the cluster and becomes
> healthy.
>
> I checked the logs and everything seems normal except on the servers that try
> to connect to the cluster and fail on start. I get this error.
> I tried to delay the start of Solr a bit just in case, but no luck.
>
> Any help much appreciated.
> Sergio
>
> 2024-06-20 12:56:42.944 INFO (main) [ ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (0) -> (2)
> 2024-06-20 12:56:43.003 INFO (main) [ ] o.a.s.c.DistributedClusterStateUpdater Creating DistributedClusterStateUpdater with useDistributedStateUpdate=false. Solr will be using Overseer based cluster state updates.
> 2024-06-20 12:56:43.056 INFO (main) [ ] o.a.s.c.ZkController Publish node=server03:8983_solr as DOWN
> 2024-06-20 12:56:43.088 INFO (main) [ ] o.a.s.c.ZkController Register node as live in ZooKeeper:/live_nodes/server03:8983_solr
> 2024-06-20 12:56:43.111 ERROR (main) [ ] o.a.s.c.ZkController => org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:125) ~[?:?]
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1778) ~[?:?]
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1650) ~[?:?]
>   at org.apache.solr.common.cloud.SolrZkClient.lambda$multi$12(SolrZkClient.java:781) ~[?:?]
>   at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:70) ~[?:?]
>   at org.apache.solr.common.cloud.SolrZkClient.multi(SolrZkClient.java:781) ~[?:?]
>   at org.apache.solr.cloud.ZkController.createEphemeralLiveNode(ZkController.java:1211) ~[?:?]
>