I have also gotten myself into situations where leader election looked broken, and finding and restarting the overseer has always been the best way to fix it. You can find the current overseer and its work queue by browsing ZooKeeper.
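In case it helps, this is roughly the kind of check I mean, using the plain ZooKeeper Java client. The connect string and session timeout are placeholders for your ensemble (add the chroot if you use one), and the znode paths are the SolrCloud defaults, so treat it as a sketch rather than anything official:

import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch: locate the current overseer and see how deep its work queue is.
public class OverseerCheck {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> {});
        try {
            // /overseer_elect/leader holds JSON naming the node that is currently overseer
            byte[] leader = zk.getData("/overseer_elect/leader", false, null);
            System.out.println("Overseer: " + new String(leader, StandardCharsets.UTF_8));

            // A long, non-draining /overseer/queue is the "stuck overseer" symptom
            List<String> queue = zk.getChildren("/overseer/queue", false);
            System.out.println("Overseer queue depth: " + queue.size());

            // Sanity-check live_nodes against what you expect after the reboots
            List<String> liveNodes = zk.getChildren("/live_nodes", false);
            System.out.println("Live nodes: " + liveNodes);
        } finally {
            zk.close();
        }
    }
}

If /overseer/queue has piled up and never drains, restarting whichever node /overseer_elect/leader points at is what has cleared it for me.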
On Mon, Jun 5, 2023 at 9:13 AM Walter Underwood <wun...@wunderwood.org> wrote:
>
> I’ve seen this kind of thing happen when the overseer is stuck for some
> reason. Look for a long queue of work for the overseer in zookeeper. I’ve
> fixed that by restarting the node which is the overseer. The new one wakes up
> and clears the queue. I’ve only seen that twice.
>
> Wunder
>
> > On Jun 5, 2023, at 12:59 AM, Jan Høydahl <jan....@cominvent.com> wrote:
> >
> > Hi,
> >
> > One possible reason for this could be that a shard leader experienced a high
> > load (or crash), causing its Zookeeper client timeout, e.g. losing its
> > live_nodes entry.
> > That would cause a leader election, and a replica would become the new
> > leader.
> > Once the original leader re-joins it will no longer be leader and go into
> > recovery.
> >
> > Which version of Solr?
> > Look for additional logs for what might have happened, e.g. Timeout logs.
> >
> > Jan
> >
> >> On 4 Jun 2023, at 04:54, HariBabu kuruva <hari2708.kur...@gmail.com> wrote:
> >>
> >> Hi All,
> >>
> >> As part of the O.S patching we have rebooted the servers and services in
> >> PROD environment. After the activity we have started our services and we
> >> see below errors in Solr.
> >> Remote error message: ClusterState says we are the leader
> >> (https://solrhostname.corp.equinix.com:port/solr/abcStore_shard1_replica_n1),
> >> but locally we don't think so
> >>
> >> Could you please help with this?
> >>
> >> It's a 5 Zk's and 10 solr node cluster.
> >>
> >> Thanks
> >> Hari
> >>
> >> --
> >>
> >> Thanks and Regards,
> >> Hari
> >> Mobile:9790756568
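For the "ClusterState says we are the leader ... but locally we don't think so" error quoted above, it can also be worth comparing what ZooKeeper records as the shard leader with what the node itself believes. A rough sketch along the same lines; the collection name is taken from the quoted error, and the znode paths assume the default layout with no chroot (they may differ by Solr version):

import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;

// Sketch: dump the cluster state and leader registration for the collection
// named in the error, so it can be compared with the core's own logs.
public class ShardLeaderCheck {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000, event -> {});
        try {
            // Per-collection state; look for the replica with "leader":"true" under
            // shard1 and compare it with abcStore_shard1_replica_n1 from the error.
            byte[] state = zk.getData("/collections/abcStore/state.json", false, null);
            System.out.println(new String(state, StandardCharsets.UTF_8));

            // The shard's leader registration znode (ephemeral; it may be missing
            // while an election is still in progress).
            byte[] leader = zk.getData("/collections/abcStore/leaders/shard1/leader", false, null);
            System.out.println(new String(leader, StandardCharsets.UTF_8));
        } finally {
            zk.close();
        }
    }
}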