I have also gotten myself into situations where the leader election
looked broken, and finding and restarting the overseer has always been
the best way to fix it.  You can find the overseer by browsing
ZooKeeper.
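
For reference, a couple of ways to locate the current overseer; this is a
sketch, and the hostnames/ports are placeholders you'd replace with your own:

```shell
# Ask the Collections API which node is currently the overseer
# (the response also reports the overseer's work-queue stats)
curl "http://localhost:8983/solr/admin/collections?action=OVERSEERSTATUS&wt=json"

# Or read the overseer election znode directly with the ZooKeeper CLI
zkCli.sh -server zkhost:2181 get /overseer_elect/leader

# And look for a backlog of queued overseer work
zkCli.sh -server zkhost:2181 ls /overseer/queue
```

Restarting the Solr node named there forces a new overseer election.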


On Mon, Jun 5, 2023 at 9:13 AM Walter Underwood <wun...@wunderwood.org> wrote:
>
> I’ve seen this kind of thing happen when the overseer is stuck for some
> reason. Look for a long queue of work for the overseer in ZooKeeper. I’ve
> fixed that by restarting the node that is the overseer. The new one wakes up
> and clears the queue. I’ve only seen that twice.
>
> Wunder
>
> > On Jun 5, 2023, at 12:59 AM, Jan Høydahl <jan....@cominvent.com> wrote:
> >
> > Hi,
> >
> > One possible reason for this could be that a shard leader experienced high
> > load (or a crash), causing its ZooKeeper client to time out and, e.g., lose its
> > live_nodes entry.
> > That would trigger a leader election, and a replica would become the new
> > leader.
> > Once the original leader rejoins, it will no longer be the leader and will go
> > into recovery.
> >
> > Which version of Solr?
> > Look in the logs for additional clues about what might have happened, e.g. timeout messages.
> >
> > Jan
> >
> >> 4. jun. 2023 kl. 04:54 skrev HariBabu kuruva <hari2708.kur...@gmail.com>:
> >>
> >> Hi All,
> >>
> >> As part of OS patching, we rebooted the servers and services in the
> >> PROD environment. After the activity we started our services and we
> >> see the errors below in Solr:
> >> Remote error message: ClusterState says we are the leader
> >> (https://solrhostname.corp.equinix.com:port/solr/abcStore_shard1_replica_n1),
> >> but locally we don't think so
> >>
> >> Could you please help with this?
> >>
> >> It's a cluster with 5 ZooKeeper nodes and 10 Solr nodes.
> >>
> >> Thanks
> >> Hari
> >>
> >>
> >>
> >> --
> >>
> >> Thanks and Regards,
> >> Hari
> >> Mobile:9790756568
> >
>
