Hey All, I'd like to solicit some feedback from the community regarding leader election slowness. This is happening in both Solr 8 and Solr 9 clouds of various sizes that my team manages. Trying to reproduce this has taken us down the Overseer rabbit hole, but I wanted to grab some oxygen before going too deep :-)
For context, by "slow" I mean taking 30 seconds or more (sometimes over a minute). Typically this shows up as a big gap after the live nodes are updated (visible on all hosts), e.g.:

> 2025-04-04 22:23:29.150 INFO ZkStateReader [ ] ? [zkCallback-13-thread-26] - Updated live nodes from ZooKeeper... (16) -> (15)

Then for ~40 seconds the only consistent activity appears to be IndexFetcher checking whether the index is in sync, seemingly on a loop. After those 40 seconds we finally see:

> 2025-04-04 22:24:01.041 INFO ZkStateReader [ ] ? [zkCallback-13-thread-26] - A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/some-cloud/state.json zxid: 1133871372341] for collection [some_collection] has occurred - updating... (live nodes size: [15])
> 2025-04-04 22:24:03.665 INFO ShardLeaderElectionContext [some_collection shard1 core_node20 some_collection_shard1_replica_t19] ? [zkCallback-13-thread-120] - I am the new leader: http://new-leader-url.com/solr/some_collection_shard1_replica_t19/ shard1
> 2025-04-04 22:24:03.672 INFO IndexFetcher [ ] ? [indexFetcher-48-thread-1] - Updated leaderUrl to http://new-leader-url.com/solr/some_collection_shard1_replica_t19/ ..

So I am most puzzled by that initial 40-second gap. For context, this particular example occurred on a cloud with 2 collections and 31 total shards (so not too crazy). We also didn't see anything suspicious in the ZooKeeper metrics or logs.

Has anyone experienced something similar and, if so, would they mind sharing what they found?

Finally, the research we've done so far: trying to reproduce this with debug logs took us down the Overseer rabbit hole. Matt Biscocho pointed me at the initiative to remove the Overseer (https://issues.apache.org/jira/browse/SOLR-14927) in favor of distributedClusterStateUpdates with optimistic locking. It would be interesting to try out, but given our difficulty reproducing the issue consistently we are not yet confident in our ability to measure the impact. Also, given that we see this on clouds with 1-2 collections, typically with up to tens of shards, I am not sure how much we would benefit from the extra concurrency of cluster state updating. I know folks here operate at a much bigger scale in terms of collections per cloud than we do.

Thanks,
Luke
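P.S. For anyone curious what we would actually be flipping on: as far as I can tell from SOLR-14927 and the 9.x ref guide, the distributed mode is toggled in the <solrcloud> section of solr.xml, roughly as below. We have not tried this yet, so treat the exact option names as my best reading of the docs rather than something we have verified.

    <solrcloud>
      <!-- Let nodes write cluster state updates to ZooKeeper directly
           (with optimistic locking) instead of going through the Overseer. -->
      <bool name="distributedClusterStateUpdates">true</bool>
      <!-- Optionally also execute collection/config-set admin commands on the
           receiving node rather than on the Overseer. -->
      <bool name="distributedCollectionConfigSetExecution">true</bool>
    </solrcloud>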