Matthew Biscocho created SOLR-18176:
---------------------------------------

             Summary: HttpShardHandler query throughput bottleneck from ZooKeeper
                 Key: SOLR-18176
                 URL: https://issues.apache.org/jira/browse/SOLR-18176
             Project: Solr
          Issue Type: Bug
    Affects Versions: 9.10.1, 10.0
            Reporter: Matthew Biscocho
            Assignee: Matthew Biscocho
         Attachments: image-2026-03-24-13-14-15-761.png

I found a significant query throughput bottleneck in a SolrCloud cluster whose
nodes share collections and whose collections are heavily sharded. As query
load increased, ZooKeeper CPU utilization rose linearly with it. A JFR dump
showed that every distributed query in HttpShardHandler was doing a
synchronized get of the collection state from ZooKeeper
[without allowCache=true|https://github.com/apache/solr/blob/2ea21db9af976eee8ed10c08fb95e071889387be/solr/core/src/java/org/apache/solr/handler/component/CloudReplicaSource.java#L192].
These reads eventually bottlenecked ZooKeeper and held QTP threads, which made
query latency drastically worse.
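To illustrate why the uncached path scales with query load, here is a minimal, self-contained Java sketch that assumes a lazily cached state reference with a time-to-live. `FakeZk`, `LazyStateRef`, `countZkReads`, the collection name, and the 2-second TTL are hypothetical stand-ins, not Solr's actual classes or caching logic. Each simulated query either forces a "ZooKeeper" read while holding the lock (allowCache=false) or reuses the cached state (allowCache=true).

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Hypothetical sketch: FakeZk and LazyStateRef are illustrative stand-ins,
// not Solr or ZooKeeper classes.
final class CachedStateSketch {

    // Stand-in for ZooKeeper: counts how many times collection state is read.
    static final class FakeZk {
        final AtomicInteger reads = new AtomicInteger();

        String readCollectionState(String collection) {
            reads.incrementAndGet();
            return collection + "@v1";
        }
    }

    // Lazily cached reference to one collection's state. Every call takes the
    // object's lock, as a synchronized get does.
    static final class LazyStateRef {
        private final FakeZk zk;
        private final String collection;
        private final long ttlNanos;
        private final LongSupplier clock;
        private String cached;
        private long lastFetchNanos;

        LazyStateRef(FakeZk zk, String collection, long ttlNanos, LongSupplier clock) {
            this.zk = zk;
            this.collection = collection;
            this.ttlNanos = ttlNanos;
            this.clock = clock;
        }

        synchronized String get(boolean allowCache) {
            long now = clock.getAsLong();
            // Without allowCache, every caller reads from "ZooKeeper" while holding the lock.
            if (!allowCache || cached == null || now - lastFetchNanos > ttlNanos) {
                cached = zk.readCollectionState(collection);
                lastFetchNanos = now;
            }
            return cached;
        }
    }

    // Simulates `queries` distributed requests that each look up collection
    // state, and returns how many ZooKeeper reads they caused.
    static int countZkReads(boolean allowCache, int queries) {
        FakeZk zk = new FakeZk();
        // Frozen clock (always 0) so the result does not depend on timing;
        // the 2-second TTL is an assumed value.
        LazyStateRef ref = new LazyStateRef(zk, "collection1", TimeUnit.SECONDS.toNanos(2), () -> 0L);
        for (int i = 0; i < queries; i++) {
            ref.get(allowCache);
        }
        return zk.reads.get();
    }

    public static void main(String[] args) {
        System.out.println("uncached: " + countZkReads(false, 1000)); // uncached: 1000
        System.out.println("cached:   " + countZkReads(true, 1000));  // cached:   1
    }
}
```

In this sketch, 1,000 simulated queries cause 1,000 reads on the uncached path, all serialized on the same lock, versus a single read on the cached path.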

Changing this call to use the cache resulted in a huge boost in query
throughput and a reduction in ZooKeeper CPU utilization.

!image-2026-03-24-13-14-15-761.png!

PR to follow.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
