Matthew Biscocho created SOLR-18176:
---------------------------------------
Summary: HttpShardHandler query throughput bottleneck from
ZooKeeper
Key: SOLR-18176
URL: https://issues.apache.org/jira/browse/SOLR-18176
Project: Solr
Issue Type: Bug
Affects Versions: 9.10.1, 10.0
Reporter: Matthew Biscocho
Assignee: Matthew Biscocho
Attachments: image-2026-03-24-13-14-15-761.png
I found a significant query throughput bottleneck in a SolrCloud cluster
that is heavily sharded and whose nodes share collections. As Solr query
load increased, ZooKeeper CPU utilization rose linearly with it. A JFR dump
showed that every distributed query in HttpShardHandler was doing a
synchronized get [without allowCache=true here|https://github.com/apache/solr/blob/2ea21db9af976eee8ed10c08fb95e071889387be/solr/core/src/java/org/apache/solr/handler/component/CloudReplicaSource.java#L192]
for collection state from ZooKeeper. This eventually bottlenecked ZooKeeper
reads and held QTP threads, which made query latency drastically worse.
Changing the call to use the cache resulted in a large boost in query
throughput and a reduction in ZooKeeper CPU utilization.
!image-2026-03-24-13-14-15-761.png!
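To illustrate the pattern described above, here is a minimal, self-contained sketch. It is not Solr code: the class, the methods (getUncached, getCached, fetchRemote) and the in-memory map are all hypothetical stand-ins, and it does not model how Solr keeps cached collection state fresh. It only contrasts a synchronized per-request remote read with a cache-first read by counting simulated remote reads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Solr's implementation.
public class CollectionStateLookupSketch {
    // Counts simulated round trips to the coordination service.
    static final AtomicInteger remoteReads = new AtomicInteger();
    // Hypothetical local cache of collection state, keyed by collection name.
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    // Stand-in for a remote read of collection state.
    static String fetchRemote(String collection) {
        remoteReads.incrementAndGet();
        return "state-of-" + collection;
    }

    // Uncached path: every caller serializes on one lock and reads remotely.
    static synchronized String getUncached(String collection) {
        return fetchRemote(collection);
    }

    // Cache-first path: a remote read happens only on a cache miss.
    static String getCached(String collection) {
        return cache.computeIfAbsent(collection, CollectionStateLookupSketch::fetchRemote);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            getUncached("products");
        }
        int uncached = remoteReads.getAndSet(0);

        for (int i = 0; i < 1000; i++) {
            getCached("products");
        }
        int cached = remoteReads.get();

        System.out.println("uncached remote reads: " + uncached); // prints 1000
        System.out.println("cached remote reads: " + cached);     // prints 1
    }
}
```

In this toy model the remote read count grows with the number of queries on the uncached path and stays constant on the cached path, which matches the linear growth in ZooKeeper CPU utilization reported above.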
PR to follow.