[ https://issues.apache.org/jira/browse/SOLR-18176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SOLR-18176:
----------------------------------
Labels: pull-request-available (was: )
> HttpShardHandler query throughput bottleneck from ZooKeeper
> -----------------------------------------------------------
>
> Key: SOLR-18176
> URL: https://issues.apache.org/jira/browse/SOLR-18176
> Project: Solr
> Issue Type: Bug
> Affects Versions: 10.0, 9.10.1
> Reporter: Matthew Biscocho
> Assignee: Matthew Biscocho
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2026-03-24-13-14-15-761.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I found a significant query-throughput bottleneck in a SolrCloud cluster
> whose nodes share collections and whose collections are heavily sharded.
> As query load on Solr increased, ZooKeeper CPU utilization rose linearly
> with it. A JFR dump showed that every distributed query in HttpShardHandler
> was doing a synchronized get of the collection state from ZooKeeper
> [without allowCache=true here|https://github.com/apache/solr/blob/2ea21db9af976eee8ed10c08fb95e071889387be/solr/core/src/java/org/apache/solr/handler/component/CloudReplicaSource.java#L192].
> Under load, these reads eventually bottlenecked ZooKeeper and tied up QTP
> threads, making query latency drastically worse.
> Switching to the cached collection state resulted in a large increase in
> query throughput and a drop in ZooKeeper CPU utilization.
> !image-2026-03-24-13-14-15-761.png!
> PR to follow.
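To illustrate why the allowCache flag matters, here is a minimal, self-contained Java sketch of a lazily refreshed state reference. It is not Solr's code: `LazyStateRef`, its `get(boolean allowCached)` method, the TTL, and the `Supplier` standing in for a ZooKeeper read are all hypothetical, loosely modeled on the behavior the description reports. With `allowCached=false` every caller takes the lock and performs a read, so concurrent queries serialize on it; with `allowCached=true`, callers within the TTL return the cached snapshot without locking or reading.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class CollectionStateCacheSketch {

    /** Immutable pairing of a fetched value and the time it was fetched. */
    private static final class Snapshot<T> {
        final T value;
        final long fetchedAtNanos;

        Snapshot(T value, long fetchedAtNanos) {
            this.value = value;
            this.fetchedAtNanos = fetchedAtNanos;
        }
    }

    /**
     * Hypothetical lazy reference to collection state. The fetcher simulates a
     * ZooKeeper read; this is an illustration, not Solr's API.
     */
    static final class LazyStateRef<T> {
        private final Supplier<T> fetcher;
        private final long ttlNanos;
        private volatile Snapshot<T> snapshot; // null until the first fetch

        LazyStateRef(Supplier<T> fetcher, long ttl, TimeUnit unit) {
            this.fetcher = fetcher;
            this.ttlNanos = unit.toNanos(ttl);
        }

        private boolean isFresh(Snapshot<T> s) {
            return s != null && System.nanoTime() - s.fetchedAtNanos <= ttlNanos;
        }

        T get(boolean allowCached) {
            Snapshot<T> s = snapshot;
            if (allowCached && isFresh(s)) {
                return s.value; // fast path: no lock, no simulated ZooKeeper read
            }
            synchronized (this) {
                // Another thread may have refreshed while we waited for the lock.
                s = snapshot;
                if (allowCached && isFresh(s)) {
                    return s.value;
                }
                // Slow path: every caller that reaches here serializes on this lock.
                T value = fetcher.get();
                snapshot = new Snapshot<>(value, System.nanoTime());
                return value;
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger uncachedReads = new AtomicInteger();
        LazyStateRef<String> uncached = new LazyStateRef<>(
                () -> "state-v" + uncachedReads.incrementAndGet(), 2, TimeUnit.SECONDS);
        for (int i = 0; i < 1000; i++) {
            uncached.get(false);
        }
        // prints 1000: every call performs a read
        System.out.println("allowCached=false, reads: " + uncachedReads.get());

        AtomicInteger cachedReads = new AtomicInteger();
        LazyStateRef<String> cached = new LazyStateRef<>(
                () -> "state-v" + cachedReads.incrementAndGet(), 2, TimeUnit.SECONDS);
        for (int i = 0; i < 1000; i++) {
            cached.get(true);
        }
        // prints 1 as long as the loop finishes within the TTL
        System.out.println("allowCached=true, reads: " + cachedReads.get());
    }
}
```

The re-check inside the lock means that when the TTL expires under load, only one thread performs the refresh and the others waiting on the lock reuse its result. The trade-off is staleness: callers may see state up to one TTL old.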
--
This message was sent by Atlassian Jira
(v8.20.10#820010)