[ 
https://issues.apache.org/jira/browse/SOLR-18176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SOLR-18176:
----------------------------------
    Labels: pull-request-available  (was: )

> HttpShardHandler query throughput bottleneck from ZooKeeper
> -----------------------------------------------------------
>
>                 Key: SOLR-18176
>                 URL: https://issues.apache.org/jira/browse/SOLR-18176
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 10.0, 9.10.1
>            Reporter: Matthew Biscocho
>            Assignee: Matthew Biscocho
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2026-03-24-13-14-15-761.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I found a significant query throughput bottleneck in a SolrCloud cluster 
> whose nodes share collections and which is heavily sharded. As Solr query 
> load increased, ZooKeeper CPU utilization rose linearly with it. A JFR dump 
> showed that every distributed query in HttpShardHandler was doing a 
> synchronized get [without allowCache=true here
> |https://github.com/apache/solr/blob/2ea21db9af976eee8ed10c08fb95e071889387be/solr/core/src/java/org/apache/solr/handler/component/CloudReplicaSource.java#L192] 
> for collection state from ZooKeeper. These reads eventually bottlenecked 
> ZooKeeper and held QTP threads, drastically increasing query latency.
> Changing the call to use the cache resulted in a large boost in query 
> throughput and a reduction in ZooKeeper CPU utilization.
> !image-2026-03-24-13-14-15-761.png!
> PR to follow.
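
For illustration only, the effect described in the issue can be sketched in plain Java. This is not Solr code: CollectionStateLookup, getUncached, getCached, and the remoteReads counter are hypothetical names, and the ZooKeeper read is simulated by a counter. The sketch only shows why a synchronized fetch on every query makes remote reads grow with query load, while a locally cached state does not.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a collection-state lookup; not Solr's actual classes.
class CollectionStateLookup {
    // Counts simulated round trips to the coordination service (ZooKeeper in the issue).
    final AtomicLong remoteReads = new AtomicLong();

    // Locally held state per collection (assumption: something else keeps it fresh).
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    // Simulated remote fetch: every call costs one read against the shared service.
    private String fetchRemote(String collection) {
        remoteReads.incrementAndGet();
        return "state-of-" + collection;
    }

    // Uncached path: each query takes the lock and performs a remote read.
    synchronized String getUncached(String collection) {
        return fetchRemote(collection);
    }

    // Cached path: only the first lookup per collection performs a remote read.
    String getCached(String collection) {
        return cache.computeIfAbsent(collection, this::fetchRemote);
    }

    public static void main(String[] args) {
        CollectionStateLookup uncached = new CollectionStateLookup();
        CollectionStateLookup cached = new CollectionStateLookup();
        for (int i = 0; i < 1000; i++) {
            uncached.getUncached("collection1");
            cached.getCached("collection1");
        }
        System.out.println("uncached remote reads: " + uncached.remoteReads.get()); // 1000
        System.out.println("cached remote reads:   " + cached.remoteReads.get());   // 1
    }
}
```

The sketch leaves out cache invalidation. A real client would need to refresh the cached state when the collection changes, so the cached path trades a small staleness window for far fewer remote reads.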



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
