[ 
https://issues.apache.org/jira/browse/SOLR-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847033#comment-17847033
 ] 

Aparna Suresh commented on SOLR-17275:
--------------------------------------

Indeed, I confirmed the same once I saw your response. As designed, my test 
did not capture the interaction with ZooKeeper, but debugging revealed the 
true issue.

 

Another reason to revert the logic is so that CloudSolrClient does not try to 
stay up to date with the global cluster state, which was the intention of the 
parent Jira, SOLR-17153.

> Major performance regression of CloudSolrClient in Solr 9.6.0 when using 
> aliases
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-17275
>                 URL: https://issues.apache.org/jira/browse/SOLR-17275
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 9.6.0
>         Environment: SolrJ 9.6.0, Ubuntu 22.04, Java 17
>            Reporter: Rafał Harabień
>            Priority: Blocker
>             Fix For: 9.6.1
>
>         Attachments: image-2024-05-06-17-23-42-236.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I observe worse performance of CloudSolrClient after upgrading from SolrJ 
> 9.5.0 to 9.6.0, especially on p99. 
> p99 jumped from ~25 ms to ~400 ms
> p90 jumped from ~9.9 ms to ~22 ms
> p75 jumped from ~7 ms to ~11 ms
> p50 jumped from ~4.5 ms to ~7.5 ms
> Screenshot from Grafana (the new version was deployed at ~14:30):
> !image-2024-05-06-17-23-42-236.png!
> I took a thread dump and I can see many threads blocked in 
> [ZkStateReader.forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]:
> {noformat}
> Thread info: "suggest-solrThreadPool-thread-52" prio=5 Id=600 BLOCKED on 
> org.apache.solr.common.cloud.ZkStateReader@62e6bc3d owned by 
> "suggest-solrThreadPool-thread-34" Id=582
>       at 
> app//org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:506)
>       -  blocked on org.apache.solr.common.cloud.ZkStateReader@62e6bc3d
>       at 
> app//org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getState(ZkClientClusterStateProvider.java:155)
>       at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1207)
>       at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1099)
>       at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:892)
>       at 
> app//org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:820)
>       at 
> app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:255)
>       at 
> app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:927)
>       ...
>       Number of locked synchronizers = 1
>       - java.util.concurrent.ThreadPoolExecutor$Worker@1beb7ed3
> {noformat}
> At the same time, qTime from Solr hasn't changed, so I'm pretty sure it's a 
> client regression.
> I've tried reproducing it locally and I can see the 
> [forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]
> function being called for every request in my application. I can see that 
> [this|https://github.com/apache/solr/commit/8cf552aa3642be473c6a08ce44feceb9cbe396d7]
> commit changed the logic in ZkClientClusterStateProvider.getState so that the 
> mentioned function gets called if clusterState.getCollectionRef 
> [returns null|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L151].
> In 9.5.0 this wasn't the case (forceUpdateCollection was not called in this 
> place). I can see in the debugger that getCollectionRef only supports 
> collections and not aliases (the collectionStates map contains only 
> collections). In my application all collections are referenced using aliases, 
> so I guess that's why I can see the regression in Solr response time.
> I am not familiar enough with the code to prepare a PR, but I hope this 
> insight will be enough to fix this issue.
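
The behavior described in the quoted report can be illustrated with a small 
toy model. This is only a sketch under stated assumptions, not Solr's actual 
code: ToyStateReader, ToyStateProvider, AliasMissDemo and the 
forceUpdateOnMiss flag are hypothetical names invented for illustration. The 
only things taken from the report are that the state map holds collections 
but not aliases, and that a null lookup triggers a forced update that 
serializes callers on the reader's monitor.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only. None of these types are Solr classes.
class ToyStateReader {
    // Holds collections only; aliases are never keys here, mirroring the
    // debugger observation quoted in the issue.
    private final Map<String, String> collectionStates = new ConcurrentHashMap<>();
    final AtomicInteger forcedUpdates = new AtomicInteger();

    ToyStateReader(String... collections) {
        for (String c : collections) {
            collectionStates.put(c, "state-of-" + c);
        }
    }

    String getCollectionRef(String name) {
        return collectionStates.get(name);
    }

    // Stand-in for a forced refresh. It is synchronized on the reader, so
    // concurrent callers queue up behind one another.
    synchronized void forceUpdateCollection(String name) {
        forcedUpdates.incrementAndGet();
    }
}

class ToyStateProvider {
    private final ToyStateReader reader;
    // Hypothetical switch: true models "refresh on a null lookup",
    // false models "just return what is cached".
    private final boolean forceUpdateOnMiss;

    ToyStateProvider(ToyStateReader reader, boolean forceUpdateOnMiss) {
        this.reader = reader;
        this.forceUpdateOnMiss = forceUpdateOnMiss;
    }

    String getState(String name) {
        String ref = reader.getCollectionRef(name);
        if (ref == null && forceUpdateOnMiss) {
            // An alias always misses, so this branch runs on every request.
            reader.forceUpdateCollection(name);
            ref = reader.getCollectionRef(name);
        }
        return ref;
    }
}

class AliasMissDemo {
    public static void main(String[] args) {
        ToyStateReader reader = new ToyStateReader("products");
        ToyStateProvider provider = new ToyStateProvider(reader, true);
        for (int i = 0; i < 1000; i++) {
            provider.getState("products_alias"); // alias, not a collection
        }
        System.out.println("forced updates: " + reader.forcedUpdates.get());
    }
}
```

In this model, looking up a real collection name never reaches the 
synchronized method, while looking up an alias reaches it on every call. That 
matches the shape of the thread dump above, where many request threads are 
blocked on the same ZkStateReader monitor.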


