[ https://issues.apache.org/jira/browse/SOLR-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847059#comment-17847059 ]
ASF subversion and git services commented on SOLR-17275:
--------------------------------------------------------

Commit 53963292db39803996f76a5229ac3b91c6265b5e in solr's branch refs/heads/branch_9_6 from aparnasuresh85
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=53963292db3 ]

SOLR-17275: Revert SolrJ ZkClientClusterStateProvider SOLR-17153 (#2463)

SolrJ ZkClientClusterStateProvider: revert SOLR-17153 for perf regression when aliases are used. Other parts of that issue are not reverted. Affects only Solr 9.6.

Thanks Rafał Harabień for reporting the problem.

(cherry picked from commit f073dc86e1ab714c3e8eaa4a989698feb90f8a27)

> Major performance regression of CloudSolrClient in Solr 9.6.0 when using aliases
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-17275
>                 URL: https://issues.apache.org/jira/browse/SOLR-17275
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: SolrJ
>    Affects Versions: 9.6.0
>         Environment: SolrJ 9.6.0, Ubuntu 22.04, Java 17
>            Reporter: Rafał Harabień
>            Priority: Blocker
>             Fix For: 9.6.1
>
>         Attachments: image-2024-05-06-17-23-42-236.png
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> I observe worse performance of CloudSolrClient after upgrading from SolrJ
> 9.5.0 to 9.6.0, especially on p99.
> p99 jumped from ~25 ms to ~400 ms
> p90 jumped from ~9.9 ms to ~22 ms
> p75 jumped from ~7 ms to ~11 ms
> p50 jumped from ~4.5 ms to ~7.5 ms
> Screenshot from Grafana (the new version was deployed at ~14:30):
> !image-2024-05-06-17-23-42-236.png!
> I've got a thread-dump and I can see many threads waiting in
> [ZkStateReader.forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]:
> {noformat}
> Thread info: "suggest-solrThreadPool-thread-52" prio=5 Id=600 BLOCKED on org.apache.solr.common.cloud.ZkStateReader@62e6bc3d owned by "suggest-solrThreadPool-thread-34" Id=582
>     at app//org.apache.solr.common.cloud.ZkStateReader.forceUpdateCollection(ZkStateReader.java:506)
>     - blocked on org.apache.solr.common.cloud.ZkStateReader@62e6bc3d
>     at app//org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider.getState(ZkClientClusterStateProvider.java:155)
>     at app//org.apache.solr.client.solrj.impl.CloudSolrClient.resolveAliases(CloudSolrClient.java:1207)
>     at app//org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1099)
>     at app//org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:892)
>     at app//org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:820)
>     at app//org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:255)
>     at app//org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:927)
>     ...
>
> Number of locked synchronizers = 1
>     - java.util.concurrent.ThreadPoolExecutor$Worker@1beb7ed3
> {noformat}
> At the same time qTime from Solr hasn't changed so I'm pretty sure it's a
> client regression.
> I've tried reproducing it locally and I can see
> [forceUpdateCollection|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L503]
> function being called for every request in my application.
> I can see that
> [this|https://github.com/apache/solr/commit/8cf552aa3642be473c6a08ce44feceb9cbe396d7]
> commit changed the logic in ZkClientClusterStateProvider.getState so that
> forceUpdateCollection gets called if clusterState.getCollectionRef [returns
> null|https://github.com/apache/solr/blob/f8e5a93c11267e13b7b43005a428bfb910ac6e57/solr/solrj-zookeeper/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L151].
> This was not the case in 9.5.0 (forceUpdateCollection was not called in this
> place). I can see in the debugger that getCollectionRef only supports
> collections and not aliases (the collectionStates map contains only collections).
> In my application all collections are referenced using aliases, so I guess
> that's why I can see the regression in Solr response time.
> I am not familiar enough with the code to prepare a PR, but I hope this
> insight will be enough to fix this issue.
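
The mechanism described in the report can be modeled with a toy sketch. This is not Solr code: the class AliasLookupSketch, the nested ToyStateReader, and both lookup methods are hypothetical names invented for illustration, and the "refresh" is only a counter. The sketch assumes, as the report states, that the cached state map holds collections but not aliases, so a lookup by alias always misses and falls into a synchronized forced refresh.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model only; none of these types belong to Solr or SolrJ. */
public class AliasLookupSketch {

    /** Hypothetical stand-in for a state reader guarded by a single monitor. */
    public static class ToyStateReader {
        // Assumption taken from the report: only collections are cached, never aliases.
        private final Map<String, String> collectionStates = new ConcurrentHashMap<>();
        public final AtomicInteger forcedRefreshes = new AtomicInteger();

        public ToyStateReader(String... collections) {
            for (String c : collections) {
                collectionStates.put(c, "state-of-" + c);
            }
        }

        /** Returns the cached state, or null when the name is not a known collection. */
        public String getCollectionRef(String name) {
            return collectionStates.get(name);
        }

        /** Synchronized, so concurrent callers queue up on this object's monitor. */
        public synchronized void forceUpdateCollection(String name) {
            // A real refresh would do remote I/O here; the sketch only counts calls.
            forcedRefreshes.incrementAndGet();
        }
    }

    /** Models the lookup shape the report attributes to 9.6.0: refresh on every miss. */
    public static String getStateWithForcedRefresh(ToyStateReader reader, String name) {
        String ref = reader.getCollectionRef(name);
        if (ref == null) {
            reader.forceUpdateCollection(name);
            ref = reader.getCollectionRef(name); // still null for an alias
        }
        return ref;
    }

    /** Models a lookup that simply reports a miss without refreshing. */
    public static String getStateWithoutRefresh(ToyStateReader reader, String name) {
        return reader.getCollectionRef(name);
    }

    public static void main(String[] args) {
        ToyStateReader reader = new ToyStateReader("products_v1");

        // "products" plays the role of an alias: it is not a key in the cached map.
        for (int i = 0; i < 100; i++) {
            getStateWithForcedRefresh(reader, "products");
        }
        System.out.println("forced refreshes via alias: " + reader.forcedRefreshes.get());

        reader.forcedRefreshes.set(0);
        for (int i = 0; i < 100; i++) {
            getStateWithForcedRefresh(reader, "products_v1");
        }
        System.out.println("forced refreshes via collection name: " + reader.forcedRefreshes.get());
    }
}
```

In this model every alias lookup pays for one pass through the synchronized method, while lookups by collection name never do. With many request threads that would serialize on the reader's monitor, which is consistent with the BLOCKED threads in the thread dump, though the sketch does not measure latency.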