Matthew Biscocho created SOLR-18177:
---------------------------------------

             Summary: AllowListUrlChecker expensive URI creation on every host in the cluster on every distributed query
                 Key: SOLR-18177
                 URL: https://issues.apache.org/jira/browse/SOLR-18177
             Project: Solr
          Issue Type: Bug
    Affects Versions: 10.0, 9.10
            Reporter: Matthew Biscocho


While looking into SOLR-18114, AllowListUrlChecker.checkAllowList() was found to
be a hot path for every distributed query on SolrCloud clusters with a high
number of nodes and shards. After digging through a JFR dump from a cluster with
many nodes and many shards, I found two performance problems:

First, each live host URL is parsed with URI.create() on every query. This is
amplified in CloudReplicaSource, which [calls findReplicas() in a loop for every shard|https://github.com/apache/solr/blob/6d896e096f593c86d59b779ba0a9a866791440e0/solr/core/src/java/org/apache/solr/handler/component/CloudReplicaSource.java#L96]
in the collection, so the parsing cost scales linearly with the shard count:
roughly (number of shards) x (number of live hosts) URI.create() calls per query.
We could probably save some CPU time by caching these parsed URIs, since I don't
expect live node hosts to change very often.

!Screenshot 2026-03-26 at 9.42.46 AM-1.png!
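To illustrate the caching idea, here is a minimal sketch. It is not Solr code: the class LiveHostUrlCache and its hostAndPorts() method are hypothetical, and it assumes live hosts arrive as full URL strings with an explicit port (e.g. http://node1:8983/solr). URI.create() runs once per distinct set of live host URLs, and the cached host:port set is returned until that set changes.

```java
import java.net.URI;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical cache (not part of Solr): parses live host URLs into
 * "host:port" strings once and reuses the result until the set of
 * live host URLs changes. Assumes every URL carries an explicit port.
 */
final class LiveHostUrlCache {

  // Immutable pairing of an input snapshot with its parsed result.
  private record Snapshot(Set<String> liveHostUrls, Set<String> hostAndPorts) {}

  // volatile so concurrent queries always see a fully built snapshot. If two
  // threads miss at the same time, both may parse: wasteful but still correct.
  private volatile Snapshot snapshot = new Snapshot(Set.of(), Set.of());

  /** Returns the "host:port" form of every live host URL, parsing only on change. */
  Set<String> hostAndPorts(Set<String> liveHostUrls) {
    Snapshot current = snapshot;
    // Cheap identity check first, then equals(); both are much cheaper than
    // running URI.create() on every entry again.
    if (current.liveHostUrls() == liveHostUrls
        || current.liveHostUrls().equals(liveHostUrls)) {
      return current.hostAndPorts();
    }
    Set<String> parsed = liveHostUrls.stream()
        .map(URI::create)
        .map(uri -> uri.getHost() + ":" + uri.getPort())
        .collect(Collectors.toUnmodifiableSet());
    // Copy the input so a caller mutating its set later cannot corrupt the cache key.
    snapshot = new Snapshot(Set.copyOf(liveHostUrls), parsed);
    return parsed;
  }
}
```

For example, calling hostAndPorts() twice with equal sets parses only once and returns the same Set instance. Keying the cache on the whole live-host set keeps invalidation trivial; a real patch might instead rebuild on live-node changes, which this sketch does not model.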
       

Second, the host and port string is then [checked against liveHostUrls|https://github.com/apache/solr/blob/6d896e096f593c86d59b779ba0a9a866791440e0/solr/core/src/java/org/apache/solr/security/AllowListUrlChecker.java#L145]
with an O(n) stream() scan. The funny part is that liveHostUrls is already a
Set, so scanning it linearly completely defeats the purpose of having an O(1)
contains() lookup.
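A minimal sketch of the difference, assuming the stream() call only performs an equality match (if the real predicate normalizes values, the set would need to store normalized keys for contains() to be equivalent). AllowListLookup and both method names are hypothetical.

```java
import java.util.Set;

/** Hypothetical helpers contrasting the two lookup styles (names are illustrative). */
final class AllowListLookup {

  /** Linear: compares elements one by one until a match, so a miss scans all n. */
  static boolean isLiveHostLinear(Set<String> liveHostUrls, String hostAndPort) {
    return liveHostUrls.stream().anyMatch(hostAndPort::equals);
  }

  /** Uses the set's own lookup: expected O(1) for a HashSet. */
  static boolean isLiveHost(Set<String> liveHostUrls, String hostAndPort) {
    return liveHostUrls.contains(hostAndPort);
  }
}
```

Both return the same answer; the difference is that anyMatch() gives up the hash lookup the Set already provides, and in the allow-list case a rejected URL is exactly the miss that pays for a full scan.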



--
This message was sent by Atlassian Jira
(v8.20.10#820010)