Matthew Biscocho created SOLR-18177:
---------------------------------------
Summary: AllowListUrlChecker expensive URI creation on every host
in the cluster on every distributed query
Key: SOLR-18177
URL: https://issues.apache.org/jira/browse/SOLR-18177
Project: Solr
Issue Type: Bug
Affects Versions: 10.0, 9.10
Reporter: Matthew Biscocho
While looking into SOLR-18114, AllowListUrlChecker.checkAllowList() was found to
be on the hot path of every distributed query in clouds with a high number of
nodes and shards. I found two performance problems after digging through a JFR
dump from a cluster with many nodes and many shards:
First, each live host URL is parsed on every query using URI.create(). This is
amplified in CloudReplicaSource, which [calls findReplicas() in a loop for every
shard|https://github.com/apache/solr/blob/6d896e096f593c86d59b779ba0a9a866791440e0/solr/core/src/java/org/apache/solr/handler/component/CloudReplicaSource.java#L96]
in the collection, so the cost scales linearly with shard count. We could
probably save some CPU time here by caching these URIs, as I don't expect
live node hosts to change that often.
!Screenshot 2026-03-26 at 9.42.46 AM-1.png!
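As a rough illustration of the caching idea, here is a minimal sketch that parses each URL string at most once and reuses the resulting URI. The class ParsedUriCache and its methods are hypothetical names made up for this example; they are not part of Solr, and where such a cache would actually live (and how it would be invalidated when live nodes change) is left open.

```java
import java.net.URI;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper (not a Solr class): memoizes URI parsing per URL string. */
final class ParsedUriCache {
  private final ConcurrentMap<String, URI> cache = new ConcurrentHashMap<>();

  /** Parses the URL on first use; later calls with the same string return the cached URI. */
  URI get(String url) {
    return cache.computeIfAbsent(url, URI::create);
  }

  /** Drops entries for URLs that are no longer live, so the cache stays bounded. */
  void retainOnly(Set<String> liveUrls) {
    cache.keySet().retainAll(liveUrls);
  }

  /** Number of cached entries, mainly useful for tests and metrics. */
  int size() {
    return cache.size();
  }
}
```

With something like this, the per-shard loop would pay the parsing cost once per distinct live host URL rather than once per URL per shard per query. Calling retainOnly() whenever the live node set changes is one possible way to keep stale hosts out of the cache.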
Second, the host and port string is then [checked against
liveHostUrls|https://github.com/apache/solr/blob/6d896e096f593c86d59b779ba0a9a866791440e0/solr/core/src/java/org/apache/solr/security/AllowListUrlChecker.java#L145]
using a stream(), which is O(n), even though liveHostUrls is a set. That
completely defeats the purpose of the O(1) lookup a set provides.
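To make the difference concrete, here is a small sketch contrasting a linear stream scan with a direct set lookup. HostPortLookup and its methods are hypothetical names for illustration only, and the assumption that the checked value is a "host:port" string derived from each live URL is mine; the actual shape of the data in AllowListUrlChecker may differ.

```java
import java.net.URI;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper (not a Solr class): precomputes host:port keys for O(1) membership checks. */
final class HostPortLookup {
  private final Set<String> liveHostPorts;

  /** Parses every live URL once, up front, instead of on each query. */
  HostPortLookup(Set<String> liveHostUrls) {
    this.liveHostPorts =
        liveHostUrls.stream()
            .map(HostPortLookup::hostPort)
            .collect(Collectors.toUnmodifiableSet());
  }

  /** Assumed key format for this sketch: "host:port" extracted from the URL. */
  static String hostPort(String url) {
    URI uri = URI.create(url);
    return uri.getHost() + ":" + uri.getPort();
  }

  /** Hash-based lookup: expected constant time regardless of cluster size. */
  boolean isLive(String hostPort) {
    return liveHostPorts.contains(hostPort);
  }

  /** For contrast: a stream scan visits entries one by one, so it is O(n) even on a Set. */
  static boolean isLiveLinear(Set<String> hostPorts, String hostPort) {
    return hostPorts.stream().anyMatch(hostPort::equals);
  }
}
```

Both methods return the same answer; the difference is only in how much work each call does as the number of live hosts grows.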