[jira] [Commented] (SOLR-9378) Avoid sending the shard.url parameter in shard requests

Chris M. Hostetter (Jira) Wed, 03 May 2023 18:46:04 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719109#comment-17719109
 ]


Chris M. Hostetter commented on SOLR-9378:
------------------------------------------

This has wormed into my brain again – especially when looking at logs from 
collections with many, many replicas, and thinking about all the bytes over the 
wire being wasted to send a param that is completely ignored 99.99999999% of 
the time (does anyone actually use {{ShardAugmenterFactory}} in production?)
----
When the {{shard.url}} param was introduced, its usage in {{SearchHandler}} 
had (and still has) this comment...
{code:java}
params.set(ShardParams.SHARD_URL, shard); // so the shard knows what was asked
{code}
 
...if you dig back into the history of this, and why this param was introduced 
in 2011 (!) as part of SOLR-2444 / SOLR-705 (and think about the state of 
Solr at that time), its only real purpose for existing was to power 
{{ShardAugmenterFactory}} _because at that point in time there was no 
SolrCloud, no concept of collections, and {*}no distributed indexing with 
document routing{*}._

When you did a "distributed search" circa Solr 4.0, you *HAD* to specify a 
{{shards}} param that listed the URLs of all the shards to query, and all the 
replicas of those shards to use as fallbacks if a replica was down. The 
_reason_ for the {{shard.url}} in {{SearchHandler}} and the 
{{ShardAugmenterFactory}} was so that if you were looking at a search result 
and wanted to update/delete a document, you would know the URLs of all the 
"cores" you needed to loop over when sending that indexing command.

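For illustration, the kind of manually assembled {{shards}} value described above can be sketched as follows. This is a minimal sketch, not code from Solr: the class name, host names, and core names are all hypothetical, and it assumes the documented {{shards}} syntax in which {{,}} separates shards and {{|}} separates the alternative replicas of one shard.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr: builds a classic "shards" param value.
class ClassicShardsParam {

    /**
     * Each inner list holds the replica addresses of one shard.
     * Replicas of a shard are joined with '|', shards are joined with ','.
     */
    static String build(List<List<String>> shards) {
        return shards.stream()
                .map(replicas -> String.join("|", replicas))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Hypothetical hosts and core names, for illustration only.
        List<List<String>> shards = List.of(
                List.of("host1:8983/solr/core1", "host2:8983/solr/core1"),
                List.of("host3:8983/solr/core2", "host4:8983/solr/core2"));
        System.out.println("shards=" + build(shards));
    }
}
```

Every client had to carry this full list itself, which is why echoing it back per document looked useful at the time.
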
In the 12 years since this code was added, no other usage for the {{shard.url}} 
param has come along, and you no longer need to manually update every replica.

So why don't we take a big leap forward into the exciting world of 2013 – where 
Solr knows the mapping of collections->shards->replicas, and stop wasting 10s of 
KB of network traffic on every request sending a param no one cares about?
----
Proposal:
 * deprecate {{ShardParams.SHARD_URL}} on 9x, delete from main
 * remove all usage of {{ShardParams.SHARD_URL}} on both 9x and main
 * change {{ShardAugmenterFactory}} to output the _name_ of the shard 
associated with the {{SolrCore}} processing the request.
 ** Add a "back compat" option to {{ShardAugmenterFactory}} to output the 
"classic" list of all replica urls
 ** make this configurable by overriding the transformer's (implicit) 
registration in {{solrconfig.xml}}
 ** "back compat" code will generate the full list of replica URLs by getting 
them from the {{DocCollection}} associated with the {{SolrCore}} processing the 
request.
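
A rough sketch of the two output modes in the proposal, under stated assumptions: the class and method names are hypothetical, a plain {{Map}} stands in for the shard-to-replica information that would really come from the {{DocCollection}}, and the {{|}} separator for the "classic" value is assumed to mirror the {{shards}} param syntax. No real Solr API is used here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Solr code: models the value the transformer would emit.
class ShardAugmentSketch {

    /**
     * @param replicasByShard stand-in for cluster state: shard name -> replica URLs
     * @param shardName       shard that owns the core processing the request
     * @param backCompat      true to emit the "classic" list of all replica URLs
     */
    static String shardValue(Map<String, List<String>> replicasByShard,
                             String shardName,
                             boolean backCompat) {
        List<String> replicas = replicasByShard.get(shardName);
        if (replicas == null) {
            throw new IllegalArgumentException("Unknown shard: " + shardName);
        }
        // Proposed default: just the shard name, no per-request param needed.
        if (!backCompat) {
            return shardName;
        }
        // Back compat: rebuild the replica list locally instead of receiving it
        // over the wire. The '|' separator is an assumption.
        return String.join("|", replicas);
    }

    public static void main(String[] args) {
        // Hypothetical replica URLs, for illustration only.
        Map<String, List<String>> state = new LinkedHashMap<>();
        state.put("shard1", List.of("http://host1:8983/solr/coll_shard1_replica_n1",
                                    "http://host2:8983/solr/coll_shard1_replica_n2"));
        System.out.println(shardValue(state, "shard1", false));
        System.out.println(shardValue(state, "shard1", true));
    }
}
```

Either mode is computed entirely on the node serving the request, so nothing about it depends on {{shard.url}} being sent.
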

> Avoid sending the shard.url parameter in shard requests
> -------------------------------------------------------
>
>                 Key: SOLR-9378
>                 URL: https://issues.apache.org/jira/browse/SOLR-9378
>             Project: Solr
>          Issue Type: Improvement
>          Components: search, SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 6.2, 7.0
>
>
> The shard.url parameter contains a list of all replicas for a shard. One of 
> those is chosen by the HttpShardHandler to execute the request. So, it is 
> used only within the context of processing request on a distributor node as a 
> special storage for a list of replicas urls between the prep and execution 
> phase of HttpShardHandler. There is no real need to send this parameter down 
> to the chosen shard.
> However, Hoss pointed out to me that removing this would break 
> ShardAugmenterFactory so we need to figure out if/how we can do this. 
> Personally, I don't think it is at all useful to write down *all* replicas 
> with the document without telling which replica really served the query.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
