[ https://issues.apache.org/jira/browse/SOLR-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719109#comment-17719109 ]
Chris M. Hostetter commented on SOLR-9378:
------------------------------------------

This has wormed into my brain again, especially when looking at logs from collections with many, many replicas, and thinking about all the bytes over the wire being wasted to send a param that is completely ignored 99.99999999% of the time (does anyone actually use {{ShardAugmenterFactory}} in production?)

----

When the {{shard.url}} param was introduced, its usage in {{SearchHandler}} had (and still has) this comment...

{code:java}
params.set(ShardParams.SHARD_URL, shard); // so the shard knows what was asked
{code}

...if you dig back into the history of this, and why this param was introduced in 2011 (!) as part of SOLR-2444 / SOLR-705 (and think about the state of Solr at that time), its only real purpose for existing was to power {{ShardAugmenterFactory}} _because at that point in time there was no SolrCloud, no concept of collections, and {*}no distributed indexing with document routing{*}._

When you did a "distributed search" circa Solr 4.0, you *HAD* to specify a {{shards}} param that listed the URLs of all the shards to query, and all the replicas of those shards to use as fallbacks if a replica was down.

The _reason_ for the {{shard.url}} in {{SearchHandler}} and the {{ShardAugmenterFactory}} was so that if you were looking at a search result and wanted to update/delete a document, you would know the URLs of all the "cores" you needed to loop over when sending that indexing command.

In the 12 years since this code was added, no other usage for the {{shard.url}} param has come along, and you no longer need to manually update every replica.

So why don't we take a big leap forward into the exciting world of 2013, where Solr knows the mapping of collections->shards->replicas, and stop wasting 10s of KB of network traffic on every request sending a param no one cares about?
----

Proposal:
* deprecate {{ShardParams.SHARD_URL}} on 9x, delete from main
* remove all usage of {{ShardParams.SHARD_URL}} on both 9x and main
* change {{ShardAugmenterFactory}} to output the _name_ of the shard associated with the {{SolrCore}} processing the request.
** add a "back compat" option to {{ShardAugmenterFactory}} to output the "classic" list of all replica URLs
** make this configurable by overriding the transformer's (implicit) registration in {{solrconfig.xml}}
** "back compat" code will generate the full list of replica URLs by getting them from the {{DocCollection}} associated with the {{SolrCore}} processing the request.

> Avoid sending the shard.url parameter in shard requests
> -------------------------------------------------------
>
>                 Key: SOLR-9378
>                 URL: https://issues.apache.org/jira/browse/SOLR-9378
>             Project: Solr
>          Issue Type: Improvement
>          Components: search, SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 6.2, 7.0
>
>
> The shard.url parameter contains a list of all replicas for a shard. One of
> those is chosen by the HttpShardHandler to execute the request. So, it is
> used only within the context of processing a request on a distributor node, as
> special storage for the list of replica URLs between the prep and execution
> phases of HttpShardHandler. There is no real need to send this parameter down
> to the chosen shard.
> However, Hoss pointed out to me that removing this would break
> ShardAugmenterFactory, so we need to figure out if/how we can do this.
> Personally, I don't think it is at all useful to write down *all* replicas
> with the document without telling which replica really served the query.
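As a rough illustration of the last proposal item above (not the actual patch), the sketch below shows how a transformer could emit either the shard name or, in "back compat" mode, the classic list of replica URLs rebuilt from locally known cluster state rather than from a request param. Everything in it is an assumption made for illustration: the class {{ShardUrlSketch}}, its methods, and the plain {{Map}} standing in for the {{DocCollection}} lookup are hypothetical, and the {{|}} delimiter is assumed to match the way replica alternatives are written in a {{shards}} param.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed ShardAugmenterFactory behaviour.
// A real implementation would read replicas from the DocCollection tied to
// the SolrCore handling the request; here a plain Map plays that role.
public class ShardUrlSketch {

    // Rebuilds the "classic" value: all replica URLs of one shard joined by '|'
    // (assumed delimiter, mirroring replica alternatives in the shards param).
    public static String classicShardUrl(Map<String, List<String>> replicasByShard,
                                         String shardName) {
        List<String> urls = replicasByShard.get(shardName);
        if (urls == null) {
            throw new IllegalArgumentException("unknown shard: " + shardName);
        }
        return String.join("|", urls);
    }

    // Default mode returns just the shard name; back-compat mode returns the URL list.
    public static String shardField(Map<String, List<String>> replicasByShard,
                                    String shardName,
                                    boolean backCompat) {
        return backCompat ? classicShardUrl(replicasByShard, shardName) : shardName;
    }

    public static void main(String[] args) {
        // Made-up host names, purely for demonstration.
        Map<String, List<String>> replicasByShard = Map.of(
            "shard1", List.of("http://host1:8983/solr/coll_shard1_replica_n1",
                              "http://host2:8983/solr/coll_shard1_replica_n2"));

        System.out.println(shardField(replicasByShard, "shard1", false));
        System.out.println(shardField(replicasByShard, "shard1", true));
    }
}
```

The point of the sketch is that neither mode needs anything from the incoming request: the shard name and the replica list are both derivable on the node that serves the query, which is why the {{shard.url}} param would no longer have to travel over the wire.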