: > that seems... dangerous. you could easily wind up in a situation where
: > nodes just keep trying to forward forever?
:
: There is some special http parameter being added when forwarding
: requests, so I'm sure each node will be able to decide whether it should
: act as LB or if it is supposed to be the final destination. Or we can
: add such a param. Of course, if SolrJ on the client side has already
: selected a replica, the receiving node should not discard that and do
: its own balancing. So there is some state to get right here.

"Forever" wasn't really what I meant to say ... I'm more concerned about how you would implement this to work well in the "general case" -- ie: multiple nodes, multiple collections, multiple shards, multiple replicas per shard -- w/o doing "too much" forwarding.

If nodeA gets a request, when exactly should it decide "I *COULD* handle this request for collection1 using a local core, but I'll go ahead and forward it to nodeB instead"? ... Should it be based on what percentage of collection1's total replica list is located on nodeA, or on what percentage of nodeA is dedicated to collection1? ... Should nodeB be more or less likely than nodeC to get the request based on how many total cores each node has for collection1, or how many unique shards each one has?

Also bear in mind that even if you assumed everything was nice and evenly distributed, a "simple" round robin based approach would have some pretty significant impacts on the number of inter-node network requests...

Say you have a 5 node cluster, hosting a 1 shard / 5 replica collection such that each node has 1 replica: today any node can process the request locally; but if we did a round robin proxy of the request, we'd only handle it locally 1/5th of the time, and 4/5ths of the time you add an extra network hop and the associated network IO (plus the original node has a thread tied up waiting to proxy the response) ... so you'd go from needing 0 "internal" network requests/IO to having internal traffic equal to 80% of the external traffic received.

If those 5 nodes host a collection with 2 shards / 5 replicas each, spread evenly over the 5 nodes: today any given request typically causes 2 intra-cluster network requests to get the per-shard data; but if we round robin proxy the initial request to a different node 4/5ths of the time, we now typically need 2.8 internal requests for each external request (2 per-shard requests + 0.8 proxy hops)...
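The arithmetic in the two 5-node examples can be reproduced with a tiny model. This is a back-of-the-envelope sketch, not Solr code; it assumes, as the examples do, that replicas are spread evenly, that every per-shard request of a multi-shard query counts as a network request, and that a round-robin proxy sends the initial request elsewhere (N-1)/N of the time. The class and method names are hypothetical.

```java
/**
 * Hypothetical back-of-the-envelope model of the two examples (not Solr code).
 * Assumptions: replicas are spread evenly; a single-shard query is answered by the
 * receiving node with no internal requests; a multi-shard query costs one internal
 * request per shard; a round-robin proxy forwards the initial request to a different
 * node (numNodes - 1) / numNodes of the time.
 */
final class ProxyOverhead {
    static double internalRequestsPerExternal(int numNodes, int numShards, boolean roundRobinProxy) {
        if (numNodes < 1 || numShards < 1) {
            throw new IllegalArgumentException("need at least one node and one shard");
        }
        // Today: single-shard queries stay local; multi-shard queries fan out once per shard.
        double perShardRequests = (numShards == 1) ? 0.0 : numShards;
        // Extra hop whenever the proxy picks a node other than the one that received the request.
        double proxyHops = roundRobinProxy ? (numNodes - 1) / (double) numNodes : 0.0;
        return perShardRequests + proxyHops;
    }
}
```

With `(5, 1, true)` the model gives 0.8 internal requests per external request, and with `(5, 2, true)` it gives 2.8, matching the 80% and 2.8 figures above.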
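Jan's idea quoted above, a marker parameter that tells a node whether it is acting as the load balancer or is the final destination, amounts to a "forward at most once" rule. The sketch below only illustrates that rule and is not how Solr routes requests; both parameter names (`lb.forwarded`, `lb.pinned`) are hypothetical, and it deliberately uses naive round robin, which is exactly the policy whose cost Hoss points out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a per-node "forward at most once" rule; not Solr's routing code. */
final class ForwardingGuard {
    static final String FORWARDED_PARAM = "lb.forwarded"; // hypothetical: set on every proxied request
    static final String PINNED_PARAM = "lb.pinned";       // hypothetical: client already chose a replica

    private final AtomicInteger counter = new AtomicInteger();

    /**
     * Returns the node that should execute the request: localNode, or one of replicaNodes
     * (the nodes hosting a replica of the target collection). If the result is not
     * localNode, the caller proxies the request with markForwarded(params).
     */
    String chooseTarget(Map<String, String> params, String localNode, List<String> replicaNodes) {
        // Already forwarded once: this node is the final destination, so no loops.
        if ("true".equals(params.get(FORWARDED_PARAM))) {
            return localNode;
        }
        // The client (e.g. SolrJ) already picked a replica: don't second-guess it.
        if (params.containsKey(PINNED_PARAM) || replicaNodes.isEmpty()) {
            return localNode;
        }
        // Otherwise act as a load balancer: plain round robin over the replica nodes.
        int i = Math.floorMod(counter.getAndIncrement(), replicaNodes.size());
        return replicaNodes.get(i);
    }

    /** Copies the params and adds the marker that must accompany a proxied request. */
    static Map<String, String> markForwarded(Map<String, String> params) {
        Map<String, String> copy = new HashMap<>(params);
        copy.put(FORWARDED_PARAM, "true");
        return copy;
    }
}
```

The marker bounds every request to a single extra hop, which removes the "forever" case but does nothing about the added network traffic or the harder "general case" questions about how to weight nodes.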
It just seems like adding more forwarding/proxy logic -- that isn't strictly necessary to compute complete results -- could introduce a lot of complexity risk for a problem that already has multiple solutions:

1) the client (or an external load balancer) can round robin over live nodes (and given that cluster state and metrics are available via HTTP, a client can make very sophisticated choices)
2) a single "extra" Solr node in the cluster can be used as a "self configuring" load balancer that will automatically know when new nodes are added to the cluster, or when replicas get moved/added, etc...

:
: Jan
:
: > On 10 Mar 2021, at 19:32, Chris Hostetter <hossman_luc...@fucit.org> wrote:
: >
: >
: > : Is there any way whatsoever to solve this on the Solr side only?
: > :
: > : The only option I can think of is to send all requests to a 3rd node in the cluster
: > : that does not have a core for the collection, then it will balance
: > : between the two :)
: >
: > correct -- you can create a Solr node w/o any cores that will act as a
: > "load balancer" to other Solr nodes.
: >
: > : Or create a new, empty collection on the node, which acts as a routing
: > : collection only to the target collection?
: >
: > no -- this won't work, because the request your remote client sends will
: > need to specify the actual collection you want to query, and when the node
: > gets this it will hand it to the local core for that collection -- it
: > won't care that there is another local collection that's unrelated.
: >
: > : Sounds like there should be a way to explicitly disable the
: > : "optimization" of always handling the request locally in single-shard
: > : collections, i.e. always try to balance unless shards.preference=local?
: >
: > that seems... dangerous. you could easily wind up in a situation where
: > nodes just keep trying to forward forever?
: >
: > :
: > : Jan
: > :
: > : > On 10 Mar 2021, at 19:06, Chris Hostetter <hossman_luc...@fucit.org> wrote:
: > : >
: > : > : Ah, I missed "single shard" ... this looks relevant:
: > : > : https://issues.apache.org/jira/browse/SOLR-12217
: > : >
: > : > That improvement still isn't going to impact Jan's situation where the
: > : > *client* isn't SolrJ ... as the description says:
: > : >
: > : >>> NOTE: This Jira doesn't cover the single-sharded collections cases when
: > : >>> not using the CloudSolrClient or Streaming Expressions (i.e. if you do
: > : >>> a non-streaming curl request to a random node in the cluster, the
: > : >>> shards.preference parameter is not considered in the case of single
: > : >>> shards collections).
: > : >
: > : > :
: > : > : On Wed, Mar 10, 2021 at 12:43 PM Jan Høydahl <jan....@cominvent.com> wrote:
: > : > : >
: > : > : > We have not set any shards.preference, and I also think preferLocal
: > : > : > defaults to false, i.e. random
: > : > : >
: > : > : > Earlier we had 2 shards for the same collection (both existed on both
: > : > : > nodes) and then requests were distributed to both nodes. That's why, when
: > : > : > we went to 1 shard, I was wondering if the "single-shard" code path perhaps
: > : > : > never attempts to utilize replicas?? But have not looked in code yet.
: > : > : >
: > : > : > Guess the next step is to set up a small local test cluster and see what
: > : > : > happens.
: > : > : >
: > : > : > Jan Høydahl
: > : > : >
: > : > : > > On 10 Mar 2021, at 15:46, Michael Gibney <mich...@michaelgibney.net> wrote:
: > : > : > >
: > : > : > > You say not "anything fancy" -- depending on how you define "fancy", if you
: > : > : > > have an explicit `shards.preference` param, based on the version you're
: > : > : > > running (8.4) you might also take a look at
: > : > : > > https://issues.apache.org/jira/browse/SOLR-14471. (If SOLR-14471 is the
: > : > : > > problem, removing the explicit `shards.preference` param should restore
: > : > : > > default "shuffling" routing).
: > : > : > >
: > : > : > > I haven't dug too deep, but it looks like for 8.4 preferLocalShards
: > : > : > > actually defaults to false? I might be missing something though:
: > : > : > > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.4.1/solr/solrj/src/java/org/apache/solr/client/solrj/routing/RequestReplicaListTransformerGenerator.java#L85
: > : > : > >
: > : > : > >> On Wed, Mar 10, 2021 at 9:10 AM Houston Putman <houstonput...@gmail.com> wrote:
: > : > : > >>
: > : > : > >> I could be wrong, but I don't think preferLocalShards is the default in
: > : > : > >> multi-shard use cases.
: > : > : > >>
: > : > : > >>> On Wed, Mar 10, 2021 at 9:07 AM Mike Drob <md...@mdrob.com> wrote:
: > : > : > >>>
: > : > : > >>> I believe a server will always try to prefer local cores. Can you do an
: > : > : > >>> experiment with 3 nodes, and send http queries to the node not hosting any
: > : > : > >>> replicas? That should confirm the balanced distribution.
: > : > : > >>>
: > : > : > >>> If you have multiple shards, the receiving server will forward the requests
: > : > : > >>> for shards it doesn’t have, but would still prefer local shards when they
: > : > : > >>> are available.
: > : > : > >>>
: > : > : > >>> On Wed, Mar 10, 2021 at 8:00 AM Jan Høydahl <jan....@cominvent.com> wrote:
: > : > : > >>>
: > : > : > >>>> Hi,
: > : > : > >>>>
: > : > : > >>>> A client has a SolrCloud 8.4 setup with two nodes, and one collection with
: > : > : > >>>> one shard and replicationFactor=2.
: > : > : > >>>> Of course we want search traffic to be evenly distributed between the two
: > : > : > >>>> replicas.
: > : > : > >>>> The client is using plain HTTP requests, no SolrJ or anything fancy, and
: > : > : > >>>> sends all requests to one of the two nodes.
: > : > : > >>>> I was expecting Solr to forward about 50% of those requests to the other
: > : > : > >>>> replica, but it is serving them all locally.
: > : > : > >>>>
: > : > : > >>>> I know we can set up an LB in front or re-program the client to do round
: > : > : > >>>> robin, but that is not my question.
: > : > : > >>>> Is the select-random-replica logic only active when we have a sharded
: > : > : > >>>> collection, and not for a single-shard?
: > : > : > >>>>
: > : > : > >>>> Jan
: > : > : > >>>
: > : > : > >>
: > : > : >
: > : > :
: > : >
: > : > -Hoss
: > : > http://www.lucidworks.com/
: > :
: > :
: >
: > -Hoss
: > http://www.lucidworks.com/
:
-Hoss
http://www.lucidworks.com/
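Hoss's first alternative in this thread, a client that round-robins over the live nodes itself, needs very little code. The sketch below is not SolrJ or any Solr API. It assumes the client already holds a list of node base URLs (the hosts in the usage note are hypothetical). A real client might refresh that list from the Collections API `CLUSTERSTATUS` response, which includes `live_nodes`. The sketch only builds the `/select` URL and leaves out the HTTP call, retries, and dead-node handling.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical client-side round robin over Solr nodes; not part of SolrJ. */
final class RoundRobinSolrClient {
    private final List<String> baseUrls; // e.g. "http://node1:8983/solr" (hypothetical hosts)
    private final AtomicLong next = new AtomicLong();

    RoundRobinSolrClient(List<String> baseUrls) {
        if (baseUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.baseUrls = List.copyOf(baseUrls);
    }

    /** Picks the next node in round-robin order; safe to call from many threads. */
    String nextBaseUrl() {
        int i = (int) Math.floorMod(next.getAndIncrement(), (long) baseUrls.size());
        return baseUrls.get(i);
    }

    /** Builds a /select URL for the collection on the next node, URL-encoding the query. */
    String selectUrl(String collection, String query) {
        return nextBaseUrl() + "/" + collection + "/select?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }
}
```

Given nodes `http://node1:8983/solr`, `http://node2:8983/solr` and `http://node3:8983/solr`, successive calls spread queries over node1, node2, node3 and then node1 again. Because the receiving node then serves single-shard queries locally, this keeps the even spread without adding any internal proxy hops.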