>> 2) a single "extra" solr node in the cluster can be used as a "self configuring" load balancer
I’ve thought about this a bunch before. Are there mechanisms to instruct Solr to not host shards on a node for this purpose? Maybe it deserves its own discussion.

On Wed, Mar 10, 2021 at 5:14 PM Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : > that seems... dangerous. you could easily wind up in a situation where
> : > nodes just keep trying to forward forever?
> :
> : There is some special http parameter being added when forwarding
> : requests, so I'm sure each node will be able to decide whether it should
> : act as LB or if it is supposed to be the final destination. Or we can
> : add such a param. Of course, if SolrJ on the client side has already
> : selected a replica, the receiving node should not discard that and do
> : its own balancing. So there is some state to get right here.
>
> "Forever" wasn't really what I meant to say ... I'm concerned more about how
> you would implement this to work well in the 'general case' -- ie:
> multiple nodes, multiple collections, multiple shards, multiple replicas
> per shard -- w/o doing "too much" forwarding.
>
> If nodeA gets a request, when exactly should it decide "i *COULD* handle
> this request for collection1 using local core, but I'll go ahead and
> forward it to nodeB instead." ? ... should it be based on what percentage
> of collection1's total replica list are located on nodeA, or based on what
> percentage of nodeA is dedicated to collection1? ... should nodeB be more
> or less likely than nodeC to get the request based on how many total cores
> each node has for collection1, or how many unique shards each one has?
>
> Also bear in mind that even if you assumed everything was nice and evenly
> distributed, a "simple" round robin based approach would have some pretty
> significant impacts on the number of intra-cluster network requests....
>
> Say you have a 5 node cluster, hosting a 1shard/5replica collection such
> that each node has 1 replica: today any node can process the request
> locally; but if we did a round robin proxy of the request, that means we'd
> only handle it locally 1/5th the time, and 4/5ths of the time you add an
> extra network hop and the associated network IO involved (plus the
> original node has a thread tied up waiting to proxy the response) .. so
> you'd go from needing 0 "internal" network requests/IO to having internal
> traffic of 80% of the amount of external traffic received.
>
> If those 5 nodes host a collection with 2 shards/5replicas each, spread
> evenly over the 5 nodes: today any given request typically causes 2
> intra-cluster network requests to get the per-shard data; but if we round
> robin proxy the initial request to a different node 4/5ths of the time we
> now typically need 2.8 internal requests for each external request...
>
> It just seems like adding more forwarding/proxy logic -- that isn't
> strictly necessary to compute complete results -- could introduce a lot
> of complexity risk for a problem that already has multiple solutions:
>
> 1) client (or external load balancer) can round robin over live nodes (and
> given that cluster state and metrics are available via HTTP, a client can
> make very sophisticated choices)
>
> 2) a single "extra" solr node in the cluster can be used as a "self
> configuring" load balancer that will automatically know when new nodes are
> added to the cluster, or when replicas get moved/added, etc...
>
> :
> : Jan
> :
> : > On 10 Mar 2021 at 19:32, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> : >
> : >
> : > : Is there any way whatsoever to solve this on the Solr side only?
> : > :
> : > : The only way I can think of is to send all requests to a 3rd node in the
> : > : cluster that does not have a core for the collection, then it will balance
> : > : between the two :)
> : >
> : > correct -- you can create a Solr node w/o any cores that will act as a
> : > "load balancer" to other solr nodes.
> : >
> : > : Or create a new, empty collection on the node, which acts as a routing
> : > : collection only to the target collection?
> : >
> : > no -- this won't work, because the request your remote client sends will
> : > need to specify the actual collection you want to query, and when the node
> : > gets this it will hand it to the local core for that collection -- it
> : > won't care that there is another local collection that's unrelated.
> : >
> : > : Sounds like there should be a way to explicitly disable the
> : > : "optimization" of always handling the request locally in single-shard
> : > : collections, i.e. always try to balance unless shards.preference=local?
> : >
> : > that seems... dangerous. you could easily wind up in a situation where
> : > nodes just keep trying to forward forever?
> : >
> : > :
> : > : Jan
> : > :
> : > : > On 10 Mar 2021 at 19:06, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> : > : >
> : > : >
> : > : > : Ah, I missed "single shard" ... this looks relevant:
> : > : > : https://issues.apache.org/jira/browse/SOLR-12217
> : > : >
> : > : > That improvement still isn't going to impact Jan's situation where the
> : > : > *client* isn't SolrJ ... as the description says:
> : > : >
> : > : >>> NOTE: This Jira doesn't cover the single-sharded collections cases when
> : > : >>> not using the CloudSolrClient or Streaming Expressions (i.e.
> : > : >>> if you do a non-streaming curl request to a random node in the cluster, the
> : > : >>> shards.preference parameter is not considered in the case of single
> : > : >>> shards collections).
> : > : >
> : > : >
> : > : > :
> : > : > : On Wed, Mar 10, 2021 at 12:43 PM Jan Høydahl <jan....@cominvent.com> wrote:
> : > : > :
> : > : > : > We have not set any shards.preference, and I also think preferLocal
> : > : > : > defaults to false, i.e. random
> : > : > : >
> : > : > : > Earlier we had 2 shards for the same collection (both existed on both
> : > : > : > nodes) and then requests were distributed to both nodes. That’s why, when
> : > : > : > we went to 1 shard, I was wondering if the “single-shard” code path perhaps
> : > : > : > never attempts to utilize replicas?? But have not looked in code yet.
> : > : > : >
> : > : > : > Guess next step is to set up a small local test cluster and see what
> : > : > : > happens.
> : > : > : >
> : > : > : > Jan Høydahl
> : > : > : >
> : > : > : > > On 10 Mar 2021 at 15:46, Michael Gibney <mich...@michaelgibney.net> wrote:
> : > : > : > >
> : > : > : > > You say not "anything fancy" -- depending on how you define "fancy", if you
> : > : > : > > have an explicit `shards.preference` param, based on the version you're
> : > : > : > > running (8.4) you might also take a look at
> : > : > : > > https://issues.apache.org/jira/browse/SOLR-14471. (If SOLR-14471 is the
> : > : > : > > problem, removing the explicit `shards.preference` param should restore
> : > : > : > > default "shuffling" routing).
> : > : > : > >
> : > : > : > > I haven't dug too deep, but it looks like for 8.4 preferLocalShards
> : > : > : > > actually defaults to false?
> : > : > : > > I might be missing something though:
> : > : > : > >
> : > : > : > > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.4.1/solr/solrj/src/java/org/apache/solr/client/solrj/routing/RequestReplicaListTransformerGenerator.java#L85
> : > : > : > >
> : > : > : > >
> : > : > : > >> On Wed, Mar 10, 2021 at 9:10 AM Houston Putman <houstonput...@gmail.com>
> : > : > : > >> wrote:
> : > : > : > >>
> : > : > : > >> I could be wrong, but I don't think preferLocalShards is the default in
> : > : > : > >> multi-shard use cases.
> : > : > : > >>
> : > : > : > >>> On Wed, Mar 10, 2021 at 9:07 AM Mike Drob <md...@mdrob.com> wrote:
> : > : > : > >>>
> : > : > : > >>> I believe a server will always try to prefer local cores. Can you do an
> : > : > : > >>> experiment with 3 nodes, and send http queries to the node not hosting any
> : > : > : > >>> replicas? That should confirm the balanced distribution.
> : > : > : > >>>
> : > : > : > >>> If you have multiple shards, the receiving server will forward the requests
> : > : > : > >>> for shards it doesn’t have, but would still prefer local shards when they
> : > : > : > >>> are available.
> : > : > : > >>>
> : > : > : > >>> On Wed, Mar 10, 2021 at 8:00 AM Jan Høydahl <jan....@cominvent.com> wrote:
> : > : > : > >>>
> : > : > : > >>>> Hi,
> : > : > : > >>>>
> : > : > : > >>>> A client has a SolrCloud 8.4 setup with two nodes, and one collection with
> : > : > : > >>>> one shard and replicationFactor=2.
> : > : > : > >>>> Of course we want search traffic to be evenly distributed between the two
> : > : > : > >>>> replicas.
> : > : > : > >>>> The client is using plain HTTP requests, no SolrJ or anything fancy, and
> : > : > : > >>>> sends all requests to one of the two nodes.
> : > : > : > >>>> I was expecting Solr to forward about 50% of those requests to the other
> : > : > : > >>>> replica, but it is serving them all locally.
> : > : > : > >>>>
> : > : > : > >>>> I know we can set up an LB in front or re-program the client to do round
> : > : > : > >>>> robin, but that is not my question.
> : > : > : > >>>> Is the select-random-replica logic only active when we have a sharded
> : > : > : > >>>> collection, and not for a single-shard?
> : > : > : > >>>>
> : > : > : > >>>> Jan
> : > : > : > >>>
> : > : > : > >>
> : > : > : >
> : > : > :
> : > : >
> : > : > -Hoss
> : > : > http://www.lucidworks.com/
> : > :
> : >
> : > -Hoss
> : > http://www.lucidworks.com/
> :
>
> -Hoss
> http://www.lucidworks.com/
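
The round-robin arithmetic in the quoted message (0.8 and 2.8 internal requests per external request on a 5 node cluster) can be reproduced with a small Python sketch. It assumes, as the quoted example does, that every node holds one replica of every shard, that a single-shard request is served by the local core with no internal request, and that a multi-shard request costs one internal request per shard. The function name and its parameters are made up for illustration and are not part of Solr.

```python
from fractions import Fraction


def internal_requests_per_external(nodes: int, shards: int,
                                   round_robin_proxy: bool) -> Fraction:
    """Expected intra-cluster requests caused by one external request.

    Hypothetical model: every node hosts one replica of every shard.
    """
    # Per-shard fan-out: a single-shard request is answered by the local
    # core (0 internal requests); otherwise one internal request per shard.
    fan_out = Fraction(0) if shards == 1 else Fraction(shards)
    # Proxy hop: paid only when round robin picks one of the other nodes,
    # which happens (nodes - 1) / nodes of the time.
    proxy_hop = Fraction(nodes - 1, nodes) if round_robin_proxy else Fraction(0)
    return fan_out + proxy_hop


if __name__ == "__main__":
    for shards in (1, 2):
        today = internal_requests_per_external(5, shards, round_robin_proxy=False)
        proxied = internal_requests_per_external(5, shards, round_robin_proxy=True)
        print(f"{shards} shard(s): today={float(today)}, with proxy={float(proxied)}")
```

Under these assumptions the proxy hop adds (nodes - 1) / nodes internal requests per external request regardless of shard count, so the relative overhead is largest for single-shard collections.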