Hi Dave,

Solr knows how to replicate the index across nodes like you said, but in order to do that all SolrCloud nodes should connect to the same Zookeeper cluster, or else how could they know about each other?
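To make that concrete, here is a rough sketch, assuming both Solr nodes were started in cloud mode against the same Zookeeper ensemble (for example with `bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181`). The host names `zk1`..`zk3` and `solr1`, and the collection name `mycollection`, are made up for illustration. It only builds the Collections API request that creates a collection with two replicas per shard; on a two-node cluster Solr should normally place one replica on each node and keep them in sync itself.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards=1, replication_factor=2):
    """Build a Solr Collections API CREATE request URL.

    base_url is any node of the SolrCloud cluster, e.g. http://solr1:8983/solr
    (the host name is an assumption, not taken from the thread).
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        # Two copies of each shard, so one node can go down.
        "replicationFactor": replication_factor,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


url = create_collection_url("http://solr1:8983/solr", "mycollection")
print(url)
```

Sending a GET to that URL (with curl or a browser) creates the collection. After that, an update sent to either node is forwarded to the shard leader and from there to the other replica, so no shared filesystem is involved.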
You can make the Zookeeper cluster distributed across N nodes so it’s not a single point of failure either. But if I understand correctly from your first post, you don’t want to use the same Zookeeper?

~~ufuk yilmaz

Sent from Mail for Windows

From: David Filip
Sent: Friday, October 20, 2023 7:30 PM
To: users@solr.apache.org
Subject: Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr9.2.1)

Dima,

Thanks for the reply! However, this does not quite answer my question, as far as I can tell.

I am very familiar with network proxies, and have both an Nginx proxy (externally facing) and Apache (internal load balancing) on my network. I am comfortable with how to distribute search queries across nodes.

My fundamental question, and sorry if this was not clear, is how do I keep the indices (collections) in sync across nodes? Put another way, if I update shard1 on one node, how do I get the other node(s) automatically updated?

The goal is to be able to do indexing on a particular node, and have any updates propagate across the other nodes, so that the indices (collections) are identical (hopefully within a few seconds) across all of the nodes.

Of course, one way is to have a shared filesystem to share the index (collection) data files across all of the nodes … but then the shared filesystem becomes a single point of failure.

It appears that Solr knows how to replicate the indices (collections) across nodes, so that there is no single point of failure. This is what I am trying to figure out.

Thanks,

Dave.

> On Oct 20, 2023, at 11:52 AM, Dmitri Maziuk <dmitri.maz...@gmail.com> wrote:
>
> On 10/19/23 18:48, David Filip wrote:
>
>> My goal is to replicate content from one to the other, so that I can take
>> one down (e.g., solr1) and still search current collections (e.g., on solr2).
>
> You need a proxy host, it can be anything from apache to F5, configured to
> pass requests to Solr nodes, based on some criteria.
>
> In the active-passive, blue-green, or whatever you call it, configuration,
> you don't need zookeeper or anything shared on the backend (there is an
> argument for having the backend nodes fully independent).
>
> If you RTFM: see "Query Fault Tolerance" in
> https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-distributed-requests.html
> -- even if you use SolrCloud you still need a proxy for what you want done.
> (Unless your client application knows how to talk to zookeeper and can use it
> as the proxy.)
>
> As an aside, it's interesting that Apache httpd does not have a mod_zookeper
> among its proxy modules.
>
> Dima
>
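
For what it's worth, Dima's point about still needing a proxy (or a client that knows which nodes exist) can be sketched on the client side roughly as below. This is only an illustration of the failover idea, not how Solr or any particular proxy implements it; the node URLs and the helper names `query_with_failover` and `http_get` are made up.

```python
import urllib.request
from typing import Callable, Iterable


def http_get(url: str) -> str:
    """Fetch a URL and return the body as text (5 second timeout)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def query_with_failover(nodes: Iterable[str], path: str,
                        fetch: Callable[[str], str] = http_get) -> str:
    """Try each node in order and return the first successful response."""
    last_error = None
    for node in nodes:
        try:
            return fetch(node.rstrip("/") + path)
        except OSError as exc:
            # Connection refused, timeout, DNS failure, HTTP error, ...
            # (urllib's URLError and HTTPError are OSError subclasses).
            last_error = exc
    raise RuntimeError("no Solr node answered") from last_error


# Hypothetical usage: solr1 may be down, solr2 still answers.
# query_with_failover(
#     ["http://solr1:8983/solr", "http://solr2:8983/solr"],
#     "/mycollection/select?q=*:*",
# )
```

A real proxy (Nginx, Apache, F5) does the same thing with health checks, so the search clients never need to know which node is down.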