Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr9.2.1)

David Filip Fri, 20 Oct 2023 10:16:58 -0700

Thanks - yes, that is what I am trying to do.  However, I am not clear on how 
to add nodes to the Zookeeper included with Solr.

I think the problem I am having is that the zoo.cfg file included with Solr:

$ cat server/solr/zoo.cfg 

does not have any ‘server.x’ lines, and I’m having trouble trying to figure out 
how to get different Solr nodes to connect to Zookeeper running on a different 
node?  Do I just specify a different Zookeeper node and port when I start Solr 
via a command line parameter (-z {server}:{port})?  So no credentials, and I 
assume network level security?
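
For reference, a minimal sketch of what passing the ZooKeeper address with -z might look like when every Solr node points at the same external ensemble. The hostnames zk1, zk2, zk3 and port 2181 are placeholders assumed for illustration and do not come from this thread. The script only builds and prints the command; it does not start Solr.

```shell
#!/bin/sh
# Hypothetical ZooKeeper hosts; replace with your own.
ZK_HOSTS="zk1 zk2 zk3"
ZK_PORT=2181   # ZooKeeper's default client port

# Build the comma-separated connection string that -z expects.
ZK_CONN=""
for h in $ZK_HOSTS; do
  ZK_CONN="${ZK_CONN:+$ZK_CONN,}$h:$ZK_PORT"
done

# Command to run on every Solr node (-c = SolrCloud mode, same -z string on each).
SOLR_CMD="bin/solr start -c -z $ZK_CONN"
echo "$SOLR_CMD"
```

Giving every node the same connection string is what lets the Solr nodes discover each other through ZooKeeper.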

Perhaps I need to ignore the Zookeeper that comes bundled with Solr and install 
it separately on its own?  Then I am not entirely clear on how I need to 
configure Zookeeper and Solr on each node to work together?
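
As a rough illustration of the standalone ZooKeeper side, the sketch below writes a three-member zoo.cfg with ‘server.x’ lines plus the matching myid file. The hostnames zk1 to zk3, the scratch directory, and the timing values are assumptions for the example (the timings mirror ZooKeeper's sample config); 2888 and 3888 are the conventional peer and leader-election ports.

```shell
#!/bin/sh
# Hypothetical values: hosts zk1..zk3 and a scratch directory for the demo.
ZK_DIR="${ZK_DIR:-$(mktemp -d)}"
MY_ID=1   # must be unique per ZooKeeper node (1, 2, 3, ...)

# The same zoo.cfg goes on every ensemble member; server.N lists all members.
cat > "$ZK_DIR/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=$ZK_DIR/data
clientPort=2181
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
EOF

# The myid file in dataDir tells this node which server.N entry it is.
mkdir -p "$ZK_DIR/data"
echo "$MY_ID" > "$ZK_DIR/data/myid"
```

Only the myid value differs between ensemble members; Solr itself needs no ‘server.x’ entries, just the clientPort addresses of the ensemble.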

Basically, I’ve found bits and pieces about how to install Solr with its own 
Zookeeper (SolrCloud), and how to install and configure Zookeeper on its own 
(completely separate from anything to do with Solr), and I even found a high 
level page in the Solr documentation that starts with “Although Solr comes 
bundled with Apache ZooKeeper, you should consider yourself discouraged from 
using this internal ZooKeeper in production.”, but it seems a bit light on how 
I configure the “ZooKeeper Ensemble” to work with Solr (I think it assumes I 
have some familiarity with Zookeeper already).

So I probably have read all the bits that I need, and I have probably read more 
bits than I need, some of which I can ignore, but am trying to figure out how 
it all nicely fits together.

Originally yes, I was thinking “If Zookeeper is included with Solr already, why 
do I need to install it separately”, but perhaps that is what is confusing me?

Does all of that make any sense?

Thanks,

Dave.

> On Oct 20, 2023, at 12:36 PM, ufuk yılmaz <uyil...@vivaldi.net.INVALID> wrote:
> 
> Hi Dave,
> 
> Solr knows how to replicate the index across nodes like you said, but in 
> order to do that all solrcloud nodes should connect to the same Zookeeper 
> cluster, or else how could they know about each other?
> 
> You can make Zookeeper cluster distributed across N nodes so it’s not a 
> single point of failure too.
> 
> But if I understand correctly from your first post, you don’t want to use the 
> same Zookeeper?
> 
> ~~ufuk yilmaz
> 
> Sent from Mail for Windows
> 
> From: David Filip
> Sent: Friday, October 20, 2023 7:30 PM
> To: users@solr.apache.org
> Subject: Re: Newbie Help: Replicating Between Two SolrCloud Instances 
> (Solr9.2.1)
> 
> Dima,
> 
> Thanks for the reply!  However, this does not quite answer my question, as 
> far as I can tell.  I am very familiar with network proxies, and have both 
> Nginx proxy (externally facing) and Apache (internal load balancing) on my 
> network.  I am comfortable with how to distribute search queries across nodes.
> 
> My fundamental question, and sorry if this was not clear, is how do I keep 
> the indices (collections) in-sync across nodes?  Put another way, if I update 
> shard1 on one node, how do I get the other node(s) automatically updated?  
> The goal is to be able to do indexing on a particular node, and have any 
> updates propagate across the other nodes, so that the indices (collections) 
> are identical (hopefully within a few seconds) across all of the nodes.
> 
> Of course, one way is to have a shared filesystem to share the index 
> (collection) data files across all of the nodes … but then the shared 
> filesystem becomes a single point of failure.
> 
> It appears that Solr knows how to replicate the indices (collections) across 
> nodes, so that there is no single point of failure.  This is what I am trying 
> to figure out.
> 
> Thanks,
> 
> Dave.
> 
>> On Oct 20, 2023, at 11:52 AM, Dmitri Maziuk <dmitri.maz...@gmail.com> wrote:
>> 
>> On 10/19/23 18:48, David Filip wrote:
>> 
>>> My goal is to replicate content from one to the other, so that I can take 
>>> one down (e.g., solr1) and still search current collections (e.g., on 
>>> solr2).
>> 
>> You need a proxy host, it can be anything from apache to F5, configured to 
>> pass requests to Solr nodes, based on some criteria.
>> 
>> In the active-passive, blue-green, or whatever you call it, configuration, 
>> you don't need zookeeper or anything shared on the backend (there is an 
>> argument for having the backend nodes fully independent).
>> 
>> If you RTFM: see "Query Fault Tolerance" in 
>> https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-distributed-requests.html
>>  -- even if you use SolrCloud you still need a proxy for what you want done. 
>> (Unless your client application knows how to talk to zookeeper and can use it 
>> as the proxy.)
>> 
>> As an aside, it's interesting that Apache httpd does not have a mod_zookeeper 
>> among its proxy modules.
>> 
>> Dima
>> 
> 
>
