Shawn,

Understood about redundancy and needing an odd number of nodes (I’ve used quorum in other, non-Solr, types of clusters, so I get it).
So what I’ve done now is installed ZooKeeper on a separate (physical) node, so I’m no longer using the ZooKeeper bundled with Solr, since that was causing some confusion. I’m trying to follow this document on how to set up the “ensemble” for Solr:

https://solr.apache.org/guide/6_6/setting-up-an-external-zookeeper-ensemble.html

It includes the following “example” zoo.cfg:

    dataDir=/var/lib/zookeeperdata/1
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=localhost:2888:3888
    server.2=localhost:2889:3889
    server.3=localhost:2890:3890

So assuming that I have three (3x) physical nodes, each running Solr 9.2, named:

    solr1.mydomain.com
    solr2.mydomain.com
    solr3.mydomain.com

I am assuming that in zoo.cfg I will have:

    server.1=solr1.mydomain.com
    server.2=solr2.mydomain.com
    server.3=solr3.mydomain.com

But do I also have three (3x) separate zoo.cfg files, or a single zoo.cfg file? The document seems to imply, I think, that I need to create three separate copies in (assuming I’ve installed ZooKeeper in /opt/zookeeper):

    /opt/zookeeper/conf/zoo1.cfg
    /opt/zookeeper/conf/zoo2.cfg
    /opt/zookeeper/conf/zoo3.cfg

But is there also still just an /opt/zookeeper/conf/zoo.cfg as well? And do all of the configuration files contain the same thing? I’m not sure if this is more of a ZooKeeper question or more of a Solr question, but I’m a bit confused nonetheless as to how it all fits together.

As far as pointing each Solr instance, on each physical Solr node, to ZooKeeper, I am assuming that I just need to start Solr on each one with (assuming my ZooKeeper node is zookeeper.mydomain.com):

    bin/solr start -e cloud -z zookeeper.mydomain.com:2181 -noprompt

Or is there anything else that I need to do on each Solr node?
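Just so I’m asking about something concrete, here is the end state I think that document is describing, assuming the ensemble ends up running on the three Solr machines themselves (rather than only on the dedicated zookeeper.mydomain.com box), with one ZooKeeper instance per machine. Please correct me if I have this wrong. My guess is that each of the three nodes gets the same /opt/zookeeper/conf/zoo.cfg:

    # My assumption: identical file on solr1, solr2, and solr3
    dataDir=/var/lib/zookeeperdata
    clientPort=2181
    initLimit=5
    syncLimit=2
    # I assume the two port numbers are still required with real hostnames
    server.1=solr1.mydomain.com:2888:3888
    server.2=solr2.mydomain.com:2888:3888
    server.3=solr3.mydomain.com:2888:3888

plus a myid file in each dataDir containing only that server’s number, which (if I’m reading the document correctly) is how each node knows which server.N line it is:

    # On solr1 only; solr2 gets 2 and solr3 gets 3
    echo 1 > /var/lib/zookeeperdata/myid

and then each Solr node started with all three ZooKeeper hosts in the connect string. I’m also guessing that for a real install I want plain cloud mode (-c) rather than the “-e cloud” example, so something like:

    bin/solr start -c -z solr1.mydomain.com:2181,solr2.mydomain.com:2181,solr3.mydomain.com:2181

Does that match what the document intends?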
Thanks in advance for any clarification.

Regards,

Dave.

> On Oct 20, 2023, at 2:51 PM, Shawn Heisey <apa...@elyograg.org.INVALID> wrote:
>
> On 10/19/23 17:48, David Filip wrote:
>> I think I am getting confused between differences in Solr versions (most links seem to talk about Solr 6, and I’ve installed Solr 9), and SolrCloud vs. Standalone, when searching the ’Net … so I am hoping that someone can point me towards what I need to do. Apologies in advance for perhaps not using the correct Solr terminology.
>>
>> I will describe what I have, and what I want to accomplish, to the best of my abilities.
>>
>> I have installed Solr 9.2.1 on two separate physical nodes (different physical computers). Both are running SolrCloud, and are running with the same (duplicate) configuration files. Both are running their own local zookeeper, and are separate cores. Let’s call them solr1 and solr2. Right now I can index content on and search each one individually, but they do not know about each other (which is I think the fundamental problem I am trying to solve).
>
> You need three servers minimum. In the minimal fault-tolerant setup, two of those will run Zookeeper and Solr, the third will only need to run Zookeeper. If the third server does not run Solr, it can be a smaller server than the other two.
>
>> My goal is to replicate content from one to the other, so that I can take one down (e.g., solr1) and still search current collections (e.g., on solr2).
>>
>> When I run the Solr Admin web page, I can select Collections => {collection}, click on a Shard, and I see the [+ add replica] button, but I can’t add a new replica on the “other” node, because only the local node appears (e.g., 10.0.x.xxx:8983_solr). What I think I need to do is add the nodes (solr1 and solr2) together (?) so that I can add a new replica on the “other” node.
>
> This is an inherent capability of SolrCloud. One collection consists of one or more shards, and each shard consists of one or more replicas. When there is more than one replica, one of them will be elected leader.
>
> All the Solr servers must talk to the same ZK ensemble in order to form a SolrCloud cluster. Zookeeper should run as its own process, not the embedded ZK server that Solr provides, but dedicated hosts for ZK are not required unless the SolrCloud cluster is really big.
>
>> I’ve found references that tell me I need an odd number of zookeeper nodes (for quorum), so I’m not sure if I want both nodes to share a single zookeeper instance? If I did do that, and let’s say that I pointed solr2 to zookeeper on solr1, could I still search against solr2 if solr1 zookeeper was down? I would think not, but I’m not sure.
>
> Here is the situation with ZK ensemble fault tolerance:
>
> 2 servers can sustain zero failures.
> 3 servers can sustain one failure.
> 4 servers can sustain one failure.
> 5 servers can sustain two failures.
> 6 servers can sustain two failures.
>
> Additional note: In geographically diverse setups, it is not possible to have a fault tolerant ZK install with only two datacenters or availability zones. You need three.
>
> This is why an odd number is recommended -- because adding one more node does not provide any additional fault tolerance.
>
> If ZK has too many failures, SolrCloud will switch to read-only mode and the node you contact will not be aware of other Solr servers going down or coming up.
>
> I would recommend that any new Solr install, especially if you want fault tolerance, should run SolrCloud, not standalone mode.
>
> Thanks,
> Shawn