Hi Dave,

The zoo.cfg does not reference Solr at all.
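For concreteness, here is a minimal sketch of what a zoo.cfg might look like for a three-node ensemble spread across three machines (hostnames and dataDir are placeholders, not from your setup); the same file goes on every node:

```
# Sketch only; hostnames are hypothetical. Identical on all three nodes.
dataDir=/var/lib/zookeeper
# Solr connects to the client port:
clientPort=2181
initLimit=5
syncLimit=2
# server.N=host:peerPort:leaderElectionPort
server.1=zk1.mydomain.com:2888:3888
server.2=zk2.mydomain.com:2888:3888
server.3=zk3.mydomain.com:2888:3888
```

Note that because each instance runs on its own host, all three can use the same 2888/3888 port pair; the staggered ports in the reference guide's example exist only because its three instances share one machine.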
Each Zookeeper instance has three ports of note:

- The "client port", which accepts requests from external clients (in this
  case, Solr)
- Two ports used for internal Zookeeper-to-Zookeeper communication

The client port is configured by the "clientPort" entry in zoo.cfg. The
"server.N" entries both configure which ports this Zookeeper instance will
listen on for internal communication and configure which host and ports it
will use to communicate with the other Zookeeper instances.

In this example, there are 3 Zookeeper instances, all running on the same
machine (not a real-world case). So if the current Zookeeper instance is
instance "1" (as configured in the "myid" file), it will listen on ports
2888 and 3888 for internal zk-to-zk requests, and it expects the other two
Zookeeper instances to be running on localhost, listening on ports
2889/3889 and 2890/3890 respectively.

On the Solr side, you have to configure the hostname and client port of
each of the Zookeeper instances.

Hope this helps.

On Fri, Oct 20, 2023 at 3:35 PM David Filip <dfi...@colornetlabs.com> wrote:

> Shawn,
>
> Understood about redundancy and needing an odd number of nodes (I've used
> quorum in other (non-Solr) types of clusters, so I get it).
>
> So what I've done now is installed ZooKeeper on a separate (physical)
> node (so no longer using the ZooKeeper bundled with Solr, since that was
> causing some confusion).
>
> So I'm trying to follow this document regarding how to set up the
> "Ensemble" for Solr:
>
> Setting Up an External ZooKeeper Ensemble | Apache Solr Reference Guide 6.6
> <https://solr.apache.org/guide/6_6/setting-up-an-external-zookeeper-ensemble.html>
>
> This includes the following "example" in zoo.cfg:
>
> dataDir=/var/lib/zookeeperdata/1
> clientPort=2181
> initLimit=5
> syncLimit=2
> server.1=localhost:2888:3888
> server.2=localhost:2889:3889
> server.3=localhost:2890:3890
>
> So assuming that I have three (3x) physical nodes, each running Solr 9.2,
> and assuming that they are named:
>
> solr1.mydomain.com
> solr2.mydomain.com
> solr3.mydomain.com
>
> I am assuming that in zoo.cfg I will have:
>
> server.1=solr1.mydomain.com
> server.2=solr2.mydomain.com
> server.3=solr3.mydomain.com
>
> But do I also have three (3x) separate zoo.cfg files, or a single
> zoo.cfg file? This document kind of implies - I think? - that I need to
> create three separate copies (assuming I've installed ZooKeeper in
> /opt/zookeeper):
>
> /opt/zookeeper/conf/zoo1.cfg
> /opt/zookeeper/conf/zoo2.cfg
> /opt/zookeeper/conf/zoo3.cfg
>
> But is there also still just a /opt/zookeeper/conf/zoo.cfg as well? And
> do all of the configuration files contain the same thing?
>
> I'm not sure if this is more of a ZooKeeper question or more of a Solr
> question, but I'm a bit confused nonetheless as to how this all fits
> together.
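For what it's worth, a sketch of the usual one-instance-per-machine layout (paths hypothetical): each machine gets one zoo.cfg, identical on all three nodes, and only the myid file inside dataDir differs, naming the local server number. The three-file zooN.cfg layout in the 6.6 guide exists only because its example runs all three instances on a single machine.

```shell
# Sketch: the same zoo.cfg everywhere; only myid differs per node.
DATA_DIR=$(mktemp -d)        # stands in for the real dataDir
echo 1 > "${DATA_DIR}/myid"  # "1" on solr1; "2" on solr2, "3" on solr3
cat "${DATA_DIR}/myid"
```

Each instance reads myid at startup to learn which server.N line in zoo.cfg refers to itself.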
>
> As far as pointing each Solr instance, on each physical Solr node, to
> ZooKeeper, I am assuming that I just need to start Solr on each with
> (assuming my ZooKeeper node is zookeeper.mydomain.com):
>
> bin/solr start -e cloud -z zookeeper.mydomain.com:2181 -noprompt
>
> Or is there anything else that I need to do on each Solr node?
>
> Thanks in advance for any clarification.
>
> Regards,
>
> Dave.
>
> On Oct 20, 2023, at 2:51 PM, Shawn Heisey <apa...@elyograg.org.INVALID>
> wrote:
>
> On 10/19/23 17:48, David Filip wrote:
>
> I think I am getting confused between differences in Solr versions (most
> links seem to talk about Solr 6, and I've installed Solr 9) and SolrCloud
> vs. standalone when searching the 'Net, so I am hoping that someone can
> point me towards what I need to do. Apologies in advance for perhaps not
> using the correct Solr terminology. I will describe what I have, and what
> I want to accomplish, to the best of my abilities.
>
> I have installed Solr 9.2.1 on two separate physical nodes (different
> physical computers). Both are running SolrCloud, and are running with the
> same (duplicate) configuration files. Both are running their own local
> ZooKeeper, and are separate cores. Let's call them solr1 and solr2. Right
> now I can index content on and search each one individually, but they do
> not know about each other (which is, I think, the fundamental problem I
> am trying to solve).
>
> You need three servers minimum. In the minimal fault-tolerant setup, two
> of those will run Zookeeper and Solr, and the third will only need to run
> Zookeeper. If the third server does not run Solr, it can be a smaller
> server than the other two.
>
> My goal is to replicate content from one to the other, so that I can take
> one down (e.g., solr1) and still search current collections (e.g., on
> solr2).
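On the startup command: two hedged notes, assuming recent Solr versions. First, `-e cloud` launches the interactive cloud *example*; a plain SolrCloud node is normally started with `-c`. Second, pointing every Solr node at a single ZooKeeper host makes that host a single point of failure; with a full ensemble, `-z` takes a comma-separated list of all client ports, so Solr survives any single ZK failure. A sketch with hypothetical hostnames:

```shell
# Hypothetical ZK hosts; each entry is host:clientPort.
ZK_HOSTS="zk1.mydomain.com:2181,zk2.mydomain.com:2181,zk3.mydomain.com:2181"
# The same command would be run on every Solr node:
echo "bin/solr start -c -z ${ZK_HOSTS}"
```

Solr's ZK client will keep working as long as a quorum of the listed hosts is reachable, regardless of which one it happened to connect to first.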
> When I run the Solr Admin web page, I can select Collections =>
> {collection}, click on a Shard, and I see the [+ add replica] button, but
> I can't add a new replica on the "other" node, because only the local
> node appears (e.g., 10.0.x.xxx:8983_solr). What I think I need to do is
> add the nodes (solr1 and solr2) together (?) so that I can add a new
> replica on the "other" node.
>
> This is an inherent capability of SolrCloud. One collection consists of
> one or more shards, and each shard consists of one or more replicas. When
> there is more than one replica, one of them will be elected leader.
>
> All the Solr servers must talk to the same ZK ensemble in order to form a
> SolrCloud cluster. Zookeeper should run as its own process, not the
> embedded ZK server that Solr provides, but dedicated hosts for ZK are not
> required unless the SolrCloud cluster is really big.
>
> I've found references that tell me I need an odd number of zookeeper
> nodes (for quorum), so I'm not sure if I want both nodes to share a
> single zookeeper instance? If I did do that, and let's say that I pointed
> solr2 to zookeeper on solr1, could I still search against solr2 if
> solr1's zookeeper was down? I would think not, but I'm not sure.
>
> Here is the situation with ZK ensemble fault tolerance:
>
> 2 servers can sustain zero failures.
> 3 servers can sustain one failure.
> 4 servers can sustain one failure.
> 5 servers can sustain two failures.
> 6 servers can sustain two failures.
>
> Additional note: in geographically diverse setups, it is not possible to
> have a fault-tolerant ZK install with only two datacenters or
> availability zones. You need three.
>
> This is why an odd number is recommended: adding one more node to an
> odd-sized ensemble does not provide any additional fault tolerance.
>
> If ZK has too many failures, SolrCloud will switch to read-only mode,
> and the node you contact will not be aware of other Solr servers going
> down or coming up.
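The fault-tolerance table above follows from simple quorum arithmetic: an ensemble of n servers needs floor(n/2)+1 of them up to maintain quorum, so it tolerates floor((n-1)/2) failures. A quick sketch:

```shell
# Quorum math: an n-node ZK ensemble tolerates floor((n-1)/2) failures.
for n in 2 3 4 5 6; do
  echo "$n servers tolerate $(( (n - 1) / 2 )) failure(s)"
done
```

This is why 3 and 4 servers (or 5 and 6) tolerate the same number of failures: the extra even node raises the quorum size without raising the failure budget.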
>
> I would recommend that any new Solr install, especially if you want
> fault tolerance, should run SolrCloud, not standalone mode.
>
> Thanks,
> Shawn