To answer simply, ZooKeeper and Solr are just two different distributed
applications. ZooKeeper is used in many different places where
synchronization between distributed systems is needed. If you need to
coordinate data between your own application instances, you can use it too.

SolrCloud is a distributed system too; it just happens to use ZooKeeper
for coordination between its nodes. I guess it could have used something else
instead if the Solr developers had wanted to.

So the fault tolerance of ZooKeeper and the fault tolerance of SolrCloud are two
different subjects. It’s like your database and your application: you can
distribute your database across many places for fault tolerance, and you also
distribute your application, but these are separate matters. You could use a
single ZooKeeper node and connect all of your Solr nodes to it if you wanted to.
But then that single ZooKeeper node would be a single point of failure: if it
goes down, your SolrCloud nodes lose the ability to synchronize state until it
comes back up.
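To put rough numbers on that: as I understand it, a ZooKeeper ensemble keeps
working only while a strict majority (a quorum) of its servers is up. A minimal
Python sketch of that arithmetic (the function names are made up for
illustration, they are not part of any ZooKeeper API):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble of the given size."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} ZK server(s): quorum of {quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

A single server survives no failures, and going from 3 to 4 servers still only
survives one, which is why ensembles are usually 3 or 5 servers.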

So Solr is like your application and ZooKeeper is like your database.

Solr comes bundled with ZooKeeper so that people can try out
SolrCloud-specific features without going through the hassle of setting up
ZooKeeper. It’s only there so you can start Solr in cloud mode and play with it.

I’d also recommend sticking to a single Solr version’s documentation: if
you are installing and using Solr 9.1, always stick to the 9.1
documentation, because older docs may have conflicting or outdated
information relative to the version you are using. Google often takes you to a
different doc version when you search for something, so be careful to stay on
the correct documentation version. Version 6.x is pretty outdated nowadays.

If you are using Docker, it’s pretty easy to set up an N-node ZooKeeper ensemble
and an M-node SolrCloud cluster. I can share an example docker-compose file if
you wish.

I hope this was useful.

--ufuk yilmaz


From: David Filip
Sent: Saturday, October 21, 2023 2:24 AM
To: users@solr.apache.org
Subject: Re: Newbie Help: Replicating Between Two SolrCloud Instances 
(Solr9.2.1)

Shawn,

Thanks for this. I will try to dig further into the ZooKeeper documentation.

As a matter of perspective, however, what I am not clear on is having more
than one ZK “server”, and when and why I would need more than one.

Perhaps it is just terminology, but if I have three (3x) Solr instances (cores) 
running on three (3x) separate physical servers (different hardware), and I 
want to replicate shards between those three, do I have all three (3x) Solr 
instances (cores) talking to the same single (1x) ZooKeeper “server”?

Or if I have three (3x) Solr instances (cores) replicating shards between them, 
do I also need three (3x) ZooKeeper “servers”, e.g., server.1, server.2, 
server.3, each “server” assigned to one specific Solr instance (core)?

So while I understand this might not be the place to talk about configuring
ZooKeeper per se, if it’s not too much trouble, can you please clarify whether
there is a many-to-one relationship between Solr and ZooKeeper (many Solr cores
talking to one ZooKeeper “server”, which communicates between them), or a
one-to-one relationship (each Solr instance (core) talks to its own ZooKeeper
“server”).

I hope that is clear and an easy question to answer?  Once I understand that, I 
think I can figure this out with what I have found and been given.

Thanks,

Dave.

> On Oct 20, 2023, at 4:47 PM, Shawn Heisey <elyog...@elyograg.org.INVALID> 
> wrote:
> 
> On 10/20/23 13:34, David Filip wrote:
>> Understand about redundancy and needing an odd number of nodes (I’ve used
>> quorum in other (non-Solr) types of clusters, so I get it).
>> So what I’ve done now is installed ZooKeeper on a separate (physical) node
>> (so no longer using ZooKeeper bundled with Solr, since that was causing some
>> confusion).
>> So I’m trying to follow this document regarding how to set up the “Ensemble” 
>> for Solr:
>> Setting Up an External ZooKeeper Ensemble | Apache Solr Reference Guide 6.6 
>> <https://solr.apache.org/guide/6_6/setting-up-an-external-zookeeper-ensemble.html>
>> This includes the following “example” in zoo.cfg:
>> dataDir=/var/lib/zookeeperdata/1
>> clientPort=2181
>> initLimit=5
>> syncLimit=2
>> server.1=localhost:2888:3888
>> server.2=localhost:2889:3889
>> server.3=localhost:2890:3890
>> So assuming that I have three (3x) physical nodes, each running Solr 9.2,
>> and assuming that they are named:
>> solr1.mydomain.com
>> solr2.mydomain.com
>> solr3.mydomain.com
> 
> This is getting into how to configure ZK, which is a completely separate 
> Apache project from Solr.  My info here is from memory.
> 
> Those names will only work if each ZK instance is on the same machine as a 
> Solr instance.  ZK is completely separate from Solr, you do not tell it 
> anything about Solr.  You also need the port numbers.  If ZK will be on the 
> same machines as Solr, I would use something like this, and the ZK config 
> will be identical on all the ZK servers:
> 
> dataDir=/path/to/some/data/directory
> clientPort=2181
> initLimit=5
> syncLimit=2
> server.1=solr1.mydomain.com:2888:3888
> server.2=solr2.mydomain.com:2888:3888
> server.3=solr3.mydomain.com:2888:3888
> 
> You must ensure that the ZK servers can talk to each other on tcp ports 2888 
> and 3888, and that each Solr server can reach all the ZK servers on port 
> 2181.  For most purposes, you do not want to use localhost.
> 
> Each server will need a file with its id number (the N from its server.N
> line). I think it is named "myid" and lives in the data directory, but you
> should check the ZK documentation to make sure.
> 
> The -z option on the solr script would be something like this:
> 
> solr1.mydomain.com:2181,solr2.mydomain.com:2181,solr3.mydomain.com:2181/solr
> 
> For redundancy purposes, every Solr server will need to talk to ALL of the ZK 
> servers, not just one.
> 
> Adding a chroot (which is /solr in my example) is encouraged in case you
> later want to use your ZK install to coordinate software other than Solr, or
> to host multiple SolrCloud clusters.  The Solr reference guide has info about
> how to create the chroot with a 'bin/solr zk' command.
> 
> Thanks,
> Shawn
> 
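For anyone landing on this thread later: a minimal Python sketch of the
three-server layout described above, generating the shared zoo.cfg, each
server’s myid file, and the -z connection string that every Solr node would be
started with. The host names are the hypothetical ones from this thread, the
data directory is a placeholder, and the myid location (a file named myid in
dataDir containing only the server number) is my reading of the ZooKeeper
Administrator's Guide, so check it against the ZK docs for your version.

```python
from pathlib import Path

# Hypothetical hosts from this thread; each runs one ZK server (and one Solr).
ZK_HOSTS = ["solr1.mydomain.com", "solr2.mydomain.com", "solr3.mydomain.com"]
CLIENT_PORT = 2181    # Solr (the ZK client) connects here
PEER_PORT = 2888      # followers connect to the leader here
ELECTION_PORT = 3888  # used for leader election between ZK servers
CHROOT = "/solr"      # optional chroot, as in the example above


def zoo_cfg(data_dir: str, hosts: list[str]) -> str:
    """Return zoo.cfg text; the same file goes on every ZK server."""
    lines = [
        f"dataDir={data_dir}",
        f"clientPort={CLIENT_PORT}",
        "initLimit=5",
        "syncLimit=2",
    ]
    # One server.N line per ZK server, numbered from 1.
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:{PEER_PORT}:{ELECTION_PORT}")
    return "\n".join(lines) + "\n"


def write_myid(data_dir: str, server_id: int) -> Path:
    """Write <data_dir>/myid holding only this server's N from its server.N line."""
    path = Path(data_dir) / "myid"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"{server_id}\n")
    return path


def zk_connect_string(hosts: list[str], chroot: str = CHROOT) -> str:
    """Value for Solr's -z option: ALL ZK servers, chroot appended once at the end."""
    return ",".join(f"{host}:{CLIENT_PORT}" for host in hosts) + chroot


if __name__ == "__main__":
    print(zoo_cfg("/var/lib/zookeeper", ZK_HOSTS))
    print(zk_connect_string(ZK_HOSTS))
```

Here zk_connect_string(ZK_HOSTS) yields
solr1.mydomain.com:2181,solr2.mydomain.com:2181,solr3.mydomain.com:2181/solr,
and that same string is given to every Solr node. So the relationship asked
about is neither many-to-one nor one-to-one: every Solr node knows about every
ZK server.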

