Thanks Shawn!

I have since created three servers (virtual machines running on a cloud 
host), each one running its own instance of both ZooKeeper and Solr, and 
every ZooKeeper/Solr instance configured to know about the others.  Everything 
seems to be working as expected: configuration changes are replicated 
across all three Solr servers, and I can replicate shards across all three.

What is even better is that I can make an update on any one of the three Solr 
servers and search for those changes on the other two.  This is in contrast to 
the old model I found, where all updates were made on a master that replicated 
out to read-only slaves.  So a big improvement over that model (which also 
confused me: even though I had installed Solr 9.2, most of the on-line 
documentation I found was for version 6 or earlier).

I can take one node down, see it as ‘DEAD’, and the other two nodes continue to 
operate (both for updates and reading).  When I bring that node back online, 
within a few seconds, it is automatically updated.  So overall, I think I’m 
good now!

My confusion was partly in trying to figure out how to use the ZooKeeper that 
came bundled with each Solr instance … much easier just to ignore it and 
install ZooKeeper separately … and partly in understanding the one-to-one 
relationship between ZooKeeper and Solr instances.  Every example I found in 
the documentation installed everything — including multiple servers — on the 
same physical node (localhost), which sent me down the wrong path.

The only hiccup I have is an oddity having to do with my servers being 
dual-homed (two network connections):

Each server has a 10.0.2.0/24 Ethernet network connection, as well as a 
10.0.1.0/24 WiFi network connection.  The Ethernet network (10.0.2.0/24) is in 
the hosting center and is what I want the servers to use for communication 
between them, while the WiFi network (10.0.1.0/24) is so that I can access the 
servers from outside the hosting center (e.g., my office):

1. I have defined the Ethernet network addresses (10.0.2.0/24) in all of the 
ZooKeeper instances (zoo.cfg)
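
For reference, the ensemble section of each zoo.cfg looks roughly like this 
(the 10.0.2.x addresses below are illustrative, not my real ones; each 
server's myid file matches its server.N line):

```
# zoo.cfg (sketch; addresses are illustrative)
dataDir=/var/lib/zookeeper
clientPort=2181
# All three ensemble members, on the Ethernet (10.0.2.0/24) network
server.1=10.0.2.11:2888:3888
server.2=10.0.2.12:2888:3888
server.3=10.0.2.13:2888:3888
```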

2. I have defined the Ethernet network addresses (10.0.2.0/24) when starting up 
each Solr instance (-z)
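
Concretely, each Solr instance is started with something like the following 
(addresses illustrative; the -z string lists the full ZooKeeper ensemble, and 
a chroot suffix such as /solr would be optional):

```
# start Solr in cloud mode, pointing at all three ZooKeeper instances
bin/solr start -c -z "10.0.2.11:2181,10.0.2.12:2181,10.0.2.13:2181"
```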

3. ZooKeeper from Solr Admin (Cloud => ZK Status) displays the Ethernet network 
addresses (10.0.2.0/24)

4. However, Solr from Solr Admin (Cloud => Nodes) displays the WiFi network 
addresses (10.0.1.0/24) !!!

So how do I get Solr to use the Ethernet network addresses instead of the WiFi 
network addresses?

I don’t believe this is just a display artifact, because Solr Admin could not 
display the other Solr instances (Cloud => Nodes), or replicate between them, 
until I opened up the firewall on the WiFi network; once I did, everything 
started working.

Ideally, I would like Solr to communicate between nodes on the Ethernet 
network, but also be able to answer queries from either network (Ethernet or 
WiFi).  Is that possible?  Can I control which network interface it uses for 
inter-node communications?  The reasons are twofold:

1. Ethernet network is much faster

2. WiFi network is much less reliable (sporadic outages, slow-downs, relying on 
WiFi extenders that have to be periodically restarted)

In theory I guess I could bind Solr only to the Ethernet network (it currently 
binds to 0.0.0.0, a.k.a. all network interfaces), but then I would need to set 
up a separate proxy from my WiFi network to each of the Solr servers, which I 
would rather not do.  So is it possible to control which network interface Solr 
binds to for inter-server communication, while leaving other network interfaces 
open to queries?
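
In case it helps frame the question: from the comments in the stock 
solr.in.sh, I suspect these two settings are the relevant knobs (untested on 
my end, so treat this as a guess; SOLR_HOST appears to control the address a 
node registers in ZooKeeper for inter-node traffic, while SOLR_JETTY_HOST 
controls the bind address):

```
# solr.in.sh (sketch; address is illustrative, and this is a guess on my part)
SOLR_HOST="10.0.2.11"       # address this node advertises in ZooKeeper (inter-node traffic)
SOLR_JETTY_HOST="0.0.0.0"   # bind to all interfaces so queries still work from the WiFi network
```

If someone can confirm whether that is the right approach, I’d appreciate it.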

Thanks,

Dave.

> On Oct 23, 2023, at 3:18 AM, Shawn Heisey <apa...@elyograg.org.INVALID> wrote:
> 
> On 10/20/2023 5:23 PM, David Filip wrote:
>> From a matter of perspective, however, I think what I am not clear on is 
>> having more than one ZK “server”, and when and why I would need more than 
>> one?
>> Perhaps it is just terminology, but if I have three (3x) Solr instances 
>> (cores) running on three (3x) separate physical servers (different 
>> hardware), and I want to replicate shards between those three, do I have all 
>> hardware), and I want to replicate shards between those three, do I have all 
>> three (3x) Solr instances (cores) talking to the same single (1x) ZooKeeper 
>> “server"?
>> Or if I have three (3x) Solr instances (cores) replicating shards between 
>> them, do I also need three (3x) ZooKeeper “servers”, e.g., server.1, 
>> server.2, server.3, each “server” assigned to one specific Solr instance 
>> (core)?
> 
> You need three ZK "servers" each running on different physical hardware so 
> that ZK has fault tolerance.  This requirement of a three server minimum is 
> inherent in ZK's design and cannot be changed.
> 
> You need two Solr servers minimum so that Solr has fault tolerance.
> 
> You can run Solr on the same hardware as you run ZK, but it is STRONGLY 
> recommended that ZK be a completely separate service from Solr, so that if 
> you restart Solr, ZK does not go down, and vice versa.  For best performance, 
> it is also recommended that ZK's data directory reside on a separate physical 
> storage device from other processes like Solr, but if you have a small 
> SolrCloud cluster and/or fast disks such as SSD, that may not be required.
> 
> ZK servers must all know about each other in order to maintain a coherent 
> cluster.
> 
> Each Solr instance must know about all the ZK servers, which is why the 
> zkhost string must list them all with an optional chroot.  Every Solr 
> instance will maintain connections to all of the ZK servers simultaneously.
> 
> As I noted before, a SolrCloud collection is composed of one or more shards.  
> Each shard is composed of one or more replicas, each of which is a Solr core. 
>  One Solr instance can host many cores.  I would recommend NOT running 
> multiple Solr instances on each machine.
> 
> Thanks,
> Shawn
> 
