On 10/19/23 17:48, David Filip wrote:
I think I am getting confused between differences in Solr versions (most links
seem to talk about Solr 6, and I’ve installed Solr 9), and SolrCloud vs.
Standalone, when searching the ’Net … so I am hoping that someone can point me
towards what I need to do. Apologies in advance for perhaps not using the
correct Solr terminology.
I will describe what I have, and what I want to accomplish, to the best of my
abilities.
I have installed Solr 9.2.1 on two separate physical nodes (different physical
computers). Both are running SolrCloud, with the same (duplicate)
configuration files. Each runs its own local zookeeper and has its own
separate cores. Let’s call them solr1 and solr2. Right now I can index
content on and search each one individually, but they do not know about
each other (which I think is the fundamental problem I am trying to solve).
You need three servers minimum. In the minimal fault-tolerant setup,
two of those will run Zookeeper and Solr, the third will only need to
run Zookeeper. If the third server does not run Solr, it can be a
smaller server than the other two.
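As a minimal sketch of that three-server layout, assuming hypothetical hostnames solr1, solr2 and zk3 and ZooKeeper's default client port 2181, every Solr node would be started with the same ZooKeeper connection string listing all three ensemble members (in solr.in.sh that is the ZK_HOST setting). The /solr chroot and the helper name zk_host are illustrative assumptions, not something Solr requires.

```python
# Hypothetical hostnames: solr1 and solr2 run ZooKeeper and Solr,
# zk3 runs only ZooKeeper.
ENSEMBLE = ["solr1", "solr2", "zk3"]
ZK_CLIENT_PORT = 2181  # ZooKeeper's default client port


def zk_host(hosts, port=ZK_CLIENT_PORT, chroot="/solr"):
    """Build a ZooKeeper connection string such as 'a:2181,b:2181/solr'.

    The chroot is optional (pass None to omit it). Every Solr node in
    the cluster must use exactly the same string so they all join the
    same SolrCloud cluster.
    """
    members = ",".join(f"{h}:{port}" for h in hosts)
    return members + (chroot or "")


if __name__ == "__main__":
    # Value to use for ZK_HOST on BOTH Solr servers.
    print(zk_host(ENSEMBLE))
```

The point of building one string and reusing it is that two Solr nodes pointing at different ensembles (as in the original setup) form two separate clusters.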
My goal is to replicate content from one to the other, so that I can take one down
(e.g., solr1) and still search current collections (e.g., on solr2). In the Solr
Admin web page, I can select Collections => {collection}, click on a Shard, and I
see the [+ add replica] button, but I can’t add a new replica on the “other” node,
because only the local node appears (e.g., 10.0.x.xxx:8983_solr). What I think I need
to do is add the nodes (solr1 and solr2) together (?) so that I can add a new replica
on the “other” node.
This is an inherent capability of SolrCloud. One collection consists of
one or more shards, and each shard consists of one or more replicas.
When there is more than one replica, one of them will be elected leader.
All the Solr servers must talk to the same ZK ensemble in order to form
a SolrCloud cluster. Zookeeper should run as its own process, not the
embedded ZK server that Solr provides, but dedicated hosts for ZK are
not required unless the SolrCloud cluster is really big.
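Once both Solr servers talk to the same ZK ensemble, the second node should show up in the Admin UI's node list. The same operation is also available through the Collections API ADDREPLICA action (parameters collection, shard and node). The sketch below only builds the request URL; the collection name "mycollection", shard "shard1" and node names are hypothetical placeholders, and the node name follows the host:port_solr form the Admin UI displays.

```python
from urllib.parse import urlencode


def add_replica_url(solr_base, collection, shard, node):
    """Build a Collections API ADDREPLICA request URL.

    solr_base: base URL of any node in the cluster, e.g. "http://solr1:8983/solr"
    node:      target node name as shown in the Admin UI, e.g. "solr2:8983_solr"
    """
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,
    }
    # urlencode percent-encodes the ':' in the node name; Solr decodes it.
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical names; send the resulting URL with curl or a browser.
    print(add_replica_url("http://solr1:8983/solr", "mycollection",
                          "shard1", "solr2:8983_solr"))
```

Any node in the cluster can receive the request, because the cluster state it acts on lives in ZooKeeper rather than on a single Solr server.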
I’ve found references that tell me I need an odd number of zookeeper nodes (for
quorum), so I’m not sure if I want both nodes to share a single zookeeper
instance? If I did do that, and let’s say that I pointed solr2 to zookeeper on
solr1, could I still search against solr2 if solr1 zookeeper was down? I would
think not, but I’m not sure.
Here is the situation with ZK ensemble fault tolerance:
2 servers can sustain zero failures.
3 servers can sustain one failure.
4 servers can sustain one failure.
5 servers can sustain two failures.
6 servers can sustain two failures.
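Those numbers follow from ZooKeeper's majority-quorum rule: an ensemble of n servers keeps working only while a strict majority (n // 2 + 1) is up, so it can lose n minus that many. A small Python check, with helper names of my own choosing, reproduces the list:

```python
def quorum_size(n):
    """Smallest strict majority of an n-server ZooKeeper ensemble."""
    return n // 2 + 1


def tolerated_failures(n):
    """Servers that can fail while a majority is still running."""
    return n - quorum_size(n)


if __name__ == "__main__":
    for n in range(2, 7):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"can sustain {tolerated_failures(n)} failure(s)")
```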
Additional note: In geographically diverse setups, it is not possible
to have a fault tolerant ZK install with only two datacenters or
availability zones. You need three.
This is why an odd number is recommended: adding one more node to an
odd-sized ensemble does not increase the number of failures it can sustain.
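The same majority rule explains the two-datacenter note: with only two zones, one zone must hold at least half of the servers, so losing that zone always leaves too few for a majority. A rough sketch that checks a placement against the loss of any single zone, using made-up zone names and server counts:

```python
def survives_any_zone_loss(servers_per_zone):
    """Return True if the ensemble keeps a majority after losing any one zone.

    servers_per_zone maps a (hypothetical) zone name to the number of
    ZooKeeper servers placed there.
    """
    total = sum(servers_per_zone.values())
    majority = total // 2 + 1
    return all(total - lost >= majority for lost in servers_per_zone.values())


if __name__ == "__main__":
    layouts = {
        "two zones, 2+1": {"dc1": 2, "dc2": 1},
        "two zones, 3+2": {"dc1": 3, "dc2": 2},
        "three zones, 1+1+1": {"dc1": 1, "dc2": 1, "dc3": 1},
        "three zones, 2+2+1": {"dc1": 2, "dc2": 2, "dc3": 1},
    }
    for name, layout in layouts.items():
        print(f"{name}: {survives_any_zone_loss(layout)}")
```

Both two-zone layouts fail the check and both three-zone layouts pass, which matches the advice that three zones are needed.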
If ZK has too many failures, SolrCloud will switch to read-only mode and
the node you contact will not be aware of other Solr servers going down
or coming up.
I would recommend that any new Solr install, especially one that needs
fault tolerance, run SolrCloud rather than standalone mode.
Thanks,
Shawn