Re: knn query parser, number of results and filtering by score

2023-10-20 Thread Alessandro Benedetti
I agree, you can definitely raise a bug for the debug output. If you do me a
favour and also test in non-Cloud mode, it will help us understand whether
it's a Solr bug or a Lucene bug.

I also agree with your second point about the functional expectations; that
one is very minor, though. You can create the ticket and contribute a fix if
you like, happy to review it!
--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io
LinkedIn | Twitter | Youtube | Github


On Thu, 19 Oct 2023 at 17:51, Mirko Sertic  wrote:

> I've prepared a testcase. Given the following documents, where
> TESTEMBEDDING_EU_3 is a DenseVectorField with length 3 and the euclidean
> distance function. They are written to a collection made of two shards
> with no further routing strategy, so they should be more or less evenly
> distributed between the two shards:
>
> {
>id: 'Position1',
>TESTEMBEDDING_EU_3: [0, 0, 0]
> }
> {
>id: 'Position2',
>TESTEMBEDDING_EU_3: [0.1, 0.1, 0.1]
> }
> {
>id: 'Position3',
>TESTEMBEDDING_EU_3: [0.2, 0.2, 0.2]
> }
> {
>id: 'Position4',
>TESTEMBEDDING_EU_3: [0.3, 0.3, 0.3]
> }
> {
>id: 'Position5',
>TESTEMBEDDING_EU_3: [0.4, 0.4, 0.4]
> }
> {
>id: 'Position6',
>TESTEMBEDDING_EU_3: [0.5, 0.5, 0.5]
> }
> {
>id: 'Position7',
>TESTEMBEDDING_EU_3: [0.6, 0.6, 0.6]
> }
> {
>id: 'Position8',
>TESTEMBEDDING_EU_3: [0.7, 0.7, 0.7]
> }
> {
>id: 'Position9',
>TESTEMBEDDING_EU_3: [0.8, 0.8, 0.8]
> }
> {
>id: 'Position10',
>TESTEMBEDDING_EU_3: [0.9, 0.9, 0.9]
> }
> {
>id: 'Position11',
>TESTEMBEDDING_EU_3: [1.0, 1.0, 1.0]
> }
>
> Now I'll do a {!knn f=TESTEMBEDDING_EU_3  topK=3}[1.0,1.0,1.0] query.
> I'd expect a result with 3 documents: id:Position11 should be an exact
> match, and the nearest neighbors should be id:Position10 and
> id:Position9. I'd also expect the explain output to mark these
> three as part of the topK=3. Instead, I get the following search result:
>
> {
>"responseHeader": {
>  "zkConnected": true,
>  "status": 0,
>  "QTime": 35
>},
>"response": {
>  "numFound": 6,
>  "start": 0,
>  "maxScore": 1.0,
>  "numFoundExact": true,
>  "docs": [
>{
>  "id": "Position11",
>  "TESTEMBEDDING_3": [
>"1.0",
>"1.0",
>"1.0"
>  ],
>  "[shard]":
> "
> http://fusion-integ-solr-search-200gb-0.fusion-integ-solr-search-200gb-headless:8983/solr/suchpool_atlas_2023_10_08_shard1_replica_p9/|http://fusion-integ-solr-analytics-200gb-1.fusion-integ-solr-analytics-200gb-headless:8983/solr/suchpool_atlas_2023_10_08_shard1_replica_t7/|http://fusion-integ-solr-search-200gb-1.fusion-integ-solr-search-200gb-headless:8983/solr/suchpool_atlas_2023_10_08_shard1_replica_p11/|http://fusion-integ-solr-analytics-200gb-0.fusion-integ-solr-analytics-200gb-headless:8983/solr/suchpool_atlas_2023_10_08_shard1_replica_t5/
> ",
>  "[explain]": "0.0 = not in top 3\n",
>  "score": 1.0
>},
>{
>  "id": "Position10",
>  "TESTEMBEDDING_3": [
>"0.9",
>"0.9",
>"0.9"
>  ],
>  "[shard]":
> "
> http://fusion-integ-solr-search-200gb-0.fusion-integ-solr-search-200gb-headless:8983/solr/suchpool_atlas_2023_10_08_shard1_replica_p9/|http://fusion-integ-solr-analytics-200gb-1.fusion-integ-solr-analytics-200gb-headless:8983/solr/suchpool_atlas_2023_10_08_shard1_replica_t7/|http://fusion-integ-solr-search-200gb-1.fusion-integ-solr-search-200gb-headless:8983/solr/suchpool_atlas_2023_10_08_shard1_replica_p11/|http://fusion-integ-solr-analytics-200gb-0.fusion-integ-solr-analytics-200gb-headless:8983/solr/suchpool_atlas_2023_10_08_shard1_replica_t5/
> ",
>  "[explain]": "0.0 = not in top 3\n",
>  "score": 0.97087383
>},
>{
>  "id": "Position9",
>  "TESTEMBEDDING_3": [
>"0.8",
>"0.8",
>"0.8"
>  ],
>  "[shard]":
> "
> http://fusion-integ-solr-search-200gb-0.fusion-integ-solr-search-200gb-headless:8983/solr/suchpool_atlas_2023_10_08_shard2_replica_p17/|http://fusion-integ-solr-search-200gb-1.fusion-integ-solr-search-200gb-headless:8983/solr/suchpool_atlas_2023_10_08_shard2_replica_p21/|http://fusion-integ-solr-analytics-200gb-0.fusion-integ-solr-analytics-200gb-headless:8983/solr/suchpool_atlas_2023_10_08_shard2_replica_t13/|http://fusion-integ-solr-analytics-200gb-1.fusion-integ-solr-analytics-200gb-headless:8983/solr/suchpool_atlas_2023_10_08_shard2_replica_t15/
> ",
>  "[explain]": "0.0 = not in top 3\n",
> 

Re: Trouble with ZK Status and TLS

2023-10-20 Thread Jamie Gruener
Thank you! Bummed that this isn't working but very relieved that I can stop 
trying to fix it.

Thanks,

--Jamie



On 10/19/23, 4:46 PM, "Jan Høydahl" <jan@cominvent.com> wrote:



The Admin UI ZK status screen is not intended for the SSL connection to ZK, 
since status is not supported on that protocol. Instead, we need to add 
support for querying the ZK AdminServer to get the same information. 
Contributions are welcome.


Jan


> 19. okt. 2023 kl. 22:29 skrev Jamie Gruener:
>
> We’re working on standing up a new Solr 9.4.0 cluster with ZooKeeper 3.8.3 
> ensemble. We’ve configured mTLS for authentication, authorization, and comms 
> for client <-> solr; TLS for solr <-> solr intra-cluster comms, and TLS for 
> zk <-> zk intra-ensemble comms.
>
> Where we are stuck is at the TLS configuration for solr<->zk comms. At least 
> some parts are working since we can configure the url scheme and the 
> security.json file, but when we try to browse the Solr UI to get ZK Status it 
> doesn’t populate with any data. On the ZooKeeper side, we see these errors:
>
> 2023-10-19 16:08:06,403 [myid:] - ERROR 
> [nioEventLoopGroup-7-1:o.a.z.s.NettyServerCnxnFactory$CertificateVerifier@468]
>  - Unsuccessful handshake with session 0x0
>
> From our testing with the `solr zk cp` command (used to upload the 
> security.json file), we’re pretty sure that the problem is that Solr isn’t 
> trying to establish a TLS connection to satisfy the ZK Status request.
>
> This ticket states that the TLS configuration works for at least one person 
> (https://issues.apache.org/jira/browse/SOLR-16115), but I can’t find any more 
> documentation about configuring this.
>
> Any hints? Anyone get this working?
>
> Thanks,
>
> --Jamie







Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-20 Thread Dmitri Maziuk

On 10/19/23 18:48, David Filip wrote:


My goal is to replicate content from one to the other, so that I can take one 
down (e.g., solr1) and still search current collections (e.g., on solr2).


You need a proxy host, it can be anything from apache to F5, configured 
to pass requests to Solr nodes, based on some criteria.


In the active-passive, blue-green, or whatever you call it, 
configuration, you don't need zookeeper or anything shared on the 
backend (there is an argument for having the backend nodes fully 
independent).


If you RTFM: see "Query Fault Tolerance" in 
https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-distributed-requests.html 
-- even if you use SolrCloud you still need a proxy for what you want 
done. (Unless your client application knows how to talk to ZooKeeper and 
can use it as the proxy.)
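To make the proxy idea concrete, here is a minimal sketch using Apache httpd's mod_proxy_balancer. The hostnames and the hot-standby layout are hypothetical, and the relevant proxy modules must be enabled:

```apache
# Hypothetical active-passive pair: solr2 is a hot standby (status=+H)
# that only receives traffic when solr1 is down.
# Requires mod_proxy, mod_proxy_http, mod_proxy_balancer, mod_lbmethod_byrequests.
<Proxy "balancer://solrcluster">
    BalancerMember "http://solr1.example.com:8983"
    BalancerMember "http://solr2.example.com:8983" status=+H
</Proxy>
ProxyPass        "/solr" "balancer://solrcluster/solr"
ProxyPassReverse "/solr" "balancer://solrcluster/solr"
```

Clients then query the proxy hostname instead of either Solr node directly.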


As an aside, it's interesting that Apache httpd does not have a 
mod_zookeeper among its proxy modules.


Dima



Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-20 Thread David Filip
Dima,

Thanks for the reply!  However, this does not quite answer my question, as far 
as I can tell.  I am very familiar with network proxies, and have both Nginx 
proxy (externally facing) and Apache (internal load balancing) on my network.  
I am comfortable with how to distribute search queries across nodes.

My fundamental question — and sorry if this was not clear — is how do I keep 
the indices (collections) in sync across nodes?  Put another way, if I update 
shard1 on one node, how do I get the other node(s) automatically updated?  The 
goal is to be able to do indexing on a particular node, and have any updates 
propagate across the other nodes, so that the indices (collections) are 
identical (hopefully within a few seconds) across all of the nodes.

Of course, one way is to have a shared filesystem to share the index 
(collection) data files across all of the nodes … but then the shared 
filesystem becomes a single point of failure.

It appears that Solr knows how to replicate the indices (collections) across 
nodes, so that there is no single point of failure.  This is what I am trying 
to figure out.

Thanks,

Dave.




RE: Newbie Help: Replicating Between Two SolrCloud Instances (Solr9.2.1)

2023-10-20 Thread ufuk yılmaz
Hi Dave,

Solr knows how to replicate the index across nodes like you said, but in order 
to do that all SolrCloud nodes should connect to the same Zookeeper cluster, or 
else how would they know about each other?

You can make Zookeeper cluster distributed across N nodes so it’s not a single 
point of failure too.

But if I understand correctly from your first post, you don’t want to use the 
same Zookeeper?

~~ufuk yilmaz





Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-20 Thread Dmitri Maziuk

On 10/20/23 11:27, David Filip wrote:

Dima,

Thanks for the reply!  However, this does not quite answer my question, as far 
as I can tell.



My fundamental question — and sorry if this was not clear — is how to I keep 
the indices (collections) in-sync across nodes?


If you post updates to the running index, you need ZooKeeper and a 3-node 
cluster. If you have static content (e.g. loaded at dark o'clock from an 
external source) with no on-the-fly updates, you can use standalone nodes.


Dima



Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-20 Thread Meghan Boyd
>On Thu, Oct 19, 2023 at 7:50 PM David Filip wrote:
>Although from what I’ve read, I think (?) that with SolrCloud that
master/slave concept goes away, and one node now becomes a leader?  But I
may be confused about that ...

You can read about why this change was made in SOLR-14702. This is a change
in terminology but not in functionality, so if you are trying to equate older
documentation with newer documentation you can keep this in mind. FWIW I
also highly recommend Marcus Eagan's talk regarding this change from
Haystack 2021: https://www.youtube.com/watch?v=klutmvleVTA


Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr9.2.1)

2023-10-20 Thread David Filip
Thanks - yes, that is what I am trying to do.  However, I am not clear on how 
to add nodes to the Zookeeper included with Solr.

I think the problem I am having is that the zoo.cfg file include with Solr:

$ cat server/solr/zoo.cfg 

Does not have any ’server.x’ lines, and I’m having trouble figuring out 
how to get different Solr nodes to connect to Zookeeper running on a different 
node.  Do I just specify a different Zookeeper node and port when I start Solr 
via a command-line parameter (-z {server}:{port})?  So no credentials, and I 
assume network-level security?

Perhaps I need to ignore the Zookeeper that comes bundled with Solr and install 
it separately on its own?  Then I am not entirely clear on how I need to 
configure Zookeeper and Solr on each node to work together?

Basically, I’ve found bits and pieces about how to install Solr with its own 
Zookeeper (SolrCloud), and how to install and configure Zookeeper on its own 
(completely separate from anything to do with Solr), and I even found a high 
level page in the Solr documentation that starts with "Although Solr comes 
bundled with Apache ZooKeeper, you should consider yourself discouraged from 
using this internal ZooKeeper in production.”, but it seems a bit light on how 
I configure the “ZooKeeper Ensemble” to work with Solr (I think it assumes I 
have some familiarity with Zookeeper already).

So I probably have read all the bits that I need, and I have probably read more 
bits than I need, some of which I can ignore, but am trying to figure out how 
it all nicely fits together.

Originally yes, I was thinking “If Zookeeper is included with Solr already, why 
do I need to install it separately”, but perhaps that is what is confusing me?

Does all of that make any sense?

Thanks,

Dave.




Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-20 Thread Shawn Heisey

On 10/19/23 17:48, David Filip wrote:

I think I am getting confused between differences in Solr versions (most links 
seem to talk about Solr 6, and I’ve installed Solr 9), and SolrCloud vs. 
Standalone, when searching the ’Net …  so I am hoping that someone can point me 
towards what I need to do.  Apologies in advance for perhaps not using the 
correct Solr terminology.

I will describe what I have, and what I want to accomplish, to the best of my 
abilities.

I have installed Solr 9.2.1 on two separate physical nodes (different physical 
computers).  Both are running SolrCloud, and are running with the same 
(duplicate) configuration files.  Both are running their own local zookeeper, 
and are separate cores.  Let’s call them solr1 and solr2.  Right now I can 
index content on and search each one individually, but they do not know about 
each other (which is I think the fundamental problem I am trying to solve).


You need three servers minimum.  In the minimal fault-tolerant setup, 
two of those will run Zookeeper and Solr, the third will only need to 
run Zookeeper.  If the third server does not run Solr, it can be a 
smaller server than the other two.



My goal is to replicate content from one to the other, so that I can take one down 
(e.g., solr1) and still search current collections (e.g., on solr2).  When I run Solr 
Admin web page, I can select: Collections=> {collection}, click on a Shard, and I 
see the [+ add replica] button, but I can’t add a new replica on the “other" node, 
because only the local node appears (e.g., 10.0.x.xxx:8983_solr).  What I think I need 
to do is add the nodes (solr1 and solr2) together (?) so that I can add a new replica 
on the “other” node.


This is an inherent capability of SolrCloud.  One collection consists of 
one or more shards, and each shard consists of one or more replicas. 
When there is more than one replica, one of them will be elected leader.


All the Solr servers must talk to the same ZK ensemble in order to form 
a SolrCloud cluster.  Zookeeper should run as its own process, not the 
embedded ZK server that Solr provides, but dedicated hosts for ZK are 
not required unless the SolrCloud cluster is really big.
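As an illustration of "all Solr servers must talk to the same ZK ensemble" (the hostnames here are hypothetical), every node is started with the identical -z connection string; an optional chroot suffix keeps Solr's data namespaced inside ZK:

```shell
# Hypothetical ZK hosts; every Solr node gets the same -z string.
ZK_HOSTS="zk1.mydomain.com zk2.mydomain.com zk3.mydomain.com"
ZK_STRING=$(printf '%s:2181,' $ZK_HOSTS)
ZK_STRING="${ZK_STRING%,}/solr"   # strip trailing comma, add optional chroot
echo "$ZK_STRING"
# On each node:  bin/solr start -cloud -z "$ZK_STRING"
```

Listing every ZK host in the string is what lets a Solr node stay connected when one ZK server is down.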



I’ve found references that tell me I need an odd number of zookeeper nodes (for 
quorum), so I’m not sure if I want both nodes to share a single zookeeper 
instance?  If I did do that, and let’s say that I pointed solr2 to zookeeper on 
solr1, could I still search against solr2 if solr1 zookeeper was down?  I would 
think not, but I’m not sure.


Here is the situation with ZK ensemble fault tolerance:

2 servers can sustain zero failures.
3 servers can sustain one failure.
4 servers can sustain one failure.
5 servers can sustain two failures.
6 servers can sustain two failures.

Additional note:  In geographically diverse setups, it is not possible 
to have a fault tolerant ZK install with only two datacenters or 
availability zones.  You need three.


This is why an odd number is recommended -- because adding one more node 
does not provide any additional fault tolerance.
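The table above is just the majority-quorum rule: an ensemble of n servers stays available while a strict majority is up, so it tolerates floor((n-1)/2) failures. A quick sketch:

```shell
# Majority quorum: an ensemble of n ZK servers tolerates floor((n-1)/2)
# failed servers, which is why going from 3 to 4 (or 5 to 6) buys nothing.
tolerable() { echo $(( ($1 - 1) / 2 )); }
for n in 2 3 4 5 6; do
  echo "$n servers can sustain $(tolerable $n) failure(s)"
done
```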


If ZK has too many failures, SolrCloud will switch to read-only mode and 
the node you contact will not be aware of other Solr servers going down 
or coming up.


I would recommend that any new Solr install, especially if you want 
fault tolerance, should run SolrCloud, not standalone mode.


Thanks,
Shawn



Re: Remove duplicates in destination of copy field

2023-10-20 Thread Chris Hostetter


copyField -- at a schema level -- is a very low-level operation that 
happens at the moment the documents are being added to the index (long 
after the update processor chains are run).

More complicated logic around copying values from one field to another, 
as part of an update processor chain, can be done using 
CloneFieldUpdateProcessorFactory.


-Hoss
http://www.lucidworks.com/


Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-20 Thread David Filip
Shawn,

Understand about redundancy and needing an odd number of nodes (I’ve used 
quorum in other (non-Solr) type of clusters, so I get it).

So what I’ve done now is installed ZooKeeper on a separate (physical) node (so 
no longer using ZooKeeper bundled with Solr, since that was causing come 
confusion).

So I’m trying to follow this document regarding how to set up the “Ensemble” 
for Solr:

https://solr.apache.org/guide/6_6/setting-up-an-external-zookeeper-ensemble.html

This includes the following “example” in zoo.cfg:

dataDir=/var/lib/zookeeperdata/1
clientPort=2181
initLimit=5
syncLimit=2
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

So assuming that I have three (3x) physical nodes — each running Solr 9.2 — and 
assuming that they are named:

solr1.mydomain.com 
solr2.mydomain.com 
solr3.mydomain.com 

I am assuming that in zoo.cfg I will have:

server.1=solr1.mydomain.com
server.2=solr2.mydomain.com
server.3=solr3.mydomain.com

But do I also have three (3x) separate zoo.cfg files, or a single zoo.cfg file? 
 This document is kinda implying - I think? - that I need to create three 
separate copies in (assuming I’ve installed ZooKeeper in /opt/zookeeper):

/opt/zookeeper/conf/zoo1.cfg
/opt/zookeeper/conf/zoo2.cfg
/opt/zookeeper/conf/zoo3.cfg

But is there also still just a /opt/zookeeper/conf/zoo.cfg as well?  And do all 
of the configuration files contain the same thing?

I’m not sure if this is more of a ZooKeeper question, or more of a Solr 
question, but I’m a bit confused nonetheless as to how this all fits together?

As far as pointing each Solr instance, on each physical Solr node, to 
ZooKeeper, I am assuming that I just need to start Solr on each with (assuming 
my ZooKeeper node is zookeeper.mydomain.com ):

bin/solr start -e cloud -z zookeeper.mydomain.com:2181 -noprompt

Or is there anything else that I need to do on each Solr node?

Thanks in advance for any clarification.

Regards,

Dave.


Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-20 Thread Andy C
Hi Dave,

The zoo.cfg does not reference Solr at all.

Each Zookeeper instance has 3 ports of note:

   - The "client port" that accepts requests from external clients (in this
   case Solr)
   - Two ports used for internal zookeeper to zookeeper communication

The client port is configured by the "clientPort" entry in the zoo.cfg. The
"server.N" entries both configure what ports this Zookeeper instance will
listen on for internal communication, and configure what host and ports it
will use to communicate with the other Zookeeper instances. In this
example, there are 3 Zookeeper instances, all running on the same machine
(not a real world case).

So if the current Zookeeper instance is instance "1" (as configured in the
"myid" file), it will listen on ports 2888 and 3888 for internal zk to zk
requests. And expects the other two Zookeeper instances to be running on
localhost listening on ports 2889|3889 and 2890|3890 respectively.

On the Solr side, you have to configure the hostname and 'client port' of
each of the Zookeeper instances.

Hope this helps.
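To make that concrete (paths and hostnames are hypothetical, with demo files under /tmp): the three servers share one identical zoo.cfg, and each server's identity comes only from the myid file in its dataDir:

```shell
# The SAME zoo.cfg is deployed to every ZK server.
mkdir -p /tmp/zk-demo/conf /tmp/zk-demo/data
cat > /tmp/zk-demo/conf/zoo.cfg <<'EOF'
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=solr1.mydomain.com:2888:3888
server.2=solr2.mydomain.com:2888:3888
server.3=solr3.mydomain.com:2888:3888
EOF
# Only this file differs per host: on solr1.mydomain.com it contains 1,
# on solr2.mydomain.com it contains 2, and so on.
echo 1 > /tmp/zk-demo/data/myid
```

Note that, unlike the single-machine example in the old guide, real hosts can all reuse ports 2888:3888 because the instances no longer collide on one machine.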






Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-20 Thread Shawn Heisey

On 10/20/23 13:34, David Filip wrote:
Understand about redundancy and needing an odd number of nodes (I’ve 
used quorum in other (non-Solr) types of clusters, so I get it).


So what I’ve done now is installed ZooKeeper on a separate (physical) 
node (so no longer using ZooKeeper bundled with Solr, since that was 
causing some confusion).


So I’m trying to follow this document regarding how to set up the 
“Ensemble” for Solr:


Setting Up an External ZooKeeper Ensemble | Apache Solr Reference Guide 
6.6 (solr.apache.org)
This includes the following “example” in zoo.cfg:

dataDir=/var/lib/zookeeperdata/1
clientPort=2181
initLimit=5
syncLimit=2
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890



So assuming that I have three (3x) physical nodes — each running Solr 
9.2 — and assuming that they are named:


solr1.mydomain.com 
solr2.mydomain.com 
solr3.mydomain.com 


This is getting into how to configure ZK, which is a completely separate 
Apache project from Solr.  My info here is from memory.


Those names will only work if each ZK instance is on the same machine as 
a Solr instance.  ZK is completely separate from Solr, you do not tell 
it anything about Solr.  You also need the port numbers.  If ZK will be 
on the same machines as Solr, I would use something like this, and the 
ZK config will be identical on all the ZK servers:


dataDir=/path/to/some/data/directory
clientPort=2181
initLimit=5
syncLimit=2
server.1=solr1.mydomain.com:2888:3888
server.2=solr2.mydomain.com:2888:3888
server.3=solr3.mydomain.com:2888:3888

You must ensure that the ZK servers can talk to each other on tcp ports 
2888 and 3888, and that each Solr server can reach all the ZK servers on 
port 2181.  For most purposes, you do not want to use localhost.


Each server will have a file with its id number.  I think it is named 
"myid" in the data directory, but you should check ZK documentation to 
make sure.
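A minimal sketch of that per-server step, using a hypothetical data 
directory (substitute the dataDir value from your zoo.cfg, and check the 
ZK documentation for the exact filename):

```shell
# Hypothetical data dir; use the dataDir value from your zoo.cfg.
ZK_DATADIR="./zookeeper-data"
mkdir -p "$ZK_DATADIR"

# On the host listed as server.1 (solr1.mydomain.com) the id is 1;
# write 2 and 3 on the other two servers respectively.
echo 1 > "$ZK_DATADIR/myid"

cat "$ZK_DATADIR/myid"
```

Each server gets a different number, matching its server.N entry in zoo.cfg.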


The -z option on the solr script would be something like this:

solr1.mydomain.com:2181,solr2.mydomain.com:2181,solr3.mydomain.com:2181/solr

For redundancy purposes, every Solr server will need to talk to ALL of 
the ZK servers, not just one.


Adding a chroot (which is /solr in my example) is encouraged just in 
case you might want to use your ZK install to coordinate software other 
than Solr or for multiple SolrCloud clusters.  The Solr reference guide 
has info about how to create the chroot with a 'bin/solr zk' command.
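Putting those pieces together, a sketch of the commands involved (the 
'mkroot' subcommand name is from memory - verify it with 'bin/solr zk' on 
your version):

```shell
# Full ensemble connect string, with the /solr chroot appended once at the end.
ZK_HOST="solr1.mydomain.com:2181,solr2.mydomain.com:2181,solr3.mydomain.com:2181/solr"

# One-time: create the chroot znode (the ensemble must already be running):
#   bin/solr zk mkroot /solr -z solr1.mydomain.com:2181,solr2.mydomain.com:2181,solr3.mydomain.com:2181
# Then start every Solr node against the whole ensemble:
#   bin/solr start -cloud -z "$ZK_HOST"

echo "$ZK_HOST"
```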


Thanks,
Shawn



[Operator] [ANNOUNCE] Apache Solr Operator v0.8.0 released

2023-10-20 Thread Jason Gerlowski
The Apache Solr PMC is pleased to announce the release of the Apache
Solr Operator v0.8.0.

The Apache Solr Operator is a safe and easy way of managing a Solr
ecosystem in Kubernetes.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below. The release is
available for immediate download at:

  

### Solr Operator v0.8.0 Release Highlights:

* The minimum supported version of Solr has been set to Solr 8.11
* The minimum Kubernetes version supported is now v1.22
* Managed scale up and scale down are now supported for SolrClouds.
* By default, when scaling down a SolrCloud, replicas will be
migrated off Pods before they are deleted.
* By default, when scaling up a SolrCloud, replicas will be
balanced across all Pods after the SolrCloud has been scaled up. (Only
supported for Solr 9.3+)
* SSL bugs with Solr 9 have been fixed, and v0.8.0 will successfully
support SSL for Solr 8.11 and 9.4+
* Solr 8.11 features are now supported by default, such as
maxBooleanClauses, metrics disabling, health endpoint for
readinessCheck
* Keystore/Truststore passwords can be explicitly set in the SolrCloud
CRD for mountedDir SSL. This enables the use of the CertManager CSI
Driver with Solr.
* Rolling Updates for SolrClouds using ephemeral storage are now safer
and replicas are balanced at the end of the operation to ensure
optimal resource utilization.
* Replica balancing is only supported when Solr 9.3+ is used.
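For reference, managed scaling is driven by the replicas field on the 
SolrCloud resource; a minimal sketch (field names follow the v1beta1 CRD, 
the image tag is an example):

```yaml
apiVersion: solr.apache.org/v1beta1
kind: SolrCloud
metadata:
  name: example
spec:
  replicas: 5        # changing this up/down triggers the managed scale logic
  solrImage:
    tag: "9.4"       # replica balancing on scale-up requires Solr 9.3+
```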

A summary of important changes is published in the documentation at:

  

For the most exhaustive list, see the change log on ArtifactHub or
view the git history in the solr-operator repo.

  


  


Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-20 Thread David Filip
Shawn,

Thanks for this.  I will try to dig further into the ZooKeeper documentation.

What I am still not clear on, however, is having more than one ZK “server” - 
when and why would I need more than one?

Perhaps it is just terminology, but if I have three (3x) Solr instances (cores) 
running on three (3x) separate physical servers (different hardware), and I 
want to replicate shards between those three, do I have all three (3x) Solr 
instances (cores) talking to the same single (1x) ZooKeeper “server”?

Or if I have three (3x) Solr instances (cores) replicating shards between them, 
do I also need three (3x) ZooKeeper “servers”, e.g., server.1, server.2, 
server.3, each “server” assigned to one specific Solr instance (core)?

So while I understand this might not be the place to talk about configuring 
ZooKeeper per se, if it’s not too much trouble, can you please clarify whether 
there is a many-to-one relationship between Solr and ZooKeeper (many Solr cores 
talking to one ZooKeeper “server”, which communicates between them), or a 
one-to-one relationship (each Solr instance (core) talks to one ZooKeeper 
“server”).

I hope that is clear and an easy question to answer?  Once I understand that, I 
think I can figure this out with what I have found and been given.

Thanks,

Dave.

> On Oct 20, 2023, at 4:47 PM, Shawn Heisey  
> wrote:
> 
> On 10/20/23 13:34, David Filip wrote:
>> Understand about redundancy and needing an odd number of nodes (I’ve used 
>> quorum in other (non-Solr) type of clusters, so I get it).
>> So what I’ve done now is installed ZooKeeper on a separate (physical) node 
>> (so no longer using ZooKeeper bundled with Solr, since that was causing some 
>> confusion).
>> So I’m trying to follow this document regarding how to set up the “Ensemble” 
>> for Solr:
>> Setting Up an External ZooKeeper Ensemble | Apache Solr Reference Guide 6.6 
>> (solr.apache.org)
>> 
>> This includes the following “example” in zoo.cfg:
>> dataDir=/var/lib/zookeeperdata/1
>> clientPort=2181
>> initLimit=5
>> syncLimit=2
>> server.1=localhost:2888:3888
>> server.2=localhost:2889:3889
>> server.3=localhost:2890:3890
>> So assuming that I have three (3x) physical nodes — each running Solr 9.2 — 
>> and assuming that they are named:
>> solr1.mydomain.com 
>> solr2.mydomain.com 
>> solr3.mydomain.com 
> 
> This is getting into how to configure ZK, which is a completely separate 
> Apache project from Solr.  My info here is from memory.
> 
> Those names will only work if each ZK instance is on the same machine as a 
> Solr instance.  ZK is completely separate from Solr, you do not tell it 
> anything about Solr.  You also need the port numbers.  If ZK will be on the 
> same machines as Solr, I would use something like this, and the ZK config 
> will be identical on all the ZK servers:
> 
> dataDir=/path/to/some/data/directory
> clientPort=2181
> initLimit=5
> syncLimit=2
> server.1=solr1.mydomain.com:2888:3888
> server.2=solr2.mydomain.com:2888:3888
> server.3=solr3.mydomain.com:2888:3888
> 
> You must ensure that the ZK servers can talk to each other on tcp ports 2888 
> and 3888, and that each Solr server can reach all the ZK servers on port 
> 2181.  For most purposes, you do not want to use localhost.
> 
> Each server will have a file with its id number.  I think it is named "myid" 
> in the data directory, but you should check ZK documentation to make sure.
> 
> The -z option on the solr script would be something like this:
> 
> solr1.mydomain.com:2181,solr2.mydomain.com:2181,solr3.mydomain.com:2181/solr
> 
> For redundancy purposes, every Solr server will need to talk to ALL of the ZK 
> servers, not just one.
> 
> Adding a chroot (which is /solr in my example) is encouraged just in case you 
> might want to use your ZK install to coordinate software other than Solr or 
> for multiple SolrCloud clusters.  The Solr reference guide has info about how 
> to create the chroot with a 'bin/solr zk' command.
> 
> Thanks,
> Shawn
> 



RE: Newbie Help: Replicating Between Two SolrCloud Instances (Solr9.2.1)

2023-10-20 Thread ufuk yılmaz
To answer simply, Zookeeper and Solr are just two different distributed 
applications. Zookeeper is used in a lot of different places where 
synchronization between distributed systems is needed. If you need to 
coordinate data between your own application instances you can use it too.

SolrCloud is a distributed system too; it just happens to use Zookeeper 
for coordination between its nodes. I guess it could have used something else 
instead, if the Solr developers had wanted to.

So fault tolerance of Zookeeper and fault tolerance of SolrCloud are two 
different subjects. It’s like your database and your application. You can 
distribute your database to many different places to have fault tolerance. You 
can also distribute your application. These are just different matters. You can 
use a single Zookeeper node and connect all of your Solr nodes to it if you 
wished to do so. But then that single Zookeeper node would be a single point of 
failure: if it goes down, your SolrCloud nodes lose the ability to synchronize 
state until it comes back up.

So Solr is like your application and Zookeeper is like your database.

Solr comes bundled with a Zookeeper because people would like to try out 
SolrCloud specific features without going through the hassle of setting up 
Zookeeper. It’s just there to start Solr in cloud mode to play with it.

I’d also recommend sticking to a single Solr version’s documentation: if 
you are installing and using Solr version 9.1, always stick to the 
documentation for version 9.1, because older docs may have conflicting/outdated 
information relative to the version you are using. Google takes you to a 
different doc version whenever you search for something, so be careful to stick 
to the correct documentation version. Nowadays version 6.x is pretty outdated.

If you are using Docker it’s pretty easy to set up an N-node Zookeeper cluster 
and an M-node SolrCloud cluster. I can share an example docker-compose file if 
you wish.
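Something along these lines (a sketch only - the ZOO_SERVERS and ZK_HOST 
variables follow the official zookeeper and solr Docker images, and the 
image tags are examples to adjust):

```yaml
version: "3.8"

# 3-node ZooKeeper ensemble plus two SolrCloud nodes, official images.
services:
  zoo1:
    image: zookeeper:3.8
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
  zoo2:
    image: zookeeper:3.8
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
  zoo3:
    image: zookeeper:3.8
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
  solr1:
    image: solr:9.2
    depends_on: [zoo1, zoo2, zoo3]
    ports:
      - "8983:8983"
    environment:
      ZK_HOST: zoo1:2181,zoo2:2181,zoo3:2181
  solr2:
    image: solr:9.2
    depends_on: [zoo1, zoo2, zoo3]
    ports:
      - "8984:8983"
    environment:
      ZK_HOST: zoo1:2181,zoo2:2181,zoo3:2181
```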

I hope this was useful

--ufuk yilmaz

Sent from Mail for Windows

From: David Filip
Sent: Saturday, October 21, 2023 2:24 AM
To: users@solr.apache.org
Subject: Re: Newbie Help: Replicating Between Two SolrCloud Instances 
(Solr9.2.1)

Shawn,

Thanks for this.  I will try to dig further into the ZooKeeper documentation.

What I am still not clear on, however, is having more than one ZK “server” - 
when and why would I need more than one?

Perhaps it is just terminology, but if I have three (3x) Solr instances (cores) 
running on three (3x) separate physical servers (different hardware), and I 
want to replicate shards between those three, do I have all three (3x) Solr 
instances (cores) talking to the same single (1x) ZooKeeper “server”?

Or if I have three (3x) Solr instances (cores) replicating shards between them, 
do I also need three (3x) ZooKeeper “servers”, e.g., server.1, server.2, 
server.3, each “server” assigned to one specific Solr instance (core)?

So while I understand this might not be the place to talk about configuring 
ZooKeeper per se, if it’s not too much trouble, can you please clarify whether 
there is a many-to-one relationship between Solr and ZooKeeper (many Solr cores 
talking to one ZooKeeper “server”, which communicates between them), or a 
one-to-one relationship (each Solr instance (core) talks to one ZooKeeper 
“server”).

I hope that is clear and an easy question to answer?  Once I understand that, I 
think I can figure this out with what I have found and been given.

Thanks,

Dave.

> On Oct 20, 2023, at 4:47 PM, Shawn Heisey  
> wrote:
> 
> On 10/20/23 13:34, David Filip wrote:
>> Understand about redundancy and needing an odd number of nodes (I’ve used 
>> quorum in other (non-Solr) type of clusters, so I get it).
>> So what I’ve done now is installed ZooKeeper on a separate (physical) node 
>> (so no longer using ZooKeeper bundled with Solr, since that was causing some 
>> confusion).
>> So I’m trying to follow this document regarding how to set up the “Ensemble” 
>> for Solr:
>> Setting Up an External ZooKeeper Ensemble | Apache Solr Reference Guide 6.6 
>> (solr.apache.org)
>> 
>> This includes the following “example” in zoo.cfg:
>> dataDir=/var/lib/zookeeperdata/1
>> clientPort=2181
>> initLimit=5
>> syncLimit=2
>> server.1=localhost:2888:3888
>> server.2=localhost:2889:3889
>> server.3=localhost:2890:3890
>> So assuming that I have three (3x) physical nodes — each running Solr 9.2 — 
>> and assuming that they are named:
>> solr1.mydomain.com 
>> solr2.mydomain.com 
>> solr3.mydomain.com 

Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-20 Thread Dmitri Maziuk

On 10/20/23 18:23, David Filip wrote:

Shawn,

Thanks for this.  I will try to dig further into the ZooKeeper documentation.

What I am still not clear on, however, is having more than one ZK “server” - 
when and why would I need more than one?


Solr doesn't talk to a ZK "server", it talks to a ZK "cluster". You need 
3 ZK servers to have a fault-tolerant ZK cluster.


You can have 1 or more Solr instances talking to ZK cluster.

Dima



Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-20 Thread David Filip
Thanks - I think I understand now.

> On Oct 20, 2023, at 7:51 PM, Dmitri Maziuk  wrote:
> 
> On 10/20/23 18:23, David Filip wrote:
>> Shawn,
>> Thanks for this.  I will try to dig further into the ZooKeeper documentation.
>> What I am still not clear on, however, is having more than one ZK “server” - 
>> when and why would I need more than one?
> 
> Solr doesn't talk to a ZK "server", it talks to a ZK "cluster". You need 3 ZK 
> servers to have a fault-tolerant ZK cluster.
> 
> You can have 1 or more Solr instances talking to ZK cluster.
> 
> Dima
> 



Re: Newbie Help: Replicating Between Two SolrCloud Instances (Solr 9.2.1)

2023-10-20 Thread Dmitri Maziuk

On 10/20/23 18:52, David Filip wrote:

Thanks - I think I understand now.



That whole "cloud" thing is great, I'm sure, but honestly: if I don't 
have the load to balance, or servers that fall down and cry every time a 
mouse moves, I have a hard time figuring out why I'd spend the time and 
effort.


Unless you have access to a container infra and can use solr-operator: 
that should handle all those cloud details for you, and you get the best 
of all worlds.


Dima