Get older solr releases

2021-03-10 Thread jman



Hi,

I usually install Solr by manually downloading tarballs from mirrors. I
don't always want to install the latest version when provisioning a
new server; I only upgrade after manually reviewing the new release.

However, I've tested a couple of mirrors [0] and the main distribution
point [1]: often I only find the latest version, so in the end it's a
bit complicated to reproduce the same installation on multiple servers.

Is there a policy for mirrors to retain old versions?

Besides mirroring versions I install on a local site, are there other
options?

thanks

[0] https://www.apache.org/mirrors/
[1] https://downloads.apache.org/lucene/solr/#mirror


Re: Get older solr releases

2021-03-10 Thread Christian Ortner
Hi,

Several older versions of Solr are available as Docker images:
https://hub.docker.com/_/solr/

If using Docker is not an option, you can still extract the Solr
distribution from those images by starting a container and copying the
relevant path recursively to the host system.

Cheers.
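If it helps, the copy-out approach can be sketched in Python as below. It is only a sketch, and it rests on assumptions not stated above: the `docker` CLI is on the PATH, and the official `solr:<tag>` images keep Solr under `/opt/solr`. That path may be a symlink to a versioned directory, so the sketch uses `docker cp -L` to follow it. `solr-extract` is a hypothetical container name. The sketch uses `docker create` instead of starting the container, because `docker cp` also works on a container that was never started.

```python
import subprocess
from pathlib import Path


def extraction_commands(tag: str, dest: str, container: str = "solr-extract") -> list:
    """Build the three docker commands: create, copy out, remove."""
    return [
        # `docker create` pulls the image if needed; the container is never started
        ["docker", "create", "--name", container, f"solr:{tag}"],
        # -L follows /opt/solr if it is a symlink (assumed image layout)
        ["docker", "cp", "-L", f"{container}:/opt/solr", dest],
        ["docker", "rm", container],
    ]


def extract_solr(tag: str, dest: str) -> None:
    """Copy /opt/solr from solr:<tag> to dest; if dest does not exist, docker cp creates it."""
    Path(dest).parent.mkdir(parents=True, exist_ok=True)
    create, copy, remove = extraction_commands(tag, dest)
    subprocess.run(create, check=True)
    try:
        subprocess.run(copy, check=True)
    finally:
        # always clean up the temporary container
        subprocess.run(remove, check=True)
```

For example, `extract_solr("8.4.1", "./solr-8.4.1")` would leave that version's distribution in `./solr-8.4.1`.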

On Wed, Mar 10, 2021, 11:37 jman  wrote:



size of object created in each solr shard for a given facet.limit

2021-03-10 Thread Vijay Tiwary
Hello Team

I have a SolrCloud 5.4.1 setup with a collection that has 4 shards and a
replication factor of 2.
If I run a Solr query with faceting on a field, facet.limit=1000 and rows=0,
how many Java objects corresponding to the facet get created in each shard in
the following scenarios:
1) the total number of unique terms for the facet field in question for the
above query is 5000
2) the total number of unique terms for the facet field in question for the
above query is 100



Regards
Vijay
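For context, here is a rough sketch of how many terms each shard is asked for. It assumes the over-request defaults documented in the Solr Reference Guide: each shard is asked for the top `10 + 1.5 * facet.limit` terms, controlled by `facet.overrequest.count` and `facet.overrequest.ratio`. I have not checked that 5.4.1 behaves exactly this way. The sketch also says nothing about how many Java objects are allocated internally.

```python
def shard_facet_limit(facet_limit: int, ratio: float = 1.5, count: int = 10) -> int:
    """Terms requested from each shard in the first phase (assumed reference-guide defaults)."""
    if facet_limit < 0:  # facet.limit=-1 means "no limit"
        return -1
    return int(facet_limit * ratio) + count


def terms_returned_per_shard(facet_limit: int, candidate_terms_in_shard: int) -> int:
    """A shard cannot return more terms than it actually has."""
    requested = shard_facet_limit(facet_limit)
    if requested < 0:
        return candidate_terms_in_shard
    return min(requested, candidate_terms_in_shard)
```

With facet.limit=1000, each shard would be asked for 1510 terms in the first phase. In the first scenario that is what a shard returns if it has at least that many candidate terms. In the second scenario a shard returns at most 100. A refinement phase may afterwards ask shards for counts of specific terms, which this sketch ignores.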


Re: Get older solr releases

2021-03-10 Thread Markus Jelsma
Hello,

All ASF project releases are permanently available at the archive:
http://archive.apache.org/dist/lucene/solr/

Future versions of Solr are probably here:
http://archive.apache.org/dist/solr/

Regards,
Markus
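Here is a minimal sketch for pinning a version to the archive. It assumes the layout above holds for 8.x and earlier (`<version>/solr-<version>.tgz`). It also assumes a `.sha512` file sits next to each tarball, with the hex digest as its first token. Older releases may only ship other checksum types.

```python
import hashlib
import os
import urllib.request

# Assumed layout for releases up to 8.x (per the archive URL above).
ARCHIVE = "https://archive.apache.org/dist/lucene/solr"


def release_url(version: str) -> str:
    return f"{ARCHIVE}/{version}/solr-{version}.tgz"


def parse_sha512_file(text: str) -> str:
    """Return the hex digest, assuming it is the first token of the .sha512 file."""
    return text.split()[0].lower()


def download_release(version: str, dest: str) -> None:
    """Stream the tarball to dest while hashing it, then compare with the .sha512 file."""
    url = release_url(version)
    digest = hashlib.sha512()
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        for chunk in iter(lambda: resp.read(1 << 20), b""):
            digest.update(chunk)
            out.write(chunk)
    with urllib.request.urlopen(url + ".sha512") as resp:
        expected = parse_sha512_file(resp.read().decode("ascii"))
    if digest.hexdigest() != expected:
        os.remove(dest)  # do not leave a corrupt tarball behind
        raise ValueError(f"SHA-512 mismatch for {url}")
```

Running `download_release("8.4.1", "solr-8.4.1.tgz")` on each server would then fetch the same tarball everywhere.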

On Wed, 10 Mar 2021 at 11:41, Christian Ortner wrote:



RE: Idle timeout expired and Early Client Disconnect errors

2021-03-10 Thread ufuk yılmaz
If I understand correctly, this ticket is about registering a new, custom 
expression. SolrClientCache and CloudSolrStream are more like backbone classes 
working behind every streaming expression. Is it really possible to modify them 
this way?

Sent from Mail for Windows 10
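The query-splitting approach comes up further down this thread: Joel suggests breaking the query into small parts, and ufuk generates expression strings and sends them one by one. A minimal Python sketch of that could look like the following. Everything specific here is hypothetical: the collection name, and the numeric field `id_l` used to partition the data into ranges. It also assumes the fields used with the `/export` handler have docValues. Each piece is POSTed as the `expr` parameter to the collection's `/stream` handler and read in full before the next one starts.

```python
import urllib.parse
import urllib.request


def piece_expressions(collection, field, max_id, pieces):
    """Split [0, max_id) on a numeric field into `pieces` half-open ranges, one search() each."""
    step = -(-max_id // pieces)  # ceiling division so the ranges cover everything
    exprs = []
    for start in range(0, max_id, step):
        end = min(start + step, max_id)
        fq = f"{field}:[{start} TO {end}}}"  # '}' makes the upper bound exclusive
        exprs.append(
            f'search({collection}, q="*:*", fq="{fq}", fl="id", sort="id asc", qt="/export")'
        )
    return exprs


def run_expression(base_url, collection, expr, timeout=600):
    """POST one expression to the /stream handler and return the raw response body."""
    data = urllib.parse.urlencode({"expr": expr}).encode()
    req = urllib.request.Request(f"{base_url}/solr/{collection}/stream", data=data)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

For example: `for expr in piece_expressions("mycoll", "id_l", 3_000_000, 100): run_expression("http://localhost:8983", "mycoll", expr)`.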

From: Joel Bernstein
Sent: 08 March 2021 22:02
To: users@solr.apache.org
Subject: Re: Idle timeout expired and Early Client Disconnect errors

This ticket shows how it is done in the solrconfig.xml:

https://issues.apache.org/jira/browse/SOLR-9103



Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Mar 8, 2021 at 9:18 AM ufuk yılmaz wrote:

> How do you “register” something like a CloudSolrStream btw? Using Blob
> Store API?
>
> From: Susmit
> Sent: 06 March 2021 23:03
> To: users@solr.apache.org
> Subject: Re: Idle timeout expired and Early Client Disconnect errors
>
> Better to use Solr 8.9 and configure HTTP timeouts in solr.in.sh. The
> workaround is bigger: you need to extend CloudSolrStream, register it, and
> install a custom SolrClientCache with an overridden setcontext method.
>
> Sent from my iPhone
>
> > On Mar 6, 2021, at 9:25 AM, ufuk yılmaz wrote:
> >
> > How? O_O
> >
> > From: Susmit
> > Sent: 06 March 2021 18:35
> > To: solr-u...@lucene.apache.org
> > Subject: Re: Idle timeout expired and Early Client Disconnect errors
> >
> > I have used a workaround to increase the default (hard-coded) timeout of
> > 2 min in SolrClientCache. I can run 9+ hour long streaming queries with
> > no issues.
> >
> >> On Mar 2, 2021, at 5:32 PM, ufuk yılmaz wrote:
> >>
> >> I divided the query into 1000 pieces and removed the parallel stream
> >> clause. It seems to be working without timeouts so far; if it does time
> >> out, I guess I can just divide it into even smaller pieces.
> >>
> >> I tried to send all 1000 pieces in a “list” expression to be executed
> >> linearly. It didn’t work, but I was just curious whether it could handle
> >> such a large query 😃
> >>
> >> Now I’m just generating expression strings from Java code and sending
> >> them one by one. I tried to use SolrJ for this, but encountered a weird
> >> problem where even the simplest expression (echo) stops working after a
> >> few iterations in a loop. I’m guessing the underlying HttpClient is not
> >> closing connections in a timely manner and is hitting the OS per-host
> >> connection limit. I asked a separate question about this. I was following
> >> the example on Lucidworks:
> >> https://lucidworks.com/post/streaming-expressions-in-solrj/
> >>
> >> I just modified my code to use regular REST calls using okhttp3. It’s a
> >> shame that I couldn’t use SolrJ, since it truly streams every result one
> >> by one continuously. REST just returns a single large response at the
> >> very end of the stream.
> >>
> >> Thanks again for your help.
> >>
> >> From: Joel Bernstein
> >> Sent: 02 March 2021 00:19
> >> To: solr-u...@lucene.apache.org
> >> Subject: Re: Idle timeout expired and Early Client Disconnect errors
> >>
> >> Also, the parallel function builds hash partitioning filters that could
> >> lead to timeouts if they take too long to build. Try the query without
> >> the parallel function if you're still getting timeouts when making the
> >> query smaller.
> >>
> >>
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >>
> >> On Mon, Mar 1, 2021 at 4:03 PM Joel Bernstein wrote:
> >>>
> >>> The settings in your version are 30 seconds and 15 seconds for socket
> >>> and connection timeouts.
> >>>
> >>> Typically timeouts occur because one or more shards in the query are
> >>> idle beyond the timeout threshold. This happens because lots of data is
> >>> being read from other shards.
> >>>
> >>> Breaking the query into small parts would be a good strategy.
> >>>
> >>>
> >>>
> >>>
> >>> Joel Bernstein
> >>> http://joelsolr.blogspot.com/
> >>>
> >>>
> >>> On Mon, Mar 1, 2021 at 3:30 PM ufuk yılmaz wrote:
> >>>
> >>>> Hello Mr. Bernstein,
> >>>>
> >>>> I’m using version 8.4. So, if I understand correctly, I can’t increase
> >>>> timeouts and they are bound to happen in such a large stream. Should I
> >>>> just reduce the output of my search expressions?
> >>>>
> >>>> Maybe I can split my search results into ~100 parts and run the same
> >>>> query 100 times in series. Each part would emit ~3M documents so they
> >>>> should finish before timeout?
> >>>>
> >>>> Is this a reasonable solution?
> >>>>
> >>>> Btw how long is the default hard-coded timeout value? Because
> >>>> yesterday I ran another query which took more than 1 hour without any
> >>>> timeouts and finished successfully.
> >>>>
> >>>> From: Joel Bernstein
> >>>> Sent: 01 March 2021 23:03
> >>>> To: solr-u...@lucene.apache.org
> >>>> Subject: Re: Idle timeout expired and Early Client Disconnect errors
> >>>>
> >>>> Oh wait, I misread your email. The idle timeout issue is configura

Re: Elevation in dataDir in Solr Cloud

2021-03-10 Thread Bruno Roustant
> even if we could load the elevate.xml file from the data
> folder in Cloud (or changing it directly in zk), does this make sense in
> terms of performance? We have frequent commits and a big elevate.xml file.

In terms of performance, a big elevate.xml should not be an issue. The
structure used for query matching is tree-like and supports a large
number of elevation rules, even for subset matching (though that uses more
memory). Do you use exact or subset matching?
I'm going to fix this elevate-in-data-dir issue in branch 8.9.
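Mónica mentions the per-request alternative of elevating directly in the query rather than via elevate.xml. Here is a sketch of that approach. It assumes the request handler includes the QueryElevationComponent and is named `/elevate`, as in Solr's sample configsets. The document ids are hypothetical. Any query normalization that elevate.xml matching would have done has to happen in the client here.

```python
from urllib.parse import urlencode


def elevation_query_url(base_url, collection, q, elevate_ids, exclude_ids=()):
    """Build a query URL that elevates/excludes documents for this request only."""
    params = {
        "q": q,
        "enableElevation": "true",
        # comma-separated unique keys, applied only to this request
        "elevateIds": ",".join(elevate_ids),
    }
    if exclude_ids:
        params["excludeIds"] = ",".join(exclude_ids)
    return f"{base_url}/solr/{collection}/elevate?{urlencode(params)}"
```

For example, `elevation_query_url("http://localhost:8983", "products", "ipod", ["doc1", "doc2"], ["doc9"])` elevates two documents and hides one for that query only.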

On Fri, 19 Feb 2021 at 16:39, Mónica Marrero wrote:

> Thank you! I just filed the bug in Jira:
> https://issues.apache.org/jira/browse/SOLR-15170
>
> About the workaround you mentioned, we ran a quick test on one server and
> it apparently worked, but we did not check it properly in a cluster (we
> decided that it is better not to go with this in production anyway). Just
> out of curiosity, even if we could load the elevate.xml file from the data
> folder in Cloud (or changing it directly in zk), does this make sense in
> terms of performance? We have frequent commits and a big elevate.xml file.
>
> We are considering your suggestion of using the elevation directly in the
> queries (I have seen work to improve this by removing the requirement of
> having an elevate.xml file at all). It seems to be straightforward to apply
> in some cases, but not so much when you need normalization.
>


Solr not distributing search requests among replicas

2021-03-10 Thread Jan Høydahl
Hi,

A client has a SolrCloud 8.4 setup with two nodes, and one collection with one 
shard and replicationFactor=2.
Of course we want search traffic to be evenly distributed between the two 
replicas.
The client is using plain HTTP requests, no SolrJ or anything fancy, and sends 
all requests to one of the two nodes.
I was expecting Solr to forward about 50% of those requests to the other 
replica, but it is serving them all locally.

I know we can set up an LB in front or re-program the client to do round robin,
but that is not my question.
Is the select-random-replica logic only active when we have a sharded
collection, and not for a single-shard?

Jan

Re: Solr not distributing search requests among replicas

2021-03-10 Thread Mike Drob
I believe a server will always try to prefer local cores. Can you do an
experiment with 3 nodes, and send http queries to the node not hosting any
replicas? That should confirm the balanced distribution.

If you have multiple shards, the receiving server will forward the requests
for shards it doesn’t have, but would still prefer local shards when they
are available.

On Wed, Mar 10, 2021 at 8:00 AM Jan Høydahl  wrote:



Re: Solr not distributing search requests among replicas

2021-03-10 Thread Houston Putman
I could be wrong, but I don't think preferLocalShards is the default in
multi-shard use cases.

On Wed, Mar 10, 2021 at 9:07 AM Mike Drob  wrote:



Re: Unable to see features and models uploaded to SOLR LTR

2021-03-10 Thread Lazar Kovacevic
Yes, I enable LTR and reload the core.
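Here is a small sketch for double-checking this over HTTP. It assumes Solr was started with `-Dsolr.ltr.enabled=true` as suggested below. It also assumes the LTR managed-resource endpoints `/schema/feature-store` and `/schema/model-store`, and a hypothetical core/collection name `techproducts`. In SolrCloud the reload goes through the Collections API rather than the CoreAdmin API.

```python
import json
import urllib.request


def ltr_urls(base_url, name, cloud=False):
    """URLs to inspect the LTR stores and to reload the core (or collection in SolrCloud)."""
    if cloud:
        reload_url = f"{base_url}/solr/admin/collections?action=RELOAD&name={name}"
    else:
        reload_url = f"{base_url}/solr/admin/cores?action=RELOAD&core={name}"
    return {
        "features": f"{base_url}/solr/{name}/schema/feature-store",
        "models": f"{base_url}/solr/{name}/schema/model-store",
        "reload": reload_url,
    }


def get_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

After uploading, `get_json(ltr_urls(base, name)["features"])` should list the feature store(s). If it does not, fetch the `"reload"` URL and check again.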

On Wed, Mar 10, 2021, 02:33 Diego Ceccarelli wrote:

> also you might need to reload the core to see the features / models loaded:
>
> https://solr.apache.org/guide/8_5/coreadmin-api.html see RELOAD
>
> On Wed, Mar 10, 2021, 06:33 Jörn Franke  wrote:
>
> > Do you enable LTR at Solr startup?
> >
> >  -Dsolr.ltr.enabled=true
> >
> > > On 10.03.2021 at 04:23, Lazar Kovacevic wrote:
> > >
> > > Issue is exactly the same as this one reported on docker-solr project:
> > >
> > >
> > >
> > > https://github.com/docker-solr/docker-solr/issues/335
> > >
> > >
> > >
> > > I uploaded the feature store and model, following the official
> > > documentation, with no errors reported, nor are there errors in log
> > > files, and yet features are not visible through the Solr web interface,
> > > nor do LTR query commands return any features.
> > >
> > >
> > > Environment:
> > >
> > > Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-48-generic x86_64)
> > >
> > > openjdk 11.0.9.1 2020-11-04
> > > OpenJDK Runtime Environment (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04)
> > > OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04, mixed
> > > mode, sharing)
> > >
> > > solr-8.7.0
> > >
> > >
> > > Lazar
> >
>


Re: Solr not distributing search requests among replicas

2021-03-10 Thread Michael Gibney
You say not "anything fancy" -- depending on how you define "fancy", if you
have an explicit `shards.preference` param, based on the version you're
running (8.4) you might also take a look at
https://issues.apache.org/jira/browse/SOLR-14471. (If SOLR-14471 is the
problem, removing the explicit `shards.preference` param should restore
default "shuffling" routing).

I haven't dug too deep, but it looks like for 8.4 preferLocalShards
actually defaults to false? I might be missing something though:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.4.1/solr/solrj/src/java/org/apache/solr/client/solrj/routing/RequestReplicaListTransformerGenerator.java#L85



On Wed, Mar 10, 2021 at 9:10 AM Houston Putman wrote:



Re: Solr not distributing search requests among replicas

2021-03-10 Thread Jan Høydahl
We have not set any shards.preference, and I also think preferLocal defaults to
false, i.e. random.

Earlier we had 2 shards for the same collection (both existed on both nodes)
and then requests were distributed to both nodes. That’s why, when we went to 1
shard, I was wondering if the “single-shard” code path perhaps never attempts
to utilize replicas? But I have not looked at the code yet.

I guess the next step is to set up a small local test cluster and see what happens.

Jan Høydahl

> On 10 Mar 2021, at 15:46, Michael Gibney wrote:


Re: Solr not distributing search requests among replicas

2021-03-10 Thread Michael Gibney
Ah, I missed "single shard" ... this looks relevant:
https://issues.apache.org/jira/browse/SOLR-12217

On Wed, Mar 10, 2021 at 12:43 PM Jan Høydahl  wrote:



Re: Solr not distributing search requests among replicas

2021-03-10 Thread Christine Poerschke (BLOOMBERG/ LONDON)
The shortCircuit parameter might explain it?

https://github.com/apache/solr/blob/releases/lucene-solr%2F8.4.0/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java#L401

Christine

From: users@solr.apache.org At: 03/10/21 17:42:46 To: users@solr.apache.org
Subject: Re: Solr not distributing search requests among replicas





Re: Solr not distributing search requests among replicas

2021-03-10 Thread Chris Hostetter

all of the "routing" logic (preferLocal, shards.preference, etc...) really 
only comes into play once solr "code" (either CloudSolrClient, or a solr 
server receiving a request) decides that it needs to make a remote
connection.

If a node receives a request, and it has a local core capable of handling
that request, it will process it in order to avoid the network overhead of
sending it somewhere else.

Where things like shards.preference (and the deprecated
"preferLocalShards") come into play (on the server side) is in situations
where a Solr node is acting as a Solr client:

1) the Solr node doesn't have any cores suitable for dealing with the
request (ie: it's a 'top level' request for a collection that has no local 
replicas and must be forwarded)

2) Solr is processing a top-level request and now needs to federate 
distributed sub-requests to each of the shards and is deciding which 
replica of each shard should get the request.
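Given that, the workaround Jan mentions is to spread requests on the client side. Here is a minimal round-robin sketch in Python. The node URLs are hypothetical, and there are no health checks or retries (a load balancer would normally handle those).

```python
import itertools
import urllib.parse
import urllib.request


class RoundRobinSolr:
    """Cycles through Solr nodes so each request goes to the next node in turn."""

    def __init__(self, node_urls, collection):
        self._nodes = itertools.cycle(node_urls)  # endless iterator over the nodes
        self.collection = collection

    def next_select_url(self, params):
        node = next(self._nodes)
        query = urllib.parse.urlencode(params)
        return f"{node}/solr/{self.collection}/select?{query}"

    def query(self, params, timeout=30):
        # No retries or failover: a down node simply raises an error here.
        with urllib.request.urlopen(self.next_select_url(params), timeout=timeout) as resp:
            return resp.read()
```

Each call to `next_select_url` picks the next node in turn. The instance is not meant to be shared across threads without a lock.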




: Date: Wed, 10 Mar 2021 18:42:35 +0100
: From: Jan Høydahl 
: Reply-To: users@solr.apache.org
: To: users@solr.apache.org
: Subject: Re: Solr not distributing search requests among replicas
: 

-Hoss
http://www.lucidworks.com/

Re: Solr not distributing search requests among replicas

2021-03-10 Thread Chris Hostetter

: Ah, I missed "single shard" ... this looks relevant:
: https://issues.apache.org/jira/browse/SOLR-12217

That improvement still isn't going to impact Jan's situation where the 
*client* isn't SolrJ ... as the description says:

>> NOTE: This Jira doesn't cover the single-sharded collections cases when 
>> not using the CloudSolrClient or Streaming Expressions (i.e. if you do 
>> a non-streaming curl request to a random node in the cluster, the 
>> shards.preference parameter is not considered in the case of single 
>> shards collections).


: 
: On Wed, Mar 10, 2021 at 12:43 PM Jan Høydahl  wrote:
: 
: > We have not set any shard.preference, and I also think preferLocal
: > defaults to false, i.e random
: >
: > Earlier we had 2 shares for the same collection (both existed on both
: > nodes) and then requests were distributed to both nodes. That’s why, when
: > we went to 1 shard, I was wondering if the “single-shard” code path perhaps
: > never attempts to utilize replicas?? But have not looked in code yet.
: >
: > Guess next step is to setup a small local test cluster and see what
: > happens.
: >
: > Jan Høydahl
: >
: > > On 10 Mar 2021 at 15:46, Michael Gibney wrote:
: > >
: > > You say not "anything fancy" -- depending on how you define "fancy", if you
: > > have an explicit `shards.preference` param, based on the version you're
: > > running (8.4) you might also take a look at
: > > https://issues.apache.org/jira/browse/SOLR-14471. (If SOLR-14471 is the
: > > problem, removing the explicit `shards.preference` param should restore
: > > default "shuffling" routing).
: > >
: > > I haven't dug too deep, but it looks like for 8.4 preferLocalShards
: > > actually defaults to false? I might be missing something though:
: > >
: > > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.4.1/solr/solrj/src/java/org/apache/solr/client/solrj/routing/RequestReplicaListTransformerGenerator.java#L85
: > >
: > >
: > >
: > >> On Wed, Mar 10, 2021 at 9:10 AM Houston Putman wrote:
: > >>
: > >> I could be wrong, but I don't think preferLocalShards is the default in
: > >> multi-shard use cases.
: > >>
: > >>> On Wed, Mar 10, 2021 at 9:07 AM Mike Drob  wrote:
: > >>>
: > >>> I believe a server will always try to prefer local cores. Can you do an
: > >>> experiment with 3 nodes, and send http queries to the node not hosting
: > >>> any replicas? That should confirm the balanced distribution.
: > >>>
: > >>> If you have multiple shards, the receiving server will forward the
: > >>> requests for shards it doesn’t have, but would still prefer local shards
: > >>> when they are available.
: > >>>
: > >>> On Wed, Mar 10, 2021 at 8:00 AM Jan Høydahl wrote:
: > >>>
: >  Hi,
: > 
: >  A client has a SolrCloud 8.4 setup with two nodes, and one collection
: >  with one shard and replicationFactor=2.
: >  Of course we want search traffic to be evenly distributed between the
: >  two replicas.
: >  The client is using plain HTTP requests, no SolrJ or anything fancy,
: >  and sends all requests to one of the two nodes.
: >  I was expecting Solr to forward about 50% of those requests to the
: >  other replica, but it is serving them all locally.
: > 
: >  I know we can set up an LB in front or re-program the client to do
: >  round robin, but that is not my question.
: >  Is the select-random-replica logic only active when we have a sharded
: >  collection, and not for a single-shard one?
: > 
: >  Jan
: > >>>
: > >>
: >
: 

-Hoss
http://www.lucidworks.com/

Re: Solr not distributing search requests among replicas

2021-03-10 Thread Jan Høydahl
Aha, I'm starting to see what is happening here.

So on the server side, when a node hosts one of N replicas for a shard and 
that collection is single-sharded, no randomization or forwarding will ever 
take place.
Before SOLR-12217 it would not happen when using SolrJ either, but after 
SOLR-12217, SolrJ will load balance between replicas when selecting a node to 
send the request to.

So in this case the client is a .NET app and has no SolrJ.

Is there any way whatsoever to solve this on the Solr side only?

The only thing I can think of is to send all requests to a 3rd node in the 
cluster that does not have a core for the collection; then it will balance 
between the two :)
Or create a new, empty collection on the node, which acts only as a routing 
collection for the target collection?

Sounds like there should be a way to explicitly disable the "optimization" of 
always handling the request locally in single-shard collections, i.e. always 
try to balance unless shards.preference=local?

Jan




Re: Solr not distributing search requests among replicas

2021-03-10 Thread Chris Hostetter

: Is there any way whatsoever to solve this on the Solr side only?
: 
: The only thing I can think of is to send all requests to a 3rd node in the 
: cluster that does not have a core for the collection; then it will balance 
: between the two :)

correct -- you can create a Solr node w/o any cores that will act as a 
"load balancer" to other solr nodes.

: Or create a new, empty collection on the node, which acts as a routing 
: collection only to the target collection?

no -- this won't work, because the request your remote client sends will 
need to specify the actual collection you want to query, and when the node 
gets this it will hand it to the local core for that collection -- it 
won't care that there is another local collection that's unrelated.

: Sounds like there should be a way to explicitly disable the 
: "optimization" of always handling the request locally in single-shard 
: collections, i.e. always try to balance unless shards.preference=local?

that seems... dangerous.  you could easily wind up in a situation where 
nodes just keep trying to forward forever?




Re: Solr not distributing search requests among replicas

2021-03-10 Thread Jan Høydahl
> no -- this won't work, because the request your remote client sends will 
> need to specify the actual collection you want to query, and when the node 

I was thinking more of some explicit &collections=otherColl or 
&shards=other_1,other_2, but it's easier to just send to a node without that 
collection: we have 2 more nodes in the cluster.

> that seems... dangerous.  you could easily wind up in a situation where 
> nodes just keep trying to forward forever?

There is some special HTTP parameter being added when forwarding requests, so 
I'm sure each node will be able to decide whether it should act as an LB or 
whether it is supposed to be the final destination. Or we can add such a param. 
Of course, if SolrJ on the client side has already selected a replica, the 
receiving node should not discard that and do its own balancing. So there is some state to get
right here.
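Jan's "forward at most once" idea can be sketched roughly as below. This is a hypothetical illustration in Python, not Solr code: the `lbHop` marker, the `route`/`forwarded` helpers and the node names are all made up, and `shards.preference=replica.location:local` stands in for Jan's "shards.preference=local" as the explicit stay-local request.

```python
import random
from typing import Dict, List, Optional

# Hypothetical marker a node would add when it forwards a request.
# This is NOT an existing Solr parameter; it only illustrates the idea.
HOP_MARKER = "lbHop"
STAY_LOCAL = "replica.location:local"


def route(params: Dict[str, str], self_node: str,
          replica_nodes: List[str], rng: random.Random) -> Optional[str]:
    """Return None to serve locally, or the name of the node to forward to."""
    if params.get(HOP_MARKER) == "true":
        # Already balanced once upstream: this node is the final destination.
        return None
    if params.get("shards.preference") == STAY_LOCAL and self_node in replica_nodes:
        # Caller explicitly asked for local handling.
        return None
    target = rng.choice(replica_nodes)  # uniform pick over replica hosts
    return None if target == self_node else target


def forwarded(params: Dict[str, str]) -> Dict[str, str]:
    """Copy of the request params with the hop marker set."""
    out = dict(params)
    out[HOP_MARKER] = "true"
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    nodes = ["nodeA", "nodeB"]
    kept = sum(route({}, "nodeA", nodes, rng) is None for _ in range(1000))
    print(f"nodeA kept {kept} of 1000 requests")
    # A request that was already forwarded is never forwarded again.
    print(route(forwarded({}), "nodeB", nodes, rng))  # None
```

The marker bounds each request to at most one extra hop; it does nothing about the extra internal traffic such forwarding creates.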

Jan


Does CVE-2020-27223 impact Solr 8.6.1

2021-03-10 Thread Steven White
Hi everyone,

Does anyone know if CVE-2020-27223 [1] impacts Solr?  This is a
vulnerability in jetty-http-9.4.27.v20200227.jar which we ship with Solr
8.6.1.

Thanks,

Steven

[1] https://nvd.nist.gov/vuln/detail/CVE-2020-27223


Re: Searching and WordDelimiterGraphFilterFactory

2021-03-10 Thread Shaun Campbell
Thanks I'll check that out. :)

Shaun

On Tue, 9 Mar 2021 at 22:31, Jörn Franke  wrote:

> From the Solr ref guide: you forgot the flatten graph filter at the end -
> this is needed for any graph filter you use: <filter class="solr.FlattenGraphFilterFactory"/>
>
> > On 09.03.2021 at 22:21, Shaun Campbell wrote:
> >
> > Hi Susmit
> >
> > That didn't seem to work. Don't know if I was doing something wrong. I
> > ended up writing a regex to split the incoming string into strings of
> > numbers and letters and build up the query manually. It's all working
> now.
> >
> > Thanks
> > Shaun
> >
> >> On Tue, 9 Mar 2021 at 16:50, Susmit  wrote:
> >>
> >> q.op=AND could be useful: the parts broken down by WDGFF would then be
> >> joined by AND.
> >>
> >> Sent from my iPhone
> >>
> >>> On Mar 9, 2021, at 3:07 AM, Shaun Campbell 
> >> wrote:
> >>>
> >>> Hi
> >>>
> >>> I'm trying to produce an autosuggestion field for project ids using
> >>> ngrams and WordDelimiterGraphFilterFactory to split on word/number
> >>> boundaries.
> >>>
> >>> The ids have various formats, such as nihr123456, 12/34/567 and
> >>> DRF-2018-11-ST2-062.
> >>>
> >>> What I'm trying to do is allow the user to enter the number parts or the
> >>> alphabetical characters, or both, and match all. The basic autosuggestion
> >>> is working, but I have an issue where the query is matching some but not
> >>> all of the component parts. For example:
> >>>
> >>> I enter DRF-2018-11 and it matches:
> >>>
> >>> DRF-2018-11-ST2-062
> >>> PB-PG-0909-20188
> >>> CS-2018-18-ST2-005
> >>>
> >>>
> >>> The first one is correct because it matches the DRF, the 2018 and the 11.
> >>> The second and third ones I don't want because there's no DRF or 11 in
> >>> the ids. Is there any way to get around this problem in Solr
> >>> configuration, or do I have to split the id manually in code and
> >>> construct a query where the id is DRF AND id is 2018 AND id is 11?
> >>>
> >>> Here is my field type configuration:
> >>>
> >>>  >>> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >>> 
> >>>
> >>> 
> >>>  >> generateWordParts="1"
> >>> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >>> catenateAll="0" splitOnCaseChange="0" splitOnNumerics="1"/>
> >>> 
> >>> 
> >>>  >>> maxGramSize="7"/>
> >>>
> >>> 
> >>> 
> >>>
> >>> 
> >>>  >> generateWordParts="1"
> >>> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >>> catenateAll="0" splitOnCaseChange="0"  splitOnNumerics="1"/>
> >>> 
> >>>
> >>> 
> >>> 
> >>>
> >>> Thanks
> >>> Shaun
> >>
>
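Shaun's eventual workaround (split the incoming id into runs of letters and runs of digits with a regex, then build the query by hand) might look roughly like the Python sketch below. The field name `id` comes from his question; the regex and helper names are assumptions, and because every part is plain alphanumeric, no query-syntax escaping is attempted.

```python
import re
from typing import List

# Runs of ASCII letters or runs of ASCII digits; separators such as '-' or '/'
# are dropped, roughly like splitOnNumerics + generateWordParts/NumberParts.
TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+")


def split_id(raw: str) -> List[str]:
    """Split a project id like 'DRF-2018-11' into ['DRF', '2018', '11']."""
    return TOKEN_RE.findall(raw)


def build_query(raw: str, field: str = "id") -> str:
    """Require every letter/number part to match (explicit AND between clauses)."""
    return " AND ".join(f"{field}:{part}" for part in split_id(raw))


if __name__ == "__main__":
    print(build_query("DRF-2018-11"))  # id:DRF AND id:2018 AND id:11
    print(split_id("nihr123456"))      # ['nihr', '123456']
```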


Re: Get older solr releases

2021-03-10 Thread Chris Hostetter


To expand on Markus's comment...

1) The availability for downloading "Past Versions" of Solr is spelled out 
on the downloads page: 

https://solr.apache.org/downloads.html#past-versions

2) The mirror network, by design, is only supposed to host "current, 
recommended releases" -- the specific definition of that is left somewhat 
up to interpretation by the projects -- to help reduce the disk requirements 
of every mirror provider (who mirror releases for *ALL* apache 
projects): 

https://infra.apache.org/mirrors.html

-Hoss
http://www.lucidworks.com/


: All ASF project releases are permanently available at the archive:
: http://archive.apache.org/dist/lucene/solr/
: 
: Future versions of Solr are probably here:
: http://archive.apache.org/dist/solr/
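To reproduce the same installation on several servers, one option is to pin an exact version and fetch it from the archive instead of a mirror. The Python sketch below assumes the `lucene/solr/<version>/solr-<version>.tgz` layout used by 8.x releases under the archive URL quoted above (served here over HTTPS), and that a `.sha512` file sits next to the tarball; older releases may ship other checksum types, and the newer `dist/solr/` tree is not covered. The helper names are made up.

```python
import hashlib
from typing import Dict

# Path layout assumed for 8.x-era releases in the ASF archive.
ARCHIVE_BASE = "https://archive.apache.org/dist/lucene/solr"


def archive_urls(version: str) -> Dict[str, str]:
    """Build tarball, checksum and signature URLs for a pinned Solr version."""
    tarball = f"{ARCHIVE_BASE}/{version}/solr-{version}.tgz"
    return {
        "tarball": tarball,
        "sha512": tarball + ".sha512",
        "signature": tarball + ".asc",
    }


def sha512_matches(data: bytes, checksum_text: str) -> bool:
    """Compare downloaded bytes with the first token of a .sha512 file.

    Accepts both 'HASH' and 'HASH  filename' styles of checksum file.
    """
    tokens = checksum_text.split()
    if not tokens:
        return False
    return hashlib.sha512(data).hexdigest() == tokens[0].lower()


if __name__ == "__main__":
    for kind, url in archive_urls("8.6.1").items():
        print(kind, url)
```

Downloading the files and verifying the `.asc` signature are left out of the sketch.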




Re: Solr not distributing search requests among replicas

2021-03-10 Thread Chris Hostetter

: > that seems... dangerous.  you could easily wind up in a situation where 
: > nodes just keep trying to forward forever?
: 
: There is some special http parameter being added when forwarding 
: requests, so I'm sure each node will be able to decide whether it should 
: act as LB or if it is supposed to be the final destination. Or we can 
: add such a param. Of course, if SolrJ on the client side has already 
: selected a replica, the receiving node should not discard that and do 
: its own balancing. So there is some state to get right here.

"Forever" wasn'treally what i ment to say ... I'm concerned more about how 
you would implement this to work well in the 'general case' -- ie: 
multiple nodes, multiple collections, multiple shards, multiple replicas 
per shard -- w/o doing "too much" forwarding.


If nodeA gets a request, when exactly should it decide "I *COULD* handle 
this request for collection1 using a local core, but I'll go ahead and 
forward it to nodeB instead"? ... Should it be based on what percentage 
of collection1's total replica list is located on nodeA, or on what 
percentage of nodeA is dedicated to collection1? ... Should nodeB be more 
or less likely than nodeC to get the request based on how many total cores 
each node has for collection1, or how many unique shards each one has?
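To make one of these options concrete: if a node chose the destination with probability proportional to how many of collection1's replicas each node hosts, the weights fall straight out of the replica placement. This is a hypothetical Python sketch; the placement map and node names are invented, and the other policies mentioned (share of a node dedicated to the collection, unique shards per node) would need different inputs.

```python
import random
from collections import Counter
from fractions import Fraction
from typing import Dict, List

# Hypothetical placement: shard name -> nodes hosting a replica of it.
Placement = Dict[str, List[str]]


def replicas_per_node(placement: Placement) -> Counter:
    """Count how many of the collection's replicas each node hosts."""
    counts: Counter = Counter()
    for nodes in placement.values():
        counts.update(nodes)
    return counts


def share_of_replicas(placement: Placement, node: str) -> Fraction:
    """Fraction of the collection's replicas that live on `node`."""
    counts = replicas_per_node(placement)
    return Fraction(counts[node], sum(counts.values()))


def pick_node(placement: Placement, rng: random.Random) -> str:
    """Choose a node with probability proportional to its replica count."""
    counts = replicas_per_node(placement)
    nodes = sorted(counts)
    return rng.choices(nodes, weights=[counts[n] for n in nodes], k=1)[0]


if __name__ == "__main__":
    placement = {"shard1": ["nodeA", "nodeB"], "shard2": ["nodeA", "nodeC"]}
    print(share_of_replicas(placement, "nodeA"))  # 1/2
    print(pick_node(placement, random.Random(7)))
```

Even this simple rule has to decide whether a node holding replicas of two different shards should count double, which is exactly the kind of question raised above.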


Also bear in mind that even if you assumed everything was nice and evenly 
distributed, a "simple" round-robin approach would have some pretty 
significant impacts on the number of intra-cluster network requests.

Say you have a 5 node cluster hosting a 1shard/5replica collection such 
that each node has 1 replica: today any node can process the request 
locally; but if we did a round-robin proxy of the request, that means we'd 
only handle it locally 1/5th of the time, and 4/5ths of the time you'd add an 
extra network hop and the associated network IO (plus the original node 
has a thread tied up waiting to proxy the response) ... so you'd go from 
needing 0 "internal" network requests/IO to having internal traffic equal 
to 80% of the external traffic received.

If those 5 nodes host a collection with 2 shards/5 replicas each, spread 
evenly over the 5 nodes: today any given request typically causes 2 
intra-cluster network requests to get the per-shard data; but if we 
round-robin proxy the initial request to a different node 4/5ths of the time, we
now typically need 2.8 internal requests for each external request...
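These numbers follow from a simple expected-value model: with n nodes and uniform round-robin proxying, the receiving node hands the request to another node (n-1)/n of the time, adding one internal hop on top of whatever fan-out the request already needs. A small Python check of that model (the function name is made up; the "today" fan-outs of 0 and 2 are the ones from the two examples):

```python
from fractions import Fraction


def internal_per_external(n_nodes: int, fanout_today: int, proxy: bool) -> Fraction:
    """Expected internal requests per external request.

    fanout_today: internal requests a query already triggers without proxying
    (0 for the local single-shard case, 2 for the 2-shard example).
    proxy: whether the receiving node round-robins the request first.
    """
    extra_hop = Fraction(n_nodes - 1, n_nodes) if proxy else Fraction(0)
    return fanout_today + extra_hop


if __name__ == "__main__":
    for shards, fanout in ((1, 0), (2, 2)):
        before = internal_per_external(5, fanout, proxy=False)
        after = internal_per_external(5, fanout, proxy=True)
        print(f"{shards} shard(s): {float(before)} -> {float(after)}")
```

Running it prints `0.0 -> 0.8` for the single-shard case and `2.0 -> 2.8` for the two-shard case, matching the 80% and 2.8 figures above.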


It just seems like adding more forwarding/proxy logic -- that isn't 
strictly necessary to compute complete results -- could introduce a lot 
of complexity and risk for a problem that already has multiple solutions:

1) the client (or an external load balancer) can round robin over live nodes (and 
given that cluster state and metrics are available via HTTP, a client can 
make very sophisticated choices)

2) a single "extra" solr node in the cluster can be used as a "self 
configuring" load balancer that will automatically know when new nodes are 
added to the cluster, or when replicas get moved/added, etc...
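Option 1 can be quite small. The Python sketch below assumes the Collections API `CLUSTERSTATUS` response has already been fetched as JSON, that its `cluster.live_nodes` entries use the usual `host:port_solr` form, and that the nodes speak plain HTTP. The sample data and helper names are hypothetical, and a real client would also refresh the node list and handle failures.

```python
import itertools
import json
from typing import Iterator, List


def live_node_urls(clusterstatus_json: str, scheme: str = "http") -> List[str]:
    """Turn live_nodes entries like '10.0.0.1:8983_solr' into base URLs."""
    live_nodes = json.loads(clusterstatus_json)["cluster"]["live_nodes"]
    urls = []
    for name in sorted(live_nodes):
        host_port, _, context = name.partition("_")  # '10.0.0.1:8983', 'solr'
        urls.append(f"{scheme}://{host_port}/{context}")
    return urls


def round_robin(base_urls: List[str], collection: str) -> Iterator[str]:
    """Yield /select URLs for `collection`, cycling through the nodes in order."""
    for base in itertools.cycle(base_urls):
        yield f"{base}/{collection}/select"


if __name__ == "__main__":
    # Hypothetical, trimmed CLUSTERSTATUS response.
    sample = '{"cluster": {"live_nodes": ["10.0.0.2:8983_solr", "10.0.0.1:8983_solr"]}}'
    targets = round_robin(live_node_urls(sample), "collection1")
    for _ in range(3):
        print(next(targets))
```

Because any live node can accept the request, the rotation also covers nodes without a local replica; as discussed earlier in the thread, those simply forward it.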







Re: Solr not distributing search requests among replicas

2021-03-10 Thread Mike Drob
>> 2) a single "extra" solr node in the cluster can be used as a "self
>> configuring" load balancer

I’ve thought about this a bunch before: are there mechanisms to instruct
Solr not to host shards for this purpose? Maybe it deserves its own
discussion.


Re: Solr not distributing search requests among replicas

2021-03-10 Thread Walter Underwood
You could even run a separate Solr on the node just to redistribute the queries.
But if I was going to do that, I’d run a copy of nginx as a load balancer 
instead.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Mar 10, 2021, at 4:51 PM, Mike Drob  wrote:
> 
>>> 2) a single "extra" solr node in the cluster can be used as a "self
>>> configuring" load balancer
> 
> I’ve thought about this a bunch before, are there mechanisms to instruct
> Solr to not host shards for this purpose? Maybe it deserves its own
> discussion.
> 
> On Wed, Mar 10, 2021 at 5:14 PM Chris Hostetter 
> wrote:
> 
>> 
>> : > that seems... dangerous.  you could easily wind up in a situation
>> where
>> : > nodes just keep trying to forward forever?
>> :
>> : There is some special http parameter being added when forwarding
>> : requests, so I'm sure each node will be able to decide whether it should
>> : act as LB or if it is supposed to be the final destination. Or we can
>> : add such a param. Of course, if SolrJ on the client side has already
>> : selected a replica, the receiving node should not discard that and do
>> : its own balancing. So there is some state to get right here.
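A minimal Java sketch of the forward-once idea, assuming a hypothetical marker parameter named `forwardedBy` (not a real Solr parameter) and a simple round-robin choice among the nodes hosting the collection. A request that already carries the marker is always handled locally, so it can be forwarded at most once and cannot bounce between nodes:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: "forwardedBy" is a hypothetical marker, not a real Solr parameter.
class ForwardOnceRouter {
    static final String FORWARDED_PARAM = "forwardedBy";

    private final String selfNode;
    private final List<String> replicaNodes; // nodes hosting a replica of the collection
    private final AtomicInteger counter = new AtomicInteger();

    ForwardOnceRouter(String selfNode, List<String> replicaNodes) {
        this.selfNode = selfNode;
        this.replicaNodes = List.copyOf(replicaNodes);
    }

    /** Returns the node that should execute the request, marking it if it leaves this node. */
    String route(Map<String, String> params) {
        if (params.containsKey(FORWARDED_PARAM)) {
            return selfNode; // already balanced once: never forward again
        }
        int i = Math.floorMod(counter.getAndIncrement(), replicaNodes.size());
        String target = replicaNodes.get(i);
        if (!target.equals(selfNode)) {
            params.put(FORWARDED_PARAM, selfNode); // tag before handing off to another node
        }
        return target;
    }

    public static void main(String[] args) {
        ForwardOnceRouter nodeA = new ForwardOnceRouter("nodeA", List.of("nodeA", "nodeB"));
        ForwardOnceRouter nodeB = new ForwardOnceRouter("nodeB", List.of("nodeA", "nodeB"));
        nodeA.route(new HashMap<>());             // first request stays on nodeA
        Map<String, String> req = new HashMap<>();
        String target = nodeA.route(req);         // second request is forwarded to nodeB
        System.out.println(target + " then " + nodeB.route(req)); // nodeB keeps it local
    }
}
```

As Jan notes, a real implementation would also have to respect a replica that SolrJ already selected on the client side; this sketch ignores that case.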
>> 
>> "Forever" wasn't really what I meant to say ... I'm concerned more about how
>> you would implement this to work well in the 'general case' -- i.e.:
>> multiple nodes, multiple collections, multiple shards, multiple replicas
>> per shard -- w/o doing "too much" forwarding.
>> 
>> 
>> If nodeA gets a request, when exactly should it decide "i *COULD* handle
>> this request for collection1 using local core, but I'll go ahead and
>> forward it to nodeB instead." ? ... should it be based on what percentage
>> of collection1's total replica list are located on nodeA, or based on what
>> percentage of nodeA is dedicated to collection1? ... should nodeB be more
>> or less likely than nodeC to get the request based on how many total cores
>> each node has for collection1, or how many unique shards each one has?
>> 
>> 
>> Also bear in mind that even if you assumed everything was nice and evenly
>> distributed, a "simple" round robin based approach would have some pretty
>> significant impacts on the number of inter-node network requests:
>> 
>> Say you have a 5 node cluster, hosting a 1shard/5replica collection such
>> that each node has 1 replica:  today any node can process the request
>> locally; but if we did a round robin proxy of the request, that means we'd
>> only handle it locally 1/5th the time, and 4/5ths of the time you add an
>> extra network hop and the associated network IO involved (plus the
>> original node has a thread tied up waiting to proxy the response) .. so
>> you'd go from needing 0 "internal" network requests/IO to having internal
>> traffic of 80% of the amount of external traffic received.
>> 
>> If those 5 nodes host a collection with 2 shards/5replicas each, spread
>> evenly over the 5 nodes: today any given request typically causes 2
>> intra-cluster network requests to get the per-shard data; but if we round
>> robin proxy the initial request to a different node 4/5ths of the time we
>> now typically need 2.8 internal requests for each external request...
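The two overhead figures above can be reproduced with a short back-of-the-envelope model in Java. It is a sketch under the same assumptions as the text: replicas are spread evenly and the proxy target is chosen uniformly among the n nodes, so a request stays local with probability 1/n. The parameter `shardFanout` is the number of per-shard internal requests a query already needs without proxying (0 in the single-shard case, 2 in the two-shard case):

```java
class ProxyOverhead {
    /**
     * Expected internal requests per external request when every incoming
     * request is first proxied to a uniformly chosen node out of {@code nodes}.
     *
     * @param nodes       number of nodes the proxy balances over
     * @param shardFanout internal per-shard requests a query needs without proxying
     */
    static double expectedInternalRequests(int nodes, int shardFanout) {
        // The extra hop is avoided only when the chosen node is the receiving one (1/n).
        double proxyHop = (nodes - 1) / (double) nodes;
        return shardFanout + proxyHop;
    }

    public static void main(String[] args) {
        System.out.println(expectedInternalRequests(5, 0)); // 1 shard: 0.8, i.e. 80% of external traffic
        System.out.println(expectedInternalRequests(5, 2)); // 2 shards: 2.8
    }
}
```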
>> 
>> 
>> It just seems like adding more forwarding/proxy logic -- that isn't
>> strictly necessary to compute complete results -- could introduce a lot
>> of complexity risk for a problem that already has multiple solutions:
>> 
>> 1) client (or external load balancer) can round robin over live nodes (and
>> given that cluster state and metrics are available via HTTP, a client can
>> make very sophisticated choices)
>> 
>> 2) a single "extra" solr node in the cluster can be used as a "self
>> configuring" load balancer that will automatically know when new nodes are
>> added to the cluster, or when replicas get moved/added, etc...
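Option 1 can start as small as a client-side round robin over the live nodes. This is a minimal Java sketch that assumes the caller already has the list of live node base URLs (for example, refreshed periodically from the cluster state over HTTP). The URLs and the collection name are placeholders, and a real client would also need to drop nodes that stop responding:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class LiveNodeRoundRobin {
    private final List<String> liveNodes;
    private final AtomicLong counter = new AtomicLong();

    LiveNodeRoundRobin(List<String> liveNodes) {
        if (liveNodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one live node");
        }
        this.liveNodes = List.copyOf(liveNodes); // immutable snapshot of the live nodes
    }

    /** Returns the next node in turn; safe to call from several threads. */
    String next() {
        long i = counter.getAndIncrement();
        return liveNodes.get((int) Math.floorMod(i, (long) liveNodes.size()));
    }

    /** Builds a /select URL on the next node; the query must already be URL-encoded. */
    String selectUrl(String collection, String encodedQuery) {
        return next() + "/" + collection + "/select?q=" + encodedQuery;
    }

    public static void main(String[] args) {
        LiveNodeRoundRobin rr = new LiveNodeRoundRobin(List.of(
                "http://node1:8983/solr", "http://node2:8983/solr", "http://node3:8983/solr"));
        for (int k = 0; k < 4; k++) {
            System.out.println(rr.selectUrl("collection1", "test")); // node1, node2, node3, node1
        }
    }
}
```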
>> 
>> 
>> 
>> 
>> 
>> 
>> :
>> : Jan
>> :
>> : > On 10 Mar 2021, at 19:32, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>> : >
>> : >
>> : > : Is there any way whatsoever to solve this on the Solr side only?
>> : > :
>> : > : The only thing I can think of is to send all requests to a 3rd node in the cluster
>> : > : that does not have a core for the collection, then it will balance
>> : > : between the two :)
>> : >
>> : > correct -- you can create a Solr node w/o any cores that will act as a
>> : > "load balancer" to other solr nodes.
>> : >
>> : > : Or create a new, empty collection on the node, which acts as a routing
>> : > : collection only to the target collection?
>> : >
>> : > no -- this won't work, because the request your remote client sends will
>> : > need to specify the actual collection you want to query, and when the node
>> : > gets this it will hand it to the local core for that collection -- it
>> : > won't care that there is another local collection that's unrelated.
>> : >
>> : > : Sounds like there should be a way to explicitly disable the
>> : > : "optimization" of always h

Solr custom query component does not return correct facet counts

2021-03-10 Thread gnandre
I have a simple Solr query component that does some exact match processing
by replacing qf and pf params in incoming search requests with new values
that point to the fields that do not do stemming, synonymization etc.

This works as expected. However, in a distributed context (not using
SolrCloud, just using the shards param), although it works as expected, the
facet counts are off.

Facet counts are double what they should be. Also, I noticed that I get
two "response" objects in the JSON response in a distributed context. Please
note that I have already added the following so that I do not get two responses
back:

  @Override
  public void process(ResponseBuilder rb) throws IOException
  {
    // do nothing - needed so we don't execute the query here.
  }

This is what my prepare function looks like:
  @Override
  public void prepare( ResponseBuilder rb ) throws IOException
  {
    if (exactMatchQueryProcessor != null) {
      exactMatchQueryProcessor.modifyForExactMatch(rb);
    }
  }
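For readers unfamiliar with this kind of component, the qf/pf field rewrite itself can be sketched as a plain string transformation in Java. This is a hypothetical helper, not the poster's `modifyForExactMatch`. It assumes the unstemmed fields follow a `<field>_exact` naming convention and keeps any `^boost` suffix:

```java
import java.util.ArrayList;
import java.util.List;

class ExactMatchFields {
    static final String SUFFIX = "_exact"; // assumed naming convention for unstemmed fields

    /** Rewrites a qf/pf value, e.g. "title^2 body" becomes "title_exact^2 body_exact". */
    static String toExactFields(String qfOrPf) {
        List<String> out = new ArrayList<>();
        for (String clause : qfOrPf.trim().split("\\s+")) {
            if (clause.isEmpty()) {
                continue; // blank input yields an empty result
            }
            int caret = clause.indexOf('^');
            String field = caret >= 0 ? clause.substring(0, caret) : clause;
            String boost = caret >= 0 ? clause.substring(caret) : "";
            // Idempotent: fields that were already rewritten are left untouched.
            if (!field.endsWith(SUFFIX)) {
                field = field + SUFFIX;
            }
            out.add(field + boost);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(toExactFields("title^2 body")); // title_exact^2 body_exact
    }
}
```

Keeping the rewrite idempotent may matter here: in a distributed request, components can be prepared on the node that aggregates the results and again on each shard, and a shard may receive parameters that were already rewritten.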


RE: Potential Slow searching for unified highlighting on Solr 8.8.0/8.8.1

2021-03-10 Thread Flowerday, Matthew J
Hi Ere

Thanks for the help on this. I have raised SOLR-15246 to cover this.

Many thanks

Matthew

Matthew Flowerday | Consultant | ULEAF
Unisys | 01908 774830| matthew.flower...@unisys.com 
Address Enigma | Wavendon Business Park | Wavendon | Milton Keynes | MK17 8LX



THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is for use only by the intended recipient. If you received this
in error, please contact the sender and delete the e-mail and its
attachments from all devices.
   

-Original Message-
From: Ere Maijala  
Sent: 04 March 2021 10:20
To: solr-u...@lucene.apache.org
Subject: Re: Potential Slow searching for unified highlighting on Solr
8.8.0/8.8.1

EXTERNAL EMAIL - Be cautious of all links and attachments.

Hi,

Solr uses JIRA for issue tickets. You can find it here:
https://issues.apache.org/jira/browse/SOLR

I'd suggest filing a new bug issue in the SOLR project (note that several
other projects also use this JIRA installation). Here's an example of an
existing highlighter issue for reference:
https://issues.apache.org/jira/browse/SOLR-14019.

See also some brief documentation:

https://cwiki.apache.org/confluence/display/solr/HowToContribute#HowToContribute-JIRAtips(ourissue/bugtracker)

Regards,
Ere

Flowerday, Matthew J wrote on 1.3.2021 at 14:58:
> Hi Ere
>
> Pleased to be of service!
>
> No, I have not filed a JIRA ticket. I am new to interacting with the
> Solr Community and only beginning to 'find my legs'. I am not too sure
> what JIRA is, I am afraid!
>
> Regards
>
> Matthew
>
>
> -Original Message-
> From: Ere Maijala 
> Sent: 01 March 2021 12:53
> To: solr-u...@lucene.apache.org
> Subject: Re: Potential Slow searching for unified highlighting on Solr
> 8.8.0/8.8.1
>
> Hi,
>
> Whoa, thanks for the heads-up! You may just have saved me from a whole 
> lot of trouble. Did you file a JIRA ticket already?
>
> Thanks,
> Ere
>
> Flowerday, Matthew J wrote on 1.3.2021 at 14:00:
>> Hi There
>>
>> I just came across a situation where a unified highlighting search
>> under solr 8.8.0/8.8.1 can take over 20 mins to run and eventually
>> times out.
>> I resolved it by a config change – but it can catch you out. Hence
>> this email.
>>
>> With solr 8.8.0 a new unified highlighting parameter
>> &hl.fragAlignRatio was implemented which, if not set, defaults to 0.5.
>> This attempts to improve the highlighting so that highlighted text
>> does not appear right at the left. This works well, but if you have a
>> search result with numerous occurrences of the word in question
>> within the record, performance goes right down!
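To picture what an alignment ratio does, here is a deliberately simplified Java model of a fixed-size snippet window: the window starts roughly `alignRatio * fragsize` characters before the match, so 0 puts the match at the left edge and 0.5 roughly centers it. It illustrates the idea only; it is not Lucene's passage-sizing code and does not model the performance problem described here:

```java
class FragAlignModel {
    /**
     * Start offset of a fixed-size snippet window around a match (simplified model).
     *
     * @param matchStart offset of the first matched term in the text
     * @param fragsize   desired snippet length in characters
     * @param alignRatio 0.0 = match at the left edge, 0.5 = match roughly centered
     * @param textLength length of the whole field value
     */
    static int windowStart(int matchStart, int fragsize, double alignRatio, int textLength) {
        int lead = (int) Math.round(alignRatio * fragsize); // context kept before the match
        int start = Math.max(0, matchStart - lead);
        // Shift left if the window would run past the end of the text.
        return Math.max(0, Math.min(start, textLength - fragsize));
    }

    public static void main(String[] args) {
        for (double ratio : new double[] {0.0, 0.1, 0.25, 0.5}) {
            System.out.println(ratio + " -> " + windowStart(500, 100, ratio, 10_000));
        }
    }
}
```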
>>
>> 2021-02-27 06:45:03.151 INFO  (qtp762476028-20) [   x:uleaf]
>> o.a.s.c.S.Request [uleaf]  webapp=/solr path=/select
>> params={hl.snippets=2&q=test&hl=on&hl.maxAnalyzedChars=100&fl=id,description,specification,score&start=20&hl.fl=*&rows=10&_=1614405119134}
>> hits=57008 status=0 QTime=1414320
>>
>> 2021-02-27 06:45:03.245 INFO  (qtp762476028-20) [   x:uleaf]
>> o.a.s.s.HttpSolrCall Unable to write response, client closed
>> connection or we are shutting down =>
>> org.eclipse.jetty.io.EofException
>>
>> at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279)
>>
>> org.eclipse.jetty.io.EofException: null
>>
>> at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279) ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>> at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422) ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>> at org.eclipse.jetty.io.WriteFlusher.completeWrite(WriteFlusher.java:378) ~[jetty-io-9.4.34.v20201102.jar:9.4.34.v20201102]
>>
>> When I set &hl.fragAlignRatio=0.25, results came back much quicker:
>>
>> 2021-02-27 14:59:57.189 INFO  (qtp1291367132-24) [   x:holmes]
>> o.a.s.c.S.Request [holmes]  webapp=/solr path=/select
>> params={hl.weightMatches=false&hl=on&fl=id,description,specification,score&start=1&hl.fragAlignRatio=0.25&rows=100&hl.snippets=2&q=test&hl.maxAnalyzedChars=100&hl.fl=*&hl.method=unified&timeAllowed=9&_=1614430061690}
>> hits=136939 status=0 QTime=87024
>>
>> And with &hl.fragAlignRatio=0.1:
>>
>> 2021-02-27 15:18:45.542 INFO  (qtp1291367132-19) [   x:holmes] 
>> o.a.s.c.S.Request [holmes]  webapp=/solr path=/select 
>> params={hl.weightMatches=false&hl=on&fl=id,description,specification,
>> s