Re: Requests taking hours on solr cloud

2022-12-09 Thread Ere Maijala

Hi,

Are the same requests sometimes stalling and sometimes fast, or is it 
some particular queries that take hours?


There are some things you should avoid with SolrCloud, and deep paging 
(i.e. a large number for the start or rows parameter) is a typical issue 
(see e.g. https://yonik.com/solr/paging-and-deep-paging/ for more 
information).
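
For example, cursor-based paging avoids the deep-paging cost; a sketch
(collection name and sort illustrative; note the sort must include the
uniqueKey field):

    curl 'http://localhost:8983/solr/mycollection/select?q=*:*&sort=id+asc&rows=100&cursorMark=*'

Each response includes a nextCursorMark value to pass as cursorMark in the
next request.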


Best,
Ere

Satya Nand wrote on 8 Dec 2022 at 13:27:

Hi,

Greetings for the day,

We are facing a strange problem in SolrCloud where a few requests take
hours to complete. Some requests return with a 0 status code and some
with a 500 status code. The most recent request took more than 5 hours to
complete despite a result count of only about 9k.


These queries create problems in closing old searchers: sometimes there
are 3-4 searchers, where one is a new searcher and the others are stuck
because a few queries are taking hours. Eventually the application slows
down horribly and the load increases.

I have downloaded the stack trace of the affected node and tried to analyze
it online, but I couldn't get many insights from it.

Stack Trace:

https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMjIvMTIvOC9sb2dzLnR4dC0tMTAtNTUtMzA=

JVM settings: We are using Parallel GC; could this be causing such long
pauses?

-XX:+UseParallelGC
-XX:-OmitStackTraceInFastThrow
-Xms12g
-Xmx12g
-Xss256k
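
(Aside: Parallel GC optimizes for throughput rather than pause times, so long
full-GC pauses on a 12g heap are plausible. A G1-based setup is often
suggested for Solr instead; a sketch, with an illustrative pause target, not
a tested recommendation:

    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=250
    -Xms12g
    -Xmx12g
)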

What more can we check here to find the root cause and prevent this from
happening again?
Thanks in advance



--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Re: Requests taking hours on solr cloud

2022-12-09 Thread Satya Nand
Hi Ere,

We tried executing this request again and it returned immediately, so the
problem is not reproducible. The average response time of all queries around
that period was only about 100-200 ms.

This was a group=true request returning 14 groups and 5 results per
group, so no deep pagination.

On Fri, Dec 9, 2022 at 2:04 PM Ere Maijala  wrote:

> Are the same requests sometimes stalling and sometimes fast, or is it
> some particular queries that take hours?
>
> [rest of quoted message trimmed]


Re: Duplicate docs with same unique id on update

2022-12-09 Thread Jan Høydahl
Hi,

So, to be clear: you have a working fix by adding the _root_ field to your
schema?

I suppose most 8.x users already have a _root_ field, so what you are
seeing could very well be some bug related to atomic update.
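
(For reference, an atomic update, as opposed to re-adding the whole document,
would look roughly like this; id and values reused from the example below:

    curl -X POST -H 'Content-type:application/json' \
      'http://localhost:8983/solr/clients_main/update?commit=true' \
      --data '[{"id": "22468d41-3b...", "title": {"set": "New title"}}]'
)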

Can I propose that you create a minimal reproduction of this issue and upload
it somewhere?
It could e.g. be a set of curl commands that, starting from a newly installed
Solr 8.11 (or, even better, 9.1), reproduce the issue.
Hint: You can create a collection with the default schema (`solr create -c test`)
and then remove the _root_ field by issuing a delete-field command as described
here:
https://solr.apache.org/guide/solr/latest/indexing-guide/schema-api.html#delete-a-field
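
A delete-field call per that page would look something like this (core name
matching the hint above):

    curl -X POST -H 'Content-type:application/json' \
      'http://localhost:8983/solr/test/schema' \
      --data '{"delete-field": {"name": "_root_"}}'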

Jan

> On 8 Dec 2022, at 15:30, Eduardo Gomez wrote:
> 
>> At first it wasn't clear to me what the problem you're having actually
>> is.  Then I glanced back at the message subject ... it is the only place
>> you mention it.
> 
> Sorry Shawn, you are right, I didn't explain very clearly. So basically, in
> Solr 8.11.1, I can see that when updating an existing document, e.g. {"id":
> "22468d41-3b...", "title": "Old title"}, with:
> 
> curl -X POST -H 'Content-type:application/json' '
> http://localhost:8983/solr/clients_main/update?commit=true' --data "{'add':
> {'doc':{'id': '22468d41-3b...', 'title': 'New title'}}}"
> 
> I get two docs with the same id and different titles in the index. That is
> different from the behaviour I see using Solr 7.5, which is a single document
> with the updated title. To get that with the same schema in Solr 8.11.1, I
> have to add this to the schema:
> 
> <field name="_root_" type="string" indexed="true" stored="false" docValues="false"/>
> So without the _root_ definition, the behaviour is as expected in Solr 7.5
> but produces duplicate documents in Solr 8.11. I haven't noticed Solr
> complaining if the _root_ field is not defined.
> 
> So my question was if that is expected, as that field seems to be related
> to parent-child documents, which I don't use at all.
> 
> The definition for the id field in my schema.xml is similar to the one you
> posted:
> 
> <field name="id" type="string" indexed="true" stored="true" required="true" docValues="false"/>
> <uniqueKey>id</uniqueKey>
> 
> Eduardo
> 
> 
> 
> 
> 
> 
> On Thu, Dec 8, 2022 at 1:11 PM Mikhail Khludnev  wrote:
> 
>> Right, Shawn. That's how it works:
>> 
>> https://lucene.apache.org/core/7_4_0/core/org/apache/lucene/index/IndexWriter.html#updateDocuments-org.apache.lucene.index.Term-java.lang.Iterable-
>> And it's really fast at query time.
>> 
>> On Thu, Dec 8, 2022 at 4:06 PM Shawn Heisey  wrote:
>> 
>>> On 12/8/22 05:58, Shawn Heisey wrote:
 So you can't just update a child document, you have to update all the
 children and all the parents at the same time, so the new documents
 are all in the same segment.
>>> 
>>> That's a little unclear and sounds like a draconian requirement. :)  I
>>> meant that all children must be in the same segment as their parent.  I
>>> think Solr might support the idea of multiple nesting levels ... if so,
>>> then the ultimate parent document and all its descendants need to be in
>>> the same segment.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> 
> 



Re: Duplicate docs with same unique id on update

2022-12-09 Thread Dave
So it was a decision to remove the unique id field and replace it with root?
This seems bad. You can't have two documents with the same id/unique field.

> On Dec 9, 2022, at 7:57 AM, Jan Høydahl  wrote:
> [quoted message trimmed]


Re: Duplicate docs with same unique id on update

2022-12-09 Thread Jan Høydahl
No no. The schema still has ONE uniqueKey field.
The _root_ field is used as a parent pointer for child documents; it holds
the ID of the root (parent) document.
Thus you should not need _root_ if you don't use parent/child, but this thread
suggests that _root_ may be needed in some other code paths as well.

I suspect this JIRA, https://issues.apache.org/jira/browse/SOLR-12638,
may be related in some way (I have not looked at any of that code though; see
https://github.com/apache/solr/search?q=SOLR-12638&type=commits).

Jan

> On 9 Dec 2022, at 14:32, Dave wrote:
> 
> So it was a decision to remove the unique id field and replace it with root?
> This seems bad. You can't have two documents with the same id/unique field.
> 
> [earlier quoted messages trimmed]



Re: Core reload timeout on Solr 9

2022-12-09 Thread Nick Vladiceanu
Tried enabling -Dsolr.http1=true but it didn't help. Still seeing the timeout
after 180s (even without sending any traffic to the cluster), and also noticed

Caused by: java.util.concurrent.TimeoutException: Total timeout 60 ms elapsed
(stack trace here: https://justpaste.it/29bpv)

on some of the nodes.


Also spotting errors related to:

o.a.s.c.SolrCore java.lang.IllegalArgumentException: Unknown directory:
MMapDirectory@/var/solr/data/my_collection_shard3_replica_t1643/data/snapshot_metadata
(we do not use snapshots at all) (stack trace: https://justpaste.it/88en6)

CoreIsClosedException in o.a.s.u.CommitTracker, auto commit error...:
https://justpaste.it/bbbms

org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException: Error
from server at null: https://justpaste.it/5nq7b (this node is a leader)

From time to time we observe in the logs (TLOG replicas across the board),
across multiple nodes:

WARN  (indexFetcher-120-thread-1) [] o.a.s.h.IndexFetcher File _8ux.cfe did not
match. expected checksum is 3843994300 and actual is checksum 2148229542.
expected length is 542 and actual length is 542



> On 5. Dec 2022, at 5:12 PM, Houston Putman  wrote:
> 
> I'm not sure this is the issue, but maybe it's http2 vs http1.
> 
> Could you retry with the following set on the cluster?
> 
> -Dsolr.http1=true
> 
> 
> 
> On Mon, Dec 5, 2022 at 5:08 AM Nick Vladiceanu wrote:
> 
>> Hello folks,
>> 
>> We’re running our SolrCloud cluster in Kubernetes. Recently we’ve upgraded
>> from 8.11 to 9.0 (and eventually to 9.1).
>> 
>> Fully reindexed collections after upgrade, all looking good, no errors,
>> response time improvements are noticed.
>> 
>> We have the following specs:
>> collection size: 22M docs, 1.3Kb doc size; ~28Gb total collection size at
>> this point;
>> shards: 6 shards, each ~4.7Gb; 1 core per node;
>> nodes: 96 nodes, 30Gi of RAM, 16 cores each;
>> heap: 23Gb;
>> JavaOpts: -Dsolr.modules=scripting,analysis-extras,ltr
>> gcTune: -XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:MaxGCPauseMillis=300
>> -XX:InitiatingHeapOccupancyPercent=75 -XX:+UseLargePages
>> -XX:+ParallelRefProcEnabled -XX:ParallelGCThreads=10 -XX:ConcGCThreads=2
>> -XX:MinHeapFreeRatio=2 -XX:MaxHeapFreeRatio=10
>> 
>> 
>> Problem
>> 
>> The problem we face is when we try to reload the collection: in sync mode
>> we get a timeout, or a forever-running task if the reload is executed in
>> async mode:
>> 
>> curl "reload" output: https://justpaste.it/ap4d2
>> ErrorReportingConcurrentUpdateSolrClient stacktrace (appears in the logs
>> of some nodes): https://justpaste.it/aq3dw
>> 
>> There are no issues on a newly created cluster if there is no incoming
>> traffic to it. Once we start sending requests to the cluster, collection
>> reload becomes impossible. Other collections (smaller) within the same
>> cluster are reloading just fine.
>> 
>> In some cases, on some nodes, the old-generation GC kicks in and makes
>> the entire cluster unstable; however, that doesn't happen every time a
>> collection reload times out.
>> 
>> We've tried rolling back to 8.11 and everything works normally as it used
>> to: no errors with reload, no other errors in the logs during reload,
>> etc.
>> 
>> We tried the following:
>> run 9.0, 9.1 on Java 11 and Java 17: same result;
>> lower cache warming, disable firstSearcher queries: same result;
>> increase heap size, tune gc: same result;
>> use apiv1 and apiv2 to issue reload commands: no difference;
>> sync vs async reload: either forever running task or timing out after 180
>> seconds;
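>> (The async variant can be polled for completion via the Collections API;
>> request id illustrative:
>>
>> curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=my_collection&async=reload-1'
>> curl 'http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=reload-1'
>> )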
>> 
>> Did anyone face similar issues after upgrading to Solr 9? Could you
>> please advise where we should focus our attention while debugging this
>> behavior? Any other advice/suggestions?
>> 
>> Thank you
>> 
>> 
>> Best regards,
>> Nick Vladiceanu



Re: Near Real Time not working as expected

2022-12-09 Thread Matias Laino
Thank you Tomas! This was really useful info. I checked some of my logs for
today... but they say "Registered new searcher autowarm time: 0 ms"

I'm very confused right now lol, it seems odd to me to have 0 ms.

This is my current cache configuration (I removed the maxWarmingSearchers
option as a test):

<filterCache class="solr.CaffeineCache"
             size="512"
             initialSize="512"
             autowarmCount="0"/>

<queryResultCache class="solr.CaffeineCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>

<documentCache class="solr.CaffeineCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>

<cache name="perSegFilter"
       class="solr.search.LRUCache"
       size="10"
       initialSize="0"
       autowarmCount="10"
       regenerator="solr.NoOpRegenerator"/>

From: Tomás Fernández Löbbe 
Sent: Thursday, December 8, 2022 8:22 PM
To: users@solr.apache.org 
Subject: Re: Near Real Time not working as expected

If you see this warning, then the issue is that your warming is taking too
long. Consider reducing/removing auto-warm[1]. You may also have static
warming with query listeners[2]. If you have INFO logging enabled for SolrCore
it should be printing something like:

"Registered new searcher autowarm time: X ms"

Check those values; with 1s autoSoftCommit you probably want the autowarm
time to be as close to 0 as possible.


[1]
https://solr.apache.org/guide/solr/latest/configuration-guide/caches-warming.html
[2]
https://solr.apache.org/guide/solr/latest/configuration-guide/caches-warming.html#query-related-listeners
On Thu, Dec 8, 2022 at 6:51 AM Matias Laino
 wrote:

> Hi Tomas!
> Yes! I saw that message. My original setting for max warming searchers was
> 2; I increased it to 6 and was still seeing the message; now it's at 16 and
> I don't see it anymore, but the issue still persists.
>
> I haven't seen postCommit events; where can I look for that? Sorry, I'm
> relatively new to configuring Solr from scratch.
>
> Thanks in advance!
>
> 
> From: Tomás Fernández Löbbe 
> Sent: Wednesday, December 7, 2022 6:56 PM
> To: users@solr.apache.org 
> Subject: Re: Near Real Time not working as expected
>
> Are you seeing any messages in the logs with "PERFORMANCE WARNING:
> Overlapping onDeckSearchers"? Can you elaborate on the autowarm
> configuration that you have? any "postCommit" events?
>
> If you set the logger of "org.apache.solr.search.SolrIndexSearcher" to
> DEBUG level you should see when the searcher is open and how long it takes
> to warmup.
>
>
> On Wed, Dec 7, 2022 at 9:58 AM Matias Laino
>  wrote:
>
> > I'm sorry, but I'm not sure what you mean by metal; our servers are EC2
> > instances, if that helps in any way.
> >
> > MATIAS LAINO | DIRECTOR OF PASSARE REMOTE DEVELOPMENT
> > matias.la...@passare.com | +54 11-6357-2143
> >
> >
> > -Original Message-
> > From: Dave 
> > Sent: Wednesday, December 7, 2022 2:40 PM
> > To: users@solr.apache.org
> > Subject: Re: Near Real Time not working as expected
> >
> > Just out of curiosity, are you using metal? And if so, have you run any
> > disk I/O tests to see if you may have a hardware problem on any of the
> > nodes? A document won't be available until all the nodes have it, so it
> > just takes one node getting slow to slow you down.
> >
> > > On Dec 7, 2022, at 9:45 AM, Matias Laino  .invalid>
> > wrote:
> > >
> > > 
> > > Hi all,
> > >
> > > I recently had an issue with very high CPU usage on our testing
> > > SolrCloud cluster when sending data to Solr. I've tried several things
> > > that reduced the CPU usage; our testing SolrCloud now runs on an 8-core
> > > machine with 32 gb of RAM (recently changed the heap to 21g as a test).
> > > When we push data to Solr, it takes a couple of minutes for a new
> > > document to be available in search results. I've tried everything and
> > > cannot find out what is going on; it was working perfectly fine until
> > > last week, when it suddenly started having this delay.
> > >
> > > Our configuration for NRT is very aggressive: 60s auto commit with
> > > openSearcher=false and 1s auto soft commit. But no matter what
> > > configuration I try, it always takes a couple of minutes for a new
> > > document to be available in search results.
> > >
> > > I've tried modifying the cache configuration to use Caffeine, tried
> > > removing the max warming searchers value, tried setting autoWarmCount
> > > to different values, and still the same issue; it's almost like my
> > > configuration doesn't matter.
> > >
> > > We are using Solr 8.11 in SolrCloud mode: 2 nodes, 1 Zookeeper node.
> > > On each node we have 6 collections of around 10-11M records each
> > > (numbers didn't change much before and after this issue started). The
> > > total amount of disk space used is 20.4gb; our heap is now 21gb.
> > >
> > > I'm kind of desperate since I'll be on vacation starting the end of
> > > next week and I haven't been able to find out what is wrong with this;
> > > my fear is that if this happens to our production server, we won't know
> > > how to fix it other than reinstalling Solr from scratch.
> > >
> > > Our prod server only has 1 collection of 11gb and runs on a 4-core
> > > server with 16 gb of RAM (8gb heap).
> > >
> > > Any help or pointer will be highly appreciated as I’m desperate.
> > >
> > > Thanks in advance!
> > >
> > > MATIAS L

Re: Near Real Time not working as expected

2022-12-09 Thread Matias Laino
Tomas,

I just set the max warming searchers to 4, and I still see 0ms on warming time.

Not sure what else to check.

Thanks in advance!

From: Matias Laino 
Sent: Friday, December 9, 2022 12:10 PM
To: Tomás Fernández Löbbe ; users@solr.apache.org 

Subject: Re: Near Real Time not working as expected

[earlier messages in this thread trimmed]

How to run many SolrClouds within a single K8s cluster?

2022-12-09 Thread mtn search
Hello,

We have a single SolrCloud running in K8s (dev env) by implementing a
Zookeeper stateful set and a Solr node stateful set.  On startup the Solr
pods connect to the ZK ensemble defined in the ZK stateful set.

If we wanted more SolrClouds in this K8s cluster, would we create more
stateful sets, or could we, within the current stateful sets, mark sets
of ZK pods and Solr pods for use by a specific SolrCloud?

Thanks,
Matt


Re: Near Real Time not working as expected

2022-12-09 Thread Michael Gibney
> now it's at 16 and I don't see that (I went from 6 to 16), but the issue
> still persists

Just to clarify, the "overlapping ondeck searchers" went away at 16?
Assuming that the issue that still persists is docs not being visible?

It's tempting to interpret "Registered new searcher autowarm time: 0
ms" as representing the entire time it takes to open a new searcher.
In fact, autowarming (represented by this log msg) is just one aspect
of opening a new searcher. The Lucene-level openSearcher step can take
non-negligible time (and, somewhat confusingly, happens whether or not
`openSearcher=true` is specified). onDeckSearchers would be
incremented regardless of whether any autowarming is taking place.

Adjusting maxWarmingSearchers is a good clue that that's where the
problem lies, but the warning means you're committing too frequently
for the system to handle with its current configuration. Increasing the
maxWarmingSearchers setting is likely to make the problem worse in some
ways, not better. In your case I suspect you may just need to dial back
the commit frequency.
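
As a rough sketch of what dialing back could look like in solrconfig.xml
(values illustrative, not a recommendation for any particular workload):

    <autoCommit>
      <maxTime>60000</maxTime>            <!-- hard commit every 60s -->
      <openSearcher>false</openSearcher>  <!-- don't open a searcher on hard commit -->
    </autoCommit>
    <autoSoftCommit>
      <maxTime>30000</maxTime>            <!-- visibility every 30s instead of every 1s -->
    </autoSoftCommit>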

I'm curious, how many replicas (and what type -- NRT, TLOG, PULL?) do
you have per shard, and are you using any special request routing
(i.e., via shards.preferences)?

Michael

On Fri, Dec 9, 2022 at 10:43 AM Matias Laino wrote:
> [quoted message trimmed]

Using the fq parameter to filter for a value that is multivalued field.

2022-12-09 Thread Matthew Castrigno
I am having trouble using the fq parameter to filter for a value that is in a
multi-valued field.

This works:
"myField":["apple"]

fq=myField:"apple"

document is returned


This does not work:
"myField":["apple, pear"]

fq=myField:"apple"

document is NOT returned

What do I need to do get fq to find a value in a multi-valued field?

Thank you!




Re: Using the fq parameter to filter for a value that is multivalued field.

2022-12-09 Thread Dave
"apple, pear"

That looks like a string, not a multi-valued field, to me. Maybe I'm wrong,
but you should have quotes around each element of the array.

> On Dec 9, 2022, at 12:23 PM, Matthew Castrigno  wrote:
> 
> "apple, pear"


Re: Using the fq parameter to filter for a value that is multivalued field.

2022-12-09 Thread Andy Lester



> On Dec 9, 2022, at 11:22 AM, Matthew Castrigno  wrote:
> 
> "myField":["apple, pear"]


That's not multivalued.  That's a single value and the value is "apple, pear".

You need to pass multiple values to Solr for the field when you do your 
indexing.  Basically, you need to pass one myField:apple and another 
myField:pear to the indexer when you add records.

Andy

Re: Near Real Time not working as expected

2022-12-09 Thread Matias Laino
Hi Michael! Thanks for helping.

> Just to clarify, the "overlapping ondeck searchers" went away at 16?
Assuming that the issue that still persists is docs not being visible?

Not really; after that I still saw the overlapping searchers warning with the
limit at 6 (even though the config was set to 16, which is odd; maybe I
uploaded the wrong config between tries).

Right now I have it at 4, and only saw the warning for one of our collections, 
but still the other collections take a long time for results to show up.

> Adjusting max overlapping searchers is a good clue that that's where
the problem lies, but that setting is a warning that you're committing
too frequently for the system to handle with its current
configuration. Increasing the maxWarmingSearchers setting is likely to
make the problem worse in some ways, not better. In your case I
suspect you may just need to dial back the commit frequency.

AutoSoftCommit right now is at 5s and autoCommit is at 60s.
Maybe the fact that we have 6 collections is making this worse? We have 6
different environments pointing to the same Solr cluster; each collection
represents one environment. As far as I understand, we are not doing any
commits from our app; we are relying on Solr to do the auto soft commits.

> I'm curious, how many replicas (and what type -- NRT, TLOG, PULL?) do
you have per shard, and are you using any special request routing
(i.e., via shards.preferences)?
I'm not sure where to find the replica type -- NRT, TLOG, etc.
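
(As an aside, the replica type shows up in the Collections API cluster
status; collection name illustrative:

    curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=mycollection'

Each replica in the response has a "type" attribute of NRT, TLOG or PULL.)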

We have 6 collections, 2 shards each, with 2 replicas each.

Total disk amount is 22gb.


From: Michael Gibney 
Sent: Friday, December 9, 2022 1:25 PM
To: users@solr.apache.org 
Subject: Re: Near Real Time not working as expected

> [quoted message trimmed]

Re: Using the fq parameter to filter for a value that is multivalued field.

2022-12-09 Thread Walter Underwood
If you want apple OR pear, use:

myField:apple myField:pear

If you want apple AND pear, use:

+myField:apple +myField:pear
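
As full request URLs (collection name illustrative; note that '+' must be
URL-encoded as %2B when sent in a query string):

    curl 'http://localhost:8983/solr/mycollection/select?q=*:*&fq=myField:apple%20myField:pear'
    curl 'http://localhost:8983/solr/mycollection/select?q=*:*&fq=%2BmyField:apple%20%2BmyField:pear'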

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 9, 2022, at 9:22 AM, Matthew Castrigno wrote:
>
> [quoted message trimmed]



Re: How to run many SolrClouds within a single K8s cluster?

2022-12-09 Thread Shawn Heisey

On 12/9/22 09:23, mtn search wrote:

[quoted message trimmed]


I know pretty much nothing about k8s or the solr operator, so from that 
perspective I can't answer your question.


I can tell you one thing ... you can have many clusters all using the 
same ZK ensemble.  For multiple SolrCloud clusters using the same ZK 
setup, just add a chroot to the zkhost string used for each cluster.  
You can use any name you like, these are just examples:


zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr1
zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr2
zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr3

I would recommend a chroot for any SolrCloud cluster set up for high 
availability with more than one Solr host.
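
As a sketch (reusing the example hosts above; the chroot znode must exist
before the first startup, and bin/solr can create it):

    bin/solr zk mkroot /solr2 -z zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
    bin/solr start -cloud -z 'zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr2'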


Thanks,
Shawn



Re: Using the fq parameter to filter for a value that is multivalued field.

2022-12-09 Thread Matthew Castrigno
Thank you for your comments; that appears to be the root of the problem.

Fixing it raises another question.

The incorrect multi-valued fields were being created with a script line:
doc.addField("facets_ss", contentObject.Page.Facets.join(","));

When I try to fix that with:
doc.addField("facets_ss", contentObject.Page.Facets.join('","'));
or
doc.addField("facets_ss", contentObject.Page.Facets.join("\",\""));

SOLR is adding escape characters that I do not want:
facets_ss":["Blogs\",\"Article"]

Instead of
facets_ss":["Blogs","Article"]

How do I get SOLR to not add the \ ?

Thank you!




From: Walter Underwood 
Sent: Friday, December 9, 2022 10:42 AM
To: users@solr.apache.org 
Subject: Re: Using the fq parameter to filter for a value that is multivalued 
field.

> [quoted message trimmed]


Re: Using the fq parameter to filter for a value that is multivalued field.

2022-12-09 Thread Dave
Try adding each value separately. Don't join them in code; let Solr do the
multi-value work.
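
A sketch of that fix against the script line from the previous message,
assuming contentObject.Page.Facets is a plain array of strings:

    // add each facet as its own value; Solr then indexes facets_ss as
    // ["Blogs", "Article"] rather than one joined string
    for (var i = 0; i < contentObject.Page.Facets.length; i++) {
      doc.addField("facets_ss", contentObject.Page.Facets[i]);
    }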

> On Dec 9, 2022, at 1:11 PM, Matthew Castrigno  wrote:
> [quoted message trimmed]


Re: Using the fq parameter to filter for a value that is multivalued field.

2022-12-09 Thread Matthew Castrigno
Thanks Dave. That seems to work.

From: Dave 
Sent: Friday, December 9, 2022 11:19 AM
To: Matthew Castrigno 
Cc: users@solr.apache.org ; a...@petdance.com 

Subject: Re: Using the fq parameter to filter for a value that is multivalued 
field.

[quoted message trimmed]


Re: Using the fq parameter to filter for a value that is multivalued field.

2022-12-09 Thread David Hastings
Of course. Also, remember there are things to consider, like whether you want
to store/retrieve it as an exact string (including capitalization) for a
facet/drop-down selection, or only as a search field. Lots of nuances.
-Dave


JVM threads and heap issue due to filtercache

2022-12-09 Thread Dominique Bejean
Hi,

I have a huge sharded collection. Each shard contains 100 million docs.

Queries use at most 20 filter queries. 10 are used very often and
the others not often.
filterCache size is 10 and autoWarmCount is 10.
filterCache statistics are very good, except warmupTime is a little long.

"CACHE.searcher.filterCache":{
"lookups":4048,
"hits":4048,
"cumulative_evictions":52,
"size":10,
"hitratio":1.0,
"evictions":0,
"cumulative_lookups":2048362,
"cumulative_hitratio":1.0,
"warmupTime":13956,
"inserts":0,
"cumulative_inserts":146,
"cumulative_hits":2048049},

autoSoftCommit maxTime is 6
autoCommit maxTime is 30

softCommits occur roughly every one to three minutes, depending on when
updates occur.

At some moment (once a day), there is suddenly a spike in thread count, with
more than 150 threads executing the methods that build and populate the
filterCache.

java.lang.Thread.State: RUNNABLE
at org.apache.lucene.search.ConjunctionDISI.doNext(ConjunctionDISI.java:200)
at
org.apache.lucene.search.ConjunctionDISI.nextDoc(ConjunctionDISI.java:240)
at
org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:261)
at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214)
at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:670)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:471)
at
org.apache.solr.search.DocSetUtil.createDocSetGeneric(DocSetUtil.java:151)
at org.apache.solr.search.DocSetUtil.createDocSet(DocSetUtil.java:140)
at
org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1178)
at
org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:818)

Each thread uses 36 Mb of heap, so the heap size grows suddenly and
consecutive full GCs follow.

I don't understand how it is possible that, in the minute following a
softCommit, 150 queries try to build filterCache entries whereas at most 10
distinct filter queries were executed.

There is no newSearcher or firstSearcher listener configured.
useColdSearcher is set to false.

I think I need to configure a firstSearcher listener in order to avoid
issues at Solr startup, but I don't need to configure a newSearcher
listener because the 10 most-used filterCache entries are autowarmed.
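
A firstSearcher listener along these lines could pre-populate the cache at
startup (the filter queries are illustrative; they would be your ~10 common
fq values):

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">*:*</str><str name="fq">category:books</str></lst>
        <lst><str name="q">*:*</str><str name="fq">inStock:true</str></lst>
      </arr>
    </listener>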

Any suggestions ?

Thank you

Dominique