Re: Too Many Searcher Opening Events

2021-04-28 Thread Emir Arnautović
Hi Ronen,
If you think an unintentional explicit commit is causing this, you can
disable explicit commits with IgnoreCommitOptimizeUpdateProcessorFactory.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Apr 2021, at 19:07, Ronen Nussbaum  wrote:
> 
> Hi Everyone,
> 
> I have a cluster of seven servers running Solr 8.3.0.
> The collection is divided into 64 shards, and each shard has a replica.
> Total number of documents: ~700M, but most are nested (children), so the
> effective number is 20M parents.
> Ingestion is quite heavy.
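> Auto commit is configured like this:
> 
>   <autoCommit>
>     <maxTime>${solr.autoCommit.maxTime:6}</maxTime>
>     <maxDocs>${solr.autoCommit.maxDocs:5}</maxDocs>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
> 
>   <autoSoftCommit>
>     <maxTime>${solr.autoSoftCommit.maxTime:30}</maxTime>
>   </autoSoftCommit>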
> 
> I'm trying to understand why there are so many "SolrIndexSearcher Opening"
> events in the log, e.g.:
> [2021-04-19T14:45:27.019] INFO [qtp1686100174-260205]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@3fae69f8[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:45:27.061] INFO [qtp1686100174-258896]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@2a47a89c[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:45:37.193] INFO [qtp1686100174-256821]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@3bf060ea[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:45:41.284] INFO [qtp1686100174-258269]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@2b18321b[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:46:02.238] INFO [qtp1686100174-258858]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@76f4935f[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:46:07.248] INFO [qtp1686100174-256407]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@f086b3a[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:46:16.609] INFO [qtp1686100174-257476]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@15b79751[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:46:29.856] INFO [qtp1686100174-259689]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@bf0a783[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:46:56.211] INFO [qtp1686100174-257346]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@43d22ad5[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:47:06.972] INFO [qtp1686100174-256721]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@1779ccd1[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:47:21.089] INFO [qtp1686100174-259395]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@368b2cfb[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:47:44.583] INFO [qtp1686100174-256722]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@11afa0d8[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:47:54.912] INFO [qtp1686100174-256157]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@38cb7e42[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:48:14.520] INFO [qtp1686100174-258515]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@479d4204[1602350_shard46_replica_n182]
> realtime]
> [2021-04-19T14:48:18.961] INFO [qtp1686100174-253862]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@164a03a6[1602350_shard46_replica_n182]
> realtime]
> 
> Between 00:00 and 17:00 (17 hours) I have ~1500 lines like the above, and
> ~800 "registered new searcher" lines.
> This time period is ~1000 minutes, so I was expecting 1000/5 = 200 events
> (a soft commit every 5 minutes).
> This doesn't look right to me.
> Could it be caused by clients submitting commit requests?
> Should I use a different configuration?
> 
> Thanks in advance,
> Ronen.
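A quick back-of-the-envelope check of the numbers above, as a Python sketch. It assumes the soft commit fires roughly once per 5-minute interval under continuous ingestion and that the reported counts refer to a single core; the 800 figure is the reported number of "registered new searcher" lines.

```python
def expected_soft_commits(window_minutes: int, interval_minutes: int) -> int:
    """Approximate number of soft-commit-driven searcher openings in a window,
    assuming autoSoftCommit fires about once per interval under steady ingestion."""
    return window_minutes // interval_minutes

window = 17 * 60                               # 00:00-17:00 is 1020 minutes
expected = expected_soft_commits(window, 5)    # about 204 soft commits
observed_registered = 800                      # reported "registered new searcher" lines
ratio = observed_registered / expected         # roughly 4x more than expected

print(f"expected ~{expected}, observed ~{observed_registered}, ratio {ratio:.1f}")
```

A ratio well above 1 suggests that something besides the soft-commit timer is opening searchers, for example explicit commits sent by clients.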



All replicas DOWN after Rebalance leaders | solrcloud v8.7.0

2021-04-28 Thread Mohsin Beg
Hello,

On Solr v8.7.0 we consistently see shards with all replicas in the DOWN 
state after a node restart followed by REBALANCELEADERS.

During the rebalance we stop all /update requests to all shards.

Once this issue happens, we see a “No Servers Hosting Shard” error in the Solr 
log, and the only remedy is to manually verify that the shard replicas have the 
same index segments and files, and then delete the tlog files.

We also found that if we periodically invoke a core reload on the non-leader 
replicas during indexing, the tlog files on the replicas don’t grow huge 
(100GB+) and fewer shards are DOWN after a node restart.
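The periodic reload of non-leader replicas can be scripted. Below is a minimal Python sketch that only builds the CoreAdmin RELOAD URLs (`action=RELOAD&core=...`). The replica dicts are assumed to look like the replica entries of a CLUSTERSTATUS response (`base_url`, `core`, and `"leader": "true"` on the leader); the replica list itself is hypothetical, and sending the requests and scheduling them are left out.

```python
from urllib.parse import urlencode

def reload_urls(replicas):
    """Build CoreAdmin RELOAD URLs for every non-leader replica.

    Each replica is assumed to be a dict with "base_url", "core" and, for the
    shard leader only, "leader": "true" (the shape used by CLUSTERSTATUS).
    """
    urls = []
    for replica in replicas:
        if replica.get("leader") == "true":
            continue  # skip the leader; only followers are reloaded
        query = urlencode({"action": "RELOAD", "core": replica["core"]})
        urls.append(f"{replica['base_url']}/admin/cores?{query}")
    return urls

# Hypothetical replica list: host and core names come from the logs in this
# message, the leader flag is made up.
replicas = [
    {"base_url": "http://mydata-solr-1.mydata-solr:8983/solr",
     "core": "mycollection_1_c_e_replica_t1281", "leader": "true"},
    {"base_url": "http://mydata-solr-3.mydata-solr:8983/solr",
     "core": "mycollection_1_c_e_replica_t1282"},
]
for url in reload_urls(replicas):
    print(url)
```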

We’re not sure if this is a bug or a config problem and would like your help. 
Thank you.


==> solr config

All replicas are TLOG replicas for all shards in the collection.
Hard commit is 10 sec and soft commit is 5 sec.
updateLog numVersionBuckets is 65536. No other value is set for updateLog.


==> solr log messages

2021-04-27 09:24:03.556 INFO  (qtp967677821-339) [c:mycollection s:1_c_e 
r:core_node1284 x:mycollection_1_c_e_replica_t1281] o.a.s.c.ZkController 
mycollection_1_c_e_replica_t1281 starting background replication from leader
mydata-solr-1

2021-04-27 09:24:03.556 INFO  (qtp967677821-339) [c:mycollection s:1_c_e 
r:core_node1284 x:mycollection_1_c_e_replica_t1281] o.a.s.c.ZkController 
mycollection_1_c_e_replica_t1281 stopping background replication from leader
mydata-solr-1

2021-04-27 09:24:03.553 INFO  (qtp967677821-339) [c:mycollection s:1_c_e 
r:core_node1284 x:mycollection_1_c_e_replica_t1281] 
o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader 
parent node, won't remove previous leader registration.
mydata-solr-1

2021-04-27 09:24:03.779 INFO  (zkCallback-14-thread-7) [c:mycollection s:1_c_e 
r:core_node1284 x:mycollection_1_c_e_replica_t1281] 
o.a.s.c.ShardLeaderElectionContext I may be the new leader - try and sync
mydata-solr-1

2021-04-27 09:24:10.027 ERROR (zkCallback-14-thread-7) [c:mycollection s:1_c_e 
r:core_node1284 x:mycollection_1_c_e_replica_t1281] o.a.s.u.PeerSync PeerSync: 
core=mycollection_1_c_e_replica_t1281 
url=http://mydata-solr-1.mydata-solr:8983/solr  Requested 37 updates from 
http://mydata-solr-3.mydata-solr:8983/solr/mycollection_1_c_e_replica_t1282/ 
but retrieved 28
mydata-solr-1

2021-04-27 09:24:10.027 INFO  (zkCallback-14-thread-7) [c:mycollection s:1_c_e 
r:core_node1284 x:mycollection_1_c_e_replica_t1281] o.a.s.c.SyncStrategy 
Leader's attempt to sync with shard failed, moving to the next candidate
mydata-solr-1

2021-04-27 09:24:10.027 INFO  (zkCallback-14-thread-7) [c:mycollection s:1_c_e 
r:core_node1284 x:mycollection_1_c_e_replica_t1281] o.a.s.u.PeerSync PeerSync: 
core=mycollection_1_c_e_replica_t1281 
url=http://mydata-solr-1.mydata-solr:8983/solr  DONE. sync failed
mydata-solr-1


==> solr log messages that we see repeating continuously

2021-04-27 10:00:57.583 ERROR (zkCallback-14-thread-17) [c:mycollection s:1_c_e 
r:core_node1284 x:mycollection_1_c_e_replica_t1281] o.a.s.u.PeerSync PeerSync: 
core=mycollection_1_c_e_replica_t1281 
url=http://mydata-solr-1.mydata-solr:8983/solr  Requested 37 updates from 
http://mydata-solr-3.mydata-solr:8983/solr/mycollection_1_c_e_replica_t1282/ 
but retrieved 28
mydata-solr-1

2021-04-27 10:01:01.244 ERROR (zkCallback-14-thread-43) [c:mycollection s:1_c_e 
r:core_node1285 x:mycollection_1_c_e_replica_t1282] o.a.s.u.PeerSync PeerSync: 
core=mycollection_1_c_e_replica_t1282 
url=http://mydata-solr-3.mydata-solr:8983/solr  Requested 30 updates from 
http://mydata-solr-1.mydata-solr:8983/solr/mycollection_1_c_e_replica_t1281/ 
but retrieved 24
mydata-solr-3

2021-04-27 10:01:04.352 ERROR (zkCallback-14-thread-15) [c:mycollection s:1_c_e 
r:core_node1286 x:mycollection_1_c_e_replica_t1283] o.a.s.u.PeerSync PeerSync: 
core=mycollection_1_c_e_replica_t1283 
url=http://mydata-solr-0.mydata-solr:8983/solr  Requested 41 updates from 
http://mydata-solr-3.mydata-solr:8983/solr/mycollection_1_c_e_replica_t1282/ 
but retrieved 32
mydata-solr-0
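To see at a glance how many updates each PeerSync attempt is missing, here is a small Python sketch that pulls the numbers out of the "Requested N updates ... but retrieved M" errors. The regular expression is an assumption based only on the message text quoted here; `\s+` also spans the line wraps in this archive.

```python
import re

# Built from the PeerSync error text above, not from Solr's source.
PEERSYNC_SHORTFALL = re.compile(
    r"core=(?P<core>\S+)\s+url=\S+\s+"
    r"Requested (?P<requested>\d+) updates from\s+(?P<peer>\S+)\s+"
    r"but retrieved (?P<retrieved>\d+)"
)

def peersync_shortfalls(log_text):
    """Return (core, peer, requested, retrieved, missing) for each PeerSync error."""
    results = []
    for m in PEERSYNC_SHORTFALL.finditer(log_text):
        requested, retrieved = int(m["requested"]), int(m["retrieved"])
        results.append((m["core"], m["peer"], requested, retrieved, requested - retrieved))
    return results

sample = (
    "o.a.s.u.PeerSync PeerSync: core=mycollection_1_c_e_replica_t1281 "
    "url=http://mydata-solr-1.mydata-solr:8983/solr  Requested 37 updates from "
    "http://mydata-solr-3.mydata-solr:8983/solr/mycollection_1_c_e_replica_t1282/ "
    "but retrieved 28"
)
for row in peersync_shortfalls(sample):
    print(row)
```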



Problem with eDisMax and multi-word synonyms

2021-04-28 Thread Ere Maijala

Hi,

Here's one that I can't wrap my head around. The main question is: why 
are the search terms treated differently in eDisMax if the query expands 
to a multi-word synonym, and there are different field types and q.op=AND?


This gets complicated quickly, so I tried to reproduce the results with 
the techproducts example:


1. Start with vanilla Solr 8.8.2

2. echo "cor => Corsair" >> 
server/solr/configsets/sample_techproducts_configs/conf/synonyms.txt


3. echo "cmi => Corsair Microsystems" >> 
server/solr/configsets/sample_techproducts_configs/conf/synonyms.txt


4. bin/solr start -e techproducts


Now, a basic query that works fine produces 2 results:

http://localhost:8983/solr/techproducts/select?q=corsair+microsystems+memory&debugQuery=true&defType=edismax&qf=name+manu+cat&q.op=AND

But if I use the synonym, I don't get any results:

http://localhost:8983/solr/techproducts/select?q=cmi+memory&debugQuery=true&defType=edismax&qf=name+manu+cat&q.op=AND

If I leave cat field out, however, I get 2 results:

http://localhost:8983/solr/techproducts/select?q=cmi+memory&debugQuery=true&defType=edismax&qf=name+manu&q.op=AND

Also, if I leave q.op out and add AND between the terms, I get 2 results 
even with the cat field:


http://localhost:8983/solr/techproducts/select?q=cmi+AND+memory&debugQuery=true&defType=edismax&qf=name+manu+cat

The single-word synonym works just fine:

http://localhost:8983/solr/techproducts/select?q=cor+memory&debugQuery=true&defType=edismax&qf=name+manu+cat&q.op=AND
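For scripting the comparisons above, here is a minimal Python sketch that builds the same /select URLs with `urllib.parse.urlencode`. The parameter values are copied from the URLs above; the second URL swaps `q.op=AND` for `mm=100%`, which has to be sent as `mm=100%25`.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/techproducts/select"

def select_url(params):
    """Build a /select URL; urlencode turns spaces into '+' and '%' into '%25'."""
    return BASE + "?" + urlencode(params)

common = {"debugQuery": "true", "defType": "edismax"}

# The failing case: multi-word synonym, cat in qf, q.op=AND.
failing = select_url({"q": "cmi memory", **common, "qf": "name manu cat", "q.op": "AND"})

# The same request with mm=100% in place of q.op=AND.
with_mm = select_url({"q": "cmi memory", **common, "qf": "name manu cat", "mm": "100%"})

print(failing)
print(with_mm)
```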


Can anyone shine a light on what's happening here?

Additional notes:

1. This is a simplified example, and the real-world case is much more 
complicated. It has our custom class create the synonyms for compound 
words in Finnish, and the queries come from users.


2. As far as I can see mm doesn't affect the results in any meaningful 
way, but I just might be doing something wrong.


3. I included the debugQuery parameter so that it's easy to see how 
different the queries become.


Best Regards,
Ere

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland


Re: Problem with eDisMax and multi-word synonyms

2021-04-28 Thread Markus Jelsma
Hello Ere,

The q.op parameter is not a dismax parameter. Instead, I think you are being
bitten by the mm parameter [1], which by default is 100%, meaning all terms
must match. Multi-word synonym handling and mm are not a very intuitive
combination, and can lead to crazy problems. Also beware of mm and stopword
handling, and check out mm.autoRelax [2]. But it is best not to use
stopwords at all.

Check it out,
Markus

[1] https://solr.apache.org/guide/6_6/the-dismax-query-parser.html
[2] https://solr.apache.org/guide/6_6/the-extended-dismax-query-parser.html

On Wed, 28 Apr 2021 at 15:02, Ere Maijala wrote:



Re: question to Collection Aliasing for Time series data

2021-04-28 Thread Gus Heck
Thinking a little longer, and remembering the code a little more, I
suspect removing middle collections probably won't actually error (assuming
you actually wanted gaps in your time series and that you don't consider
that an error), but any new data that would have hit the removed collection
would instead be added to the next oldest one, which will create a funny
double-wide "time slice". The other thing I should have mentioned is to be
careful not to change the order of the collections. That will cause
problems because the search for the destination collection iterates the
collection list until it finds a collection with a start time (name) older
than the document's timestamp. If you moved the first one, it might try to
re-create it with the same name and start failing very noisily for every update...
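A simplified Python model of the lookup described above, not Solr's actual code. It assumes daily collections named in the form `myalias__TRA__2021-04-26` (alias name, `__TRA__`, then the slice start; `myalias` is hypothetical) and listed newest first. With the middle collection removed, a document from that day lands in the next oldest collection, producing the double-wide slice.

```python
from datetime import date

def start_time(collection: str) -> date:
    """Start of a collection's time slice, taken from its name.
    Assumes names like "myalias__TRA__2021-04-26" (daily interval)."""
    return date.fromisoformat(collection.split("__TRA__")[1])

def destination(collections_newest_first, doc_time: date):
    """Walk the alias's collection list and return the first collection
    whose start time is not after the document's timestamp."""
    for collection in collections_newest_first:
        if start_time(collection) <= doc_time:
            return collection
    return None  # document is older than every collection in the alias

full = ["myalias__TRA__2021-04-28", "myalias__TRA__2021-04-27", "myalias__TRA__2021-04-26"]
gapped = [full[0], full[2]]  # middle collection removed from the alias

print(destination(full, date(2021, 4, 27)))    # myalias__TRA__2021-04-27
print(destination(gapped, date(2021, 4, 27)))  # myalias__TRA__2021-04-26
```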

On Mon, Apr 26, 2021 at 11:35 AM Gus Heck  wrote:

> If I understand your question correctly, you have
>
>  1. A Time Routed Alias (TRA) already created
>  2. Data already indexed into the TRA.
>
> You now want to delete (presumably) the oldest collection referenced by
> this alias?
>
> There are two options here.
>
>  1. If you always want to delete collections older than X, one option is to
> set the router.autoDeleteAge parameter for your TRA (see
> https://solr.apache.org/guide/8_8/collection-aliasing.html#time-routed-alias-parameters
> ); just re-send the original create command with the added parameter.
>  2. If this is a one-time thing (e.g. a mistake or something that requires
> human approval for business reasons), you can issue CREATEALIAS with a
> collections list that omits the oldest collections, removing them from the
> alias.
>
> WARNING: You MUST NOT remove middle or newest collections, since new data
> will recreate the newer ones (or fail if router.maxFutureMs is too small for it),
> and everything breaks down (documents going to wrong collections, or errors)
> if there is a gap in the middle of a TRA. Once the tail collections are
> removed from the alias, it is safe to delete them as with any normal collection.
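Option 1 amounts to re-sending the CREATEALIAS request with `router.autoDeleteAge` added. Here is a minimal Python sketch that builds such a URL. The alias name, route field, config name, shard count and the `/DAY-90DAYS` age (a date-math expression, here meaning 90 days before the start of today) are placeholders; copy the parameters from your original create command.

```python
from urllib.parse import urlencode

def create_tra_url(solr_base, params, auto_delete_age=None):
    """Build a Collections API CREATEALIAS URL for a time routed alias,
    optionally adding router.autoDeleteAge."""
    query = {"action": "CREATEALIAS", **params}
    if auto_delete_age is not None:
        query["router.autoDeleteAge"] = auto_delete_age
    return f"{solr_base}/admin/collections?{urlencode(query)}"

# Hypothetical original create parameters; substitute the ones you used.
original = {
    "name": "myalias",
    "router.name": "time",
    "router.field": "evt_dt",
    "router.start": "NOW/DAY",
    "router.interval": "+1DAY",
    "create-collection.collection.configName": "myConfig",
    "create-collection.numShards": "2",
}
url = create_tra_url("http://localhost:8983/solr", original, auto_delete_age="/DAY-90DAYS")
print(url)
```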
>
>
>
> On Mon, Apr 26, 2021 at 10:01 AM Polzer, Christian <
> christian.pol...@atos.net> wrote:
>
>> Hello,
>>
>> We plan to use Solr 8.8 and collection aliasing for time series data.
>> Unfortunately, I have found nothing on how to remove a collection from a
>> collection alias so that it can be removed/deleted afterwards.
>>
>> My question: how can I remove a collection from a collection alias, or
>> what is the best way to delete the referenced collection?
>>
>> Kind regards,
>>
>>
>>
>> Christian Polzer
>>
>>
>>
>>
>> * Christian Polzer*
>> SW Engineer - Senior
>> Atos IT Solutions and Services GmbH
>> M. +43 664 88552369
>> christian.pol...@atos.net
>> Autokaderstr. 29
>>
>> 1210 Vienna, Austria
>>
>> atos.net 
>>
>>
>> Company Name: Atos Convergence Creators GmbH; Legal Form: Limited
>> Company; Company Seat: Vienna; Register Number: FN 386049w, Registered at:
>> Commercial Court Vienna; DVR-Number: 4009290
>>
>>
>>
>> This e-mail and the documents attached are confidential and intended
>> solely for the addressee; it may also be privileged. If you receive this
>> e-mail in error, please notify the sender immediately and destroy it. As
>> its integrity cannot be secured on the Internet, the Atos group liability
>> cannot be triggered for the message content. Although the sender endeavors
>> to maintain a computer virus-free network, the sender does not warrant that
>> this transmission is virus-free and will not be liable for any damages
>> resulting from any virus transmitted.
>>
>>
>>
>>
>>
>>
>> Company: Atos IT Solutions and Services GmbH
>> Legal form: Gesellschaft mit beschränkter Haftung
>> Company seat: Vienna
>> Commercial registry file nr.: FN 357865y
>> Commercial Court: Handelsgericht Wien
>> DVR: 4003754
>> ATU UID: 66190855
>> ARA Nr: 17961
>>
>>
>
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>


-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)


Re: Too Many Searcher Opening Events

2021-04-28 Thread Shawn Heisey

On 2021-04-27 11:07, Ronen Nussbaum wrote:
> I'm trying to understand why there are so many "SolrIndexSearcher Opening"
> events in the log e.g.


Those events are all for the realtime searcher, which is normally found 
at the /get handler path.  I believe that handler is implicit, meaning 
you don't have to define it in solrconfig.xml for it to be created.


The realtime searcher allows you to query uncommitted documents by the 
value in the uniqueKey field.  I do not know whether there is anything 
available to control how often this searcher is replaced.  I don't think 
there is.  It is part of Solr's normal operation.


When you are looking in the log for "real" searchers opening, you 
will need to exclude any lines that say "realtime".


Thanks,
Shawn
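Shawn's advice can be applied with a few lines of Python. This is a minimal sketch, assuming each log event sits on a single line in the log file (this archive wraps them): it counts "SolrIndexSearcher Opening" lines and sets the realtime ones aside.

```python
def count_searcher_openings(lines):
    """Count 'SolrIndexSearcher Opening' log lines, separating the realtime
    searcher from all other searcher openings."""
    counts = {"realtime": 0, "other": 0}
    for line in lines:
        if "SolrIndexSearcher Opening" not in line:
            continue
        counts["realtime" if "realtime" in line else "other"] += 1
    return counts

lines = [
    "[2021-04-19T14:45:27.019] INFO [qtp1686100174-260205] "
    "org.apache.solr.search.SolrIndexSearcher Opening "
    "[Searcher@3fae69f8[1602350_shard46_replica_n182] realtime]",
    "org.apache.solr.handler.component.RealTimeGetComponent LOOKUP_SLICE:shard50=...",
]
print(count_searcher_openings(lines))  # {'realtime': 1, 'other': 0}
```

Against a real log, pass an open file, e.g. `with open("solr.log") as f: print(count_searcher_openings(f))`, where "solr.log" stands in for your own log file.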


join with big 2nd collection

2021-04-28 Thread Jens Viebig
Hi List,

We have a join performance issue and are not sure in which direction we should 
look to solve it.
We currently have only a single-node setup.

We have 2 collections on which we run join queries, joined by a "primary key" string 
field, contentId_s.
Each dataset for a single contentId_s consists of multiple timecode-based 
documents in both indexes, which makes this a many-to-many join.

collection1 - contains generic metadata and timecode-based content (think 
timecode-based comments)
Documents: 382,872
Unique contentId_s: 16,715
~ 160 MB size
single shard

collection2 - contains timecode-based GPS data (GPS position, field of 
view...; timecodes are not related to timecodes in collection1, so flattening the 
structure would blow up the number of documents to incredible numbers):
Documents: 695,887,875
Unique contentId_s: 10,199
~ 300 GB size
single shard

Hardware is an HP DL360 with 32 GB of RAM (we also tried a machine with 64 GB, 
without much improvement) and a 1 TB SSD for the index.

In our use case there is a lot of indexing/deletion traffic on both indexes and 
only a few queries fired against the server.

We are constantly indexing new content and deleting old documents. This was 
already getting problematic with HDDs, so we switched to SSDs;
indexing speed is fine for now (we might also need to scale this up in the 
future to allow more throughput).

But search speed suffers when we need to join with the big collection2 (taking 
up to 30 sec for the query to succeed). We had some success experimenting with 
score join queries when the collection2 results contain only a few unique IDs, but 
we can't predict that this is always the case, and if a lot of documents are 
hit in collection2, performance is 10x worse than with the original normal join.

Sample queries look like this (simplified, but the more complex queries are not 
much slower):

Sample 1:
query: coll1field:someval OR {!join from=contentId_s to=contentId_s 
fromIndex=collection2 v='coll2field:someval'}
filter: {!collapse field=contentId_s min=timecode_f}

Sample 2:
query: coll1field:someval
filter: {!join from=contentId_s to=contentId_s fromIndex=collection2 
v='coll2field:otherval'}
filter: {!collapse field=contentId_s min=timecode_f}
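To reason about what Sample 2 computes, here is a conceptual Python model; it is not Solr's implementation, and the documents are hypothetical toy data. In this model the result never multiplies into pairs: the join only yields the set of distinct contentId_s values from the matching collection2 documents, collection1 is filtered against that set, and the collapse keeps one document per key (the one with the smallest timecode_f). The step that has to look at every matching collection2 document is building that key set.

```python
def join_keys(from_docs, from_query, field="contentId_s"):
    """Conceptual model of {!join from=F to=F fromIndex=...}: collect the
    distinct join keys of every doc in the 'from' collection that matches."""
    return {doc[field] for doc in from_docs if from_query(doc)}

def collapse_min(docs, field="contentId_s", min_field="timecode_f"):
    """Conceptual model of {!collapse field=F min=M}: keep one doc per key,
    the one with the smallest min_field value."""
    best = {}
    for doc in docs:
        key = doc[field]
        if key not in best or doc[min_field] < best[key][min_field]:
            best[key] = doc
    return list(best.values())

# Hypothetical toy data standing in for collection1 and collection2.
collection1 = [
    {"contentId_s": "a", "timecode_f": 5.0, "coll1field": "someval"},
    {"contentId_s": "a", "timecode_f": 2.0, "coll1field": "someval"},
    {"contentId_s": "b", "timecode_f": 1.0, "coll1field": "someval"},
]
collection2 = [
    {"contentId_s": "a", "coll2field": "otherval"},
    {"contentId_s": "a", "coll2field": "otherval"},
    {"contentId_s": "c", "coll2field": "otherval"},
]

# Sample 2: main query on collection1, join filter against collection2, then collapse.
keys = join_keys(collection2, lambda d: d["coll2field"] == "otherval")
hits = [d for d in collection1 if d["coll1field"] == "someval" and d["contentId_s"] in keys]
print(collapse_min(hits))
```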


I experimented with first running the query on collection2 alone, just to get 
the number of documents (collapsing on contentId_s) and see how many results we get, 
so we could choose the right join query, but with many hits in collection2 this 
takes almost as long as doing the join, so slow queries would get even 
slower.

Caches also don't seem to help much, since almost every query fired is different 
and the index mostly changes between requests anyway.

We are open to anything: adding nodes/hardware/shards, changing the index 
structure...
Currently we don't know how to get around the big join.

Any advice on which direction we should look in?


SecureRandom algorithm 'NativePRNG' is in use

2021-04-28 Thread gnandre
Hi,

I intermittently hit this issue while running the unit tests.

SecureRandom algorithm 'NativePRNG' is in use by your JVM, which is a
potentially blocking algorithm on some environments. Please report the
details of this failure (and your JVM vendor/version) to
solr-u...@lucene.apache.org. You can try to run your tests with
-Djava.security.egd=file:/dev/./urandom or bypass this check using
-Dtest.solr.allowed.securerandom=NativePRNG as a JVM option when
running tests.

The error says to report it here, so here it is. Solr version: 8.5.2

JVM version: Amazon Corretto 1.8.0_242


Re: SecureRandom algorithm 'NativePRNG' is in use

2021-04-28 Thread gnandre
One more thing: -Dtest.solr.allowed.securerandom=NativePRNG doesn't seem
to help, and I haven't tried the other option yet.

On Wed, Apr 28, 2021 at 8:41 PM gnandre  wrote:



RE: Too Many Searcher Opening Events

2021-04-28 Thread Nussbaum, Ronen
Thank you, Shawn, for your reply.
I hadn't noticed the "realtime"; we do use it in our application workflow.
I tried to reproduce it in a similar environment, but when I used the /get handler 
I only saw this line in the log:
"org.apache.solr.handler.component.RealTimeGetComponent 
LOOKUP_SLICE:shard50=..."
I didn't see any searcher opening events.
Since we are experiencing high memory usage and overall slowness, I 
thought it might be related.


-----Original Message-----
From: Shawn Heisey
Sent: Wednesday, 28 April 2021 16:46
To: users@solr.apache.org
Subject: Re: Too Many Searcher Opening Events



This electronic message may contain proprietary and confidential information of 
Verint Systems Inc., its affiliates and/or subsidiaries. The information is 
intended to be for the use of the individual(s) or entity(ies) named above. If 
you are not the intended recipient (or authorized to receive this e-mail for 
the intended recipient), you may not use, copy, disclose or distribute to 
anyone this message or any information contained in this message. If you have 
received this electronic message in error, please notify us by replying to this 
e-mail.


Re: Problem with eDisMax and multi-word synonyms

2021-04-28 Thread Ere Maijala

Hello Markus,

Thanks for the reply. I'm not sure I understand. The docs state the 
following:


"The default value of mm is 0% (all clauses optional), unless q.op is 
specified as "AND", in which case mm defaults to 100% (all clauses 
required)."

(https://solr.apache.org/guide/8_8/the-dismax-query-parser.html#TheDisMaxQueryParser-Themm_MinimumShouldMatch_Parameter)

And it obviously has an effect. You can also replace q.op=AND with 
mm=100%25 in my examples with the same results. The multi-word synonym 
makes the query explained by debugQuery=true look wrong to me, in that it 
requires all terms to match in the same field, whereas normally a 
match can be found in any of the fields listed in qf. For example, this 
is the query from my first example:


+(+DisjunctionMaxQuery((name:corsair | manu:corsair | cat:corsair)) 
+DisjunctionMaxQuery((name:microsystems | manu:microsystems | 
cat:microsystems)) +DisjunctionMaxQuery((name:memory | manu:memory | 
cat:memory)))


Using the synonym instead of `corsair microsystems` produces this:

+(+((+name:corsair +name:microsystems +name:memory) | (+manu:corsair 
+manu:microsystems +manu:memory) | (+cat:cmi +cat:memory)))
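A toy Python model of the two boolean shapes above (not eDisMax itself) shows why the second one returns nothing for a typical document. In the first shape each term can be satisfied by a different field; in the second, a single field has to contain every one of its terms, and the cat clause keeps the literal "cmi" because, per the debug output, that field did not expand the synonym. The indexed tokens below are a hypothetical simplification, not the real techproducts document.

```python
# Hypothetical, simplified indexed tokens for one Corsair memory module.
DOC = {
    "name": {"corsair", "xms", "ddr", "memory"},
    "manu": {"corsair", "microsystems", "inc"},
    "cat": {"electronics", "memory"},
}

def term_centric(doc, terms, fields):
    """+DisjunctionMaxQuery((f1:t | f2:t ...)) per term: every term must
    match, but each term may match in a different field."""
    return all(any(term in doc[f] for f in fields) for term in terms)

def field_centric(doc, terms_by_field):
    """((+f1:a +f1:b ...) | (+f2:a ...)): one field must contain all of its terms."""
    return any(all(term in doc[f] for term in terms) for f, terms in terms_by_field.items())

# Shape of the "corsair microsystems memory" query.
print(term_centric(DOC, ["corsair", "microsystems", "memory"], ["name", "manu", "cat"]))  # True

# Shape of the "cmi memory" query.
print(field_centric(DOC, {
    "name": ["corsair", "microsystems", "memory"],
    "manu": ["corsair", "microsystems", "memory"],
    "cat": ["cmi", "memory"],
}))  # False
```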


We don't use stopwords. mm.autoRelax does not make a difference here.

Best,
Ere

Markus Jelsma wrote on 28.4.2021 at 16.20:

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland