Re: ExtendedDismaxQParser changes

2023-11-16 Thread elisabeth benoit
Thanks a lot for taking time to answer.

I'll have to figure out a workaround; decreasing mm is not an option for
me. Maybe I'll use a boost for this extra field.

Best regards,
Elisabeth

On Tue, Nov 14, 2023 at 12:05, Mikhail Khludnev wrote:

> Ok. Right.
> (one two three four five six seven)~7 means all of them must match, i.e. in
> fact +one +two +three +four +five +six +seven.
> Here we can see that how dismax handles fields with different analyzers is
> far from perfect.
> You can either decrease mm
>
> https://solr.apache.org/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Themm_MinimumShouldMatch_Parameter
> or experiment with mm.autoRelax=true
>
> https://solr.apache.org/guide/6_6/the-extended-dismax-query-parser.html#TheExtendedDisMaxQueryParser-Themm.autoRelaxParameter
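The mm behaviour Mikhail describes can be sketched in a few lines of Python (a deliberate simplification of Lucene's minimum-should-match rule, not the real implementation; the query terms are the ones from this thread):

```python
# Toy model of minimum-should-match: a document matches
# (t1 t2 ... tn)~mm only if at least mm of the n clauses match it.
def matches(doc_terms, query_terms, mm):
    hits = sum(1 for t in query_terms if t in doc_terms)
    return hits >= mm

query = ["musee", "maillol", "61", "r", "grenelle", "75007", "paris"]

full_doc = set(query)                     # contains all seven terms
partial_doc = set(query) - {"paris"}      # missing one term

print(matches(full_doc, query, mm=7))     # ~7 behaves like +t1 +t2 ... +t7
print(matches(partial_doc, query, mm=7))  # a single miss excludes the doc
print(matches(partial_doc, query, mm=6))  # decreasing mm relaxes the query
```

With mm equal to the clause count, the query requires every clause, which is why a field whose analyzer handles even one term differently can make the whole document miss.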
>
>
> On Mon, Nov 13, 2023 at 10:33 PM elisabeth benoit <
> elisaelisael...@gmail.com>
> wrote:
>
> > Okay, thanks for the answer. The thing is:
> >
> > when there is no *wordfield* in the *qf* param, but only *edgefield1* and
> > *edgefield2*, I get this parsedQuery
> >
> > parsedQuery =
> >  +(DisjunctionMaxQuery(((edgefield1:musee)^1.1 | edgefield2:musee))
> >  DisjunctionMaxQuery(((edgefield1:maillol)^1.1 | edgefield2:maillol))
> >  DisjunctionMaxQuery(((edgefield1:61)^1.1 | edgefield2:61))
> >  DisjunctionMaxQuery(((edgefield1:r)^1.1 | edgefield2:r))
> >  DisjunctionMaxQuery(((edgefield1:grenelle)^1.1 | edgefield2:grenelle))
> >  DisjunctionMaxQuery(((edgefield1:75007)^1.1 | edgefield2:75007))
> >  DisjunctionMaxQuery(((edgefield1:paris)^1.1 | edgefield2:paris)))~7
> >
> > and Solr does return documents
> >
> > but when I have instead *wordfield* and *edgefield* in *qf*, I get this
> > parsedQuery
> >
> > parsedQuery =
> >  "+DisjunctionMaxQuery((((wordfield:musee wordfield:maillol wordfield:61
> >  Synonym(wordfield:r wordfield:ru wordfield:rue) wordfield:grenelle
> >  wordfield:75007 wordfield:paris)~7)^1.1 | ((edgefield:musee
> >  edgefield:maillol edgefield:61 edgefield:r edgefield:grenelle
> >  edgefield:75007 edgefield:paris)~7)))"
> >
> > and Solr does not return any documents.
> >
> > That is what makes me think there is something wrong with the second
> > parsedQuery.
> >
> > Best regards,
> > Elisabeth
> >
> >
> >
> > On Mon, Nov 13, 2023 at 20:15, Mikhail Khludnev wrote:
> >
> > > >
> > > >  the first case listed in my mail
> > > > parsedQuery =
> > > >  "+DisjunctionMaxQuery((((wordfield:musee wordfield:maillol wordfield:61
> > > >  Synonym(wordfield:r wordfield:ru wordfield:rue) wordfield:grenelle
> > > >  wordfield:75007 wordfield:paris)~7)^1.1 | ((edgefield:musee
> > > >  edgefield:maillol edgefield:61 edgefield:r edgefield:grenelle
> > > >  edgefield:75007 edgefield:paris)~7)))"
> > >
> > >
> > > > The OR is different, it is all words must match wordfield OR all words
> > > > must match edgefield, but no mix between the two fields is allowed.
> > >
> > >
> > > It doesn't work this way. These two queries differ only in
> > > scoring/results ordering, i.e. this query matches docs {wordfield:musee,
> > > edgefield:musee} as well as {wordfield:musee, edgefield:maillol},
> > > {wordfield:musee}, and {edgefield:maillol}.
> > > This explanation might be useful
> > > https://lucidworks.com/post/solr-boolean-operators/
> > > Note: DisMax works like OR/| but takes max instead of sum as a score.
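Mikhail's note about max versus sum can be illustrated with a toy calculation (the per-field scores below are made-up numbers, not real Lucene scores; the tie-break term mirrors what the edismax `tie` parameter does):

```python
# Toy illustration: for one term matched in several fields,
# a plain boolean OR sums the per-field scores, while
# DisjunctionMaxQuery takes the maximum (plus an optional tie-break).
def bool_or_score(field_scores):
    return sum(field_scores)

def dismax_score(field_scores, tie=0.0):
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

scores = [1.2, 0.4]  # e.g. wordfield:musee vs edgefield:musee (made up)

print(bool_or_score(scores))          # sum of both field scores
print(dismax_score(scores))           # max field score only
print(dismax_score(scores, tie=0.1))  # max plus a fraction of the rest
```

The practical consequence is that DisMax changes ranking, not which documents match: a doc matching the term in either field is still a hit.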
> > >
> > > On Mon, Nov 13, 2023 at 7:21 PM elisabeth benoit <
> > > elisaelisael...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > Thanks for your answer.
> > > >
> > > > I mean that in the second case listed in my mail, the query is
> > > > parsedQuery =
> > > >  +(DisjunctionMaxQuery(((edgefield1:musee)^1.1 | edgefield2:musee))
> > > >  DisjunctionMaxQuery(((edgefield1:maillol)^1.1 | edgefield2:maillol))
> > > >  DisjunctionMaxQuery(((edgefield1:61)^1.1 | edgefield2:61))
> > > >  DisjunctionMaxQuery(((edgefield1:r)^1.1 | edgefield2:r))
> > > >  DisjunctionMaxQuery(((edgefield1:grenelle)^1.1 | edgefield2:grenelle))
> > > >  DisjunctionMaxQuery(((edgefield1:75007)^1.1 | edgefield2:75007))
> > > >  DisjunctionMaxQuery(((edgefield1:paris)^1.1 | edgefield2:paris)))~7
> > > >
> > > > and so the way I read it is: "musee" can match edgefield1 OR edgefield2,
> > > > "maillol" can match edgefield1 OR edgefield2, and so on, so Solr can
> > > > return a doc where some query words match edgefield1 and some other
> > > > query words match edgefield2.
> > > >
> > > > But in the first case listed in my mail
> > > >
> > > > parsedQuery =
> > > >  "+DisjunctionMaxQuery((((wordfield:musee wordfield:maillol wordfield:61
> > > >  Synonym(wordfield:r wordfield:ru wordfield:rue) wordfield:grenelle
> > > >  wordfield:75007 wordfield:paris)~7)^1.1 | ((edgefield:musee
> > > >  edgefield:maillol edgefield:61 edgefield:r edgefield:grenelle
> > > >  edgefield:75007 edgefield:paris)~7)))"
> > > >
> > > > The OR is different, it is all words must match wordfield OR all words
> > > > must match edgefield, but no mix between the two fields is allowed.

Re: G1GC not cleaning old gen

2023-11-16 Thread Shawn Heisey

On 11/16/23 00:14, bilal qureshi wrote:

I'm running facet queries with G1GC, and Solr is crashing with OOM, not
triggering GC.
All caches are disabled in solrconfig.xml.


The Java OutOfMemoryError exception can happen for resource depletions 
that are NOT memory.  Until you know exactly what resource was actually 
depleted, you could be chasing the wrong problem.


Solr 9.1.1 may not always log the actual OOME exception.  Solr 9.2.0 or 
later will - rather than executing a script on OOME, the later version 
actually crashes Java, and Java itself will log the reason in its crash log.


Once you know the reason for the OOME, we can proceed.  There are 
exactly two ways to deal with OOME:


1) Increase the available amount of the depleted resource.
2) Change the configuration or fix the software so it needs less of that 
resource.
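Shawn's point that an OOME is not always about heap can be sketched like this: the JVM puts the depleted resource in the exception message, so triage starts by reading the text after the colon. The messages below are standard HotSpot ones, but the mapping to remedies is a rough guide, not an exhaustive list:

```python
# Rough triage of java.lang.OutOfMemoryError messages: the text after
# the colon names the depleted resource, which is not always heap.
REMEDIES = {
    "Java heap space": "raise the heap (-Xmx / SOLR_HEAP) or reduce usage",
    "unable to create native thread": "raise the process/thread ulimit, not the heap",
    "Metaspace": "raise -XX:MaxMetaspaceSize",
    "GC overhead limit exceeded": "heap is nearly exhausted; raise it or reduce usage",
}

def triage(log_line):
    for resource, remedy in REMEDIES.items():
        if resource in log_line:
            return resource, remedy
    return None, "unknown: check the JVM crash log (hs_err_pid*.log)"

line = "java.lang.OutOfMemoryError: unable to create native thread"
print(triage(line))  # a thread-limit problem, which more heap will not fix
```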


Thanks,
Shawn



pivot vs json.facet

2023-11-16 Thread Vincenzo D'Amore
Hi all,

I'm comparing pivot faceting to json sub-faceting.
I read Yonik's post about this, very interesting indeed, which says subfacets
are superior in terms of flexibility.

https://yonik.com/solr-subfacets/

I'm just curious to know what your opinion is and if in your experience
there are differences in terms of performance.

Best regards,
Vincenzo


-- 
Vincenzo D'Amore


Solr 8.11.2 - remote updates

2023-11-16 Thread Saur, Alexandre (ELS-AMS)
Hello,

I have one question related to how delete calls are processed in a sharded Solr 
8 collection. My setup is:
5 Nodes (NRT)
1 collection with 5 Shards
Each shard has 3 replicas
Each node has 3 replicas

When I tried to delete a document I noticed this message in the logs:

16/11/2023, 16:35:04 ERROR x:..._shard3_replica_n59
ErrorReportingConcurrentUpdateSolrClient Error when calling 
SolrCmdDistributor$Req: 
cmd=delete{_version_=-1782735337591668736,​query=`external_id_s:10.14463/…`,​commitWithin=-1};
 node=ForwardNode: http://10_shard1_replica_n69/ to http://10 
_shard1_replica_n69/

I’m not interested in the error but rather in what Solr is doing, because the 
CORE information is saying it’s something related to shard 3 (which seems ok, 
since replica 59 is the shard leader for that shard), but the error message is 
saying there was a problem while trying to delete the document from shard 1 (in 
another node)!
Is this expected?

Many thanks,
Alexandre






Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The 
Netherlands, Registration No. 33158992, Registered in The Netherlands.


Re: Solr 8.11.2 - remote updates

2023-11-16 Thread Shawn Heisey

On 11/16/23 09:38, Saur, Alexandre (ELS-AMS) wrote:

When I tried to delete a document I noticed this message in the logs:

16/11/2023, 16:35:04 ERROR x:..._shard3_replica_n59
ErrorReportingConcurrentUpdateSolrClient Error when calling SolrCmdDistributor$Req: 
cmd=delete{_version_=-1782735337591668736,​query=`external_id_s:10.14463/…`,​commitWithin=-1};
 node=ForwardNode: http://10_shard1_replica_n69/ to http://10 
_shard1_replica_n69/

I’m not interested in the error but rather in what Solr is doing, because the 
CORE information is saying it’s something related to shard 3 (which seems ok, 
since replica 59 is the shard leader for that shard), but the error message is 
saying there was a problem while trying to delete the document from shard 1 (in 
another node)!
Is this expected?


I am concerned by the presence of backticks in that output surrounding 
the query.  It might be OK, but seems very odd.  There are also other 
strange characters in your output, but I do not know how much of that 
might be due to your email client or hand-editing.  It looks like you 
have edited it to remove certain things.


A deleteByQuery would have to be sent to all shards.  That's the nature 
of the beast.  With the compositeId router, a deleteById could be 
directed to a specific shard, but not a DBQ.
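Shawn's distinction can be sketched as routing logic (a toy model: `crc32` stands in for Solr's real MurmurHash3-over-hash-ranges routing, and the shard count matches the setup in this thread):

```python
import zlib

# Toy sketch of why deleteById can be routed but deleteByQuery cannot.
# With the compositeId router, the target shard is a pure function of
# the document id, so a deleteById goes to exactly one shard leader,
# while a deleteByQuery may match docs on any shard and is broadcast.
NUM_SHARDS = 5

def shard_for_id(doc_id):
    # Stand-in hash; Solr really uses MurmurHash3 over hash ranges.
    return zlib.crc32(doc_id.encode()) % NUM_SHARDS

def route_delete_by_id(doc_id):
    return [shard_for_id(doc_id)]      # a single target shard

def route_delete_by_query(query):
    return list(range(NUM_SHARDS))     # every shard gets the command

print(route_delete_by_id("doc-42"))
print(route_delete_by_query("external_id_s:whatever"))
```

This is why a delete-by-query received by the shard3 leader still produces forwarding traffic (and potentially errors) for shard1: the command is fanned out to all shard leaders regardless of which core logged it.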


Thanks,
Shawn



Re: pivot vs json.facet

2023-11-16 Thread Chris Hostetter


: I'm comparing pivot faceting to json sub-faceting.

JSON Faceting originally didn't support any sort of refinement -- at that 
point it was (IMO) really only useful for getting approximate information 
about the facet buckets -- if you needed hierarchical facets/stats and 
wanted any sort of confidence as to their accuracy, I would 
have absolutely said you should use pivot faceting.

Once (optional) refinement was added to JSON Faceting things got better -- 
but it was still just "two phase" refinement, and there weren't very many 
options for tuning how aggressive the refinement was.  At that point I 
would have still recommended pivot faceting for usecases where you wanted 
confidence in the accuracy in determining the top buckets.

Once *BOTH* the "overrequest" & "overrefine" params were added, it became 
possible to exert a lot of control over how much refinement is done, and 
how accurate the top buckets are -- and along the way JSON faceting 
gained a lot of other additional features, optimizations, and bug fixes.

At this point, I really don't recommend pivot faceting for any "real 
world" usecases I've seen.  The one hypothetical usecase where pivot 
faceting *might* be better is if you needed to do *deeply* nested 
facets, on fields with huge numbers of terms, in a collection where your 
term/bucket distribution across shards is very imbalanced, and you really 
wanted to be completely confident that you were getting the best possible 
top buckets at every level of the nested buckets, even if the requests 
took much longer -- and that's because the pivot facets will make 
recursive refinement calls for each level of the nested facets, to ensure 
the stats and sub-facets of every bucket are populated by every shard.

For most usecases though: the JSON facet "two phase" approach is much 
faster, works just fine by default, and can be made even better by tuning 
"overrequest" & "overrefine".

-Hoss
http://www.lucidworks.com/


facet query question

2023-11-16 Thread Vince McMahon
Hi,

When working with a large dataset and running a facet query, is there a way
to run it in the background so that, the second time the query runs, it
gets the result from a cache?

In an RDBMS, there are two concepts that give a long-running query a better
chance to finish:
1) using a materialized view, so all the complex data operations for the
query are pre-computed;
2) using a stored-procedure cache, so when the query hits the second time,
the result set is pulled directly from the memory cache.

Does Solr have something for caching results of facet queries over large
datasets?  Is there an example of how to make facet queries faster?


Re: facet query question

2023-11-16 Thread Andy Lester



> Does Solr have something for caching results of facet queries over large
> datasets?  Is there an example of how to make facet queries faster?



Yes.  There are many articles about query caching in Solr, plus the docs.  
https://solr.apache.org/guide/8_8/query-settings-in-solrconfig.html for one 
version.
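For reference, the caches that page documents are configured in solrconfig.xml roughly as below (the sizes are illustrative, not recommendations; `solr.CaffeineCache` is the standard implementation in Solr 8+):

```xml
<!-- solrconfig.xml, inside the <query> section -->

<!-- Caches whole result sets (ordered doc-id lists) for repeated queries -->
<queryResultCache class="solr.CaffeineCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="128"/>

<!-- Caches filter (fq) results; often the biggest win for faceted search -->
<filterCache class="solr.CaffeineCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>
```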

Andy

Re: facet query question

2023-11-16 Thread Mikhail Khludnev
On Fri, Nov 17, 2023 at 5:06 AM Vince McMahon 
wrote:

> Hi,
>
>  Does Solr have something for caching results of facet queries over large
> datasets?

I'm not aware of anything like caching aggregated counts (facets) inside of
Solr. It's up to app/client to do that if necessary.
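A minimal sketch of the app-side caching suggested here (the `fetch_facets` function and its would-be Solr call are hypothetical placeholders; the point is only that the memoization lives in the application, not in Solr):

```python
from functools import lru_cache

# Hypothetical app-side memoization of facet responses. In a real
# client, fetch_facets would issue an HTTP request to Solr; here it
# counts invocations to show that repeats are served from the cache.
calls = {"n": 0}

@lru_cache(maxsize=256)
def fetch_facets(q, facet_field):
    calls["n"] += 1
    # placeholder for: requests.get(solr_url, params={...}).json()
    return {"q": q, "field": facet_field, "counts": []}

fetch_facets("*:*", "cat_s")
fetch_facets("*:*", "cat_s")   # identical args: served from the cache
print(calls["n"])              # the backend was only hit once
```

The obvious caveat is invalidation: results cached this way go stale as soon as the index changes, so a real client would key on or expire with index updates.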


> Is there an example of how to make facet queries faster?
>
You need to find out what exactly is slow (starting by looking into the
per-component times in debugQuery=true output) and then address that
bottleneck.

-- 
Sincerely yours
Mikhail Khludnev