Re: ExtendedDismaxQParser changes
Thanks a lot for taking the time to answer. I'll have to figure out a workaround; decreasing mm is not an option for me, maybe use a boost for this extra field.

Best regards,
Elisabeth

On Tue, Nov 14, 2023 at 12:05, Mikhail Khludnev wrote:

> Ok. Right, (one two three four five six seven)~7 means match all of them, i.e. in fact
> +one +two +three +four +five +six +seven.
> Here we can see that how dismax handles fields with different analyzers is far from perfection.
> You can either decrease mm
> https://solr.apache.org/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Themm_MinimumShouldMatch_Parameter
> or experiment with mm.autoRelax=true
> https://solr.apache.org/guide/6_6/the-extended-dismax-query-parser.html#TheExtendedDisMaxQueryParser-Themm.autoRelaxParameter
>
> On Mon, Nov 13, 2023 at 10:33 PM elisabeth benoit <elisaelisael...@gmail.com> wrote:
>
>> Okay, thanks for the answer. The thing is, when there is no *wordfield* in the *qf* param, but only *edgefield1* and *edgefield2*, I get this parsedQuery
>>
>> parsedQuery =
>> +(DisjunctionMaxQuery(((edgefield1:musee)^1.1 | edgefield2:musee))
>> DisjunctionMaxQuery(((edgefield1:maillol)^1.1 | edgefield2:maillol))
>> DisjunctionMaxQuery(((edgefield1:61)^1.1 | edgefield2:61))
>> DisjunctionMaxQuery(((edgefield1:r)^1.1 | edgefield2:r))
>> DisjunctionMaxQuery(((edgefield1:grenelle)^1.1 | edgefield2:grenelle))
>> DisjunctionMaxQuery(((edgefield1:75007)^1.1 | edgefield2:75007))
>> DisjunctionMaxQuery(((edgefield1:paris)^1.1 | edgefield2:paris)))~7
>>
>> and Solr does return documents.
>>
>> But when I have instead *wordfield* and *edgefield* in *qf*, I get this parsedQuery
>>
>> parsedQuery =
>> "+DisjunctionMaxQuery((((wordfield:musee wordfield:maillol wordfield:61
>> Synonym(wordfield:r wordfield:ru wordfield:rue) wordfield:grenelle
>> wordfield:75007 wordfield:paris)~7)^1.1 | ((edgefield:musee edgefield:maillol
>> edgefield:61 edgefield:r edgefield:grenelle edgefield:75007
>> edgefield:paris)~7)))"
>>
>> and Solr does not return any documents. That is what makes me think there is something wrong with the second parsedQuery.
>>
>> Best regards,
>> Elisabeth
>>
>> On Mon, Nov 13, 2023 at 8:15 PM, Mikhail Khludnev wrote:
>>
>>>> the first case listed in my mail
>>>>
>>>> parsedQuery =
>>>> "+DisjunctionMaxQuery((((wordfield:musee wordfield:maillol wordfield:61
>>>> Synonym(wordfield:r wordfield:ru wordfield:rue) wordfield:grenelle
>>>> wordfield:75007 wordfield:paris)~7)^1.1 | ((edgefield:musee edgefield:maillol
>>>> edgefield:61 edgefield:r edgefield:grenelle edgefield:75007
>>>> edgefield:paris)~7)))"
>>>>
>>>> The OR is different: it is all words must match wordfield OR all words must match edgefield, but no mix between the two fields is allowed.
>>>
>>> It doesn't work this way. These two queries differ only in scoring/results ordering, i.e. this query matches docs such as {wordfield:musee, edgefield:musee} as well as {wordfield:musee, edgefield:maillol}, {wordfield:musee}, and {edgefield:maillol}.
>>> This explanation might be useful: https://lucidworks.com/post/solr-boolean-operators/
>>> Note: DisMax works like OR/| but takes max instead of sum as a score.
>>>
>>> On Mon, Nov 13, 2023 at 7:21 PM elisabeth benoit <elisaelisael...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Thanks for your answer.
>>>>
>>>> I mean that in the second case listed in my mail, the query is
>>>>
>>>> parsedQuery =
>>>> +(DisjunctionMaxQuery(((edgefield1:musee)^1.1 | edgefield2:musee))
>>>> DisjunctionMaxQuery(((edgefield1:maillol)^1.1 | edgefield2:maillol))
>>>> DisjunctionMaxQuery(((edgefield1:61)^1.1 | edgefield2:61))
>>>> DisjunctionMaxQuery(((edgefield1:r)^1.1 | edgefield2:r))
>>>> DisjunctionMaxQuery(((edgefield1:grenelle)^1.1 | edgefield2:grenelle))
>>>> DisjunctionMaxQuery(((edgefield1:75007)^1.1 | edgefield2:75007))
>>>> DisjunctionMaxQuery(((edgefield1:paris)^1.1 | edgefield2:paris)))~7
>>>>
>>>> and so the way I read it is "musee" can match edgefield1 OR edgefield2, "maillol" can match edgefield1 OR edgefield2, and so on, so Solr can return a doc where some query words match edgefield1 and some other query words match edgefield2.
>>>>
>>>> But in the first case listed in my mail
>>>>
>>>> parsedQuery =
>>>> "+DisjunctionMaxQuery((((wordfield:musee wordfield:maillol wordfield:61
>>>> Synonym(wordfield:r wordfield:ru wordfield:rue) wordfield:grenelle
>>>> wordfield:75007 wordfield:paris)~7)^1.1 | ((edgefield:musee edgefield:maillol
>>>> edgefield:61 edgefield:r edgefield:grenelle edgefield:75007
>>>> edgefield:paris)~7)))"
>>>>
>>>> the OR is different: it is all words must match wordfield OR all words must match edgefield, but no mix between the two fields is allowed.
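For reference, here is a minimal sketch of the kind of request discussed in this thread; the collection name, the ^1.1 boost, and mm=100% are illustrative assumptions, while qf, mm, mm.autoRelax, and debugQuery are standard (e)dismax/Solr parameters. The parsedquery section of the debug output shows how the ~N minimum-should-match count is applied per field group.

```bash
# Sketch of an edismax request over two differently analyzed fields
# ("places" is a made-up collection name).
curl 'http://localhost:8983/solr/places/select' \
  --data-urlencode 'q=musee maillol 61 r grenelle 75007 paris' \
  --data-urlencode 'defType=edismax' \
  --data-urlencode 'qf=wordfield^1.1 edgefield' \
  --data-urlencode 'mm=100%' \
  --data-urlencode 'mm.autoRelax=true' \
  --data-urlencode 'debugQuery=true' \
  --data-urlencode 'rows=0'
```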
Re: G1GC not cleaning old gen
On 11/16/23 00:14, bilal qureshi wrote:
> I'm running facet queries with G1GC, and Solr is crashing with OOM, not triggering GC. All caches are disabled in solrconfig.xml.

The Java OutOfMemoryError exception can happen for resource depletions that are NOT memory. Until you know exactly what resource was actually depleted, you could be chasing the wrong problem.

Solr 9.1.1 may not always log the actual OOME exception. Solr 9.2.0 or later will - rather than executing a script on OOME, the later version actually crashes Java, and Java itself will log the reason in its crash log.

Once you know the reason for the OOME, we can proceed. There are exactly two ways to deal with OOME:

1) Increase the amount of the depleted resource that is available.
2) Change the configuration or fix the software so it needs less of that resource.

Thanks,
Shawn
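For anyone trying to capture the actual OOME reason, a minimal sketch of the JVM options involved, assuming a typical solr.in.sh; these are standard HotSpot flags, and which of them Solr wires up out of the box depends on the Solr version.

```bash
# solr.in.sh (sketch; heap size and paths are assumptions)
SOLR_HEAP="8g"

# Older style: run a script when the JVM throws OutOfMemoryError:
#   -XX:OnOutOfMemoryError=/path/to/oom_solr.sh
# Newer style: let the JVM terminate itself, so the reason lands in its own
# error log (hs_err_pid<pid>.log), which records whether the depletion was
# heap, metaspace, native threads, or something else entirely.
SOLR_OPTS="$SOLR_OPTS -XX:+CrashOnOutOfMemoryError -XX:ErrorFile=/var/solr/logs/hs_err_pid%p.log"
```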
pivot vs json.facet
Hi all,

I'm comparing pivot faceting to json sub-faceting. I read in Yonik's post (very interesting indeed) that subfacets are superior in terms of flexibility: https://yonik.com/solr-subfacets/

I'm just curious to know what your opinion is, and whether in your experience there are differences in terms of performance.

Best regards,
Vincenzo

--
Vincenzo D'Amore
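For concreteness, here is a sketch of the same two-level facet expressed both ways; the collection and field names are made up for illustration, but both forms use standard Solr syntax.

```bash
# Classic pivot facets: category, then brand within each category.
curl 'http://localhost:8983/solr/products/select' \
  --data-urlencode 'q=*:*' --data-urlencode 'rows=0' \
  --data-urlencode 'facet=true' \
  --data-urlencode 'facet.pivot=category,brand'

# Equivalent JSON sub-facets: a terms facet with a nested terms facet.
curl 'http://localhost:8983/solr/products/select' \
  --data-urlencode 'q=*:*' --data-urlencode 'rows=0' \
  --data-urlencode 'json.facet={
    categories: {
      type: terms, field: category, limit: 10,
      facet: { brands: { type: terms, field: brand, limit: 5 } }
    }
  }'
```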
Solr 8.11.2 - remote updates
Hello,

I have one question related to how delete calls are processed in a sharded Solr 8 collection. My setup is:

- 5 nodes (NRT)
- 1 collection with 5 shards
- Each shard has 3 replicas
- Each node has 3 replicas

When I tried to delete a document I noticed this message in the logs:

16/11/2023, 16:35:04 ERROR x:..._shard3_replica_n59 ErrorReportingConcurrentUpdateSolrClient Error when calling SolrCmdDistributor$Req: cmd=delete{_version_=-1782735337591668736,query=`external_id_s:10.14463/…`,commitWithin=-1}; node=ForwardNode: http://10_shard1_replica_n69/ to http://10 _shard1_replica_n69/

I'm not interested in the error itself but rather in what Solr is doing, because the core information says it's something related to shard 3 (which seems OK, since replica 59 is the shard leader for that shard), but the error message says there was a problem while trying to delete the document from shard 1 (on another node)! Is this expected?

Many thanks,
Alexandre
Re: Solr 8.11.2 - remote updates
On 11/16/23 09:38, Saur, Alexandre (ELS-AMS) wrote:
> When I tried to delete a document I noticed this message in the logs:
>
> 16/11/2023, 16:35:04 ERROR x:..._shard3_replica_n59 ErrorReportingConcurrentUpdateSolrClient Error when calling SolrCmdDistributor$Req: cmd=delete{_version_=-1782735337591668736,query=`external_id_s:10.14463/…`,commitWithin=-1}; node=ForwardNode: http://10_shard1_replica_n69/ to http://10 _shard1_replica_n69/
>
> I'm not interested in the error but rather in what Solr is doing, because the CORE information is saying it's something related to shard 3 (which seems ok, since replica 59 is the shard leader for that shard), but the error message is saying there was a problem while trying to delete the document from shard 1 (in another node)! Is this expected?

I am concerned by the presence of backticks in that output surrounding the query. It might be OK, but seems very odd. There are also other strange characters in your output, but I do not know how much of that might be due to your email client or hand-editing. It looks like you have edited it to remove certain things.

A deleteByQuery would have to be sent to all shards. That's the nature of the beast. With the compositeId router, a deleteById could be directed to a specific shard, but not a DBQ.

Thanks,
Shawn
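To illustrate the distinction, a sketch of the two delete forms over Solr's JSON update API; the collection name and values are placeholders. The by-id form deletes on the uniqueKey field and can be routed to one shard under the compositeId router, while the by-query form has to be fanned out to every shard leader, as in the log message above.

```bash
# Delete by id (uniqueKey): routable to a single shard with compositeId.
curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '{"delete": {"id": "SOME_UNIQUE_KEY"}}'

# Delete by query: distributed to all shards.
curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '{"delete": {"query": "external_id_s:\"SOME_EXTERNAL_ID\""}}'
```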
Re: pivot vs json.facet
: I'm comparing pivot faceting to json sub-faceting.

JSON Faceting originally didn't support any sort of refinement -- at that point it was (IMO) really only useful for getting approximate information about the facet buckets. If you needed hierarchical facets/stats and wanted any sort of confidence as to their accuracy, I would have absolutely said you should use pivot faceting.

Once (optional) refinement was added to JSON Faceting, things got better -- but it was still just "two phase" refinement, and there weren't very many options for tuning how aggressive the refinement was. At that point I would have still recommended pivot faceting for use cases where you wanted confidence in the accuracy of determining the top buckets.

Once *BOTH* the "overrequest" & "overrefine" params were added, it became possible to exert a lot of control over how much refinement is done and how accurate the top buckets are -- and along the way JSON faceting gained a lot of other additional features, optimizations, and bug fixes.

At this point, I really don't recommend pivot faceting for any "real world" use cases I've seen. The one hypothetical use case where pivot faceting *might* be better is if you needed to do *deeply* nested facets, on fields with huge numbers of terms, in a collection where your term/bucket distribution across shards is very imbalanced, and you really wanted to be completely confident that you were getting the best possible top buckets at every level of the nested buckets, even if the requests took much longer -- and that's because the pivot facets will make recursive refinement calls for each level of the nested facets, to ensure the stats and sub-facets of every bucket are populated by every shard.

For most use cases though: the JSON facet "two phase" approach is much faster, works just fine by default, and can be made even better by tuning "overrequest" & "overrefine".

-Hoss
http://www.lucidworks.com/
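As a sketch of the tuning described above (field names, limits, and the numbers are illustrative only; refine, overrequest, and overrefine are the actual JSON facet parameter names):

```bash
# Nested terms facets with refinement enabled, plus extra headroom for
# per-shard bucket collection (overrequest) and refinement (overrefine).
curl 'http://localhost:8983/solr/products/select' \
  --data-urlencode 'q=*:*' --data-urlencode 'rows=0' \
  --data-urlencode 'json.facet={
    categories: {
      type: terms, field: category, limit: 10,
      refine: true, overrequest: 50, overrefine: 20,
      facet: { brands: { type: terms, field: brand, limit: 5, refine: true } }
    }
  }'
```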
facet query question
Hi,

When working with a large dataset and running a facet query, is there a way to run it in the background so that, the second time the query runs, the result comes from a cache?

In an RDBMS, there are two concepts that give a long-running query a better chance to finish:

1) Using a materialized view, so all the complex data operations for the query are pre-computed.
2) Using a stored-procedure/result cache, so that when the query is run a second time, the result set is pulled directly from the memory cache.

Does Solr have something that caches results for facet queries over a large dataset? Is there an example of how to make facet queries faster?
Re: facet query question
> Does Solr have something caching results for facet queries over large
> dataset? Is there example how to make facet query faster?

Yes. There are many articles about query caching in Solr, plus the docs: https://solr.apache.org/guide/8_8/query-settings-in-solrconfig.html for one version.

Andy
Re: facet query question
On Fri, Nov 17, 2023 at 5:06 AM Vince McMahon wrote:

> Hi,
>
> Does Solr have something caching results for facet queries over large
> dataset?

I'm not aware of anything like caching of aggregated counts (facets) inside of Solr. It's up to the app/client to do that if necessary.

> Is there example how to make facet query faster?

You need to find out what exactly is slow (starting by looking into the per-component times in debugQuery=true output) and then address that bottleneck.

--
Sincerely yours
Mikhail Khludnev
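A minimal sketch of that kind of request (collection, field, and query are placeholders): the debug/timing section of the response reports prepare and process time per search component, which shows whether faceting itself is the slow part.

```bash
curl 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'q=*:*' --data-urlencode 'rows=0' \
  --data-urlencode 'facet=true' \
  --data-urlencode 'facet.field=category' \
  --data-urlencode 'debugQuery=true'
```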