One way to influence ranking while still allowing the optimization is to use Rank Fields [1]. Queries over multiple fields should be OK. I don't remember off the top of my head whether DisMax queries work; I believe they do, so I don't know why you wouldn't be seeing an improvement.
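
To make the Rank Fields idea a bit more concrete, here is a rough Python sketch that only builds request parameters. Treat the details as assumptions: I am writing the `{!rank ...}` local params (`f`, `function`, `pivot`) from memory of the Ref Guide, `popularity_rank` is a made-up field that would have to be defined as a RankField type, and wrapping the clause in `_query_:"..."` is just one way I would try to attach it to the main query. I have not verified that skipping stays effective with this setup.

```python
from urllib.parse import urlencode


def rank_clause(field, function="satu", **options):
    """Render a {!rank ...} clause as a string (parameter names are assumed)."""
    parts = [f"f={field}", f"function={function}"]
    parts += [f"{name}={value}" for name, value in options.items()]
    return "{!rank " + " ".join(parts) + "}"


def build_params(user_query, rank_field, pivot, min_exact_count=100):
    """Combine a required text clause with an optional rank clause."""
    clause = rank_clause(rank_field, "satu", pivot=pivot)
    # The text clause is required (+); the rank clause is optional,
    # so it can only add to the score of documents that already match.
    q = f'+({user_query}) _query_:"{clause}"'
    return {
        "defType": "lucene",        # simplified parser suggested in the thread
        "df": "description",
        "q": q,
        "minExactCount": min_exact_count,
        "rows": 10,
    }


if __name__ == "__main__":
    params = build_params("white flowers card", "popularity_rank", pivot=2)
    print("/select?" + urlencode(params))
```

The point of the sketch is only that the static signal (popularity, and so on) is expressed as a query clause over a dedicated field instead of a `bf`/`boost` function.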
[1] https://issues.apache.org/jira/browse/SOLR-14590

On Thu, Feb 15, 2024 at 11:05 AM Mikhail Khludnev <m...@apache.org> wrote:

> Don't know exactly. It might be sum, product or any other combination func.
> Another thought:
> MinExactCount optimization always brings top score just skipping weaker matches.
> But if you introduce rescoring after extracting BM25 top hits, it loses precision:
> Think about a top rating doc, which has fewer matching score, it might not be picked due to MinExactCount, and not rescored consequently.
> To summarize: rescoring after MinExactCount is not functionally correct though.
>
> On Thu, Feb 15, 2024 at 8:16 PM rajani m <rajinima...@gmail.com> wrote:
>
> > If the boosts are multiple function queries such as the following[1] then the boost query would be a sum function surrounding them, is it? I missed that one.
> > [1] "sum(product(popularity,2),1.0)" and "recip(ms(NOW/HOUR,date),3.163e-11,1,1)"
> >
> > I will post the question on the dev channel regarding what is expected when there are multiple query fields.
> >
> > Thanks again for all your help and pointers.
> >
> > On Thu, Feb 15, 2024 at 2:25 AM Mikhail Khludnev <m...@apache.org> wrote:
> >
> > > Hello,
> > > Please check inline below.
> > >
> > > On Thu, Feb 15, 2024 at 2:11 AM rajani m <rajinima...@gmail.com> wrote:
> > >
> > > > Yes, rerank works as an alternative, but the rerank only supports one boost query, correct? If there are multiple boost conditions such as boost by date, season and popularity, putting all of them into one complex boost query is a hard problem, rerank by LTR can help. Thank you for that pointer.
> > > >
> > > Perhaps I missed something, but why boost query can't combine multiple clauses?
> > >
> > > > The other limitation is that it is not possible to query multiple fields and leverage this feature, that is still an issue, because it is also a common use case to have title, description and keyword fields separated rather than merged into one.
> > > >
> > > It's worth checking with dev@.
> > > I didn't code anything there, my vague understanding is: it skips blocks of docIDs with max tf (terms freqs) fewer than seen so far.
> > > So, such skip conditions should be pushed through query scoring logic, which might not be (or it might?) obvious in case of the max over fields.
> > >
> > > > Regards,
> > > > Rajani
> > > >
> > > > On Wed, Feb 14, 2024 at 1:54 PM Mikhail Khludnev <m...@apache.org> wrote:
> > > >
> > > > > Cool.
> > > > > Btw can you rerank results with the corresponding boost query?
> > > > >
> > > > > On Wed, Feb 14, 2024 at 8:46 PM rajani m <rajinima...@gmail.com> wrote:
> > > > >
> > > > > > Mikhail,
> > > > > >
> > > > > > Thanks for that pointer to test with a simple query. It works perfectly with lucene query parser, I see qtime drop by 7 times with this param.
> > > > > >
> > > > > > With edismax query, it works with certain caveats that "qf" (query fields) must have only one field and the query must not have boost/bf parameters. We would expect it to work with boost params because boost is applied after the documents matched and scored by block max WAND as first pass. Am I right? Without the support to "boost" params, the feature is not really usable. The recency and popularity boosts are common to most queries. What are your thoughts?
> > > > > >
> > > > > > Thank you,
> > > > > > Rajani
> > > > > >
> > > > > > On Tue, Feb 6, 2024 at 2:54 PM rajani m <rajinima...@gmail.com> wrote:
> > > > > >
> > > > > > > With a 400M index it's worth experimenting with skipping about a million of docs.
> > > > > > > Is there a param that allows setting how many docs to skip?
> > > > > > >
> > > > > > > "minExactCount" which decides how many docs it should care to score and I tested that with 100, 1000 and 2000 with latency only increased.
> > > > > > >
> > > > > > > Alessandro,
> > > > > > > Assuming it is approximately the total number of files under /solr/replica_name/data/index - is it 222. The top k files sizes
> > > > > > >
> > > > > > > -rw-r--r-- 1 solr solr 766M Feb 4 04:16 _chg1.cfs
> > > > > > > -rw-r--r-- 1 solr solr 1020M Jan 29 18:37 _ca21.cfs
> > > > > > > -rw-r--r-- 1 solr solr 3.7G Nov 5 23:49 _95vt.cfs
> > > > > > > -rw-r--r-- 1 solr solr 3.8G Jan 15 08:59 _boyy.cfs
> > > > > > > -rw-r--r-- 1 solr solr 3.8G Nov 29 16:01 _9ynt.cfs
> > > > > > > -rw-r--r-- 1 solr solr 3.8G Jan 25 00:47 _c3t7.cfs
> > > > > > > -rw-r--r-- 1 solr solr 4.1G Oct 26 14:37 _8pyh.cfs
> > > > > > > -rw-r--r-- 1 solr solr 4.1G Oct 26 14:38 _7cwt.cfs
> > > > > > > -rw-r--r-- 1 solr solr 4.3G Oct 27 06:04 _7s6c.cfs
> > > > > > > -rw-r--r-- 1 solr solr 4.3G Oct 26 14:37 _7n8z.cfs
> > > > > > > -rw-r--r-- 1 solr solr 4.5G Jan 18 00:30 _dteg.cfs
> > > > > > > -rw-r--r-- 1 solr solr 4.5G Jan 19 17:44 _cwcc.cfs
> > > > > > > -rw-r--r-- 1 solr solr 4.6G Jan 13 07:35 _blix.cfs
> > > > > > > -rw-r--r-- 1 solr solr 4.9G Oct 26 14:39 _8gu9.cfs
> > > > > > > -rw-r--r-- 1 solr solr 4.9G Oct 26 14:38 _3kj9.cfs
> > > > > > >
> > > > > > > On Tue, Feb 6, 2024 at 2:45 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> > > > > > >
> > > > > > >> It would be interesting to see the level of fragmentation of each index indeed...
> > > > > > >> I.e. How many segments per core, in a collection
> > > > > > >>
> > > > > > >> On Tue, 6 Feb 2024, 06:59 Mikhail Khludnev, <m...@apache.org> wrote:
> > > > > > >>
> > > > > > >> > 200-300 docs might be too few to get significant gain. With a 400M index it's worth experimenting with skipping about a million of docs.
> > > > > > >> > In simplified params I mean defType=lucene&df=description. debugQuery might expose some details as well.
> > > > > > >> > As far as I understand this feature works with large segments since it skips a block of a segment, not a segment (?).
> > > > > > >> >
> > > > > > >> > On Mon, Feb 5, 2024 at 8:04 PM rajani m <rajinima...@gmail.com> wrote:
> > > > > > >> >
> > > > > > >> > > The "numFound" value is 200-300 docs difference when compared to the query without "minExactCount" param. The collection has over 400m records so testing the feature on a large collection. The numFoundExact param in the response is consistently false which tells me the feature is functioning but the results (qtime) are just off, not as expected.
> > > > > > >> > >
> > > > > > >> > > Would a type of query parser matter? I tested without the secondary sort, even without it there is no improvement in the query time latency and is still more than the query without this param.
> > > > > > >> > >
> > > > > > >> > > On Mon, Feb 5, 2024 at 10:34 AM Mikhail Khludnev <m...@apache.org> wrote:
> > > > > > >> > >
> > > > > > >> > > > Hello,
> > > > > > >> > > > How many matches do you have in both cases?
> > > > > > >> > > > I see there's a second sorting expression, it might not comply with the requirements.
> > > > > > >> > > > I'd rather start from the simple single query parser, just for the experiments.
> > > > > > >> > > > Note: I never tried it myself.
> > > > > > >> > > >
> > > > > > >> > > > On Mon, Feb 5, 2024 at 6:20 PM rajani m <rajinima...@gmail.com> wrote:
> > > > > > >> > > >
> > > > > > >> > > > > I ran performance tests with different query sets and the results look no good, it is adding to the latency around ~15% instead of reducing or even matching. Not sure if I am missing something in the config or it is an issue.
> > > > > > >> > > > >
> > > > > > >> > > > > Here is an example query *without* WAND query parameter
> > > > > > >> > > > > select?&fl=id,ext_id&start=0&q.op=OR&sort=score desc,ext_id asc&rows=10&q=white flowers card&defType=edismax&qf=keywords description title
> > > > > > >> > > > > vs
> > > > > > >> > > > > *With* WAND query parameter
> > > > > > >> > > > > select?&fl=id,ext_id&start=0&q.op=OR&sort=score desc,ext_id asc&rows=10&q=white flowers card&defType=edismax&qf=keywords description title*&minExactCount=10*
> > > > > > >> > > > >
> > > > > > >> > > > > On Thu, Feb 1, 2024 at 8:36 AM rajani m <rajinima...@gmail.com> wrote:
> > > > > > >> > > > >
> > > > > > >> > > > > > Hi Ishan,
> > > > > > >> > > > > > I have looked into that doc, and it looks like the solr version has to be >8.8 and the config needed is to add the query parameter "&minExactCount=k" where k is 10 or 100 depending on the accuracy of the first k docs.
> > > > > > >> > > > > >
> > > > > > >> > > > > > I ran a query performance test using an internal tool, with k set to 10 and 100, which barely showed any difference in query time latency, I didn't expect that so I was wondering if there is any configuration I missed.
> > > > > > >> > > > > >
> > > > > > >> > > > > > I will run a couple more tests with different query sets meanwhile and dig further into implementation of the feature to see if I am missing any config here. Appreciate any suggestions.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Thanks,
> > > > > > >> > > > > > Rajani
> > > > > > >> > > > > >
> > > > > > >> > > > > > On Thu, Feb 1, 2024 at 12:53 AM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
> > > > > > >> > > > > >
> > > > > > >> > > > > >> Is it possible to benchmark the query performance across a larger set of queries? You can leverage Solr Bench, if needed.
> > > > > > >> > > > > >> https://github.com/fullstorydev/solr-bench
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> On Thu, 1 Feb, 2024, 11:20 am Ishan Chattopadhyaya, <ichattopadhy...@gmail.com> wrote:
> > > > > > >> > > > > >>
> > > > > > >> > > > > >> > Some documentation is here
> > > > > > >> > > > > >> > https://solr.apache.org/guide/8_6/common-query-parameters.html#minexactcount-parameter
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> > On Thu, 1 Feb, 2024, 9:53 am rajani m, <rajinima...@gmail.com> wrote:
> > > > > > >> > > > > >> >
> > > > > > >> > > > > >> >> Hi All,
> > > > > > >> > > > > >> >>
> > > > > > >> > > > > >> >> To leverage the query time improvements that come with the Block MAX WAND feature, what are the required configurations?
> > > > > > >> > > > > >> >>
> > > > > > >> > > > > >> >> I am on solr 9.1.1 version. As per docs, including "minExactCount=100" query param should do it, however I don't see any drop in query time, it is more or less the same. Am I missing something?
> > > > > > >> > > > > >> >>
> > > > > > >> > > > > >> >> The queries I tested with are standard ones with edismax as query parser and query text is converted into boolean clauses and query has 2 boost params by date and popularity field. I included the "minExactCount" set to as low as 10 and 100 and increased to 1000 but didn't see key change in query time, it was about the same.
> > > > > > >> > > > > >> >>
> > > > > > >> > > > > >> >> Would including boost or use of edismax parser not benefit with block MAX WAND? Example query
> > > > > > >> > > > > >> >> /select?q=((white) AND (roses OR jasmine))&defType=edismax&qf=keywords description title&pf2=title&bf=recip(ms(NOW,datefield),3.16e-11,1,1)^2.0
> > > > > > >> > > > > >> >>
> > > > > > >> > > > > >> >> Thank you,
> > > > > > >> > > > > >> >> Rajani
> > > > > > >> > > >
> > > > > > >> > > > --
> > > > > > >> > > > Sincerely yours
> > > > > > >> > > > Mikhail Khludnev
> > > > > > >> >
> > > > > > >> > --
> > > > > > >> > Sincerely yours
> > > > > > >> > Mikhail Khludnev
> > > > >
> > > > > --
> > > > > Sincerely yours
> > > > > Mikhail Khludnev
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
>
> --
> Sincerely yours
> Mikhail Khludnev
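
For anyone who wants to reproduce the comparison discussed in this thread, here is a minimal Python sketch that builds the request variants side by side. The base parameters are the ones from the example query quoted above; the helper names, the `reRankDocs`/`reRankWeight` values and the `date` field in the usage part are illustrative assumptions, not settings anyone in the thread reported using. Keep in mind the caveat raised earlier: a rerank applied on top of `minExactCount` only sees the documents that survived the first pass, so it may miss documents the boost would have promoted.

```python
from urllib.parse import urlencode

# Parameters of the example edismax query quoted in the thread.
BASE = {
    "fl": "id,ext_id",
    "start": 0,
    "q.op": "OR",
    "sort": "score desc,ext_id asc",
    "rows": 10,
    "q": "white flowers card",
    "defType": "edismax",
    "qf": "keywords description title",
}


def with_min_exact_count(params, k):
    """Return a copy of params with minExactCount=k added."""
    out = dict(params)
    out["minExactCount"] = k
    return out


def simplified(params, k, field="description"):
    """Single-field lucene-parser variant (defType=lucene&df=...), score sort only."""
    out = with_min_exact_count(params, k)
    out.pop("qf", None)
    out.update({"defType": "lucene", "df": field, "sort": "score desc"})
    return out


def with_rerank(params, boost_func, docs=1000, weight=2.0):
    """Move a function boost into a rerank query instead of bf/boost."""
    out = dict(params)
    out["rq"] = f"{{!rerank reRankQuery=$rqq reRankDocs={docs} reRankWeight={weight}}}"
    out["rqq"] = "{!func}" + boost_func
    return out


if __name__ == "__main__":
    recency = "recip(ms(NOW/HOUR,date),3.163e-11,1,1)"
    variants = {
        "baseline": BASE,
        "minExactCount": with_min_exact_count(BASE, 10),
        "simplified": simplified(BASE, 10),
        "simplified+rerank": with_rerank(simplified(BASE, 10), recency),
    }
    for name, params in variants.items():
        print(name, "/select?" + urlencode(params))
```

Timing each printed URL against the same collection (several runs each, with a warmed cache) gives a like-for-like view of where the speed-up appears and where it is lost.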