Cool. Btw, can you rerank the results with the corresponding boost query?

On Wed, Feb 14, 2024 at 8:46 PM rajani m <rajinima...@gmail.com> wrote:
Mikhail,

Thanks for the pointer to test with a simple query. It works perfectly with the lucene query parser: I see QTime drop by a factor of 7 with this param.

With edismax it works, with caveats: "qf" (query fields) must contain only one field, and the query must not have boost/bf parameters. We would expect it to work with boost params, because the boost is applied after the documents are matched and scored by block-max WAND in the first pass. Am I right? Without support for "boost" params the feature is not really usable, since recency and popularity boosts are common to most queries. What are your thoughts?

Thank you,
Rajani

On Tue, Feb 6, 2024 at 2:54 PM rajani m <rajinima...@gmail.com> wrote:

> With a 400M index it's worth experimenting with skipping about a million docs.

Is there a param that allows setting how many docs to skip?

"minExactCount" decides how many docs it should care to score, and I tested it with 100, 1000 and 2000; latency only increased.

Alessandro,
Assuming it is approximately the total number of files under /solr/replica_name/data/index, it is 222.
The top-k file sizes:

-rw-r--r-- 1 solr solr 766M Feb 4 04:16 _chg1.cfs
-rw-r--r-- 1 solr solr 1020M Jan 29 18:37 _ca21.cfs
-rw-r--r-- 1 solr solr 3.7G Nov 5 23:49 _95vt.cfs
-rw-r--r-- 1 solr solr 3.8G Jan 15 08:59 _boyy.cfs
-rw-r--r-- 1 solr solr 3.8G Nov 29 16:01 _9ynt.cfs
-rw-r--r-- 1 solr solr 3.8G Jan 25 00:47 _c3t7.cfs
-rw-r--r-- 1 solr solr 4.1G Oct 26 14:37 _8pyh.cfs
-rw-r--r-- 1 solr solr 4.1G Oct 26 14:38 _7cwt.cfs
-rw-r--r-- 1 solr solr 4.3G Oct 27 06:04 _7s6c.cfs
-rw-r--r-- 1 solr solr 4.3G Oct 26 14:37 _7n8z.cfs
-rw-r--r-- 1 solr solr 4.5G Jan 18 00:30 _dteg.cfs
-rw-r--r-- 1 solr solr 4.5G Jan 19 17:44 _cwcc.cfs
-rw-r--r-- 1 solr solr 4.6G Jan 13 07:35 _blix.cfs
-rw-r--r-- 1 solr solr 4.9G Oct 26 14:39 _8gu9.cfs
-rw-r--r-- 1 solr solr 4.9G Oct 26 14:38 _3kj9.cfs

On Tue, Feb 6, 2024 at 2:45 AM Alessandro Benedetti <a.benede...@sease.io> wrote:

It would indeed be interesting to see the level of fragmentation of each index, i.e. how many segments per core in a collection.

On Tue, 6 Feb 2024, 06:59 Mikhail Khludnev, <m...@apache.org> wrote:

200-300 docs might be too few to get a significant gain. With a 400M index it's worth experimenting with skipping about a million docs.
By simplified params I mean defType=lucene&df=description. debugQuery might expose some details as well.
As far as I understand, this feature works with large segments, since it skips blocks within a segment rather than whole segments (?).

On Mon, Feb 5, 2024 at 8:04 PM rajani m <rajinima...@gmail.com> wrote:

The "numFound" value differs by 200-300 docs compared to the query without the "minExactCount" param. The collection has over 400M records, so I am testing the feature on a large collection.
The numFoundExact attribute in the response is consistently false, which tells me the feature is functioning, but the results (QTime) are not as expected.

Would the type of query parser matter? I tested without the secondary sort; even then there is no improvement in query latency, and it is still higher than for the query without this param.

On Mon, Feb 5, 2024 at 10:34 AM Mikhail Khludnev <m...@apache.org> wrote:

Hello,
How many matches do you have in both cases?
I see there's a second sort expression; it might not comply with the requirements.
I'd rather start with a simple query parser and a single field, just for the experiments.
Note: I have never tried it myself.

On Mon, Feb 5, 2024 at 6:20 PM rajani m <rajinima...@gmail.com> wrote:

I ran performance tests with different query sets and the results do not look good: the param adds around 15% to latency instead of reducing it or even matching the baseline. I'm not sure whether I am missing something in the config or it is an issue.
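Both numbers discussed in these messages are reported by Solr itself: QTime in `responseHeader`, and `numFoundExact` next to `numFound` in the `response` section (Solr 8.6+). A minimal sketch of the A/B comparison, assuming each run's responses were saved as parsed `wt=json` dicts; the function names are hypothetical, it compares medians only, and it does not control for query-result-cache effects, which a proper benchmark (e.g. Solr Bench) should.

```python
from statistics import median


def summarize_run(responses):
    """Summarize parsed Solr JSON (wt=json) responses from one benchmark run.

    Returns the median QTime in ms (server-side time only) and the fraction
    of responses whose numFound is an approximation (numFoundExact false).
    """
    if not responses:
        raise ValueError("need at least one response per run")
    qtimes = [r["responseHeader"]["QTime"] for r in responses]
    # Assumption: a response without numFoundExact is counted as exact.
    approx = sum(
        1 for r in responses if not r["response"].get("numFoundExact", True)
    )
    return {
        "median_qtime_ms": median(qtimes),
        "approx_fraction": approx / len(responses),
    }


def qtime_change_pct(baseline, candidate):
    """Percent change in median QTime; positive means the candidate is slower."""
    base = summarize_run(baseline)["median_qtime_ms"]
    if base == 0:
        # A 0 ms median often indicates cache hits; such runs are not comparable.
        raise ValueError("baseline median QTime is 0; vary queries or disable caches")
    cand = summarize_run(candidate)["median_qtime_ms"]
    return (cand - base) / base * 100.0
```

Feed it the responses for the same query set with and without `minExactCount`. A positive change together with `approx_fraction` near 1.0 matches the symptom reported here: the optimization engages (counts become approximate) but latency does not improve.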
Here is an example query *without* the WAND query parameter:
select?fl=id,ext_id&start=0&q.op=OR&sort=score desc,ext_id asc&rows=10&q=white flowers card&defType=edismax&qf=keywords description title
vs. *with* the WAND query parameter:
select?fl=id,ext_id&start=0&q.op=OR&sort=score desc,ext_id asc&rows=10&q=white flowers card&defType=edismax&qf=keywords description title&minExactCount=10

On Thu, Feb 1, 2024 at 8:36 AM rajani m <rajinima...@gmail.com> wrote:

Hi Ishan,
I have looked into that doc, and it looks like the Solr version has to be >8.8 and the config needed is to add the query parameter "&minExactCount=k", where k is 10 or 100 depending on the accuracy of the first k docs.

I ran a query performance test using an internal tool, with k set to 10 and 100, which barely showed any difference in query latency. I didn't expect that, so I was wondering if there is any configuration I missed.

Meanwhile, I will run a couple more tests with different query sets and dig further into the implementation of the feature to see if I am missing any config here. I'd appreciate any suggestions.

Thanks,
Rajani

On Thu, Feb 1, 2024 at 12:53 AM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:

Is it possible to benchmark the query performance across a larger set of queries? You can leverage Solr Bench, if needed.
https://github.com/fullstorydev/solr-bench

On Thu, 1 Feb, 2024, 11:20 am Ishan Chattopadhyaya, <ichattopadhy...@gmail.com> wrote:

Some documentation is here:
https://solr.apache.org/guide/8_6/common-query-parameters.html#minexactcount-parameter

On Thu, 1 Feb, 2024, 9:53 am rajani m, <rajinima...@gmail.com> wrote:

Hi All,

To leverage the query time improvements that come with the Block-Max WAND feature, what configuration is required?

I am on Solr 9.1.1. As per the docs, including the "minExactCount=100" query param should do it; however, I don't see any drop in query time, it is more or less the same. Am I missing something?

The queries I tested with are standard ones: edismax is the query parser, the query text is converted into boolean clauses, and each query has 2 boost params, on a date field and a popularity field. I set "minExactCount" as low as 10 and 100 and increased it to 1000, but didn't see any significant change in query time; it was about the same.

Would queries that include boosts or use the edismax parser not benefit from Block-Max WAND?
Example query:
/select?q=((white) AND (roses OR jasmine))&defType=edismax&qf=keywords description title&pf2=title&bf=recip(ms(NOW,datefield),3.16e-11,1,1)^2.0

Thank you,
Rajani

-- 
Sincerely yours
Mikhail Khludnev
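Mikhail's rerank suggestion can be sketched as request parameters: keep the first pass boost-free (the setup Rajani found to work with `minExactCount`) and apply the recency boost only to the top hits through Solr's rerank query parser (`rq={!rerank reRankQuery=... reRankDocs=... reRankWeight=...}`). This is a minimal sketch, not a verified recipe. The field names come from the example queries in this thread, the helper names and core URL are hypothetical, and I have not confirmed that Solr's rerank collector still applies `minExactCount` to the first pass; check QTime and `numFoundExact` before relying on it.

```python
from urllib.parse import urlencode

# Boost function copied from the example query in this thread; "datefield"
# and "description" are that example's field names, not requirements.
RECENCY_BOOST = "recip(ms(NOW,datefield),3.16e-11,1,1)"


def wand_friendly_params(user_query, rows=10, min_exact=10,
                         rerank_docs=100, rerank_weight=2.0):
    """First pass: plain, boost-free query eligible for minExactCount.
    Second pass: rerank the top `rerank_docs` hits with the boost function."""
    if rerank_docs < rows:
        raise ValueError("rerank_docs should be >= rows")
    return {
        "q": user_query,
        "defType": "lucene",
        "df": "description",
        "q.op": "OR",
        "rows": rows,
        "fl": "id,ext_id,score",
        "minExactCount": min_exact,
        # Rerank query parser; $rqq dereferences the rqq parameter below.
        "rq": "{!rerank reRankQuery=$rqq reRankDocs=%d reRankWeight=%s}"
              % (rerank_docs, rerank_weight),
        # {!func} wraps the function as a query, so every doc gets a boost score.
        "rqq": "{!func}" + RECENCY_BOOST,
    }


def select_url(core_url, params):
    """Build a /select URL for the given (placeholder) core URL."""
    return core_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(select_url("http://localhost:8983/solr/mycollection",
                     wand_friendly_params("white flowers card")))
```

The trade-off differs from `bf`: only the top `reRankDocs` first-pass hits ever see the boost, so a recent document that ranks below that window on text relevance alone is never promoted. A larger `reRankDocs` narrows that gap, but it also makes the first pass keep more candidates, which lowers the score threshold that lets block-max WAND skip blocks.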