Re: Block MAX WAND feature use

rajani m Wed, 14 Feb 2024 09:45:54 -0800

Mikhail,

  Thanks for the pointer to test with a simple query. It works perfectly
with the lucene query parser: QTime drops by a factor of 7 with this param.
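
A minimal sketch of such an A/B test, for anyone who wants to repeat it. The
base URL and collection name are hypothetical placeholders; defType=lucene,
df=description and minExactCount are the parameters discussed in this thread.
The snippet only builds the two request URLs; how they are timed is left to
whatever tool is at hand.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace host and collection with your own.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def build_query_url(text, min_exact_count=None):
    """Build a simple lucene-parser query URL, optionally with minExactCount."""
    params = {
        "q": text,
        "defType": "lucene",   # simple parser, as suggested in the thread
        "df": "description",   # single default field
        "q.op": "OR",
        "fl": "id,score",
        "rows": 10,
    }
    # No explicit sort: the secondary sort was flagged as a possible blocker.
    if min_exact_count is not None:
        params["minExactCount"] = min_exact_count
    return SOLR_SELECT + "?" + urlencode(params)


baseline_url = build_query_url("white flowers card")
wand_url = build_query_url("white flowers card", min_exact_count=10)
```

Requesting both URLs repeatedly and comparing the QTime values in the
responses gives the comparison described above.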


With an edismax query, it works only with caveats: "qf" (query fields)
must contain a single field, and the query must not have boost/bf parameters.
We would expect it to work with boost params, because the boost is applied
after the documents are matched and scored by block max WAND as a first pass.
Am I right? Without support for "boost" params, the feature is not really
usable. The recency and popularity boosts are common to most queries. What
are your thoughts?
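
The caveats above can be written down as a small pre-flight check. This is
only a sketch: wand_caveats is a hypothetical helper that encodes what was
observed in this thread on Solr 9.1.1, not Solr's own rules or API.

```python
def wand_caveats(params):
    """Return reasons, per the observations in this thread, why an edismax
    request may not get faster with minExactCount.

    Hypothetical helper: it mirrors the poster's observations only.
    """
    reasons = []
    if params.get("defType") == "edismax":
        # qf is a whitespace-separated field list, e.g. "keywords description title"
        fields = params.get("qf", "").split()
        if len(fields) > 1:
            reasons.append("qf lists more than one field")
        for name in ("boost", "bf"):
            if name in params:
                reasons.append(name + " parameter is present")
    return reasons


# The kind of request from the start of the thread: three qf fields plus bf.
original = {
    "defType": "edismax",
    "qf": "keywords description title",
    "bf": "recip(ms(NOW,datefield),3.16e-11,1,1)^2.0",
    "minExactCount": 100,
}
simplified = {"defType": "edismax", "qf": "description", "minExactCount": 100}

original_reasons = wand_caveats(original)
simplified_reasons = wand_caveats(simplified)
```

An empty list for the simplified request means none of the observed caveats
apply; it does not guarantee that a speedup will occur.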

Thank you,
Rajani


On Tue, Feb 6, 2024 at 2:54 PM rajani m <[email protected]> wrote:

>
> > With a 400M index it's worth experimenting with skipping about a million
> > docs.
>  Is there a param that allows setting how many docs to skip?
>
>  There is "minExactCount", which decides how many docs it should care to
> score; I tested that with 100, 1000 and 2000, and latency only increased.
>
> Alessandro,
> Assuming the segment count is approximately the total number of files under
> /solr/replica_name/data/index, it is 222. The largest files by size:
>
> -rw-r--r-- 1 solr solr  766M Feb  4 04:16 _chg1.cfs
> -rw-r--r-- 1 solr solr 1020M Jan 29 18:37 _ca21.cfs
> -rw-r--r-- 1 solr solr  3.7G Nov  5 23:49 _95vt.cfs
> -rw-r--r-- 1 solr solr  3.8G Jan 15 08:59 _boyy.cfs
> -rw-r--r-- 1 solr solr  3.8G Nov 29 16:01 _9ynt.cfs
> -rw-r--r-- 1 solr solr  3.8G Jan 25 00:47 _c3t7.cfs
> -rw-r--r-- 1 solr solr  4.1G Oct 26 14:37 _8pyh.cfs
> -rw-r--r-- 1 solr solr  4.1G Oct 26 14:38 _7cwt.cfs
> -rw-r--r-- 1 solr solr  4.3G Oct 27 06:04 _7s6c.cfs
> -rw-r--r-- 1 solr solr  4.3G Oct 26 14:37 _7n8z.cfs
> -rw-r--r-- 1 solr solr  4.5G Jan 18 00:30 _dteg.cfs
> -rw-r--r-- 1 solr solr  4.5G Jan 19 17:44 _cwcc.cfs
> -rw-r--r-- 1 solr solr  4.6G Jan 13 07:35 _blix.cfs
> -rw-r--r-- 1 solr solr  4.9G Oct 26 14:39 _8gu9.cfs
> -rw-r--r-- 1 solr solr  4.9G Oct 26 14:38 _3kj9.cfs
>
>
>
> On Tue, Feb 6, 2024 at 2:45 AM Alessandro Benedetti <[email protected]>
> wrote:
>
>> It would be interesting to see the level of fragmentation of each index
>> indeed...
>> I.e. How many segments per core, in a collection
>>
>> On Tue, 6 Feb 2024, 06:59 Mikhail Khludnev, <[email protected]> wrote:
>>
>> > 200-300 docs might be too few to get significant gain. With a 400M index
>> > it's worth experimenting with skipping about a million docs.
>> > By simplified params I mean defType=lucene&df=description. debugQuery
>> > might expose some details as well.
>> > As far as I understand, this feature works with large segments since it
>> > skips a block of a segment, not a segment (?).
>> >
>> > On Mon, Feb 5, 2024 at 8:04 PM rajani m <[email protected]> wrote:
>> >
>> > > The "numFound" value differs by 200-300 docs compared to the query
>> > > without the "minExactCount" param. The collection has over 400M records,
>> > > so I am testing the feature on a large collection. The numFoundExact
>> > > param in the response is consistently false, which tells me the feature
>> > > is functioning, but the results (QTime) are just off, not as expected.
>> > >
>> > > Would the type of query parser matter? I tested without the secondary
>> > > sort; even without it there is no improvement in query latency, and it
>> > > is still higher than for the query without this param.
>> > >
>> > >
>> > >
>> > > On Mon, Feb 5, 2024 at 10:34 AM Mikhail Khludnev <[email protected]>
>> > wrote:
>> > >
>> > > > Hello,
>> > > > How many matches do you have in both cases?
>> > > > I see there's a second sorting expression; it might not comply with
>> > > > the requirements.
>> > > > I'd rather start from the simple single query parser, just for the
>> > > > experiments.
>> > > > Note: I never tried it myself.
>> > > >
>> > > > On Mon, Feb 5, 2024 at 6:20 PM rajani m <[email protected]>
>> wrote:
>> > > >
>> > > > > I ran performance tests with different query sets and the results do
>> > > > > not look good: it adds around 15% to the latency instead of reducing
>> > > > > or even matching it. Not sure if I am missing something in the config
>> > > > > or it is an issue.
>> > > > >
>> > > > > Here is an example query *without* the WAND query parameter:
>> > > > > select?&fl=id,ext_id&start=0&q.op=OR&sort=score desc,ext_id
>> > > > > asc&rows=10&q=white flowers card&defType=edismax&qf=keywords
>> > > > > description title
>> > > > > vs
>> > > > > *with* the WAND query parameter:
>> > > > > select?&fl=id,ext_id&start=0&q.op=OR&sort=score desc,ext_id
>> > > > > asc&rows=10&q=white flowers card&defType=edismax&qf=keywords
>> > > > > description title*&minExactCount=10*
>> > > > >
>> > > > > On Thu, Feb 1, 2024 at 8:36 AM rajani m <[email protected]>
>> > wrote:
>> > > > >
>> > > > > > Hi Ishan,
>> > > > > >    I have looked into that doc, and it looks like the solr version
>> > > > > > has to be >8.8 and the config needed is to add the query parameter
>> > > > > > "&minExactCount=k", where k is 10 or 100 depending on the accuracy
>> > > > > > of the first k docs.
>> > > > > >
>> > > > > > I ran a query performance test using an internal tool, with k set
>> > > > > > to 10 and 100, which barely showed any difference in query time
>> > > > > > latency. I didn't expect that, so I was wondering if there is any
>> > > > > > configuration I missed.
>> > > > > >
>> > > > > > I will run a couple more tests with different query sets meanwhile
>> > > > > > and dig further into the implementation of the feature to see if I
>> > > > > > am missing any config here. Appreciate any suggestions.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Rajani
>> > > > > >
>> > > > > > On Thu, Feb 1, 2024 at 12:53 AM Ishan Chattopadhyaya <
>> > > > > > [email protected]> wrote:
>> > > > > >
>> > > > > >> Is it possible to benchmark the query performance across a larger
>> > > > > >> set of queries? You can leverage Solr Bench, if needed.
>> > > > > >> https://github.com/fullstorydev/solr-bench
>> > > > > >>
>> > > > > >> On Thu, 1 Feb, 2024, 11:20 am Ishan Chattopadhyaya, <
>> > > > > >> [email protected]> wrote:
>> > > > > >>
>> > > > > >> > Some documentation is here
>> > > > > >> >
>> > > > > >> > https://solr.apache.org/guide/8_6/common-query-parameters.html#minexactcount-parameter
>> > > > > >> >
>> > > > > >> > On Thu, 1 Feb, 2024, 9:53 am rajani m, <
>> [email protected]>
>> > > > wrote:
>> > > > > >> >
>> > > > > >> >> Hi All,
>> > > > > >> >>
>> > > > > >> >>   To leverage the query time improvements that come with the
>> > > > > >> >> Block MAX WAND feature, what are the required configurations?
>> > > > > >> >>
>> > > > > >> >> I am on solr 9.1.1 version. As per docs, including the
>> > > > > >> >> "minExactCount=100" query param should do it, however I don't
>> > > > > >> >> see any drop in query time; it is more or less the same. Am I
>> > > > > >> >> missing something?
>> > > > > >> >>
>> > > > > >> >> The queries I tested with are standard ones with edismax as the
>> > > > > >> >> query parser; the query text is converted into boolean clauses
>> > > > > >> >> and the query has 2 boost params, by date and popularity field.
>> > > > > >> >> I set "minExactCount" as low as 10 and 100 and increased it to
>> > > > > >> >> 1000 but didn't see a key change in query time; it was about
>> > > > > >> >> the same.
>> > > > > >> >>
>> > > > > >> >> Would including boost or use of the edismax parser not benefit
>> > > > > >> >> from block MAX WAND? Example query:
>> > > > > >> >> /select?q=((white) AND (roses OR
>> > > > > >> >> jasmine))&defType=edismax&qf=keywords description
>> > > > > >> >> title&pf2=title&bf=recip(ms(NOW,datefield),3.16e-11,1,1)^2.0
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >> Thank you,
>> > > > > >> >> Rajani
>> > > > > >> >>
>> > > > > >> >
>> > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Sincerely yours
>> > > > Mikhail Khludnev
>> > > >
>> > >
>> >
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>> >
>>
>
