Re: Block MAX WAND feature use

Mikhail Khludnev Wed, 14 Feb 2024 10:54:20 -0800

Cool.
Btw can you rerank results with the corresponding boost query?
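
For illustration only, a minimal sketch of what such a request could look like: a plain lucene-parser first pass with minExactCount, and the recency function from the original edismax "bf" moved into a rerank query. The helper name build_rerank_params and the reRankDocs / reRankWeight values are made up for the example; the field names come from the queries quoted below. Whether the first pass still skips blocks when "rq" is present is not confirmed anywhere in this thread and would need to be measured.

```python
from urllib.parse import urlencode


def build_rerank_params(user_query, min_exact_count=100,
                        rerank_docs=1000, rerank_weight=2.0):
    """Build Solr request params: simple first pass plus a rerank pass.

    Hypothetical helper; the numeric defaults are arbitrary assumptions.
    """
    return {
        # First pass: single default field, no boosts, as in the simple test.
        "q": user_query,
        "defType": "lucene",
        "df": "description",
        "q.op": "OR",
        "rows": 10,
        "fl": "id,ext_id",
        "minExactCount": min_exact_count,
        # Second pass: rerank only the top N docs with the boost function.
        "rq": "{!rerank reRankQuery=$rqq reRankDocs=%d reRankWeight=%s}"
              % (rerank_docs, rerank_weight),
        # Recency function taken from the "bf" in the original edismax query.
        "rqq": "{!func}recip(ms(NOW,datefield),3.16e-11,1,1)",
    }


if __name__ == "__main__":
    print("/select?" + urlencode(build_rerank_params("white flowers card")))
```

The rerank pass only touches the top reRankDocs documents, so a boost applied this way cannot pull in documents that the first pass ranked lower than that cutoff.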

On Wed, Feb 14, 2024 at 8:46 PM rajani m <rajinima...@gmail.com> wrote:


> Mikhail,
>
>   Thanks for the pointer to test with a simple query. It works perfectly
> with the lucene query parser: I see QTime drop by about 7x with this param.
>
> With an edismax query it works only with caveats: "qf" (query fields)
> must contain a single field, and the query must not have boost/bf parameters.
> We would expect it to work with boost params, because the boost is applied
> after the documents are matched and scored by block max WAND as a first pass.
> Am I right? Without support for "boost" params, the feature is not really
> usable: recency and popularity boosts are common to most queries. What
> are your thoughts?
>
> Thank you,
> Rajani
>
>
> On Tue, Feb 6, 2024 at 2:54 PM rajani m <rajinima...@gmail.com> wrote:
>
> >
> > > With a 400M index it's worth experimenting with skipping about a
> > > million docs.
> >  Is there a param that allows setting how many docs to skip?
> >
> >  There is "minExactCount", which decides how many docs it should care to
> > score. I tested that with 100, 1000 and 2000, and latency only increased.
> >
> > Alessandro,
> > Assuming the segment count is approximately the total number of files under
> > /solr/replica_name/data/index, it is 222. The largest files by size:
> >
> > -rw-r--r-- 1 solr solr  766M Feb  4 04:16 _chg1.cfs
> > -rw-r--r-- 1 solr solr 1020M Jan 29 18:37 _ca21.cfs
> > -rw-r--r-- 1 solr solr  3.7G Nov  5 23:49 _95vt.cfs
> > -rw-r--r-- 1 solr solr  3.8G Jan 15 08:59 _boyy.cfs
> > -rw-r--r-- 1 solr solr  3.8G Nov 29 16:01 _9ynt.cfs
> > -rw-r--r-- 1 solr solr  3.8G Jan 25 00:47 _c3t7.cfs
> > -rw-r--r-- 1 solr solr  4.1G Oct 26 14:37 _8pyh.cfs
> > -rw-r--r-- 1 solr solr  4.1G Oct 26 14:38 _7cwt.cfs
> > -rw-r--r-- 1 solr solr  4.3G Oct 27 06:04 _7s6c.cfs
> > -rw-r--r-- 1 solr solr  4.3G Oct 26 14:37 _7n8z.cfs
> > -rw-r--r-- 1 solr solr  4.5G Jan 18 00:30 _dteg.cfs
> > -rw-r--r-- 1 solr solr  4.5G Jan 19 17:44 _cwcc.cfs
> > -rw-r--r-- 1 solr solr  4.6G Jan 13 07:35 _blix.cfs
> > -rw-r--r-- 1 solr solr  4.9G Oct 26 14:39 _8gu9.cfs
> > -rw-r--r-- 1 solr solr  4.9G Oct 26 14:38 _3kj9.cfs
> >
> >
> >
> > On Tue, Feb 6, 2024 at 2:45 AM Alessandro Benedetti <a.benede...@sease.io>
> > wrote:
> >
> >> It would be interesting to see the level of fragmentation of each index
> >> indeed...
> >> I.e. how many segments per core, in a collection?
> >>
> >> On Tue, 6 Feb 2024, 06:59 Mikhail Khludnev, <m...@apache.org> wrote:
> >>
> >> > 200-300 docs might be too few to get a significant gain. With a 400M
> >> > index it's worth experimenting with skipping about a million docs.
> >> > By simplified params I mean defType=lucene&df=description. debugQuery
> >> > might expose some details as well.
> >> > As far as I understand, this feature works with large segments since it
> >> > skips a block of a segment, not a segment (?).
> >> >
> >> > On Mon, Feb 5, 2024 at 8:04 PM rajani m <rajinima...@gmail.com>
> wrote:
> >> >
> >> > > The "numFound" value differs by 200-300 docs compared to the query
> >> > > without the "minExactCount" param. The collection has over 400M
> >> > > records, so I am testing the feature on a large collection. The
> >> > > numFoundExact value in the response is consistently false, which
> >> > > tells me the feature is functioning, but the results (QTime) are
> >> > > just off, not as expected.
> >> > >
> >> > > Would the type of query parser matter? I tested without the secondary
> >> > > sort; even without it there is no improvement in query latency, and
> >> > > it is still higher than for the query without this param.
> >> > >
> >> > >
> >> > >
> >> > > On Mon, Feb 5, 2024 at 10:34 AM Mikhail Khludnev <m...@apache.org>
> >> > wrote:
> >> > >
> >> > > > Hello,
> >> > > > How many matches do you have in both cases?
> >> > > > I see there's a second sorting expression; it might not comply
> >> > > > with the requirements.
> >> > > > I'd rather start from the simple single query parser, just for the
> >> > > > experiments.
> >> > > > Note: I never tried it myself.
> >> > > >
> >> > > > On Mon, Feb 5, 2024 at 6:20 PM rajani m <rajinima...@gmail.com>
> >> wrote:
> >> > > >
> >> > > > > I ran performance tests with different query sets and the results
> >> > > > > do not look good: it adds around 15% to the latency instead of
> >> > > > > reducing or even matching it. Not sure if I am missing something
> >> > > > > in the config or if it is an issue.
> >> > > > >
> >> > > > > Here is an example query *without* the WAND query parameter:
> >> > > > > select?&fl=id,ext_id&start=0&q.op=OR&sort=score desc,ext_id asc&rows=10&q=white flowers card&defType=edismax&qf=keywords description title
> >> > > > > vs.
> >> > > > > *with* the WAND query parameter:
> >> > > > > select?&fl=id,ext_id&start=0&q.op=OR&sort=score desc,ext_id asc&rows=10&q=white flowers card&defType=edismax&qf=keywords description title*&minExactCount=10*
> >> > > > >
> >> > > > > On Thu, Feb 1, 2024 at 8:36 AM rajani m <rajinima...@gmail.com>
> >> > wrote:
> >> > > > >
> >> > > > > > Hi Ishan,
> >> > > > > >    I have looked into that doc, and it looks like the Solr version
> >> > > > > > has to be >8.8 and the config needed is to add the query parameter
> >> > > > > > "&minExactCount=k", where k is 10 or 100 depending on the accuracy
> >> > > > > > of the first k docs.
> >> > > > > >
> >> > > > > > I ran a query performance test using an internal tool, with k set
> >> > > > > > to 10 and 100, which barely showed any difference in query
> >> > > > > > latency. I didn't expect that, so I was wondering if there is any
> >> > > > > > configuration I missed.
> >> > > > > >
> >> > > > > > I will run a couple more tests with different query sets meanwhile
> >> > > > > > and dig further into the implementation of the feature to see if I
> >> > > > > > am missing any config here. Appreciate any suggestions.
> >> > > > > >
> >> > > > > > Thanks,
> >> > > > > > Rajani
> >> > > > > >
> >> > > > > > On Thu, Feb 1, 2024 at 12:53 AM Ishan Chattopadhyaya <
> >> > > > > > ichattopadhy...@gmail.com> wrote:
> >> > > > > >
> >> > > > > >> Is it possible to benchmark the query performance across a larger
> >> > > > > >> set of queries? You can leverage Solr Bench, if needed.
> >> > > > > >> https://github.com/fullstorydev/solr-bench
> >> > > > > >>
> >> > > > > >> On Thu, 1 Feb, 2024, 11:20 am Ishan Chattopadhyaya, <
> >> > > > > >> ichattopadhy...@gmail.com> wrote:
> >> > > > > >>
> >> > > > > >> > Some documentation is here
> >> > > > > >> > https://solr.apache.org/guide/8_6/common-query-parameters.html#minexactcount-parameter
> >> > > > > >> >
> >> > > > > >> > On Thu, 1 Feb, 2024, 9:53 am rajani m, <
> >> rajinima...@gmail.com>
> >> > > > wrote:
> >> > > > > >> >
> >> > > > > >> >> Hi All,
> >> > > > > >> >>
> >> > > > > >> >>   To leverage the query time improvements that come with the
> >> > > > > >> >> Block MAX WAND feature, what are the required configurations?
> >> > > > > >> >>
> >> > > > > >> >> I am on Solr version 9.1.1. As per the docs, including the
> >> > > > > >> >> "minExactCount=100" query param should do it; however, I don't
> >> > > > > >> >> see any drop in query time, it is more or less the same. Am I
> >> > > > > >> >> missing something?
> >> > > > > >> >>
> >> > > > > >> >> The queries I tested with are standard ones: edismax as the query
> >> > > > > >> >> parser, the query text converted into boolean clauses, and 2
> >> > > > > >> >> boost params by date and popularity field. I set "minExactCount"
> >> > > > > >> >> as low as 10 and 100 and increased it to 1000, but didn't see any
> >> > > > > >> >> notable change in query time; it was about the same.
> >> > > > > >> >>
> >> > > > > >> >> Would queries that include a boost or use the edismax parser not
> >> > > > > >> >> benefit from Block MAX WAND? Example query:
> >> > > > > >> >> /select?q=((white) AND (roses OR jasmine))&defType=edismax&qf=keywords description title&pf2=title&bf=recip(ms(NOW,datefield),3.16e-11,1,1)^2.0
> >> > > > > >> >>
> >> > > > > >> >>
> >> > > > > >> >> Thank you,
> >> > > > > >> >> Rajani
> >> > > > > >> >>
> >> > > > > >> >
> >> > > > > >>
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Sincerely yours
> >> > > > Mikhail Khludnev
> >> > > >
> >> > >
> >> >
> >> >
> >> > --
> >> > Sincerely yours
> >> > Mikhail Khludnev
> >> >
> >>
> >
>


-- 
Sincerely yours
Mikhail Khludnev
