Re: Block MAX WAND feature use

rajani m Thu, 15 Feb 2024 09:16:32 -0800

If the boosts are multiple function queries, such as the ones in [1], then the boost query would be a sum() function wrapping them, right? I missed that one.
[1] "sum(product(popularity,2),1.0)" and "recip(ms(NOW/HOUR,date),3.163e-11,1,1)"

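A minimal sketch of the idea in [1], assuming the two boosts should add up: Solr's sum() function accepts several arguments, so the existing functions can be nested into one expression and passed wherever a single function query is expected. The helper name `combine_boosts` is hypothetical.

```python
def combine_boosts(*functions: str) -> str:
    """Wrap several Solr function-query strings in a single additive sum(...)."""
    if not functions:
        raise ValueError("at least one function is required")
    if len(functions) == 1:
        return functions[0]  # nothing to combine
    return "sum(" + ",".join(functions) + ")"


popularity = "sum(product(popularity,2),1.0)"
recency = "recip(ms(NOW/HOUR,date),3.163e-11,1,1)"

combined = combine_boosts(popularity, recency)
print(combined)
# sum(sum(product(popularity,2),1.0),recip(ms(NOW/HOUR,date),3.163e-11,1,1))
```

If the boosts are meant to scale the score multiplicatively, as edismax's `boost` parameter does, product(...) would be the analogous wrapper.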


I will post the question on the dev channel regarding what is expected when
there are multiple query fields.

Thanks again for all your help and pointers.


On Thu, Feb 15, 2024 at 2:25 AM Mikhail Khludnev <m...@apache.org> wrote:

> Hello,
> Please check inline below.
>
> On Thu, Feb 15, 2024 at 2:11 AM rajani m <rajinima...@gmail.com> wrote:
>
> > Yes, rerank works as an alternative, but rerank only supports one boost query, correct? If there are multiple boost conditions, such as boosting by date, season and popularity, putting all of them into one complex boost query is a hard problem; rerank by LTR can help. Thank you for that pointer.
> >
> Perhaps I missed something, but why can't the boost query combine multiple clauses?
>
>
> >
> > The other limitation is that it is not possible to query multiple fields and still leverage this feature. That is still an issue, because it is also a common use case to keep title, description and keyword fields separate rather than merged into one.
> >
> It's worth checking with dev@.
> I didn't code anything there; my vague understanding is that it skips blocks of docIDs whose max tf (term frequency) is lower than what has been seen so far.
> So such skip conditions have to be pushed through the query scoring logic, which might not be (or might be?) obvious in the case of the max over fields.
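To make the skipping idea concrete, here is a toy simulation (not Lucene's implementation): documents are grouped into fixed-size blocks, each block has an upper bound on the score any of its documents can reach, and a block is skipped whenever that bound cannot beat the k-th best score collected so far. Real Lucene derives block bounds from stored impacts (term frequency and norm pairs) rather than from precomputed scores; the block size, scores and function name below are illustrative assumptions.

```python
import heapq
from typing import List, Tuple


def block_max_top_k(scores: List[float], block_size: int, k: int) -> Tuple[List[float], int]:
    """Toy Block-Max WAND: return the top-k scores and how many docs were scored.

    scores[i] stands in for the score of doc i; real Lucene computes scores lazily
    and derives block upper bounds from stored impacts.
    """
    blocks = [scores[i:i + block_size] for i in range(0, len(scores), block_size)]
    heap: List[float] = []  # min-heap of the best k scores seen so far
    scored = 0
    for block in blocks:
        upper = max(block)  # per-block upper bound (precomputed at index time in Lucene)
        # With k hits collected, heap[0] is the minimum competitive score.
        if len(heap) == k and upper <= heap[0]:
            continue  # nothing in this block can enter the top k: skip it
        for s in block:
            scored += 1
            if len(heap) < k:
                heapq.heappush(heap, s)
            elif s > heap[0]:
                heapq.heapreplace(heap, s)
    return sorted(heap, reverse=True), scored


scores = [5, 1, 2, 9, 8, 7, 1, 1, 1, 2, 3, 1]
top, scored = block_max_top_k(scores, block_size=3, k=2)
print(top, scored, "of", len(scores))  # [9, 8] 6 of 12
```

For a query that takes the max over several fields, like the dismax-style combination across qf, the per-field block bounds would have to be merged into one bound for the whole query before any block could be skipped.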
>
>
> >
> >
> > Regards,
> > Rajani
> >
> > On Wed, Feb 14, 2024 at 1:54 PM Mikhail Khludnev <m...@apache.org> wrote:
> >
> > > Cool.
> > > Btw can you rerank results with the corresponding boost query?
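A sketch of that suggestion, assuming Solr's ReRank query parser: the main query stays a plain text query (the part Block-Max WAND can speed up) and the date/popularity boost moves into a rerank pass that rescores only the top reRankDocs hits. The reRankDocs and reRankWeight values, the `date` field name and the helper name are assumptions; whether minExactCount still takes effect when rq is present is worth confirming with debugQuery. Presumably a larger reRankDocs also leaves less to skip, since the first pass must then find that many top hits.

```python
from urllib.parse import urlencode


def rerank_params(q: str, df: str, boost_func: str, rerank_docs: int = 1000,
                  rerank_weight: float = 1.0, min_exact_count: int = 10) -> dict:
    """Plain text query first (eligible for minExactCount), boosts applied in a rerank pass."""
    return {
        "q": q,
        "defType": "lucene",
        "df": df,
        "sort": "score desc",
        "rows": 10,
        "minExactCount": min_exact_count,
        # The ReRank parser rescores only the top rerank_docs hits using rqq.
        "rq": "{!rerank reRankQuery=$rqq reRankDocs=%d reRankWeight=%s}" % (rerank_docs, rerank_weight),
        "rqq": "{!func}" + boost_func,
    }


params = rerank_params("white flowers card", "description",
                       "recip(ms(NOW/HOUR,date),3.163e-11,1,1)", rerank_weight=3.0)
print(params["rq"])  # {!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3.0}
print(urlencode(params))
```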
> > >
> > > On Wed, Feb 14, 2024 at 8:46 PM rajani m <rajinima...@gmail.com> wrote:
> > >
> > > > Mikhail,
> > > >
> > > >   Thanks for the pointer to test with a simple query. It works perfectly with the lucene query parser; I see QTime drop by a factor of 7 with this param.
> > > >
> > > > With an edismax query, it works with certain caveats: "qf" (query fields) must have only one field, and the query must not have boost/bf parameters. We would expect it to work with boost params, because the boost is applied after the documents are matched and scored by Block-Max WAND in a first pass. Am I right? Without support for "boost" params, the feature is not really usable, since recency and popularity boosts are common to most queries. What are your thoughts?
> > > >
> > > > Thank you,
> > > > Rajani
> > > >
> > > >
> > > > On Tue, Feb 6, 2024 at 2:54 PM rajani m <rajinima...@gmail.com> wrote:
> > > >
> > > > >
> > > > > > With a 400M index it's worth experimenting with skipping about a million docs.
> > > > > Is there a param that allows setting how many docs to skip?
> > > > >
> > > > > "minExactCount" decides how many docs it should care to score, and I tested it with 100, 1000 and 2000; latency only increased.
> > > > >
> > > > > Alessandro,
> > > > > Assuming it is approximately the total number of files under /solr/replica_name/data/index, it is 222. The largest files:
> > > > >
> > > > > -rw-r--r-- 1 solr solr  766M Feb  4 04:16 _chg1.cfs
> > > > > -rw-r--r-- 1 solr solr 1020M Jan 29 18:37 _ca21.cfs
> > > > > -rw-r--r-- 1 solr solr  3.7G Nov  5 23:49 _95vt.cfs
> > > > > -rw-r--r-- 1 solr solr  3.8G Jan 15 08:59 _boyy.cfs
> > > > > -rw-r--r-- 1 solr solr  3.8G Nov 29 16:01 _9ynt.cfs
> > > > > -rw-r--r-- 1 solr solr  3.8G Jan 25 00:47 _c3t7.cfs
> > > > > -rw-r--r-- 1 solr solr  4.1G Oct 26 14:37 _8pyh.cfs
> > > > > -rw-r--r-- 1 solr solr  4.1G Oct 26 14:38 _7cwt.cfs
> > > > > -rw-r--r-- 1 solr solr  4.3G Oct 27 06:04 _7s6c.cfs
> > > > > -rw-r--r-- 1 solr solr  4.3G Oct 26 14:37 _7n8z.cfs
> > > > > -rw-r--r-- 1 solr solr  4.5G Jan 18 00:30 _dteg.cfs
> > > > > -rw-r--r-- 1 solr solr  4.5G Jan 19 17:44 _cwcc.cfs
> > > > > -rw-r--r-- 1 solr solr  4.6G Jan 13 07:35 _blix.cfs
> > > > > -rw-r--r-- 1 solr solr  4.9G Oct 26 14:39 _8gu9.cfs
> > > > > -rw-r--r-- 1 solr solr  4.9G Oct 26 14:38 _3kj9.cfs
> > > > >
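Counting files overstates the number of segments: each Lucene segment is stored as several files sharing one `_<name>` prefix (with compound files, typically `.cfs`, `.cfe` and `.si`, plus for example `_<name>_<gen>.liv` for deletes). A small sketch that groups file names by that prefix; the sample names other than the `.cfs` files listed above are hypothetical. The core's Segments Info screen in the Solr admin UI shows the same information directly.

```python
import os
from collections import Counter
from typing import Iterable, Optional


def segment_of(filename: str) -> Optional[str]:
    """Map a Lucene index file name to its segment name; None for non-segment files."""
    base = os.path.basename(filename)
    if not base.startswith("_"):
        return None  # e.g. segments_N or write.lock
    # "_chg1.cfs" -> "chg1", "_chg1_3.liv" -> "chg1"
    return base[1:].split(".")[0].split("_")[0]


def count_segments(filenames: Iterable[str]) -> Counter:
    """Number of files per segment; len() of the result is the segment count."""
    return Counter(seg for seg in map(segment_of, filenames) if seg is not None)


sample = ["_chg1.cfs", "_chg1.cfe", "_chg1.si", "_chg1_3.liv",
          "_ca21.cfs", "_ca21.cfe", "_ca21.si", "segments_5x", "write.lock"]
per_segment = count_segments(sample)
print(len(per_segment), dict(per_segment))  # 2 {'chg1': 4, 'ca21': 3}
```

On a replica, `count_segments(os.listdir("/solr/replica_name/data/index"))` would give the per-segment file counts.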
> > > > >
> > > > >
> > > > > On Tue, Feb 6, 2024 at 2:45 AM Alessandro Benedetti <a.benede...@sease.io> wrote:
> > > > >
> > > > >> It would be interesting to see the level of fragmentation of each index, indeed...
> > > > >> I.e. how many segments per core in a collection.
> > > > >>
> > > > >> On Tue, 6 Feb 2024, 06:59 Mikhail Khludnev, <m...@apache.org> wrote:
> > > > >>
> > > > >> > 200-300 docs might be too few to get a significant gain. With a 400M index, it's worth experimenting with skipping about a million docs.
> > > > >> > By simplified params I mean defType=lucene&df=description. debugQuery might expose some details as well.
> > > > >> > As far as I understand, this feature works with large segments, since it skips a block within a segment, not a whole segment (?).
> > > > >> >
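A sketch of such an A/B pair, assuming a hypothetical local collection URL: the lucene parser with a single default field (`df=description`, as Mikhail suggests), ranking by score alone, and `debugQuery` on, so that only `minExactCount` differs between the two requests. The `fl` value is an arbitrary choice.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/mycollection/select"  # hypothetical endpoint


def simplified_query(text: str, min_exact_count: Optional[int] = None) -> str:
    """Single-field lucene query, ranked by score only, with debug output."""
    params = {
        "q": text,
        "defType": "lucene",
        "df": "description",
        "q.op": "OR",
        "sort": "score desc",  # no secondary sort key, per the discussion in this thread
        "rows": 10,
        "fl": "id,score",
        "debugQuery": "true",
    }
    if min_exact_count is not None:
        params["minExactCount"] = min_exact_count
    return BASE_URL + "?" + urlencode(params)


baseline = simplified_query("white flowers card")
with_wand = simplified_query("white flowers card", min_exact_count=10)
print(baseline)
print(with_wand)
```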
> > > > >> > On Mon, Feb 5, 2024 at 8:04 PM rajani m <rajinima...@gmail.com> wrote:
> > > > >> >
> > > > >> > > The "numFound" value differs by 200-300 docs compared to the query without the "minExactCount" param. The collection has over 400M records, so I am testing the feature on a large collection. The numFoundExact flag in the response is consistently false, which tells me the feature is functioning, but the results (QTime) are just off, not as expected.
> > > > >> > >
> > > > >> > > Would the type of query parser matter? I tested without the secondary sort; even without it there is no improvement in query latency, and it is still higher than for the query without this param.
> > > > >> > >
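A minimal sketch for reading those numbers from a JSON response (wt=json) with only the standard library. The field names (`responseHeader.QTime`, `response.numFound`, `response.numFoundExact`) follow Solr's JSON response format; the sample body is made up. When `numFoundExact` is false, `numFound` is a lower bound on the real hit count.

```python
import json
from typing import NamedTuple


class HitStats(NamedTuple):
    qtime_ms: int
    num_found: int
    exact: bool


def hit_stats(body: str) -> HitStats:
    """Extract QTime, numFound and numFoundExact from a Solr JSON response body."""
    data = json.loads(body)
    response = data["response"]
    return HitStats(
        qtime_ms=data["responseHeader"]["QTime"],
        num_found=response["numFound"],
        # Treat a missing flag as exact (responses from versions without minExactCount).
        exact=response.get("numFoundExact", True),
    )


sample = ('{"responseHeader":{"status":0,"QTime":42},'
          '"response":{"numFound":1000,"start":0,"numFoundExact":false,"docs":[]}}')
stats = hit_stats(sample)
label = "" if stats.exact else "at least "
print(f"{label}{stats.num_found} hits in {stats.qtime_ms} ms")  # at least 1000 hits in 42 ms
```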
> > > > >> > >
> > > > >> > >
> > > > >> > > On Mon, Feb 5, 2024 at 10:34 AM Mikhail Khludnev <m...@apache.org> wrote:
> > > > >> > >
> > > > >> > > > Hello,
> > > > >> > > > How many matches do you have in both cases?
> > > > >> > > > I see there's a second sort expression; it might not comply with the requirements.
> > > > >> > > > I'd rather start with a simple query parser and a single field, just for the experiments.
> > > > >> > > > Note: I never tried it myself.
> > > > >> > > >
> > > > >> > > > On Mon, Feb 5, 2024 at 6:20 PM rajani m <rajinima...@gmail.com> wrote:
> > > > >> > > >
> > > > >> > > > > I ran performance tests with different query sets and the results don't look good: it adds around 15% to the latency instead of reducing it or even matching it. I'm not sure whether I am missing something in the config or whether it is an issue.
> > > > >> > > > >
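To compare runs like these, a percentile comparison over the same query set is less noisy than single timings. A small sketch, assuming QTime values (ms) have already been collected per query for each variant; the sample numbers are made up, and nearest-rank percentiles are used for simplicity.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def relative_change(baseline: Sequence[float], candidate: Sequence[float], p: float = 50) -> float:
    """Percentage change of the p-th percentile; positive means the candidate is slower."""
    b = percentile(baseline, p)
    c = percentile(candidate, p)
    return (c - b) / b * 100


# Made-up QTime samples (ms) for the same five queries.
without_wand = [100, 120, 110, 130, 90]
with_wand = [115, 138, 126, 150, 103]
print(f"p50 change: {relative_change(without_wand, with_wand):+.1f}%")  # p50 change: +14.5%
```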
> > > > >> > > > > Here is an example query *without* the WAND query parameter:
> > > > >> > > > > select?fl=id,ext_id&start=0&q.op=OR&sort=score desc,ext_id asc&rows=10&q=white flowers card&defType=edismax&qf=keywords description title
> > > > >> > > > > vs
> > > > >> > > > > *with* the WAND query parameter:
> > > > >> > > > > select?fl=id,ext_id&start=0&q.op=OR&sort=score desc,ext_id asc&rows=10&q=white flowers card&defType=edismax&qf=keywords description title&minExactCount=10
> > > > >> > > > >
> > > > >> > > > > On Thu, Feb 1, 2024 at 8:36 AM rajani m <rajinima...@gmail.com> wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi Ishan,
> > > > >> > > > > >    I have looked into that doc, and it looks like the Solr version has to be 8.6 or later, and the only config needed is to add the query parameter "&minExactCount=k", where k is 10 or 100 depending on the accuracy needed for the first k docs.
> > > > >> > > > > >
> > > > >> > > > > > I ran a query performance test using an internal tool, with k set to 10 and 100, which barely showed any difference in query latency. I didn't expect that, so I was wondering if there is any configuration I missed.
> > > > >> > > > > >
> > > > >> > > > > > Meanwhile, I will run a couple more tests with different query sets and dig further into the implementation of the feature to see if I am missing any config here. I'd appreciate any suggestions.
> > > > >> > > > > >
> > > > >> > > > > > Thanks,
> > > > >> > > > > > Rajani
> > > > >> > > > > >
> > > > >> > > > > > On Thu, Feb 1, 2024 at 12:53 AM Ishan Chattopadhyaya <ichattopadhy...@gmail.com> wrote:
> > > > >> > > > > >
> > > > >> > > > > >> Is it possible to benchmark the query performance across a larger set of queries? You can leverage Solr Bench, if needed.
> > > > >> > > > > >> https://github.com/fullstorydev/solr-bench
> > > > >> > > > > >>
> > > > >> > > > > >> On Thu, 1 Feb, 2024, 11:20 am Ishan Chattopadhyaya, <ichattopadhy...@gmail.com> wrote:
> > > > >> > > > > >>
> > > > >> > > > > >> > Some documentation is here:
> > > > >> > > > > >> > https://solr.apache.org/guide/8_6/common-query-parameters.html#minexactcount-parameter
> > > > >> > > > > >> > On Thu, 1 Feb, 2024, 9:53 am rajani m, <rajinima...@gmail.com> wrote:
> > > > >> > > > > >> >
> > > > >> > > > > >> >> Hi All,
> > > > >> > > > > >> >>
> > > > >> > > > > >> >>   To leverage the query time improvements that come with the Block-Max WAND feature, what configuration is required?
> > > > >> > > > > >> >>
> > > > >> > > > > >> >> I am on Solr 9.1.1. As per the docs, including the "minExactCount=100" query param should do it; however, I don't see any drop in query time, it is more or less the same. Am I missing something?
> > > > >> > > > > >> >>
> > > > >> > > > > >> >> The queries I tested with are standard ones with edismax as the query parser: the query text is converted into boolean clauses, and the query has 2 boost params, by date and by popularity field. I set "minExactCount" as low as 10 and 100, and increased it to 1000, but didn't see a significant change in query time; it was about the same.
> > > > >> > > > > >> >>
> > > > >> > > > > >> >> Does including boosts or using the edismax parser prevent any benefit from Block-Max WAND? Example query:
> > > > >> > > > > >> >> /select?q=((white) AND (roses OR jasmine))&defType=edismax&qf=keywords description title&pf2=title&bf=recip(ms(NOW,datefield),3.16e-11,1,1)^2.0
> > > > >> > > > > >> >>
> > > > >> > > > > >> >>
> > > > >> > > > > >> >> Thank you,
> > > > >> > > > > >> >> Rajani
> > > > >> > > > > >> >>
> > > > >> > > > > >> >
> > > > >> > > > > >>
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > Sincerely yours
> > > > >> > > > Mikhail Khludnev
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > Sincerely yours
> > > > >> > Mikhail Khludnev
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
