Hi Mikhail,

Thank you for your explanation.

> Thus, if we put fq={!... cost=200 cache=false } ..., it will defer filter
> execution, and just check a few docs matching the main query - a huge
> performance boost.

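So, as I understand it, the performance boost comes from checking the filter
only against the small set of documents matched by the main query, instead of
materializing the large filter set first and intersecting afterwards.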
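
To check my understanding, I wrote a toy sketch in plain Python (not Solr or
Lucene code; the doc IDs, the index size, and the acl_allows predicate are all
made up). It only counts how often the filter predicate is evaluated when the
filter is materialized over the whole index first, versus when it is checked
only for documents that already match the main query.

```python
# Toy model only: counts evaluations of an "expensive" filter predicate.
# Nothing here is Solr API; all names and numbers are hypothetical.

def eager_filter(num_docs, filter_pred, query_hits):
    """Materialize the filter over the whole index, then intersect."""
    evaluations = 0
    filter_set = set()
    for doc in range(num_docs):
        evaluations += 1
        if filter_pred(doc):
            filter_set.add(doc)
    hits = [doc for doc in query_hits if doc in filter_set]
    return hits, evaluations


def deferred_filter(filter_pred, query_hits):
    """Check the filter only for docs that already match the main query."""
    evaluations = 0
    hits = []
    for doc in query_hits:
        evaluations += 1
        if filter_pred(doc):
            hits.append(doc)
    return hits, evaluations


def acl_allows(doc):
    """Weakly selective filter: about 90% of the docs pass."""
    return doc % 10 != 0


NUM_DOCS = 100_000
query_hits = [7, 4_242, 50_000, 99_999]  # highly selective main query

eager_hits, eager_evals = eager_filter(NUM_DOCS, acl_allows, query_hits)
deferred_hits, deferred_evals = deferred_filter(acl_allows, query_hits)

print(eager_hits == deferred_hits)  # same result either way
print(eager_evals, deferred_evals)  # very different amount of filter work
```

Both variants return the same documents, but the eager one evaluates the filter
once per document in the index, while the deferred one evaluates it once per
main-query hit. If I got it right, that difference is where the gain comes from.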

> Note: if one gets along with bare Lucene via q= +{!.. main q} +{! ..
> filter q} it is smart enough to combine them in a most effective way,
> letting the main query drive the intersection.

Oh, that's nice!
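
As far as I understand, a Lucene conjunction lets the clause with the fewest
expected matches lead, and the other clauses only advance to the candidates it
proposes. Here is a small sketch of that idea in plain Python (again a toy: the
sorted lists stand in for postings, and advance() and leapfrog_intersect() are
names I made up, not Lucene methods).

```python
from bisect import bisect_left


def advance(postings, start, target):
    """Index of the first doc ID >= target in sorted postings, from start on."""
    return bisect_left(postings, target, start)


def leapfrog_intersect(a, b):
    """Intersect two sorted doc ID lists, letting the shorter one lead."""
    lead, other = (a, b) if len(a) <= len(b) else (b, a)
    result = []
    pos = 0
    for doc in lead:
        # The larger list only skips forward to the lead's candidate.
        pos = advance(other, pos, doc)
        if pos == len(other):
            break  # the larger list is exhausted
        if other[pos] == doc:
            result.append(doc)
    return result


main_q = [7, 4_242, 50_000, 99_999]                    # few matches
filter_q = [d for d in range(100_000) if d % 10 != 0]  # many matches

matches = leapfrog_intersect(main_q, filter_q)
print(matches)
```

The work is proportional to the number of main-query hits (times a skip in the
larger list), not to the size of the filter's match set.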

> NB: no one parses Occur.FILTER yet, although it is already printed as #;
> combining two queries might not be easy due to query parser quirks.

Understood.


Thank you again for your helpful answer.

Best,
Mingchun



On Mon, Dec 23, 2024 at 6:32 Mikhail Khludnev <m...@apache.org> wrote:

> Let's have highly selective q (matches a few docs), and weakly selective fq
> (matches many documents, let it be a kind of access control query).
> If we query them as is, it will take a while to materialize heavy filter
> query eagerly, and then just check intersection with a few query results.
> Thus, if we put fq={!... cost=200 cache=false } ..., it will defer filter
> execution, and just check a few docs matching the main query - a huge
> performance boost.
>
> Note: if one gets along with bare Lucene via q= +{!.. main q} +{! .. filter
> q} it is smart enough to combine them in a most effective way, letting the
> main query drive the intersection.
> NB: no one parses Occur.FILTER yet, although it is already printed as #;
> combining two queries might not be easy due to query parser quirks.
>
> On Sun, Dec 22, 2024 at 4:27 PM Mingchun Zhao <mingchun.zha...@gmail.com>
> wrote:
>
> > Hi Mikhail,
> >
> > Thanks for your answer!
> >
> > > Here are two answers. The order of calling BQ.Builder.add() doesn't
> > > decide the order of execution, and neither does Occur.
> > > BooleanQuery lazily executes intersection dynamically and adjusts to
> > > actual values, with many conditions and special cases.
> >
> > Understood!
> >
> > > However, if you check SolrIndexSearcher.getProcessedFilter() you notice
> > > that it executes filters eagerly and caches them (both up to parameters).
> > > So, here the filterQuery in most cases will be a bitset (or other), i.e.
> > > a materialized filter.
> >
> > Understood, I checked the getProcessedFilter() method in the source code,
> > and it was as you explained.
> >
> > > It depends on relative selectivity (number of matched documents) of q
> > > and fq.
> > > In an edge case deferring filters with cost>100 might get significant
> > > gain.
> >
> > I didn’t quite understand this part. Could you explain it in more detail
> > please?
> > Are you saying that the overall search performance differs depending on the
> > number of documents matched by q and fq, due to the varying load of
> > calculating the intersection? Or are you suggesting that the load of the
> > filtering process in the filterQuery can change depending on the number of
> > documents matched by q and fq as well?
> > My understanding was that the filterQuery is executed independently of the
> > scoreQuery, applying the filter logic to the entire index and then
> > calculating the intersection of the respective results. Therefore, I
> > thought the processing order of q and fq wouldn’t affect the overall search
> > performance.
> >
> >
> > Regards,
> > Mingchun
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
