Hi Mikhail,

Thank you for your explanation.

> Thus, if we put fq={!... cost=200 cache=false } ..., it will defer filter
> execution, and just check a few docs matching the main query - a huge
> performance boost.

I understood that the performance boost is achieved by intersecting the
smaller set against the larger set.

> Note: if one gets along with bare Lucene via q= +{!.. main q} +{! .. filter
> q} it is smart enough to combine them in the most effective way, letting the
> main query drive the intersection.

Oh, that's nice!

> NB: no one parses Occur.FILTER yet, though it is already printed as #;
> combining two queries might not be easy due to query parser quirks.

Understood.

Thank you again for your helpful answer.

Best,
Mingchun

On Mon, Dec 23, 2024 at 6:32, Mikhail Khludnev <m...@apache.org> wrote:

> Let's have a highly selective q (matches a few docs), and a weakly selective
> fq (matches many documents, let it be a kind of access control query).
> If we query them as is, it will take a while to materialize the heavy filter
> query eagerly, and then just check the intersection with a few query results.
> Thus, if we put fq={!... cost=200 cache=false } ..., it will defer filter
> execution, and just check a few docs matching the main query - a huge
> performance boost.
>
> Note: if one gets along with bare Lucene via q= +{!.. main q} +{! .. filter
> q} it is smart enough to combine them in the most effective way, letting the
> main query drive the intersection.
> NB: no one parses Occur.FILTER yet, though it is already printed as #;
> combining two queries might not be easy due to query parser quirks.
>
> On Sun, Dec 22, 2024 at 4:27 PM Mingchun Zhao <mingchun.zha...@gmail.com>
> wrote:
>
> > Hi Mikhail,
> >
> > Thanks for your answer!
> >
> > > Here are two answers. The order of calling BQ.Builder.add() doesn't
> > > decide the order of execution, and neither does the occur.
> > > BooleanQuery lazily executes the intersection dynamically and adjusts to
> > > actual values, with many conditions and special cases.
> >
> > Understood!
> >
> > > However, if you check SolrIndexSearcher.getProcessedFilter() you notice
> > > that it executes filters eagerly and caches them (both up to parameters).
> > > So, here the filterQuery in most cases will be a bitset (or other), i.e.
> > > a materialized filter.
> >
> > Understood, I checked the getProcessedFilter() method in the source code,
> > and it was as you explained.
> >
> > > It depends on the relative selectivity (number of matched documents) of
> > > q and fq.
> > > In an edge case, deferring filters with cost>100 might get a significant
> > > gain.
> >
> > I didn't quite understand this part. Could you explain it in more detail
> > please?
> > Are you saying that the overall search performance differs depending on
> > the number of documents matched by q and fq, due to the varying load of
> > calculating the intersection? Or are you suggesting that the load of the
> > filtering process in the filterQuery can change depending on the number
> > of documents matched by q and fq as well?
> > My understanding was that the filterQuery is executed independently of
> > the scoreQuery, applying the filter logic to the entire index and then
> > calculating the intersection of the respective results. Therefore, I
> > thought the processing order of q and fq wouldn't affect the overall
> > search performance.
> >
> >
> > Regards,
> > Mingchun
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
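
To make the eager-versus-deferred point in this thread concrete, here is a minimal Python sketch. It is not Solr or Lucene code: the function names (eager_filter, deferred_filter), the toy predicates, and the index size are all made up for illustration, and a real eager filter walks postings rather than calling a predicate on every document. The sketch only counts how often the filter condition is evaluated when a highly selective q meets a weakly selective fq.

```python
def eager_filter(num_docs, matches_q, matches_fq):
    """Materialize the filter over the whole index first, then intersect with q."""
    fq_checks = 0
    fq_set = set()
    for doc in range(num_docs):
        fq_checks += 1              # the filter is evaluated for every document
        if matches_fq(doc):
            fq_set.add(doc)
    hits = [doc for doc in range(num_docs) if matches_q(doc) and doc in fq_set]
    return hits, fq_checks


def deferred_filter(num_docs, matches_q, matches_fq):
    """Let q drive: evaluate the filter only on documents q already matched."""
    fq_checks = 0
    hits = []
    for doc in range(num_docs):
        if not matches_q(doc):
            continue
        fq_checks += 1              # the filter is evaluated only for q's matches
        if matches_fq(doc):
            hits.append(doc)
    return hits, fq_checks


NUM_DOCS = 100_000                  # hypothetical index size


def selective_q(doc):
    """Toy main query: matches very few documents."""
    return doc % 10_000 == 0


def broad_fq(doc):
    """Toy filter query: matches half of the index."""
    return doc % 2 == 0


eager_hits, eager_checks = eager_filter(NUM_DOCS, selective_q, broad_fq)
deferred_hits, deferred_checks = deferred_filter(NUM_DOCS, selective_q, broad_fq)

print("same hits:", eager_hits == deferred_hits)
print("filter evaluations, eager:", eager_checks)
print("filter evaluations, deferred:", deferred_checks)
```

Both strategies return the same documents; only the amount of filter work differs. The sketch ignores caching, which is the main reason an eagerly materialized filter can still pay off when the same fq is reused across many requests.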