[ https://issues.apache.org/jira/browse/SOLR-16858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798321#comment-17798321 ]
Chris M. Hostetter commented on SOLR-16858:
-------------------------------------------

{quote}...let me explain here and if that old pull request is not relevant anymore, I'll proceed with a deep review of this PR...
{quote}
Yes please – I definitely think you should give a more in-depth read to some of the examples I posted above, and to the test cases added in my PR – what you're talking about really seems to be orthogonal to the flexibility I'm trying to add?
{quote}In this way, any filter should work as it works for any other Solr query (where you can decide if doing a prefilter of postfilter based on the cache and cost >100 of the filter). Also, include/exclude should work as usual.
{quote}
I think we're having a disconnect in concepts, so I'd like to clarify terminology...

Historically, before any notion of KNN, Solr has had 2 types of {{fq}} "filters":
 * Plain old "regular" filter queries – that can be evaluated completely independently of each other or the main query
 ** These are typically cached, so that they can be re-used in other requests
 ** If they aren't cached, some small optimizations are available
 * {{PostFilter}}, which allows deferring "expensive" match/scoring computation until we _know_ that the document matches all other parts of the query (including "regular" filter queries)
 ** Only a small handful of built-in {{QParser}} s can be used as a {{PostFilter}}
 ** Never cached, because they are entirely dependent on the situation

KNN queries really have their own filtering:
 * "pre-filter" the set of documents considered per segment when identifying the topK
 ** The {{KnnVectorQuery}} class is essentially a _wrapper_ around another "inner" query (like {{ConstantScoreQuery}}, {{BoostQuery}}, etc...)
 ** The KNN score calculations only consider documents that match the inner query

So conceptually, we've got
1. Solr's "regular" filters
2. Solr's {{PostFilter}}
3.
KNN's pre-filter

When the {{KnnQParser}} was added to Solr, the assumption/implementation was/is:
 * (A) when {{KnnQParser}} is used as the main query:
 ** *All* of Solr's "regular" filter queries should be used as the KNN "pre-filter"
 * (B) when {{KnnQParser}} is itself used as a "regular" Solr filter, or as a subquery:
 ** There should be *no* KNN pre-filter at all

Essentially: the design of the {{KnnQParser}} assumes a tight, "all or nothing" coupling between the KNN pre-filter and Solr's "regular" filters.

If we go back to what you described regarding your PR...
{quote}In this way, any filter should work as it works for any other Solr query (where you can decide if doing a prefilter of postfilter based on the cache and cost >100 of the filter). Also, include/exclude should work as usual.
{quote}
... that sounds like you're saying there is a bug in "(A)" you want to address: it's not just wrapping the "regular" filters, it's also wrapping any/all {{PostFilter}} s as well (which I hadn't noticed before, but thinking about how the code works, it makes sense that bug would exist), and you have an approach in your PR that you were planning to use to tackle that.

As I said, I have not considered the {{PostFilter}} bug case you are describing – what I'm focused on is giving users the ability to decouple that "all or nothing" assumption:
 * Give users the ability to control what "pre-filtering" is used when {{KnnQParser}} is the main query
 ** Really important for a lot of faceting use cases, when you want to be able to add "regular filters" for facet drill-down (to narrow the set of documents returned) that should not become part of the KNN "pre-filter"
 * Give users the ability to specify _some_ "pre-filtering" even when {{KnnQParser}} is *NOT* the main query
 ** Example: pre-filter on {{inStock:true}} to get the best {{topK}} possible, even when {{KnnQParser}} is a subquery...
{noformat}
q=(name:foo AND (category:hot-reviews OR {!knn f=vfield topK=100 v=$vec fq='inStock:true'}))
{noformat}
 ** Example: Multiple {{KnnQParser}} instances in the same request that want to pre-filter on different things...
{noformat}
q=({!knn f=vfield topK=100 v=$vec fq='category:legal'}^3 OR {!knn f=vfield topK=100 v=$vec fq='category:hr'}^7)
vec=...
{noformat}
Does that make sense?
----
{quote}Do we have other query parsers that have a local FQ?
{quote}
No, but we also don't have any other query parsers that _implicitly_ slurp up all other "regular" filters and use those to change their internal behavior.

My goal in adding {{fq}} (and {{excludeTag}}/{{includeTag}}) local params is to give users the ability to override that very special implicit behavior.

We _DO_ have lots of other query parsers that are designed to "wrap" other queries – and IMO {{KnnQParser}} probably should have been implemented that way from the beginning – similar to how the {{boost}} or {{join}} QParsers work. i.e.: no implicit slurping of {{fq}} params, the vector to score with is a local param, and the body of the parser is the query to wrap for the "knn pre-filter"...
{noformat}
q={!knn f=myfield topK=10 v='[1,2,3,4]'}inStock:true
fq=foo:"nothing special happens to this filter"
{noformat}
...but we can't go back in time now :)

So instead I'm trying to provide ways to move forward with additional use cases that aren't supported by the current design.

> Allow KnnQParser to selectively apply filters
> ---------------------------------------------
>
> Key: SOLR-16858
> URL: https://issues.apache.org/jira/browse/SOLR-16858
> Project: Solr
> Issue Type: Bug
> Reporter: Joel Bernstein
> Assignee: Chris M.
Hostetter
> Priority: Major
> Labels: hybrid-search
> Attachments: SOLR-16858-1.patch, SOLR-16858.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The KnnQParser is parsing the filter query which limits the rows considered
> by the vector query with the following method:
> {code:java}
> private Query getFilterQuery() throws SolrException, SyntaxError {
>   boolean isSubQuery = recurseCount != 0;
>   if (!isFilter() && !isSubQuery) {
>     String[] filterQueries = req.getParams().getParams(CommonParams.FQ);
>     if (filterQueries != null && filterQueries.length != 0) {
>       try {
>         List<Query> filters = QueryUtils.parseFilterQueries(req);
>         SolrIndexSearcher.ProcessedFilter processedFilter =
>             req.getSearcher().getProcessedFilter(filters);
>         return processedFilter.filter;
>       } catch (IOException e) {
>         throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
>       }
>     }
>   }
>   return null;
> }
> {code}
> This is pulling all filter queries from the main query parameters and using
> them to limit the vector query. This is the automatic behavior of the
> KnnQParser.
> There are cases where you may want to selectively apply different filters.
> One such case is SOLR-16857 which involves reRanking a collapsed query.
> Overriding the default filter behavior could be done by adding an "fq" local
> parameter to the KnnQParser which would override the default filtering
> behavior.
> {code:java}
> {!knn f=vector topK=10 fq=$kfq}[...]&kfq=myquery
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
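
For illustration only: the rules discussed in this issue (the implicit behavior in {{getFilterQuery()}}, plus an {{fq}} local param that overrides it) can be modelled in plain Java without any Solr classes. This is a rough sketch under stated assumptions, not the code in the patch or PR: the class name {{KnnPreFilterSketch}}, the method {{selectPreFilters}}, and the use of raw strings in place of parsed {{Query}} objects are all hypothetical, and {{includeTag}}/{{excludeTag}} handling is deliberately left out.

```java
import java.util.List;

/** Hypothetical model of KNN pre-filter selection; this is not Solr code. */
final class KnnPreFilterSketch {

  /**
   * Picks the filter strings that would become the KNN pre-filter.
   *
   * @param isFilter   true when the knn parser is itself used as an fq
   * @param isSubQuery true when the knn parser is nested inside another query
   * @param globalFqs  the request's top-level fq params
   * @param localFqs   fq local params on the knn parser, or null when none were given
   */
  static List<String> selectPreFilters(
      boolean isFilter, boolean isSubQuery, List<String> globalFqs, List<String> localFqs) {
    if (localFqs != null) {
      // Assumption: an explicit local fq replaces the implicit behavior entirely.
      return localFqs;
    }
    if (!isFilter && !isSubQuery) {
      // Case (A): as the main query, every "regular" filter becomes the pre-filter.
      return globalFqs;
    }
    // Case (B): as a filter or subquery, no pre-filter is applied.
    return List.of();
  }

  public static void main(String[] args) {
    List<String> global = List.of("category:legal", "inStock:true");
    // Main query, no local fq: all global filters are used.
    System.out.println(selectPreFilters(false, false, global, null));
    // Subquery, no local fq: nothing is used.
    System.out.println(selectPreFilters(false, true, global, null));
    // Subquery with a local fq: only the local filter is used.
    System.out.println(selectPreFilters(false, true, global, List.of("inStock:true")));
  }
}
```

The single early return for {{localFqs}} is what breaks the "all or nothing" coupling in this model: the main-query case can opt out of some global filters, and the subquery case can opt in to a pre-filter.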