[
https://issues.apache.org/jira/browse/SOLR-17928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032183#comment-18032183
]
Chris M. Hostetter commented on SOLR-17928:
-------------------------------------------
misc comments skimming the patch:
* Jira summary/description (and the CHANGES entry in the patch) revert to
adding an {{"efSearch"}} parameter – which is what the code variable gets
named, but the actual parameter users will see is named {{"ef-search"}} in the
code ... i don't know of any other local parameters in solr that use a dash
character? using the cameal case {{"efSearch"}} would certainly be more
consistent with other params
* I had to re-read the {{SolrKnnXXXVectorQuery}} classes 4 times before I
could wrap my head around why it was maintaining a local copy of {{private
final int topK}} since the super class already has a {{protected final int k}}
... can we please add a comment drawing attention to the fact that we pass
{{efSearch}} to the super constructor to use as the "k" for the graph walk
* Neither of the {{SolrKnnXXXVectorQuery}} classes override hashCode or
equals, so similar queries with distinct {{topK}} values are going to cause
false positive cache hits.
* IIUC defaulting {{efSearch}} to {{topK * 2}} is a backcompat change – won't
that make all knn queries two twice as much work under the covers (ideally
providing better accuracy) on upgrade? ... not sure how i feel about that. I
feel like a lot of people may already setting {{topK}} based on how
aggressively the want the graph walk to be, even if they only intend to look at
a smaller number of results.
* I don't see any validation that {{efSearch}} is bigger then {{topK}} ...
should there be?
> Add efSearch parameter to knn query
> -----------------------------------
>
> Key: SOLR-17928
> URL: https://issues.apache.org/jira/browse/SOLR-17928
> Project: Solr
> Issue Type: Improvement
> Components: vector-search
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Right now, only topK can be requested. efSearch is a standard overfetch
> parameter.
> Proposing that we add it for better recall accuracy.
> (FYI, Elasticsearch calls it num_candidates. Commonly referred to as
> efSearch, similar to efConstruction that we call beamWidth)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]