gabrielmagno commented on PR #1245:
URL: https://github.com/apache/solr/pull/1245#issuecomment-1370930774

   Just to bring a "data-science KNN user" view on this.
   
   I think there are actually _two_ related but different ways one could use the 
Dense Vectors and KNN features. The first one is the most obvious and 
straightforward: you use it as KNN per se, i.e. you retrieve the top K documents 
most similar to the target. 
   
   The second way to use the KNN is to use the _similarity score_ that is 
calculated from the vectors. In this case we use the vectors and their 
similarity for _ranking_ rather than for _retrieval_. In some use cases we 
could even combine multiple similarity scores into an aggregated score, or 
combine the similarity score with the actual "lexical score" (BM25). For this 
second use case we often use a very high K to guarantee that we calculate the 
similarity score for _all_ the relevant documents.
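
   To illustrate the ranking use case, here is a minimal Python sketch, not Solr 
code. It assumes cosine similarity, a lexical score already normalized to 
[0, 1], and a simple weighted sum; the function names and the `vector_weight` 
parameter are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_combined_score(query_vec, docs, vector_weight=0.5):
    """Rank docs by a weighted sum of vector similarity and lexical score.

    docs is a list of (doc_id, vector, lexical_score) tuples; lexical_score
    is assumed to be pre-normalized to [0, 1] (raw BM25 is unbounded).
    """
    scored = []
    for doc_id, vec, lexical in docs:
        sim = cosine_similarity(query_vec, vec)
        combined = vector_weight * sim + (1 - vector_weight) * lexical
        scored.append((doc_id, combined))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up documents: (id, vector, normalized lexical score).
docs = [
    ("a", [1.0, 0.0], 0.4),
    ("b", [0.0, 1.0], 0.9),
    ("c", [1.0, 1.0], 0.5),
]
ranking = rank_by_combined_score([1.0, 0.0], docs)
print([doc_id for doc_id, _ in ranking])
```

Every document gets a score here, which is why a very high K is needed when 
the same idea is expressed through a KNN query.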
   
   So, for this second use case of using the KNN to calculate the similarity 
score, the ability to filter documents based on the value of the similarity is 
very useful. And as previously mentioned, sometimes this is not a single 
similarity score but a combination of multiple scores, and we could apply the 
Post Filter on top of the aggregated score.
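
   A rough sketch of what filtering on an aggregated score could look like, in 
plain Python rather than Solr's actual Post Filter API. The weighted-average 
aggregation, the `min_score` threshold, and all names are assumptions made for 
illustration.

```python
def aggregate(scores, weights):
    """Weighted average of several similarity scores for one document."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def post_filter(candidates, weights, min_score):
    """Keep only the documents whose aggregated score reaches min_score.

    candidates is a list of (doc_id, [score_1, score_2, ...]) tuples.
    """
    kept = []
    for doc_id, scores in candidates:
        total = aggregate(scores, weights)
        if total >= min_score:
            kept.append((doc_id, total))
    return kept

# Made-up candidates, each with two similarity scores.
candidates = [
    ("a", [1.0, 0.75]),
    ("b", [0.25, 0.25]),
    ("c", [0.75, 0.5]),
]
kept = post_filter(candidates, weights=[1.0, 1.0], min_score=0.5)
print([doc_id for doc_id, _ in kept])
```

The threshold is applied after aggregation, so a document with one weak 
score can still pass if its other scores compensate.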


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@solr.apache.org
