Hi,
I would like to report a behavior change between Solr 9.6 (Lucene 9.10) and
Solr 9.7 (Lucene 9.11) that affects vector searches.
We rely heavily on vector searches over embeddings, combined with filter
queries on the parent documents.
Our queries generally look like this:
select?q={!knn f=vector topK=2048}[...]
rows=100
fq={!child of='childtype:root'}...
start=0
sort=score desc,ID desc
With Solr 9.7 and later, roughly 10% of these queries fail with the
following error:
java.lang.IllegalArgumentException: Doc id 27227879 doesn't match the query
        at org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:478) ~[?:?]
        at org.apache.solr.search.SolrIndexSearcher.populateScoresIfNeeded(SolrIndexSearcher.java:1812) ~[?:?]
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:2001) ~[?:?]
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1775) ~[?:?]
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:772) ~[?:?]
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:767) ~[?:?]
After several days of debugging, I confirmed that the number of errors
correlates inversely with the topK value:

  *   k = 8 -> 44 errors
  *   k = 2048 -> 17 errors
  *   k = 16384 -> 1 error
I found a workaround for the issue by modifying the sort parameter to:
sort=score desc
With this change, our queries work like a charm again. We originally added
the ID desc tie-breaker to get more reproducible results, but it is not
strictly necessary for us.
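For anyone who wants to try the workaround, here is a minimal Python sketch of how the request parameters could be assembled. The function name build_knn_params, its arguments, and the example vector and parent filter are hypothetical, since the real values are elided above. It only builds the query string and does not contact Solr.

```python
from urllib.parse import urlencode


def build_knn_params(vector, parent_filter, top_k=2048, rows=100,
                     tie_break_on_id=False):
    """Assemble the select parameters described in this mail.

    vector and parent_filter are placeholders for the parts that are
    elided in the mail; pass them in as plain strings.
    """
    # Workaround: sort on score only. The secondary "ID desc" sort is the
    # variant that triggered the exception for us on Solr 9.7 and later.
    sort = "score desc,ID desc" if tie_break_on_id else "score desc"
    return {
        "q": f"{{!knn f=vector topK={top_k}}}{vector}",
        "fq": f"{{!child of='childtype:root'}}{parent_filter}",
        "rows": rows,
        "start": 0,
        "sort": sort,
    }


# Hypothetical example values, for illustration only.
params = build_knn_params("[0.1, 0.2]", "category:books")
query_string = urlencode(params)
```

Passing tie_break_on_id=True restores the original sort, which may help when trying to reproduce the error against a test index.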
Could you clarify whether this change in Solr/Lucene was intended? If so, it
may be worth documenting that adding a secondary sort to vector queries can
cause errors.
Best regards,
Dr. Andreas Moll
