Vector search problems SolR 9.6 (Lucene 9.10) vs. SolR 9.7 (Lucene 9.11)

2025-01-21 Thread Moll, Dr. Andreas
Hi,
I want to report a behavior change in vector search between Solr 9.6
(Lucene 9.10) and Solr 9.7 (Lucene 9.11).
We rely heavily on vector searches over embeddings, combined with filter
queries on the parent documents.
Our queries generally look like this:
select?q={!knn f=vector topK=2048}[...]
rows=100
fq={!child of='childtype:root'}...
start=0
sort=score desc,ID desc
With Solr 9.7 and later, roughly 10% of these queries fail with the
following error:
java.lang.IllegalArgumentException: Doc id 27227879 doesn't match the query
    at org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:478) ~[?:?]
    at org.apache.solr.search.SolrIndexSearcher.populateScoresIfNeeded(SolrIndexSearcher.java:1812) ~[?:?]
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:2001) ~[?:?]
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1775) ~[?:?]
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:772) ~[?:?]
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:767) ~[?:?]
After several days of debugging, I confirmed that the number of errors
correlates inversely with the topK value:

  *   k = 8 -> 44 errors
  *   k = 2048 -> 17 errors
  *   k = 16384 -> 1 error
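
A measurement like the one above could be scripted roughly as follows. This is only a sketch and not the script that produced these numbers: the function `count_errors_by_top_k` and the `run_query` callback are hypothetical, and `run_query` is assumed to raise an exception whenever Solr answers with the error.

```python
from collections import Counter


def count_errors_by_top_k(queries, top_k_values, run_query):
    """Run every query once per topK value and count how many raise an error.

    run_query(query, top_k) is a caller-supplied (hypothetical) function that
    sends the request to Solr and raises if the response is an error.
    """
    errors = Counter()
    for top_k in top_k_values:
        errors[top_k] = 0  # keep topK values without failures in the result
        for query in queries:
            try:
                run_query(query, top_k)
            except Exception:
                errors[top_k] += 1
    return dict(errors)
```

Running the same query set for each topK value keeps the counts comparable.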
I found a workaround for the issue by modifying the sort parameter to:
sort=score desc
With this change, our queries work like a charm again. We originally added
the ID desc tie-breaker to get more reproducible results, but it is not
strictly necessary for us.
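
The workaround can be sketched as a small helper that builds the request parameters. This is an illustration only: the helper name `build_knn_params`, the sample vector and the sample parent filter `category:example` are made up, the usual `{!knn ...}` and `{!child ...}` local-params syntax is assumed, and the elided parts of the real query are left to the caller.

```python
from urllib.parse import urlencode, parse_qs


def build_knn_params(vector_literal, parent_filter, top_k=2048, rows=100,
                     tie_break_on_id=False):
    """Build Solr select parameters for a kNN query with a child-of filter.

    vector_literal: the query vector in the form Solr expects, e.g. "[0.1, 0.2]"
    parent_filter:  the parent-level filter that follows the {!child ...} local params
    """
    # "score desc" alone is the workaround; "score desc,ID desc" is the
    # variant that produced the errors on Solr 9.7.
    sort = "score desc,ID desc" if tie_break_on_id else "score desc"
    return {
        "q": f"{{!knn f=vector topK={top_k}}}{vector_literal}",
        "fq": f"{{!child of='childtype:root'}}{parent_filter}",
        "rows": rows,
        "start": 0,
        "sort": sort,
    }


# Hypothetical inputs, only to show the shape of the request.
params = build_knn_params("[0.1, 0.2, 0.3]", "category:example")
query_string = urlencode(params)
```

Keeping the sort behind a flag makes it easy to switch the tie-breaker back on once a fixed version is in place.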
Could you clarify whether this change in Solr/Lucene was intended? If so, it
may be worth documenting that adding a secondary sort to vector queries can
cause errors.
Best regards,
Dr. Andreas Moll



Re: Vector search problems SolR 9.6 (Lucene 9.10) vs. SolR 9.7 (Lucene 9.11)

2025-01-21 Thread Houston Putman
We have seen this, and when we tested against Solr 9.8 (currently being
released), the error went away.
It turned out to be something odd in the search executor, but we couldn't
narrow down exactly why it happened.

Anyway, please test with the new Solr 9.8.0 release once it is available,
which should be within the next day, and let us know if that fixes the
problem for you.

- Houston



Re: Vector search problems SolR 9.6 (Lucene 9.10) vs. SolR 9.7 (Lucene 9.11)

2025-01-21 Thread Alessandro Benedetti
I'm not sure the KNN search supports multiple sort conditions.
I would need to dig into the Lucene code, but I don't have time for that in
the near future.
In any case, I imagine it would only support 're-ranking' the retrieved topK
by the additional sort condition; this cannot affect how the topK is
retrieved, only how it is sorted.
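
To illustrate the re-ranking point, here is a toy model in plain Python. It is not Lucene or Solr code, and the names `top_k_by_score` and `rerank` are made up for the example. The candidate set is chosen by score alone, and the secondary key only reorders ties within that set.

```python
import heapq


def top_k_by_score(docs, k):
    """Select the k highest-scoring (doc_id, score) pairs; score is the only criterion."""
    return heapq.nlargest(k, docs, key=lambda d: d[1])


def rerank(hits):
    """Order hits by score descending, then by doc id descending as a tie-breaker."""
    return sorted(hits, key=lambda d: (d[1], d[0]), reverse=True)


docs = [(1, 0.9), (2, 0.7), (3, 0.9), (4, 0.2), (5, 0.5)]
hits = top_k_by_score(docs, 3)   # retrieval: depends on score only
ordered = rerank(hits)           # presentation: same documents, ties broken by id
```

In this model the secondary sort can change the order of the returned documents but never which documents are returned.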

What would be really valuable is a way to reproduce the issue, ideally as a
Solr test (org.apache.solr.search.neural.KnnQParserTest).
Cheers
--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io




