Performance issue: distributed grouping + dense vector search

2025-03-25 Thread Yue Yu
Hi All,

When running vector search with grouping in a multi-shard setting, the
KnnFloatVectorQuery is executed (2+rows) times.

The culprit is this function call in
https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/grouping/distributed/command/TopGroupsFieldCommand.java#L202

if (needScores) {
  for (GroupDocs group : topGroups.groups) {
    TopFieldCollector.populateScores(group.scoreDocs, searcher, query);
  }
}

Where
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java#L403
does

final Weight weight =
    searcher.createWeight(searcher.rewrite(query), ScoreMode.COMPLETE, 1);

Here, if the query is a KnnFloatVectorQuery, *searcher.rewrite(query)* will
execute the same vector search again for each group in topGroups.groups.

A simple fix could be to move *searcher.rewrite(query)* out of the
topGroups.groups loop. Thoughts?
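A minimal sketch of the proposed change, under one reading of it: rewrite the query once before the loop and pass the already-rewritten query to the per-group scoring call. It does not use Lucene. `Query`, `KnnLikeQuery`, `TopDocsLikeQuery` and `populateScores` below are hypothetical stand-ins that model only the behaviour that matters here: rewriting a KNN-style query runs the vector search, while rewriting its result again is a no-op (the sketch assumes the real rewritten form behaves the same way).

```java
import java.util.List;

public class HoistRewriteSketch {
    /** Hypothetical minimal query abstraction; NOT Lucene's Query class. */
    interface Query {
        Query rewrite();
    }

    /** Counts how many times the (simulated) vector search runs. */
    static final class SearchCounter {
        int searches = 0;
    }

    /** Hypothetical stand-in for a KNN query: rewriting it performs the search. */
    static final class KnnLikeQuery implements Query {
        private final SearchCounter counter;

        KnnLikeQuery(SearchCounter counter) {
            this.counter = counter;
        }

        @Override
        public Query rewrite() {
            counter.searches++;            // the expensive vector search
            return new TopDocsLikeQuery(); // result: a fixed set of docs and scores
        }
    }

    /** Hypothetical stand-in for the rewritten form: rewriting it again is a no-op. */
    static final class TopDocsLikeQuery implements Query {
        @Override
        public Query rewrite() {
            return this;
        }
    }

    /** Mirrors the shape of populateScores: it rewrites whatever query it receives. */
    static void populateScores(int[] groupDocs, Query query) {
        query.rewrite(); // like searcher.rewrite(query); the actual scoring is omitted
    }

    /** Current pattern: the original query is rewritten once per group. */
    static int perGroupRewrite(List<int[]> groups) {
        SearchCounter counter = new SearchCounter();
        Query query = new KnnLikeQuery(counter);
        for (int[] group : groups) {
            populateScores(group, query);
        }
        return counter.searches;
    }

    /** Proposed pattern: rewrite once, then reuse the rewritten query in the loop. */
    static int hoistedRewrite(List<int[]> groups) {
        SearchCounter counter = new SearchCounter();
        Query rewritten = new KnnLikeQuery(counter).rewrite();
        for (int[] group : groups) {
            populateScores(group, rewritten);
        }
        return counter.searches;
    }

    public static void main(String[] args) {
        List<int[]> groups = List.of(new int[] {1, 2}, new int[] {3}, new int[] {4, 5, 6});
        System.out.println(perGroupRewrite(groups)); // 3
        System.out.println(hoistedRewrite(groups));  // 1
    }
}
```

With three groups, the per-group pattern runs the search three times and the hoisted pattern runs it once; in the real request the total would also include the executions that happen outside this loop.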

Best,

Yue


Haystack US 2025 is coming! I want you there!

2025-03-25 Thread David Eric Pugh
My fellow Solr community members, I'd like to invite you to MY conference in MY
home town of Charlottesville, Virginia: Haystack ;-). Haystack is focused on
the value side of search. It's where we talk about how to meet the business
goals that our companies set for us! It's where we talk about ways to delight
our users, and how we even know that we are doing that in the first place.
It's a small, practitioner-focused conference, limited to 150 people. Oh, and
each of you is invited to my house for a BBQ on Monday night ;-).

We have a pretty amazing set of speakers lined up, 
https://haystackconf.com/2025/, including our very own Trey Grainger:
⭐ Don’t miss AI-Powered Search Authors Trey Grainger and Doug Turnbull at 
Haystack US 2025! ⭐ 
We’re excited to share that Trey Grainger and Doug Turnbull will be taking YOUR 
questions live at Haystack US 2025! On Thursday, April 24, you'll have the rare 
opportunity to tap into decades of search expertise from two pioneers who have 
helped leading organizations solve their most complex search challenges. 

Curious about:
➡️ Dense vs. sparse vector search? 
➡️ Semantic knowledge graphs? 
➡️ RAG best practices? 
➡️ Query intent interpretation? 
➡️ Or any cutting-edge search and AI topic? 

This is your chance to ask anything – or see if you can "stump the chump" with 
your most perplexing search questions. Early bird pricing ends this Saturday, 
March 22 so register today! https://www.eventbee.com/t/292625631/solr

There are also workshops, a meetup, co-working time, and for those lonely 
Search Product Managers out there, even a barcamp just for them!

Got some questions?  Email me.
Eric


[Operator] [ANNOUNCE] Apache Solr Operator v0.9.1 released

2025-03-25 Thread Jason Gerlowski
The Apache Solr PMC is pleased to announce the release of the Apache
Solr Operator v0.9.1.

The Apache Solr Operator is a safe and easy way of managing a Solr
ecosystem in Kubernetes.

This release contains numerous bug fixes and optimizations, some of
which are highlighted below. The release is available for immediate
download at:

  

### Solr Operator v0.9.1 Release Highlights:

* initContainers can once again connect to ZooKeeper when Solr 9.8.x
images are used
* podReadinessGates on existing solrcloud resources are now properly
updated following an operator upgrade
* 'setup-zk' initContainer no longer emits "No such file or directory" warning

A summary of important changes is published in the documentation at:

  

For the most exhaustive list, see the change log on ArtifactHub or
view the git history in the solr-operator repo.


Re: how is mlt.fl list evaluated?

2025-03-25 Thread Mikhail Khludnev
Hi Dima
Do you use the MLT component or the MLT handler?

On Tue, Mar 25, 2025 at 12:37 AM Dmitri Maziuk 
wrote:

> Hi all,
>
> is there a way to run "more like this" with x = foo AND y = bar?
>
> What we do now is run a 2nd query looking for "x:foo AND y:bar" after
> extracting foo and bar from the initial result. That is programmed on
> the front-end and we were asked if that has to be that way. So I took a
> closer look at MLT.
>
> In my experiments, "mlt.fl=full_date" returns 234 hits whereas
> "mlt.fl=full_date,county" returns 20323. The latter returns non-matching
> counties so it looks like "full_date OR county".
>
> Is MLT field list only "OR", or am I missing something?
>
> TIA,
> Dima
>


-- 
Sincerely yours
Mikhail Khludnev


Re: how is mlt.fl list evaluated?

2025-03-25 Thread Mikhail Khludnev
Nevertheless, MLT has plenty of nuances, but the field logic seems to be
disjunctive (OR) regardless of which fields are listed:
https://github.com/apache/lucene/blob/70abd1f41e223f1a175f717059d63825bf236249/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L609
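To illustrate why adding `county` to `mlt.fl` raised the hit count rather than narrowing it, here is a toy model of OR versus AND matching across fields. It is consistent with Lucene's MoreLikeThis combining its selected terms as SHOULD (optional) clauses, but it is a deliberate simplification: real MLT picks "interesting" terms by tf-idf, applies thresholds, and scores matches, whereas this sketch treats each field as a single exact value and only counts documents. The documents, values, and method names are invented for illustration.

```java
import java.util.List;
import java.util.Map;

public class MltDisjunctionSketch {
    // Invented toy index: each document has exactly one value per field.
    static final List<Map<String, String>> DOCS = List.of(
            Map.of("full_date", "2024-05-01", "county", "Dane"),
            Map.of("full_date", "2024-05-01", "county", "Rock"),
            Map.of("full_date", "2023-01-15", "county", "Dane"),
            Map.of("full_date", "2022-07-04", "county", "Iowa"));

    /** OR semantics: a document matches if ANY listed field equals the source's value. */
    static long countAny(Map<String, String> source, List<String> fields) {
        return DOCS.stream()
                .filter(d -> d != source) // skip the source document itself (identity check)
                .filter(d -> fields.stream().anyMatch(f -> source.get(f).equals(d.get(f))))
                .count();
    }

    /** AND semantics: a document matches only if EVERY listed field equals the source's value. */
    static long countAll(Map<String, String> source, List<String> fields) {
        return DOCS.stream()
                .filter(d -> d != source)
                .filter(d -> fields.stream().allMatch(f -> source.get(f).equals(d.get(f))))
                .count();
    }

    public static void main(String[] args) {
        Map<String, String> source = DOCS.get(0);
        System.out.println(countAny(source, List.of("full_date")));           // 1
        System.out.println(countAny(source, List.of("full_date", "county"))); // 2
        System.out.println(countAll(source, List.of("full_date", "county"))); // 0
    }
}
```

Under OR semantics, adding a second field can only keep or grow the match set (1 becomes 2 here), which matches the direction of the jump from 234 to 20323 hits; AND semantics (0 here) is what the front-end "x:foo AND y:bar" query provides.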
