[
https://issues.apache.org/jira/browse/SOLR-17974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18031903#comment-18031903
]
Chris M. Hostetter edited comment on SOLR-17974 at 10/21/25 9:32 PM:
---------------------------------------------------------------------
A few things to note...
1) Lucene's existing HNSW-graph-based field only supports "single-valued vector
fields" -- so when Solr's {{DenseVectorField}} type was added, it modeled itself
as a "multi-valued numeric" field (either "float" or "byte" based), which is why
this hasn't been a problem ... yet. (There is work in progress considering what
multi-valued HNSW fields might look like at a Lucene level.)
2) Lucene 10.3 added a
[LateInteractionField|https://lucene.apache.org/core/10_3_1/core/org/apache/lucene/document/LateInteractionField.html]
which *does* support "multi-valued vectors" -- so the quirks of our
SolrDocument/SolrInputDocument API are not a "future" problem -- they are a
"currently preventing us from easily adding support for this cool feature"
problem.
3) I'm attaching a small test-only patch that just tries to demonstrate (some
of) the existing quirks for folks who may not be familiar with what I'm
describing -- which means it currently passes, but that doesn't mean the
behavior is useful.
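To make points 1) and 2) concrete, here is a minimal, self-contained sketch in plain Java (no Solr dependency). {{ConvenienceDoc}} is a hypothetical stand-in that mimics the "convenience" merge behavior described in this issue; it is NOT the actual SolrDocument/SolrInputDocument code. A single {{DenseVectorField}}-style vector added once survives intact, but adding a second vector to the same field (as a multi-vector, late-interaction style value would need) silently flattens both into one list of floats.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeQuirkDemo {

  // HYPOTHETICAL stand-in for the "convenience" addField logic described in
  // SOLR-17974. This is NOT the real SolrDocument/SolrInputDocument source.
  static final class ConvenienceDoc {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    void addField(String name, Object value) {
      Object existing = fields.get(name);
      if (existing == null) {
        // First add: the value (even a Collection) is used as-is.
        fields.put(name, value);
        return;
      }
      // Field already has values: gather them into one list of values.
      // (This sketch copies into a new list instead of mutating the first
      // Collection in place; the resulting contents are the same.)
      List<Object> merged = new ArrayList<>();
      if (existing instanceof Collection) {
        merged.addAll((Collection<?>) existing);
      } else {
        merged.add(existing);
      }
      if (value instanceof Collection) {
        // The quirk: a newly added Collection is unwrapped, item by item.
        merged.addAll((Collection<?>) value);
      } else {
        merged.add(value);
      }
      fields.put(name, merged);
    }

    Object getFieldValue(String name) {
      return fields.get(name);
    }
  }

  public static void main(String[] args) {
    // DenseVectorField-style: one vector, added once, stays intact.
    ConvenienceDoc single = new ConvenienceDoc();
    single.addField("vec", List.of(0.1f, 0.2f, 0.3f));
    System.out.println(single.getFieldValue("vec")); // [0.1, 0.2, 0.3]

    // Multi-vector style: two 3-dim vectors meant as two values of one field.
    ConvenienceDoc multi = new ConvenienceDoc();
    multi.addField("tokens", List.of(0.1f, 0.2f, 0.3f));
    multi.addField("tokens", List.of(0.4f, 0.5f, 0.6f));
    // Both vectors collapse into one flat list of 6 floats.
    System.out.println(multi.getFieldValue("tokens")); // [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
  }
}
```

The "tokens" field ends up with 6 floats and no record of where one vector ended and the next began, which is exactly the boundary information a multi-valued vector field needs.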
> Tech-Debt repayment: SolrDocument/SolrInputDocument will merge multiple
> "list" values
> -------------------------------------------------------------------------------------
>
> Key: SOLR-17974
> URL: https://issues.apache.org/jira/browse/SOLR-17974
> Project: Solr
> Issue Type: Task
> Reporter: Chris M. Hostetter
> Priority: Major
> Attachments: SOLR-17974.tests.patch
>
>
> A long-standing bit of "convenience" logic in SolrDocument (that was later
> copied/inherited in SolrInputDocument) is that if you "add" a
> {{java.util.Collection}} of "values" for a field name it will either use that
> {{java.util.Collection}} as is; or -- if the document already has some values
> in it for that field name -- it unwraps the (new) {{java.util.Collection}}
> and adds each of the items in it to whatever existing
> {{java.util.Collection}} of values it already has for that field name.
> Once upon a time this kind of made life easier for folks -- you could call one
> method on either a single value or a list of values, and Solr would "do what
> you mean".
> But as we get into a world where "multi-valued vector fields" start being a
> thing we have to consider, we need to rethink our APIs to ensure that (at a
> conceptual level, if not in terms of specific {{java.util.Collections}} class
> names) it's possible to have a "list of floats" as a single "field value" in
> a "multi-valued" field -- w/o a user getting confused about why adding additional
> "field values" breaks their existing data.
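One possible shape for an API that keeps a "list of floats" as a single field value is to make unwrapping explicit. The sketch below is purely hypothetical: {{ExplicitValuesDoc}}, {{addValue}} and {{addAllValues}} are made-up names, not existing Solr methods and not something proposed in this issue. {{addValue}} always stores its argument as exactly one value, even when it is a Collection, and callers who want "add each of these" must say so with {{addAllValues}}.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL document class: every field maps to a list of values, and a
// single value may itself be a List<Float> (e.g. one vector).
public class ExplicitValuesDoc {
  private final Map<String, List<Object>> fields = new LinkedHashMap<>();

  /** Adds exactly one value, even if that value is a Collection (e.g. one vector). */
  public void addValue(String name, Object value) {
    fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
  }

  /** Adds each element as a separate value; the caller opts in to unwrapping. */
  public void addAllValues(String name, Collection<?> values) {
    fields.computeIfAbsent(name, k -> new ArrayList<>()).addAll(values);
  }

  public List<Object> getValues(String name) {
    return fields.getOrDefault(name, List.of());
  }

  public static void main(String[] args) {
    ExplicitValuesDoc doc = new ExplicitValuesDoc();
    // Two vectors for one multi-valued field: each List<Float> stays one value.
    doc.addValue("tokens", List.of(0.1f, 0.2f, 0.3f));
    doc.addValue("tokens", List.of(0.4f, 0.5f, 0.6f));
    System.out.println(doc.getValues("tokens").size()); // 2
    System.out.println(doc.getValues("tokens")); // [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

    // Plain multi-valued strings still work, but unwrapping is now explicit.
    doc.addAllValues("tags", List.of("a", "b"));
    System.out.println(doc.getValues("tags")); // [a, b]
  }
}
```

The trade-off is that callers lose the old "do what you mean" convenience: passing a List to {{addValue}} now means "one list-valued value", so code that relied on implicit unwrapping would have to switch to the explicit method.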