[
https://issues.apache.org/jira/browse/SOLR-17974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18044384#comment-18044384
]
Alessandro Benedetti commented on SOLR-17974:
---------------------------------------------
Thanks [~hossman] for raising this. I was planning to start working on
multi-valued vectors in Solr these days.
I believe we should find a representation that is common to every field
type; the field type then determines what the underlying Lucene
implementation looks like.
At the moment I see two scenarios:
1) Late interaction fields (like the ColBERT family of models) -> multi-valued
storage and multi-valued query
2) Multi-valued vectors in HNSW -> multi-valued storage and single-valued query
The way we model a multi-valued vector in the SolrInputDocument should be the
same in both cases; then internally, depending on the field type, Solr will:
1) Late interaction fields -> surface Lucene's late interaction support
2) Nested vectors -> build nested documents automatically (as this is the route
we've decided on in Lucene after many discussions)
Let's keep discussing. I'll also open a draft pull request soon with some
ideas, at least for the HNSW use case (pretty much an alternative syntax for
indexing nested vectors).
> Tech-Debt repayment: SolrDocument/SolrInputDocument will merge multiple
> "list" values
> -------------------------------------------------------------------------------------
>
> Key: SOLR-17974
> URL: https://issues.apache.org/jira/browse/SOLR-17974
> Project: Solr
> Issue Type: Task
> Reporter: Chris M. Hostetter
> Priority: Major
> Attachments: SOLR-17974.tests.patch
>
>
> A long standing bit of "convenience" logic in SolrDocument (that was later
> copied/inherited in SolrInputDocument) is that if you "add" a
> {{java.util.Collection}} of "values" for a field name it will either use that
> {{java.util.Collection}} as is; or -- if the document already has some values
> in it for that field name -- it unwraps the (new) {{java.util.Collection}}
> and adds each of the items in it to whatever existing
> {{java.util.Collection}} of values it already has for that field name.
> Once upon a time this kind of made life easier for folks - you could call one
> method on either a single value, or a list of values and Solr would "do what
> you mean".
> But as we get into a world where "multi-valued vector fields" start being a
> thing we have to consider, we need to rethink our APIs to ensure that (at a
> conceptual level, if not in terms of specific {{java.util.Collections}} class
> names) it's possible to have a "list of floats" as a single "field value" in
> a "multi-valued" field -- w/o a user getting confused why adding additional
> "field values" breaks their existing data.
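
To make the merge behavior described above concrete, here is a minimal sketch
of it in plain Java. MergingDoc is a hypothetical stand-in written for this
illustration, not the real SolrDocument/SolrInputDocument code, and it assumes
only what the description states: a Collection added to a field is unwrapped
into that field's existing values.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the "convenience" merge logic; NOT the Solr class.
final class MergingDoc {
    private final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // Adds a value to a field. A Collection is unwrapped item by item,
    // so it can never be kept as one single "field value".
    void addField(String name, Object value) {
        List<Object> values = fields.computeIfAbsent(name, k -> new ArrayList<>());
        if (value instanceof Collection) {
            values.addAll((Collection<?>) value);
        } else {
            values.add(value);
        }
    }

    List<Object> getFieldValues(String name) {
        return fields.get(name);
    }

    public static void main(String[] args) {
        MergingDoc doc = new MergingDoc();
        // The caller intends two separate 2-dimensional vectors...
        doc.addField("vec", List.of(1.0f, 2.0f));
        doc.addField("vec", List.of(3.0f, 4.0f));
        // ...but the field ends up holding four floats and no vector boundary.
        System.out.println(doc.getFieldValues("vec")); // prints [1.0, 2.0, 3.0, 4.0]
    }
}
```

After the second call the boundary between the two vectors is gone, which is
the confusion the description warns about for multi-valued vector fields.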