[ 
https://issues.apache.org/jira/browse/SOLR-17974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18044384#comment-18044384
 ] 

Alessandro Benedetti commented on SOLR-17974:
---------------------------------------------

Thanks [~hossman] for raising this; I was planning to start working on 
multi-valued vectors in Solr these days.

I believe we should find a representation that is common to every field 
type; the field types will then differ in what the underlying Lucene 
implementation looks like.

At the moment I see two scenarios:


1) Late interaction fields (like the ColBERT family of models) -> 
multi-valued storage and multi-valued query 

2) Multi-valued vectors in HNSW -> multi-valued storage and single-valued 
query

The way we model a multi-valued vector in the SolrInputDocument should be 
the same in both scenarios; then internally, depending on the field type, 
Solr will:

1) Late interaction fields -> surface Lucene's late interaction 
implementation
2) Nested vectors -> build nested documents automatically (as this is the route 
we've decided on in Lucene after many discussions)

Let's keep discussing. I'll also open a draft pull request soon with some 
ideas, at least for the HNSW use case (pretty much an alternative syntax for 
indexing nested vectors).

> Tech-Debt repayment: SolrDocument/SolrInputDocument will merge multiple 
> "list" values
> -------------------------------------------------------------------------------------
>
>                 Key: SOLR-17974
>                 URL: https://issues.apache.org/jira/browse/SOLR-17974
>             Project: Solr
>          Issue Type: Task
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-17974.tests.patch
>
>
> A long standing bit of "convenience" logic in SolrDocument (that was later 
> copied/inherited in SolrInputDocument) is that if you "add" a 
> {{java.util.Collection}} of "values" for a field name it will either use that 
> {{java.util.Collection}} as is; or -- if the document already has some values 
> in it for that field name -- it unwraps the (new) {{java.util.Collection}} 
> and adds each of the items in it to whatever existing 
> {{java.util.Collection}} of values it already has for that field name.
> Once upon a time this kind of made life easier for folks - you could call one 
> method on either a single value, or a list of values and Solr would "do what 
> you mean".
> But as we get into a world where "multi-valued vector fields" start being a 
> thing we have to consider, we need to rethink our APIs to ensure that (at a 
> conceptual level, if not in terms of specific {{java.util.Collections}} class 
> names) it's possible to have a "list of floats" as a single "field value" in 
> a "multi-valued" field -- w/o a user getting confused why adding additional 
> "field values" breaks their existing data.

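To make the merging behaviour described in the issue concrete, here is a 
minimal sketch in plain Java. MergingDocSketch is a hypothetical stand-in 
written only from the description above; it is not the real 
SolrInputDocument code and leaves out everything else that class does. It 
shows why two vectors added to the same field end up as one flat list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the "convenience" logic described in the issue.
// This is NOT the real SolrInputDocument; names and structure are assumptions.
public class MergingDocSketch {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    public void addField(String name, Object value) {
        Object existing = fields.get(name);
        if (existing == null) {
            // First add: a Collection becomes the field's list of values
            // (copied here so later adds do not mutate the caller's list).
            fields.put(name, value instanceof Collection
                    ? new ArrayList<Object>((Collection<?>) value)
                    : value);
            return;
        }

        // Later adds: make sure the existing value is a collection...
        Collection<Object> values;
        if (existing instanceof Collection) {
            @SuppressWarnings("unchecked")
            Collection<Object> cast = (Collection<Object>) existing;
            values = cast;
        } else {
            values = new ArrayList<>();
            values.add(existing);
            fields.put(name, values);
        }

        // ...then unwrap a new Collection and append its items one by one.
        if (value instanceof Collection) {
            values.addAll((Collection<?>) value);
        } else {
            values.add(value);
        }
    }

    public Object getFieldValue(String name) {
        return fields.get(name);
    }

    public static void main(String[] args) {
        MergingDocSketch doc = new MergingDocSketch();
        doc.addField("vector", List.of(0.1f, 0.2f)); // intended: first vector
        doc.addField("vector", List.of(0.3f, 0.4f)); // intended: second vector
        // Prints [0.1, 0.2, 0.3, 0.4]: four floats, not two vectors.
        System.out.println(doc.getFieldValue("vector"));
    }
}
```

In this sketch the boundary between the two vectors is lost after the 
second add, which is the confusion the issue wants the API to rule out for 
multi-valued vector fields.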


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

