[
https://issues.apache.org/jira/browse/SOLR-17974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18043169#comment-18043169
]
Chris M. Hostetter commented on SOLR-17974:
-------------------------------------------
{quote}... could we "solve" this problem by adding new vector-specific methods
to SolrDocument/SolrInputDocument that explicitly don't have the
List-flattening semantics? Say: "addVectorField" and "setVectorField"?
{quote}
Maybe? ... but if we go that route, we have to be careful to define what we
mean by "Vector" – do we mean a {{List<Number>}}? (So that
{{DenseVectorField}} is treated as a "singlevalued Vector" field, but
{{LateInteractionField}} is a "multivalued Vector" field?) Or is a
{{float[][]}} also a "singlevalued multi-Vector"?
That would also only really help for the foreseeable "vector" situations – it
wouldn't (for example) have helped in the situation I gave up on a few years
back trying to have a multi-valued field of "Pairs", nor would it help if
someone wants to encode something like a {{Map<Enum,Int>}} down the road etc...
And I'm leery of what other types of "multi-multi-multi vector" situations may
exist down the road – late interaction vector comparisons are computing
similarities of "lists of vectors" that (IIUC) can represent small contextual
chunks of original content. It seems plausible that someone will at some point
come up with an idea that involves "lists of lists of vectors" (or something
else that looks even less like our "List<X>" model).
My gut tells me that the way Solr APIs intrinsically assume "list of X" is
equivalent to "multivalued X" is just a flawed world view to try and maintain
moving forward. (It's probably been flawed for a long time.)
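To make the "list of X" == "multivalued X" assumption concrete, here is a
minimal sketch of the flattening behavior as the issue description words it.
{{FlatteningDoc}} and its methods are hypothetical stand-ins written for
illustration, not the real SolrDocument/SolrInputDocument code:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mimics the merging behavior described in
// SOLR-17974. It is NOT the actual SolrDocument implementation.
class FlatteningDoc {
  private final Map<String, Object> fields = new LinkedHashMap<>();

  @SuppressWarnings("unchecked")
  void addField(String name, Object value) {
    Object existing = fields.get(name);
    if (existing == null) {
      // First add: a Collection becomes the field's value list
      // (copied here only so that it stays mutable).
      fields.put(name, value instanceof Collection
          ? new ArrayList<Object>((Collection<?>) value)
          : value);
      return;
    }
    List<Object> values;
    if (existing instanceof List) {
      values = (List<Object>) existing;
    } else {
      // Promote a single existing value to a list of values.
      values = new ArrayList<>();
      values.add(existing);
      fields.put(name, values);
    }
    if (value instanceof Collection) {
      // The new Collection is unwrapped and its items are merged in.
      values.addAll((Collection<?>) value);
    } else {
      values.add(value);
    }
  }

  Object getFieldValue(String name) {
    return fields.get(name);
  }

  // Adds two 2-dimensional "vectors" to the same field name.
  static Object demo() {
    FlatteningDoc doc = new FlatteningDoc();
    doc.addField("vec", List.of(1.0f, 2.0f));
    doc.addField("vec", List.of(3.0f, 4.0f));
    return doc.getFieldValue("vec");
  }
}
```

Under these assumptions {{demo()}} returns one list of four floats rather
than two vectors of two floats each, which is the kind of surprise the issue
is worried about.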
I think ideally SolrDocument would just maintain a mapping of "(String)fieldName
-> (Object)value" and if that value happens to be a List<X> – cool, let the
FieldType worry about whether that means it's a multivalued X field, or a Vector
of X field, etc... and then if someone wants to write a FieldType where the
value is a Map that's cool too – SolrJ doesn't care as long as it's something
your (configurable) encoder can serialize.
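A rough sketch of that idea, purely as an illustration: every name below
({{OpaqueDoc}}, {{ValueInterpreter}}, and the two interpreters) is
hypothetical and does not exist in Solr or SolrJ. The document stores the
value untouched, and something playing the FieldType's role decides what a
List means:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document: a plain fieldName -> value mapping, no flattening.
class OpaqueDoc {
  private final Map<String, Object> fields = new LinkedHashMap<>();

  void setField(String name, Object value) {
    fields.put(name, value); // stored as is, whatever its type
  }

  Object getField(String name) {
    return fields.get(name);
  }
}

// Hypothetical stand-in for the FieldType's job of interpreting a raw value.
interface ValueInterpreter {
  int countValues(Object raw);
}

// Treats a List as "many values of X".
class MultiValuedInterpreter implements ValueInterpreter {
  public int countValues(Object raw) {
    return raw instanceof List ? ((List<?>) raw).size() : 1;
  }
}

// Treats a List as "one vector of X".
class DenseVectorInterpreter implements ValueInterpreter {
  public int countValues(Object raw) {
    if (!(raw instanceof List)) {
      throw new IllegalArgumentException("expected a List of numbers");
    }
    return 1; // the whole list is a single field value
  }
}
```

With the same stored {{List.of(1.0f, 2.0f, 3.0f)}}, the first interpreter
would report three values and the second would report one, so the document
class itself never has to take a position on which reading is right.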
> Tech-Debt repayment: SolrDocument/SolrInputDocument will merge multiple
> "list" values
> -------------------------------------------------------------------------------------
>
> Key: SOLR-17974
> URL: https://issues.apache.org/jira/browse/SOLR-17974
> Project: Solr
> Issue Type: Task
> Reporter: Chris M. Hostetter
> Priority: Major
> Attachments: SOLR-17974.tests.patch
>
>
> A long standing bit of "convenience" logic in SolrDocument (that was later
> copied/inherited in SolrInputDocument) is that if you "add" a
> {{java.util.Collection}} of "values" for a field name it will either use that
> {{java.util.Collection}} as is; or -- if the document already has some values
> in it for that field name -- it unwraps the (new) {{java.util.Collection}}
> and adds each of the items in it to whatever existing
> {{java.util.Collection}} of values it already has for that field name.
> Once upon a time this kind of made life easier for folks - you could call one
> method on either a single value, or a list of values and Solr would "do what
> you mean".
> But as we get into a world where "multi-valued vector fields" start being a
> thing we have to consider, we need to rethink our APIs to ensure that (at a
> conceptual level, if not in terms of specific {{java.util.Collections}} class
> names) it's possible to have a "list of floats" as a single "field value" in
> a "multi-valued" field -- w/o a user getting confused why adding additional
> "field values" breaks their existing data.