[
https://issues.apache.org/jira/browse/SOLR-17948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18031445#comment-18031445
]
ASF subversion and git services commented on SOLR-17948:
--------------------------------------------------------
Commit c2aaca72af6a001597727807930a3e559481ceaf in solr's branch
refs/heads/branch_10x from Puneet Ahuja
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=c2aaca72af6 ]
SOLR-17948: Support indexing primitive float[] values for DenseVectorField
(#3747)
(cherry picked from commit 06a3b5e77e94771ee35c407cc90be0ce46d7a748)
> Support indexing primitive float[] values for DenseVectorField via JavaBin
> --------------------------------------------------------------------------
>
> Key: SOLR-17948
> URL: https://issues.apache.org/jira/browse/SOLR-17948
> Project: Solr
> Issue Type: Task
> Reporter: Puneet Ahuja
> Priority: Major
> Labels: pull-request-available
> Fix For: 10.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Currently, when a document containing a primitive float[] or double[] field
> is sent to Solr using the JavaBin format, indexing fails because
> DenseVectorParser does not recognize primitive arrays as valid input types.
> Other Solr loaders (JSON, CSV, XML) typically represent vector values as
> lists when parsed, so accepting primitive float[]/double[] would mainly
> benefit JavaBin use cases, allowing more compact serialization for clients
> that can produce primitive arrays.
> JavaBin (including SolrJ’s JavaBin codec) can serialize primitive arrays
> efficiently without boxing. Today, users must box vectors into
> List<Float>/List<Double>, which adds per-element overhead and produces
> larger payloads. Accepting primitive arrays lets clients send leaner
> JavaBin updates.
> I plan to extend DenseVectorParser to handle float[] and double[] inputs in
> addition to the existing List-based formats.
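A minimal sketch of the input handling described above, assuming the parser receives the field value as a plain `Object`. The class `VectorInputNormalizer` and its method `toFloatVector` are hypothetical names for illustration only; this is not the code merged for SOLR-17948 and not the actual `DenseVectorParser` API. It accepts `float[]`, `double[]`, and a `List` of numbers and normalizes all three to a `float[]`:

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper, NOT Solr's DenseVectorParser: normalizes the vector
 * input shapes discussed in SOLR-17948 to a primitive float[].
 */
final class VectorInputNormalizer {

    static float[] toFloatVector(Object value) {
        if (value instanceof float[] floats) {
            return floats.clone(); // already primitive: defensive copy, no boxing
        }
        if (value instanceof double[] doubles) {
            float[] out = new float[doubles.length];
            for (int i = 0; i < doubles.length; i++) {
                out[i] = (float) doubles[i]; // narrowing conversion may lose precision
            }
            return out;
        }
        if (value instanceof List<?> list) {
            float[] out = new float[list.size()];
            for (int i = 0; i < out.length; i++) {
                if (!(list.get(i) instanceof Number number)) {
                    throw new IllegalArgumentException("Non-numeric vector element at index " + i);
                }
                out[i] = number.floatValue(); // unboxes Float, Double, etc.
            }
            return out;
        }
        String type = (value == null) ? "null" : value.getClass().getName();
        throw new IllegalArgumentException("Unsupported vector value type: " + type);
    }

    public static void main(String[] args) {
        float[] fromList = toFloatVector(List.of(0.5f, 0.25f, -1.0f));
        float[] fromFloats = toFloatVector(new float[] {0.5f, 0.25f, -1.0f});
        float[] fromDoubles = toFloatVector(new double[] {0.5, 0.25, -1.0});
        System.out.println(Arrays.equals(fromList, fromFloats));    // true
        System.out.println(Arrays.equals(fromFloats, fromDoubles)); // true
    }
}
```

The sample values are exactly representable as floats, so all three input shapes yield identical vectors. How the real parser treats `double[]` input (narrowing, rejecting, or otherwise) is not described in this issue.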
> JavaBin request bodies can be roughly 20% smaller (about 19% in the manual
> test below) when vectors are sent as primitive arrays instead of boxed
> lists, and Solr will parse and index them correctly.
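A toy model of why boxed lists cost more on the wire, assuming (as a simplification, not the real JavaBin format) that each boxed element is written with a one-byte type tag in front of its 4-byte float, while a primitive array is written as raw 4-byte floats after a length prefix. The class name, tag value, and 768-dimension vector are illustrative assumptions:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Toy model, NOT the JavaBin wire format: contrasts a boxed list where every
 * element carries its own type tag with a packed primitive float array.
 */
final class VectorEncodingSizeModel {

    // Hypothetical per-element tag byte; real JavaBin tag values are not modeled.
    private static final byte FLOAT_TAG = 0x05;

    /** Boxed-list style: 4-byte length prefix, then (1-byte tag + 4-byte float) per element. */
    static int taggedSize(float[] vector) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(vector.length);
            for (float v : vector) {
                out.writeByte(FLOAT_TAG); // per-element type marker
                out.writeFloat(v);        // 4-byte IEEE-754 value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Primitive-array style: 4-byte length prefix, then raw 4-byte floats. */
    static int packedSize(float[] vector) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(vector.length);
            for (float v : vector) {
                out.writeFloat(v); // no per-element marker
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        float[] vector = new float[768]; // e.g. a 768-dimensional embedding
        for (int i = 0; i < vector.length; i++) {
            vector[i] = i / 768f;
        }
        int tagged = taggedSize(vector); // 4 + 768 * 5 = 3844 bytes
        int packed = packedSize(vector); // 4 + 768 * 4 = 3076 bytes
        double savings = 100.0 * (tagged - packed) / tagged;
        System.out.printf("tagged=%d packed=%d savings=%.2f%%%n", tagged, packed, savings);
    }
}
```

Under these assumptions a 768-dimensional vector takes 3844 bytes tagged versus 3076 bytes packed, just under 20% smaller, in line with the rough figure quoted above; real JavaBin framing adds its own headers, so actual savings will differ.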
>
> Manual test I conducted:
> 1. Write JavaBin payloads with the vectors encoded both as List<Float> and
> as primitive float[].
> 2. Index both payloads, then search each one to validate the index.
> Both steps use the SolrJ client.
> Script used:
> [https://gist.github.com/punAhuja/c77cc60e396ccf7aa5a55a92ba23ffc3]
> JavaBin sizes:
> List<Float> : 63.1 MiB (66,188,931 bytes)
> float[]     : 51.1 MiB (53,588,931 bytes)
> Savings     : 12.0 MiB (12,600,000 bytes, 19.04% smaller)
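The reported figures can be rechecked from the raw byte counts. The size values match binary megabytes (1 MiB = 1,048,576 bytes); the class name `PayloadSavings` is illustrative:

```java
/** Recomputes the reported JavaBin size comparison from the raw byte counts. */
final class PayloadSavings {

    static final long LIST_BYTES = 66_188_931L;  // payload with List<Float> vectors
    static final long ARRAY_BYTES = 53_588_931L; // payload with float[] vectors
    static final double MIB = 1024.0 * 1024.0;   // bytes per binary megabyte

    static long savedBytes() {
        return LIST_BYTES - ARRAY_BYTES;
    }

    static double savedPercent() {
        return 100.0 * savedBytes() / LIST_BYTES; // relative to the List payload
    }

    public static void main(String[] args) {
        System.out.printf("List    : %.1f MiB%n", LIST_BYTES / MIB);
        System.out.printf("float[] : %.1f MiB%n", ARRAY_BYTES / MIB);
        System.out.printf("Savings : %d bytes (%.2f%% smaller)%n", savedBytes(), savedPercent());
    }
}
```

The difference is exactly 12,600,000 bytes, and 12,600,000 / 66,188,931 ≈ 0.1904, matching the 19.04% reported.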
--
This message was sent by Atlassian Jira
(v8.20.10#820010)