[ 
https://issues.apache.org/jira/browse/SOLR-17948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18031445#comment-18031445
 ] 

ASF subversion and git services commented on SOLR-17948:
--------------------------------------------------------

Commit c2aaca72af6a001597727807930a3e559481ceaf in solr's branch 
refs/heads/branch_10x from Puneet Ahuja
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=c2aaca72af6 ]

SOLR-17948: Support indexing primitive float[] values for DenseVectorField 
(#3747)

(cherry picked from commit 06a3b5e77e94771ee35c407cc90be0ce46d7a748)


> Support indexing primitive float[] values for DenseVectorField via JavaBin
> --------------------------------------------------------------------------
>
>                 Key: SOLR-17948
>                 URL: https://issues.apache.org/jira/browse/SOLR-17948
>             Project: Solr
>          Issue Type: Task
>            Reporter: Puneet Ahuja
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 10.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, when a document containing a primitive float[] or double[] field 
> is sent to Solr in the JavaBin format, indexing fails because 
> DenseVectorParser does not recognize primitive arrays as valid input types. 
> The other Solr loaders (JSON, CSV, XML) typically represent vector values as 
> lists when parsed, so accepting primitive float[]/double[] would mainly 
> benefit JavaBin clients, which can produce primitive arrays and serialize 
> them more compactly.
>
> JavaBin (including SolrJ's JavaBin codec) can serialize primitive arrays 
> efficiently without boxing. Today, users must box vectors into 
> List<Float>/List<Double>, which adds overhead and produces larger payloads. 
> Accepting primitive arrays lets clients send leaner JavaBin updates.
>
> I plan to extend DenseVectorParser to handle float[] and double[] inputs in 
> addition to the existing List-based formats.
>
> In typical cases, JavaBin request bodies can be ~20% smaller when vectors are 
> sent as primitive arrays instead of boxed lists, and Solr will parse and 
> index them correctly.
>
> Manual test I conducted, using the SolrJ client:
> 1. Write JavaBin payloads with the vectors as both a List and a primitive 
>    float[].
> 2. Index both payloads and search on both of them to validate the index.
>
> Script used: 
> [https://gist.github.com/punAhuja/c77cc60e396ccf7aa5a55a92ba23ffc3]
>
> JavaBin sizes:
> List    : 63.1 MB (66188931 bytes)
> float[] : 51.1 MB (53588931 bytes)
> Savings : 12.0 MB (19.04% smaller)
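
The description above proposes accepting float[] and double[] in addition to the existing List-based input. A minimal sketch of that kind of input normalization is shown below, in plain Java with no Solr dependency. The class VectorInputSketch, the method toFloatArray and its dimension parameter are hypothetical names chosen for illustration; this is not the code from the commit and not DenseVectorParser's actual API.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical helper, not Solr's DenseVectorParser. */
public final class VectorInputSketch {

    private VectorInputSketch() {}

    /**
     * Normalizes a float[], a double[], or a List of Numbers into a float[]
     * and checks it against the expected dimension (an assumed parameter).
     */
    public static float[] toFloatArray(Object value, int dimension) {
        float[] vector;
        if (value instanceof float[]) {
            // Primitive floats: copy so the caller's array is not shared.
            vector = ((float[]) value).clone();
        } else if (value instanceof double[]) {
            // Primitive doubles: narrow each element to float.
            double[] src = (double[]) value;
            vector = new float[src.length];
            for (int i = 0; i < src.length; i++) {
                vector[i] = (float) src[i];
            }
        } else if (value instanceof List<?>) {
            // Boxed list, as produced by list-based input paths.
            List<?> src = (List<?>) value;
            vector = new float[src.size()];
            int i = 0;
            for (Object element : src) {
                if (!(element instanceof Number)) {
                    throw new IllegalArgumentException(
                        "Vector element is not a number: " + element);
                }
                vector[i++] = ((Number) element).floatValue();
            }
        } else {
            throw new IllegalArgumentException(
                "Unsupported vector input type: "
                    + (value == null ? "null" : value.getClass().getName()));
        }
        if (vector.length != dimension) {
            throw new IllegalArgumentException(
                "Expected " + dimension + " elements but got " + vector.length);
        }
        return vector;
    }

    public static void main(String[] args) {
        float[] fromArray = toFloatArray(new float[] {1f, 2f, 3f}, 3);
        float[] fromList = toFloatArray(Arrays.asList(1.0f, 2.0f, 3.0f), 3);
        // Both input shapes should normalize to the same vector.
        System.out.println(Arrays.equals(fromArray, fromList));
    }
}
```

The sketch copies the incoming float[] rather than reusing it; a real parser might skip the copy to save an allocation, which is a design choice this example does not try to settle.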



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
