I have opened an issue on JIRA <https://issues.apache.org/jira/browse/SOLR-17487> for this problem. After a bit of digging, I found that the root cause wasn't JSON parsing after all. In fact, Solr kind of "deduplicates" the vector values: a 384-dimension vector that contains the very same value twice ends up as a 383-dimension vector. The second occurrence of the value is simply dropped.
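Until the issue is resolved, it may help to check a vector for repeated values on the client side before POSTing it. Below is a minimal Python sketch that assumes only the behaviour described above (repeated values collapse to a single occurrence). `duplicated_values` and `length_after_dedup` are hypothetical helpers that model the symptom. They are not Solr code and not part of any client library.

```python
from typing import List, Sequence


def duplicated_values(vector: Sequence[float]) -> List[float]:
    """Return each value that occurs more than once in `vector`, in first-seen order."""
    seen = set()
    reported = set()
    dupes: List[float] = []
    for v in vector:
        if v in seen and v not in reported:
            dupes.append(v)
            reported.add(v)
        seen.add(v)
    return dupes


def length_after_dedup(vector: Sequence[float]) -> int:
    """Length the vector would have if every repeated value were kept only once.

    This models the symptom reported above; it is an assumption, not Solr's logic.
    """
    # dict.fromkeys keeps the first occurrence of each value, in order
    return len(dict.fromkeys(vector))


vec = [0.1, 0.2, 0.3, 0.2]
print(len(vec), length_after_dedup(vec), duplicated_values(vec))
```

If `length_after_dedup(vec)` is smaller than `len(vec)`, the vector contains repeated values and, going by the behaviour described here, would likely be rejected for having the wrong dimension.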
On Thu, Oct 10, 2024 at 16:31, Guillaume <gjac...@gmail.com> wrote:
> Hello,
>
> I'm using Solr 9.7 as a vector database. I've come across something I can't explain: I POST my documents as JSON, and I have a vector field of dimension 768.
>
> The JSON document I POST has a vector field, which is an array of length 768. Each value is a float.
>
> Solr complains that my array is only 767 values long...
> I've compared the JSON I POST with the array that Solr parses and writes to the logs... and indeed, one of the 768 values has simply disappeared in the process.
>
> I'm pretty sure it is related to some JSON array parsing issue on the Solr side, but I don't know how to fix this :/
>
> Has anyone come across something similar?
>
> Thanks for reading!