benwtrent commented on PR #13525:
URL: https://github.com/apache/lucene/pull/13525#issuecomment-2200120237
> I could update the existing FlatVectorsFormat and write these data
> offsets only for when the field is a tensor.
I was thinking something like this. We should handle it dynamically when more
than one vector is provided. Having to configure up front "Hey, more than one
vector is incoming" is weird. Why would anybody ever configure the "single
vector case" when the multi-vector case would also handle the single-vector
one? It seems we don't need specialized formats and should instead update the
current flat vector formats.
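To make the "handle it dynamically" idea concrete, here is a minimal sketch in plain Java. It is not Lucene code: `FlatMultiVectorStore` and all of its methods are hypothetical names, and it only assumes that vectors are stored flat with one start offset per document. Under that assumption, a document with a single vector is just the case where its vector count is 1, so nothing has to be configured up front.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory store, for illustration only; not a Lucene class. */
final class FlatMultiVectorStore {
  private final int dim;
  // All vectors of all documents, stored flat in insertion order.
  private final List<float[]> vectors = new ArrayList<>();
  // docStart.get(d) is the ordinal of the first vector of document d.
  private final List<Integer> docStart = new ArrayList<>();

  FlatMultiVectorStore(int dim) {
    this.dim = dim;
  }

  /** Adds a document with one or more vectors and returns its doc ID. */
  int addDoc(float[]... docVectors) {
    if (docVectors.length == 0) {
      throw new IllegalArgumentException("a document needs at least one vector");
    }
    // Validate everything before mutating any state.
    for (float[] v : docVectors) {
      if (v.length != dim) {
        throw new IllegalArgumentException("expected dimension " + dim);
      }
    }
    docStart.add(vectors.size());
    for (float[] v : docVectors) {
      vectors.add(v.clone());
    }
    return docStart.size() - 1;
  }

  /** Number of vectors stored for the given document. */
  int vectorCount(int doc) {
    int end = doc + 1 < docStart.size() ? docStart.get(doc + 1) : vectors.size();
    return end - docStart.get(doc);
  }

  /** Returns the i-th vector of the given document. */
  float[] vector(int doc, int i) {
    if (i < 0 || i >= vectorCount(doc)) {
      throw new IndexOutOfBoundsException("vector index " + i);
    }
    return vectors.get(docStart.get(doc) + i);
  }

  /** Total number of vectors across all documents. */
  int totalVectors() {
    return vectors.size();
  }
}
```

With this layout, `addDoc(v)` and `addDoc(v1, v2, v3)` go through the same code path, which is the reason a separate "multi-vector" format may not be needed.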
> I feel this fits conveniently with a lot of our existing interfaces. Do
> you see a specific need for [Float|Byte]VectorValues to iterate on individual
> vector values instead?
Yes, we need to be able to iterate vectors via doc IDs and gather each
individual vector for a given document.
Three concerns immediately come to mind:
- Rescoring docs that are gathered via some quantized methodology.
- Determining the true nearest vector for a given document.
- Ability to iterate vectors when quantizing them. We need to randomly sample
across all vectors. We don't want to sample via doc IDs, as this will likely
add bias and hurt the quantization quality.
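The last two concerns can be sketched in plain Java. This is only an illustration under assumptions, not the Lucene API: `VectorSampling`, `sampleOrdinals`, `docOf`, and `nearestInDoc` are hypothetical names, vectors are assumed to be addressable by a flat ordinal, `docStart[d]` is assumed to hold the first ordinal of document `d`, and dot product stands in for whatever similarity the field uses. Sampling ordinals directly gives every vector the same chance of being picked, whereas picking a document first would over-represent vectors from documents that have few of them.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helpers, for illustration only; not Lucene classes. */
final class VectorSampling {

  /** Picks k distinct ordinals uniformly from [0, totalVectors) via a partial Fisher-Yates shuffle. */
  static int[] sampleOrdinals(int totalVectors, int k, Random rnd) {
    if (k < 0 || k > totalVectors) {
      throw new IllegalArgumentException("k must be in [0, totalVectors]");
    }
    int[] ords = new int[totalVectors];
    for (int i = 0; i < totalVectors; i++) {
      ords[i] = i;
    }
    for (int i = 0; i < k; i++) {
      int j = i + rnd.nextInt(totalVectors - i);
      int tmp = ords[i];
      ords[i] = ords[j];
      ords[j] = tmp;
    }
    return Arrays.copyOf(ords, k);
  }

  /** Maps a vector ordinal back to its document; docStart must be strictly ascending. */
  static int docOf(int[] docStart, int ordinal) {
    int pos = Arrays.binarySearch(docStart, ordinal);
    // Not found: binarySearch returns -(insertionPoint) - 1, and the owning doc is insertionPoint - 1.
    return pos >= 0 ? pos : -pos - 2;
  }

  /** Returns the ordinal of the vector in the given doc with the highest dot product against the query. */
  static int nearestInDoc(float[][] flat, int[] docStart, int doc, float[] query) {
    int start = docStart[doc];
    int end = doc + 1 < docStart.length ? docStart[doc + 1] : flat.length;
    int best = start;
    float bestScore = Float.NEGATIVE_INFINITY;
    for (int ord = start; ord < end; ord++) {
      float score = 0f;
      for (int d = 0; d < query.length; d++) {
        score += flat[ord][d] * query[d];
      }
      if (score > bestScore) {
        bestScore = score;
        best = ord;
      }
    }
    return best;
  }
}
```

Both operations need access to individual vectors rather than one aggregate value per document, which is why iteration over individual vector values matters here.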
As for adding new information to the FieldInfo, another valid option is
making it configurable directly on the format and not updating FieldInfo. I am
not sure it's valuable to have it in FieldInfo. I wouldn't expect the usages
for how to resolve the multi-vector scoring to be as broad as our similarity
functions or vector dimensions.