On 5/25/23 1:48 PM, Oliver Rice wrote:
A nice side effect of using float8[] to represent vectors is that it allows vectors of different sizes to coexist in the same column. We most frequently see (pgvector) vector columns being used for storing ML embeddings. Given that different models produce embeddings with a different number of dimensions, the need to specify a vector’s size in DDL tightly couples the schema to a single model. Support for variable length vectors would be a great way to decouple those concepts. It would also be a differentiating feature from existing vector stores.
I hadn't thought of that, given that most of what I've seen (or at least my personal bias in designing systems) is that you keep vectors of one dimensionality in a column. But this sounds like a case where having native support in a variable-length array would help.
One drawback is that variable length vectors complicate indexing for similarity search, because similarity measures require vectors of consistent length. Partial indexes are a possible solution to that challenge.
Yeah, that presents a challenge. This may also be an argument for a vector data type, since that would eliminate the need to check for consistent dimensionality on the indexing.
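For illustration, here is a rough sketch of what the partial-index approach could look like today with pgvector, assuming its unparameterized "vector" column type and expression indexes. The table, the model_id column, the index name, and the dimension of 3 are hypothetical and chosen only to keep the example small; this is not a proposal for how a core type would behave.

```sql
-- Assumes the pgvector extension is available.
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical table: one column holds embeddings from several models.
CREATE TABLE embeddings (
    id        bigserial PRIMARY KEY,
    model_id  int NOT NULL,     -- which model produced the row (assumed column)
    embedding vector NOT NULL   -- no dimension in the DDL, so sizes can differ
);

-- One partial expression index per model. The cast fixes the dimension
-- for only the rows that the predicate selects.
CREATE INDEX embeddings_model1_idx ON embeddings
    USING ivfflat ((embedding::vector(3)) vector_cosine_ops)
    WHERE (model_id = 1);

-- The query repeats both the predicate and the cast so that the planner
-- can match it against the partial index.
SELECT id
FROM embeddings
WHERE model_id = 1
ORDER BY embedding::vector(3) <=> '[3,1,2]'
LIMIT 5;
```

The cost is one index per model, and every query has to spell out the cast, which is the consistency check mentioned above pushed onto the user.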
Jonathan