Could folk voting against a general purpose type (that could well be called a vector) briefly explain their reasoning?

We established in the other thread that it’s technically trivial, so folk must think it is strictly superior to support only float rather than, e.g., all numeric types (note: for the type, not the ANN).

I am surprised, and the blurbs accompanying votes so far don’t seem to touch on this, mostly just endorsing the idea of a vector.


On 2 May 2023, at 20:20, Patrick McFadin <pmcfa...@gmail.com> wrote:


A > B > C on both polls.

Having talked to several users in the community who are highly excited about this change, I think this gets to what developers want to do at Cassandra scale: store embeddings and retrieve them.

On Tue, May 2, 2023 at 11:47 AM Andrés de la Peña <adelap...@apache.org> wrote:
A > B > C

I don't think that ML is such a niche application that it can't have its own CQL data type. Also, vectors are mathematical elements that have more applications than ML.

On Tue, 2 May 2023 at 19:15, Mick Semb Wever <m...@apache.org> wrote:


On Tue, 2 May 2023 at 17:14, Jonathan Ellis <jbel...@gmail.com> wrote:
Should we add a vector type to Cassandra designed to meet the needs of machine learning use cases, specifically feature and embedding vectors for training, inference, and vector search?  

ML vectors are fixed-dimension (fixed-length) sequences of numeric types, with no nulls allowed, and with no need for random access. The ML industry overwhelmingly uses float32 vectors, to the point that the industry-leading special-purpose vector database ONLY supports that data type.

This poll is to gauge consensus subsequent to the recent discussion thread at https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.

Please rank the discussed options from most preferred option to least, e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B = A (C is my preference, followed by B or A approximately equally.)

(A) I am in favor of adding a vector type for floats; I do not believe we need to tie it to any particular implementation details.

(B) I am okay with adding a vector type but I believe we must add array types that compose with all Cassandra types first, and make vectors a special case of arrays-without-null-elements.

(C) I am not in favor of adding a built-in vector type.



A > B > C

B is stated as "must add array types…". I think this is a bit loaded. If B were (A + the implementation needs to be a non-null frozen float32 array, with serialisation forward compatible with other frozen arrays implemented later), I would put it before (A), especially because it has already been shown that this is easy to implement.

 
