My preference: A > B > C. Vectors are distinct enough from arrays that we should not make adding the latter a prerequisite for adding the former.
On Tue, May 2, 2023 at 10:13 AM Jonathan Ellis <jbel...@gmail.com> wrote:

> Should we add a vector type to Cassandra designed to meet the needs of
> machine learning use cases, specifically feature and embedding vectors for
> training, inference, and vector search?
>
> ML vectors are fixed-dimension (fixed-length) sequences of numeric types,
> with no nulls allowed, and with no need for random access. The ML industry
> overwhelmingly uses float32 vectors, to the point that the industry-leading
> special-purpose vector database ONLY supports that data type.
>
> This poll is to gauge consensus subsequent to the recent discussion thread
> at https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>
> Please rank the discussed options from most preferred option to least,
> e.g., A > B > C (A is my preference, followed by B, followed by C) or
> C > B = A (C is my preference, followed by B or A approximately equally.)
>
> (A) I am in favor of adding a vector type for floats; I do not believe we
> need to tie it to any particular implementation details.
>
> (B) I am okay with adding a vector type but I believe we must add array
> types that compose with all Cassandra types first, and make vectors a
> special case of arrays-without-null-elements.
>
> (C) I am not in favor of adding a built-in vector type.
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced