[POLL] Vector type for ML

Jonathan Ellis Tue, 02 May 2023 08:14:22 -0700

Should we add a vector type to Cassandra designed to meet the needs of
machine learning use cases, specifically feature and embedding vectors for
training, inference, and vector search?


ML vectors are fixed-dimension (fixed-length) sequences of numeric types,
with no nulls allowed, and with no need for random access. The ML industry
overwhelmingly uses float32 vectors, to the point that the industry-leading
special-purpose vector database ONLY supports that data type.
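
As a rough illustration of those properties (not a proposal for how Cassandra would encode anything), a fixed-dimension float32 vector can be stored as one flat byte string with no per-element null markers or offsets. The function names and the big-endian layout below are assumptions made for this sketch only.

```python
import struct

def encode_vector(values, dimension):
    """Pack exactly `dimension` float32 values into a flat byte string."""
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    if any(v is None for v in values):
        raise ValueError("vector elements may not be null")
    # Big-endian float32, 4 bytes per element, no length prefix or null bitmap.
    return struct.pack(f">{dimension}f", *values)

def decode_vector(data, dimension):
    """Unpack the whole vector at once; no per-element random access is needed."""
    return list(struct.unpack(f">{dimension}f", data))

encoded = encode_vector([1.0, 0.5, -2.0], 3)
decoded = decode_vector(encoded, 3)
```

Because the dimension is fixed and nulls are disallowed, the encoded size is always 4 bytes times the dimension.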

This poll is meant to gauge consensus following the recent discussion thread
at https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.

Please rank the discussed options from most preferred to least preferred,
e.g., A > B > C (A is my preference, followed by B, followed by C) or
C > B = A (C is my preference, followed by B or A approximately equally).

(A) I am in favor of adding a vector type for floats; I do not believe we
need to tie it to any particular implementation details.

(B) I am okay with adding a vector type but I believe we must add array
types that compose with all Cassandra types first, and make vectors a
special case of arrays-without-null-elements.

(C) I am not in favor of adding a built-in vector type.

-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
