On Tue, 2 May 2023 at 17:14, Jonathan Ellis <jbel...@gmail.com> wrote:
> Should we add a vector type to Cassandra designed to meet the needs of
> machine learning use cases, specifically feature and embedding vectors for
> training, inference, and vector search?
>
> ML vectors are fixed-dimension (fixed-length) sequences of numeric types,
> with no nulls allowed, and with no need for random access. The ML industry
> overwhelmingly uses float32 vectors, to the point that the industry-leading
> special-purpose vector database ONLY supports that data type.
>
> This poll is to gauge consensus subsequent to the recent discussion thread
> at https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>
> Please rank the discussed options from most preferred option to least,
> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
> = A (C is my preference, followed by B or A approximately equally.)
>
> (A) I am in favor of adding a vector type for floats; I do not believe we
> need to tie it to any particular implementation details.
>
> (B) I am okay with adding a vector type but I believe we must add array
> types that compose with all Cassandra types first, and make vectors a
> special case of arrays-without-null-elements.
>
> (C) I am not in favor of adding a built-in vector type.
>

A > B > C

B is stated as "must add array types…". I think this is a bit loaded.

If B were (A + the implementation needs to be a non-null frozen float32
array, with serialisation forward compatible with other frozen arrays
implemented later), I would put it before (A). Especially because it has
already been shown that this is easy to implement.