On Tue, 2 May 2023 at 17:14, Jonathan Ellis <jbel...@gmail.com> wrote:

> Should we add a vector type to Cassandra designed to meet the needs of
> machine learning use cases, specifically feature and embedding vectors for
> training, inference, and vector search?
>
> ML vectors are fixed-dimension (fixed-length) sequences of numeric types,
> with no nulls allowed, and with no need for random access. The ML industry
> overwhelmingly uses float32 vectors, to the point that the industry-leading
> special-purpose vector database ONLY supports that data type.
>
> This poll is to gauge consensus subsequent to the recent discussion thread
> at https://lists.apache.org/thread/0lj1nk9jbhkf1rlgqcvxqzfyntdjrnk0.
>
> Please rank the discussed options from most preferred option to least,
> e.g., A > B > C (A is my preference, followed by B, followed by C) or C > B
> = A (C is my preference, followed by B or A approximately equally.)
>
> (A) I am in favor of adding a vector type for floats; I do not believe we
> need to tie it to any particular implementation details.
>
> (B) I am okay with adding a vector type but I believe we must add array
> types that compose with all Cassandra types first, and make vectors a
> special case of arrays-without-null-elements.
>
> (C) I am not in favor of adding a built-in vector type.
>



A > B > C

B is phrased as "must add array types…", which I think is a bit loaded. If
B were instead (A + the implementation must be a non-null frozen float32
array, with serialisation forward-compatible with other frozen arrays
implemented later), I would rank it above (A), especially because it has
already been shown that this is easy to implement.
