I agree with David's reasoning and the use of DENSE (and maybe eventually
SPARSE). This is terminology well established in the data world, and it
would lead to much easier adoption from users. VECTOR is close, but I can
see having to create a lot of content around "How to use it and not get in
trouble." (I have a lot of that content already)

 - We don't have to explain what it is. There is a lot of prior art out
there already [1][2][3].
 - We're matching an established term with what users would expect. No
surprises.
 - Shorter ramp-up time for users. Cassandra is being modernized.
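
To make the dense/sparse distinction concrete, here is a rough sketch in
Python. It is only an illustration of the semantics David describes below
(fixed dimension, non-null elements, frozen), not how Cassandra would
implement it. The helper names make_dense and make_sparse are made up for
this example.

```python
from typing import Dict, Optional, Sequence, Tuple


def make_dense(values: Sequence[Optional[float]], dimension: int) -> Tuple[float, ...]:
    """Build a dense vector: fixed length, no null elements, immutable."""
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    if any(v is None for v in values):
        raise ValueError("dense vectors cannot contain null elements")
    # A tuple stands in for "frozen": it can only be replaced as a whole.
    return tuple(float(v) for v in values)


def make_sparse(values: Sequence[Optional[float]]) -> Dict[int, float]:
    """Build a sparse vector: keep only the positions that hold a value."""
    return {i: float(v) for i, v in enumerate(values) if v is not None}


# Hypothetical usage: every slot is filled in the dense case,
# while the sparse case tolerates a missing element.
dense = make_dense([0.1, 0.2, 0.3], dimension=3)
sparse = make_sparse([0.1, None, 0.3])
```

The dense constructor rejects a wrong length or a null element up front,
which is the "no surprises" behavior the keyword is meant to signal.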

The implementation is flexible, but the interface should empower our users
to be awesome.

Patrick

[1] https://stats.stackexchange.com/questions/266996/what-do-the-terms-dense-and-sparse-mean-in-the-context-of-neural-networks
[2] https://induraj2020.medium.com/what-are-sparse-features-and-dense-features-8d1746a77035
[3] https://revware.net/sparse-vs-dense-data-the-power-of-points-and-clouds/

On Thu, May 4, 2023 at 10:25 AM David Capwell <dcapw...@apple.com> wrote:

> My views have changed over time on syntax and I feel type[dimension] may
> not be the best, so it has gone lower in my own personal ranking… this is
> my current preference
>
> 1) DENSE <type>[dimension] | NON NULL <type>[dimension]
> 2) VECTOR<type, dimension>
> 3) type[dimension]
>
> My reasoning for this order
>
> * type[dimension] looks like syntax sugar for array<type, dimension>, so
> users may assume list/array semantics, but we limit to non-null elements in
> a frozen array
> * I feel VECTOR as a prefix is out of place, but VECTOR as a direct type
> makes more sense… this also leads to a possible future of VECTOR<type>
> which is the non-fixed length version of this type.  What makes VECTOR
> different from list/array?  non-null elements and is frozen.  I don’t feel
> that VECTOR really tells users to expect non-null or frozen semantics, as
> there exist different VECTOR types for those reasons (sparse vs dense)…
> * DENSE may be confusing for people coming from languages where this just
> means “sequential layout”, which is what our frozen array/list already are…
> but since the target user is coming from a ML background, this shouldn’t
> offer much confusion.  DENSE just means FROZEN in Cassandra, with NON NULL
> elements (SPARSE allows for NULL and isn’t frozen)… So DENSE just acts as
> syntax sugar for frozen<non null type[dimension]>
>
>
> On May 4, 2023, at 4:13 AM, Brandon Williams <dri...@gmail.com> wrote:
>
> 1. VECTOR<FLOAT,n>
> 2. VECTOR FLOAT[n]
> 3. FLOAT[N]   (Non null by default)
>
> Redundant or not, I think having the VECTOR keyword helps signify what
> the app is generally about and helps get buy-in from ML stakeholders.
>
> On Thu, May 4, 2023 at 3:45 AM Benedict <bened...@apache.org> wrote:
>
>
> Hurrah for initial agreement.
>
> For syntax, I think one option was just FLOAT[N]. In VECTOR FLOAT[N],
> VECTOR is redundant - FLOAT[N] is fully descriptive by itself. I don’t
> think VECTOR should be used to simply imply non-null, as this would be very
> unintuitive. More logical would be NONNULL, if this is the only condition
> being applied. Alternatively for arrays we could default to NONNULL and
> later introduce NULLABLE if we want to permit nulls.
>
> If the word vector is to be used it makes more sense to make it look like
> a list, so VECTOR<FLOAT, N> as here the word VECTOR is clearly not
> redundant.
>
> So, I vote:
>
> 1) (NON NULL) FLOAT[N]
> 2) FLOAT[N]   (Non null by default)
> 3) VECTOR<FLOAT, N>
>
>
>
> On 4 May 2023, at 08:52, Mick Semb Wever <m...@apache.org> wrote:
>
>
>
>
> Did we agree on a CQL syntax?
>
> I don’t believe there has been a poll on CQL syntax… my understanding
> reading all the threads is that there are ~4-5 options and none are -1ed, so
> I believe we are waiting for majority rule on this?
>
>
>
>
> Re-reading that thread, IIUC the valid choices remaining are…
>
> 1. VECTOR FLOAT[n]
> 2. FLOAT VECTOR[n]
> 3. VECTOR<FLOAT,n>
> 4. VECTOR[n]<FLOAT>
> 5. ARRAY<FLOAT, n>
> 6. NON-NULL FROZEN<FLOAT[n]>
>
>
> Yes, I'm putting my preference (1) first ;) because (banging on) if the
> future of CQL will have FLOAT[n] and FROZEN<FLOAT[n]>, and the VECTOR
> keyword just means "non-null and frozen" for general CQL users, then
> these gel best together.
>
> Options (5) and (6) are for those that feel we can and should provide this
> type without introducing the vector keyword.
>
>
>
>
