I actually still prefer *type[dimension]*, because I think I intuitively
read this as a primitive (meaning no null elements) array. Then we can have
the indexing apparatus only accept *frozen<float[n]>* for the HNSW case.

If that isn't intuitive to anyone else, I don't really have a strong
opinion...but...conflating "frozen" and "dense" seems like a bad idea. One
should indicate single vs. multi-cell, and the other the presence or
absence of nulls/zeros/whatever.
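
To make the two axes concrete, here is a minimal Python sketch. It is not Cassandra code: the class, its field names, and the helper function are all made up for illustration, and the only rule it encodes is the one suggested above, that the index would accept just the frozen, non-null float case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical model of a fixed-length array type (illustration only)."""
    element: str      # element type name, e.g. "float"
    dimension: int    # fixed number of elements
    frozen: bool      # True: single cell; False: multi-cell
    nullable: bool    # True: elements may be null; False: "dense"


def indexable_by_hnsw(t: ArrayType) -> bool:
    # Assumed rule: only the equivalent of frozen<float[n]> with
    # non-null elements is accepted by the index.
    return t.frozen and not t.nullable and t.element == "float"


# The two flags vary independently, giving four combinations.
combos = [ArrayType("float", 3, frozen=f, nullable=n)
          for f in (True, False) for n in (True, False)]
accepted = [t for t in combos if indexable_by_hnsw(t)]
```

Only one of the four combinations passes, which is the point: "frozen" and "dense" answer different questions, so one keyword should not stand in for both.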

On Thu, May 4, 2023 at 12:51 PM Patrick McFadin <pmcfa...@gmail.com> wrote:

> I agree with David's reasoning and the use of DENSE (and maybe eventually
> SPARSE). This is terminology well established in the data world, and it
> would lead to much easier adoption from users. VECTOR is close, but I can
> see having to create a lot of content around "How to use it and not get in
> trouble." (I have a lot of that content already)
>
>  - We don't have to explain what it is. A lot of prior art out there
> already [1][2][3]
>  - We're matching an established term with what users would expect. No
> surprises.
>  - Shorter ramp-up time for users. Cassandra is being modernized.
>
> The implementation is flexible, but the interface should empower our users
> to be awesome.
>
> Patrick
>
> 1 -
> https://stats.stackexchange.com/questions/266996/what-do-the-terms-dense-and-sparse-mean-in-the-context-of-neural-networks
> 2 -
> https://induraj2020.medium.com/what-are-sparse-features-and-dense-features-8d1746a77035
> 3 -
> https://revware.net/sparse-vs-dense-data-the-power-of-points-and-clouds/
>
> On Thu, May 4, 2023 at 10:25 AM David Capwell <dcapw...@apple.com> wrote:
>
>> My views have changed over time on syntax and I feel type[dimension] may
>> not be the best, so it has gone lower in my own personal ranking… this is
>> my current preference
>>
>> 1) DENSE <type>[dimension] | NON NULL <type>[dimension]
>> 2) VECTOR<type, dimension>
>> 3) type[dimension]
>>
>> My reasoning for this order
>>
>> * type[dimension] looks like syntax sugar for array<type, dimension>, so
>> users may assume list/array semantics, but we limit to non-null elements in
>> a frozen array
>> * I feel VECTOR as a prefix is out of place, but VECTOR as a direct type
>> makes more sense… this also leads to a possible future of VECTOR<type>
>> which is the non-fixed length version of this type.  What makes VECTOR
>> different from list/array?  non-null elements and is frozen.  I don’t feel
>> that VECTOR really tells users to expect non-null or frozen semantics, as
>> there exist different VECTOR types for those reasons (sparse vs dense)…
>> * DENSE may be confusing for people coming from languages where this just
>> means “sequential layout”, which is what our frozen array/list already are…
>> but since the target user is coming from a ML background, this shouldn’t
>> offer much confusion.  DENSE just means FROZEN in Cassandra, with NON NULL
>> elements (SPARSE allows for NULL and isn’t frozen)… So DENSE just acts as
>> syntax sugar for frozen<non null type[dimension]>
>>
>>
>> On May 4, 2023, at 4:13 AM, Brandon Williams <dri...@gmail.com> wrote:
>>
>> 1. VECTOR<FLOAT,n>
>> 2. VECTOR FLOAT[n]
>> 3. FLOAT[N]   (Non null by default)
>>
>> Redundant or not, I think having the VECTOR keyword helps signify what
>> the app is generally about and helps get buy-in from ML stakeholders.
>>
>> On Thu, May 4, 2023 at 3:45 AM Benedict <bened...@apache.org> wrote:
>>
>>
>> Hurrah for initial agreement.
>>
>> For syntax, I think one option was just FLOAT[N]. In VECTOR FLOAT[N],
>> VECTOR is redundant - FLOAT[N] is fully descriptive by itself. I don’t
>> think VECTOR should be used to simply imply non-null, as this would be very
>> unintuitive. More logical would be NONNULL, if this is the only condition
>> being applied. Alternatively for arrays we could default to NONNULL and
>> later introduce NULLABLE if we want to permit nulls.
>>
>> If the word vector is to be used it makes more sense to make it look like
>> a list, so VECTOR<FLOAT, N> as here the word VECTOR is clearly not
>> redundant.
>>
>> So, I vote:
>>
>> 1) (NON NULL) FLOAT[N]
>> 2) FLOAT[N]   (Non null by default)
>> 3) VECTOR<FLOAT, N>
>>
>>
>>
>> On 4 May 2023, at 08:52, Mick Semb Wever <m...@apache.org> wrote:
>>
>>
>>
>>
>> Did we agree on a CQL syntax?
>>
>> I don’t believe there has been a poll on CQL syntax… my understanding
>> reading all the threads is that there are ~4-5 options and none are -1ed, so
>> I believe we are waiting for majority rule on this?
>>
>>
>>
>>
>> Re-reading that thread, IIUC the valid choices remaining are…
>>
>> 1. VECTOR FLOAT[n]
>> 2. FLOAT VECTOR[n]
>> 3. VECTOR<FLOAT,n>
>> 4. VECTOR[n]<FLOAT>
>> 5. ARRAY<FLOAT, n>
>> 6. NON-NULL FROZEN<FLOAT[n]>
>>
>>
>> Yes I'm putting my preference (1) first ;) because (banging on) if the
>> future of CQL will have FLOAT[n] and FROZEN<FLOAT[n]>, where the VECTOR
>> keyword, for general CQL users, just means "non-null and frozen",
>> these gel best together.
>>
>> Options (5) and (6) are for those that feel we can and should provide
>> this type without introducing the vector keyword.
>>
>>
>>
>>
