I actually still prefer *type[dimension]*, because I intuitively read it as a primitive array (meaning no null elements). Then we can have the indexing apparatus accept only *frozen<float[n]>* for the HNSW case.
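To make that concrete, here is a minimal Python sketch of the idea, not Cassandra code. `ArrayType` and `hnsw_indexable` are hypothetical names invented for illustration, and the rule itself is only my assumption about how an index check might look. It keeps "frozen" (single vs. multi-cell) and "nullable" (nulls allowed or not) as two separate flags, with non-null as the default.

```python
from dataclasses import dataclass


@dataclass
class ArrayType:
    """Hypothetical descriptor for a fixed-length array column type."""
    element: str            # element type name, e.g. "float"
    dimension: int          # fixed number of elements, the n in float[n]
    frozen: bool = False    # True: stored as a single cell; False: multi-cell
    nullable: bool = False  # type[dimension] read as "no null elements" by default


def hnsw_indexable(t: ArrayType) -> bool:
    """Assumed rule: only a frozen, non-null float array qualifies for HNSW."""
    return t.element == "float" and t.frozen and not t.nullable


# frozen<float[3]> versus plain (multi-cell) float[3]
print(hnsw_indexable(ArrayType("float", 3, frozen=True)))
print(hnsw_indexable(ArrayType("float", 3)))
```

Because the two properties are independent fields, the index check can require both without either keyword having to imply the other.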
If that isn't intuitive to anyone else, I don't really have a strong opinion...but...conflating "frozen" and "dense" seems like a bad idea. One should indicate single vs. multi-cell, and the other the presence or absence of nulls/zeros/whatever.

On Thu, May 4, 2023 at 12:51 PM Patrick McFadin <pmcfa...@gmail.com> wrote:

> I agree with David's reasoning and the use of DENSE (and maybe eventually
> SPARSE). This is terminology well established in the data world, and it
> would lead to much easier adoption from users. VECTOR is close, but I can
> see having to create a lot of content around "How to use it and not get in
> trouble." (I have a lot of that content already)
>
> - We don't have to explain what it is. A lot of prior art out there
>   already [1][2][3]
> - We're matching an established term with what users would expect. No
>   surprises.
> - Shorter ramp-up time for users. Cassandra is being modernized.
>
> The implementation is flexible, but the interface should empower our users
> to be awesome.
>
> Patrick
>
> 1 - https://stats.stackexchange.com/questions/266996/what-do-the-terms-dense-and-sparse-mean-in-the-context-of-neural-networks
> 2 - https://induraj2020.medium.com/what-are-sparse-features-and-dense-features-8d1746a77035
> 3 - https://revware.net/sparse-vs-dense-data-the-power-of-points-and-clouds/
>
> On Thu, May 4, 2023 at 10:25 AM David Capwell <dcapw...@apple.com> wrote:
>
>> My views have changed over time on syntax and I feel type[dimension] may
>> not be the best, so it has gone lower in my own personal ranking… this is
>> my current preference
>>
>> 1) DENSE <type>[dimension] | NON NULL <type>[dimension]
>> 2) VECTOR<type, dimension>
>> 3) type[dimension]
>>
>> My reasoning for this order
>>
>> * type[dimension] looks like syntax sugar for array<type, dimension>, so
>> users may assume list/array semantics, but we limit to non-null elements in
>> a frozen array
>> * feel VECTOR as a prefix feels out of place, but VECTOR as a direct type
>> makes more sense… this also leads to a possible future of VECTOR<type>
>> which is the non-fixed length version of this type. What makes VECTOR
>> different from list/array? non-null elements and is frozen. I don’t feel
>> that VECTOR really tells users to expect non-null or frozen semantics, as
>> there exists different VECTOR types for those reasons (sparse vs dense)…
>> * DENSE may be confusing for people coming from languages where this just
>> means “sequential layout”, which is what our frozen array/list already are…
>> but since the target user is coming from a ML background, this shouldn’t
>> offer much confusion. DENSE just means FROZEN in Cassandra, with NON NULL
>> elements (SPARSE allows for NULL and isn’t frozen)… So DENSE just acts as
>> syntax sugar for frozen<non null type[dimension]>
>>
>> On May 4, 2023, at 4:13 AM, Brandon Williams <dri...@gmail.com> wrote:
>>
>> 1. VECTOR<FLOAT,n>
>> 2. VECTOR FLOAT[n]
>> 3. FLOAT[N] (Non null by default)
>>
>> Redundant or not, I think having the VECTOR keyword helps signify what
>> the app is generally about and helps get buy-in from ML stakeholders.
>>
>> On Thu, May 4, 2023 at 3:45 AM Benedict <bened...@apache.org> wrote:
>>
>> Hurrah for initial agreement.
>>
>> For syntax, I think one option was just FLOAT[N]. In VECTOR FLOAT[N],
>> VECTOR is redundant - FLOAT[N] is fully descriptive by itself. I don’t
>> think VECTOR should be used to simply imply non-null, as this would be very
>> unintuitive. More logical would be NONNULL, if this is the only condition
>> being applied. Alternatively for arrays we could default to NONNULL and
>> later introduce NULLABLE if we want to permit nulls.
>>
>> If the word vector is to be used it makes more sense to make it look like
>> a list, so VECTOR<FLOAT, N> as here the word VECTOR is clearly not
>> redundant.
>>
>> So, I vote:
>>
>> 1) (NON NULL) FLOAT[N]
>> 2) FLOAT[N] (Non null by default)
>> 3) VECTOR<FLOAT, N>
>>
>> On 4 May 2023, at 08:52, Mick Semb Wever <m...@apache.org> wrote:
>>
>> Did we agree on a CQL syntax?
>>
>> I don’t believe there has been a poll on CQL syntax… my understanding
>> reading all the threads is that there are ~4-5 options and none are -1ed, so
>> believe we are waiting for majority rule on this?
>>
>> Re-reading that thread, IIUC the valid choices remaining are…
>>
>> 1. VECTOR FLOAT[n]
>> 2. FLOAT VECTOR[n]
>> 3. VECTOR<FLOAT,n>
>> 4. VECTOR[n]<FLOAT>
>> 5. ARRAY<FLOAT, n>
>> 6. NON-NULL FROZEN<FLOAT[n]>
>>
>> Yes I'm putting my preference (1) first ;) because (banging on) if the
>> future of CQL will have FLOAT[n] and FROZEN<FLOAT[n]>, where the VECTOR
>> keyword is: for general cql users; just meaning "non-null and frozen",
>> these gel best together.
>>
>> Options (5) and (6) are for those that feel we can and should provide
>> this type without introducing the vector keyword.