Yep, fair point… SPARSE VECTOR maps better to NON NULL MAP<int32, type>.
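A minimal sketch of the two representations discussed in this thread, written in Python purely for illustration. It assumes a sparse vector is an index-to-value map (the NON NULL MAP<int32, type> reading). It also assumes that reifying the map as a fixed-size array fills the missing positions with nulls (None), as in David's {0: 42, 3: 17} example. `densify` and `sparsify` are hypothetical helper names, not Cassandra APIs.

```python
from typing import Dict, List, Optional


def densify(sparse: Dict[int, float], dimension: int) -> List[Optional[float]]:
    """Reify an {index: value} map as a fixed-size list (hypothetical helper).

    Positions missing from the map become None, i.e. the
    "fixed size list with nulls" reading of a sparse vector.
    """
    for index in sparse:
        if not 0 <= index < dimension:
            raise IndexError(f"index {index} is outside dimension {dimension}")
    return [sparse.get(i) for i in range(dimension)]


def sparsify(dense: List[Optional[float]]) -> Dict[int, float]:
    """Inverse direction: keep only the non-null positions."""
    return {i: v for i, v in enumerate(dense) if v is not None}


example = {0: 42, 3: 17}
print(densify(example, 4))  # [42, None, None, 17]
assert sparsify(densify(example, 4)) == example

# The cost concern: two stored entries still materialize 10,000 slots.
big = densify({0: 1.0, 9_999: 2.0}, 10_000)
print(len(big), sum(v is not None for v in big))  # 10000 2
```

The reified list grows with the dimension, not with the number of stored entries. That is the memory cost raised below for operators that must reify large vectors.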

> On May 5, 2023, at 11:58 AM, David Capwell <dcapw...@apple.com> wrote:
> 
>> If we ever add sparse vectors, we can assume that DENSE is the default and 
>> allow users to write DENSE, SPARSE, or neither.
> 
> I have been feeling that sparse is just a fixed-size list with nulls… so 
> array<type, dimension>… if you insert {0: 42, 3: 17}, then you get an array of 
> [42, null, null, 17]?  One downside of doing this is that any operator/function 
> that needs to reify large vectors (let's say 10k elements) uses a ton of memory 
> because we made it an array… so a new type could be used to lower this cost…
> 
> With DENSE VECTOR, the syntax already leaves room for us to add SPARSE 
> later… With plain VECTOR, adding a sparse vector after the fact gets 
> complicated because VECTOR already implies DENSE…
> 
> Updated ranking
> 
> Syntax                          Score
> VECTOR<type, dimension>         21
> DENSE VECTOR<type, dimension>   12
> type[dimension]                 10
> NON NULL <type>[dimension]      8
> VECTOR type[n]                  5
> DENSE_VECTOR<type, dimension>   4
> NON-NULL FROZEN<type[n]>        3
> ARRAY<type, n>                  1
> 
> Syntax                          Round 1  Round 2
> VECTOR<type, dimension>         4        4
> DENSE VECTOR<type, dimension>   2        3
> NON NULL <type>[dimension]      2        1
> VECTOR type[n]                  1
> type[dimension]                 1
> DENSE_VECTOR<type, dimension>   1
> NON-NULL FROZEN<type[n]>        1
> ARRAY<type, n>                  0
> 
> VECTOR<type, dimension> is still in the lead…
> 
>> On May 5, 2023, at 11:40 AM, Andrés de la Peña <adelap...@apache.org> wrote:
>> 
>> My vote is:
>> 
>> 1. VECTOR<type, dimension>
>> 2. DENSE VECTOR<type, dimension>
>> 3. type[dimension]
>> 
>> If we ever add sparse vectors, we can assume that DENSE is the default and 
>> allow users to write DENSE, SPARSE, or neither.
>> 
>> Perhaps the dimension could be separated from the type, such as in 
>> VECTOR<type>[dimension] or VECTOR<type>(dimension).
>> 
>> On Fri, 5 May 2023 at 19:05, David Capwell <dcapw...@apple.com> wrote:
>>>>> ...where, just to be clear, VECTOR<type, dimension> means a frozen fixed 
>>>>> size array w/ no null values?
>>>> Assuming this is the case
>>> 
>>> The current agreed requirements are:
>>> 
>>> 1) non-null elements
>>> 2) fixed length
>>> 3) frozen 
>>> 
>>> You pointed out that 3 isn’t actually required, but removing it would be a 
>>> different conversation =)… maybe defer this to JIRA, as long as all parties 
>>> agree in the ticket?
>>> 
>>> With all votes in, this is what I see
>>> 
>>> Syntax                          JE  DC  JM  CR  PM  BW  MA  Ben MSW DCB
>>> VECTOR<type, dimension>         1   2   2   -   2   1   1   3   2   -
>>> DENSE VECTOR<type, dimension>   2   1   -   -   1   -   2   -   -   -
>>> type[dimension]                 3   3   3   1   -   3   -   2   -   -
>>> DENSE_VECTOR<type, dimension>   -   -   1   -   -   -   -   -   -   3
>>> NON NULL <type>[dimension]      -   1   -   -   -   -   -   1   -   2
>>> VECTOR type[n]                  -   -   -   -   -   2   -   -   1   -
>>> ARRAY<type, n>                  -   -   -   -   3   -   -   -   -   -
>>> NON-NULL FROZEN<type[n]>        -   -   -   -   -   -   -   -   -   1
>>> 
>>> (JE = Jonathan Ellis, DC = David Capwell, JM = Josh McKenzie,
>>> CR = Caleb Rackliffe, PM = Patrick McFadin, BW = Brandon Williams,
>>> MA = Mike Adamson, Ben = Benedict, MSW = Mick Semb Wever,
>>> DCB = Derek Chen-Becker; "-" = not ranked)
>>> 
>>> Rank  Weight
>>> 1     3
>>> 2     2
>>> 3     1
>>> ?     3
>>> 
>>> Syntax                          Score
>>> VECTOR<type, dimension>         18
>>> DENSE VECTOR<type, dimension>   10
>>> type[dimension]                 9
>>> NON NULL <type>[dimension]      8
>>> VECTOR type[n]                  5
>>> DENSE_VECTOR<type, dimension>   4
>>> NON-NULL FROZEN<type[n]>        3
>>> ARRAY<type, n>                  1
>>> 
>>> 
>>> Syntax                          Round 1  Round 2
>>> VECTOR<type, dimension>         3        4
>>> DENSE VECTOR<type, dimension>   2        2
>>> NON NULL <type>[dimension]      2        1
>>> VECTOR type[n]                  1
>>> type[dimension]                 1
>>> DENSE_VECTOR<type, dimension>   1
>>> NON-NULL FROZEN<type[n]>        1
>>> ARRAY<type, n>                  0
>>> 
>>> Under two different voting systems, VECTOR<type, dimension> is in the lead, 
>>> and by a good amount… I have also updated the patch locally to reflect this 
>>> change.
>>> 
>>>> On May 5, 2023, at 10:41 AM, Mike Adamson <madam...@datastax.com> wrote:
>>>> 
>>>>> ...where, just to be clear, VECTOR<type, dimension> means a frozen fixed 
>>>>> size array w/ no null values?
>>>> Assuming this is the case, my vote is:
>>>> 
>>>> 1. VECTOR<type, dimension>
>>>> 2. DENSE VECTOR<type, dimension>
>>>> 
>>>> I don't really have a 3rd vote because I think that type[dimension] is too 
>>>> ambiguous. 
>>>> 
>>>> 
>>>> On Fri, 5 May 2023 at 18:32, Derek Chen-Becker <de...@chen-becker.org> wrote:
>>>>> LOL, I'm holding you to that at the summit :) In all seriousness, I'm 
>>>>> glad to see a robust debate around it. I guess for completeness, my order 
>>>>> of preference is 
>>>>> 
>>>>> 1 - NONNULL FROZEN<TYPE<N>>
>>>>> 2 - NONNULL TYPE<N> (which part of this implies frozen? The NONNULL or 
>>>>> the cardinality?)
>>>>> 3 - DENSE_VECTOR<type, N>
>>>>> 
>>>>> I guess my main concern with just "VECTOR" is that it's such an 
>>>>> overloaded term. Maybe in ML it means something specific, but for anyone 
>>>>> coming from C++, Rust, Java, etc., a vector is mutable and can hold 
>>>>> null (or an equivalent, e.g. None in Rust). If the argument hadn't also 
>>>>> been made that we should be working toward something that's not 
>>>>> ML-specific, maybe I would be less concerned.
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Derek
>>>>> 
>>>>> 
>>>>> On Fri, May 5, 2023 at 11:14 AM Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>>> Derek, despite your preference, I would hang out with you at a party. 
>>>>>> 
>>>>>> On Fri, May 5, 2023 at 9:44 AM Derek Chen-Becker <de...@chen-becker.org> wrote:
>>>>>>> Speaking as someone who likes Erlang, maybe that's why I also like 
>>>>>>> NONNULL FROZEN<TYPE<[n]>>. It's unambiguous what Cassandra is going to 
>>>>>>> do with that type. DENSE VECTOR means I need to go read the docs (and then 
>>>>>>> probably double-check the source) to be sure what exactly is going on. 
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> Derek
>>>>>>> 
>>>>>>> On Fri, May 5, 2023 at 9:54 AM Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>>>>> I hope we are willing to consider the developers who use our system, 
>>>>>>>> because if I had to teach people to use "NON-NULL FROZEN<TYPE[n]>", I'm 
>>>>>>>> pretty sure the response would be:
>>>>>>>> 
>>>>>>>> Did you tell me to go write a distributed map-reduce job in Erlang? I 
>>>>>>>> believe I did, Bob.  
>>>>>>>> 
>>>>>>>> On Fri, May 5, 2023 at 8:05 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>>>> Idiomatically, to my mind, there's a question of "what space are we 
>>>>>>>>> thinking about this datatype in"?
>>>>>>>>> 
>>>>>>>>> - In the context of mathematics, nullability in a vector would be 0
>>>>>>>>> - In the context of Cassandra, nullability tends to mean a tombstone 
>>>>>>>>> (or nothing)
>>>>>>>>> - In the context of programming languages, it's all over the place
>>>>>>>>> 
>>>>>>>>> Given that many models are exploring quantization to int8 and other data 
>>>>>>>>> types, to me there's definitely a "support other data types easily in the 
>>>>>>>>> future" piece we need to keep in mind.
>>>>>>>>> 
>>>>>>>>> So with the above and the "meet the user where they are and don't 
>>>>>>>>> make them understand more of Cassandra than absolutely critical to 
>>>>>>>>> use it", I lean:
>>>>>>>>> 
>>>>>>>>> 1. DENSE_VECTOR<type, dimension>
>>>>>>>>> 2. VECTOR<type, dimension>
>>>>>>>>> 3. type[dimension]
>>>>>>>>> 
>>>>>>>>> This leaves the path open for us to expand on it in the future with 
>>>>>>>>> sparse support and allows us to introduce some semantics that 
>>>>>>>>> indicate idioms around nullability for the users coming from a 
>>>>>>>>> different space.
>>>>>>>>> 
>>>>>>>>> "NON-NULL FROZEN<TYPE[n]>" is strictly correct, however it requires 
>>>>>>>>> understanding idioms of how Cassandra thinks about data (nulls mean 
>>>>>>>>> different things to us, we have differences between frozen and 
>>>>>>>>> non-frozen due to constraints in our storage engine and 
>>>>>>>>> materialization of data, etc) that get in the way of users doing 
>>>>>>>>> things in the pattern they're familiar with without learning more 
>>>>>>>>> about the DB than they're probably looking to learn. Historically 
>>>>>>>>> this has been a challenge for us in adoption; the classic "Why can't 
>>>>>>>>> I just write and delete and write as much as I want? Why are deletes 
>>>>>>>>> filling up my disk?" problem comes to mind.
>>>>>>>>> 
>>>>>>>>> I'd also be happy with us supporting:
>>>>>>>>> * NON-NULL FROZEN<TYPE[n]>
>>>>>>>>> * DENSE_VECTOR<type, dimension> as syntactic sugar for the above
>>>>>>>>> 
>>>>>>>>> If getting into the "built-in syntactic sugar mapping for communities 
>>>>>>>>> and specific use-cases" is something we're willing to consider.
>>>>>>>>> 
>>>>>>>>> On Fri, May 5, 2023, at 7:26 AM, Patrick McFadin wrote:
>>>>>>>>>> I think we are still discussing implementation here, when I'm talking 
>>>>>>>>>> about developer experience. I want developers to adopt this quickly and 
>>>>>>>>>> easily, and to be successful. Vector search is already a thing. People 
>>>>>>>>>> use it every day. A successful outcome, in my view, is developers 
>>>>>>>>>> picking up this feature without reading a manual (because they 
>>>>>>>>>> don't anyway, and then get into trouble). I did some more extensive 
>>>>>>>>>> research into what syntax other DBs are using. The consensus is some 
>>>>>>>>>> variety of 'VECTOR', 'DENSE', and 'SPARSE'.
>>>>>>>>>> 
>>>>>>>>>> Pinecone[1]: dense_vector, sparse_vector
>>>>>>>>>> Elastic[2]: dense_vector
>>>>>>>>>> Milvus[3]: float_vector, binary_vector
>>>>>>>>>> pgvector[4]: vector
>>>>>>>>>> Weaviate[5]: Different approach. All typed arrays can be indexed
>>>>>>>>>> 
>>>>>>>>>> Based on that I'm advocating a similar syntax:
>>>>>>>>>> 
>>>>>>>>>> - DENSE VECTOR
>>>>>>>>>> or
>>>>>>>>>> - VECTOR
>>>>>>>>>> 
>>>>>>>>>> [1] https://docs.pinecone.io/docs/hybrid-search
>>>>>>>>>> [2] https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html
>>>>>>>>>> [3] https://milvus.io/docs/create_collection.md
>>>>>>>>>> [4] https://github.com/pgvector/pgvector
>>>>>>>>>> [5] https://weaviate.io/developers/weaviate/config-refs/datatypes
>>>>>>>>>> 
>>>>>>>>>> On Fri, May 5, 2023 at 6:07 AM Mike Adamson <madam...@datastax.com> wrote:
>>>>>>>>>> Then we can have the indexing apparatus only accept frozen<float[n]> 
>>>>>>>>>> for the HNSW case.
>>>>>>>>>> I'm inclined to agree with Benedict that the index will need to be 
>>>>>>>>>> specifically selected by an option rather than inferred from the type. 
>>>>>>>>>> As such, there is no real reason for the frozen requirement on the type. 
>>>>>>>>>> The HNSW index can be built just as easily from a non-frozen array.
>>>>>>>>>> 
>>>>>>>>>> I am in favour of enforcing non-null on the elements of an array by 
>>>>>>>>>> default. I would prefer that allowing nulls in the array would be a 
>>>>>>>>>> later addition if and when a use case arose for it.
>>>>>>>>>> 
>>>>>>>>>> On Fri, 5 May 2023 at 03:02, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>>>>>>>> Even in the ML case, sparse can just mean zeros rather than nulls, 
>>>>>>>>>> and they should compress similarly anyway.
>>>>>>>>>> 
>>>>>>>>>> If we really want null values, I'd rather leave that in collections 
>>>>>>>>>> space.
>>>>>>>>>> 
>>>>>>>>>> On Thu, May 4, 2023 at 8:59 PM Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>>>>>>>>>> I actually still prefer type[dimension], because I think I 
>>>>>>>>>> intuitively read this as a primitive (meaning no null elements) 
>>>>>>>>>> array. Then we can have the indexing apparatus only accept 
>>>>>>>>>> frozen<float[n]> for the HNSW case.
>>>>>>>>>> 
>>>>>>>>>> If that isn't intuitive to anyone else, I don't really have a strong 
>>>>>>>>>> opinion...but...conflating "frozen" and "dense" seems like a bad 
>>>>>>>>>> idea. One should indicate single vs. multi-cell, and the other the 
>>>>>>>>>> presence or absence of nulls/zeros/whatever.
>>>>>>>>>> 
>>>>>>>>>> On Thu, May 4, 2023 at 12:51 PM Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>>>>>>> I agree with David's reasoning and the use of DENSE (and maybe 
>>>>>>>>>> eventually SPARSE). This is terminology well established in the data 
>>>>>>>>>> world, and it would lead to much easier adoption from users. VECTOR 
>>>>>>>>>> is close, but I can see having to create a lot of content around 
>>>>>>>>>> "How to use it and not get in trouble." (I have a lot of that 
>>>>>>>>>> content already)
>>>>>>>>>> 
>>>>>>>>>>  - We don't have to explain what it is. A lot of prior art out there 
>>>>>>>>>> already [1][2][3]
>>>>>>>>>>  - We're matching an established term with what users would expect. 
>>>>>>>>>> No surprises. 
>>>>>>>>>>  - Shorter ramp-up time for users. Cassandra is being modernized.
>>>>>>>>>> 
>>>>>>>>>> The implementation is flexible, but the interface should empower our 
>>>>>>>>>> users to be awesome. 
>>>>>>>>>> 
>>>>>>>>>> Patrick
>>>>>>>>>> 
>>>>>>>>>> 1 - https://stats.stackexchange.com/questions/266996/what-do-the-terms-dense-and-sparse-mean-in-the-context-of-neural-networks
>>>>>>>>>> 2 - https://induraj2020.medium.com/what-are-sparse-features-and-dense-features-8d1746a77035
>>>>>>>>>> 3 - https://revware.net/sparse-vs-dense-data-the-power-of-points-and-clouds/
>>>>>>>>>> 
>>>>>>>>>> On Thu, May 4, 2023 at 10:25 AM David Capwell <dcapw...@apple.com> wrote:
>>>>>>>>>> My views on syntax have changed over time, and I feel type[dimension] 
>>>>>>>>>> may not be the best, so it has dropped in my own personal 
>>>>>>>>>> ranking… this is my current preference:
>>>>>>>>>> 
>>>>>>>>>> 1) DENSE <type>[dimension] | NON NULL <type>[dimension]
>>>>>>>>>> 2) VECTOR<type, dimension>
>>>>>>>>>> 3) type[dimension]
>>>>>>>>>> 
>>>>>>>>>> My reasoning for this order
>>>>>>>>>> 
>>>>>>>>>> * type[dimension] looks like syntax sugar for array<type, 
>>>>>>>>>> dimension>, so users may assume list/array semantics, but we limit 
>>>>>>>>>> it to non-null elements in a frozen array
>>>>>>>>>> * VECTOR as a prefix feels out of place, but VECTOR as a direct 
>>>>>>>>>> type makes more sense… this also leads to a possible future of 
>>>>>>>>>> VECTOR<type>, which would be the non-fixed-length version of this type. 
>>>>>>>>>> What makes VECTOR different from list/array? Non-null elements, and 
>>>>>>>>>> it is frozen. I don’t feel that VECTOR really tells users to expect 
>>>>>>>>>> non-null or frozen semantics, as different VECTOR types exist for 
>>>>>>>>>> exactly those reasons (sparse vs dense)… 
>>>>>>>>>> * DENSE may be confusing for people coming from languages where it 
>>>>>>>>>> just means “sequential layout”, which is what our frozen array/list 
>>>>>>>>>> already are… but since the target user is coming from an ML 
>>>>>>>>>> background, this shouldn’t cause much confusion. DENSE just means 
>>>>>>>>>> FROZEN in Cassandra, with NON NULL elements (SPARSE allows for NULL 
>>>>>>>>>> and isn’t frozen)… So DENSE just acts as syntax sugar for frozen<non 
>>>>>>>>>> null type[dimension]>
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On May 4, 2023, at 4:13 AM, Brandon Williams <dri...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 1. VECTOR<FLOAT,n>
>>>>>>>>>>> 2. VECTOR FLOAT[n]
>>>>>>>>>>> 3. FLOAT[N]   (Non null by default)
>>>>>>>>>>> 
>>>>>>>>>>> Redundant or not, I think having the VECTOR keyword helps signify what
>>>>>>>>>>> the app is generally about and helps get buy-in from ML stakeholders.
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, May 4, 2023 at 3:45 AM Benedict <bened...@apache.org> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hurrah for initial agreement.
>>>>>>>>>>>> 
>>>>>>>>>>>> For syntax, I think one option was just FLOAT[N]. In VECTOR 
>>>>>>>>>>>> FLOAT[N], VECTOR is redundant - FLOAT[N] is fully descriptive by 
>>>>>>>>>>>> itself. I don’t think VECTOR should be used to simply imply 
>>>>>>>>>>>> non-null, as this would be very unintuitive. More logical would be 
>>>>>>>>>>>> NONNULL, if this is the only condition being applied. 
>>>>>>>>>>>> Alternatively for arrays we could default to NONNULL and later 
>>>>>>>>>>>> introduce NULLABLE if we want to permit nulls.
>>>>>>>>>>>> 
>>>>>>>>>>>> If the word vector is to be used it makes more sense to make it 
>>>>>>>>>>>> look like a list, so VECTOR<FLOAT, N> as here the word VECTOR is 
>>>>>>>>>>>> clearly not redundant.
>>>>>>>>>>>> 
>>>>>>>>>>>> So, I vote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1) (NON NULL) FLOAT[N]
>>>>>>>>>>>> 2) FLOAT[N]   (Non null by default)
>>>>>>>>>>>> 3) VECTOR<FLOAT, N>
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 4 May 2023, at 08:52, Mick Semb Wever <m...@apache.org> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Did we agree on a CQL syntax?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I don’t believe there has been a poll on CQL syntax… my 
>>>>>>>>>>>>> understanding from reading all the threads is that there are ~4-5 
>>>>>>>>>>>>> options and none are -1ed, so I believe we are waiting for majority 
>>>>>>>>>>>>> rule on this?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Re-reading that thread, IIUC the valid choices remaining are…
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. VECTOR FLOAT[n]
>>>>>>>>>>>> 2. FLOAT VECTOR[n]
>>>>>>>>>>>> 3. VECTOR<FLOAT,n>
>>>>>>>>>>>> 4. VECTOR[n]<FLOAT>
>>>>>>>>>>>> 5. ARRAY<FLOAT, n>
>>>>>>>>>>>> 6. NON-NULL FROZEN<FLOAT[n]>
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Yes, I'm putting my preference (1) first ;) because (banging on) if 
>>>>>>>>>>>> the future of CQL will have FLOAT[n] and FROZEN<FLOAT[n]>, where for 
>>>>>>>>>>>> general CQL users the VECTOR keyword just means "non-null and frozen", 
>>>>>>>>>>>> then these gel best together.
>>>>>>>>>>>> 
>>>>>>>>>>>> Options (5) and (6) are for those that feel we can and should 
>>>>>>>>>>>> provide this type without introducing the vector keyword.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Mike Adamson
>>>>>>>>>> Engineering
>>>>>>>>>> +1 650 389 6000 | datastax.com
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> +---------------------------------------------------------------+
>>>>>>> | Derek Chen-Becker                                             |
>>>>>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>>>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>>>>>>> +---------------------------------------------------------------+
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
> 
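The requirements agreed in the thread were non-null elements, fixed length, and frozen. A small Python sketch can make the first two concrete as the checks a VECTOR<float, dimension> value would have to pass. `validate_vector` is a hypothetical name, not Cassandra's validation code. Returning an immutable tuple is only a loose analogy for "frozen", meaning the value is written and replaced as a whole.

```python
from numbers import Real
from typing import Optional, Sequence, Tuple


def validate_vector(values: Sequence[Optional[float]], dimension: int) -> Tuple[float, ...]:
    """Check a candidate VECTOR<float, dimension> value (hypothetical helper).

    Enforces the element-level requirements from the thread:
    exactly `dimension` elements and no null (None) elements.
    The immutable tuple returned is only an analogy for "frozen".
    """
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    for i, v in enumerate(values):
        if v is None:
            raise ValueError(f"element {i} is null")
        if not isinstance(v, Real):
            raise TypeError(f"element {i} is not numeric: {v!r}")
    return tuple(float(v) for v in values)


print(validate_vector([0.1, 0.2, 0.3], 3))  # (0.1, 0.2, 0.3)
for bad in ([0.1, None, 0.3], [0.1, 0.2]):
    try:
        validate_vector(bad, 3)
    except ValueError as err:
        print(err)  # "element 1 is null", then "expected 3 elements, got 2"
```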

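The Score tables in the thread weight each ballot position using the Rank/Weight table (rank 1 = 3 points, rank 2 = 2, rank 3 = 1), which is a Borda-style count. The Python sketch below reproduces those totals from the per-voter table. The ballots are transcribed from that table. The "?" weight row is left out because its meaning is not stated. `score` is a hypothetical helper name.

```python
from collections import Counter

WEIGHTS = {1: 3, 2: 2, 3: 1}  # rank -> points, from the Rank/Weight table

V = "VECTOR<type, dimension>"
DV = "DENSE VECTOR<type, dimension>"
TD = "type[dimension]"
NN = "NON NULL <type>[dimension]"
VT = "VECTOR type[n]"
DU = "DENSE_VECTOR<type, dimension>"
NF = "NON-NULL FROZEN<type[n]>"
AR = "ARRAY<type, n>"

# Ballots transcribed from the per-voter table: voter -> {syntax: rank}.
BALLOTS = {
    "Jonathan Ellis": {V: 1, DV: 2, TD: 3},
    "David Capwell": {NN: 1, DV: 1, V: 2, TD: 3},
    "Josh McKenzie": {DU: 1, V: 2, TD: 3},
    "Caleb Rackliffe": {TD: 1},
    "Patrick McFadin": {DV: 1, V: 2, AR: 3},
    "Brandon Williams": {V: 1, VT: 2, TD: 3},
    "Mike Adamson": {V: 1, DV: 2},
    "Benedict": {NN: 1, TD: 2, V: 3},
    "Mick Semb Wever": {VT: 1, V: 2},
    "Derek Chen-Becker": {NF: 1, NN: 2, DU: 3},
}


def score(ballots):
    """Sum rank weights per syntax (Borda-style count)."""
    totals = Counter()
    for ranks in ballots.values():
        for syntax, rank in ranks.items():
            totals[syntax] += WEIGHTS[rank]
    return totals


# Prints the scores 18, 10, 9, 8, 5, 4, 3, 1 in descending order.
for syntax, points in score(BALLOTS).most_common():
    print(f"{syntax:32} {points}")

# Adding Andrés de la Peña's later ballot gives the "Updated ranking".
updated = score({**BALLOTS, "Andrés de la Peña": {V: 1, DV: 2, TD: 3}})
print(updated[V], updated[DV], updated[TD])  # 21 12 10
```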