Re: [POLL] Vector type for ML

2023-05-05 Thread Rahul Xavier Singh
Love it. Thank you folks for coming to a decision on this. This is very helpful for moving forward with planning for the current Python frameworks:
- Langchain.CassandraVectorStore
- Langchain.CassandraVectorRetriever
- Langchain.CassandraVectorStoreAgent
- LlamaIndex.CassandraVectorLoad

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
https://issues.apache.org/jira/browse/CASSANDRA-18504 > On May 5, 2023, at 12:27 PM, David Capwell wrote: > > Yep, fair point…. SPARSE VECTOR better maps to NON NULL MAP > >> On May 5, 2023, at 11:58 AM, David Capwell wrote: >> >>> If we ever add sparse vectors, we can assume that DENSE is t

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
Yep, fair point…. SPARSE VECTOR better maps to NON NULL MAP > On May 5, 2023, at 11:58 AM, David Capwell wrote: > >> If we ever add sparse vectors, we can assume that DENSE is the default and >> allow to use either DENSE, SPARSE or nothing. > > I have been feeling that sparse is just a fixed s

Re: [POLL] Vector type for ML

2023-05-05 Thread Jonathan Ellis
Sparse vector in ML has the semantics that elements not explicitly set are zero. I believe most (all?) sparse vector implementations use a map under the hood; the point is to save a lot of space when you have 10K zeros and 100 that are nonzero. On Fri, May 5, 2023 at 2:00 PM David Capwell wrote:
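A minimal Python sketch of the semantics Jonathan describes, assuming a plain `dict` from index to value as the sparse representation. This is an illustration only, not a Cassandra or ML-library API, and `to_dense` and `sparse_dot` are hypothetical names. Only non-zero entries are stored; any index missing from the map reads as zero.

```python
from typing import Dict, List

# Hypothetical representation: only non-zero positions are stored;
# every other index is implicitly 0.0.
SparseVector = Dict[int, float]


def to_dense(sparse: SparseVector, dimension: int) -> List[float]:
    """Expand a sparse vector to a dense list, filling unset positions with 0.0."""
    dense = [0.0] * dimension
    for index, value in sparse.items():
        if not 0 <= index < dimension:
            raise IndexError(f"index {index} outside dimension {dimension}")
        dense[index] = value
    return dense


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product that only visits the indices stored in the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    # A missing index in b contributes 0.0, matching the "unset means zero" rule.
    return sum(value * b.get(index, 0.0) for index, value in a.items())


if __name__ == "__main__":
    # 10,000 dimensions, but only 3 stored entries.
    v: SparseVector = {0: 1.5, 42: -2.0, 9_999: 3.0}
    w: SparseVector = {42: 4.0, 9_999: 1.0}
    print(len(v))                    # 3
    print(sparse_dot(v, w))          # -2.0*4.0 + 3.0*1.0 = -5.0
    print(len(to_dense(v, 10_000)))  # 10000
```

With a map-backed representation, the dot product's cost scales with the number of stored entries rather than with the full dimension, which is where the space (and time) saving comes from.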

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
> If we ever add sparse vectors, we can assume that DENSE is the default and > allow to use either DENSE, SPARSE or nothing. I have been feeling that sparse is just a fixed size list with nulls… so array… if you insert {0: 42, 3: 17} then you get an array of [42, null, null, 17]? One negative d
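To make the two readings of a sparse insert concrete, here is a minimal Python sketch (an illustration, not Cassandra behaviour). It expands David's example `{0: 42, 3: 17}` into a fixed-size list of dimension 4, once with nulls (`None`) in the unset positions and once with zeros, the ML convention Jonathan Ellis describes in his reply. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def _check_indices(entries: Dict[int, float], dimension: int) -> None:
    """Reject positions that fall outside the fixed dimension."""
    for index in entries:
        if not 0 <= index < dimension:
            raise IndexError(f"index {index} outside dimension {dimension}")


def fill_with_nulls(entries: Dict[int, float], dimension: int) -> List[Optional[float]]:
    """'Fixed size list with nulls' reading: unset positions become None."""
    _check_indices(entries, dimension)
    return [entries.get(i) for i in range(dimension)]


def fill_with_zeros(entries: Dict[int, float], dimension: int) -> List[float]:
    """ML sparse-vector reading: unset positions are implicitly zero."""
    _check_indices(entries, dimension)
    return [entries.get(i, 0.0) for i in range(dimension)]


if __name__ == "__main__":
    entries = {0: 42.0, 3: 17.0}
    print(fill_with_nulls(entries, 4))  # [42.0, None, None, 17.0]
    print(fill_with_zeros(entries, 4))  # [42.0, 0.0, 0.0, 17.0]
```

Only the meaning of an unset position differs. Note that the null-padded result would also violate the non-null element requirement discussed elsewhere in the thread.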

Re: [POLL] Vector type for ML

2023-05-05 Thread Andrés de la Peña
My vote is: 1. VECTOR 2. DENSE VECTOR 3. type[dimension] If we ever add sparse vectors, we can assume that DENSE is the default and allow to use either DENSE, SPARSE or nothing. Perhaps the dimension could be separated from the type, such as in VECTOR[dimension] or VECTOR(dimension). On Fri, 5

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
>> ...where, just to be clear, VECTOR means a frozen fixed >> size array w/ no null values? > Assuming this is the case
The current agreed requirements are:
1) non-null elements
2) fixed length
3) frozen
You pointed out 3 isn’t actually required, but that would be a different conversation to
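A minimal Python sketch of what the three agreed requirements mean for a value written to such a column. This is not Cassandra's implementation, and `validate_vector` is a hypothetical name; "frozen" is modelled here by returning an immutable tuple.

```python
from typing import Optional, Sequence, Tuple


def validate_vector(values: Sequence[Optional[float]], dimension: int) -> Tuple[float, ...]:
    """Check a candidate vector value against the agreed requirements.

    1) non-null elements, 2) fixed length, 3) frozen (modelled as an immutable
    tuple, so the value can only be replaced as a whole, not updated per element).
    """
    # 2) fixed length: the value must have exactly `dimension` elements.
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    # 1) non-null elements: no position may be None.
    for position, value in enumerate(values):
        if value is None:
            raise ValueError(f"element {position} is null")
    # 3) frozen: hand back an immutable copy.
    return tuple(float(v) for v in values)


if __name__ == "__main__":
    print(validate_vector([0.1, 0.2, 0.3], 3))  # (0.1, 0.2, 0.3)
```

Because the returned tuple cannot be modified in place, an update has to supply a complete new vector, which is roughly what "frozen" means for a CQL collection.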

Re: [POLL] Vector type for ML

2023-05-05 Thread Mike Adamson
> > ...where, just to be clear, VECTOR means a frozen fixed > size array w/ no null values? > Assuming this is the case, my vote is: 1. VECTOR 2. DENSE VECTOR I don't really have a 3rd vote because I think that *type[dimension]* is too ambiguous. On Fri, 5 May 2023 at 18:32, Derek Chen-Becker

Re: [POLL] Vector type for ML

2023-05-05 Thread Derek Chen-Becker
LOL, I'm holding you to that at the summit :) In all seriousness, I'm glad to see a robust debate around it. I guess for completeness, my order of preference is 1 - NONNULL FROZEN> 2 - NONNULL TYPE (which part of this implies frozen? The NONNULL or the cardinality?) 3 - DENSE_VECTOR I guess my ma

Re: [POLL] Vector type for ML

2023-05-05 Thread Patrick McFadin
Derek, despite your preference, I would hang out with you at a party. On Fri, May 5, 2023 at 9:44 AM Derek Chen-Becker wrote: > Speaking as someone who likes Erlang, maybe that's why I also like NONNULL > FROZEN>. It's unambiguous what Cassandra is going to do with that > type. DENSE VECTOR mean

Re: [POLL] Vector type for ML

2023-05-05 Thread Patrick McFadin
My vote is: 1. DENSE VECTOR 2. VECTOR 3. ARRAY On Fri, May 5, 2023 at 9:43 AM David Capwell wrote: > Went through and created a spreadsheet of current votes… For Patrick and > Mike, I don’t see a clear vote, so I put a ? where I “think” your > preference is… for Mick, I only put one vote as the

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
Sorry, DENSE_VECTOR was pointing to the wrong row; updated scores:
Syntax                  Score
VECTOR                  16
DENSE VECTOR            11
type[dimension]         9
NON NULL [dimension]    6
VECTOR type[n]          5
DENSE_VECTOR            3
NON-NULL FROZEN         3
ARRAY                   0
> On May 5, 2023, at 10:01 AM, David Capwell wrote: > > Updated > > Syntax > Jonathan Ellis

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
Updated Syntax Jonathan Ellis David Capwell Josh McKenzie Caleb Rackliffe Patrick McFadin Brandon Williams Mike Adamson Benedict Mick Semb Wever Derek Chen-Becker VECTOR 1 2 2 1 ? 3 2 DENSE VECTOR 2 1 ? ? type[dimension] 3 3 3 1 3 2 DENSE_VECTOR 1 NON NULL [dimension] 1

Re: [POLL] Vector type for ML

2023-05-05 Thread Mick Semb Wever
On Fri, 5 May 2023 at 18:43, David Capwell wrote: > Went through and created a spreadsheet of current votes… For Patrick and > Mike, I don’t see a clear vote, so I put a ? where I “think” your > preference is… for Mick, I only put one vote as the list looked like a > summary, but you mentioned th

Re: [POLL] Vector type for ML

2023-05-05 Thread Derek Chen-Becker
Speaking as someone who likes Erlang, maybe that's why I also like NONNULL FROZEN>. It's unambiguous what Cassandra is going to do with that type. DENSE VECTOR means I need to go read docs (and then probably double-check in the source) to know exactly what is going on. Cheers, Derek

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
Went through and created a spreadsheet of current votes… For Patrick and Mike, I don’t see a clear vote, so I put a ? where I “think” your preference is… for Mick, I only put one vote as the list looked like a summary, but you mentioned the first was your preference Syntax Jonathan Ellis David

Re: [POLL] Vector type for ML

2023-05-05 Thread Caleb Rackliffe
...where, just to be clear, VECTOR means a frozen fixed size array w/ no null values? On Fri, May 5, 2023 at 11:23 AM Jonathan Ellis wrote: > +10 for not inflicting unwieldy keywords on ML users. > > Re Josh's summary, mostly agreed, my only objection to adding the DENSE > keyword is that I don'

Re: [POLL] Vector type for ML

2023-05-05 Thread Jonathan Ellis
+10 for not inflicting unwieldy keywords on ML users. Re Josh's summary, mostly agreed, my only objection to adding the DENSE keyword is that I don't see a foreseeable future where we also support sparse vectors, so it would end up being unnecessary extra verbosity. So my preference would be 1.

Re: [POLL] Vector type for ML

2023-05-05 Thread David Capwell
> The hnsw index can be built just as easily from a non-frozen array. I have 0 issues removing that limitation =) > I am in favour of enforcing non-null on the elements of an array by default. This is why I feel DENSE or NON NULL are the best prefixes, as both imply that elements may not be null

Re: [POLL] Vector type for ML

2023-05-05 Thread Patrick McFadin
I hope we are willing to consider developers that use our system because if I had to teach people to use "NON-NULL FROZEN" I'm pretty sure the response would be: Did you tell me to go write a distributed map-reduce job in Erlang? I believe I did, Bob. On Fri, May 5, 2023 at 8:05 AM Josh McKenzie

Re: [POLL] Vector type for ML

2023-05-05 Thread Josh McKenzie
Idiomatically, to my mind, there's a question of "what space are we thinking about this datatype in"? - In the context of mathematics, nullability in a vector would be 0 - In the context of Cassandra, nullability tends to mean a tombstone (or nothing) - In the context of programming languages, i

Re: [POLL] Vector type for ML

2023-05-05 Thread Patrick McFadin
I think we are still discussing implementation here when I'm talking about developer experience. I want developers to adopt this quickly, easily and be successful. Vector search is already a thing. People use it every day. A successful outcome, in my view, is developers picking up this feature with

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-05 Thread SAURABH VERMA
+1 We at zeotap did something similar with ScyllaDB and JanusGraph for Spark graph OLAP use cases; this is truly transformative, C* HTAP On Fri, May 5, 2023 at 4:15 PM Sam Tunnicliffe wrote: > +1 > > On 4 May 2023, at 17:46, Doug Rohrer wrote: > > Hello all, > > I’d like to put CEP-28 to a vo

Re: [POLL] Vector type for ML

2023-05-05 Thread Mike Adamson
> > Then we can have the indexing apparatus only accept *frozen* for > the HNSW case. > I'm inclined to agree with Benedict that the index will need to be specifically selected by option rather than inferred based on type. As such there is no real reason for the *frozen* requirement on the type. The

[ANNOUNCE] Apache Cassandra 3.0.29 test artifact available

2023-05-05 Thread Miklosovic, Stefan
The test build of Cassandra 3.0.29 is available. sha1: 087cffce636b63c12e328994d52bdf8f4ccc9750 Git: https://github.com/apache/cassandra/tree/3.0.29-tentative Maven Artifacts: https://repository.apache.org/content/repositories/orgapachecassandra-1288/org/apache/cassandra/cassandra-all/3.0.29/ Th

[RELEASE] Apache Cassandra 3.11.15 released

2023-05-05 Thread Miklosovic, Stefan
The Cassandra team is pleased to announce the release of Apache Cassandra version 3.11.15. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance. http://cassandra.apache.org/ Downloads of sourc

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-05 Thread Sam Tunnicliffe
+1 > On 4 May 2023, at 17:46, Doug Rohrer wrote: > > Hello all, > > I’d like to put CEP-28 to a vote. > > Proposal: > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics > > Jira: > https://issues.apache.org/jira/brow

Re: [VOTE] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-05-05 Thread Aleksey Yeshchenko
+1 > On 4 May 2023, at 17:46, Doug Rohrer wrote: > > Hello all, > > I’d like to put CEP-28 to a vote. > > Proposal: > > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics > > Jira: > https://issues.apache.org/jira/brow

Re: [VOTE] Release Apache Cassandra 3.11.15

2023-05-05 Thread Miklosovic, Stefan
The vote has passed. From: Tommy Stendahl via dev Sent: Thursday, May 4, 2023 10:27 To: dev@cassandra.apache.org Subject: Re: [VOTE] Release Apache Cassandra 3.11.15