Love it. Thank you folks for coming to a decision on this. This is very
helpful for moving forward with planning for the current Python frameworks:
- Langchain.CassandraVectorStore
- Langchain.CassandraVectorRetriever
- Langchain.CassandraVectorStoreAgent
- LlamaIndex.CassandraVectorLoad
https://issues.apache.org/jira/browse/CASSANDRA-18504
> On May 5, 2023, at 12:27 PM, David Capwell wrote:
>
> Yep, fair point…. SPARSE VECTOR better maps to NON NULL MAP
>
>> On May 5, 2023, at 11:58 AM, David Capwell wrote:
>>
>>> If we ever add sparse vectors, we can assume that DENSE is t
Yep, fair point…. SPARSE VECTOR better maps to NON NULL MAP
> On May 5, 2023, at 11:58 AM, David Capwell wrote:
>
>> If we ever add sparse vectors, we can assume that DENSE is the default and
>> allow to use either DENSE, SPARSE or nothing.
>
> I have been feeling that sparse is just a fixed s
Sparse vector in ML has the semantics that elements not explicitly set are
zero. I believe most (all?) sparse vector implementations use a map under
the hood; the point is to save a lot of space when you have 10K zeros and
100 that are nonzero.
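
To make the map-under-the-hood point concrete, here is a minimal Python sketch, not taken from any Cassandra or ML library. The helper names to_dense and to_sparse are hypothetical and only illustrate the "unset elements are zero" semantics described above.

```python
from typing import Dict, List


def to_dense(sparse: Dict[int, float], dimension: int) -> List[float]:
    """Expand an index -> value map into a dense list (hypothetical helper).

    Any index not present in the map is treated as zero, which is the
    sparse-vector convention described above.
    """
    dense = [0.0] * dimension
    for index, value in sparse.items():
        if not 0 <= index < dimension:
            raise IndexError(f"index {index} is outside dimension {dimension}")
        dense[index] = value
    return dense


def to_sparse(dense: List[float]) -> Dict[int, float]:
    """Keep only the nonzero elements, keyed by position (hypothetical helper)."""
    return {index: value for index, value in enumerate(dense) if value != 0}
```

With 10K zeros and 100 nonzero values, the map holds only the 100 entries, while the dense list holds all 10,100.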
On Fri, May 5, 2023 at 2:00 PM David Capwell wrote:
> If we ever add sparse vectors, we can assume that DENSE is the default and
> allow to use either DENSE, SPARSE or nothing.
I have been feeling that sparse is just a fixed size list with nulls… so
array… if you insert {0: 42, 3: 17} then you get an array of
[42, null, null, 17]? One negative d
My vote is:
1. VECTOR
2. DENSE VECTOR
3. type[dimension]
If we ever add sparse vectors, we can assume that DENSE is the default and
allow to use either DENSE, SPARSE or nothing.
Perhaps the dimension could be separated from the type, such as in
VECTOR[dimension] or VECTOR(dimension).
On Fri, 5
>> ...where, just to be clear, VECTOR means a frozen fixed
>> size array w/ no null values?
> Assuming this is the case
The current agreed requirements are:
1) non-null elements
2) fixed length
3) frozen
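
As a rough client-side illustration of these three requirements (not Cassandra's actual implementation), the sketch below checks them in Python. The function name validate_vector is hypothetical, and an immutable tuple is used here only as an analogy for "frozen".

```python
from typing import Optional, Sequence, Tuple


def validate_vector(values: Sequence[Optional[float]], dimension: int) -> Tuple[float, ...]:
    """Check the three agreed requirements and return an immutable tuple.

    Hypothetical helper: this mirrors the requirements listed above and is
    not how Cassandra validates the type internally.
    """
    # 2) fixed length: the value must have exactly `dimension` elements
    if len(values) != dimension:
        raise ValueError(f"expected {dimension} elements, got {len(values)}")
    # 1) non-null elements: no position may be None
    if any(value is None for value in values):
        raise ValueError("vector elements must not be null")
    # 3) frozen: return a tuple so the value can only be replaced as a whole
    return tuple(float(value) for value in values)
```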
You pointed out 3 isn’t actually required, but that would be a different
conversation to
>
> ...where, just to be clear, VECTOR means a frozen fixed
> size array w/ no null values?
>
Assuming this is the case, my vote is:
1. VECTOR
2. DENSE VECTOR
I don't really have a 3rd vote because I think that *type[dimension]* is
too ambiguous.
On Fri, 5 May 2023 at 18:32, Derek Chen-Becker
LOL, I'm holding you to that at the summit :) In all seriousness, I'm glad
to see a robust debate around it. I guess for completeness, my order of
preference is
1 - NONNULL FROZEN>
2 - NONNULL TYPE (which part of this implies frozen? The NONNULL or the
cardinality?)
3 - DENSE_VECTOR
I guess my ma
Derek, despite your preference, I would hang out with you at a party.
On Fri, May 5, 2023 at 9:44 AM Derek Chen-Becker
wrote:
> Speaking as someone who likes Erlang, maybe that's why I also like NONNULL
> FROZEN>. It's unambiguous what Cassandra is going to do with that
> type. DENSE VECTOR mean
My vote is:
1. DENSE VECTOR
2. VECTOR
3. ARRAY
On Fri, May 5, 2023 at 9:43 AM David Capwell wrote:
> Went through and created a spreadsheet of current votes… For Patrick and
> Mike, I don’t see a clear vote, so I put a ? where I “think” your
> preference is… for Mick, I only put one vote as the
Sorry, DENSE_VECTOR was pointing to the wrong row, updated score
Syntax                  Score
VECTOR                  16
DENSE VECTOR            11
type[dimension]         9
NON NULL [dimension]    6
VECTOR type[n]          5
DENSE_VECTOR            3
NON-NULL FROZEN         3
ARRAY                   0
> On May 5, 2023, at 10:01 AM, David Capwell wrote:
>
> Updated
>
> Syntax
> Jonathan Ellis
Updated
Syntax
Jonathan Ellis
David Capwell
Josh McKenzie
Caleb Rackliffe
Patrick McFadin
Brandon Williams
Mike Adamson
Benedict
Mick Semb Wever
Derek Chen-Becker
VECTOR
1
2
2
1
?
3
2
DENSE VECTOR
2
1
?
?
type[dimension]
3
3
3
1
3
2
DENSE_VECTOR
1
NON NULL [dimension]
1
On Fri, 5 May 2023 at 18:43, David Capwell wrote:
> Went through and created a spreadsheet of current votes… For Patrick and
> Mike, I don’t see a clear vote, so I put a ? where I “think” your
> preference is… for Mick, I only put one vote as the list looked like a
> summary, but you mentioned th
Speaking as someone who likes Erlang, maybe that's why I also like NONNULL
FROZEN>. It's unambiguous what Cassandra is going to do with that
type. DENSE VECTOR means I need to go read docs (and then probably
double-check in the source to be sure) to be sure what exactly is going on.
Cheers,
Derek
Went through and created a spreadsheet of current votes… For Patrick and Mike,
I don’t see a clear vote, so I put a ? where I “think” your preference is… for
Mick, I only put one vote as the list looked like a summary, but you mentioned
the first was your preference
Syntax
Jonathan Ellis
David
...where, just to be clear, VECTOR means a frozen fixed
size array w/ no null values?
On Fri, May 5, 2023 at 11:23 AM Jonathan Ellis wrote:
> +10 for not inflicting unwieldy keywords on ML users.
>
> Re Josh's summary, mostly agreed, my only objection to adding the DENSE
> keyword is that I don'
+10 for not inflicting unwieldy keywords on ML users.
Re Josh's summary, mostly agreed, my only objection to adding the DENSE
keyword is that I don't see a foreseeable future where we also support
sparse vectors, so it would end up being unnecessary extra verbosity. So
my preference would be
1.
> The hnsw index can be built just as easily from a non-frozen array.
I have 0 issues removing that limitation =)
> I am in favour of enforcing non-null on the elements of an array by default.
This is why I feel DENSE or NON NULL are the best prefix, as those both imply
elements may not be null
I hope we are willing to consider developers that use our system because if
I had to teach people to use "NON-NULL FROZEN" I'm pretty sure the
response would be:
Did you tell me to go write a distributed map-reduce job in Erlang? I
believe I did, Bob.
On Fri, May 5, 2023 at 8:05 AM Josh McKenzie
Idiomatically, to my mind, there's a question of "what space are we thinking
about this datatype in"?
- In the context of mathematics, nullability in a vector would be 0
- In the context of Cassandra, nullability tends to mean a tombstone (or
nothing)
- In the context of programming languages, i
I think we are still discussing implementation here when I'm talking about
developer experience. I want developers to adopt this quickly, easily and
be successful. Vector search is already a thing. People use it every day. A
successful outcome, in my view, is developers picking up this feature
with
+1
We at zeotap did something similar with Scylla DB and Janusgraph for Spark
graph OLAP use cases, this is truly transformative, C* HTAP
On Fri, May 5, 2023 at 4:15 PM Sam Tunnicliffe wrote:
> +1
>
> On 4 May 2023, at 17:46, Doug Rohrer wrote:
>
> Hello all,
>
> I’d like to put CEP-28 to a vo
>
> Then we can have the indexing apparatus only accept *frozen* for
> the HNSW case.
>
I'm inclined to agree with Benedict that the index will need to be
specifically selected by option rather than inferred based on type. As such
there is no real reason for the *frozen* requirement on the type. The
The test build of Cassandra 3.0.29 is available.
sha1: 087cffce636b63c12e328994d52bdf8f4ccc9750
Git: https://github.com/apache/cassandra/tree/3.0.29-tentative
Maven Artifacts:
https://repository.apache.org/content/repositories/orgapachecassandra-1288/org/apache/cassandra/cassandra-all/3.0.29/
Th
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.11.15.
Apache Cassandra is a fully distributed database. It is the right choice when
you need scalability and high availability without compromising performance.
http://cassandra.apache.org/
Downloads of sourc
+1
> On 4 May 2023, at 17:46, Doug Rohrer wrote:
>
> Hello all,
>
> I’d like to put CEP-28 to a vote.
>
> Proposal:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics
>
> Jira:
> https://issues.apache.org/jira/brow
+1
> On 4 May 2023, at 17:46, Doug Rohrer wrote:
>
> Hello all,
>
> I’d like to put CEP-28 to a vote.
>
> Proposal:
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-28%3A+Reading+and+Writing+Cassandra+Data+with+Spark+Bulk+Analytics
>
> Jira:
> https://issues.apache.org/jira/brow
The vote has passed.
From: Tommy Stendahl via dev
Sent: Thursday, May 4, 2023 10:27
To: dev@cassandra.apache.org
Subject: Re: [VOTE] Release Apache Cassandra 3.11.15