I found some other confusing behavior in LWT around null value and empty multicell collection. I opened CASSANDRA-19637 <https://issues.apache.org/jira/browse/CASSANDRA-19637>for those interested.
Le jeu. 4 avr. 2024 à 18:34, Caleb Rackliffe <calebrackli...@gmail.com> a écrit : > The easiest way to check out how Accord uses IS NULL and IS NOT NULL is to > look at the examples in the cep-15-accord branch: > > > https://github.com/apache/cassandra/blob/cep-15-accord/test/distributed/org/apache/cassandra/distributed/test/accord/AccordCQLTestBase.java > > tl;dr We did indeed try to go with an approach that more closely matches > SQL, although there may still be some edges we didn't test. > > I'd have no problem w/ moving to 3-value logic everywhere, I guess, but > "everywhere" could just mean filtering queries and Accord. (i.e. If we want > to deprecate LWT col = NULL syntax, do we really want people rewriting > those LWTs...or just moving to the new Accord syntax, which obviously > supports it? We should "spend" our user query rewrite budget wisely.) > > On Thu, Apr 4, 2024 at 4:53 AM Benjamin Lerer <ble...@apache.org> wrote: > >> Now, in Cassandra setting a column to null means deleting it and if *all* >>> columns in a row are null the row is deleted. This might be another edge >>> case... >> >> >> It is slightly more complicated than that as the primary key columns >> count in (*all* columns) >> >> For example if you have the following table: CREATE TABLE tlb (pk int, c >> int, v int, PRIMARY KEY (pk, c)) and the following row: INSERT INTO tlb >> (pk, c, v) VALUES (1, 1, 1) >> deleting the column v (DELETE v FROM %s WHERE pk = 1 AND c = 1) will not >> delete the row as the primary key columns have a timestamp and therefore do >> exist. So the row will still exist with a null value for column v. >> >> If the row was created through an UPDATE (UPDATE tlb SET v = 1 WHERE pk = >> 1 AND c = 1) things will be different as an UPDATE statement do not create >> a timestamp for the clustering columns. By consequence, if V is deleted >> (set to null) the row will not have any columns anymore and will be deleted. >> >> The issue here is that in practice we never consider partition keys or >> clustering columns as null if the database returns a row for it. Whether >> the primary key columns have a timestamp or not. >> I believe that we should ignore that fact too as far as IS NULL/IS NOT >> NULL are concerned. If a row exists, its primary columns should be >> considered as not null. Otherwise we are getting on a really slippery >> slope. The INSERT/UPDATE logic is confusing enough in my opinion without >> adding another layer to it. >> >> One other issue that we have though is that the code for != LWT does not >> work with the Three-Valued logic. If you have: [...] IF col != 1 and col >> is null then in the TVL the value should be UNKNOWN therefore the condition >> should not match. >> It feels to me that we should probably keep that behavior for backward >> compatibility reasons but probably change the behavior in Accord if it is >> not already done. >> >> >> >> Le jeu. 21 mars 2024 à 01:10, German Eichberger via dev < >> dev@cassandra.apache.org> a écrit : >> >>> Hi, >>> >>> +1 I like doing it the SQL way. This makes sense to me. >>> >>> Now, in Cassandra setting a column to null means deleting it and if >>> *all* columns in a row are null the row is deleted. This might be >>> another edge case... >>> >>> German >>> ------------------------------ >>> *From:* Benjamin Lerer <b.le...@gmail.com> >>> *Sent:* Wednesday, March 20, 2024 9:15 AM >>> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org> >>> *Subject:* [EXTERNAL] [DISCUSS] NULL handling and the unfrozen >>> collection issue >>> >>> You don't often get email from b.le...@gmail.com. Learn why this is >>> important <https://aka.ms/LearnAboutSenderIdentification> >>> Hi everybody, >>> >>> CEP-29 (CQL NOT Operator) is hitting the grey area of how we want as a >>> community to handle NULL including for things like unfrozen (multi-cell) >>> collections and I would like to make a proposal for moving forward with >>> NULL related issues. >>> >>> We have currently 2 tickets open about NULL handling (I might have >>> missed others): >>> >>> 1. CASSANDRA-10715 >>> <https://issues.apache.org/jira/browse/CASSANDRA-10715>: Allowing >>> Filtering on NULL >>> 2. CASSANDRA-17762 >>> <https://issues.apache.org/jira/browse/CASSANDRA-17762>: LWT IF col >>> = NULL is inconsistent with SQL NULL >>> >>> We also had previously some discussion on which we touched the subject: >>> >>> - [DISCUSS] LWT UPDATE semantics with + and - when null >>> - CEP-15 multi key transaction syntax >>> >>> In all those tickets and discussions the consensus was to have a >>> behavior similar to SQL. >>> >>> For null comparisons, SQL uses the three-value logic ( >>> https://modern-sql.com/concept/three-valued-logic) introducing the need >>> for IS NULL and IS NOT NULL operators. Those conflict with the col = NULL >>> predicate supported in LWT conditions (CASSANDRA-17762 >>> <https://issues.apache.org/jira/browse/CASSANDRA-17762>). >>> >>> So far, as Cassandra was only using inclusive operators, comparisons >>> were behaving in an expected way. According to three-valued logic NULL >>> CONTAINS 'foo' should return UNKNOWN and the filtering behavior should >>> exclude everything which is not true.Therefore the row should not be >>> returned as expected. With exclusive operators things are more tricky. >>> NULL NOT CONTAINS 'foo' will also return UNKNOWN causing the row to not be >>> returned which might not match people's expectations. >>> This behavior can be even more confusing once you take into account >>> empty and null collections. NOT CONTAINS on an empty collection will return >>> true while it will return UNKNOWN on a NULL collection. Unfortunately, for >>> unfrozen (multicell) collections we are unable to differentiate between an >>> empty and null collection and therefore always treat empty collections as >>> NULL. >>> For predicates such as map[myKey] != 'foo' when myKey is not present the >>> result can also be surprising as it will end up comparing NULL to 'foo' >>> returning once more UNKNOWN and ignoring the row. >>> In order to respect the SQL three-valued logic and be able to allow the >>> user to fetch all the rows which do not contains a specific value we would >>> need support IS NULL, IS NOT NULL and OR to allow query like: >>> WHERE c IS NULL OR c NOT CONTAINS 'foo' / WHERE m IS NULL OR m[myKey] != >>> foo >>> >>> Supporting the three-valued logic makes sense to me even if some >>> behavior might end up being confusing. In which case we can easily fix >>> CASSANDRA-10715 and deprectate support for col = NULL/col != NULL in LWT. >>> >>> What is people's opinion? Should we go for the three-valued logic >>> everywhere? Should we try something else? >>> >>> >>> >>> >>> >>>