Re: [DISCUSS] NULL handling and the unfrozen collection issue

Benjamin Lerer Thu, 04 Apr 2024 02:53:34 -0700

>
> Now, in Cassandra setting a column to null means deleting it and if *all*
> columns in a row are null the row is deleted. This might be another edge
> case...



It is slightly more complicated than that as the primary key columns count
in (*all* columns)

For example if you have the following table: CREATE TABLE tlb (pk int, c
int, v int, PRIMARY KEY (pk, c)) and the following row: INSERT INTO tlb
(pk, c, v) VALUES (1, 1, 1)
deleting the column v (DELETE v FROM %s WHERE pk = 1 AND c = 1) will not
delete the row as the primary key columns have a timestamp and therefore do
exist. So the row will still exist with a null value for column v.

If the row was created through an UPDATE (UPDATE tlb SET v = 1 WHERE pk = 1
AND c = 1) things will be different as an UPDATE statement do not create a
timestamp for the clustering columns. By consequence, if V is deleted (set
to null) the row will not have any columns anymore and will be deleted.

The issue here is that in practice we never consider partition keys or
clustering columns as null if the database returns a row for it. Whether
the primary key columns have a timestamp or not.
I believe that we should ignore that fact too as far as IS NULL/IS NOT NULL
are concerned. If a row exists, its primary columns should be considered as
not null. Otherwise we are getting on a really slippery slope. The
INSERT/UPDATE logic is confusing enough in my opinion without adding
another layer to it.

One other issue that we have though is that the code for != LWT does not
work with the Three-Valued logic. If you have: [...] IF col != 1  and col
is null then in the TVL the value should be UNKNOWN therefore the condition
should not match.
It feels to me that we should probably keep that behavior for backward
compatibility reasons but probably change the behavior in Accord if it is
not already done.



Le jeu. 21 mars 2024 à 01:10, German Eichberger via dev <
dev@cassandra.apache.org> a écrit :

> Hi,
>
> +1 I like doing it the SQL way. This makes sense to me.
>
> Now, in Cassandra setting a column to null means deleting it and if *all*
> columns in a row are null the row is deleted. This might be another edge
> case...
>
> German
> ------------------------------
> *From:* Benjamin Lerer <b.le...@gmail.com>
> *Sent:* Wednesday, March 20, 2024 9:15 AM
> *To:* dev@cassandra.apache.org <dev@cassandra.apache.org>
> *Subject:* [EXTERNAL] [DISCUSS] NULL handling and the unfrozen collection
> issue
>
> You don't often get email from b.le...@gmail.com. Learn why this is
> important <https://aka.ms/LearnAboutSenderIdentification>
> Hi everybody,
>
> CEP-29 (CQL NOT Operator) is hitting the grey area of how we want as a
> community to handle NULL including for things like unfrozen (multi-cell)
> collections and I would like to make a proposal for moving forward with
> NULL related issues.
>
> We have currently 2 tickets open about NULL handling (I might have missed
> others):
>
>    1. CASSANDRA-10715
>    <https://issues.apache.org/jira/browse/CASSANDRA-10715>: Allowing
>    Filtering on NULL
>    2. CASSANDRA-17762
>    <https://issues.apache.org/jira/browse/CASSANDRA-17762>: LWT IF col =
>    NULL is inconsistent with SQL NULL
>
> We also had previously some discussion on which we touched the subject:
>
>    - [DISCUSS] LWT UPDATE semantics with + and - when null
>    - CEP-15 multi key transaction syntax
>
> In all those tickets and discussions the consensus was to have a behavior
> similar to SQL.
>
> For null comparisons, SQL uses the three-value logic (
> https://modern-sql.com/concept/three-valued-logic) introducing the need
> for IS NULL and IS NOT NULL operators. Those conflict with the col = NULL
> predicate supported in LWT conditions (CASSANDRA-17762
> <https://issues.apache.org/jira/browse/CASSANDRA-17762>).
>
> So far, as Cassandra was only using inclusive operators, comparisons were
> behaving in an expected way. According to three-valued logic NULL CONTAINS
> 'foo' should return UNKNOWN and the filtering behavior should exclude
> everything which is not true.Therefore the row should not be returned as
> expected. With exclusive operators things are more tricky. NULL NOT
> CONTAINS 'foo' will also return UNKNOWN causing the row to not be returned
> which might not match people's expectations.
> This behavior can be even more confusing once you take into account empty
> and null collections. NOT CONTAINS on an empty collection will return true
> while it will return UNKNOWN on a NULL collection. Unfortunately, for
> unfrozen (multicell) collections we are unable to differentiate between an
> empty and null collection and therefore always treat empty collections as
> NULL.
> For predicates such as map[myKey] != 'foo' when myKey is not present the
> result can also be surprising as it will end up comparing NULL to 'foo'
> returning once more UNKNOWN and ignoring the row.
> In order to respect the SQL three-valued logic and be able to allow the
> user to fetch all the rows which do not contains a specific value we would
> need support IS NULL, IS NOT NULL and OR to allow query like:
> WHERE c IS NULL OR c NOT CONTAINS 'foo' / WHERE m IS NULL OR m[myKey] !=
> foo
>
> Supporting the three-valued logic makes sense to me even if some behavior
> might end up being confusing. In which case we can easily fix
> CASSANDRA-10715 and deprectate support for col = NULL/col != NULL in LWT.
>
> What is people's opinion? Should we go for the three-valued logic
> everywhere? Should we try something else?
>
>
>
>
>
>

Re: [DISCUSS] NULL handling and the unfrozen collection issue

Reply via email to