Hi everyone,

Stefan Miklosovic and I have been working on a NOT_NULL 
(https://github.com/apache/cassandra/pull/3867) constraint to be added to the 
constraints tool belt, and a really interesting conversation came up.

First, as a problem statement, let's consider this:

-----------------------------------------
CREATE TABLE ks.tb2 (
    id int,
    cl1 int,
    cl2 int,
    val text CHECK NOT_NULL(val),
    PRIMARY KEY (id, cl1, cl2)
) 

cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, null);
InvalidRequest: Error from server: code=2200 [Invalid query] message="Column 
value does not satisfy value constraint for column 'val' as it is null."

cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2, val) VALUES ( 1, 2, 3, 
“text");
cassandra@cqlsh> select * from ks.tb2;

 id | cl1 | cl2 | val
----+-----+-----+------
  1 |   2 |   3 | text

(1 rows)
cassandra@cqlsh> INSERT INTO ks.tb2 (id, cl1, cl2) VALUES ( 1, 2, 4);
cassandra@cqlsh> select * from ks.tb2;

 id | cl1 | cl2 | val
----+-----+-----+------
  1 |   2 |   3 | text
  1 |   2 |   4 | null

-----------------------------------------

As you see, we have a hole in which a 'null' value is getting written on column 
val even if we have a NOT_NULL on that particular column whenever the column is 
NOT specified on the write. That raises the question on how this particular 
constraint should behave.

If we consider the other constraints (scalar constraint and length constraint 
so far), this particular behavior is fine. But, if the constraint is NOT_NULL, 
then it becomes a little bit trickier.

The conclusions we have reached is that the meaning of constraints should be 
interpreted like: I check whatever you give me as part of the write, ignoring 
everything else. Let me elaborate:
If we decide to treat this particular NOT_NULL constraint differently, and 
check if the value for that column is present in the insert statement, we then 
open a different can of worms. What happens if the row already exists with a 
valid value, and that insert statement is only trying to do an update to a 
different column in the row? If that was the case, we would be forcing the user 
to specify the 'val' column value for every update, even if it is not needed. 

Mainly for this reason, we think it is better to treat this NOT_NULL constraint 
just like the other constraints, and execute it ONLY on the values that are 
present on the insert statement.

The main con is that it may lead to a little bit of confussion (as in, why I 
just added a null value to the table even if I have a NOT_NULL constraint?). We 
have thought on aliviating this particular confusion by:
- Extensive documentation. Let's be upfront on what this constraint does and 
does not. 
(https://github.com/apache/cassandra/blob/ed58c404e8c880b69584e71a3690d3d9f73ef9fa/doc/modules/cassandra/pages/developing/cql/constraints.adoc#not_null-constraint)
- Adding, as part of this patch, yet another constraint (STRICTLY_NOT_NULL), 
that checks for the actual column value to be present in the insert statement..

If you've made it until here, that means you are really interested in 
constraints. Thanks! The question for you is, would you have any concern with 
this approach?

Thanks,
Bernardo

Reply via email to