Re: Constraint's "not null" alignment with transactions and their simplification

Bernardo Botella Mon, 14 Apr 2025 12:14:26 -0700

Now this is becoming a really interesting discussion. Thanks everyone for 
pitching in!

Here is my take on some of the proposed changes:

We are talking about treating some constraints (NOT_NULL, JSON) as special 
cases by omitting the CHECK keyword (not reserved as per current 
implementation). Now, while this may seem like a nice approach to feature gaps 
on our CQL, it really worries me that by doing so we open the door to not 
needed complexity both at implementation and conceptually with the constraints 
framework. 

In my mind, what’s the constraints framework? It is a simple and really easy to 
extend integration point for validators for a row. (LENGHT, SCALAR, REGEX, are 
really good examples of it).

What’s NOT the responsibility of the constraints framework? I don’t think this 
should be used to deliver partial solutions to feature gaps on CQL data 
modeling. Let’s take JSON constraint as an example. In the constraints case, it 
is as simple as checking that the provided string is valid json. Easy. Simple. 
But, how would JSON look like if it was a first class citizen in CQL? Setting 
the grammar aside, it would be handled differently probably. Things like: Can 
we store it better? Do we allow making queries for fields inside the json blob? 
Are there any optimizations that can be done when serializing/deserializing it? 
All of those definitely fall out of the scope of the constraints framework. So, 
I guess the question then becomes, is the JSON constraint a valid constraint to 
have? Just a temporal patch until (if) JSON type is in? Should we just remove 
it and keep ignoring JSON? Those are valid questions and discussions to have. 
But, I really think that we shouldn’t see this simple validator as a full 
fledged, first class citizen, type in CQL. Similar arguments could be have for 
the NOT_NULL constraint that has spawned so many interesting conversations.

Now, having made that distinction, I don’t think we should have constraints 
that can be defined differently on the CQL statement. They should all have a 
CHECK keyword, specifying that they are a constraint that will be checked (aka, 
row value will be validated against whatever function). That’s easy to 
identify, and it’s conceptually easy to understand the limitations it comes 
with (as opposed to the JSON example mentioned above).

Bernardo

> On Apr 14, 2025, at 10:53 AM, Štefan Miklošovič <smikloso...@apache.org> 
> wrote:
> 
> As Yifan already said, "check" is not a reserved word now and its usage does 
> not collide with anything. 
> 
> If people have columns, tables, keyspaces with name "check" that seems to 
> work already so they don't need to do anything:
> 
> CREATE TABLE ks.tb (id int check id > 0, val int check val > 0, primary key 
> (id));
> 
> ALTER TABLE ks.tb ADD check int check check > 0;
> 
> DESCRIBE ks.tb;
> 
> CREATE TABLE ks.tb (
>     id int CHECK id > 0 PRIMARY KEY,
>     check int CHECK check > 0,
>     val int CHECK val > 0
> ) ....
> 
> CREATE TABLE ks.check (id int check id > 0, check int check check > 0, 
> primary key (id));
> CREATE KEYSPACE check WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> CREATE TABLE check.check (check int check check > 0, val int check val > 0, 
> primary key (check));
> INSERT INTO check.check (check , val ) VALUES ( 1, 1);
> 
> PostgreSQL has this:
> 
> CREATE TABLE products (
>     product_no integer,
>     name text,
>     price numeric CHECK (price > 0)
> );
> 
> we follow this approach (minus parenthesis). We can also chain constraints 
> whatever we like
> 
> val int CHECK val > 0 and age < 100
> 
> We can make a stab in trying to model
> 
> val int not null check val > 0
> 
> this is how PostgreSQL has it (1).
> 
> but that would be more complicated on the implementation side because we 
> would need to also accommodate "CQL describe" to dump it like that, plus I am 
> not sure how complicated it would be to tweak the parser as well.
> 
> I will try to make some progress and will report back.
> 
> Regards  
> 
> (1) 
> https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-NOT-NULL
> 
> On Sun, Apr 13, 2025 at 6:49 PM Dinesh Joshi <djo...@apache.org 
> <mailto:djo...@apache.org>> wrote:
>> On Sun, Apr 13, 2025 at 9:24 AM Patrick McFadin <pmcfa...@gmail.com 
>> <mailto:pmcfa...@gmail.com>> wrote:
>>> I'm loving all the syntax discussion lately. It's a good debate and 
>>> essential for the project's future with a good developer experience.
>> 
>> +1
>>  
>>> On NULL. I've been asked this a million times by end users. Why is there no 
>>> "NOT NULL" in the schema?
>> 
>> I would've expected this to be in billions by now ;)
>>  
>>> I'm in favor of the standard SQL syntax here because it's what users have 
>>> been using forever: 
>>> name       text NOT NULL
>> 
>> I hold a weak opinion on this. We don't have to necessarily align on to the 
>> standard SQL syntax. In my experience, users subconsciously feel Cassandra 
>> is a SQL database and try to design their schema to fit the traditional SQL 
>> / RDBMS design and then later are disappointed to find out it doesn't have 
>> joins or foreign key constraints. But that's just my personal experience 
>> working with users. However, I think we should strive for consistency and if 
>> it aligns with SQL I have no issues with the syntax. Just sharing my 
>> experience.
>>  
>>> On CHECK. Also in favor of making this a reserved word but in context. 
>>> Namely, how Postgres SQL works. CHECK ( boolean_expression_on_column)
>> 
>> Making CHECK a reserved keyword may result in issues that Scott described. 
>> It will present a substantial barrier for users to upgrade as applications 
>> will have to be updated.
>> 
>> Thanks,
>> 
>> Dinesh

Re: Constraint's "not null" alignment with transactions and their simplification

Reply via email to