Re: CEP-15 multi key transaction syntax

Avi Kivity via dev Mon, 22 Aug 2022 01:36:05 -0700

Agree it's better to reuse existing syntax than invent new syntax.


On 8/21/22 16:52, Konstantin Osipov wrote:

* Avi Kivity via dev <[email protected]> [22/08/14 15:59]:

MySQL supports SELECT <expr_list> INTO <var_list> FROM ... WHERE
...

PostgreSQL supports pretty much the same syntax.

Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
MySQL/PostgreSQL SELECT ... INTO?

On 14/08/2022 01.29, Benedict Elliott Smith wrote:


I’ll do my best to express with my thinking, as well as how I would
explain the feature to a user.

My mental model for LET statements is that they are simply SELECT
statements where the columns that are selected become variables
accessible anywhere in the scope of the transaction. That is to say, you
should be able to run something like s/LET/SELECT and
s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
and produce a valid SELECT statement, and vice versa. Both should
perform identically.

e.g.
SELECT pk AS key, v AS value FROM table

=>
LET key = pk, value = v FROM table


"=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
supports selecting comparisons:


$ psql
psql (14.3)
Type "help" for help.

avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
  ?column? | ?column? | ?column?
----------+----------+----------
  f        | t        |
(1 row)


Using "=" as a syntactic element in LET would make SELECT and LET
incompatible once comparisons become valid selectors. Unless they become
mandatory (and then you'd write "LET q = a = b" if you wanted to select a
comparison).


I personally prefer the nested query syntax:


     LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);


So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
immediately recognizable by everyone as a query, LET is not.

Identical form, identical behaviour. Every statement should be directly
translatable with some simple text manipulation.

We can then make this more powerful for users by simply expanding SELECT
statements, e.g. by permitting them to declare constants and tuples in
the column results. In this scheme LET x = * is simply syntactic sugar
for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
and 5 all at once, consistently alongside each other.

Option 6 is in fact very similar, but is strictly less flexible for the
user as they have no way to declare multiple scalar variables without
scoping them inside a tuple.

e.g.
LET key = pk, value = v FROM table
IF key > 1 AND value > 1 THEN...

=>
LET row = SELECT pk AS key, v AS value FROM table
IF row.key > 1 AND row.value > 1 THEN…

However, both are expressible in the existing proposal, as if you prefer
this naming scheme you can simply write

LET row = (pk AS key, v AS value) FROM table
IF row.key > 1 AND row.value > 1 THEN…

With respect to auto converting single column results to a scalar, we do
need a way for the user to say they care whether the row was null or the
column. I think an implicit conversion here could be surprising. However
we could implement tuple expressions anyway and let the user explicitly
declare v as a tuple as Caleb has suggested for the existing proposal as
well.

Assigning constants or other values not selected from a table would also
be a little clunky:

LET v1 = someFunc(), v2 = someOtherFunc(?)
IF v1 > 1 AND v2 > 1 THEN…

=>
LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
IF row.v1 > 1 AND row.v2 > 1 THEN...

That said, the proposals are /close/ to identical, it is just slightly
more verbose and slightly less flexible.

Which one would be most intuitive to users is hard to predict. It might
be that Option 6 would be slightly easier, but I’m unsure if there would
be a huge difference.

On 13 Aug 2022, at 16:59, Patrick McFadin <[email protected]> wrote:

I'm really happy to see CEP-15 getting closer to a final
implementation. I'm going to walk through my reasoning for your
proposals wrt trying to explain this to somebody new.

Looking at all the options, the first thing that comes up for me is
the Cassandra project's complicated relationship with NULL.  We have
prior art with EXISTS/NOT EXISTS when creating new tables. IS
NULL/IS NOT NULL is used in materialized views similarly to
proposals 2,4 and 5.

CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
   AS SELECT [ (column_list) ]
   FROM [keyspace_name.]table_name
   [ WHERE column_name IS NOT NULL
   [ AND column_name IS NOT NULL ... ] ]
   [ AND relation [ AND ... ] ]
   PRIMARY KEY ( column_list )
   [ WITH [ table_properties ]
   [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;

  Based on that, I believe 1 and 3 would just confuse users, so -1 on
those.

Trying to explain the difference between row and column operations
with LET, I can't see the difference between a row and column in #2.

#4 introduces a boolean instead of column names and just adds more
syntax.

#5 is verbose and, in my opinion, easier to reason when writing a
query. Thinking top down, I need to know if these exact rows and/or
column values exist before changing them, so I'll define them first.
Then I'll iterate over the state I created in my actual changes so I
know I'm changing precisely what I want.

#5 could use a bit more to be clearer to somebody who doesn't write
CQL queries daily and wouldn't require memorizing subtle
differences. It should be similar to all the other syntax, so
learning a little about CQL will let you move into more without
completely re-learning the new syntax.

So I propose #6)
BEGIN TRANSACTION
LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
columns
LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
   SELECT row1, row2
IF row1 IS NULL AND row2.v = 3 THEN
   INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
END IF
COMMIT TRANSACTION

I added the SELECT in the LET just so it's straightforward, you are
reading, and it's just like doing a regular select, but you are
assigning it to a variable.

I removed the confusing 'row1.v'and replaced it with 'row1'I can't
see why you would need the '.v'vs having the complete variable I
created in the statement above.

EOL

Patrick

On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe
<[email protected]> wrote:

     ...and one more option...

     5.) Introduce tuple assignments, removing all ambiguity around
     row vs. column operations.

     BEGIN TRANSACTION
       LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
     columns
       LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
     SELECT row1.v, row2.v
       IF row1 IS NULL AND row2.v = 3 THEN
         INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
       END IF
     COMMIT TRANSACTION



     On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe
     <[email protected]> wrote:

         via Benedict, here is a 4th option:

         4.) Similar to #2, but don't rely on the key element being NULL.

         If the read returns no result, x effectively becomes NULL.
         Otherwise, it remains true/NOT NULL.

         BEGIN TRANSACTION
           LET x = true FROM ks.tbl WHERE k=0 AND c=0;
           LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
           SELECT x, row2_v
           IF x IS NULL AND row2_v = 3 THEN
             INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
           END IF
         COMMIT TRANSACTION

         On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe
         <[email protected]> wrote:

             Hello again everyone!

             I've been working on a prototype
             <https://issues.apache.org/jira/browse/CASSANDRA-17719> in
             CASSANDRA-17719 for a grammar that roughly corresponds to
             what we've agreed on in this thread. One thing that isn't
             immediately obvious to me is how the LET syntax handles
             cases where we want to check for the plain existence of a
             row in IF. For example, in this hybrid of the originally
             proposed syntax and something more like what we've agreed
             on (and the RETURNING just to distinguish between that
             and SELECT), this could be pretty straightforward:

             BEGIN TRANSACTION
               SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
               SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
               RETURNING row1.v, row2.v
               IF row1 NOT EXISTS AND row2.v = 3 THEN
                 INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
               END IF
             COMMIT TRANSACTION

             The NOT EXISTS operator has row1 to work with. One the
             other hand, w/ the LET syntax and no naming of reads,
             it's not clear what the best solution would be. Here are
             a few possibilities:

             1.) Provide a few built-in functions that operate on a
             whole result row. If we assume a SQL style IS NULL and IS
             NOT NULL (see my last post here) for operations on
             particular columns, this probably eliminates the need for
             EXISTS/NOT EXISTS as well.

             BEGIN TRANSACTION
               LET row1_missing = notExists() FROM ks.tbl WHERE k=0
             AND c=0;
               LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
               SELECT row1_missing, row2_v
               IF row1_missing AND row2_v = 3 THEN
                 INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
               END IF
             COMMIT TRANSACTION

             2.) Assign and check the first primary key element to
             determine whether the row exists.

             BEGIN TRANSACTION
               LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
               LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
               SELECT row1_k, row2_v
               IF row1_k IS NULL AND row2_v = 3 THEN
                 INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
               END IF
             COMMIT TRANSACTION

             3.) Reconsider the LET concept toward something that
             allows us to explicitly name our reads again.

             BEGIN TRANSACTION
               WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
               WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
               SELECT row1.v, row2.v
               IF row1 NOT EXISTS AND row2.v = 3 THEN
                 INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
               END IF
             COMMIT TRANSACTION

             I don't have a strong affinity for any of these, although
             #1 seems the most awkward.

             Does anyone have any other alternatives? Preference for
             one of the above options?

             Thanks!

             On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe
             <[email protected]> wrote:

                 Avi brought up an interesting point around NULLness
                 checking inCASSANDRA-17762
                 <https://issues.apache.org/jira/browse/CASSANDRA-17762>...

                     In SQL, any comparison with NULL is NULL, which
                     is interpreted as FALSE in a condition. To test
                     for NULLness, you use IS NULL or IS NOT NULL. But
                     LWT uses IF col = NULL as a NULLness test. This
                     is likely to confuse people coming from SQL and
                     hamper attempts to extend the dialect.


                 We can leave that Jira open to address what to do in
                 the legacy LWT case, but I'd support a SQL-congruent
                 syntax here (IS NULL or IS NOT NULL), where we have
                 something closer to a blank slate.

                 Thoughts?

                 On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky
                 <[email protected]> wrote:

                     The new syntax looks great, and I’m really
                     excited to see this coming together.

                     One piece of feedback on the proposed syntax is
                     around the use of “=“ as a declaration in
                     addition to its current use as an equality
                     operator in a WHERE clause and an assignment
                     operator in an UPDATE:

                         BEGIN TRANSACTION
                           LET car_miles = miles_driven,
                         car_is_running = is_running FROM cars WHERE
                         model=’pinto’
                           LET user_miles = miles_driven FROM users
                         WHERE name=’blake’
                           SELECT something else from some other table
                           IF NOT car_is_running THEN ABORT
                           UPDATE users SET miles_driven = user_miles
                         + 30 WHERE name='blake';
                           UPDATE cars SET miles_driven = car_miles +
                         30 WHERE model='pinto';
                         COMMIT TRANSACTION

                     This is supported in languages like PL/pgSQL, but
                     in a normal SQL query kind of local declaration
                     is often expressed as an alias (SELECT col AS
                     new_col), subquery alias (SELECT col) t, or
                     common table expression (WITH t AS (SELECT col)).

                     Here’s an example of an alternative to the
                     proposed syntax that I’d find more readable:

                         BEGIN TRANSACTION
                           WITH car_miles, car_is_running AS (SELECT
                         miles_driven, is_running FROM cars WHERE
                         model=’pinto’),
                         user_miles AS (SELECT miles_driven FROM users
                         WHERE name=’blake’)
                           IF NOT car_is_running THEN ABORT
                           UPDATE users SET miles_driven = user_miles
                         + 30 WHERE name='blake';
                         UPDATE cars SET miles_driven = car_miles + 30
                         WHERE model='pinto';
                         COMMIT TRANSACTION

                     There’s also the option of naming the transaction
                     like a subquery, and supporting LET via AS (this
                     one I’m less sure about but wanted to propose
                     anyway):

                         BEGIN TRANSACTION t1
                           SELECT miles_driven AS t1.car_miles,
                         is_running AS t1.car_is_running FROM cars
                         WHERE model=’pinto’;
                           SELECT miles_driven AS t1.user_miles FROM
                         users WHERE name=’blake’;
                           IF NOT car_is_running THEN ABORT
                           UPDATE users SET miles_driven = user_miles
                         + 30 WHERE name='blake';
                           UPDATE cars SET miles_driven = car_miles +
                         30 WHERE model='pinto';
                         COMMIT TRANSACTION

                     This also has the benefit of resolving ambiguity
                     in case of naming conflicts with existing (or
                     future) column names.

                     --
                     Abe

Re: CEP-15 multi key transaction syntax

Reply via email to