Re: CEP-15 multi key transaction syntax

Benedict Sun, 21 Aug 2022 07:45:03 -0700

> On 21 Aug 2022, at 14:59, Benedict <bened...@apache.org> wrote:
> 
> SELECT INTO in T-SQL creates a new table with the results. Since our 
> semantics are likely to be different than Postgres and MySQL, I’m not sure 
> it’s less confusing or otherwise beneficial to mimic an existing syntax.
> 
> Personally I find the LET syntax easier to read, and where ANSI SQL isn’t 
> prescriptive it may be better to aim for a more modern look.
> 
> 
> 
>> On 21 Aug 2022, at 14:53, Konstantin Osipov <kostja.osi...@gmail.com> wrote:
>> 
>> * Avi Kivity via dev <dev@cassandra.apache.org> [22/08/14 15:59]:
>> 
>> MySQL supports SELECT <expr_list> INTO <var_list> FROM ... WHERE
>> ...
>> 
>> PostgreSQL supports pretty much the same syntax.
>> 
>> Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
>> MySQL/PostgreSQL SELECT ... INTO?
>> 
>>> 
>>>>> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>>>>> 
>>>>> I’ll do my best to express with my thinking, as well as how I would
>>>>> explain the feature to a user.
>>>>> 
>>>>> My mental model for LET statements is that they are simply SELECT
>>>>> statements where the columns that are selected become variables
>>>>> accessible anywhere in the scope of the transaction. That is to say, you
>>>>> should be able to run something like s/LET/SELECT and
>>>>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
>>>>> and produce a valid SELECT statement, and vice versa. Both should
>>>>> perform identically.
>>>>> 
>>>>> e.g.
>>>>> SELECT pk AS key, v AS value FROM table
>>>>> 
>>>>> =>
>>>>> LET key = pk, value = v FROM table
>>> 
>>> 
>>> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>>> supports selecting comparisons:
>>> 
>>> 
>>> $ psql
>>> psql (14.3)
>>> Type "help" for help.
>>> 
>>> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>>> ?column? | ?column? | ?column?
>>> ----------+----------+----------
>>> f        | t        |
>>> (1 row)
>>> 
>>> 
>>> Using "=" as a syntactic element in LET would make SELECT and LET
>>> incompatible once comparisons become valid selectors. Unless they become
>>> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
>>> comparison).
>>> 
>>> 
>>> I personally prefer the nested query syntax:
>>> 
>>> 
>>>    LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>>> 
>>> 
>>> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>>> immediately recognizable by everyone as a query, LET is not.
>>> 
>>> 
>>>> 
>>>> Identical form, identical behaviour. Every statement should be directly
>>>> translatable with some simple text manipulation.
>>>> 
>>>> We can then make this more powerful for users by simply expanding SELECT
>>>> statements, e.g. by permitting them to declare constants and tuples in
>>>> the column results. In this scheme LET x = * is simply syntactic sugar
>>>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
>>>> and 5 all at once, consistently alongside each other.
>>>> 
>>>> Option 6 is in fact very similar, but is strictly less flexible for the
>>>> user as they have no way to declare multiple scalar variables without
>>>> scoping them inside a tuple.
>>>> 
>>>> e.g.
>>>> LET key = pk, value = v FROM table
>>>> IF key > 1 AND value > 1 THEN...
>>>> 
>>>> =>
>>>> LET row = SELECT pk AS key, v AS value FROM table
>>>> IF row.key > 1 AND row.value > 1 THEN…
>>>> 
>>>> However, both are expressible in the existing proposal, as if you prefer
>>>> this naming scheme you can simply write
>>>> 
>>>> LET row = (pk AS key, v AS value) FROM table
>>>> IF row.key > 1 AND row.value > 1 THEN…
>>>> 
>>>> With respect to auto converting single column results to a scalar, we do
>>>> need a way for the user to say they care whether the row was null or the
>>>> column. I think an implicit conversion here could be surprising. However
>>>> we could implement tuple expressions anyway and let the user explicitly
>>>> declare v as a tuple as Caleb has suggested for the existing proposal as
>>>> well.
>>>> 
>>>> Assigning constants or other values not selected from a table would also
>>>> be a little clunky:
>>>> 
>>>> LET v1 = someFunc(), v2 = someOtherFunc(?)
>>>> IF v1 > 1 AND v2 > 1 THEN…
>>>> 
>>>> =>
>>>> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
>>>> IF row.v1 > 1 AND row.v2 > 1 THEN...
>>>> 
>>>> That said, the proposals are /close/ to identical, it is just slightly
>>>> more verbose and slightly less flexible.
>>>> 
>>>> Which one would be most intuitive to users is hard to predict. It might
>>>> be that Option 6 would be slightly easier, but I’m unsure if there would
>>>> be a huge difference.
>>>> 
>>>> 
>>>>> On 13 Aug 2022, at 16:59, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>> 
>>>>> I'm really happy to see CEP-15 getting closer to a final
>>>>> implementation. I'm going to walk through my reasoning for your
>>>>> proposals wrt trying to explain this to somebody new.
>>>>> 
>>>>> Looking at all the options, the first thing that comes up for me is
>>>>> the Cassandra project's complicated relationship with NULL.  We have
>>>>> prior art with EXISTS/NOT EXISTS when creating new tables. IS
>>>>> NULL/IS NOT NULL is used in materialized views similarly to
>>>>> proposals 2,4 and 5.
>>>>> 
>>>>> CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
>>>>>  AS SELECT [ (column_list) ]
>>>>>  FROM [keyspace_name.]table_name
>>>>>  [ WHERE column_name IS NOT NULL
>>>>>  [ AND column_name IS NOT NULL ... ] ]
>>>>>  [ AND relation [ AND ... ] ]
>>>>>  PRIMARY KEY ( column_list )
>>>>>  [ WITH [ table_properties ]
>>>>>  [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;
>>>>> 
>>>>> Based on that, I believe 1 and 3 would just confuse users, so -1 on
>>>>> those.
>>>>> 
>>>>> Trying to explain the difference between row and column operations
>>>>> with LET, I can't see the difference between a row and column in #2.
>>>>> 
>>>>> #4 introduces a boolean instead of column names and just adds more
>>>>> syntax.
>>>>> 
>>>>> #5 is verbose and, in my opinion, easier to reason when writing a
>>>>> query. Thinking top down, I need to know if these exact rows and/or
>>>>> column values exist before changing them, so I'll define them first.
>>>>> Then I'll iterate over the state I created in my actual changes so I
>>>>> know I'm changing precisely what I want.
>>>>> 
>>>>> #5 could use a bit more to be clearer to somebody who doesn't write
>>>>> CQL queries daily and wouldn't require memorizing subtle
>>>>> differences. It should be similar to all the other syntax, so
>>>>> learning a little about CQL will let you move into more without
>>>>> completely re-learning the new syntax.
>>>>> 
>>>>> So I propose #6)
>>>>> BEGIN TRANSACTION
>>>>> LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
>>>>> columns
>>>>> LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
>>>>>  SELECT row1, row2
>>>>> IF row1 IS NULL AND row2.v = 3 THEN
>>>>>  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>> END IF
>>>>> COMMIT TRANSACTION
>>>>> 
>>>>> I added the SELECT in the LET just so it's straightforward, you are
>>>>> reading, and it's just like doing a regular select, but you are
>>>>> assigning it to a variable.
>>>>> 
>>>>> I removed the confusing 'row1.v'and replaced it with 'row1'I can't
>>>>> see why you would need the '.v'vs having the complete variable I
>>>>> created in the statement above.
>>>>> 
>>>>> EOL
>>>>> 
>>>>> Patrick
>>>>> 
>>>>> On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe
>>>>> <calebrackli...@gmail.com> wrote:
>>>>> 
>>>>>   ...and one more option...
>>>>> 
>>>>>   5.) Introduce tuple assignments, removing all ambiguity around
>>>>>   row vs. column operations.
>>>>> 
>>>>>   BEGIN TRANSACTION
>>>>>     LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
>>>>>   columns
>>>>>     LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
>>>>>   SELECT row1.v, row2.v
>>>>>     IF row1 IS NULL AND row2.v = 3 THEN
>>>>>       INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>     END IF
>>>>>   COMMIT TRANSACTION
>>>>> 
>>>>> 
>>>>> 
>>>>>   On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe
>>>>>   <calebrackli...@gmail.com> wrote:
>>>>> 
>>>>>       via Benedict, here is a 4th option:
>>>>> 
>>>>>       4.) Similar to #2, but don't rely on the key element being NULL.
>>>>> 
>>>>>       If the read returns no result, x effectively becomes NULL.
>>>>>       Otherwise, it remains true/NOT NULL.
>>>>> 
>>>>>       BEGIN TRANSACTION
>>>>>         LET x = true FROM ks.tbl WHERE k=0 AND c=0;
>>>>>         LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>>>>         SELECT x, row2_v
>>>>>         IF x IS NULL AND row2_v = 3 THEN
>>>>>           INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>         END IF
>>>>>       COMMIT TRANSACTION
>>>>> 
>>>>>       On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe
>>>>>       <calebrackli...@gmail.com> wrote:
>>>>> 
>>>>>           Hello again everyone!
>>>>> 
>>>>>           I've been working on a prototype
>>>>>           <https://issues.apache.org/jira/browse/CASSANDRA-17719> in
>>>>>           CASSANDRA-17719 for a grammar that roughly corresponds to
>>>>>           what we've agreed on in this thread. One thing that isn't
>>>>>           immediately obvious to me is how the LET syntax handles
>>>>>           cases where we want to check for the plain existence of a
>>>>>           row in IF. For example, in this hybrid of the originally
>>>>>           proposed syntax and something more like what we've agreed
>>>>>           on (and the RETURNING just to distinguish between that
>>>>>           and SELECT), this could be pretty straightforward:
>>>>> 
>>>>>           BEGIN TRANSACTION
>>>>>             SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
>>>>>             SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
>>>>>             RETURNING row1.v, row2.v
>>>>>             IF row1 NOT EXISTS AND row2.v = 3 THEN
>>>>>               INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>             END IF
>>>>>           COMMIT TRANSACTION
>>>>> 
>>>>>           The NOT EXISTS operator has row1 to work with. One the
>>>>>           other hand, w/ the LET syntax and no naming of reads,
>>>>>           it's not clear what the best solution would be. Here are
>>>>>           a few possibilities:
>>>>> 
>>>>>           1.) Provide a few built-in functions that operate on a
>>>>>           whole result row. If we assume a SQL style IS NULL and IS
>>>>>           NOT NULL (see my last post here) for operations on
>>>>>           particular columns, this probably eliminates the need for
>>>>>           EXISTS/NOT EXISTS as well.
>>>>> 
>>>>>           BEGIN TRANSACTION
>>>>>             LET row1_missing = notExists() FROM ks.tbl WHERE k=0
>>>>>           AND c=0;
>>>>>             LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>>>>             SELECT row1_missing, row2_v
>>>>>             IF row1_missing AND row2_v = 3 THEN
>>>>>               INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>             END IF
>>>>>           COMMIT TRANSACTION
>>>>> 
>>>>>           2.) Assign and check the first primary key element to
>>>>>           determine whether the row exists.
>>>>> 
>>>>>           BEGIN TRANSACTION
>>>>>             LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
>>>>>             LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>>>>             SELECT row1_k, row2_v
>>>>>             IF row1_k IS NULL AND row2_v = 3 THEN
>>>>>               INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>             END IF
>>>>>           COMMIT TRANSACTION
>>>>> 
>>>>>           3.) Reconsider the LET concept toward something that
>>>>>           allows us to explicitly name our reads again.
>>>>> 
>>>>>           BEGIN TRANSACTION
>>>>>             WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
>>>>>             WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
>>>>>             SELECT row1.v, row2.v
>>>>>             IF row1 NOT EXISTS AND row2.v = 3 THEN
>>>>>               INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>             END IF
>>>>>           COMMIT TRANSACTION
>>>>> 
>>>>>           I don't have a strong affinity for any of these, although
>>>>>           #1 seems the most awkward.
>>>>> 
>>>>>           Does anyone have any other alternatives? Preference for
>>>>>           one of the above options?
>>>>> 
>>>>>           Thanks!
>>>>> 
>>>>>           On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe
>>>>>           <calebrackli...@gmail.com> wrote:
>>>>> 
>>>>>               Avi brought up an interesting point around NULLness
>>>>>               checking inCASSANDRA-17762
>>>>>               <https://issues.apache.org/jira/browse/CASSANDRA-17762>...
>>>>> 
>>>>>                   In SQL, any comparison with NULL is NULL, which
>>>>>                   is interpreted as FALSE in a condition. To test
>>>>>                   for NULLness, you use IS NULL or IS NOT NULL. But
>>>>>                   LWT uses IF col = NULL as a NULLness test. This
>>>>>                   is likely to confuse people coming from SQL and
>>>>>                   hamper attempts to extend the dialect.
>>>>> 
>>>>> 
>>>>>               We can leave that Jira open to address what to do in
>>>>>               the legacy LWT case, but I'd support a SQL-congruent
>>>>>               syntax here (IS NULL or IS NOT NULL), where we have
>>>>>               something closer to a blank slate.
>>>>> 
>>>>>               Thoughts?
>>>>> 
>>>>>               On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky
>>>>>               <a...@aber.io> wrote:
>>>>> 
>>>>>                   The new syntax looks great, and I’m really
>>>>>                   excited to see this coming together.
>>>>> 
>>>>>                   One piece of feedback on the proposed syntax is
>>>>>                   around the use of “=“ as a declaration in
>>>>>                   addition to its current use as an equality
>>>>>                   operator in a WHERE clause and an assignment
>>>>>                   operator in an UPDATE:
>>>>> 
>>>>>                       BEGIN TRANSACTION
>>>>>                         LET car_miles = miles_driven,
>>>>>                       car_is_running = is_running FROM cars WHERE
>>>>>                       model=’pinto’
>>>>>                         LET user_miles = miles_driven FROM users
>>>>>                       WHERE name=’blake’
>>>>>                         SELECT something else from some other table
>>>>>                         IF NOT car_is_running THEN ABORT
>>>>>                         UPDATE users SET miles_driven = user_miles
>>>>>                       + 30 WHERE name='blake';
>>>>>                         UPDATE cars SET miles_driven = car_miles +
>>>>>                       30 WHERE model='pinto';
>>>>>                       COMMIT TRANSACTION
>>>>> 
>>>>>                   This is supported in languages like PL/pgSQL, but
>>>>>                   in a normal SQL query kind of local declaration
>>>>>                   is often expressed as an alias (SELECT col AS
>>>>>                   new_col), subquery alias (SELECT col) t, or
>>>>>                   common table expression (WITH t AS (SELECT col)).
>>>>> 
>>>>>                   Here’s an example of an alternative to the
>>>>>                   proposed syntax that I’d find more readable:
>>>>> 
>>>>>                       BEGIN TRANSACTION
>>>>>                         WITH car_miles, car_is_running AS (SELECT
>>>>>                       miles_driven, is_running FROM cars WHERE
>>>>>                       model=’pinto’),
>>>>>                       user_miles AS (SELECT miles_driven FROM users
>>>>>                       WHERE name=’blake’)
>>>>>                         IF NOT car_is_running THEN ABORT
>>>>>                         UPDATE users SET miles_driven = user_miles
>>>>>                       + 30 WHERE name='blake';
>>>>>                       UPDATE cars SET miles_driven = car_miles + 30
>>>>>                       WHERE model='pinto';
>>>>>                       COMMIT TRANSACTION
>>>>> 
>>>>>                   There’s also the option of naming the transaction
>>>>>                   like a subquery, and supporting LET via AS (this
>>>>>                   one I’m less sure about but wanted to propose
>>>>>                   anyway):
>>>>> 
>>>>>                       BEGIN TRANSACTION t1
>>>>>                         SELECT miles_driven AS t1.car_miles,
>>>>>                       is_running AS t1.car_is_running FROM cars
>>>>>                       WHERE model=’pinto’;
>>>>>                         SELECT miles_driven AS t1.user_miles FROM
>>>>>                       users WHERE name=’blake’;
>>>>>                         IF NOT car_is_running THEN ABORT
>>>>>                         UPDATE users SET miles_driven = user_miles
>>>>>                       + 30 WHERE name='blake';
>>>>>                         UPDATE cars SET miles_driven = car_miles +
>>>>>                       30 WHERE model='pinto';
>>>>>                       COMMIT TRANSACTION
>>>>> 
>>>>>                   This also has the benefit of resolving ambiguity
>>>>>                   in case of naming conflicts with existing (or
>>>>>                   future) column names.
>>>>> 
>>>>>                   --
>>>>>                   Abe
>>>>> 
>>>> 
>> 
>> -- 
>> Konstantin Osipov, Moscow, Russia
Re: CEP-15 multi key transaction syntax

Reply via email to