Re: CEP-15 multi key transaction syntax

Patrick McFadin Wed, 21 Sep 2022 12:56:40 -0700

I'm also working on different use cases and syntax for Accord :D

I'm +1 on this change and leaving the door open for maybe a few more as we
test this out. It needs to be functionally useful for developers in v1, and
I think it's worth the changes to get it right.


One other thing Caleb and I have been discussing is how, when running a
transaction, the statement returns with no message. In CQLSH you have no
idea if anything happened unless you select from the tables and look for
changes. Even something like LWT adds with "applied=true|false"

Patrick

On Wed, Sep 21, 2022 at 12:42 PM David Capwell <[email protected]> wrote:

> Caleb is making great progress on this, and I have been working on CQL
> fuzz testing the new grammar to make sure we flesh out cases quickly; one
> thing we hit was about mixing conditional and non-conditional updates; will
> use a example to better show
>
> BEGIN TRANSACTION
>   LET a = (SELECT * FROM ….);
>   IF a IS NOT NULL THEN
>     UPDATE …;
>   END IF
>   INSERT INTO ...
> COMMIT TRANSACTION
>
> In this case we have 1 UPDATE tied to the IF condition, and one INSERT
> that isn’t… for v1 do we need/want to support this, or is it best for v1 to
> be simple and have all updates tied to conditional when present?
>
> On Aug 22, 2022, at 9:19 AM, Avi Kivity via dev <[email protected]>
> wrote:
>
> I wasn't referring to specific syntax but to the concept. If a SQL dialect
> (or better, the standard) has a way to select data into a variable, let's
> adopt it.
>
> If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) is
> my preference.
>
> On 8/22/22 19:13, Patrick McFadin wrote:
>
> The replies got trashed pretty badly in the responses.
> When you say: "Agree it's better to reuse existing syntax than invent new
> syntax."
>
> Which syntax are you referring to?
>
> Patrick
>
>
> On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev <
> [email protected]> wrote:
>
>> Agree it's better to reuse existing syntax than invent new syntax.
>>
>> On 8/21/22 16:52, Konstantin Osipov wrote:
>> > * Avi Kivity via dev <[email protected]> [22/08/14 15:59]:
>> >
>> > MySQL supports SELECT <expr_list> INTO <var_list> FROM ... WHERE
>> > ...
>> >
>> > PostgreSQL supports pretty much the same syntax.
>> >
>> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
>> > MySQL/PostgreSQL SELECT ... INTO?
>> >
>> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>> >>> 
>> >>> I’ll do my best to express with my thinking, as well as how I would
>> >>> explain the feature to a user.
>> >>>
>> >>> My mental model for LET statements is that they are simply SELECT
>> >>> statements where the columns that are selected become variables
>> >>> accessible anywhere in the scope of the transaction. That is to say,
>> you
>> >>> should be able to run something like s/LET/SELECT and
>> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
>> >>> and produce a valid SELECT statement, and vice versa. Both should
>> >>> perform identically.
>> >>>
>> >>> e.g.
>> >>> SELECT pk AS key, v AS value FROM table
>> >>>
>> >>> =>
>> >>> LET key = pk, value = v FROM table
>> >>
>> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>> >> supports selecting comparisons:
>> >>
>> >>
>> >> $ psql
>> >> psql (14.3)
>> >> Type "help" for help.
>> >>
>> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>> >>   ?column? | ?column? | ?column?
>> >> ----------+----------+----------
>> >>   f        | t        |
>> >> (1 row)
>> >>
>> >>
>> >> Using "=" as a syntactic element in LET would make SELECT and LET
>> >> incompatible once comparisons become valid selectors. Unless they
>> become
>> >> mandatory (and then you'd write "LET q = a = b" if you wanted to
>> select a
>> >> comparison).
>> >>
>> >>
>> >> I personally prefer the nested query syntax:
>> >>
>> >>
>> >>      LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>> >>
>> >>
>> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>> >> immediately recognizable by everyone as a query, LET is not.
>> >>
>> >>
>> >>> Identical form, identical behaviour. Every statement should be
>> directly
>> >>> translatable with some simple text manipulation.
>> >>>
>> >>> We can then make this more powerful for users by simply expanding
>> SELECT
>> >>> statements, e.g. by permitting them to declare constants and tuples in
>> >>> the column results. In this scheme LET x = * is simply syntactic sugar
>> >>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
>> >>> and 5 all at once, consistently alongside each other.
>> >>>
>> >>> Option 6 is in fact very similar, but is strictly less flexible for
>> the
>> >>> user as they have no way to declare multiple scalar variables without
>> >>> scoping them inside a tuple.
>> >>>
>> >>> e.g.
>> >>> LET key = pk, value = v FROM table
>> >>> IF key > 1 AND value > 1 THEN...
>> >>>
>> >>> =>
>> >>> LET row = SELECT pk AS key, v AS value FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> However, both are expressible in the existing proposal, as if you
>> prefer
>> >>> this naming scheme you can simply write
>> >>>
>> >>> LET row = (pk AS key, v AS value) FROM table
>> >>> IF row.key > 1 AND row.value > 1 THEN…
>> >>>
>> >>> With respect to auto converting single column results to a scalar, we
>> do
>> >>> need a way for the user to say they care whether the row was null or
>> the
>> >>> column. I think an implicit conversion here could be surprising.
>> However
>> >>> we could implement tuple expressions anyway and let the user
>> explicitly
>> >>> declare v as a tuple as Caleb has suggested for the existing proposal
>> as
>> >>> well.
>> >>>
>> >>> Assigning constants or other values not selected from a table would
>> also
>> >>> be a little clunky:
>> >>>
>> >>> LET v1 = someFunc(), v2 = someOtherFunc(?)
>> >>> IF v1 > 1 AND v2 > 1 THEN…
>> >>>
>> >>> =>
>> >>> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
>> >>> IF row.v1 > 1 AND row.v2 > 1 THEN...
>> >>>
>> >>> That said, the proposals are /close/ to identical, it is just slightly
>> >>> more verbose and slightly less flexible.
>> >>>
>> >>> Which one would be most intuitive to users is hard to predict. It
>> might
>> >>> be that Option 6 would be slightly easier, but I’m unsure if there
>> would
>> >>> be a huge difference.
>> >>>
>> >>>
>> >>>> On 13 Aug 2022, at 16:59, Patrick McFadin <[email protected]>
>> wrote:
>> >>>>
>> >>>> I'm really happy to see CEP-15 getting closer to a final
>> >>>> implementation. I'm going to walk through my reasoning for your
>> >>>> proposals wrt trying to explain this to somebody new.
>> >>>>
>> >>>> Looking at all the options, the first thing that comes up for me is
>> >>>> the Cassandra project's complicated relationship with NULL.  We have
>> >>>> prior art with EXISTS/NOT EXISTS when creating new tables. IS
>> >>>> NULL/IS NOT NULL is used in materialized views similarly to
>> >>>> proposals 2,4 and 5.
>> >>>>
>> >>>> CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
>> >>>>    AS SELECT [ (column_list) ]
>> >>>>    FROM [keyspace_name.]table_name
>> >>>>    [ WHERE column_name IS NOT NULL
>> >>>>    [ AND column_name IS NOT NULL ... ] ]
>> >>>>    [ AND relation [ AND ... ] ]
>> >>>>    PRIMARY KEY ( column_list )
>> >>>>    [ WITH [ table_properties ]
>> >>>>    [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option)
>> ] ] ;
>> >>>>
>> >>>>   Based on that, I believe 1 and 3 would just confuse users, so -1 on
>> >>>> those.
>> >>>>
>> >>>> Trying to explain the difference between row and column operations
>> >>>> with LET, I can't see the difference between a row and column in #2.
>> >>>>
>> >>>> #4 introduces a boolean instead of column names and just adds more
>> >>>> syntax.
>> >>>>
>> >>>> #5 is verbose and, in my opinion, easier to reason when writing a
>> >>>> query. Thinking top down, I need to know if these exact rows and/or
>> >>>> column values exist before changing them, so I'll define them first.
>> >>>> Then I'll iterate over the state I created in my actual changes so I
>> >>>> know I'm changing precisely what I want.
>> >>>>
>> >>>> #5 could use a bit more to be clearer to somebody who doesn't write
>> >>>> CQL queries daily and wouldn't require memorizing subtle
>> >>>> differences. It should be similar to all the other syntax, so
>> >>>> learning a little about CQL will let you move into more without
>> >>>> completely re-learning the new syntax.
>> >>>>
>> >>>> So I propose #6)
>> >>>> BEGIN TRANSACTION
>> >>>> LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
>> >>>> columns
>> >>>> LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
>> >>>>    SELECT row1, row2
>> >>>> IF row1 IS NULL AND row2.v = 3 THEN
>> >>>>    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>> END IF
>> >>>> COMMIT TRANSACTION
>> >>>>
>> >>>> I added the SELECT in the LET just so it's straightforward, you are
>> >>>> reading, and it's just like doing a regular select, but you are
>> >>>> assigning it to a variable.
>> >>>>
>> >>>> I removed the confusing 'row1.v'and replaced it with 'row1'I can't
>> >>>> see why you would need the '.v'vs having the complete variable I
>> >>>> created in the statement above.
>> >>>>
>> >>>> EOL
>> >>>>
>> >>>> Patrick
>> >>>>
>> >>>> On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe
>> >>>> <[email protected]> wrote:
>> >>>>
>> >>>>      ...and one more option...
>> >>>>
>> >>>>      5.) Introduce tuple assignments, removing all ambiguity around
>> >>>>      row vs. column operations.
>> >>>>
>> >>>>      BEGIN TRANSACTION
>> >>>>        LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
>> >>>>      columns
>> >>>>        LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
>> >>>>      SELECT row1.v, row2.v
>> >>>>        IF row1 IS NULL AND row2.v = 3 THEN
>> >>>>          INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>>        END IF
>> >>>>      COMMIT TRANSACTION
>> >>>>
>> >>>>
>> >>>>
>> >>>>      On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe
>> >>>>      <[email protected]> wrote:
>> >>>>
>> >>>>          via Benedict, here is a 4th option:
>> >>>>
>> >>>>          4.) Similar to #2, but don't rely on the key element being
>> NULL.
>> >>>>
>> >>>>          If the read returns no result, x effectively becomes NULL.
>> >>>>          Otherwise, it remains true/NOT NULL.
>> >>>>
>> >>>>          BEGIN TRANSACTION
>> >>>>            LET x = true FROM ks.tbl WHERE k=0 AND c=0;
>> >>>>            LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>> >>>>            SELECT x, row2_v
>> >>>>            IF x IS NULL AND row2_v = 3 THEN
>> >>>>              INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>>            END IF
>> >>>>          COMMIT TRANSACTION
>> >>>>
>> >>>>          On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe
>> >>>>          <[email protected]> wrote:
>> >>>>
>> >>>>              Hello again everyone!
>> >>>>
>> >>>>              I've been working on a prototype
>> >>>>              <https://issues.apache.org/jira/browse/CASSANDRA-17719
>> > in
>> >>>>              CASSANDRA-17719 for a grammar that roughly corresponds
>> to
>> >>>>              what we've agreed on in this thread. One thing that
>> isn't
>> >>>>              immediately obvious to me is how the LET syntax handles
>> >>>>              cases where we want to check for the plain existence of
>> a
>> >>>>              row in IF. For example, in this hybrid of the originally
>> >>>>              proposed syntax and something more like what we've
>> agreed
>> >>>>              on (and the RETURNING just to distinguish between that
>> >>>>              and SELECT), this could be pretty straightforward:
>> >>>>
>> >>>>              BEGIN TRANSACTION
>> >>>>                SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
>> >>>>                SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
>> >>>>                RETURNING row1.v, row2.v
>> >>>>                IF row1 NOT EXISTS AND row2.v = 3 THEN
>> >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>>                END IF
>> >>>>              COMMIT TRANSACTION
>> >>>>
>> >>>>              The NOT EXISTS operator has row1 to work with. One the
>> >>>>              other hand, w/ the LET syntax and no naming of reads,
>> >>>>              it's not clear what the best solution would be. Here are
>> >>>>              a few possibilities:
>> >>>>
>> >>>>              1.) Provide a few built-in functions that operate on a
>> >>>>              whole result row. If we assume a SQL style IS NULL and
>> IS
>> >>>>              NOT NULL (see my last post here) for operations on
>> >>>>              particular columns, this probably eliminates the need
>> for
>> >>>>              EXISTS/NOT EXISTS as well.
>> >>>>
>> >>>>              BEGIN TRANSACTION
>> >>>>                LET row1_missing = notExists() FROM ks.tbl WHERE k=0
>> >>>>              AND c=0;
>> >>>>                LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>> >>>>                SELECT row1_missing, row2_v
>> >>>>                IF row1_missing AND row2_v = 3 THEN
>> >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>>                END IF
>> >>>>              COMMIT TRANSACTION
>> >>>>
>> >>>>              2.) Assign and check the first primary key element to
>> >>>>              determine whether the row exists.
>> >>>>
>> >>>>              BEGIN TRANSACTION
>> >>>>                LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
>> >>>>                LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>> >>>>                SELECT row1_k, row2_v
>> >>>>                IF row1_k IS NULL AND row2_v = 3 THEN
>> >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>>                END IF
>> >>>>              COMMIT TRANSACTION
>> >>>>
>> >>>>              3.) Reconsider the LET concept toward something that
>> >>>>              allows us to explicitly name our reads again.
>> >>>>
>> >>>>              BEGIN TRANSACTION
>> >>>>                WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
>> >>>>                WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
>> >>>>                SELECT row1.v, row2.v
>> >>>>                IF row1 NOT EXISTS AND row2.v = 3 THEN
>> >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>> >>>>                END IF
>> >>>>              COMMIT TRANSACTION
>> >>>>
>> >>>>              I don't have a strong affinity for any of these,
>> although
>> >>>>              #1 seems the most awkward.
>> >>>>
>> >>>>              Does anyone have any other alternatives? Preference for
>> >>>>              one of the above options?
>> >>>>
>> >>>>              Thanks!
>> >>>>
>> >>>>              On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe
>> >>>>              <[email protected]> wrote:
>> >>>>
>> >>>>                  Avi brought up an interesting point around NULLness
>> >>>>                  checking inCASSANDRA-17762
>> >>>>                  <
>> https://issues.apache.org/jira/browse/CASSANDRA-17762>...
>> >>>>
>> >>>>                      In SQL, any comparison with NULL is NULL, which
>> >>>>                      is interpreted as FALSE in a condition. To test
>> >>>>                      for NULLness, you use IS NULL or IS NOT NULL.
>> But
>> >>>>                      LWT uses IF col = NULL as a NULLness test. This
>> >>>>                      is likely to confuse people coming from SQL and
>> >>>>                      hamper attempts to extend the dialect.
>> >>>>
>> >>>>
>> >>>>                  We can leave that Jira open to address what to do in
>> >>>>                  the legacy LWT case, but I'd support a SQL-congruent
>> >>>>                  syntax here (IS NULL or IS NOT NULL), where we have
>> >>>>                  something closer to a blank slate.
>> >>>>
>> >>>>                  Thoughts?
>> >>>>
>> >>>>                  On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky
>> >>>>                  <[email protected]> wrote:
>> >>>>
>> >>>>                      The new syntax looks great, and I’m really
>> >>>>                      excited to see this coming together.
>> >>>>
>> >>>>                      One piece of feedback on the proposed syntax is
>> >>>>                      around the use of “=“ as a declaration in
>> >>>>                      addition to its current use as an equality
>> >>>>                      operator in a WHERE clause and an assignment
>> >>>>                      operator in an UPDATE:
>> >>>>
>> >>>>                          BEGIN TRANSACTION
>> >>>>                            LET car_miles = miles_driven,
>> >>>>                          car_is_running = is_running FROM cars WHERE
>> >>>>                          model=’pinto’
>> >>>>                            LET user_miles = miles_driven FROM users
>> >>>>                          WHERE name=’blake’
>> >>>>                            SELECT something else from some other
>> table
>> >>>>                            IF NOT car_is_running THEN ABORT
>> >>>>                            UPDATE users SET miles_driven = user_miles
>> >>>>                          + 30 WHERE name='blake';
>> >>>>                            UPDATE cars SET miles_driven = car_miles +
>> >>>>                          30 WHERE model='pinto';
>> >>>>                          COMMIT TRANSACTION
>> >>>>
>> >>>>                      This is supported in languages like PL/pgSQL,
>> but
>> >>>>                      in a normal SQL query kind of local declaration
>> >>>>                      is often expressed as an alias (SELECT col AS
>> >>>>                      new_col), subquery alias (SELECT col) t, or
>> >>>>                      common table expression (WITH t AS (SELECT
>> col)).
>> >>>>
>> >>>>                      Here’s an example of an alternative to the
>> >>>>                      proposed syntax that I’d find more readable:
>> >>>>
>> >>>>                          BEGIN TRANSACTION
>> >>>>                            WITH car_miles, car_is_running AS (SELECT
>> >>>>                          miles_driven, is_running FROM cars WHERE
>> >>>>                          model=’pinto’),
>> >>>>                          user_miles AS (SELECT miles_driven FROM
>> users
>> >>>>                          WHERE name=’blake’)
>> >>>>                            IF NOT car_is_running THEN ABORT
>> >>>>                            UPDATE users SET miles_driven = user_miles
>> >>>>                          + 30 WHERE name='blake';
>> >>>>                          UPDATE cars SET miles_driven = car_miles +
>> 30
>> >>>>                          WHERE model='pinto';
>> >>>>                          COMMIT TRANSACTION
>> >>>>
>> >>>>                      There’s also the option of naming the
>> transaction
>> >>>>                      like a subquery, and supporting LET via AS (this
>> >>>>                      one I’m less sure about but wanted to propose
>> >>>>                      anyway):
>> >>>>
>> >>>>                          BEGIN TRANSACTION t1
>> >>>>                            SELECT miles_driven AS t1.car_miles,
>> >>>>                          is_running AS t1.car_is_running FROM cars
>> >>>>                          WHERE model=’pinto’;
>> >>>>                            SELECT miles_driven AS t1.user_miles FROM
>> >>>>                          users WHERE name=’blake’;
>> >>>>                            IF NOT car_is_running THEN ABORT
>> >>>>                            UPDATE users SET miles_driven = user_miles
>> >>>>                          + 30 WHERE name='blake';
>> >>>>                            UPDATE cars SET miles_driven = car_miles +
>> >>>>                          30 WHERE model='pinto';
>> >>>>                          COMMIT TRANSACTION
>> >>>>
>> >>>>                      This also has the benefit of resolving ambiguity
>> >>>>                      in case of naming conflicts with existing (or
>> >>>>                      future) column names.
>> >>>>
>> >>>>                      --
>> >>>>                      Abe
>> >>>>
>>
>>
>
>

Re: CEP-15 multi key transaction syntax

Reply via email to