Re: CEP-15 multi key transaction syntax

Patrick McFadin Mon, 22 Aug 2022 09:14:17 -0700

The replies got trashed pretty badly in the responses.
When you say: "Agree it's better to reuse existing syntax than invent new
syntax."


Which syntax are you referring to?

Patrick


On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev <dev@cassandra.apache.org>
wrote:

> Agree it's better to reuse existing syntax than invent new syntax.
>
> On 8/21/22 16:52, Konstantin Osipov wrote:
> > * Avi Kivity via dev <dev@cassandra.apache.org> [22/08/14 15:59]:
> >
> > MySQL supports SELECT <expr_list> INTO <var_list> FROM ... WHERE
> > ...
> >
> > PostgreSQL supports pretty much the same syntax.
> >
> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
> > MySQL/PostgreSQL SELECT ... INTO?
> >
> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
> >>> 
> >>> I’ll do my best to express with my thinking, as well as how I would
> >>> explain the feature to a user.
> >>>
> >>> My mental model for LET statements is that they are simply SELECT
> >>> statements where the columns that are selected become variables
> >>> accessible anywhere in the scope of the transaction. That is to say,
> you
> >>> should be able to run something like s/LET/SELECT and
> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
> >>> and produce a valid SELECT statement, and vice versa. Both should
> >>> perform identically.
> >>>
> >>> e.g.
> >>> SELECT pk AS key, v AS value FROM table
> >>>
> >>> =>
> >>> LET key = pk, value = v FROM table
> >>
> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
> >> supports selecting comparisons:
> >>
> >>
> >> $ psql
> >> psql (14.3)
> >> Type "help" for help.
> >>
> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
> >>   ?column? | ?column? | ?column?
> >> ----------+----------+----------
> >>   f        | t        |
> >> (1 row)
> >>
> >>
> >> Using "=" as a syntactic element in LET would make SELECT and LET
> >> incompatible once comparisons become valid selectors. Unless they become
> >> mandatory (and then you'd write "LET q = a = b" if you wanted to select
> a
> >> comparison).
> >>
> >>
> >> I personally prefer the nested query syntax:
> >>
> >>
> >>      LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
> >>
> >>
> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
> >> immediately recognizable by everyone as a query, LET is not.
> >>
> >>
> >>> Identical form, identical behaviour. Every statement should be directly
> >>> translatable with some simple text manipulation.
> >>>
> >>> We can then make this more powerful for users by simply expanding
> SELECT
> >>> statements, e.g. by permitting them to declare constants and tuples in
> >>> the column results. In this scheme LET x = * is simply syntactic sugar
> >>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
> >>> and 5 all at once, consistently alongside each other.
> >>>
> >>> Option 6 is in fact very similar, but is strictly less flexible for the
> >>> user as they have no way to declare multiple scalar variables without
> >>> scoping them inside a tuple.
> >>>
> >>> e.g.
> >>> LET key = pk, value = v FROM table
> >>> IF key > 1 AND value > 1 THEN...
> >>>
> >>> =>
> >>> LET row = SELECT pk AS key, v AS value FROM table
> >>> IF row.key > 1 AND row.value > 1 THEN…
> >>>
> >>> However, both are expressible in the existing proposal, as if you
> prefer
> >>> this naming scheme you can simply write
> >>>
> >>> LET row = (pk AS key, v AS value) FROM table
> >>> IF row.key > 1 AND row.value > 1 THEN…
> >>>
> >>> With respect to auto converting single column results to a scalar, we
> do
> >>> need a way for the user to say they care whether the row was null or
> the
> >>> column. I think an implicit conversion here could be surprising.
> However
> >>> we could implement tuple expressions anyway and let the user explicitly
> >>> declare v as a tuple as Caleb has suggested for the existing proposal
> as
> >>> well.
> >>>
> >>> Assigning constants or other values not selected from a table would
> also
> >>> be a little clunky:
> >>>
> >>> LET v1 = someFunc(), v2 = someOtherFunc(?)
> >>> IF v1 > 1 AND v2 > 1 THEN…
> >>>
> >>> =>
> >>> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
> >>> IF row.v1 > 1 AND row.v2 > 1 THEN...
> >>>
> >>> That said, the proposals are /close/ to identical, it is just slightly
> >>> more verbose and slightly less flexible.
> >>>
> >>> Which one would be most intuitive to users is hard to predict. It might
> >>> be that Option 6 would be slightly easier, but I’m unsure if there
> would
> >>> be a huge difference.
> >>>
> >>>
> >>>> On 13 Aug 2022, at 16:59, Patrick McFadin <pmcfa...@gmail.com> wrote:
> >>>>
> >>>> I'm really happy to see CEP-15 getting closer to a final
> >>>> implementation. I'm going to walk through my reasoning for your
> >>>> proposals wrt trying to explain this to somebody new.
> >>>>
> >>>> Looking at all the options, the first thing that comes up for me is
> >>>> the Cassandra project's complicated relationship with NULL.  We have
> >>>> prior art with EXISTS/NOT EXISTS when creating new tables. IS
> >>>> NULL/IS NOT NULL is used in materialized views similarly to
> >>>> proposals 2,4 and 5.
> >>>>
> >>>> CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
> >>>>    AS SELECT [ (column_list) ]
> >>>>    FROM [keyspace_name.]table_name
> >>>>    [ WHERE column_name IS NOT NULL
> >>>>    [ AND column_name IS NOT NULL ... ] ]
> >>>>    [ AND relation [ AND ... ] ]
> >>>>    PRIMARY KEY ( column_list )
> >>>>    [ WITH [ table_properties ]
> >>>>    [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ]
> ] ;
> >>>>
> >>>>   Based on that, I believe 1 and 3 would just confuse users, so -1 on
> >>>> those.
> >>>>
> >>>> Trying to explain the difference between row and column operations
> >>>> with LET, I can't see the difference between a row and column in #2.
> >>>>
> >>>> #4 introduces a boolean instead of column names and just adds more
> >>>> syntax.
> >>>>
> >>>> #5 is verbose and, in my opinion, easier to reason when writing a
> >>>> query. Thinking top down, I need to know if these exact rows and/or
> >>>> column values exist before changing them, so I'll define them first.
> >>>> Then I'll iterate over the state I created in my actual changes so I
> >>>> know I'm changing precisely what I want.
> >>>>
> >>>> #5 could use a bit more to be clearer to somebody who doesn't write
> >>>> CQL queries daily and wouldn't require memorizing subtle
> >>>> differences. It should be similar to all the other syntax, so
> >>>> learning a little about CQL will let you move into more without
> >>>> completely re-learning the new syntax.
> >>>>
> >>>> So I propose #6)
> >>>> BEGIN TRANSACTION
> >>>> LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
> >>>> columns
> >>>> LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
> >>>>    SELECT row1, row2
> >>>> IF row1 IS NULL AND row2.v = 3 THEN
> >>>>    INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
> >>>> END IF
> >>>> COMMIT TRANSACTION
> >>>>
> >>>> I added the SELECT in the LET just so it's straightforward, you are
> >>>> reading, and it's just like doing a regular select, but you are
> >>>> assigning it to a variable.
> >>>>
> >>>> I removed the confusing 'row1.v'and replaced it with 'row1'I can't
> >>>> see why you would need the '.v'vs having the complete variable I
> >>>> created in the statement above.
> >>>>
> >>>> EOL
> >>>>
> >>>> Patrick
> >>>>
> >>>> On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe
> >>>> <calebrackli...@gmail.com> wrote:
> >>>>
> >>>>      ...and one more option...
> >>>>
> >>>>      5.) Introduce tuple assignments, removing all ambiguity around
> >>>>      row vs. column operations.
> >>>>
> >>>>      BEGIN TRANSACTION
> >>>>        LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
> >>>>      columns
> >>>>        LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
> >>>>      SELECT row1.v, row2.v
> >>>>        IF row1 IS NULL AND row2.v = 3 THEN
> >>>>          INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
> >>>>        END IF
> >>>>      COMMIT TRANSACTION
> >>>>
> >>>>
> >>>>
> >>>>      On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe
> >>>>      <calebrackli...@gmail.com> wrote:
> >>>>
> >>>>          via Benedict, here is a 4th option:
> >>>>
> >>>>          4.) Similar to #2, but don't rely on the key element being
> NULL.
> >>>>
> >>>>          If the read returns no result, x effectively becomes NULL.
> >>>>          Otherwise, it remains true/NOT NULL.
> >>>>
> >>>>          BEGIN TRANSACTION
> >>>>            LET x = true FROM ks.tbl WHERE k=0 AND c=0;
> >>>>            LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
> >>>>            SELECT x, row2_v
> >>>>            IF x IS NULL AND row2_v = 3 THEN
> >>>>              INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
> >>>>            END IF
> >>>>          COMMIT TRANSACTION
> >>>>
> >>>>          On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe
> >>>>          <calebrackli...@gmail.com> wrote:
> >>>>
> >>>>              Hello again everyone!
> >>>>
> >>>>              I've been working on a prototype
> >>>>              <https://issues.apache.org/jira/browse/CASSANDRA-17719
> > in
> >>>>              CASSANDRA-17719 for a grammar that roughly corresponds to
> >>>>              what we've agreed on in this thread. One thing that isn't
> >>>>              immediately obvious to me is how the LET syntax handles
> >>>>              cases where we want to check for the plain existence of a
> >>>>              row in IF. For example, in this hybrid of the originally
> >>>>              proposed syntax and something more like what we've agreed
> >>>>              on (and the RETURNING just to distinguish between that
> >>>>              and SELECT), this could be pretty straightforward:
> >>>>
> >>>>              BEGIN TRANSACTION
> >>>>                SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
> >>>>                SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
> >>>>                RETURNING row1.v, row2.v
> >>>>                IF row1 NOT EXISTS AND row2.v = 3 THEN
> >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
> >>>>                END IF
> >>>>              COMMIT TRANSACTION
> >>>>
> >>>>              The NOT EXISTS operator has row1 to work with. One the
> >>>>              other hand, w/ the LET syntax and no naming of reads,
> >>>>              it's not clear what the best solution would be. Here are
> >>>>              a few possibilities:
> >>>>
> >>>>              1.) Provide a few built-in functions that operate on a
> >>>>              whole result row. If we assume a SQL style IS NULL and IS
> >>>>              NOT NULL (see my last post here) for operations on
> >>>>              particular columns, this probably eliminates the need for
> >>>>              EXISTS/NOT EXISTS as well.
> >>>>
> >>>>              BEGIN TRANSACTION
> >>>>                LET row1_missing = notExists() FROM ks.tbl WHERE k=0
> >>>>              AND c=0;
> >>>>                LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
> >>>>                SELECT row1_missing, row2_v
> >>>>                IF row1_missing AND row2_v = 3 THEN
> >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
> >>>>                END IF
> >>>>              COMMIT TRANSACTION
> >>>>
> >>>>              2.) Assign and check the first primary key element to
> >>>>              determine whether the row exists.
> >>>>
> >>>>              BEGIN TRANSACTION
> >>>>                LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
> >>>>                LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
> >>>>                SELECT row1_k, row2_v
> >>>>                IF row1_k IS NULL AND row2_v = 3 THEN
> >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
> >>>>                END IF
> >>>>              COMMIT TRANSACTION
> >>>>
> >>>>              3.) Reconsider the LET concept toward something that
> >>>>              allows us to explicitly name our reads again.
> >>>>
> >>>>              BEGIN TRANSACTION
> >>>>                WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
> >>>>                WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
> >>>>                SELECT row1.v, row2.v
> >>>>                IF row1 NOT EXISTS AND row2.v = 3 THEN
> >>>>                  INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
> >>>>                END IF
> >>>>              COMMIT TRANSACTION
> >>>>
> >>>>              I don't have a strong affinity for any of these, although
> >>>>              #1 seems the most awkward.
> >>>>
> >>>>              Does anyone have any other alternatives? Preference for
> >>>>              one of the above options?
> >>>>
> >>>>              Thanks!
> >>>>
> >>>>              On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe
> >>>>              <calebrackli...@gmail.com> wrote:
> >>>>
> >>>>                  Avi brought up an interesting point around NULLness
> >>>>                  checking inCASSANDRA-17762
> >>>>                  <
> https://issues.apache.org/jira/browse/CASSANDRA-17762>...
> >>>>
> >>>>                      In SQL, any comparison with NULL is NULL, which
> >>>>                      is interpreted as FALSE in a condition. To test
> >>>>                      for NULLness, you use IS NULL or IS NOT NULL. But
> >>>>                      LWT uses IF col = NULL as a NULLness test. This
> >>>>                      is likely to confuse people coming from SQL and
> >>>>                      hamper attempts to extend the dialect.
> >>>>
> >>>>
> >>>>                  We can leave that Jira open to address what to do in
> >>>>                  the legacy LWT case, but I'd support a SQL-congruent
> >>>>                  syntax here (IS NULL or IS NOT NULL), where we have
> >>>>                  something closer to a blank slate.
> >>>>
> >>>>                  Thoughts?
> >>>>
> >>>>                  On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky
> >>>>                  <a...@aber.io> wrote:
> >>>>
> >>>>                      The new syntax looks great, and I’m really
> >>>>                      excited to see this coming together.
> >>>>
> >>>>                      One piece of feedback on the proposed syntax is
> >>>>                      around the use of “=“ as a declaration in
> >>>>                      addition to its current use as an equality
> >>>>                      operator in a WHERE clause and an assignment
> >>>>                      operator in an UPDATE:
> >>>>
> >>>>                          BEGIN TRANSACTION
> >>>>                            LET car_miles = miles_driven,
> >>>>                          car_is_running = is_running FROM cars WHERE
> >>>>                          model=’pinto’
> >>>>                            LET user_miles = miles_driven FROM users
> >>>>                          WHERE name=’blake’
> >>>>                            SELECT something else from some other table
> >>>>                            IF NOT car_is_running THEN ABORT
> >>>>                            UPDATE users SET miles_driven = user_miles
> >>>>                          + 30 WHERE name='blake';
> >>>>                            UPDATE cars SET miles_driven = car_miles +
> >>>>                          30 WHERE model='pinto';
> >>>>                          COMMIT TRANSACTION
> >>>>
> >>>>                      This is supported in languages like PL/pgSQL, but
> >>>>                      in a normal SQL query kind of local declaration
> >>>>                      is often expressed as an alias (SELECT col AS
> >>>>                      new_col), subquery alias (SELECT col) t, or
> >>>>                      common table expression (WITH t AS (SELECT col)).
> >>>>
> >>>>                      Here’s an example of an alternative to the
> >>>>                      proposed syntax that I’d find more readable:
> >>>>
> >>>>                          BEGIN TRANSACTION
> >>>>                            WITH car_miles, car_is_running AS (SELECT
> >>>>                          miles_driven, is_running FROM cars WHERE
> >>>>                          model=’pinto’),
> >>>>                          user_miles AS (SELECT miles_driven FROM users
> >>>>                          WHERE name=’blake’)
> >>>>                            IF NOT car_is_running THEN ABORT
> >>>>                            UPDATE users SET miles_driven = user_miles
> >>>>                          + 30 WHERE name='blake';
> >>>>                          UPDATE cars SET miles_driven = car_miles + 30
> >>>>                          WHERE model='pinto';
> >>>>                          COMMIT TRANSACTION
> >>>>
> >>>>                      There’s also the option of naming the transaction
> >>>>                      like a subquery, and supporting LET via AS (this
> >>>>                      one I’m less sure about but wanted to propose
> >>>>                      anyway):
> >>>>
> >>>>                          BEGIN TRANSACTION t1
> >>>>                            SELECT miles_driven AS t1.car_miles,
> >>>>                          is_running AS t1.car_is_running FROM cars
> >>>>                          WHERE model=’pinto’;
> >>>>                            SELECT miles_driven AS t1.user_miles FROM
> >>>>                          users WHERE name=’blake’;
> >>>>                            IF NOT car_is_running THEN ABORT
> >>>>                            UPDATE users SET miles_driven = user_miles
> >>>>                          + 30 WHERE name='blake';
> >>>>                            UPDATE cars SET miles_driven = car_miles +
> >>>>                          30 WHERE model='pinto';
> >>>>                          COMMIT TRANSACTION
> >>>>
> >>>>                      This also has the benefit of resolving ambiguity
> >>>>                      in case of naming conflicts with existing (or
> >>>>                      future) column names.
> >>>>
> >>>>                      --
> >>>>                      Abe
> >>>>
>
>

Re: CEP-15 multi key transaction syntax

Reply via email to