Re: CEP-15 multi key transaction syntax

Patrick McFadin Sun, 14 Aug 2022 10:05:09 -0700

Let me just state my bias right up front. For any kind of QL I lean heavily
toward verbose and explicit based on their lifecycle. A CQL query will
probably need to be understood by the next person looking at it, and a few
seconds saved typing isn't worth the potential misunderstanding later.  My
opinion is formed by having to be the second person many times.  :D


I just want to make sure I have the syntax you are proposing.

Verbose version:
LET (a) = SELECT val FROM table
IF a > 1 THEN...

Less verbose version:
LET a = SELECT val FROM table
IF a.val > 1 THEN...

Alternatively, for non-queries:
LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
or less verbose:
LET x = (someFunc() AS v1, someOtherFunc() as v2)
LET (v1, v2) = (someFunc(), someOtherFunc())

Weighing in on the "SELECT without a FROM," I think that is fine and, as
Avi stated, already present in the SQL world. I would prefer that over
'SELECT  func() FROM dual;' (Looking at you, Oracle)

Finally, on the topic of deconstructing SELECT statements instead of
nesting. If I understand the argument here, I would favor deconstructing
over nesting if there is a choice. I think this is what that choice would
look like.

Deconstructed:
LET x = SELECT x FROM ...
LET y = SELECT y FROM ...

Nested:
LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))

I'm trying to summate but let me know if I missed something. I apologize in
advance to Monday morning Caleb, who will have to digest this thread.

Patrick

On Sun, Aug 14, 2022 at 9:00 AM Benedict Elliott Smith <bened...@apache.org>
wrote:

> 
>
> I think SQL dialects require subqueries to be parenthesized (not sure). If
> that's the case I think we should keep the tradition.
>
>
> This isn’t a sub-query though, since LET is not a query. If we permit at
> most one SELECT, and do not permit mixing SELECT with constant assignments,
> I don’t see why we would require parentheses.
>
> I see no harm in making FROM optional, as it's recognized by other SQL
> dialects.
> Absolutely, this just flows naturally from having tuples. There's no
> difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple”.
>
>
> Neither of these things are supported today, and they’re no longer
> necessary with this syntax proposal. The downside of splitting SELECT and
> LET is that there’s no impetus to improve the former. So the question was
> really whether we bother to improve it anyway, not whether or not they
> would be good improvements (I think they obviously are).
>
> I think this can be safely deferred. Most people would again separate it
> into separate LETs.
>
> That implies we’ll permit deconstructing a tuple variable in a LET. This
> makes sense to me, but is roughly equivalent to nested deconstruction. It
> might be that v1 we only support deconstructing SELECT statements, but I
> guess all of this is probably up to the implementor.
>
> I'd add (to the specification) that LETs cannot override a previously
> defined variable, just to reduce ambiguity.
>
> Yep, this was already agreed way back with the earlier proposal.
>
>
> On 14 Aug 2022, at 16:30, Avi Kivity <a...@scylladb.com> wrote:
>
>
> On 14/08/2022 17.50, Benedict Elliott Smith wrote:
>
> 
> > SELECT and LET incompatible once comparisons become valid selectors
>
> I don’t think this would be ambiguous, as = is required in the LET syntax
> as we have to bind the result to a variable name.
>
> But, I like the deconstructed tuple syntax improvement over “Option 6”.
> This would also seem to easily support assigning from non-query statements,
> such as LET (a, b) = (someFunc(), someOtherFunc(?))
>
> I don’t think it is ideal to depend on relative position in the tuple for
> assigning results to a variable name, as it leaves more scope for errors.
> It would be nice to have a simple way to deconstruct safely. But, I think
> this proposal is good, and I’d be fine with it as an alternative if others
> concur. I agree that seeing the SELECT independently may be more easily
> recognisable to users.
>
> With this approach there remains the question of how we handle single
> column results. I’d be inclined to treat in the following way:
>
> LET (a) = SELECT val FROM table
> IF a > 1 THEN...
>
> LET a = SELECT val FROM table
> IF a.val > 1 THEN...
>
>
> I think SQL dialects require subqueries to be parenthesized (not sure). If
> that's the case I think we should keep the tradition.
>
>
> ----
> There is also the question of whether we support SELECT without a FROM
> clause, e.g.
> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>
> Or just LET (since they are no longer equivalent)
> e.g.
> LET x = (someFunc() AS v1, someOtherFunc() as v2)
> LET (v1, v2) = (someFunc(), someOtherFunc())
>
>
> I see no harm in making FROM optional, as it's recognized by other SQL
> dialects.
>
>
> ----
> Also since LET is only binding variables, is there any reason we shouldn’t
> support multiple SELECT assignments in a single LET?, e.g.
> LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))
>
>
> What if an inner select returns a tuple? Would y be a tuple?
>
>
> I think this is redundant and atypical enough to not be worth supporting.
> Most people would use separate LETs.
>
>
> ----
> Also whether we support tuples in SELECT statements anyway, e.g.
> LET (tuple1, tuple2) = SELECT (a, b), (c, d) FROM..
> IF tuple1.a > 1 AND tuple2.d > 1…
>
>
> Absolutely, this just flows naturally from having tuples. There's no
> difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple".
>
>
>
> ----
> and whether we support nested deconstruction, e.g.
> LET (a, b, (c, d)) = SELECT a, b, someTuple FROM..
> IF a > 1 AND d > 1…
>
>
> I think this can be safely deferred. Most people would again separate it
> into separate LETs.
>
>
> I'd add (to the specification) that LETs cannot override a previously
> defined variable, just to reduce ambiguity.
>
>
>
>
>
>
>
>
> On 14 Aug 2022, at 13:55, Avi Kivity via dev <dev@cassandra.apache.org>
> <dev@cassandra.apache.org> wrote:
>
>
> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>
> 
> I’ll do my best to express with my thinking, as well as how I would
> explain the feature to a user.
>
> My mental model for LET statements is that they are simply SELECT
> statements where the columns that are selected become variables accessible
> anywhere in the scope of the transaction. That is to say, you should be
> able to run something like s/LET/SELECT and s/([^=]+)=([^,]+)(,|$)/\2 AS
> \1\3/g on the columns of a LET statement and produce a valid SELECT
> statement, and vice versa. Both should perform identically.
>
> e.g.
> SELECT pk AS key, v AS value FROM table
>
> =>
> LET key = pk, value = v FROM table
>
>
> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
> supports selecting comparisons:
>
>
> $ psql
> psql (14.3)
> Type "help" for help.
>
> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>  ?column? | ?column? | ?column?
> ----------+----------+----------
>  f        | t        |
> (1 row)
>
>
> Using "=" as a syntactic element in LET would make SELECT and LET
> incompatible once comparisons become valid selectors. Unless they become
> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
> comparison).
>
>
> I personally prefer the nested query syntax:
>
>
>     LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>
>
> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
> immediately recognizable by everyone as a query, LET is not.
>
>
>
> Identical form, identical behaviour. Every statement should be directly
> translatable with some simple text manipulation.
>
> We can then make this more powerful for users by simply expanding SELECT
> statements, e.g. by permitting them to declare constants and tuples in the
> column results. In this scheme LET x = * is simply syntactic sugar for LET
> x = (pk, ck, field1, …) This scheme then supports options 2, 4 and 5 all at
> once, consistently alongside each other.
>
> Option 6 is in fact very similar, but is strictly less flexible for the
> user as they have no way to declare multiple scalar variables without
> scoping them inside a tuple.
>
> e.g.
> LET key = pk, value = v FROM table
> IF key > 1 AND value > 1 THEN...
>
> =>
> LET row = SELECT pk AS key, v AS value FROM table
> IF row.key > 1 AND row.value > 1 THEN…
>
> However, both are expressible in the existing proposal, as if you prefer
> this naming scheme you can simply write
>
> LET row = (pk AS key, v AS value) FROM table
> IF row.key > 1 AND row.value > 1 THEN…
>
> With respect to auto converting single column results to a scalar, we do
> need a way for the user to say they care whether the row was null or the
> column. I think an implicit conversion here could be surprising. However we
> could implement tuple expressions anyway and let the user explicitly
> declare v as a tuple as Caleb has suggested for the existing proposal as
> well.
>
> Assigning constants or other values not selected from a table would also
> be a little clunky:
>
> LET v1 = someFunc(), v2 = someOtherFunc(?)
> IF v1 > 1 AND v2 > 1 THEN…
>
> =>
> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
> IF row.v1 > 1 AND row.v2 > 1 THEN...
>
> That said, the proposals are *close* to identical, it is just slightly
> more verbose and slightly less flexible.
>
> Which one would be most intuitive to users is hard to predict. It might be
> that Option 6 would be slightly easier, but I’m unsure if there would be a
> huge difference.
>
>
> On 13 Aug 2022, at 16:59, Patrick McFadin <pmcfa...@gmail.com> wrote:
>
> I'm really happy to see CEP-15 getting closer to a final implementation.
> I'm going to walk through my reasoning for your proposals wrt trying to
> explain this to somebody new.
>
> Looking at all the options, the first thing that comes up for me is the
> Cassandra project's complicated relationship with NULL.  We have prior art
> with EXISTS/NOT EXISTS when creating new tables. IS NULL/IS NOT NULL is
> used in materialized views similarly to proposals 2,4 and 5.
>
> CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
>   AS SELECT [ (column_list) ]
>   FROM [keyspace_name.]table_name
>   [ WHERE column_name IS NOT NULL
>   [ AND column_name IS NOT NULL ... ] ]
>   [ AND relation [ AND ... ] ]
>   PRIMARY KEY ( column_list )
>   [ WITH [ table_properties ]
>   [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;
>
>  Based on that, I believe 1 and 3 would just confuse users, so -1 on
> those.
>
> Trying to explain the difference between row and column operations with
> LET, I can't see the difference between a row and column in #2.
>
> #4 introduces a boolean instead of column names and just adds more syntax.
>
> #5 is verbose and, in my opinion, easier to reason when writing a query.
> Thinking top down, I need to know if these exact rows and/or column values
> exist before changing them, so I'll define them first. Then I'll iterate
> over the state I created in my actual changes so I know I'm changing
> precisely what I want.
>
> #5 could use a bit more to be clearer to somebody who doesn't write CQL
> queries daily and wouldn't require memorizing subtle differences. It should
> be similar to all the other syntax, so learning a little about CQL will let
> you move into more without completely re-learning the new syntax.
>
> So I propose #6)
> BEGIN TRANSACTION
>   LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
> columns
>   LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
>   SELECT row1, row2
>   IF row1 IS NULL AND row2.v = 3 THEN
>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>   END IF
> COMMIT TRANSACTION
>
> I added the SELECT in the LET just so it's straightforward, you are
> reading, and it's just like doing a regular select, but you are assigning
> it to a variable.
>
> I removed the confusing 'row1.v' and replaced it with 'row1' I can't see
> why you would need the '.v' vs having the complete variable I created in
> the statement above.
>
> EOL
>
> Patrick
>
> On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe <calebrackli...@gmail.com>
> wrote:
>
>> ...and one more option...
>>
>> 5.) Introduce tuple assignments, removing all ambiguity around row vs.
>> column operations.
>>
>> BEGIN TRANSACTION
>>   LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all columns
>>   LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
>>   SELECT row1.v, row2.v
>>   IF row1 IS NULL AND row2.v = 3 THEN
>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>   END IF
>> COMMIT TRANSACTION
>>
>>
>>
>> On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe <
>> calebrackli...@gmail.com> wrote:
>>
>>> via Benedict, here is a 4th option:
>>>
>>> 4.) Similar to #2, but don't rely on the key element being NULL.
>>>
>>> If the read returns no result, x effectively becomes NULL. Otherwise, it
>>> remains true/NOT NULL.
>>>
>>> BEGIN TRANSACTION
>>>   LET x = true FROM ks.tbl WHERE k=0 AND c=0;
>>>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>>   SELECT x, row2_v
>>>   IF x IS NULL AND row2_v = 3 THEN
>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>   END IF
>>> COMMIT TRANSACTION
>>>
>>> On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe <
>>> calebrackli...@gmail.com> wrote:
>>>
>>>> Hello again everyone!
>>>>
>>>> I've been working on a prototype
>>>> <https://issues.apache.org/jira/browse/CASSANDRA-17719> in
>>>> CASSANDRA-17719 for a grammar that roughly corresponds to what we've agreed
>>>> on in this thread. One thing that isn't immediately obvious to me is how
>>>> the LET syntax handles cases where we want to check for the plain existence
>>>> of a row in IF. For example, in this hybrid of the originally proposed
>>>> syntax and something more like what we've agreed on (and the RETURNING just
>>>> to distinguish between that and SELECT), this could be pretty
>>>> straightforward:
>>>>
>>>> BEGIN TRANSACTION
>>>>   SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
>>>>   SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
>>>>   RETURNING row1.v, row2.v
>>>>   IF row1 NOT EXISTS AND row2.v = 3 THEN
>>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>   END IF
>>>> COMMIT TRANSACTION
>>>>
>>>> The NOT EXISTS operator has row1 to work with. One the other hand, w/
>>>> the LET syntax and no naming of reads, it's not clear what the best
>>>> solution would be. Here are a few possibilities:
>>>>
>>>> 1.) Provide a few built-in functions that operate on a whole result
>>>> row. If we assume a SQL style IS NULL and IS NOT NULL (see my last post
>>>> here) for operations on particular columns, this probably eliminates the
>>>> need for EXISTS/NOT EXISTS as well.
>>>>
>>>> BEGIN TRANSACTION
>>>>   LET row1_missing = notExists() FROM ks.tbl WHERE k=0 AND c=0;
>>>>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>>>   SELECT row1_missing, row2_v
>>>>   IF row1_missing AND row2_v = 3 THEN
>>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>   END IF
>>>> COMMIT TRANSACTION
>>>>
>>>> 2.) Assign and check the first primary key element to determine whether
>>>> the row exists.
>>>>
>>>> BEGIN TRANSACTION
>>>>   LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
>>>>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>>>   SELECT row1_k, row2_v
>>>>   IF row1_k IS NULL AND row2_v = 3 THEN
>>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>   END IF
>>>> COMMIT TRANSACTION
>>>>
>>>> 3.) Reconsider the LET concept toward something that allows us to
>>>> explicitly name our reads again.
>>>>
>>>> BEGIN TRANSACTION
>>>>   WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
>>>>   WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
>>>>   SELECT row1.v, row2.v
>>>>   IF row1 NOT EXISTS AND row2.v = 3 THEN
>>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>   END IF
>>>> COMMIT TRANSACTION
>>>>
>>>> I don't have a strong affinity for any of these, although #1 seems the
>>>> most awkward.
>>>>
>>>> Does anyone have any other alternatives? Preference for one of the
>>>> above options?
>>>>
>>>> Thanks!
>>>>
>>>> On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe <
>>>> calebrackli...@gmail.com> wrote:
>>>>
>>>>> Avi brought up an interesting point around NULLness checking in
>>>>> CASSANDRA-17762
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-17762>...
>>>>>
>>>>> In SQL, any comparison with NULL is NULL, which is interpreted as
>>>>>> FALSE in a condition. To test for NULLness, you use IS NULL or IS NOT 
>>>>>> NULL.
>>>>>> But LWT uses IF col = NULL as a NULLness test. This is likely to confuse
>>>>>> people coming from SQL and hamper attempts to extend the dialect.
>>>>>
>>>>>
>>>>> We can leave that Jira open to address what to do in the legacy LWT
>>>>> case, but I'd support a SQL-congruent syntax here (IS NULL or IS NOT
>>>>> NULL), where we have something closer to a blank slate.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky <a...@aber.io> wrote:
>>>>>
>>>>>> The new syntax looks great, and I’m really excited to see this coming
>>>>>> together.
>>>>>>
>>>>>> One piece of feedback on the proposed syntax is around the use of “=“
>>>>>> as a declaration in addition to its current use as an equality operator 
>>>>>> in
>>>>>> a WHERE clause and an assignment operator in an UPDATE:
>>>>>>
>>>>>> BEGIN TRANSACTION
>>>>>>   LET car_miles = miles_driven, car_is_running = is_running FROM cars
>>>>>> WHERE model=’pinto’
>>>>>>   LET user_miles = miles_driven FROM users WHERE name=’blake’
>>>>>>   SELECT something else from some other table
>>>>>>   IF NOT car_is_running THEN ABORT
>>>>>>   UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>>>>>>   UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
>>>>>> COMMIT TRANSACTION
>>>>>>
>>>>>> This is supported in languages like PL/pgSQL, but in a normal SQL
>>>>>> query kind of local declaration is often expressed as an alias (SELECT 
>>>>>> col
>>>>>> AS new_col), subquery alias (SELECT col) t, or common table expression
>>>>>> (WITH t AS (SELECT col)).
>>>>>>
>>>>>> Here’s an example of an alternative to the proposed syntax that I’d
>>>>>> find more readable:
>>>>>>
>>>>>> BEGIN TRANSACTION
>>>>>>   WITH car_miles, car_is_running AS (SELECT miles_driven, is_running
>>>>>> FROM cars WHERE model=’pinto’),
>>>>>>   user_miles AS (SELECT miles_driven FROM users WHERE name=’blake’)
>>>>>>   IF NOT car_is_running THEN ABORT
>>>>>>   UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>>>>>>   UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
>>>>>> COMMIT TRANSACTION
>>>>>>
>>>>>> There’s also the option of naming the transaction like a subquery,
>>>>>> and supporting LET via AS (this one I’m less sure about but wanted to
>>>>>> propose anyway):
>>>>>>
>>>>>> BEGIN TRANSACTION t1
>>>>>>   SELECT miles_driven AS t1.car_miles, is_running AS
>>>>>> t1.car_is_running FROM cars WHERE model=’pinto’;
>>>>>>   SELECT miles_driven AS t1.user_miles FROM users WHERE name=’blake’;
>>>>>>   IF NOT car_is_running THEN ABORT
>>>>>>   UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake';
>>>>>>   UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto';
>>>>>> COMMIT TRANSACTION
>>>>>>
>>>>>> This also has the benefit of resolving ambiguity in case of naming
>>>>>> conflicts with existing (or future) column names.
>>>>>>
>>>>>> --
>>>>>> Abe
>>>>>>
>>>>>
>
>
>

Re: CEP-15 multi key transaction syntax

Reply via email to