> On 21 Aug 2022, at 14:59, Benedict <bened...@apache.org> wrote:
>
> SELECT INTO in T-SQL creates a new table with the results. Since our
> semantics are likely to be different than Postgres and MySQL, I’m not sure
> it’s less confusing or otherwise beneficial to mimic an existing syntax.
>
> Personally I find the LET syntax easier to read, and where ANSI SQL isn’t
> prescriptive it may be better to aim for a more modern look.
>
>
>
>> On 21 Aug 2022, at 14:53, Konstantin Osipov <kostja.osi...@gmail.com> wrote:
>>
>> * Avi Kivity via dev <dev@cassandra.apache.org> [22/08/14 15:59]:
>>
>> MySQL supports SELECT <expr_list> INTO <var_list> FROM ... WHERE
>> ...
>>
>> PostgreSQL supports pretty much the same syntax.
>>
>> Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and
>> MySQL/PostgreSQL SELECT ... INTO?
>>
>>>
>>>>> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>>>>>
>>>>> I’ll do my best to express with my thinking, as well as how I would
>>>>> explain the feature to a user.
>>>>>
>>>>> My mental model for LET statements is that they are simply SELECT
>>>>> statements where the columns that are selected become variables
>>>>> accessible anywhere in the scope of the transaction. That is to say, you
>>>>> should be able to run something like s/LET/SELECT and
>>>>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement
>>>>> and produce a valid SELECT statement, and vice versa. Both should
>>>>> perform identically.
>>>>>
>>>>> e.g.
>>>>> SELECT pk AS key, v AS value FROM table
>>>>>
>>>>> =>
>>>>> LET key = pk, value = v FROM table
>>>
>>>
>>> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL
>>> supports selecting comparisons:
>>>
>>>
>>> $ psql
>>> psql (14.3)
>>> Type "help" for help.
>>>
>>> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>>> ?column? | ?column? | ?column?
>>> ----------+----------+----------
>>> f | t |
>>> (1 row)
>>>
>>>
>>> Using "=" as a syntactic element in LET would make SELECT and LET
>>> incompatible once comparisons become valid selectors. Unless they become
>>> mandatory (and then you'd write "LET q = a = b" if you wanted to select a
>>> comparison).
>>>
>>>
>>> I personally prefer the nested query syntax:
>>>
>>>
>>> LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>>>
>>>
>>> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is
>>> immediately recognizable by everyone as a query, LET is not.
>>>
>>>
>>>>
>>>> Identical form, identical behaviour. Every statement should be directly
>>>> translatable with some simple text manipulation.
>>>>
>>>> We can then make this more powerful for users by simply expanding SELECT
>>>> statements, e.g. by permitting them to declare constants and tuples in
>>>> the column results. In this scheme LET x = * is simply syntactic sugar
>>>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4
>>>> and 5 all at once, consistently alongside each other.
>>>>
>>>> Option 6 is in fact very similar, but is strictly less flexible for the
>>>> user as they have no way to declare multiple scalar variables without
>>>> scoping them inside a tuple.
>>>>
>>>> e.g.
>>>> LET key = pk, value = v FROM table
>>>> IF key > 1 AND value > 1 THEN...
>>>>
>>>> =>
>>>> LET row = SELECT pk AS key, v AS value FROM table
>>>> IF row.key > 1 AND row.value > 1 THEN…
>>>>
>>>> However, both are expressible in the existing proposal, as if you prefer
>>>> this naming scheme you can simply write
>>>>
>>>> LET row = (pk AS key, v AS value) FROM table
>>>> IF row.key > 1 AND row.value > 1 THEN…
>>>>
>>>> With respect to auto converting single column results to a scalar, we do
>>>> need a way for the user to say they care whether the row was null or the
>>>> column. I think an implicit conversion here could be surprising. However
>>>> we could implement tuple expressions anyway and let the user explicitly
>>>> declare v as a tuple as Caleb has suggested for the existing proposal as
>>>> well.
>>>>
>>>> Assigning constants or other values not selected from a table would also
>>>> be a little clunky:
>>>>
>>>> LET v1 = someFunc(), v2 = someOtherFunc(?)
>>>> IF v1 > 1 AND v2 > 1 THEN…
>>>>
>>>> =>
>>>> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
>>>> IF row.v1 > 1 AND row.v2 > 1 THEN...
>>>>
>>>> That said, the proposals are /close/ to identical, it is just slightly
>>>> more verbose and slightly less flexible.
>>>>
>>>> Which one would be most intuitive to users is hard to predict. It might
>>>> be that Option 6 would be slightly easier, but I’m unsure if there would
>>>> be a huge difference.
>>>>
>>>>
>>>>> On 13 Aug 2022, at 16:59, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>>
>>>>> I'm really happy to see CEP-15 getting closer to a final
>>>>> implementation. I'm going to walk through my reasoning for your
>>>>> proposals wrt trying to explain this to somebody new.
>>>>>
>>>>> Looking at all the options, the first thing that comes up for me is
>>>>> the Cassandra project's complicated relationship with NULL. We have
>>>>> prior art with EXISTS/NOT EXISTS when creating new tables. IS
>>>>> NULL/IS NOT NULL is used in materialized views similarly to
>>>>> proposals 2,4 and 5.
>>>>>
>>>>> CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
>>>>> AS SELECT [ (column_list) ]
>>>>> FROM [keyspace_name.]table_name
>>>>> [ WHERE column_name IS NOT NULL
>>>>> [ AND column_name IS NOT NULL ... ] ]
>>>>> [ AND relation [ AND ... ] ]
>>>>> PRIMARY KEY ( column_list )
>>>>> [ WITH [ table_properties ]
>>>>> [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ;
>>>>>
>>>>> Based on that, I believe 1 and 3 would just confuse users, so -1 on
>>>>> those.
>>>>>
>>>>> Trying to explain the difference between row and column operations
>>>>> with LET, I can't see the difference between a row and column in #2.
>>>>>
>>>>> #4 introduces a boolean instead of column names and just adds more
>>>>> syntax.
>>>>>
>>>>> #5 is verbose and, in my opinion, easier to reason when writing a
>>>>> query. Thinking top down, I need to know if these exact rows and/or
>>>>> column values exist before changing them, so I'll define them first.
>>>>> Then I'll iterate over the state I created in my actual changes so I
>>>>> know I'm changing precisely what I want.
>>>>>
>>>>> #5 could use a bit more to be clearer to somebody who doesn't write
>>>>> CQL queries daily and wouldn't require memorizing subtle
>>>>> differences. It should be similar to all the other syntax, so
>>>>> learning a little about CQL will let you move into more without
>>>>> completely re-learning the new syntax.
>>>>>
>>>>> So I propose #6)
>>>>> BEGIN TRANSACTION
>>>>> LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
>>>>> columns
>>>>> LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
>>>>> SELECT row1, row2
>>>>> IF row1 IS NULL AND row2.v = 3 THEN
>>>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>> END IF
>>>>> COMMIT TRANSACTION
>>>>>
>>>>> I added the SELECT in the LET just so it's straightforward, you are
>>>>> reading, and it's just like doing a regular select, but you are
>>>>> assigning it to a variable.
>>>>>
>>>>> I removed the confusing 'row1.v'and replaced it with 'row1'I can't
>>>>> see why you would need the '.v'vs having the complete variable I
>>>>> created in the statement above.
>>>>>
>>>>> EOL
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe
>>>>> <calebrackli...@gmail.com> wrote:
>>>>>
>>>>> ...and one more option...
>>>>>
>>>>> 5.) Introduce tuple assignments, removing all ambiguity around
>>>>> row vs. column operations.
>>>>>
>>>>> BEGIN TRANSACTION
>>>>> LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all
>>>>> columns
>>>>> LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
>>>>> SELECT row1.v, row2.v
>>>>> IF row1 IS NULL AND row2.v = 3 THEN
>>>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>> END IF
>>>>> COMMIT TRANSACTION
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe
>>>>> <calebrackli...@gmail.com> wrote:
>>>>>
>>>>> via Benedict, here is a 4th option:
>>>>>
>>>>> 4.) Similar to #2, but don't rely on the key element being NULL.
>>>>>
>>>>> If the read returns no result, x effectively becomes NULL.
>>>>> Otherwise, it remains true/NOT NULL.
>>>>>
>>>>> BEGIN TRANSACTION
>>>>> LET x = true FROM ks.tbl WHERE k=0 AND c=0;
>>>>> LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>>>> SELECT x, row2_v
>>>>> IF x IS NULL AND row2_v = 3 THEN
>>>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>> END IF
>>>>> COMMIT TRANSACTION
>>>>>
>>>>> On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe
>>>>> <calebrackli...@gmail.com> wrote:
>>>>>
>>>>> Hello again everyone!
>>>>>
>>>>> I've been working on a prototype
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-17719> in
>>>>> CASSANDRA-17719 for a grammar that roughly corresponds to
>>>>> what we've agreed on in this thread. One thing that isn't
>>>>> immediately obvious to me is how the LET syntax handles
>>>>> cases where we want to check for the plain existence of a
>>>>> row in IF. For example, in this hybrid of the originally
>>>>> proposed syntax and something more like what we've agreed
>>>>> on (and the RETURNING just to distinguish between that
>>>>> and SELECT), this could be pretty straightforward:
>>>>>
>>>>> BEGIN TRANSACTION
>>>>> SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
>>>>> SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
>>>>> RETURNING row1.v, row2.v
>>>>> IF row1 NOT EXISTS AND row2.v = 3 THEN
>>>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>> END IF
>>>>> COMMIT TRANSACTION
>>>>>
>>>>> The NOT EXISTS operator has row1 to work with. One the
>>>>> other hand, w/ the LET syntax and no naming of reads,
>>>>> it's not clear what the best solution would be. Here are
>>>>> a few possibilities:
>>>>>
>>>>> 1.) Provide a few built-in functions that operate on a
>>>>> whole result row. If we assume a SQL style IS NULL and IS
>>>>> NOT NULL (see my last post here) for operations on
>>>>> particular columns, this probably eliminates the need for
>>>>> EXISTS/NOT EXISTS as well.
>>>>>
>>>>> BEGIN TRANSACTION
>>>>> LET row1_missing = notExists() FROM ks.tbl WHERE k=0
>>>>> AND c=0;
>>>>> LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>>>> SELECT row1_missing, row2_v
>>>>> IF row1_missing AND row2_v = 3 THEN
>>>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>> END IF
>>>>> COMMIT TRANSACTION
>>>>>
>>>>> 2.) Assign and check the first primary key element to
>>>>> determine whether the row exists.
>>>>>
>>>>> BEGIN TRANSACTION
>>>>> LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
>>>>> LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>>>> SELECT row1_k, row2_v
>>>>> IF row1_k IS NULL AND row2_v = 3 THEN
>>>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>> END IF
>>>>> COMMIT TRANSACTION
>>>>>
>>>>> 3.) Reconsider the LET concept toward something that
>>>>> allows us to explicitly name our reads again.
>>>>>
>>>>> BEGIN TRANSACTION
>>>>> WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
>>>>> WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
>>>>> SELECT row1.v, row2.v
>>>>> IF row1 NOT EXISTS AND row2.v = 3 THEN
>>>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>> END IF
>>>>> COMMIT TRANSACTION
>>>>>
>>>>> I don't have a strong affinity for any of these, although
>>>>> #1 seems the most awkward.
>>>>>
>>>>> Does anyone have any other alternatives? Preference for
>>>>> one of the above options?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe
>>>>> <calebrackli...@gmail.com> wrote:
>>>>>
>>>>> Avi brought up an interesting point around NULLness
>>>>> checking inCASSANDRA-17762
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-17762>...
>>>>>
>>>>> In SQL, any comparison with NULL is NULL, which
>>>>> is interpreted as FALSE in a condition. To test
>>>>> for NULLness, you use IS NULL or IS NOT NULL. But
>>>>> LWT uses IF col = NULL as a NULLness test. This
>>>>> is likely to confuse people coming from SQL and
>>>>> hamper attempts to extend the dialect.
>>>>>
>>>>>
>>>>> We can leave that Jira open to address what to do in
>>>>> the legacy LWT case, but I'd support a SQL-congruent
>>>>> syntax here (IS NULL or IS NOT NULL), where we have
>>>>> something closer to a blank slate.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky
>>>>> <a...@aber.io> wrote:
>>>>>
>>>>> The new syntax looks great, and I’m really
>>>>> excited to see this coming together.
>>>>>
>>>>> One piece of feedback on the proposed syntax is
>>>>> around the use of “=“ as a declaration in
>>>>> addition to its current use as an equality
>>>>> operator in a WHERE clause and an assignment
>>>>> operator in an UPDATE:
>>>>>
>>>>> BEGIN TRANSACTION
>>>>> LET car_miles = miles_driven,
>>>>> car_is_running = is_running FROM cars WHERE
>>>>> model=’pinto’
>>>>> LET user_miles = miles_driven FROM users
>>>>> WHERE name=’blake’
>>>>> SELECT something else from some other table
>>>>> IF NOT car_is_running THEN ABORT
>>>>> UPDATE users SET miles_driven = user_miles
>>>>> + 30 WHERE name='blake';
>>>>> UPDATE cars SET miles_driven = car_miles +
>>>>> 30 WHERE model='pinto';
>>>>> COMMIT TRANSACTION
>>>>>
>>>>> This is supported in languages like PL/pgSQL, but
>>>>> in a normal SQL query kind of local declaration
>>>>> is often expressed as an alias (SELECT col AS
>>>>> new_col), subquery alias (SELECT col) t, or
>>>>> common table expression (WITH t AS (SELECT col)).
>>>>>
>>>>> Here’s an example of an alternative to the
>>>>> proposed syntax that I’d find more readable:
>>>>>
>>>>> BEGIN TRANSACTION
>>>>> WITH car_miles, car_is_running AS (SELECT
>>>>> miles_driven, is_running FROM cars WHERE
>>>>> model=’pinto’),
>>>>> user_miles AS (SELECT miles_driven FROM users
>>>>> WHERE name=’blake’)
>>>>> IF NOT car_is_running THEN ABORT
>>>>> UPDATE users SET miles_driven = user_miles
>>>>> + 30 WHERE name='blake';
>>>>> UPDATE cars SET miles_driven = car_miles + 30
>>>>> WHERE model='pinto';
>>>>> COMMIT TRANSACTION
>>>>>
>>>>> There’s also the option of naming the transaction
>>>>> like a subquery, and supporting LET via AS (this
>>>>> one I’m less sure about but wanted to propose
>>>>> anyway):
>>>>>
>>>>> BEGIN TRANSACTION t1
>>>>> SELECT miles_driven AS t1.car_miles,
>>>>> is_running AS t1.car_is_running FROM cars
>>>>> WHERE model=’pinto’;
>>>>> SELECT miles_driven AS t1.user_miles FROM
>>>>> users WHERE name=’blake’;
>>>>> IF NOT car_is_running THEN ABORT
>>>>> UPDATE users SET miles_driven = user_miles
>>>>> + 30 WHERE name='blake';
>>>>> UPDATE cars SET miles_driven = car_miles +
>>>>> 30 WHERE model='pinto';
>>>>> COMMIT TRANSACTION
>>>>>
>>>>> This also has the benefit of resolving ambiguity
>>>>> in case of naming conflicts with existing (or
>>>>> future) column names.
>>>>>
>>>>> --
>>>>> Abe
>>>>>
>>>>
>>
>> --
>> Konstantin Osipov, Moscow, Russia