Let me just state my bias right up front. For any kind of QL I lean heavily toward verbose and explicit based on their lifecycle. A CQL query will probably need to be understood by the next person looking at it, and a few seconds saved typing isn't worth the potential misunderstanding later. My opinion is formed by having to be the second person many times. :D
I just want to make sure I have the syntax you are proposing. Verbose version: LET (a) = SELECT val FROM table IF a > 1 THEN... Less verbose version: LET a = SELECT val FROM table IF a.val > 1 THEN... Alternatively, for non-queries: LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2 or less verbose: LET x = (someFunc() AS v1, someOtherFunc() as v2) LET (v1, v2) = (someFunc(), someOtherFunc()) Weighing in on the "SELECT without a FROM," I think that is fine and, as Avi stated, already present in the SQL world. I would prefer that over 'SELECT func() FROM dual;' (Looking at you, Oracle) Finally, on the topic of deconstructing SELECT statements instead of nesting. If I understand the argument here, I would favor deconstructing over nesting if there is a choice. I think this is what that choice would look like. Deconstructed: LET x = SELECT x FROM ... LET y = SELECT y FROM ... Nested: LET (x, y) = ((SELECT x FROM…), (SELECT y FROM)) I'm trying to summate but let me know if I missed something. I apologize in advance to Monday morning Caleb, who will have to digest this thread. Patrick On Sun, Aug 14, 2022 at 9:00 AM Benedict Elliott Smith <bened...@apache.org> wrote: > > > I think SQL dialects require subqueries to be parenthesized (not sure). If > that's the case I think we should keep the tradition. > > > This isn’t a sub-query though, since LET is not a query. If we permit at > most one SELECT, and do not permit mixing SELECT with constant assignments, > I don’t see why we would require parentheses. > > I see no harm in making FROM optional, as it's recognized by other SQL > dialects. > Absolutely, this just flows naturally from having tuples. There's no > difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple”. > > > Neither of these things are supported today, and they’re no longer > necessary with this syntax proposal. The downside of splitting SELECT and > LET is that there’s no impetus to improve the former. So the question was > really whether we bother to improve it anyway, not whether or not they > would be good improvements (I think they obviously are). > > I think this can be safely deferred. Most people would again separate it > into separate LETs. > > That implies we’ll permit deconstructing a tuple variable in a LET. This > makes sense to me, but is roughly equivalent to nested deconstruction. It > might be that v1 we only support deconstructing SELECT statements, but I > guess all of this is probably up to the implementor. > > I'd add (to the specification) that LETs cannot override a previously > defined variable, just to reduce ambiguity. > > Yep, this was already agreed way back with the earlier proposal. > > > On 14 Aug 2022, at 16:30, Avi Kivity <a...@scylladb.com> wrote: > > > On 14/08/2022 17.50, Benedict Elliott Smith wrote: > > > > SELECT and LET incompatible once comparisons become valid selectors > > I don’t think this would be ambiguous, as = is required in the LET syntax > as we have to bind the result to a variable name. > > But, I like the deconstructed tuple syntax improvement over “Option 6”. > This would also seem to easily support assigning from non-query statements, > such as LET (a, b) = (someFunc(), someOtherFunc(?)) > > I don’t think it is ideal to depend on relative position in the tuple for > assigning results to a variable name, as it leaves more scope for errors. > It would be nice to have a simple way to deconstruct safely. But, I think > this proposal is good, and I’d be fine with it as an alternative if others > concur. I agree that seeing the SELECT independently may be more easily > recognisable to users. > > With this approach there remains the question of how we handle single > column results. I’d be inclined to treat in the following way: > > LET (a) = SELECT val FROM table > IF a > 1 THEN... > > LET a = SELECT val FROM table > IF a.val > 1 THEN... > > > I think SQL dialects require subqueries to be parenthesized (not sure). If > that's the case I think we should keep the tradition. > > > ---- > There is also the question of whether we support SELECT without a FROM > clause, e.g. > LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2 > > Or just LET (since they are no longer equivalent) > e.g. > LET x = (someFunc() AS v1, someOtherFunc() as v2) > LET (v1, v2) = (someFunc(), someOtherFunc()) > > > I see no harm in making FROM optional, as it's recognized by other SQL > dialects. > > > ---- > Also since LET is only binding variables, is there any reason we shouldn’t > support multiple SELECT assignments in a single LET?, e.g. > LET (x, y) = ((SELECT x FROM…), (SELECT y FROM)) > > > What if an inner select returns a tuple? Would y be a tuple? > > > I think this is redundant and atypical enough to not be worth supporting. > Most people would use separate LETs. > > > ---- > Also whether we support tuples in SELECT statements anyway, e.g. > LET (tuple1, tuple2) = SELECT (a, b), (c, d) FROM.. > IF tuple1.a > 1 AND tuple2.d > 1… > > > Absolutely, this just flows naturally from having tuples. There's no > difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple". > > > > ---- > and whether we support nested deconstruction, e.g. > LET (a, b, (c, d)) = SELECT a, b, someTuple FROM.. > IF a > 1 AND d > 1… > > > I think this can be safely deferred. Most people would again separate it > into separate LETs. > > > I'd add (to the specification) that LETs cannot override a previously > defined variable, just to reduce ambiguity. > > > > > > > > > On 14 Aug 2022, at 13:55, Avi Kivity via dev <dev@cassandra.apache.org> > <dev@cassandra.apache.org> wrote: > > > On 14/08/2022 01.29, Benedict Elliott Smith wrote: > > > I’ll do my best to express with my thinking, as well as how I would > explain the feature to a user. > > My mental model for LET statements is that they are simply SELECT > statements where the columns that are selected become variables accessible > anywhere in the scope of the transaction. That is to say, you should be > able to run something like s/LET/SELECT and s/([^=]+)=([^,]+)(,|$)/\2 AS > \1\3/g on the columns of a LET statement and produce a valid SELECT > statement, and vice versa. Both should perform identically. > > e.g. > SELECT pk AS key, v AS value FROM table > > => > LET key = pk, value = v FROM table > > > "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL > supports selecting comparisons: > > > $ psql > psql (14.3) > Type "help" for help. > > avi=# SELECT 1 = 2, 3 = 3, NULL = NULL; > ?column? | ?column? | ?column? > ----------+----------+---------- > f | t | > (1 row) > > > Using "=" as a syntactic element in LET would make SELECT and LET > incompatible once comparisons become valid selectors. Unless they become > mandatory (and then you'd write "LET q = a = b" if you wanted to select a > comparison). > > > I personally prefer the nested query syntax: > > > LET (a, b, c) = (SELECT foo, bar, x+y FROM ...); > > > So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is > immediately recognizable by everyone as a query, LET is not. > > > > Identical form, identical behaviour. Every statement should be directly > translatable with some simple text manipulation. > > We can then make this more powerful for users by simply expanding SELECT > statements, e.g. by permitting them to declare constants and tuples in the > column results. In this scheme LET x = * is simply syntactic sugar for LET > x = (pk, ck, field1, …) This scheme then supports options 2, 4 and 5 all at > once, consistently alongside each other. > > Option 6 is in fact very similar, but is strictly less flexible for the > user as they have no way to declare multiple scalar variables without > scoping them inside a tuple. > > e.g. > LET key = pk, value = v FROM table > IF key > 1 AND value > 1 THEN... > > => > LET row = SELECT pk AS key, v AS value FROM table > IF row.key > 1 AND row.value > 1 THEN… > > However, both are expressible in the existing proposal, as if you prefer > this naming scheme you can simply write > > LET row = (pk AS key, v AS value) FROM table > IF row.key > 1 AND row.value > 1 THEN… > > With respect to auto converting single column results to a scalar, we do > need a way for the user to say they care whether the row was null or the > column. I think an implicit conversion here could be surprising. However we > could implement tuple expressions anyway and let the user explicitly > declare v as a tuple as Caleb has suggested for the existing proposal as > well. > > Assigning constants or other values not selected from a table would also > be a little clunky: > > LET v1 = someFunc(), v2 = someOtherFunc(?) > IF v1 > 1 AND v2 > 1 THEN… > > => > LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2 > IF row.v1 > 1 AND row.v2 > 1 THEN... > > That said, the proposals are *close* to identical, it is just slightly > more verbose and slightly less flexible. > > Which one would be most intuitive to users is hard to predict. It might be > that Option 6 would be slightly easier, but I’m unsure if there would be a > huge difference. > > > On 13 Aug 2022, at 16:59, Patrick McFadin <pmcfa...@gmail.com> wrote: > > I'm really happy to see CEP-15 getting closer to a final implementation. > I'm going to walk through my reasoning for your proposals wrt trying to > explain this to somebody new. > > Looking at all the options, the first thing that comes up for me is the > Cassandra project's complicated relationship with NULL. We have prior art > with EXISTS/NOT EXISTS when creating new tables. IS NULL/IS NOT NULL is > used in materialized views similarly to proposals 2,4 and 5. > > CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name > AS SELECT [ (column_list) ] > FROM [keyspace_name.]table_name > [ WHERE column_name IS NOT NULL > [ AND column_name IS NOT NULL ... ] ] > [ AND relation [ AND ... ] ] > PRIMARY KEY ( column_list ) > [ WITH [ table_properties ] > [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] ; > > Based on that, I believe 1 and 3 would just confuse users, so -1 on > those. > > Trying to explain the difference between row and column operations with > LET, I can't see the difference between a row and column in #2. > > #4 introduces a boolean instead of column names and just adds more syntax. > > #5 is verbose and, in my opinion, easier to reason when writing a query. > Thinking top down, I need to know if these exact rows and/or column values > exist before changing them, so I'll define them first. Then I'll iterate > over the state I created in my actual changes so I know I'm changing > precisely what I want. > > #5 could use a bit more to be clearer to somebody who doesn't write CQL > queries daily and wouldn't require memorizing subtle differences. It should > be similar to all the other syntax, so learning a little about CQL will let > you move into more without completely re-learning the new syntax. > > So I propose #6) > BEGIN TRANSACTION > LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all > columns > LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0; > SELECT row1, row2 > IF row1 IS NULL AND row2.v = 3 THEN > INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); > END IF > COMMIT TRANSACTION > > I added the SELECT in the LET just so it's straightforward, you are > reading, and it's just like doing a regular select, but you are assigning > it to a variable. > > I removed the confusing 'row1.v' and replaced it with 'row1' I can't see > why you would need the '.v' vs having the complete variable I created in > the statement above. > > EOL > > Patrick > > On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe <calebrackli...@gmail.com> > wrote: > >> ...and one more option... >> >> 5.) Introduce tuple assignments, removing all ambiguity around row vs. >> column operations. >> >> BEGIN TRANSACTION >> LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all columns >> LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0; >> SELECT row1.v, row2.v >> IF row1 IS NULL AND row2.v = 3 THEN >> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >> END IF >> COMMIT TRANSACTION >> >> >> >> On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe < >> calebrackli...@gmail.com> wrote: >> >>> via Benedict, here is a 4th option: >>> >>> 4.) Similar to #2, but don't rely on the key element being NULL. >>> >>> If the read returns no result, x effectively becomes NULL. Otherwise, it >>> remains true/NOT NULL. >>> >>> BEGIN TRANSACTION >>> LET x = true FROM ks.tbl WHERE k=0 AND c=0; >>> LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0; >>> SELECT x, row2_v >>> IF x IS NULL AND row2_v = 3 THEN >>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >>> END IF >>> COMMIT TRANSACTION >>> >>> On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe < >>> calebrackli...@gmail.com> wrote: >>> >>>> Hello again everyone! >>>> >>>> I've been working on a prototype >>>> <https://issues.apache.org/jira/browse/CASSANDRA-17719> in >>>> CASSANDRA-17719 for a grammar that roughly corresponds to what we've agreed >>>> on in this thread. One thing that isn't immediately obvious to me is how >>>> the LET syntax handles cases where we want to check for the plain existence >>>> of a row in IF. For example, in this hybrid of the originally proposed >>>> syntax and something more like what we've agreed on (and the RETURNING just >>>> to distinguish between that and SELECT), this could be pretty >>>> straightforward: >>>> >>>> BEGIN TRANSACTION >>>> SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1; >>>> SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2; >>>> RETURNING row1.v, row2.v >>>> IF row1 NOT EXISTS AND row2.v = 3 THEN >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >>>> END IF >>>> COMMIT TRANSACTION >>>> >>>> The NOT EXISTS operator has row1 to work with. One the other hand, w/ >>>> the LET syntax and no naming of reads, it's not clear what the best >>>> solution would be. Here are a few possibilities: >>>> >>>> 1.) Provide a few built-in functions that operate on a whole result >>>> row. If we assume a SQL style IS NULL and IS NOT NULL (see my last post >>>> here) for operations on particular columns, this probably eliminates the >>>> need for EXISTS/NOT EXISTS as well. >>>> >>>> BEGIN TRANSACTION >>>> LET row1_missing = notExists() FROM ks.tbl WHERE k=0 AND c=0; >>>> LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0; >>>> SELECT row1_missing, row2_v >>>> IF row1_missing AND row2_v = 3 THEN >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >>>> END IF >>>> COMMIT TRANSACTION >>>> >>>> 2.) Assign and check the first primary key element to determine whether >>>> the row exists. >>>> >>>> BEGIN TRANSACTION >>>> LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0; >>>> LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0; >>>> SELECT row1_k, row2_v >>>> IF row1_k IS NULL AND row2_v = 3 THEN >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >>>> END IF >>>> COMMIT TRANSACTION >>>> >>>> 3.) Reconsider the LET concept toward something that allows us to >>>> explicitly name our reads again. >>>> >>>> BEGIN TRANSACTION >>>> WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1; >>>> WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2; >>>> SELECT row1.v, row2.v >>>> IF row1 NOT EXISTS AND row2.v = 3 THEN >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >>>> END IF >>>> COMMIT TRANSACTION >>>> >>>> I don't have a strong affinity for any of these, although #1 seems the >>>> most awkward. >>>> >>>> Does anyone have any other alternatives? Preference for one of the >>>> above options? >>>> >>>> Thanks! >>>> >>>> On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe < >>>> calebrackli...@gmail.com> wrote: >>>> >>>>> Avi brought up an interesting point around NULLness checking in >>>>> CASSANDRA-17762 >>>>> <https://issues.apache.org/jira/browse/CASSANDRA-17762>... >>>>> >>>>> In SQL, any comparison with NULL is NULL, which is interpreted as >>>>>> FALSE in a condition. To test for NULLness, you use IS NULL or IS NOT >>>>>> NULL. >>>>>> But LWT uses IF col = NULL as a NULLness test. This is likely to confuse >>>>>> people coming from SQL and hamper attempts to extend the dialect. >>>>> >>>>> >>>>> We can leave that Jira open to address what to do in the legacy LWT >>>>> case, but I'd support a SQL-congruent syntax here (IS NULL or IS NOT >>>>> NULL), where we have something closer to a blank slate. >>>>> >>>>> Thoughts? >>>>> >>>>> On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky <a...@aber.io> wrote: >>>>> >>>>>> The new syntax looks great, and I’m really excited to see this coming >>>>>> together. >>>>>> >>>>>> One piece of feedback on the proposed syntax is around the use of “=“ >>>>>> as a declaration in addition to its current use as an equality operator >>>>>> in >>>>>> a WHERE clause and an assignment operator in an UPDATE: >>>>>> >>>>>> BEGIN TRANSACTION >>>>>> LET car_miles = miles_driven, car_is_running = is_running FROM cars >>>>>> WHERE model=’pinto’ >>>>>> LET user_miles = miles_driven FROM users WHERE name=’blake’ >>>>>> SELECT something else from some other table >>>>>> IF NOT car_is_running THEN ABORT >>>>>> UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake'; >>>>>> UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto'; >>>>>> COMMIT TRANSACTION >>>>>> >>>>>> This is supported in languages like PL/pgSQL, but in a normal SQL >>>>>> query kind of local declaration is often expressed as an alias (SELECT >>>>>> col >>>>>> AS new_col), subquery alias (SELECT col) t, or common table expression >>>>>> (WITH t AS (SELECT col)). >>>>>> >>>>>> Here’s an example of an alternative to the proposed syntax that I’d >>>>>> find more readable: >>>>>> >>>>>> BEGIN TRANSACTION >>>>>> WITH car_miles, car_is_running AS (SELECT miles_driven, is_running >>>>>> FROM cars WHERE model=’pinto’), >>>>>> user_miles AS (SELECT miles_driven FROM users WHERE name=’blake’) >>>>>> IF NOT car_is_running THEN ABORT >>>>>> UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake'; >>>>>> UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto'; >>>>>> COMMIT TRANSACTION >>>>>> >>>>>> There’s also the option of naming the transaction like a subquery, >>>>>> and supporting LET via AS (this one I’m less sure about but wanted to >>>>>> propose anyway): >>>>>> >>>>>> BEGIN TRANSACTION t1 >>>>>> SELECT miles_driven AS t1.car_miles, is_running AS >>>>>> t1.car_is_running FROM cars WHERE model=’pinto’; >>>>>> SELECT miles_driven AS t1.user_miles FROM users WHERE name=’blake’; >>>>>> IF NOT car_is_running THEN ABORT >>>>>> UPDATE users SET miles_driven = user_miles + 30 WHERE name='blake'; >>>>>> UPDATE cars SET miles_driven = car_miles + 30 WHERE model='pinto'; >>>>>> COMMIT TRANSACTION >>>>>> >>>>>> This also has the benefit of resolving ambiguity in case of naming >>>>>> conflicts with existing (or future) column names. >>>>>> >>>>>> -- >>>>>> Abe >>>>>> >>>>> > > >