Caleb is making great progress on this, and I have been working on CQL fuzz testing the new grammar to make sure we flesh out cases quickly; one thing we hit was about mixing conditional and non-conditional updates; will use a example to better show
BEGIN TRANSACTION LET a = (SELECT * FROM ….); IF a IS NOT NULL THEN UPDATE …; END IF INSERT INTO ... COMMIT TRANSACTION In this case we have 1 UPDATE tied to the IF condition, and one INSERT that isn’t… for v1 do we need/want to support this, or is it best for v1 to be simple and have all updates tied to conditional when present? > On Aug 22, 2022, at 9:19 AM, Avi Kivity via dev <dev@cassandra.apache.org> > wrote: > > I wasn't referring to specific syntax but to the concept. If a SQL dialect > (or better, the standard) has a way to select data into a variable, let's > adopt it. > > If such syntax doesn't exist, LET (a, b, c) = (SELECT x, y, z FROM tab) is my > preference. > > On 8/22/22 19:13, Patrick McFadin wrote: >> The replies got trashed pretty badly in the responses. >> When you say: "Agree it's better to reuse existing syntax than invent new >> syntax." >> >> Which syntax are you referring to? >> >> Patrick >> >> >> On Mon, Aug 22, 2022 at 1:36 AM Avi Kivity via dev <dev@cassandra.apache.org >> <mailto:dev@cassandra.apache.org>> wrote: >> Agree it's better to reuse existing syntax than invent new syntax. >> >> On 8/21/22 16:52, Konstantin Osipov wrote: >> > * Avi Kivity via dev <dev@cassandra.apache.org >> > <mailto:dev@cassandra.apache.org>> [22/08/14 15:59]: >> > >> > MySQL supports SELECT <expr_list> INTO <var_list> FROM ... WHERE >> > ... >> > >> > PostgreSQL supports pretty much the same syntax. >> > >> > Maybe instead of LET use the ANSI/MySQL/PostgreSQL DECLARE var TYPE and >> > MySQL/PostgreSQL SELECT ... INTO? >> > >> >> On 14/08/2022 01.29, Benedict Elliott Smith wrote: >> >>> >> >>> I’ll do my best to express with my thinking, as well as how I would >> >>> explain the feature to a user. >> >>> >> >>> My mental model for LET statements is that they are simply SELECT >> >>> statements where the columns that are selected become variables >> >>> accessible anywhere in the scope of the transaction. That is to say, you >> >>> should be able to run something like s/LET/SELECT and >> >>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement >> >>> and produce a valid SELECT statement, and vice versa. Both should >> >>> perform identically. >> >>> >> >>> e.g. >> >>> SELECT pk AS key, v AS value FROM table >> >>> >> >>> => >> >>> LET key = pk, value = v FROM table >> >> >> >> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL >> >> supports selecting comparisons: >> >> >> >> >> >> $ psql >> >> psql (14.3) >> >> Type "help" for help. >> >> >> >> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL; >> >> ?column? | ?column? | ?column? >> >> ----------+----------+---------- >> >> f | t | >> >> (1 row) >> >> >> >> >> >> Using "=" as a syntactic element in LET would make SELECT and LET >> >> incompatible once comparisons become valid selectors. Unless they become >> >> mandatory (and then you'd write "LET q = a = b" if you wanted to select a >> >> comparison). >> >> >> >> >> >> I personally prefer the nested query syntax: >> >> >> >> >> >> LET (a, b, c) = (SELECT foo, bar, x+y FROM ...); >> >> >> >> >> >> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is >> >> immediately recognizable by everyone as a query, LET is not. >> >> >> >> >> >>> Identical form, identical behaviour. Every statement should be directly >> >>> translatable with some simple text manipulation. >> >>> >> >>> We can then make this more powerful for users by simply expanding SELECT >> >>> statements, e.g. by permitting them to declare constants and tuples in >> >>> the column results. In this scheme LET x = * is simply syntactic sugar >> >>> for LET x = (pk, ck, field1, …) This scheme then supports options 2, 4 >> >>> and 5 all at once, consistently alongside each other. >> >>> >> >>> Option 6 is in fact very similar, but is strictly less flexible for the >> >>> user as they have no way to declare multiple scalar variables without >> >>> scoping them inside a tuple. >> >>> >> >>> e.g. >> >>> LET key = pk, value = v FROM table >> >>> IF key > 1 AND value > 1 THEN... >> >>> >> >>> => >> >>> LET row = SELECT pk AS key, v AS value FROM table >> >>> IF row.key > 1 AND row.value > 1 THEN… >> >>> >> >>> However, both are expressible in the existing proposal, as if you prefer >> >>> this naming scheme you can simply write >> >>> >> >>> LET row = (pk AS key, v AS value) FROM table >> >>> IF row.key > 1 AND row.value > 1 THEN… >> >>> >> >>> With respect to auto converting single column results to a scalar, we do >> >>> need a way for the user to say they care whether the row was null or the >> >>> column. I think an implicit conversion here could be surprising. However >> >>> we could implement tuple expressions anyway and let the user explicitly >> >>> declare v as a tuple as Caleb has suggested for the existing proposal as >> >>> well. >> >>> >> >>> Assigning constants or other values not selected from a table would also >> >>> be a little clunky: >> >>> >> >>> LET v1 = someFunc(), v2 = someOtherFunc(?) >> >>> IF v1 > 1 AND v2 > 1 THEN… >> >>> >> >>> => >> >>> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2 >> >>> IF row.v1 > 1 AND row.v2 > 1 THEN... >> >>> >> >>> That said, the proposals are /close/ to identical, it is just slightly >> >>> more verbose and slightly less flexible. >> >>> >> >>> Which one would be most intuitive to users is hard to predict. It might >> >>> be that Option 6 would be slightly easier, but I’m unsure if there would >> >>> be a huge difference. >> >>> >> >>> >> >>>> On 13 Aug 2022, at 16:59, Patrick McFadin <pmcfa...@gmail.com >> >>>> <mailto:pmcfa...@gmail.com>> wrote: >> >>>> >> >>>> I'm really happy to see CEP-15 getting closer to a final >> >>>> implementation. I'm going to walk through my reasoning for your >> >>>> proposals wrt trying to explain this to somebody new. >> >>>> >> >>>> Looking at all the options, the first thing that comes up for me is >> >>>> the Cassandra project's complicated relationship with NULL. We have >> >>>> prior art with EXISTS/NOT EXISTS when creating new tables. IS >> >>>> NULL/IS NOT NULL is used in materialized views similarly to >> >>>> proposals 2,4 and 5. >> >>>> >> >>>> CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name >> >>>> AS SELECT [ (column_list) ] >> >>>> FROM [keyspace_name.]table_name >> >>>> [ WHERE column_name IS NOT NULL >> >>>> [ AND column_name IS NOT NULL ... ] ] >> >>>> [ AND relation [ AND ... ] ] >> >>>> PRIMARY KEY ( column_list ) >> >>>> [ WITH [ table_properties ] >> >>>> [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] ] >> >>>> ; >> >>>> >> >>>> Based on that, I believe 1 and 3 would just confuse users, so -1 on >> >>>> those. >> >>>> >> >>>> Trying to explain the difference between row and column operations >> >>>> with LET, I can't see the difference between a row and column in #2. >> >>>> >> >>>> #4 introduces a boolean instead of column names and just adds more >> >>>> syntax. >> >>>> >> >>>> #5 is verbose and, in my opinion, easier to reason when writing a >> >>>> query. Thinking top down, I need to know if these exact rows and/or >> >>>> column values exist before changing them, so I'll define them first. >> >>>> Then I'll iterate over the state I created in my actual changes so I >> >>>> know I'm changing precisely what I want. >> >>>> >> >>>> #5 could use a bit more to be clearer to somebody who doesn't write >> >>>> CQL queries daily and wouldn't require memorizing subtle >> >>>> differences. It should be similar to all the other syntax, so >> >>>> learning a little about CQL will let you move into more without >> >>>> completely re-learning the new syntax. >> >>>> >> >>>> So I propose #6) >> >>>> BEGIN TRANSACTION >> >>>> LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all >> >>>> columns >> >>>> LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0; >> >>>> SELECT row1, row2 >> >>>> IF row1 IS NULL AND row2.v = 3 THEN >> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >> >>>> END IF >> >>>> COMMIT TRANSACTION >> >>>> >> >>>> I added the SELECT in the LET just so it's straightforward, you are >> >>>> reading, and it's just like doing a regular select, but you are >> >>>> assigning it to a variable. >> >>>> >> >>>> I removed the confusing 'row1.v'and replaced it with 'row1'I can't >> >>>> see why you would need the '.v'vs having the complete variable I >> >>>> created in the statement above. >> >>>> >> >>>> EOL >> >>>> >> >>>> Patrick >> >>>> >> >>>> On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe >> >>>> <calebrackli...@gmail.com <mailto:calebrackli...@gmail.com>> wrote: >> >>>> >> >>>> ...and one more option... >> >>>> >> >>>> 5.) Introduce tuple assignments, removing all ambiguity around >> >>>> row vs. column operations. >> >>>> >> >>>> BEGIN TRANSACTION >> >>>> LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all >> >>>> columns >> >>>> LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0; >> >>>> SELECT row1.v, row2.v >> >>>> IF row1 IS NULL AND row2.v = 3 THEN >> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >> >>>> END IF >> >>>> COMMIT TRANSACTION >> >>>> >> >>>> >> >>>> >> >>>> On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe >> >>>> <calebrackli...@gmail.com <mailto:calebrackli...@gmail.com>> wrote: >> >>>> >> >>>> via Benedict, here is a 4th option: >> >>>> >> >>>> 4.) Similar to #2, but don't rely on the key element being >> >>>> NULL. >> >>>> >> >>>> If the read returns no result, x effectively becomes NULL. >> >>>> Otherwise, it remains true/NOT NULL. >> >>>> >> >>>> BEGIN TRANSACTION >> >>>> LET x = true FROM ks.tbl WHERE k=0 AND c=0; >> >>>> LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0; >> >>>> SELECT x, row2_v >> >>>> IF x IS NULL AND row2_v = 3 THEN >> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >> >>>> END IF >> >>>> COMMIT TRANSACTION >> >>>> >> >>>> On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe >> >>>> <calebrackli...@gmail.com <mailto:calebrackli...@gmail.com>> >> >>>> wrote: >> >>>> >> >>>> Hello again everyone! >> >>>> >> >>>> I've been working on a prototype >> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-17719 >> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-17719>> in >> >>>> CASSANDRA-17719 for a grammar that roughly corresponds to >> >>>> what we've agreed on in this thread. One thing that isn't >> >>>> immediately obvious to me is how the LET syntax handles >> >>>> cases where we want to check for the plain existence of a >> >>>> row in IF. For example, in this hybrid of the originally >> >>>> proposed syntax and something more like what we've agreed >> >>>> on (and the RETURNING just to distinguish between that >> >>>> and SELECT), this could be pretty straightforward: >> >>>> >> >>>> BEGIN TRANSACTION >> >>>> SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1; >> >>>> SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2; >> >>>> RETURNING row1.v, row2.v >> >>>> IF row1 NOT EXISTS AND row2.v = 3 THEN >> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >> >>>> END IF >> >>>> COMMIT TRANSACTION >> >>>> >> >>>> The NOT EXISTS operator has row1 to work with. One the >> >>>> other hand, w/ the LET syntax and no naming of reads, >> >>>> it's not clear what the best solution would be. Here are >> >>>> a few possibilities: >> >>>> >> >>>> 1.) Provide a few built-in functions that operate on a >> >>>> whole result row. If we assume a SQL style IS NULL and IS >> >>>> NOT NULL (see my last post here) for operations on >> >>>> particular columns, this probably eliminates the need for >> >>>> EXISTS/NOT EXISTS as well. >> >>>> >> >>>> BEGIN TRANSACTION >> >>>> LET row1_missing = notExists() FROM ks.tbl WHERE k=0 >> >>>> AND c=0; >> >>>> LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0; >> >>>> SELECT row1_missing, row2_v >> >>>> IF row1_missing AND row2_v = 3 THEN >> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >> >>>> END IF >> >>>> COMMIT TRANSACTION >> >>>> >> >>>> 2.) Assign and check the first primary key element to >> >>>> determine whether the row exists. >> >>>> >> >>>> BEGIN TRANSACTION >> >>>> LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0; >> >>>> LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0; >> >>>> SELECT row1_k, row2_v >> >>>> IF row1_k IS NULL AND row2_v = 3 THEN >> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >> >>>> END IF >> >>>> COMMIT TRANSACTION >> >>>> >> >>>> 3.) Reconsider the LET concept toward something that >> >>>> allows us to explicitly name our reads again. >> >>>> >> >>>> BEGIN TRANSACTION >> >>>> WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1; >> >>>> WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2; >> >>>> SELECT row1.v, row2.v >> >>>> IF row1 NOT EXISTS AND row2.v = 3 THEN >> >>>> INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1); >> >>>> END IF >> >>>> COMMIT TRANSACTION >> >>>> >> >>>> I don't have a strong affinity for any of these, although >> >>>> #1 seems the most awkward. >> >>>> >> >>>> Does anyone have any other alternatives? Preference for >> >>>> one of the above options? >> >>>> >> >>>> Thanks! >> >>>> >> >>>> On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe >> >>>> <calebrackli...@gmail.com >> >>>> <mailto:calebrackli...@gmail.com>> wrote: >> >>>> >> >>>> Avi brought up an interesting point around NULLness >> >>>> checking inCASSANDRA-17762 >> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-17762 >> >>>> <https://issues.apache.org/jira/browse/CASSANDRA-17762>>... >> >>>> >> >>>> In SQL, any comparison with NULL is NULL, which >> >>>> is interpreted as FALSE in a condition. To test >> >>>> for NULLness, you use IS NULL or IS NOT NULL. But >> >>>> LWT uses IF col = NULL as a NULLness test. This >> >>>> is likely to confuse people coming from SQL and >> >>>> hamper attempts to extend the dialect. >> >>>> >> >>>> >> >>>> We can leave that Jira open to address what to do in >> >>>> the legacy LWT case, but I'd support a SQL-congruent >> >>>> syntax here (IS NULL or IS NOT NULL), where we have >> >>>> something closer to a blank slate. >> >>>> >> >>>> Thoughts? >> >>>> >> >>>> On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky >> >>>> <a...@aber.io <mailto:a...@aber.io>> wrote: >> >>>> >> >>>> The new syntax looks great, and I’m really >> >>>> excited to see this coming together. >> >>>> >> >>>> One piece of feedback on the proposed syntax is >> >>>> around the use of “=“ as a declaration in >> >>>> addition to its current use as an equality >> >>>> operator in a WHERE clause and an assignment >> >>>> operator in an UPDATE: >> >>>> >> >>>> BEGIN TRANSACTION >> >>>> LET car_miles = miles_driven, >> >>>> car_is_running = is_running FROM cars WHERE >> >>>> model=’pinto’ >> >>>> LET user_miles = miles_driven FROM users >> >>>> WHERE name=’blake’ >> >>>> SELECT something else from some other table >> >>>> IF NOT car_is_running THEN ABORT >> >>>> UPDATE users SET miles_driven = user_miles >> >>>> + 30 WHERE name='blake'; >> >>>> UPDATE cars SET miles_driven = car_miles + >> >>>> 30 WHERE model='pinto'; >> >>>> COMMIT TRANSACTION >> >>>> >> >>>> This is supported in languages like PL/pgSQL, but >> >>>> in a normal SQL query kind of local declaration >> >>>> is often expressed as an alias (SELECT col AS >> >>>> new_col), subquery alias (SELECT col) t, or >> >>>> common table expression (WITH t AS (SELECT col)). >> >>>> >> >>>> Here’s an example of an alternative to the >> >>>> proposed syntax that I’d find more readable: >> >>>> >> >>>> BEGIN TRANSACTION >> >>>> WITH car_miles, car_is_running AS (SELECT >> >>>> miles_driven, is_running FROM cars WHERE >> >>>> model=’pinto’), >> >>>> user_miles AS (SELECT miles_driven FROM users >> >>>> WHERE name=’blake’) >> >>>> IF NOT car_is_running THEN ABORT >> >>>> UPDATE users SET miles_driven = user_miles >> >>>> + 30 WHERE name='blake'; >> >>>> UPDATE cars SET miles_driven = car_miles + 30 >> >>>> WHERE model='pinto'; >> >>>> COMMIT TRANSACTION >> >>>> >> >>>> There’s also the option of naming the transaction >> >>>> like a subquery, and supporting LET via AS (this >> >>>> one I’m less sure about but wanted to propose >> >>>> anyway): >> >>>> >> >>>> BEGIN TRANSACTION t1 >> >>>> SELECT miles_driven AS t1.car_miles, >> >>>> is_running AS t1.car_is_running FROM cars >> >>>> WHERE model=’pinto’; >> >>>> SELECT miles_driven AS t1.user_miles FROM >> >>>> users WHERE name=’blake’; >> >>>> IF NOT car_is_running THEN ABORT >> >>>> UPDATE users SET miles_driven = user_miles >> >>>> + 30 WHERE name='blake'; >> >>>> UPDATE cars SET miles_driven = car_miles + >> >>>> 30 WHERE model='pinto'; >> >>>> COMMIT TRANSACTION >> >>>> >> >>>> This also has the benefit of resolving ambiguity >> >>>> in case of naming conflicts with existing (or >> >>>> future) column names. >> >>>> >> >>>> -- >> >>>> Abe >> >>>> >> >