Re: CEP-15 multi key transaction syntax

Benedict Elliott Smith Mon, 15 Aug 2022 12:02:01 -0700


> I like Benedict's tuple deconstruction idea

For posterity, this was Avi’s idea!


> On 15 Aug 2022, at 18:59, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
> 
> Monday Morning Caleb has digested, and here's where I am...
> 
> 1.) I have no problem w/ having SELECT on the RHS of a LET assignment, and to 
> be honest, this may make some implementation things easier for me (i.e. the 
> encapsulation of SELECT within LET)
> 2.) I'm in favor of LET without a select, although I have no strong feeling 
> that it needs to be in v1.
> 3.) I like Benedict's tuple deconstruction idea, as it restores some of the 
> notational convenience of the previous proposal. Again, though, I don't have 
> a strong feeling this needs to be in v1.
> 3.b.) When we do implement tuple deconstruction, I'd be in favor of 
> supporting a single level of deconstruction to begin with.
> 
> Having said all that, on Friday I finished a prototype (based on some of 
> Blake's previous work) of the syntax/grammar we've more or less agreed upon 
> here, including an implementation of what I described as option #5 above: 
> https://github.com/maedhroz/cassandra/commits/CASSANDRA-17719-prototype
> 
> To look at specific examples, see these tests: 
> https://github.com/maedhroz/cassandra/blob/CASSANDRA-17719-prototype/test/distributed/org/apache/cassandra/distributed/test/accord/AccordIntegrationTest.java
> 
> There are only two things that aren't yet congruent w/ our discussion above, 
> but they should both be trivial to fix:
> 
> 1.) I'm still using EXISTS/NOT EXISTS instead of IS NOT NULL/IS NULL.
> 2.) I don't require SELECT on the RHS of LET yet.
> 
> If I were to just fix those two items, would we be in agreement on this being 
> both the core of the syntax we want and compatible w/ the wish list for 
> future items?
> 
> 
> On Sun, Aug 14, 2022 at 12:25 PM Benedict Elliott Smith <bened...@apache.org> 
> wrote:
>> 
>> 
>>> 
>>> Verbose version:
>>> LET (a) = SELECT val FROM table
>>> IF a > 1 THEN...
>>> 
>>> Less verbose version:
>>> LET a = SELECT val FROM table
>>> IF a.val > 1 THEN...
>> 
>> 
>> My intention is that these are actually two different ways of expressing the 
>> same thing, both supported and neither intended to be more or less verbose 
>> than the other. The advantage of permitting both is that you can also write
>> 
>> LET a = SELECT val FROM table
>> IF a IS NOT NULL AND a.val IS NULL THEN …
>> 
>>> Alternatively, for non-queries:
>>> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>>> or less verbose:
>>> LET x = (someFunc() AS v1, someOtherFunc() as v2)
>>> LET (v1, v2) = (someFunc(), someOtherFunc())
>> 
>> I personally prefer clarity over any arbitrary verbosity/succinct 
>> distinction, but we’re in general “taste” territory here. Since this syntax 
>> includes the SELECT on the RHS, it makes sense to only require this for 
>> situations where a query is being performed. Though I think if SELECT 
>> without a FROM is supported then we will likely end up supporting all of the 
>> above.
>> 
>>> Weighing in on the "SELECT without a FROM," I think that is fine and, as 
>>> Avi stated
>> 
>> Yep, definitely fine. Question is just whether we bother to offer it. Also, 
>> evidently, whether we support LET without a SELECT on the RHS. I am strongly 
>> in favour of this, as requiring a SELECT even when there’s no table involved 
>> is counter-intuitive to me, as LET is now a distinct concept that looks like 
>> variable declaration in other languages.
>> 
>>> Nested:
>>> LET (x, y) = SELECT x, y FROM…
>> 
>> Deconstruction here refers to the above, i.e. extracting variables x and y 
>> from the tuple on the RHS
>> 
>> Nesting is just a question of whether we support either nested tuple 
>> declarations, or nested deconstruction, which might include any of the 
>> following:
>> 
>> LET (x, (y, z)) = SELECT (x, (y, z)) FROM…
>> LET (x, (y, z)) = SELECT x, someTuple FROM…
>> LET (x, (y, z)) = (SELECT x FROM.., SELECT y, x FROM…))
>> LET (x, (y, z)) = (someFunc(), SELECT y, z FROM…)
>> LET (x, yAndZ) = (someFunc(), SELECT y, z FROM…)
>> 
>> IMO, once you start supporting features they need to be sort of intuitively 
>> discoverable by users, so that a concept can be used in all places you might 
>> expect.
>> 
>> But I would be fine with an arbitrary restriction of at most one SELECT on 
>> the RHS, or even ONLY a SELECT or some other tuple, and at most one level of 
>> deconstruction of the RHS.
>> 
>> 
>> 
>> 
>> 
>>> On 14 Aug 2022, at 18:04, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>> 
>>> Let me just state my bias right up front. For any kind of QL I lean heavily 
>>> toward verbose and explicit based on their lifecycle. A CQL query will 
>>> probably need to be understood by the next person looking at it, and a few 
>>> seconds saved typing isn't worth the potential misunderstanding later.  My 
>>> opinion is formed by having to be the second person many times.  :D 
>>> 
>>> I just want to make sure I have the syntax you are proposing. 
>>> 
>>> Verbose version:
>>> LET (a) = SELECT val FROM table
>>> IF a > 1 THEN...
>>> 
>>> Less verbose version:
>>> LET a = SELECT val FROM table
>>> IF a.val > 1 THEN...
>>> 
>>> Alternatively, for non-queries:
>>> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>>> or less verbose:
>>> LET x = (someFunc() AS v1, someOtherFunc() as v2)
>>> LET (v1, v2) = (someFunc(), someOtherFunc())
>>> 
>>> Weighing in on the "SELECT without a FROM," I think that is fine and, as 
>>> Avi stated, already present in the SQL world. I would prefer that over 
>>> 'SELECT  func() FROM dual;' (Looking at you, Oracle)
>>> 
>>> Finally, on the topic of deconstructing SELECT statements instead of 
>>> nesting. If I understand the argument here, I would favor deconstructing 
>>> over nesting if there is a choice. I think this is what that choice would 
>>> look like.
>>> 
>>> Deconstructed:
>>> LET x = SELECT x FROM ...
>>> LET y = SELECT y FROM ...
>>> 
>>> Nested:
>>> LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))
>>> 
>>> I'm trying to summate but let me know if I missed something. I apologize in 
>>> advance to Monday morning Caleb, who will have to digest this thread. 
>>> 
>>> Patrick
>>> 
>>> On Sun, Aug 14, 2022 at 9:00 AM Benedict Elliott Smith 
>>> <bened...@apache.org> wrote:
>>>> 
>>>>> 
>>>>> I think SQL dialects require subqueries to be parenthesized (not sure). 
>>>>> If that's the case I think we should keep the tradition.
>>>>> 
>>>> 
>>>> This isn’t a sub-query though, since LET is not a query. If we permit at 
>>>> most one SELECT, and do not permit mixing SELECT with constant 
>>>> assignments, I don’t see why we would require parentheses.
>>>> 
>>>>> I see no harm in making FROM optional, as it's recognized by other SQL 
>>>>> dialects.
>>>>> 
>>>>> Absolutely, this just flows naturally from having tuples. There's no 
>>>>> difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple”.
>>>> 
>>>> Neither of these things are supported today, and they’re no longer 
>>>> necessary with this syntax proposal. The downside of splitting SELECT and 
>>>> LET is that there’s no impetus to improve the former. So the question was 
>>>> really whether we bother to improve it anyway, not whether or not they 
>>>> would be good improvements (I think they obviously are).
>>>> 
>>>>> I think this can be safely deferred. Most people would again separate it 
>>>>> into separate LETs.
>>>>> 
>>>> That implies we’ll permit deconstructing a tuple variable in a LET. This 
>>>> makes sense to me, but is roughly equivalent to nested deconstruction. It 
>>>> might be that v1 we only support deconstructing SELECT statements, but I 
>>>> guess all of this is probably up to the implementor.
>>>>> I'd add (to the specification) that LETs cannot override a previously 
>>>>> defined variable, just to reduce ambiguity.
>>>>> 
>>>> 
>>>> Yep, this was already agreed way back with the earlier proposal.
>>>> 
>>>> 
>>>>> On 14 Aug 2022, at 16:30, Avi Kivity <a...@scylladb.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> On 14/08/2022 17.50, Benedict Elliott Smith wrote:
>>>>>> 
>>>>>> > SELECT and LET incompatible once comparisons become valid selectors
>>>>>> 
>>>>>> I don’t think this would be ambiguous, as = is required in the LET 
>>>>>> syntax as we have to bind the result to a variable name.
>>>>>> 
>>>>>> But, I like the deconstructed tuple syntax improvement over “Option 6”. 
>>>>>> This would also seem to easily support assigning from non-query 
>>>>>> statements, such as LET (a, b) = (someFunc(), someOtherFunc(?))
>>>>>> 
>>>>>> I don’t think it is ideal to depend on relative position in the tuple 
>>>>>> for assigning results to a variable name, as it leaves more scope for 
>>>>>> errors. It would be nice to have a simple way to deconstruct safely. 
>>>>>> But, I think this proposal is good, and I’d be fine with it as an 
>>>>>> alternative if others concur. I agree that seeing the SELECT 
>>>>>> independently may be more easily recognisable to users.
>>>>>> 
>>>>>> With this approach there remains the question of how we handle single 
>>>>>> column results. I’d be inclined to treat in the following way:
>>>>>> 
>>>>>> LET (a) = SELECT val FROM table
>>>>>> IF a > 1 THEN...
>>>>>> 
>>>>>> LET a = SELECT val FROM table
>>>>>> IF a.val > 1 THEN...
>>>>>> 
>>>>> 
>>>>> I think SQL dialects require subqueries to be parenthesized (not sure). 
>>>>> If that's the case I think we should keep the tradition.
>>>>> 
>>>>> 
>>>>> 
>>>>>> ----
>>>>>> There is also the question of whether we support SELECT without a FROM 
>>>>>> clause, e.g.
>>>>>> LET x = SELECT someFunc() AS v1, someOtherFunc() AS v2
>>>>>> 
>>>>>> Or just LET (since they are no longer equivalent)
>>>>>> e.g.
>>>>>> LET x = (someFunc() AS v1, someOtherFunc() as v2)
>>>>>> LET (v1, v2) = (someFunc(), someOtherFunc())
>>>>>> 
>>>>> 
>>>>> I see no harm in making FROM optional, as it's recognized by other SQL 
>>>>> dialects.
>>>>> 
>>>>> 
>>>>> 
>>>>>> ----
>>>>>> Also since LET is only binding variables, is there any reason we 
>>>>>> shouldn’t support multiple SELECT assignments in a single LET?, e.g.
>>>>>> LET (x, y) = ((SELECT x FROM…), (SELECT y FROM))
>>>>>> 
>>>>> 
>>>>> What if an inner select returns a tuple? Would y be a tuple?
>>>>> 
>>>>> 
>>>>> 
>>>>> I think this is redundant and atypical enough to not be worth supporting. 
>>>>> Most people would use separate LETs.
>>>>> 
>>>>> 
>>>>> 
>>>>>> ----
>>>>>> Also whether we support tuples in SELECT statements anyway, e.g.
>>>>>> LET (tuple1, tuple2) = SELECT (a, b), (c, d) FROM..
>>>>>> IF tuple1.a > 1 AND tuple2.d > 1…
>>>>> 
>>>>> Absolutely, this just flows naturally from having tuples. There's no 
>>>>> difference between "SELECT (a, b)" and "SELECT a_but_a_is_a_tuple".
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> ----
>>>>>> and whether we support nested deconstruction, e.g.
>>>>>> LET (a, b, (c, d)) = SELECT a, b, someTuple FROM..
>>>>>> IF a > 1 AND d > 1…
>>>>>> 
>>>>> 
>>>>> I think this can be safely deferred. Most people would again separate it 
>>>>> into separate LETs.
>>>>> 
>>>>> 
>>>>> 
>>>>> I'd add (to the specification) that LETs cannot override a       
>>>>> previously defined variable, just to reduce ambiguity.
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 14 Aug 2022, at 13:55, Avi Kivity via dev <dev@cassandra.apache.org> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 14/08/2022 01.29, Benedict Elliott Smith wrote:
>>>>>>>> 
>>>>>>>> I’ll do my best to express with my thinking, as well as how I would 
>>>>>>>> explain the feature to a user.
>>>>>>>> 
>>>>>>>> My mental model for LET statements is that they are simply SELECT 
>>>>>>>> statements where the columns that are selected become variables 
>>>>>>>> accessible anywhere in the scope of the transaction. That is to say, 
>>>>>>>> you should be able to run something like s/LET/SELECT and 
>>>>>>>> s/([^=]+)=([^,]+)(,|$)/\2 AS \1\3/g on the columns of a LET statement 
>>>>>>>> and produce a valid SELECT statement, and vice versa. Both should 
>>>>>>>> perform identically.
>>>>>>>> 
>>>>>>>> e.g. 
>>>>>>>> SELECT pk AS key, v AS value FROM table 
>>>>>>>> 
>>>>>>>> => 
>>>>>>>> LET key = pk, value = v FROM table
>>>>>>> 
>>>>>>> "=" is a CQL/SQL operator. Cassandra doesn't support it yet, but SQL 
>>>>>>> supports selecting comparisons:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> $ psql
>>>>>>> psql (14.3)
>>>>>>> Type "help" for help.
>>>>>>> 
>>>>>>> avi=# SELECT 1 = 2, 3 = 3, NULL = NULL;
>>>>>>>  ?column? | ?column? | ?column? 
>>>>>>> ----------+----------+----------
>>>>>>>  f        | t        | 
>>>>>>> (1 row)
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Using "=" as a syntactic element in LET would make SELECT and LET 
>>>>>>> incompatible once comparisons become valid selectors. Unless they 
>>>>>>> become mandatory (and then you'd write "LET q = a = b" if you wanted to 
>>>>>>> select a comparison).
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I personally prefer the nested query syntax:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>     LET (a, b, c) = (SELECT foo, bar, x+y FROM ...);
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> So there aren't two similar-but-not-quite-the-same syntaxes. SELECT is 
>>>>>>> immediately recognizable by everyone as a query, LET is not.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> Identical form, identical behaviour. Every statement should be 
>>>>>>>> directly translatable with some simple text manipulation.
>>>>>>>> 
>>>>>>>> We can then make this more powerful for users by simply expanding 
>>>>>>>> SELECT statements, e.g. by permitting them to declare constants and 
>>>>>>>> tuples in the column results. In this scheme LET x = * is simply 
>>>>>>>> syntactic sugar for LET x = (pk, ck, field1, …) This scheme then 
>>>>>>>> supports options 2, 4 and 5 all at once, consistently alongside each 
>>>>>>>> other.
>>>>>>>> 
>>>>>>>> Option 6 is in fact very similar, but is strictly less flexible for 
>>>>>>>> the user as they have no way to declare multiple scalar variables 
>>>>>>>> without scoping them inside a tuple.
>>>>>>>> 
>>>>>>>> e.g.
>>>>>>>> LET key = pk, value = v FROM table
>>>>>>>> IF key > 1 AND value > 1 THEN...
>>>>>>>> 
>>>>>>>> =>
>>>>>>>> LET row = SELECT pk AS key, v AS value FROM table
>>>>>>>> IF row.key > 1 AND row.value > 1 THEN…
>>>>>>>> 
>>>>>>>> However, both are expressible in the existing proposal, as if you 
>>>>>>>> prefer this naming scheme you can simply write
>>>>>>>> 
>>>>>>>> LET row = (pk AS key, v AS value) FROM table
>>>>>>>> IF row.key > 1 AND row.value > 1 THEN…
>>>>>>>> 
>>>>>>>> With respect to auto converting single column results to a scalar, we 
>>>>>>>> do need a way for the user to say they care whether the row was null 
>>>>>>>> or the column. I think an implicit conversion here could be 
>>>>>>>> surprising. However we could implement tuple expressions anyway and 
>>>>>>>> let the user explicitly declare v as a tuple as Caleb has suggested 
>>>>>>>> for the existing proposal as well.
>>>>>>>> 
>>>>>>>> Assigning constants or other values not selected from a table would 
>>>>>>>> also be a little clunky:
>>>>>>>> 
>>>>>>>> LET v1 = someFunc(), v2 = someOtherFunc(?)
>>>>>>>> IF v1 > 1 AND v2 > 1 THEN…
>>>>>>>> 
>>>>>>>> =>
>>>>>>>> LET row = SELECT someFunc() AS v1, someOtherFunc(?) AS v2
>>>>>>>> IF row.v1 > 1 AND row.v2 > 1 THEN...
>>>>>>>> 
>>>>>>>> That said, the proposals are close to identical, it is just slightly 
>>>>>>>> more verbose and slightly less flexible.
>>>>>>>> 
>>>>>>>> Which one would be most intuitive to users is hard to predict. It 
>>>>>>>> might be that Option 6 would be slightly easier, but I’m unsure if 
>>>>>>>> there would be a huge difference.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 13 Aug 2022, at 16:59, Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> I'm really happy to see CEP-15 getting closer to a final 
>>>>>>>>> implementation. I'm going to walk through my reasoning for your 
>>>>>>>>> proposals wrt trying to explain this to somebody new. 
>>>>>>>>> 
>>>>>>>>> Looking at all the options, the first thing that comes up for me is 
>>>>>>>>> the Cassandra project's complicated relationship with NULL.  We have 
>>>>>>>>> prior art with EXISTS/NOT EXISTS when creating new tables. IS NULL/IS 
>>>>>>>>> NOT NULL is used in materialized views similarly to proposals 2,4 and 
>>>>>>>>> 5. 
>>>>>>>>> 
>>>>>>>>> CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] [keyspace_name.]view_name
>>>>>>>>>   AS SELECT [ (column_list) ]
>>>>>>>>>   FROM [keyspace_name.]table_name
>>>>>>>>>   [ WHERE column_name IS NOT NULL
>>>>>>>>>   [ AND column_name IS NOT NULL ... ] ]
>>>>>>>>>   [ AND relation [ AND ... ] ] 
>>>>>>>>>   PRIMARY KEY ( column_list )
>>>>>>>>>   [ WITH [ table_properties ]
>>>>>>>>>   [ [ AND ] CLUSTERING ORDER BY (cluster_column_name order_option) ] 
>>>>>>>>> ] ;
>>>>>>>>> 
>>>>>>>>>  Based on that, I believe 1 and 3 would just confuse users, so -1 on 
>>>>>>>>> those. 
>>>>>>>>> 
>>>>>>>>> Trying to explain the difference between row and column operations 
>>>>>>>>> with LET, I can't see the difference between a row and column in #2. 
>>>>>>>>> 
>>>>>>>>> #4 introduces a boolean instead of column names and just adds more 
>>>>>>>>> syntax.
>>>>>>>>> 
>>>>>>>>> #5 is verbose and, in my opinion, easier to reason when writing a 
>>>>>>>>> query. Thinking top down, I need to know if these exact rows and/or 
>>>>>>>>> column values exist before changing them, so I'll define them first. 
>>>>>>>>> Then I'll iterate over the state I created in my actual changes so I 
>>>>>>>>> know I'm changing precisely what I want. 
>>>>>>>>> 
>>>>>>>>> #5 could use a bit more to be clearer to somebody who doesn't write 
>>>>>>>>> CQL queries daily and wouldn't require memorizing subtle differences. 
>>>>>>>>> It should be similar to all the other syntax, so learning a little 
>>>>>>>>> about CQL will let you move into more without completely re-learning 
>>>>>>>>> the new syntax.  
>>>>>>>>> 
>>>>>>>>> So I propose #6)
>>>>>>>>> BEGIN TRANSACTION
>>>>>>>>>   LET row1 = SELECT * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects 
>>>>>>>>> all columns
>>>>>>>>>   LET row2 = SELECT v FROM ks.tbl WHERE k=1 AND c=0;
>>>>>>>>>   SELECT row1, row2
>>>>>>>>>   IF row1 IS NULL AND row2.v = 3 THEN
>>>>>>>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>>>>>   END IF
>>>>>>>>> COMMIT TRANSACTION
>>>>>>>>> 
>>>>>>>>> I added the SELECT in the LET just so it's straightforward, you are 
>>>>>>>>> reading, and it's just like doing a regular select, but you are 
>>>>>>>>> assigning it to a variable. 
>>>>>>>>> 
>>>>>>>>> I removed the confusing 'row1.v' and replaced it with 'row1' I can't 
>>>>>>>>> see why you would need the '.v' vs having the complete variable I 
>>>>>>>>> created in the statement above. 
>>>>>>>>> 
>>>>>>>>> EOL
>>>>>>>>> 
>>>>>>>>> Patrick
>>>>>>>>> 
>>>>>>>>> On Thu, Aug 11, 2022 at 1:37 PM Caleb Rackliffe 
>>>>>>>>> <calebrackli...@gmail.com> wrote:
>>>>>>>>>> ...and one more option...
>>>>>>>>>> 
>>>>>>>>>> 5.) Introduce tuple assignments, removing all ambiguity around row 
>>>>>>>>>> vs. column operations.
>>>>>>>>>> 
>>>>>>>>>> BEGIN TRANSACTION
>>>>>>>>>>   LET row1 = * FROM ks.tbl WHERE k=0 AND c=0; <-- * selects all 
>>>>>>>>>> columns
>>>>>>>>>>   LET row2 = (v) FROM ks.tbl WHERE k=1 AND c=0;
>>>>>>>>>>   SELECT row1.v, row2.v
>>>>>>>>>>   IF row1 IS NULL AND row2.v = 3 THEN
>>>>>>>>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>>>>>>   END IF
>>>>>>>>>> COMMIT TRANSACTION
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Thu, Aug 11, 2022 at 12:55 PM Caleb Rackliffe 
>>>>>>>>>> <calebrackli...@gmail.com> wrote:
>>>>>>>>>>> via Benedict, here is a 4th option:
>>>>>>>>>>> 
>>>>>>>>>>> 4.) Similar to #2, but don't rely on the key element being NULL.
>>>>>>>>>>> 
>>>>>>>>>>> If the read returns no result, x effectively becomes NULL. 
>>>>>>>>>>> Otherwise, it remains true/NOT NULL.
>>>>>>>>>>> 
>>>>>>>>>>> BEGIN TRANSACTION
>>>>>>>>>>>   LET x = true FROM ks.tbl WHERE k=0 AND c=0;
>>>>>>>>>>>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>>>>>>>>>>   SELECT x, row2_v
>>>>>>>>>>>   IF x IS NULL AND row2_v = 3 THEN
>>>>>>>>>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>>>>>>>   END IF
>>>>>>>>>>> COMMIT TRANSACTION
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Aug 11, 2022 at 12:12 PM Caleb Rackliffe 
>>>>>>>>>>> <calebrackli...@gmail.com> wrote:
>>>>>>>>>>>> Hello again everyone!
>>>>>>>>>>>> 
>>>>>>>>>>>> I've been working on a prototype in CASSANDRA-17719 for a grammar 
>>>>>>>>>>>> that roughly corresponds to what we've agreed on in this thread. 
>>>>>>>>>>>> One thing that isn't immediately obvious to me is how the LET 
>>>>>>>>>>>> syntax handles cases where we want to check for the plain 
>>>>>>>>>>>> existence of a row in IF. For example, in this hybrid of the 
>>>>>>>>>>>> originally proposed syntax and something more like what we've 
>>>>>>>>>>>> agreed on (and the RETURNING just to distinguish                   
>>>>>>>>>>>>                                     between that and SELECT), this 
>>>>>>>>>>>> could be pretty straightforward:
>>>>>>>>>>>> 
>>>>>>>>>>>> BEGIN TRANSACTION
>>>>>>>>>>>>   SELECT v FROM ks.tbl WHERE k=0 AND c=0 AS row1;
>>>>>>>>>>>>   SELECT v FROM ks.tbl WHERE k=1 AND c=0 AS row2;
>>>>>>>>>>>>   RETURNING row1.v, row2.v
>>>>>>>>>>>>   IF row1 NOT EXISTS AND row2.v = 3 THEN
>>>>>>>>>>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>>>>>>>>   END IF
>>>>>>>>>>>> COMMIT TRANSACTION
>>>>>>>>>>>> 
>>>>>>>>>>>> The NOT EXISTS operator has row1 to work with. One the other hand, 
>>>>>>>>>>>> w/ the LET syntax and no naming of reads, it's not clear what the 
>>>>>>>>>>>> best solution would be. Here are a few possibilities:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1.) Provide a few built-in functions that operate on a whole 
>>>>>>>>>>>> result row. If we assume a SQL style IS NULL and IS NOT NULL (see 
>>>>>>>>>>>> my last post here) for operations on particular columns, this 
>>>>>>>>>>>> probably eliminates the need for EXISTS/NOT EXISTS as well.
>>>>>>>>>>>> 
>>>>>>>>>>>> BEGIN TRANSACTION
>>>>>>>>>>>>   LET row1_missing = notExists() FROM ks.tbl WHERE k=0 AND c=0;
>>>>>>>>>>>>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>>>>>>>>>>>   SELECT row1_missing, row2_v
>>>>>>>>>>>>   IF row1_missing AND row2_v = 3 THEN
>>>>>>>>>>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>>>>>>>>   END IF
>>>>>>>>>>>> COMMIT TRANSACTION
>>>>>>>>>>>> 
>>>>>>>>>>>> 2.) Assign and check the first primary key element to determine 
>>>>>>>>>>>> whether the row exists.
>>>>>>>>>>>> 
>>>>>>>>>>>> BEGIN TRANSACTION
>>>>>>>>>>>>   LET row1_k = k FROM ks.tbl WHERE k=0 AND c=0;
>>>>>>>>>>>>   LET row2_v = v FROM ks.tbl WHERE k=1 AND c=0;
>>>>>>>>>>>>   SELECT row1_k, row2_v
>>>>>>>>>>>>   IF row1_k IS NULL AND row2_v = 3 THEN
>>>>>>>>>>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>>>>>>>>   END IF
>>>>>>>>>>>> COMMIT TRANSACTION
>>>>>>>>>>>> 
>>>>>>>>>>>> 3.) Reconsider the LET concept toward something                    
>>>>>>>>>>>>                                      that allows us to explicitly 
>>>>>>>>>>>> name our reads again.
>>>>>>>>>>>> 
>>>>>>>>>>>> BEGIN TRANSACTION
>>>>>>>>>>>>   WITH (SELECT v FROM ks.tbl WHERE k=0 AND c=0) AS row1;
>>>>>>>>>>>>   WITH (SELECT v FROM ks.tbl WHERE k=1 AND c=0) AS row2;
>>>>>>>>>>>>   SELECT row1.v, row2.v
>>>>>>>>>>>>   IF row1 NOT EXISTS AND row2.v = 3 THEN
>>>>>>>>>>>>     INSERT INTO ks.tbl (k, c, v) VALUES (0, 0, 1);
>>>>>>>>>>>>   END IF
>>>>>>>>>>>> COMMIT TRANSACTION
>>>>>>>>>>>> 
>>>>>>>>>>>> I don't have a strong affinity for any of these, although #1 seems 
>>>>>>>>>>>> the most awkward.
>>>>>>>>>>>> 
>>>>>>>>>>>> Does anyone have any other alternatives? Preference for one of the 
>>>>>>>>>>>> above options?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Jul 22, 2022 at 11:21 AM Caleb Rackliffe 
>>>>>>>>>>>> <calebrackli...@gmail.com> wrote:
>>>>>>>>>>>>> Avi brought up an interesting point around NULLness checking in 
>>>>>>>>>>>>> CASSANDRA-17762...
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In SQL, any comparison with NULL is NULL, which is interpreted 
>>>>>>>>>>>>>> as FALSE in a condition. To test for NULLness, you use IS NULL 
>>>>>>>>>>>>>> or IS NOT NULL. But LWT uses IF col = NULL as a NULLness test. 
>>>>>>>>>>>>>> This is likely to confuse people coming from SQL and hamper 
>>>>>>>>>>>>>> attempts to extend the dialect.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We can leave that Jira open to address what to do in the legacy 
>>>>>>>>>>>>> LWT case, but I'd support a SQL-congruent syntax here (IS NULL or 
>>>>>>>>>>>>> IS NOT NULL), where we have something closer to a blank slate.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Jun 30, 2022 at 6:25 PM Abe Ratnofsky <a...@aber.io> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> The new syntax looks great, and I’m really excited to see this 
>>>>>>>>>>>>>> coming together.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> One piece of feedback on the proposed syntax is around the use 
>>>>>>>>>>>>>> of “=“ as a declaration in addition to its current use as an 
>>>>>>>>>>>>>> equality operator in a WHERE clause and an assignment operator 
>>>>>>>>>>>>>> in an UPDATE:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> BEGIN TRANSACTION
>>>>>>>>>>>>>>   LET car_miles = miles_driven, car_is_running = is_running FROM 
>>>>>>>>>>>>>> cars WHERE model=’pinto’
>>>>>>>>>>>>>>   LET user_miles = miles_driven FROM users WHERE name=’blake’
>>>>>>>>>>>>>>   SELECT something else from some other table
>>>>>>>>>>>>>>   IF NOT car_is_running THEN ABORT
>>>>>>>>>>>>>>   UPDATE users SET miles_driven = user_miles + 30 WHERE 
>>>>>>>>>>>>>> name='blake';
>>>>>>>>>>>>>>   UPDATE cars SET miles_driven = car_miles + 30 WHERE 
>>>>>>>>>>>>>> model='pinto';
>>>>>>>>>>>>>> COMMIT TRANSACTION
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This is supported in languages like PL/pgSQL, but in a normal 
>>>>>>>>>>>>>> SQL query kind of local declaration is often expressed as an 
>>>>>>>>>>>>>> alias (SELECT col AS new_col), subquery alias (SELECT col) t, or 
>>>>>>>>>>>>>> common table expression (WITH t AS (SELECT col)).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Here’s an example of an alternative to the proposed syntax that 
>>>>>>>>>>>>>> I’d find more readable:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> BEGIN TRANSACTION
>>>>>>>>>>>>>>   WITH car_miles, car_is_running AS (SELECT miles_driven, 
>>>>>>>>>>>>>> is_running FROM cars WHERE model=’pinto’),
>>>>>>>>>>>>>>          user_miles AS (SELECT miles_driven FROM users WHERE 
>>>>>>>>>>>>>> name=’blake’)
>>>>>>>>>>>>>>   IF NOT car_is_running THEN ABORT
>>>>>>>>>>>>>>   UPDATE users SET miles_driven = user_miles + 30 WHERE 
>>>>>>>>>>>>>> name='blake';
>>>>>>>>>>>>>>   UPDATE cars SET miles_driven = car_miles + 30 WHERE 
>>>>>>>>>>>>>> model='pinto';
>>>>>>>>>>>>>> COMMIT TRANSACTION
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> There’s also the option of naming the transaction like a 
>>>>>>>>>>>>>> subquery, and supporting LET via AS (this one I’m less sure 
>>>>>>>>>>>>>> about but wanted to propose anyway):
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> BEGIN TRANSACTION t1
>>>>>>>>>>>>>>   SELECT miles_driven AS t1.car_miles, is_running AS 
>>>>>>>>>>>>>> t1.car_is_running FROM cars WHERE model=’pinto’;
>>>>>>>>>>>>>>   SELECT miles_driven AS t1.user_miles FROM users WHERE 
>>>>>>>>>>>>>> name=’blake’;
>>>>>>>>>>>>>>   IF NOT car_is_running THEN ABORT
>>>>>>>>>>>>>>   UPDATE users SET miles_driven = user_miles + 30 WHERE 
>>>>>>>>>>>>>> name='blake';
>>>>>>>>>>>>>>   UPDATE cars SET miles_driven = car_miles + 30 WHERE 
>>>>>>>>>>>>>> model='pinto';
>>>>>>>>>>>>>> COMMIT TRANSACTION
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This also has the benefit of resolving ambiguity in case of 
>>>>>>>>>>>>>> naming conflicts with existing (or future) column names.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Abe

Re: CEP-15 multi key transaction syntax

Reply via email to