Re: CEP-15 multi key transaction syntax

2022-06-13 Thread Konstantin Osipov
> IF (X) THEN
> ROLLBACK
> RETURN (ERRCODE)
> END IF
> 
> or
> 
> IF (X) THEN RAISERROR
> 
> So, that is in essence the question we are currently asking: do
> we want to have a more LWT-like approach (and if so, how do we
> address this complexity for the user), or do we want a more
> SQL-like approach (and if so, how do we modify it to make
> non-interactive transactions convenient, and implementation
> tractable)
> 
> * This is anyway a shortcoming of existing batches, I think? So
> it might be we can sweep it under the rug, but I think it will
> be more relevant here as people execute more complex
> transactions, and we should ideally have semantics that will
> work well into the future – including if we later introduce
> interactive transactions.

I'd start by answering the question of how the syntax should handle
a NOT FOUND condition. In SQL, that would trigger activation of a
CONTINUE handler.

It's hard to see how one can truly branch the logic without it.
Relying on NULL content of a cell would be full of gotchas.

-- 
Konstantin Osipov, Moscow, Russia


Re: CEP-15 multi key transaction syntax

2022-06-13 Thread bened...@apache.org
I believe that is a MySQL specific concept. This is one problem with mimicking 
SQL – it’s not one thing!

In T-SQL, a Boolean expression is TRUE, FALSE or UNKNOWN[1], and a NULL value 
submitted to a Boolean operator yields UNKNOWN.

IF (X) THEN Y does not run Y if X is UNKNOWN;
IF (X) THEN Y ELSE Z does run Z if X is UNKNOWN.

So, I think we have evidence that it is fine to interpret NULL as “false” for 
the evaluation of IF conditions.
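
As a rough illustration of the rule above (not taken from the T-SQL docs or from CEP-15), here is a minimal Python sketch. It assumes None stands in for UNKNOWN, and the helper names eq and if_then_else are hypothetical.

```python
from typing import Optional

# Assumption: None models SQL's UNKNOWN, the result of comparing with NULL.
Tri = Optional[bool]


def eq(a, b) -> Tri:
    """Three-valued equality: a NULL (None) operand yields UNKNOWN (None)."""
    if a is None or b is None:
        return None
    return a == b


def if_then_else(cond: Tri, then_branch: str,
                 else_branch: Optional[str] = None) -> Optional[str]:
    """Pick then_branch only when cond is TRUE.

    UNKNOWN is treated like FALSE: the THEN branch is skipped and the
    ELSE branch (if any) is chosen.
    """
    if cond is True:
        return then_branch
    return else_branch


print(if_then_else(eq(None, 0), "Y"))       # Y is skipped, nothing is chosen
print(if_then_else(eq(None, 0), "Y", "Z"))  # falls through to Z
print(if_then_else(eq(0, 0), "Y", "Z"))     # condition is TRUE, so Y
```

The point of the sketch is only that a NULL operand never selects the THEN branch, which is the behaviour being proposed for IF conditions here.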

[1] 
https://docs.microsoft.com/en-us/sql/t-sql/language-elements/else-if-else-transact-sql?view=sql-server-ver16





Re: CEP-15 multi key transaction syntax

2022-06-13 Thread Aaron Ploetz
Benedict,

I'm really excited about this feature.  I've been observing this
conversation for a while now, and I'm happy to add some thoughts.

> We must balance the fact we cannot afford to do everything (yet), against
> the need to make sure what we do is reasonably intuitive (to both CQL and
> SQL users) and consistent – including with whatever we do in future.


I think taking small steps forward, to build a few complete features as
close to SQL as possible is a good approach.

> question we are currently asking: do we want to have a more LWT-like
> approach... or do we want a more SQL-like approach
>

For years now we've been fighting this notion that Cassandra is difficult
to use.  Coming up with specialized syntax isn't going to bridge that
divide.  From a (new?) user perspective, the best plan is to stay as
consistent with SQL as possible.

> I believe that is a MySQL specific concept. This is one problem with
> mimicking SQL – it’s not one thing!


Right?!?!  As if this needed to be more complex.

> I think we have evidence that it is fine to interpret NULL as “false” for
> the evaluation of IF conditions.
>

Agree.  Null == false isn't too much of a leap.

Thanks for taking up the charge on this one.  Glad to see it moving forward!

Thanks,

Aaron



On Sun, Jun 12, 2022 at 10:33 AM bened...@apache.org 
wrote:

> Welcome Li, and thanks for your input
>
>
>
> > When I first saw the syntax, I took it for granted that the condition
> was evaluated against the state AFTER the updates
>
>
>
> Depending what you mean, I think this is one of the options being
> considered. At least, it seems this syntax is most likely to be evaluated
> against the values written by preceding statements in the batch, but not
> the statement itself (or later ones), as this could lead to nonsensical
> statements like
>
>
>
> BEGIN TRANSACTION
>
> UPDATE tbl SET v = 1 WHERE key = 1 AS tbl
>
> COMMIT TRANSACTION IF tbl.v = 0
>
>
>
> Where v is never 0 afterwards, so this never succeeds. I take it in this
> simple case you would expect the condition to be evaluated against the
> state prior to the statement (i.e. the initial state)?
>
>
>
> But we have a blank slate, so every option is available to us! We just
> need to make sure it makes sense to the user, even in uncommon cases.
>
>
>
> > The IF (Boolean expr) ABORT TRANSACTION would suffer less because users
> may tend to put the condition closer to the related SELECT statement.
>
>
>
> This is probably not going to matter in practice. The SELECTs all happen
> upfront no matter what the CQL might look like, and the UPDATEs all happen
> only after the IF conditions are evaluated. This is all just a question of
> how the user expresses things.
>
>
>
> In future we may offer interactive transactions, or transactions that are
> multi-step, in which case this would be more relevant and could have an
> efficiency impact.
>
>
>
> > Would you consider allowing users to start a read-only transaction
> explicitly like BEGIN TRANSACTION READONLY?
>
>
>
> Good question. I would be OK with this, for sure, and will defer to the
> opinions of others here. There won’t be any optimisation impact, as we
> simply check if the transaction contains any updates, but some validation
> could be helpful for the user.
>
>
>
> > Finally, I wonder if the community would be interested in idempotency
> support.
>
>
>
> This is something that has been considered, and that Accord is able to
> support (in a couple of ways), but as an end-to-end feature this requires
> client support and other scaffolding that is not currently
> planned/scheduled. The simplest (least robust) approach is for the server
> to include the transaction’s identifier in its timeout, so that it can be
> queried by the client to establish if it has been made durable. This should
> be quite easy to deliver on the server-side, but would require some
> application or client integration, and is unreliable in the face of
> coordinator failure (so the transaction id is unknown to the client). The
> more complete approach is for the client to include an idempotency token in
> its submission to the server, and for C* to record this alongside the
> transaction id, and for some bounded time window to either reject
> re-submissions of this token or to evaluate it as a no-op. This requires
> much tighter integration from the clients, and more work server-side.
>
>
>
> Which is simply to say, this is on our radar but I can’t make promises
> about what form it will take, or when it will arrive, only that it has been
> planned for enough to ensure we can achieve it when resources permit.
>
>
>
> *From: *Li Boxuan 
> *Date: *Sunday, 12 June 2022 at 16:14
> *To: *dev@cassandra.apache.org 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> Correcting my typo:
>
>
>
> >  I took it for granted that the condition was evaluated against the
> state before the updates
>
>
>
> I took it for granted that the condition was evaluated against the state
> AFTER the upda
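
The more complete idempotency approach described in the quoted reply above (the client sends a token, the server records it alongside the transaction id, and re-submissions within a bounded window become no-ops) could be sketched roughly as follows. This is an in-memory Python illustration only; the class and method names are hypothetical and it says nothing about how Accord or C* would actually store or expire tokens.

```python
import time


class IdempotencyLog:
    """Hypothetical in-memory record of idempotency tokens."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.seen = {}  # token -> (txn_id, time recorded)

    def submit(self, token, txn_id):
        """Return (txn_id, applied).

        applied is False when the token was already recorded inside the
        window; the re-submission is then evaluated as a no-op and the
        original transaction id is returned.
        """
        now = self.clock()
        # Drop tokens whose bounded time window has passed.
        self.seen = {t: (i, ts) for t, (i, ts) in self.seen.items()
                     if now - ts < self.window}
        if token in self.seen:
            return self.seen[token][0], False
        self.seen[token] = (txn_id, now)
        return txn_id, True
```

A client retrying after a timeout would resend the same token; inside the window it gets back the original transaction id instead of executing twice. Rejecting the re-submission outright, the other option mentioned, would replace the no-op return with an error.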

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread Blake Eggleston
Does the IF <...> ABORT simplify reasoning though? If you restrict it to only 
dealing with the most recent row it would, but referencing the name implies 
you’d be able to include references from other operations, in which case you’d 
have the same problem.

> return instead an exception if the transaction is aborted

Since the txn is not actually interactive, I think it would be better to 
receive values instead of an exception, to understand why the operation was 
rolled back.


Re: CEP-15 multi key transaction syntax

2022-06-13 Thread Blake Eggleston
Regarding modeling syntax after SQL... that approach has pros and cons. 
Supporting an SQL-like syntax implies capabilities that we can’t provide, so 
you’re delivering something that looks familiar, but behaves differently, which 
doesn’t help us with usability.

I prefer an approach that supports an accurate mental model of what’s happening 
behind the scenes. I think that should be a design priority for the syntax. 
We’ll be able to build things on top of accord, but the core multi-key cas 
operation isn’t going to change too much.

So I have 2 contrarian proposals:

1. Remove named updates, column references must come from selects. More 
verbose, but crystal clear with regards to when/where values come from.
2. Don’t call these transactions, the term implies things accord doesn’t do. 
Maybe call them CAS BATCH, and terminate them with APPLY or APPLY IF.

Although less exciting, this would simplify the initial implementation, and let 
feature requests and first hand experience inform where and how the syntax 
develops from there.

Blake


Re: CEP-15 multi key transaction syntax

2022-06-13 Thread bened...@apache.org
> Don’t call these transactions, the term implies things accord doesn’t do. 
> Maybe call them CAS BATCH, and terminate them with APPLY or APPLY IF.

The condition is optional, so CAS is not accurate. These are also definitely 
transactions, they are only non-interactive - transactions in SQL are also 
often non-interactive (e.g. within a stored procedure).

I think it is far more problematic to introduce a syntax that would not be 
consistent with future enhancements to transactional functionality. Then we 
would have to introduce a third syntax, and more syntaxes make for a messy 
language IMO.

I have a very strong preference for choosing a syntax we can evolve 
consistently, so that users just gain additional keywords or have restrictions 
relaxed as the feature evolves.

> Supporting an SQL like syntax implies capabilities that we can’t provide, so 
> you’re delivering something that looks familiar

How so? I think all we’re really considering is *not* introducing the IF part 
of the COMMIT syntax, which is not-SQL-like, and instead offering a way of 
aborting transactions consistent with how it might be done in SQL. This doesn’t 
implement partial SQL functionality, nor look especially not-CQL; it’s just 
a more similar control flow, and so more familiar.

> Does the IF <...> ABORT simplify reasoning though? If you restrict it to only 
> dealing with the most recent row it would

If a condition is evaluated against the current value of any record (as of the 
point of the condition’s declaration) then it would seem more obvious than were 
the COMMIT IF to be evaluated against the state prior to the value’s 
declaration, as the IF appears to execute last.

> Remove named updates, column references must come from selects. More verbose, 
> but crystal clear with regards to when/where values come from.

Do we require these to be declared first? If so, the problem of ambiguity goes 
away at least, ignoring everything else.

Perhaps we can do that initially either way? It makes both syntaxes easier to 
implement, so we get our MVP more easily. But if we settle what our preferred 
syntax is, we can see if there’s time to deliver it before a release. Either 
way, the syntax evolves on a consistent path.



Re: CEP-15 multi key transaction syntax

2022-06-13 Thread Derek Chen-Becker
I'm coming to this thread fresh and admittedly I'm still trying to catch up
and wrap my head around it. I think it's already been called out, but what
looked superficially simple at the beginning of the thread has quickly
become something that I'm having to take notes on to make sure I understand
the semantics. I'm a little worried that there are complexities here that
we might not realize. I like the idea, and I think it's a really powerful
addition to CQL, but I think we need to make sure we're not setting up
users for confusion. CQL is great because it leverages knowledge of SQL,
but the devil is in the differences.

Also, related to complexity, is there a subset of what's being discussed
that could be implemented as an initial version and then grown over time to
include more powerful features?

In terms of things that have been discussed so far, in no particular order,
the AS keyword seems to give the user reasonable control over whether they
get the pre- or post-update version of the record. Similarly, I think the
IF...ABORT syntax is much clearer if using AS, since that keyword then
decides which version of the row to use for the condition. Consider the
following (possibly incorrect) example:

BEGIN TRANSACTION
SELECT * FROM cars WHERE ... AS car
IF car.miles > 10 ROLLBACK TRANSACTION
UPDATE cars SET car.next_service = 10 WHERE ...
COMMIT TRANSACTION

vs

BEGIN TRANSACTION
SELECT * FROM cars WHERE ... AS current_car
IF current_car.miles > 10 ROLLBACK TRANSACTION
UPDATE cars SET car.next_service = 10 WHERE ... AS car
COMMIT TRANSACTION
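
To make the pre- versus post-update distinction concrete, here is a small Python sketch of one possible reading of the second example: the read is bound upfront (current_car), the update's resulting row is bound separately (car), the condition is checked, and only then is the write applied. The function and parameter names are hypothetical and this is not the CEP-15 semantics, just a model for discussion.

```python
def run_txn(store, key, new_values, abort_if):
    """Model of a non-interactive transaction on a single row.

    store      -- dict of key -> row (a dict of column -> value)
    new_values -- columns the UPDATE would set
    abort_if   -- callable(current, updated) -> bool, the IF ... ROLLBACK test
    Returns (committed, row) so the caller receives values either way.
    """
    # Phase 1: reads happen upfront. This is what
    # "SELECT ... AS current_car" would bind.
    current = dict(store.get(key, {}))
    # The post-update image, computed but not yet applied. This is what
    # "UPDATE ... AS car" would bind.
    updated = {**current, **new_values}
    # Phase 2: conditions are evaluated before any write is applied.
    if abort_if(current, updated):
        return False, current
    # Phase 3: writes are applied only after the conditions pass.
    store[key] = updated
    return True, updated


store = {"car-1": {"miles": 5, "next_service": 0}}
committed, row = run_txn(store, "car-1", {"next_service": 10},
                         lambda current_car, car: current_car["miles"] > 10)
print(committed, row)
```

Returning the row on abort rather than raising follows the suggestion earlier in the thread that values are more useful than an exception for understanding why a transaction rolled back.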

Cheers,

Derek

On Sun, Jun 12, 2022 at 5:34 AM bened...@apache.org 
wrote:

> > I would love hearing from people on what they think.
>
>
>
> ^^ It would be great to have more participants in this conversation
>
>
>
> > For context, my questions earlier were based on my 20+ years of using
> SQL transactions across different systems.
>
>
>
> We probably don’t come from a very different place. I spent too many years
> with T-SQL.
>
>
>
> > When you start a SQL transaction, you are creating a branch of your
> data that you can operate with until you reach your desired state and then
> merge it back with a commit.
>
>
>
> That’s the essential complexity we’re grappling with: how much do we
> permit your “branch” to do, how do we let you express it, and how do we let
> you express conditions?
>
>
>
> We must balance the fact we cannot afford to do everything (yet), against
> the need to make sure what we do is reasonably intuitive (to both CQL and
> SQL users) and consistent – including with whatever we do in future.
>
>
>
> Right now, we have the issue that read-your-writes introduces some
> complexity to the semantics, particularly around the conditions of
> execution.
>
>
>
> LWTs impose conditions on the state of all records prior to execution, but
> their API has a lot of shortcomings. The proposal of COMMIT IF (Boolean
> expr) is most consistent with this approach. This can be confusing, though,
> if the condition is evaluated on a value that has been updated by a prior
> statement in the batch – what value does this global condition get
> evaluated against?*
>
>
>
> SQL has no such concept, but also SQL is designed to be interactive.
> Depending on the dialect there’s probably a lot of ways to do this
> non-interactively in SQL, but we probably cannot cheaply replicate the
> functionality exactly as we do not (yet) support interactive transactions
> that they were designed for. To submit a conditional non-interactive
> transaction in SQL, you would likely use
>
>
>
> IF (X) THEN
>
> ROLLBACK
>
> RETURN (ERRCODE)
>
> END IF
>
>
>
> or
>
>
>
> IF (X) THEN RAISERROR
>
>
>
> So, that is in essence the question we are currently asking: do we want to
> have a more LWT-like approach (and if so, how do we address this complexity
> for the user), or do we want a more SQL-like approach (and if so, how do we
> modify it to make non-interactive transactions convenient, and
> implementation tractable)
>
>
>
> * This is anyway a shortcoming of existing batches, I think? So it might
> be we can sweep it under the rug, but I think it will be more relevant here
> as people execute more complex transactions, and we should ideally have
> semantics that will work well into the future – including if we later
> introduce interactive transactions.
>
>
>
>
>
>
>
>
>
>
>
> *From: *Patrick McFadin 
> *Date: *Saturday, 11 June 2022 at 15:33
> *To: *dev 
> *Subject: *Re: CEP-15 multi key transaction syntax
>
> I think the syntax is evolving into something pretty complicated, which
> may be warranted but I wanted to take a step back and be a bit more
> reflective on what we are trying to accomplish.
>
>
>
> For context, my questions earlier were based on my 20+ years of using SQL
> transactions across different systems. That's my personal bias when I see
> the word "database transaction" in this case. When you start a SQL
> transaction, you are creating a branch of your d

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread Derek Chen-Becker
On Mon, Jun 13, 2022 at 1:57 PM Blake Eggleston 
wrote:

> I prefer an approach that supports an accurate mental model of what’s
> happening behind the scenes. I think that should be a design priority for
> the syntax. We’ll be able to build things on top of accord, but the core
> multi-key cas operation isn’t going to change too much.
>

+1, the principle of least surprise tells me that if this doesn't behave
exactly like SQL transactions (for whatever SQL actually means), it could
be clearer not to try to emulate it halfway.

BEGIN MIXED TRANSACTION?

Derek




> On Jun 13, 2022, at 12:14 PM, Blake Eggleston 
> wrote:
>
>
> Does the IF <...> ABORT simplify reasoning though? If you restrict it to
> only dealing with the most recent row it would, but referencing the name
> implies you’d be able to include references from other operations, in which
> case you’d have the same problem.
>
> > return instead an exception if the transaction is aborted
>
> Since the txn is not actually interactive, I think it would be better to
> receive values instead of an excetion, to understand why the operation was
> rolled back.
>
> On Jun 13, 2022, at 10:32 AM, Aaron Ploetz  wrote:
>
> Benedict,
>
> I'm really excited about this feature.  I've been observing this
> conversation for a while now, and I"m happy to add some thoughts.
>
> We must balance the fact we cannot afford to do everything (yet), against
>> the need to make sure what we do is reasonably intuitive (to both CQL and
>> SQL users) and consistent – including with whatever we do in future.
>
>
> I think taking small steps forward, to build a few complete features as
> close to SQL as possible is a good approach.
>
> question we are currently asking: do we want to have a more LWT-like
>> approach... or do we want a more SQL-like approach
>>
>
> For years now we've been fighting this notion that Cassandra is difficult
> to use.  Coming up with specialized syntax isn't going to bridge that
> divide.  From a (new?) user perspective, the best plan is to stay as
> consistent with SQL as possible.
>
> I believe that is a MySQL specific concept. This is one problem with
>> mimicking SQL – it’s not one thing!
>
>
> Right?!?!  As if this needed to be more complex.
>
> I think we have evidence that it is fine to interpret NULL as “false” for
>> the evaluation of IF conditions.
>>
>
> Agree.  Null == false isn't too much of a leap.
>
> Thanks for taking up the charge on this one.  Glad to see it moving
> forward!
>
> Thanks,
>
> Aaron
>
>
>
> On Sun, Jun 12, 2022 at 10:33 AM bened...@apache.org 
> wrote:
>
>> Welcome Li, and thanks for your input
>>
>>
>>
>> > When I first saw the syntax, I took it for granted that the condition
>> was evaluated against the state AFTER the updates
>>
>>
>>
>> Depending what you mean, I think this is one of the options being
>> considered. At least, it seems this syntax is most likely to be evaluated
>> against the values written by preceding statements in the batch, but not
>> the statement itself (or later ones), as this could lead to nonsensical
>> statements like
>>
>>
>>
>> BEGIN TRANSACTION
>>
>> UPDATE tbl SET v = 1 WHERE key = 1 AS tbl
>>
>> COMMIT TRANSACTION IF tbl.v = 0
>>
>>
>>
>> Where y is never 0 afterwards, so this never succeeds. I take it in this
>> simple case you would expect the condition to be evaluated against the
>> state prior to the statement (i.e. the initial state)?
>>
>>
>>
>> But we have a blank slate, so every option is available to us! We just
>> need to make sure it makes sense to the user, even in uncommon cases.
>>
>>
>>
>> > The IF (Boolean expr) ABORT TRANSACTION would suffer less because
>> users may tend to put the condition closer to the related SELECT statement.
>>
>>
>>
>> This is probably not going to matter in practice. The SELECTs all happen
>> upfront no matter what the CQL might look like, and the UPDATEs all happen
>> only after the IF conditions are evaluated. This is all just a question of
>> how the user expresses things.
>>
>>
>>
>> In future we may offer interactive transactions, or transactions that are
>> multi-step, in which case this would be more relevant and could have an
>> efficiency impact.
>>
>>
>>
>> > Would you consider allowing users to start a read-only transaction
>> explicitly like BEGIN TRANSACTION READONLY?
>>
>>
>>
>> Good question. I would be OK with this, for sure, and will defer to the
>> opinions of others here. There won’t be any optimisation impact, as we
>> simply check if the transaction contains any updates, but some validation
>> could be helpful for the user.
>>
>>
>>
>> > Finally, I wonder if the community would be interested in idempotency
>> support.
>>
>>
>>
>> This is something that has been considered, and that Accord is able to
>> support (in a couple of ways), but as an end-to-end feature this requires
>> client support and other scaffolding that is not currently
>> planned/scheduled. The simplest (least robust) approach is for the server
>> to inc

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread bened...@apache.org
What on earth does MIXED mean?

I agree with the sentiment we should minimise surprise, but everyone is 
surprised differently so it becomes a sort of pointless rubric, everyone 
claiming it supports their view. I think it is only useful in cases where there 
is clear agreement that something is surprising, but unhelpful when choosing 
between subtle variations on approach.

The main goal IMO should be clarity and consistency, so that the user can 
reason about the constructs easily, and so we can evolve them.

For instance, we should be sure to consider how the syntax will look if we *do* 
offer interactive transactions, or JOINs, or anything else we might add in 
future.


From: Derek Chen-Becker 
Date: Monday, 13 June 2022 at 23:09
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
On Mon, Jun 13, 2022 at 1:57 PM Blake Eggleston <beggles...@apple.com> wrote:
I prefer an approach that supports an accurate mental model of what’s happening 
behind the scenes. I think that should be a design priority for the syntax. 
We’ll be able to build things on top of accord, but the core multi-key cas 
operation isn’t going to change too much.

+1, the principle of least surprise tells me that if this doesn't behave 
exactly like SQL transactions (for whatever SQL actually means), it could be 
more clear to not try and emulate it halfway

BEGIN MIXED TRANSACTION?

Derek



On Jun 13, 2022, at 12:14 PM, Blake Eggleston <beggles...@apple.com> wrote:

Does the IF <...> ABORT simplify reasoning though? If you restrict it to only 
dealing with the most recent row it would, but referencing the name implies 
you’d be able to include references from other operations, in which case you’d 
have the same problem.

> return instead an exception if the transaction is aborted

Since the txn is not actually interactive, I think it would be better to 
receive values instead of an exception, to understand why the operation was 
rolled back.


On Jun 13, 2022, at 10:32 AM, Aaron Ploetz <aaronplo...@gmail.com> wrote:

Benedict,

I'm really excited about this feature.  I've been observing this conversation 
for a while now, and I'm happy to add some thoughts.

We must balance the fact we cannot afford to do everything (yet), against the 
need to make sure what we do is reasonably intuitive (to both CQL and SQL 
users) and consistent – including with whatever we do in future.

I think taking small steps forward, to build a few complete features as close 
to SQL as possible is a good approach.

question we are currently asking: do we want to have a more LWT-like 
approach... or do we want a more SQL-like approach

For years now we've been fighting this notion that Cassandra is difficult to 
use.  Coming up with specialized syntax isn't going to bridge that divide.  
From a (new?) user perspective, the best plan is to stay as consistent with SQL 
as possible.

I believe that is a MySQL specific concept. This is one problem with mimicking 
SQL – it’s not one thing!

Right?!?!  As if this needed to be more complex.

I think we have evidence that it is fine to interpret NULL as “false” for the 
evaluation of IF conditions.

Agree.  Null == false isn't too much of a leap.
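
As a rough illustration of the T-SQL behaviour referred to earlier in the thread, here is a minimal Python sketch. It is not CQL or Accord code; `sql_equals` and `if_then_else` are hypothetical helpers, and `None` stands in for NULL/UNKNOWN:

```python
def sql_equals(a, b):
    """Three-valued equality: comparing with NULL (None) yields UNKNOWN (None)."""
    if a is None or b is None:
        return None
    return a == b


def if_then_else(cond, then_branch, else_branch=None):
    """Pick then_branch only when cond is TRUE; UNKNOWN falls through like FALSE."""
    if cond is True:
        return then_branch
    return else_branch


# A NULL cell compared with 3 is UNKNOWN, so the ELSE branch is chosen.
print(if_then_else(sql_equals(None, 3), "Y", "Z"))  # prints Z
```

With no ELSE branch supplied, an UNKNOWN condition simply selects nothing, which matches "IF (X) THEN Y does not run Y if X is UNKNOWN".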

Thanks for taking up the charge on this one.  Glad to see it moving forward!

Thanks,

Aaron



On Sun, Jun 12, 2022 at 10:33 AM bened...@apache.org <bened...@apache.org> wrote:
Welcome Li, and thanks for your input

> When I first saw the syntax, I took it for granted that the condition was 
> evaluated against the state AFTER the updates

Depending on what you mean, I think this is one of the options being considered. 
At least, it seems this syntax is most likely to be evaluated against the 
values written by preceding statements in the batch, but not the statement 
itself (or later ones), as this could lead to nonsensical statements like

BEGIN TRANSACTION
UPDATE tbl SET v = 1 WHERE key = 1 AS tbl
COMMIT TRANSACTION IF tbl.v = 0

Where v is never 0 afterwards, so this never succeeds. I take it in this simple 
case you would expect the condition to be evaluated against the state prior to 
the statement (i.e. the initial state)?
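
To make the two readings concrete, here is a minimal Python sketch. It is not Accord or CQL code; the dictionary-based row and the helper `run` are hypothetical and purely illustrative. It evaluates the `COMMIT TRANSACTION IF tbl.v = 0` condition either against the initial state or against the state after the UPDATE:

```python
def run(initial, new_v, expected_v, check_against):
    """Apply `SET v = new_v` only if the condition `v = expected_v` holds.

    check_against is "initial" (the condition sees the pre-transaction value)
    or "updated" (the condition sees the value written by the UPDATE).
    Returns the resulting row; an unchanged copy means the commit failed.
    """
    updated = dict(initial, v=new_v)
    seen = initial if check_against == "initial" else updated
    if seen.get("v") == expected_v:
        return updated
    return dict(initial)


start = {"key": 1, "v": 0}
# Condition checked against the initial state: v was 0, so the write commits.
committed = run(start, new_v=1, expected_v=0, check_against="initial")
# Condition checked after the UPDATE: v is already 1, so it can never be 0.
rejected = run(start, new_v=1, expected_v=0, check_against="updated")
print(committed, rejected)
```

Under the "updated" reading the transaction can never succeed, which is the nonsensical case described above.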

But we have a blank slate, so every option is available to us! We just need to 
make sure it makes sense to the user, even in uncommon cases.

> The IF (Boolean expr) ABORT TRANSACTION would suffer less because users may 
> tend to put the condition closer to the related SELECT statement.

This is probably not going to matter in practice. The SELECTs all happen 
upfront no matter what the CQL might look like, and the UPDATEs all happen only 
after the IF conditions are evaluated. This is all just a question of how the 
user expresses things.
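
The ordering described here can be sketched as three phases. This is only an illustration in Python, assuming a plain dictionary as the store; `execute` and its parameters are hypothetical names, not anything from Accord or Cassandra:

```python
def execute(store, reads, condition, writes):
    """Run a non-interactive transaction in three phases.

    reads:     {alias: key}, resolved against the store before anything else
    condition: function taking the read values and returning True/False/None
    writes:    list of (key, value) pairs applied only if the condition holds
    """
    # Phase 1: all reads happen upfront, whatever order the statements were written in.
    values = {alias: store.get(key) for alias, key in reads.items()}
    # Phase 2: the condition is evaluated on the values read.
    if condition(values) is not True:
        return False, values
    # Phase 3: writes are applied only after the condition has passed.
    for key, value in writes:
        store[key] = value
    return True, values


store = {"car:1": {"miles": 5}}
ok, seen = execute(
    store,
    reads={"car": "car:1"},
    condition=lambda v: v["car"] is not None and v["car"]["miles"] <= 10,
    writes=[("car:1", {"miles": 5, "next_service": 10})],
)
print(ok, store)
```

Because the phases are fixed, where the user places the condition in the statement text does not change what is executed; it only changes how the intent reads.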

In future we may offer interactive transactions, or transactions that are 
multi-step, in which case this would be more relevant and could have an 
efficiency impact.

> Would you consider allowing users to start a read-only tr

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread bened...@apache.org
> is there a subset … that could be implemented as an initial version and then 
> grown over time to include more powerful features?

This is what I would like to aim for, but it’s hard as we probably don’t agree 
on what direction the feature will develop.

My view is that we are more likely than not to develop creeping SQL-like 
functionality over time, in which case it is perhaps good to plan for this 
intentionally from the start.

SQL has decades of work behind it, so we run less risk of taking a design 
dead end, and finding ourselves in a bind when further evolving the language.

I think the way to approach that is to ensure that we do a mix of the following:

1) Ensure any keywords we copy from SQL work very similarly to their SQL 
counterpart, with only some additional restrictions (esp. when we expect to be 
able to later relax them)
2) Where we can’t reasonably do that, introduce new keywords that look and feel 
like SQL but aren’t, so there is no confusion


From: Derek Chen-Becker 
Date: Monday, 13 June 2022 at 23:07
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
I'm coming to this thread fresh and admittedly I'm still trying to catch up and 
wrap my head around it. I think it's already been called out, but what looked 
superficially simple at the beginning of the thread has quickly become 
something that I'm having to take notes on to make sure I understand the 
semantics. I'm a little worried that there are complexities here that we might 
not realize. I like the idea, and I think it's a really powerful addition to 
CQL, but I think we need to make sure we're not setting up users for confusion. 
CQL is great because it leverages knowledge of SQL, but the devil is in the 
differences.

Also, related to complexity, is there a subset of what's being discussed that 
could be implemented as an initial version and then grown over time to include 
more powerful features?

In terms of things that have been discussed so far, in no particular order, the 
AS keyword seems to give the user reasonable control over whether they get the 
pre- or post-update version of the record. Similarly, I think the IF...ABORT 
syntax is much clearer if using AS, since that keyword then decides which 
version of the row to use for the condition. Consider the following (possibly 
incorrect) example:

BEGIN TRANSACTION
SELECT * FROM cars WHERE ... AS car
IF car.miles > 10 ROLLBACK TRANSACTION
UPDATE cars SET car.next_service = 10 WHERE ...
COMMIT TRANSACTION

vs

BEGIN TRANSACTION
SELECT * FROM cars WHERE ... AS current_car
IF current_car.miles > 10 ROLLBACK TRANSACTION
UPDATE cars SET car.next_service = 10 WHERE ... AS car
COMMIT TRANSACTION

Cheers,

Derek

On Sun, Jun 12, 2022 at 5:34 AM bened...@apache.org <bened...@apache.org> wrote:
> I would love hearing from people on what they think.

^^ It would be great to have more participants in this conversation

> For context, my questions earlier were based on my 20+ years of using SQL 
> transactions across different systems.

We probably don’t come from a very different place. I spent too many years with 
T-SQL.

> When you start a SQL transaction, you are creating a branch of your data that 
> you can operate with until you reach your desired state and then merge it 
> back with a commit.

That’s the essential complexity we’re grappling with: how much do we permit 
your “branch” to do, how do we let you express it, and how do we let you 
express conditions?

We must balance the fact we cannot afford to do everything (yet), against the 
need to make sure what we do is reasonably intuitive (to both CQL and SQL 
users) and consistent – including with whatever we do in future.

Right now, we have the issue that read-your-writes introduces some complexity 
to the semantics, particularly around the conditions of execution.

LWTs impose conditions on the state of all records prior to execution, but 
their API has a lot of shortcomings. The proposal of COMMIT IF (Boolean expr) 
is most consistent with this approach. This can be confusing, though, if the 
condition is evaluated on a value that has been updated by a prior statement in 
the batch – what value does this global condition get evaluated against?*

SQL has no such concept, but also SQL is designed to be interactive. Depending 
on the dialect there’s probably a lot of ways to do this non-interactively in 
SQL, but we probably cannot cheaply replicate the functionality exactly as we 
do not (yet) support interactive transactions that they were designed for. To 
submit a conditional non-interactive transaction in SQL, you would likely use

IF (X) THEN
ROLLBACK
RETURN (ERRCODE)
END IF

or

IF (X) THEN RAISERROR

So, that is in essence the question we are currently asking: do we want to have 
a more LWT-like approach (and if so, how do we address this complexity for the 
user), or do we want a more SQL-like approach (and if so, how d

Re: CEP-15 multi key transaction syntax

2022-06-13 Thread bened...@apache.org
> Like I mentioned in my earlier email, the if/abort syntax throwing an 
> exception would, at least as described, limit useful data returned to the 
> client

Right, I agree. I think this is orthogonal to the other syntax questions. I 
think it is also preferable not to mix success/failure with data results, and 
that might be preferable for both syntaxes. It’s something to hammer out in 
more detail once we get these other questions pinned down, as I think we can 
figure out a good compromise.

> At a higher level, what I meant was that SQL doesn’t have a self contained, 
> one-off statement like these

I’m not sure what you mean? It definitely does? In fact, this was how I most 
often used SQL when I worked with it – non-interactively, with explicit 
transactions as part of a single submission to the server, as this reduced the 
number of round-trips but kept the SQL in version control. Stored procedures 
are just a way of doing this with the SQL saved server-side, and accepting 
explicit parameters, but they’re just a convenience?

> and Cassandra doesn’t have interactive transactions

Yet!

> Incidentally, I think it would be useful to eventually have multiple IF 
> branches inline, and had meant the COMMIT IF as a shorthand for it

I agree it would be nice to support more general IF statements, for both 
positive and negative control flow (i.e. IF (X) THEN UPDATE Y, but also IF (X) 
THEN ABORT/ROLLBACK/RAISERROR).

I’m not sure if COMMIT IF really works as syntactic sugar for the more complex 
construct you outlined, though? Perhaps we could instead offer

IF (X) THEN BEGIN
UPDATE someothertable SET anotherval=14 WHERE key=10;
UPDATE someothertable SET anotherval=13 WHERE key=10;
UPDATE someothertable SET anotherval=12 WHERE key=10;
END

For now we could require that at most one such statement occurs per 
transaction, and encapsulates the whole transaction, e.g.

BEGIN TRANSACTION
IF (X) THEN BEGIN
UPDATE someothertable SET anotherval=14 WHERE key=10;
UPDATE someothertable SET anotherval=13 WHERE key=10;
UPDATE someothertable SET anotherval=12 WHERE key=10;
END
COMMIT TRANSACTION

It would be quite easy to relax this (maybe even before release), but it gets 
us off the starting block without planned obsolescence.
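
A minimal sketch of how such an initial restriction might be checked, written in Python over a toy statement list. The `(kind, payload)` representation and the `validate` helper are hypothetical. Allowing SELECTs outside the block and rejecting nested IF blocks are assumptions, not something settled in this thread:

```python
def validate(statements):
    """Check the initial restriction on IF blocks in a transaction body.

    statements is a list of (kind, payload) tuples where kind is
    "SELECT", "UPDATE" or "IF"; an IF payload is a list of nested statements.
    Returns a list of error strings (empty when the body is acceptable).
    """
    errors = []
    if_blocks = [s for s in statements if s[0] == "IF"]
    if len(if_blocks) > 1:
        errors.append("at most one IF block is allowed per transaction")
    if if_blocks:
        # Assumption: reads may sit outside the block, but every write must be inside it.
        if any(kind == "UPDATE" for kind, _ in statements):
            errors.append("all updates must be inside the IF block")
        for _, body in if_blocks:
            # Assumption: a nested IF counts as a second IF statement.
            if any(kind == "IF" for kind, _ in body):
                errors.append("nested IF blocks are not allowed")
    return errors


accepted = validate([
    ("SELECT", "sel"),
    ("IF", [("UPDATE", "anotherval=14"), ("UPDATE", "anotherval=13")]),
])
refused = validate([
    ("UPDATE", "anotherval=12"),
    ("IF", [("UPDATE", "anotherval=14")]),
    ("IF", [("UPDATE", "anotherval=13")]),
])
print(accepted, refused)
```

Relaxing the restriction later would then amount to dropping individual checks, leaving transactions that were already valid unaffected.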

From: Blake Eggleston 
Date: Monday, 13 June 2022 at 23:57
To: dev@cassandra.apache.org 
Subject: Re: CEP-15 multi key transaction syntax
> I think it is far more problematic to introduce a syntax that would not be 
> consistent with future enhancements to transactional functionality. Then we 
> would have to introduce a third syntax, and more syntaxes makes for a messy 
> language IMO.
> I have a very strong preference for choosing a syntax we can evolve 
> consistently, so that users just gain additional keywords or have 
> restrictions relaxed as the feature evolves.

I think our views and goals are pretty strongly aligned here.

> How so? I think all we’re really considering is *not* introducing the IF part 
> of the COMMIT syntax, which is not-SQL-like

Like I mentioned in my earlier email, the if/abort syntax throwing an exception 
would, at least as described, limit useful data returned to the client. 
Solvable depending on how we settle on what data is returned to the client 
though.

At a higher level, what I meant was that SQL doesn’t have a self contained, 
one-off statement like these (stored procedures/functions are close[1], but 
different), and Cassandra doesn’t have interactive transactions. So the 
argument that something is more SQL like when putting syntax meant for 
interactive transactions into Cassandra’s atomic txns isn’t very convincing imo.

Incidentally, I think it would be useful to eventually have multiple IF 
branches inline, and had meant the COMMIT IF as a shorthand for it. Something 
like

BEGIN TRANSACTION;
SELECT * FROM sometable WHERE key=5 AS sel;
UPDATE sometable SET lastread=now() WHERE key=5;
IF sel.someval = 3 THEN
UPDATE someothertable SET anotherval=14 WHERE key=10;
ELSE IF sel.someval = 4 THEN
UPDATE someothertable SET anotherval=13 WHERE key=10;
ELSE
UPDATE someothertable SET anotherval=12 WHERE key=10;
ENDIF;
COMMIT TRANSACTION;

And for extra fun, here’s an early mockup I did based on the Postgres function 
syntax: https://gist.github.com/bdeggleston/51d5510450a1d7549f725e06d871cc60


> Do we require these to be declared first? If so, the problem of ambiguity 
> goes away at least, ignoring everything else.
> Perhaps we can do that initially either way? It makes both syntaxes easier to 
> implement, so we get our MVP more easily. But if we settle what our preferred 
> syntax is, we can see if there’s time to deliver it before a release. Either 
> way, the syntax evolves on a consistent path.

Yes, that’s the idea.


On Jun 13, 2022, at 1:21 PM, bened...@apache.org 
wrote:

> Don’t call these transactions, the term implies things accord doesn’t do. 
> Maybe call them CAS BATCH, and terminate them with APPLY or APPLY IF.