Sorry Jonathan, didn't see this reply earlier today. That would be common behaviour for many MVCC databases, including MongoDB, MySQL Galera Cluster, PostgreSQL...
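
To make the point concrete, here is a minimal sketch of the client-side retry loop that such databases expect the application to provide. The names `SerializationFailure`, `run_with_retry` and `transfer` are hypothetical and not taken from any real driver; a real client would catch its driver's own error instead (in PostgreSQL, the one reported with SQLSTATE 40001).

```python
import random
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver's serialization-failure error
    (PostgreSQL reports these with SQLSTATE 40001)."""


def run_with_retry(txn_body, max_attempts=5, base_delay=0.01):
    """Run txn_body() and re-run it from the start on SerializationFailure.

    txn_body must contain the whole transaction logic (reads, decisions and
    writes), because only the client knows how to redo it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            # Exponential backoff with jitter before the next attempt.
            time.sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


# Usage: a fake transaction that conflicts twice, then commits.
attempts = []


def transfer():
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure("concurrent update detected")
    return "committed"


result = run_with_retry(transfer)
```

The important design point is that the retry lives in the client: the server never sees the transaction logic as a whole, so it is the application that re-executes it.
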
https://www.postgresql.org/docs/9.5/transaction-iso.html

*"Applications using this level must be prepared to retry transactions due to serialization failures."*

On Wed, Oct 13, 2021 at 3:19 AM Jonathan Ellis <jbel...@gmail.com> wrote:

> Hi Henrik,
>
> I don't see how this resolves the fundamental problem that I outlined to
> start with, namely, that without having the entire logic of the transaction
> available to it, the server cannot retry the transaction when concurrent
> changes are found to have been applied after the reconnaissance reads (what
> you call the conversational phase).
>
> On Tue, Oct 12, 2021 at 3:55 PM Henrik Ingo <henrik.i...@datastax.com>
> wrote:
>
> > Hi all
> >
> > I was expecting to stay out of the way while a vote on CEP-15 seemed
> > imminent. But discussing this tradeoffs thread with Jonathan, he
> > encouraged me to say these points in my own words, so here we are.
> >
> >
> > On Sun, Oct 10, 2021 at 7:17 AM Blake Eggleston
> > <beggles...@apple.com.invalid> wrote:
> >
> > > 1. Is it worth giving up local latencies to get full global
> > > consistency? Most LWT use cases use LOCAL_SERIAL.
> > >
> > > This isn’t a tradeoff that needs to be made. There’s nothing about
> > > Accord that prevents performing consensus in one DC and replicating
> > > the writes to others. That’s not in scope for the initial work, but
> > > there’s no reason it couldn’t be handled as a follow on if needed. I
> > > agree with Jeff that LOCAL_SERIAL and LWTs are not usually done with a
> > > full understanding of the implications, but there are some valid use
> > > cases. For instance, you can enable an OLAP service to operate against
> > > another DC without impacting the primary, assuming the service can
> > > tolerate inconsistency for data written since the last repair, and
> > > there are some others.
> > >
> >
> > Let's start with the stated goal that CEP-15 is intended to be a better
> > version of LWT.
> >
> > Reading all the discussion, I feel like addressing the LOCAL_SERIAL /
> > LOCAL_QUORUM use case is the one thing where Accord isn't strictly an
> > improvement over LWT. I don't agree that Accord will just be so much
> > faster anyway that it would compensate for a single network round trip
> > around the world. Four LWT round trips with LOCAL_SERIAL will still only
> > be on the order of 10 ms, but global latencies for just a single round
> > trip are hundreds of ms.
> >
> > So, my suggestion to resolve this discussion would be that "local quorum
> > latency experience" should be included in CEP-15 to meet its stated goal.
> > If I have understood the CEP process correctly, this merely means that we
> > agree this is a valid and significant use case in the Cassandra
> > ecosystem. It doesn't mean that everything in the CEP must be released in
> > a single v1 release. At least personally I don't necessarily need to see
> > a very detailed design for the implementation. But I'm optimistic it
> > would resolve one open discussion if it was codified in the CEP that this
> > is a use case that needs to be addressed.
> >
> >
> > > 2. Is it worth giving up the possibility of SQL support, to get the
> > > benefits of deterministic transaction design?
> > >
> > > This is a false dilemma. Today, we’re proposing a deterministic
> > > transaction design that addresses some very common user pain points.
> > > SQL addresses a different user pain point. If someone wants to add an
> > > sql implementation in the future they can a) build it on top of accord
> > > b) extend or improve accord or c) implement a separate system. The
> > > right choice will depend on their goals, but accord won’t prevent work
> > > on it, the same way the original lwt design isn’t preventing work on
> > > multi-partition transactions.
> > > In the worst case, if the goals of a hypothetical sql project are
> > > different enough to make them incompatible with accord, I don’t see
> > > any reason why we couldn’t have 2 separate consensus systems, so long
> > > as people are willing to maintain them and the use cases and available
> > > technologies justify it.
> > >
> >
> > The part of the discussion that's hard to deal with is "SQL support",
> > "interactive transactions", or "complex transactions". Even if this is
> > out of scope for CEP-15, it's a valid question to ask whether Accord
> > would possibly help, or at least not prevent, such future work. (The
> > context being, Jonathan and myself both think of this as an important
> > long term goal. You may have figured this out already!)
> >
> > There are various ways we can get more insight into this question, but
> > realistically writing a complete CEP (or a dozen CEPs) on "full SQL
> > support" isn't one of them. On the other hand it seems CEP-15 itself
> > proposes a conservative approach of developing first version(s) in a
> > separate repository, from where it could then prove its usefulness! I
> > feel like the authors have already proposed a conservative approach
> > there that we can probably work with even without perfect knowledge of
> > the future.
> >
> >
> > An idea I've been thinking about for a few days is, what would it take to
> > implement interactive READ COMMITTED transactions on top of Accord? Now,
> > this may not be an isolation level we want to market as the cool flagship
> > feature. BUT this exercise does feel meaningful in a few ways:
> >
> > * First of all, READ COMMITTED *is* a real isolation level in the SQL
> > standard. So arguably this would be an existence proof of interactive SQL
> > transactions built on top of Accord.
> >
> > * It's even the default isolation level in PostgreSQL still today.
> >
> > * An implementation of such transactions could even be used to benchmark
> > the performance of such transactions and would give an approximation of
> > how well Accord is suited for this task. This performance would be "best
> > case" in the sense that I would expect Snapshot and Serializable to have
> > worse performance, but that overhead can be considered as inherent in the
> > isolation level rather than a fault of Accord.
> >
> > * Implementing READ COMMITTED transactions on top of Accord is rather
> > straightforward and can be described and discussed in this email thread,
> > which could hopefully contribute to our understanding of the problem
> > space. (Could also be a real CEP, if we think it's a useful first step
> > for interactive transactions, but for now I'm dumping it here just to try
> > to bring a concrete example into the discussion.)
> >
> >
> > Goal: READ COMMITTED interactive transactions
> >
> > Dependency: Assume a Cassandra database with CEP-15 implemented.
> >
> >
> > Approach: The conversational part of the transaction is a sequence of
> > regular Cassandra reads and writes. Mutations are however executed as
> > read-queries toward the database nodes. Database state isn't modified
> > during the conversational phase, rather the primary keys of the
> > to-be-mutated rows are stored for later use. Accord is essentially the
> > commit phase of the transaction. All primary keys to be updated are the
> > write set of the Accord transaction. There's no need to re-execute the
> > reads, so the read set is empty.
> >
> > We define READ COMMITTED as "whatever is returned by Cassandra when
> > executing the query (with QUORUM consistency)". In other words, this
> > functionality doesn't require any changes to the storage engine or other
> > fundamental changes to Cassandra. The Accord commit is guaranteed to
> > succeed per design and the READ COMMITTED transaction doesn't add any
> > additional checks for conflicts.
> > As such, this functionality remains abort-free.
> >
> >
> > Proposed Changes: A transaction manager is added to the coordinator, with
> > the following functionality:
> >
> > BEGIN - initialize transaction state in the coordinator. After a BEGIN
> > statement, the commands below are modified as follows:
> >
> > INSERT, UPDATE, DELETE: Transform to an equivalent SELECT, returning the
> > primary key columns. Store the original command (INSERT, etc…) and the
> > returned primary keys into the write set.
> >
> > SELECT - no changes, except for read your own writes. The results of a
> > SELECT query are returned to the client, but there's no need to store the
> > results in the transaction state.
> >
> > Transaction reads its own writes - For each SELECT the coordinator will
> > overlay the current write set onto the query results. You can think of
> > the write set as another memtable at Level -1.
> >
> > Secondary indexes are supported without any additional work needed.
> >
> > COMMIT - Perform a regular Accord transaction, using the above write set
> > as the Accord write set. The read set is empty. The commit is guaranteed
> > to succeed. In the end, clear state on the coordinator.
> >
> > New interfaces: BEGIN, COMMIT and ROLLBACK. Maybe some command to declare
> > READ COMMITTED isolation level and to get the current isolation level.
> >
> >
> > Future work: A motivation for the above proposal is that the same scheme
> > could be extended to support SNAPSHOT ISOLATION transactions. This would
> > require MVCC support from the storage engine.
> >
> >
> > ---
> >
> > It would be interesting to hear from list members whether the above
> > appears to understand Accord (and SQL) correctly or whether I'm missing
> > something?
> >
> > henrik
> >
> > --
> > Henrik Ingo
> > +358 40 569 7354
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced

--
Henrik Ingo
+358 40 569 7354