Hi all,

I was expecting to stay out of the way while a vote on CEP-15 seemed imminent. But when I discussed this tradeoffs thread with Jonathan, he encouraged me to make these points in my own words, so here we are.

On Sun, Oct 10, 2021 at 7:17 AM Blake Eggleston <beggles...@apple.com.invalid> wrote:

> 1. Is it worth giving up local latencies to get full global consistency?
> Most LWT use cases use LOCAL_SERIAL.
>
> This isn’t a tradeoff that needs to be made. There’s nothing about Accord
> that prevents performing consensus in one DC and replicating the writes to
> others. That’s not in scope for the initial work, but there’s no reason it
> couldn’t be handled as a follow on if needed. I agree with Jeff that
> LOCAL_SERIAL and LWTs are not usually done with a full understanding of the
> implications, but there are some valid use cases. For instance, you can
> enable an OLAP service to operate against another DC without impacting the
> primary, assuming the service can tolerate inconsistency for data written
> since the last repair, and there are some others.

Let's start with the stated goal that CEP-15 is intended to be a better version of LWT. Reading all the discussion, I feel like the LOCAL_SERIAL / LOCAL_QUORUM use case is the one area where Accord isn't strictly an improvement over LWT. I don't agree that Accord will simply be so much faster that it compensates for a single network round trip around the world. Four LWT round trips with LOCAL_SERIAL will still only be on the order of 10 ms, whereas a single global round trip takes hundreds of ms.

So, my suggestion to resolve this discussion would be that a "local quorum latency experience" should be included in CEP-15 to meet its stated goal. If I have understood the CEP process correctly, this merely means that we agree this is a valid and significant use case in the Cassandra ecosystem. It doesn't mean that everything in the CEP must be released in a single v1 release. Personally, at least, I don't need to see a very detailed design for the implementation.
But I'm optimistic that it would resolve one open discussion if the CEP codified this as a use case that needs to be addressed.

> 2. Is it worth giving up the possibility of SQL support, to get the
> benefits of deterministic transaction design?
>
> This is a false dilemma. Today, we’re proposing a deterministic
> transaction design that addresses some very common user pain points. SQL
> addresses different user pain point. If someone wants to add an sql
> implementation in the future they can a) build it on top of accord b)
> extend or improve accord or c) implement a separate system. The right
> choice will depend on their goals, but accord won’t prevent work on it, the
> same way the original lwt design isn’t preventing work on multi-partition
> transactions. In the worst case, if the goals of a hypothetical sql project
> are different enough to make them incompatible with accord, I don’t see any
> reason why we couldn’t have 2 separate consensus systems, so long as people
> are willing to maintain them and the use cases and available technologies
> justify it.

The part of the discussion that's hard to deal with is "SQL support", "interactive transactions", or "complex transactions". Even if this is out of scope for CEP-15, it's valid to ask whether Accord would help, or at least not prevent, such future work. (The context being that Jonathan and I both think of this as an important long-term goal. You may have figured this out already!)

There are various ways we can get more insight into this question, but realistically, writing a complete CEP (or a dozen CEPs) on "full SQL support" isn't one of them. On the other hand, CEP-15 itself proposes developing the first version(s) in a separate repository, from where it can then prove its usefulness! I feel that this is a conservative approach that we can probably work with, even without perfect knowledge of the future.

An idea I've been thinking about for a few days: what would it take to implement interactive READ COMMITTED transactions on top of Accord? Now, this may not be an isolation level we want to market as the cool flagship feature. BUT this exercise does feel meaningful in a few ways:

* First of all, READ COMMITTED *is* a real isolation level in the SQL standard. So arguably this would be an existence proof of interactive SQL transactions built on top of Accord.
* It's even the default isolation level in PostgreSQL still today.
* An implementation of such transactions could be used to benchmark their performance, which would give an approximation of how well Accord is suited for this task. This performance would be "best case" in the sense that I would expect Snapshot and Serializable to perform worse, but that overhead can be considered inherent in the isolation level rather than a fault of Accord.
* Implementing READ COMMITTED transactions on top of Accord is rather straightforward and can be described and discussed in this email thread, which could hopefully contribute to our understanding of the problem space. (This could also be a real CEP, if we think it's a useful first step for interactive transactions, but for now I'm dumping it here just to bring a concrete example into the discussion.)

Goal: READ COMMITTED interactive transactions

Dependency: Assume a Cassandra database with CEP-15 implemented.

Approach: The conversational part of the transaction is a sequence of regular Cassandra reads and writes. Mutations, however, are executed as read queries against the database nodes. Database state isn't modified during the conversational phase; rather, the primary keys of the to-be-mutated rows are stored for later use. Accord is essentially the commit phase of the transaction. The primary keys to be updated form the write set of the Accord transaction. There's no need to re-execute the reads, so the read set is empty.

We define READ COMMITTED as "whatever is returned by Cassandra when executing the query (with QUORUM consistency)". In other words, this functionality doesn't require any changes to the storage engine or other fundamental changes to Cassandra. The Accord commit is guaranteed to succeed by design, and the READ COMMITTED transaction doesn't add any checks for conflicts. As such, this functionality remains abort-free.

Proposed Changes: A transaction manager is added to the coordinator, with the following functionality:

BEGIN - Initialize transaction state in the coordinator. After a BEGIN statement, the following commands are modified as follows:

INSERT, UPDATE, DELETE - Transform to an equivalent SELECT, returning the primary key columns. Store the original command (INSERT, etc…) and the returned primary keys in the write set.

SELECT - No changes, except for reading your own writes. The results of a SELECT query are returned to the client, but there's no need to store the results in the transaction state.

Transaction reads its own writes - For each SELECT, the coordinator overlays the current write set onto the query results. You can think of the write set as another memtable at Level -1. Secondary indexes are supported without any additional work.

COMMIT - Perform a regular Accord transaction, using the above write set as the Accord write set. The read set is empty. The commit is guaranteed to succeed. Afterwards, clear the transaction state on the coordinator.

New interfaces: BEGIN, COMMIT, and ROLLBACK. Maybe also a command to declare the READ COMMITTED isolation level and one to get the current isolation level.

Future work: A motivation for the above proposal is that the same scheme could be extended to support SNAPSHOT ISOLATION transactions. This would require MVCC support from the storage engine.

---

It would be interesting to hear from list members whether the above reflects a correct understanding of Accord (and SQL), or whether I'm missing something.

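To make the proposal above a bit more concrete, here is a minimal Python sketch of the coordinator-side transaction manager. Everything in it is an assumption for illustration only: FakeCluster is an in-memory stand-in for the database, accord_commit is a hypothetical placeholder and not Accord's actual API, and predicates are plain Python functions instead of CQL WHERE clauses. As a further simplification, the write set buffers full row images (with None as a tombstone) rather than the original command plus primary keys, and updates are assumed not to touch primary key columns.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Key = Tuple[object, ...]
Predicate = Callable[[Row], bool]


class FakeCluster:
    """Hypothetical in-memory stand-in for Cassandra plus Accord."""

    def __init__(self, pk_columns: Tuple[str, ...]) -> None:
        self.pk_columns = pk_columns
        self.rows: Dict[Key, Row] = {}

    def key_of(self, row: Row) -> Key:
        return tuple(row[c] for c in self.pk_columns)

    def select(self, predicate: Predicate) -> List[Row]:
        # Stands in for a QUORUM read: only committed rows are visible.
        return [dict(r) for r in self.rows.values() if predicate(r)]

    def accord_commit(self, write_set: Dict[Key, Optional[Row]]) -> None:
        # Stands in for one Accord transaction with an empty read set.
        for key, row in write_set.items():
            if row is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = dict(row)


class TransactionManager:
    """Coordinator-side state for one READ COMMITTED transaction."""

    def __init__(self, cluster: FakeCluster) -> None:
        self.cluster = cluster
        self.write_set: Dict[Key, Optional[Row]] = {}

    def begin(self) -> None:
        self.write_set = {}

    def select(self, predicate: Predicate) -> List[Row]:
        # Read committed data, then overlay the write set on top of it,
        # like an extra memtable that only this transaction can see.
        results = {self.cluster.key_of(r): r
                   for r in self.cluster.select(predicate)}
        for key, row in self.write_set.items():
            if row is not None and predicate(row):
                results[key] = dict(row)
            else:
                results.pop(key, None)  # deleted, or no longer matching
        return list(results.values())

    def insert(self, row: Row) -> None:
        self.write_set[self.cluster.key_of(row)] = dict(row)

    def update(self, predicate: Predicate, changes: Row) -> None:
        # Executed as a SELECT: nothing is written to the cluster yet.
        for row in self.select(predicate):
            row.update(changes)
            self.write_set[self.cluster.key_of(row)] = row

    def delete(self, predicate: Predicate) -> None:
        for row in self.select(predicate):
            self.write_set[self.cluster.key_of(row)] = None

    def commit(self) -> None:
        # No conflict checks, so the commit never aborts.
        self.cluster.accord_commit(self.write_set)
        self.write_set = {}

    def rollback(self) -> None:
        self.write_set = {}


cluster = FakeCluster(pk_columns=("id",))
cluster.rows[(1,)] = {"id": 1, "balance": 100}

tx = TransactionManager(cluster)
tx.begin()
tx.update(lambda r: r["id"] == 1, {"balance": 70})
tx.insert({"id": 2, "balance": 30})
print(tx.select(lambda r: True))  # the transaction sees its buffered writes
print(cluster.rows)               # committed state is still unchanged
tx.commit()
print(cluster.rows)               # both writes applied by the single commit
```

Note that nothing in select() pins a snapshot: a row committed by another client between two statements becomes visible immediately, which is the READ COMMITTED behavior described above. ROLLBACK is trivial in this scheme because it only discards coordinator state.
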
henrik

--
Henrik Ingo
+358 40 569 7354