The problem that I keep pointing out is that you've created this CEP for Accord without first getting consensus that the goals and the tradeoffs it makes to achieve those goals (and that it will impose on future work around transactions) are the right ones for Cassandra long term.
At this point I'm done repeating myself. For the convenience of anyone following this thread intermittently, I'll quote my first reply on this thread to illustrate the kind of discussion I'd like to have.

-----

The whitepaper here is a good description of the consensus algorithm itself as well as its robustness and stability characteristics, and its comparison with other state-of-the-art consensus algorithms is very useful. In the context of Cassandra, where a consensus algorithm is only part of what will be implemented, I'd like to see a more complete evaluation of the transactional side of things as well, including performance characteristics as well as the types of transactions that can be supported and at least a general idea of what it would look like applied to Cassandra.

This will allow the PMC to make a more informed decision about what tradeoffs are best for the entire long-term project of first supplementing and ultimately replacing LWT. (Allowing users to mix LWT and AP Cassandra operations against the same rows was probably a mistake, so in contrast with LWT we’re not looking for something fast enough for occasional use but rather something within a reasonable factor of AP operations, appropriate to being the only way to interact with tables declared as such.)

Besides Accord, this should cover

- Calvin and FaunaDB
- A Spanner derivative (no opinion on whether that should be Cockroach or Yugabyte, I don’t think it’s necessary to cover both)
- A 2PC implementation (the Accord paper mentions DynamoDB but I suspect there is more public information about MongoDB)
- RAMP

Here’s an example of what I mean:

=Calvin=

Approach: global consensus (Paxos in Calvin, Raft in FaunaDB) to order transactions, then replicas execute the transactions independently with no further coordination. No SPOF. Transactions are batched by each sequencer to keep this from becoming a bottleneck.
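
To make the "order first, then execute without coordination" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: every name in it (Txn, sequence, execute, transfer) is hypothetical and not taken from the Calvin paper or FaunaDB, and sequence() is a trivial stand-in for the Paxos/Raft consensus step. The point it shows is that once all replicas agree on one ordered log of deterministic transactions, each replica can apply that log on its own and reach the same state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Store = Dict[str, int]


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record; not Calvin's actual data model."""
    txn_id: int
    keys: Tuple[str, ...]            # full key set, declared up front
    logic: Callable[[Store], None]   # must be a deterministic function of the store


def sequence(batches: List[List[Txn]]) -> List[Txn]:
    """Assumption: stands in for global consensus on the order of batches.

    A real system would agree on this order via Paxos or Raft; here the
    batches are simply concatenated in the order given.
    """
    return [txn for batch in batches for txn in batch]


def execute(log: List[Txn], store: Store) -> Store:
    """Apply the agreed log in order; no messages to other replicas."""
    for txn in log:
        txn.logic(store)
    return store


def transfer(src: str, dst: str, amount: int) -> Callable[[Store], None]:
    """Build a deterministic transfer that is a no-op on insufficient funds."""
    def logic(store: Store) -> None:
        if store[src] >= amount:
            store[src] -= amount
            store[dst] += amount
    return logic


# Two sequencer batches, ordered once by the (stand-in) consensus step.
log = sequence([
    [Txn(1, ("a", "b"), transfer("a", "b", 70))],
    [Txn(2, ("a", "c"), transfer("a", "c", 50))],
])

# Each replica starts from the same state and replays the same log.
replica_1 = execute(log, {"a": 100, "b": 0, "c": 0})
replica_2 = execute(log, {"a": 100, "b": 0, "c": 0})
assert replica_1 == replica_2
```

Because the second transfer sees the balance left by the first, the outcome depends on the agreed order, which is why the ordering step is where the coordination cost is paid.
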

Performance: The Calvin paper (published 2012) reports linear scaling of TPC-C New Order up to 500,000 transactions/s on 100 machines (EC2 XL machines with 7GB RAM and 8 virtual cores). Note that TPC-C New Order is composed of four reads and four writes, so this is effectively 2M reads and 2M writes as we normally measure them in C*.

Calvin supports mixed read/write transactions, but because the transaction execution logic requires knowing all partition keys in advance to ensure that all replicas can reproduce the same results with no coordination, reads against non-PK predicates must be done ahead of time (transparently, by the server) to determine the set of keys, and this must be retried if the set of rows affected is updated before the actual transaction executes.

Batching and global consensus add latency -- 100ms in the Calvin paper and apparently about 50ms in FaunaDB. Glass half full: all transactions (including multi-partition updates) are equally performant in Calvin since the coordination is handled up front in the sequencing step. Glass half empty: even single-row reads and writes have to pay the full coordination cost. Fauna has optimized this away for reads but I am not aware of a description of how they changed the design to allow this.

Functionality and limitations: since the entire transaction must be known in advance to allow coordination-less execution at the replicas, Calvin cannot support interactive transactions at all. FaunaDB mitigates this by allowing server-side logic to be included, but a Calvin approach will never be able to offer SQL compatibility.

Guarantees: Calvin transactions are strictly serializable. There is no additional complexity or performance hit to generalizing to multiple regions, apart from the speed of light. And since Calvin is already paying a batching latency penalty, this is less painful than for other systems.

Application to Cassandra: B-.
Distributed transactions are handled by the sequencing and scheduling layers, which are leaderless, and Calvin’s requirements for the storage layer are easily met by C*. But Calvin also requires a global consensus protocol and LWT is almost certainly not sufficiently performant, so this would require ZK or etcd (reasonable for a library approach but not for replacing LWT in C* itself), or an implementation of Accord. I don’t believe Calvin would require additional table-level metadata in Cassandra.

On Wed, Oct 6, 2021 at 9:53 AM bened...@apache.org <bened...@apache.org> wrote:

> The problem with dropping a patch on Jira is that there is no opportunity
> to point out problems, either with the fundamental approach or with the
> specific implementation. So please point out some problems I can engage
> with!
>
>
> From: Jonathan Ellis <jbel...@gmail.com>
> Date: Wednesday, 6 October 2021 at 15:48
> To: dev <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> On Wed, Oct 6, 2021 at 9:21 AM bened...@apache.org <bened...@apache.org>
> wrote:
>
> > The goals of the CEP are stated clearly, and these were the goals we had
> > going into the (multi-month) research project we undertook before proposing
> > this CEP. These goals are necessarily value judgements, so we cannot expect
> > that everyone will agree that they are optimal.
> >
>
> Right, so I'm saying that this is exactly the most important thing to get
> consensus on, and creating a CEP for a protocol to achieve goals that you
> have not discussed with the community is the CEP equivalent of dropping a
> patch on Jira without discussing its goals either.
>
> That's why our conversations haven't gone anywhere, because I keep saying
> "we need to discuss the goals and tradeoffs", and I'll give an example of what
> I mean, and you keep addressing the examples (sometimes very shallowly, "it
> would be possible to X" or "Y could be done as an optimization") while
> ignoring the request to open a discussion around the big picture.
>

--
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced