The problem that I keep pointing out is that you've created this CEP for
Accord without first getting consensus that the goals and the tradeoffs it
makes to achieve those goals (and that it will impose on future work around
transactions) are the right ones for Cassandra long term.

At this point I'm done repeating myself.  For the convenience of anyone
following this thread intermittently, I'll quote my first reply on this
thread to illustrate the kind of discussion I'd like to have.

-----

The whitepaper here is a good description of the consensus algorithm itself
as well as its robustness and stability characteristics, and its comparison
with other state-of-the-art consensus algorithms is very useful.  In the
context of Cassandra, where a consensus algorithm is only part of what will
be implemented, I'd like to see a more complete evaluation of the
transactional side of things as well, including performance characteristics
as well as the types of transactions that can be supported and at least a
general idea of what it would look like applied to Cassandra. This will
allow the PMC to make a more informed decision about what tradeoffs are
best for the entire long-term project of first supplementing and ultimately
replacing LWT.

(Allowing users to mix LWT and AP Cassandra operations against the same
rows was probably a mistake, so in contrast with LWT we’re not looking for
something fast enough for occasional use but rather something within a
reasonable factor of AP operations, appropriate to being the only way to
interact with tables declared as such.)

Besides Accord, this should cover

- Calvin and FaunaDB
- A Spanner derivative (no opinion on whether that should be Cockroach or
Yugabyte, I don’t think it’s necessary to cover both)
- A 2PC implementation (the Accord paper mentions DynamoDB but I suspect
there is more public information about MongoDB)
- RAMP

Here’s an example of what I mean:

=Calvin=

Approach: global consensus (Paxos in Calvin, Raft in FaunaDB) to order
transactions, then replicas execute the transactions independently with no
further coordination.  No SPOF.  Transactions are batched by each sequencer
to keep this from becoming a bottleneck.

Performance: Calvin paper (published 2012) reports linear scaling of TPC-C
New Order up to 500,000 transactions/s on 100 machines (EC2 XL machines
with 7GB ram and 8 virtual cores).  Note that TPC-C New Order is composed
of four reads and four writes, so this is effectively 2M reads and 2M
writes as we normally measure them in C*.

Calvin supports mixed read/write transactions, but because the transaction
execution logic requires knowing all partition keys in advance to ensure
that all replicas can reproduce the same results with no coordination,
reads against non-PK predicates must be done ahead of time (transparently,
by the server) to determine the set of keys, and this must be retried if
the set of rows affected is updated before the actual transaction executes.

Batching and global consensus adds latency -- 100ms in the Calvin paper and
apparently about 50ms in FaunaDB.  Glass half full: all transactions
(including multi-partition updates) are equally performant in Calvin since
the coordination is handled up front in the sequencing step.  Glass half
empty: even single-row reads and writes have to pay the full coordination
cost.  Fauna has optimized this away for reads but I am not aware of a
description of how they changed the design to allow this.

Functionality and limitations: since the entire transaction must be known
in advance to allow coordination-less execution at the replicas, Calvin
cannot support interactive transactions at all.  FaunaDB mitigates this by
allowing server-side logic to be included, but a Calvin approach will never
be able to offer SQL compatibility.

Guarantees: Calvin transactions are strictly serializable.  There is no
additional complexity or performance hit to generalizing to multiple
regions, apart from the speed of light.  And since Calvin is already paying
a batching latency penalty, this is less painful than for other systems.

Application to Cassandra: B-.  Distributed transactions are handled by the
sequencing and scheduling layers, which are leaderless, and Calvin’s
requirements for the storage layer are easily met by C*.  But Calvin also
requires a global consensus protocol and LWT is almost certainly not
sufficiently performant, so this would require ZK or etcd (reasonable for a
library approach but not for replacing LWT in C* itself), or an
implementation of Accord.  I don’t believe Calvin would require additional
table-level metadata in Cassandra.

On Wed, Oct 6, 2021 at 9:53 AM bened...@apache.org <bened...@apache.org>
wrote:

> The problem with dropping a patch on Jira is that there is no opportunity
> to point out problems, either with the fundamental approach or with the
> specific implementation. So please point out some problems I can engage
> with!
>
>
> From: Jonathan Ellis <jbel...@gmail.com>
> Date: Wednesday, 6 October 2021 at 15:48
> To: dev <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-15: General Purpose Transactions
> On Wed, Oct 6, 2021 at 9:21 AM bened...@apache.org <bened...@apache.org>
> wrote:
>
> > The goals of the CEP are stated clearly, and these were the goals we had
> > going into the (multi-month) research project we undertook before
> proposing
> > this CEP. These goals are necessarily value judgements, so we cannot
> expect
> > that everyone will agree that they are optimal.
> >
>
> Right, so I'm saying that this is exactly the most important thing to get
> consensus on, and creating a CEP for a protocol to achieve goals that you
> have not discussed with the community is the CEP equivalent of dropping a
> patch on Jira without discussing its goals either.
>
> That's why our conversations haven't gone anywhere, because I keep saying
> "we need discuss the goals and tradeoffs", and I'll give an example of what
> I mean, and you keep addressing the examples (sometimes very shallowly, "it
> would be possible to X" or "Y could be done as an optimization") while
> ignoring the request to open a discussion around the big picture.
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced

Reply via email to