Re: Tradeoffs for Cassandra transaction management

Henrik Ingo Tue, 12 Oct 2021 13:55:16 -0700

Hi all

I was expecting to stay out of the way while a vote on CEP-15 seemed
imminent. But when I discussed this tradeoffs thread with Jonathan, he
encouraged me to make these points in my own words, so here we are.

On Sun, Oct 10, 2021 at 7:17 AM Blake Eggleston
<beggles...@apple.com.invalid> wrote:

> 1. Is it worth giving up local latencies to get full global consistency?
> Most LWT use cases use
> LOCAL_SERIAL.
>
> This isn’t a tradeoff that needs to be made. There’s nothing about Accord
> that prevents performing consensus in one DC and replicating the writes to
> others. That’s not in scope for the initial work, but there’s no reason it
> couldn’t be handled as a follow on if needed. I agree with Jeff that
> LOCAL_SERIAL and LWTs are not usually done with a full understanding of the
> implications, but there are some valid use cases. For instance, you can
> enable an OLAP service to operate against another DC without impacting the
> primary, assuming the service can tolerate inconsistency for data written
> since the last repair, and there are some others.
>
>
Let's start with the stated goal that CEP-15 is intended to be a better
version of LWT.

Reading all the discussion, I feel like addressing the LOCAL_SERIAL /
LOCAL_QUORUM use case is the one thing where Accord isn't strictly an
improvement over LWT. I don't agree that Accord will be so much faster
anyway that it would compensate for a single network round trip around the
world. Four LWT round trips with LOCAL_SERIAL will still only be on the
order of 10 ms, but global latencies for just a single round trip are
hundreds of ms.
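
To make the comparison concrete, here is a back-of-the-envelope sketch.
The two round-trip times are assumed values picked only to match the orders
of magnitude mentioned above; they are not measurements.

```python
# Assumed round-trip times, for illustration only (not measurements).
LOCAL_RTT_MS = 2.5     # one round trip inside a single datacenter
GLOBAL_RTT_MS = 200.0  # one round trip between distant datacenters


def total_latency_ms(round_trips: int, rtt_ms: float) -> float:
    """Latency of a protocol that needs `round_trips` sequential round trips."""
    return round_trips * rtt_ms


lwt_local = total_latency_ms(4, LOCAL_RTT_MS)       # four LWT round trips, LOCAL_SERIAL
single_global = total_latency_ms(1, GLOBAL_RTT_MS)  # one round trip across the world

print(f"LWT, 4 local round trips: {lwt_local:.0f} ms")
print(f"1 global round trip:      {single_global:.0f} ms")
```

Under these assumptions, fewer round trips cannot make up for the
difference, because each global round trip costs far more than all four
local ones combined.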

So, my suggestion to resolve this discussion would be that "local quorum
latency experience" should be included in CEP-15 to meet its stated goal.
If I have understood the CEP process correctly, this merely means that we
agree this is a valid and significant use case in the Cassandra ecosystem.
It doesn't mean that everything in the CEP must be released in a single v1
release. At least personally I don't necessarily need to see a very
detailed design for the implementation. But I'm optimistic it would resolve
one open discussion if it was codified in the CEP that this is a use case
that needs to be addressed.

> 2. Is it worth giving up the possibility of SQL support, to get the
> benefits of deterministic transaction design?
>
> This is a false dilemma. Today, we’re proposing a deterministic
> transaction design that addresses some very common user pain points. SQL
> addresses different user pain point. If someone wants to add an sql
> implementation in the future they can a) build it on top of accord b)
> extend or improve accord or c) implement a separate system. The right
> choice will depend on their goals, but accord won’t prevent work on it, the
> same way the original lwt design isn’t preventing work on multi-partition
> transactions. In the worst case, if the goals of a hypothetical sql project
> are different enough to make them incompatible with accord, I don’t see any
> reason why we couldn’t have 2 separate consensus systems, so long as people
> are willing to maintain them and the use cases and available technologies
> justify it.
>

The part of the discussion that's hard to deal with is "SQL support",
"interactive transactions", or "complex transactions". Even if this is out
of scope for CEP-15, it's a valid question to ask whether Accord would
help, or at least not prevent, such future work. (The context being that
Jonathan and I both think of this as an important long term goal. You may
have figured this out already!)

There are various ways we can get more insight into this question, but
realistically writing a complete CEP (or a dozen CEPs) on "full SQL
support" isn't one of them. On the other hand, CEP-15 itself proposes
developing the first version(s) in a separate repository, from where it
could then prove its usefulness. I feel that this is a conservative
approach we can probably work with even without perfect knowledge of the
future.

An idea I've been thinking about for a few days is, what would it take to
implement interactive READ COMMITTED transactions on top of Accord? Now,
this may not be an isolation level we want to market as the cool flagship
feature. BUT this exercise does feel meaningful in a few ways:

* First of all, READ COMMITTED *is* a real isolation level in the SQL
standard. So arguably this would be an existence proof of interactive SQL
transactions built on top of Accord.

* It's even the default isolation level in PostgreSQL still today.

* An implementation of such transactions could even be used to benchmark
the performance of such transactions and would give an approximation of how
well Accord is suited for this task. This performance would be "best case"
in the sense that I would expect Snapshot and Serializable to have worse
performance, but that overhead can be considered as inherent in the
isolation level rather than a fault of Accord.

* Implementing READ COMMITTED transactions on top of Accord is rather
straightforward and can be described and discussed in this email thread,
which could hopefully contribute to our understanding of the problem space.
(Could also be a real CEP, if we think it's a useful first step for
interactive transactions, but for now I'm dumping it here just to try to
bring a concrete example into the discussion.)

Goal: READ COMMITTED interactive transactions

Dependency: Assume a Cassandra database with CEP-15 implemented.

Approach: The conversational part of the transaction is a sequence of
regular Cassandra reads and writes. Mutations are, however, executed as
read queries against the database nodes. Database state isn't modified
during the conversational phase; rather, the primary keys of the
to-be-mutated rows are stored for later use. Accord is essentially the
commit phase of the transaction. All primary keys to be updated form the
write set of the Accord transaction. There's no need to re-execute the
reads, so the read set is empty.

We define READ COMMITTED as "whatever is returned by Cassandra when
executing the query (with QUORUM consistency)". In other words, this
functionality doesn't require any changes to the storage engine or other
fundamental changes to Cassandra. The Accord commit is guaranteed to
succeed per design and the READ COMMITTED transaction doesn't add any
additional checks for conflicts. As such, this functionality remains
abort-free.

Proposed Changes: A transaction manager is added to the coordinator, with
the following functionality:

BEGIN - initialize transaction state in the coordinator. After a BEGIN
statement, the following commands are modified as follows:

INSERT, UPDATE, DELETE: Transform to an equivalent SELECT, returning the
primary key columns. Store the original command (INSERT, etc.) and the
returned primary keys in the write set.
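
A minimal sketch of this step for UPDATE and DELETE could look as follows.
It assumes the statement has already been parsed into a table name, the
primary key column names, and a WHERE clause. BufferedWrite, to_select,
buffer_mutation, run_select, and the orders table are all hypothetical
names used for illustration, not Cassandra or Accord APIs. INSERT is left
out here, since its primary key comes from the supplied values rather than
from a WHERE clause.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class BufferedWrite:
    original_cql: str          # statement exactly as the client sent it
    primary_keys: List[Tuple]  # keys returned by the stand-in SELECT


def to_select(table: str, pk_columns: Sequence[str], where_clause: str) -> str:
    """Build the read that stands in for an UPDATE or DELETE."""
    return f"SELECT {', '.join(pk_columns)} FROM {table} WHERE {where_clause}"


def buffer_mutation(
    write_set: List[BufferedWrite],
    original_cql: str,
    table: str,
    pk_columns: Sequence[str],
    where_clause: str,
    run_select: Callable[[str], List[Tuple]],
) -> BufferedWrite:
    """Run the stand-in SELECT and remember the mutation for COMMIT.

    run_select is a hypothetical callable: CQL string -> list of key tuples.
    Nothing is written to the database here.
    """
    keys = list(run_select(to_select(table, pk_columns, where_clause)))
    entry = BufferedWrite(original_cql, keys)
    write_set.append(entry)
    return entry


# Demo with a fake executor that pretends two rows matched.
def fake_run_select(cql: str) -> List[Tuple]:
    return [(42, "a"), (42, "b")]


write_set: List[BufferedWrite] = []
buffer_mutation(
    write_set,
    original_cql="UPDATE orders SET status = 'paid' WHERE customer_id = 42",
    table="orders",
    pk_columns=["customer_id", "order_id"],
    where_clause="customer_id = 42",
    run_select=fake_run_select,
)
print(to_select("orders", ["customer_id", "order_id"], "customer_id = 42"))
```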

SELECT - no changes, except for read your own writes. The results of a
SELECT query are returned to the client, but there's no need to store the
results in the transaction state.

Transaction reads its own writes - For each SELECT the coordinator will
overlay the current write set onto the query results. You can think of the
write set as another memtable at Level -1.
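
The overlay can be sketched roughly as below. This assumes the coordinator
can resolve each buffered mutation into concrete column values (or a
delete marker) per primary key, which goes beyond storing only the command
and keys. The names overlay and TOMBSTONE are hypothetical. The sketch also
ignores whether a buffered row matches the SELECT's predicate, which a real
implementation would have to check.

```python
TOMBSTONE = object()  # marks a row deleted inside the transaction


def overlay(query_rows: dict, write_set: dict) -> dict:
    """Return query results as the open transaction should see them.

    query_rows: primary key -> dict of column values, as read from the database.
    write_set:  primary key -> dict of changed columns, or TOMBSTONE for a delete.
    """
    # Copy so the rows read from the database are left untouched.
    merged = {pk: dict(cols) for pk, cols in query_rows.items()}
    for pk, change in write_set.items():
        if change is TOMBSTONE:
            merged.pop(pk, None)  # buffered delete hides the row
        else:
            merged.setdefault(pk, {}).update(change)  # buffered insert or update wins
    return merged


rows = {(1,): {"name": "a", "qty": 1}, (2,): {"name": "b", "qty": 2}}
pending = {(1,): {"qty": 5}, (2,): TOMBSTONE, (3,): {"name": "c", "qty": 9}}
visible = overlay(rows, pending)
print(visible)
```

In this example row 1 shows the buffered qty, row 2 disappears, and row 3
appears even though it does not exist in the database yet.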

Secondary indexes are supported without any additional work needed.

COMMIT - Perform a regular Accord transaction, using the above write set as
the Accord write set. The read set is empty. The commit is guaranteed to
succeed. In the end, clear state on the coordinator.

New interfaces: BEGIN, COMMIT and ROLLBACK. Maybe also a command to declare
the READ COMMITTED isolation level and one to get the current isolation
level.
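
The BEGIN / COMMIT / ROLLBACK lifecycle on the coordinator might be
sketched like this. TransactionManager and accord_commit are hypothetical
names; accord_commit stands in for whatever call would submit an Accord
transaction and is not an actual Accord API.

```python
class TransactionManager:
    """Coordinator-side state for one interactive transaction (illustrative only)."""

    def __init__(self, accord_commit):
        # accord_commit is a hypothetical callable that accepts the keyword
        # arguments write_set and read_set.
        self._accord_commit = accord_commit
        self._write_set = None  # None means no transaction is open

    @property
    def is_open(self) -> bool:
        return self._write_set is not None

    def begin(self) -> None:
        if self.is_open:
            raise RuntimeError("transaction already open")
        self._write_set = []

    def buffer(self, original_cql: str, primary_keys) -> None:
        """Record a mutation and the primary keys it will touch."""
        if not self.is_open:
            raise RuntimeError("no open transaction")
        self._write_set.append((original_cql, tuple(primary_keys)))

    def commit(self) -> None:
        if not self.is_open:
            raise RuntimeError("no open transaction")
        try:
            # The reads are not re-executed, so the read set is empty.
            self._accord_commit(write_set=list(self._write_set), read_set=[])
        finally:
            self._write_set = None  # clear coordinator state in the end

    def rollback(self) -> None:
        # Nothing was written to the database, so dropping the buffer is enough.
        self._write_set = None


submitted = []


def fake_accord_commit(write_set, read_set):
    submitted.append((write_set, read_set))


txn = TransactionManager(fake_accord_commit)
txn.begin()
txn.buffer("UPDATE t SET v = 1 WHERE k = 7", [(7,)])
txn.commit()
```

Because the database is untouched during the conversational phase, ROLLBACK
in this sketch only discards the buffered write set.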

Future work: A motivation for the above proposal is that the same scheme
could be extended to support SNAPSHOT ISOLATION transactions. This would
require MVCC support from the storage engine.

---

It would be interesting to hear from list members whether the above
reflects a correct understanding of Accord (and SQL), or whether I'm
missing something.

henrik

-- 

Henrik Ingo

+358 40 569 7354 <358405697354>
