Thanks for flagging that, Alex.  Here it is without trying to include the
inline table:

After calling several times for a broader discussion of goals and tradeoffs
around transaction management in the CEP-15 thread, I’ve put together a short
analysis to kick that off.

Here is a table that summarizes the state of the art for distributed
transactions that offer serializability, i.e., a superset of what you can get
with LWT.  (The most interesting option that this requirement eliminates is
RAMP.)

https://imgur.com/a/SCZ8jex

(I have not included Accord here because it’s not sufficiently clear to me how
to create a full transaction manager from the Accord protocol, so I can’t
analyze many of the properties such a system would have.  The most obvious
solution would be “Calvin but with Accord instead of Raft”, but since Accord
already does some Calvin-like things, that seems like it would result in some
suboptimal redundancy.)

After putting the above together, it seems to me that the two main areas of
tradeoff are:

1. Is it worth giving up local latencies to get full global consistency?
Most LWT use cases use LOCAL_SERIAL; a quick driver-level illustration follows
these two points.  While all of the designs above are more efficient than LWT,
it’s still true that global serialization will require 100+ms in the general
case due to physical transmission latency.  So a design that allows local
serialization with eventual consistency between regions, or a design (like
SLOG) that automatically infers a “home” region that can do local consensus in
the common case without giving up global serializability, is desirable.

2. Is it worth giving up the possibility of SQL support to get the benefits of
deterministic transaction design?  To be clear, these benefits include very
significant ones around simplicity of design, higher write throughput, and (in
SLOG) lower read and write latencies.
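To make #1 concrete, here is roughly what a LOCAL_SERIAL LWT looks like today
from the Python driver.  The keyspace and table are made up for the example;
the point is only that the Paxos round behind the IF condition is scoped to
the coordinator’s datacenter, which is the latency-for-global-consistency
trade described above:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    # Hypothetical keyspace and table, purely for illustration.
    session = Cluster(["127.0.0.1"]).connect("shop")

    # A conditional write is a lightweight transaction (Paxos under the hood).
    # The serial consistency level governs the Paxos phase: SERIAL coordinates
    # across all datacenters, LOCAL_SERIAL only within the coordinator's DC.
    stmt = SimpleStatement(
        "UPDATE inventory SET count = 9 WHERE id = 1 IF count = 10",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    stmt.serial_consistency_level = ConsistencyLevel.LOCAL_SERIAL

    result = session.execute(stmt)
    print(result.one()[0])  # first column is the [applied] flag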
I’ll double-click on #2 because it was asserted in the CEP-15 thread that
Accord could support SQL by applying known techniques on top.  This is
mistaken.  Deterministic systems like Calvin, SLOG, or Accord can support
queries where the rows affected are not known in advance using a technique
that Abadi calls OLLP (Optimistic Lock Location Prediction), but this does not
help when the transaction logic itself is not known in advance.

Here is Daniel Abadi’s explanation of OLLP from “An Overview of Deterministic
Database Systems”
<https://cacm.acm.org/magazines/2018/9/230601-an-overview-of-deterministic-database-systems/fulltext?mobile=false>:
“In practice, deterministic database systems that use ordered locking do not
wait until runtime for transactions to determine their access-sets. Instead,
they use a technique called OLLP where if a transaction does not know its
access-sets in advance, it is not inserted into the input log. Instead, it is
run in a trial mode that does not write to the database state, but determines
what it would have read or written to if it was actually being processed. It
is then annotated with the access-sets determined during the trial run, and
submitted to the input log for actual processing. In the actual run, every
replica processes the transaction deterministically, acquiring locks for the
transaction based on the estimate from the trial run. In some cases, database
state may have changed in a way that the access sets estimates are now
incorrect. Since a transaction cannot read or write data for which it does not
have a lock, it must abort as soon as it realizes that it acquired the wrong
set of locks. But since the transaction is being processed deterministically
at this point, every replica will independently come to the same conclusion
that the wrong set of locks were acquired, and will all independently decide
to abort the transaction. The transaction then gets resubmitted to the input
log with the new access-set estimates annotated.”
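To make the trial run / annotate / deterministic-abort cycle concrete, here is
a minimal, self-contained sketch in Python.  None of these names or APIs come
from Calvin, SLOG, or Accord; it only models the control flow Abadi describes:

    class AccessOutsideEstimate(Exception):
        pass

    class TxnContext:
        """Read/write access for transaction logic.  In locked mode, access is
        restricted to the keys estimated during the trial run."""
        def __init__(self, db, allowed_keys=None):
            self.db = db
            self.allowed_keys = allowed_keys   # None = trial mode, anything goes
            self.keys_touched = set()
            self.pending_writes = {}

        def _check(self, key):
            self.keys_touched.add(key)
            if self.allowed_keys is not None and key not in self.allowed_keys:
                raise AccessOutsideEstimate(key)

        def read(self, key):
            self._check(key)
            return self.pending_writes.get(key, self.db.get(key))

        def write(self, key, value):
            self._check(key)
            self.pending_writes[key] = value   # buffered; applied only on commit

    def submit(txn_logic, db, input_log):
        """Trial run: execute against current state without applying writes,
        then append the transaction annotated with its estimated access set."""
        trial = TxnContext(db)
        txn_logic(trial)
        input_log.append((txn_logic, frozenset(trial.keys_touched)))

    def execute(txn_logic, estimated_keys, db):
        """Deterministic replay of one input-log entry on a replica.  A real
        system would acquire locks on estimated_keys in a global order; here
        we only model the abort-on-stale-estimate behavior."""
        ctx = TxnContext(db, allowed_keys=estimated_keys)
        try:
            txn_logic(ctx)
        except AccessOutsideEstimate:
            return "abort"   # every replica independently reaches this verdict
        db.update(ctx.pending_writes)
        return "commit"

    # Stored-procedure-style transaction: decrement inventory if in stock.
    def decrement(txn):
        count = txn.read(("inventory", 1))
        if count > 0:
            txn.write(("inventory", 1), count - 1)

    db = {("inventory", 1): 3}
    log = []
    submit(decrement, db, log)
    for logic, keys in log:
        print(execute(logic, keys, db))   # "commit"; count is now 2

The crucial assumption above is that the whole body of decrement is visible to
the server before execution, which is exactly what an interactive SQL session
does not give you.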
Clearly this does not work if the server-visible logic changes between runs.
For instance, consider this simple interactive transaction:

    cursor.execute("BEGIN TRANSACTION")
    cursor.execute("SELECT count FROM inventory WHERE id = 1")
    count = cursor.fetchone()[0]
    if count > 0:
        cursor.execute("UPDATE inventory SET count = count - 1 WHERE id = 1")
    cursor.execute("COMMIT TRANSACTION")
The first problem is that it’s far from clear how to do a “trial run” of a
transaction that the server only knows pieces of at a time.  But even worse,
the server only knows that it got either a SELECT, or a SELECT followed by an
UPDATE.  It doesn’t know anything about the client-side logic that would drive
a change in those statements.  So if the value read changes between trial run
and execution, there is no possibility of transparently retrying; you’re just
screwed and have to report failure.

So Abadi concludes,

“[A]ll recent [deterministic database] implementations have limited or no
support for interactive transactions, thereby preventing their use in many
existing deployments. If the advantages of deterministic database systems will
be realized in the coming years, one of two things must occur: either database
users must accept a stored procedure interface to the system [instead of
client-side SQL], or additional research must be performed in order to enable
improved support for interactive transactions.”
TLDR:

We need to decide if we want to give users local transaction latencies, either
with an approach inspired by SLOG or with tunable serializability like LWT
(trading away global consistency).  I think the answer here is clearly yes: we
have abundant evidence from LWT that people care a great deal about latency,
and specifically that they are willing to live with cross-datacenter eventual
consistency to get low local latencies.

We also need to decide if we eventually want to support full SQL.  I think
this one is less clear; there are strong arguments both ways.

P.S. SLOG deserves more attention.  Here are links to the paper
<http://www.vldb.org/pvldb/vol12/p1747-ren.pdf>, Abadi’s writeup
<http://dbmsmusings.blogspot.com/2019/10/introducing-slog-cheating-low-latency.html>,
and Murat Demirbas’s reading group writeup
<http://muratbuffalo.blogspot.com/2020/11/ocean-vista-gossip-based-visibility.html>,
which compares SLOG to something called Ocean Vista that I’ve never heard of
but which reminds me of Accord.

On Mon, Oct 11, 2021 at 9:37 AM Oleksandr Petrov <oleksandr.pet...@gmail.com>
wrote:

> I realise this is not contributing to this discussion, but this email is
> very difficult to read because it seems like something has happened with
> formatting. For me it gets displayed as a single paragraph with no line
> breaks.
>
> There seems to be some overlap between the image uploaded to imgur and this
> email, but some things are only present in the email and not on the image.
>
> On Sat, Oct 9, 2021 at 6:54 PM Jonathan Ellis <jbel...@gmail.com> wrote:
>
> > Hi all,
> >
> > After calling several times for a broader discussion of goals and
> > tradeoffs around transaction management in the CEP-15 thread, I’ve put
> > together a short analysis to kick that off.
> >
> > Here is a table that summarizes the state of the art for distributed
> > transactions that offer serializability, i.e., a superset of what you can
> > get with LWT.  (The most interesting option that this eliminates is
> > RAMP.)  Since I'm not sure how this will render outside gmail, I've also
> > uploaded it here: https://imgur.com/a/SCZ8jex
> >
> > Systems compared: Spanner, Cockroach, Calvin/Fauna, SLOG (see below).
> >
> > Write latency:
> >   - Spanner: Global Paxos, plus 2pc for multi-partition.  For
> >     intercontinental replication this is 100+ms.  Cloud Spanner does not
> >     allow truly global deployments for this reason.
> >   - Cockroach: Single-region Paxos, plus 2pc.  I’m not very clear on how
> >     this works but it results in non-strict serializability.  I didn’t
> >     find actual numbers for CR other than “2ms in a single AZ”, which is
> >     not a typical scenario.
> >   - Calvin/Fauna: Global Raft.  Fauna posts actual numbers of ~70ms in
> >     production, which I assume corresponds to a multi-region deployment
> >     with all regions in the USA.  The SLOG paper says true global Calvin
> >     is 200+ms.
> >   - SLOG: Single-region Paxos (common case) with fallback to multi-region
> >     Paxos.  Under 10ms.
> >
> > Scalability bottlenecks:
> >   - Spanner: Locks held during cross-region replication.
> >   - Cockroach: Same as Spanner.
> >   - Calvin/Fauna: OLLP approach required when PKs are not known in
> >     advance (mostly for indexed queries) -- results in retries under
> >     contention.
> >   - SLOG: Same as Calvin.
> >
> > Read latency at serial consistency:
> >   - Spanner: Timestamp from Paxos leader (may be cross-region), then read
> >     from local replica.
> >   - Cockroach: Same as Spanner, I think.
> >   - Calvin/Fauna: Same as writes.
> >   - SLOG: Same as writes.
> >
> > Maximum serializability flavor:
> >   - Spanner: Strict.
> >   - Cockroach: Un-strict.
> >   - Calvin/Fauna: Strict.
> >   - SLOG: Strict.
> >
> > Support for other isolation levels?
> >   - Spanner: Snapshot.
> >   - Cockroach: No.
> >   - Calvin/Fauna: Snapshot (in Fauna).
> >   - SLOG: Paper mentions dropping from strict-serializable to only
> >     serializable.  Probably could also support Snapshot like Fauna.
> >
> > Interactive transaction support (req’d for SQL):
> >   - Spanner: Yes.
> >   - Cockroach: Yes.
> >   - Calvin/Fauna: No.
> >   - SLOG: No.
> >
> > Potential for grafting onto C*:
> >   - Spanner: Nightmare.
> >   - Cockroach: Nightmare.
> >   - Calvin/Fauna: Reasonable; Calvin is relatively simple and the storage
> >     assumptions it makes are minimal.
> >   - SLOG: I haven’t thought about this enough.  SLOG may require
> >     versioned storage, e.g. see this comment
> >     <http://dbmsmusings.blogspot.com/2019/10/introducing-slog-cheating-low-latency.html?showComment=1570497003296#c5976719429355924873>.
> > [...]
> > --
> > Jonathan Ellis
> > co-founder, http://www.datastax.com
> > @spyced
> >
>
>
> --
> alex p
>


-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced
