Hey Marek,

you’ve put a lot of effort into your proposal and first of all: Thanks for that!

If I understood you right, your proposal is about full ACID support, i.e. being 
able to handle multiple statements in a single transaction.

Cassandra is a database used in distributed environments - anything from a few 
servers sitting next to each other up to 1000+ servers on the other side of the 
globe, connected via a more or (probably) less reliable network connection. 
Cassandra is also a database that can handle many millions of operations per 
second.
Why do I say that?
If you now imagine that _one_ transaction manager is no longer reachable by the 
client/coordinator or the involved replica nodes - what do you do?
I mean, you might be able to implement the A and D - but what about I and C?

IMO there is no way around a global lock/transaction manager for ACID. You 
could optimize some things and take some load off “the central” instance - but 
you still end up coordinating stuff at that transaction manager.
Implementing distributed locks is a nightmare - especially if the lock spans 
the whole world.

Proposing transactions for C* (full ACID or not) would need a thorough concept 
covering all those “nice little” edge cases that can happen in a huge 
distributed system, and it would also have to deal with the problems that arise 
with many tens of thousands of clients. Just think of: clients fail or get 
stuck, nodes fail or get stuck, the network gets broken, partitioned, flappy, 
etc.

Robert


> On 07 Aug 2015, at 09:18, Marek Lewandowski <marekmlewandow...@gmail.com> 
> wrote:
> 
> Hello everyone,
> 
> *TL;DR;* I want to develop transactions (similar to those relational ones)
> for Cassandra, I have some ideas and I'd like to hear your feedback.
> 
> *Long story short:* I want to develop prototype of solution that features
> transactions spanning multiple Cassandra partitions resembling those in
> relational databases. I understand that in Cassandra's world such
> transactions will be quite a bit different than in relational db world.
> That prototype, or research if you will, is the subject of my master's thesis.
> 
> It's been some time since I started dancing around this subject and
> postponing it, but what I understood in the process is that I cannot do
> anything useful while hidden from the community's feedback. I want you to
> provide feedback on my ideas. You also have the power to change them
> completely, because first and foremost _I want to develop something actually
> useful, not only get my master's degree._
> 
> To not scare anyone with a huge wall of text, for now I'll just post a very
> brief description of my ideas and a longer block of text describing how I
> see committing and rolling back data. I have more detailed descriptions
> prepared already, but I don't want to share 4 pages of text in one go.
> 
> The scope of the project (assuming I'll do it alone) is an experiment, not a
> production-ready solution.
> Such an experiment can use any tools available to actually perform it.
> 
> The baseline for any idea is:
> - asynchronous execution - think Akka and actors with non-blocking
> execution and message passing
> - it doesn't have to be completely transparent to the end user - the solution
> may enforce a certain workflow of interacting with it and/or introduce new
> concepts (like Akka messages instead of CQL and the binary protocol).
> 
> So far after reading a lot and thinking even more I have two ideas that I'd
> like to share.
> 
> ### Ideas are (with brief description, details will come later): ###
> 
> 
> #### Idea 1: Event streaming ####
> - Imagine that every modification is represented by an _Event_.
> - Imagine you can group these events into _Event Groups_.
> - Imagine that such groups are time series data.
> - Imagine you can read such groups as a stream of events (think reactive
> streams).
> 
> The idea is: you don't need to lock data when you are sure there is no one
> else to compete with.
> 
> There is one component, called the _Cursor_, that reads the Event Stream and
> executes Event Groups one by one, advancing its position on the stream when
> an Event Group has been executed.
> 
> This seems like a system where you have only one transaction at any given
> time, but there are many areas to optimize that and to allow more than that.
> However, I'll stop here.
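> 
> To make this concrete, here is a minimal sketch of how Event Groups could be
> stored and how the Cursor could track its position. All table and column
> names are hypothetical, nothing here is final:
> 
>     -- Events, grouped into Event Groups, ordered as a time series.
>     CREATE TABLE event_groups (
>         shard int,            -- coarse partitioning of the stream
>         group_time timeuuid,  -- time-based ordering of Event Groups
>         event_seq int,        -- order of Events within a Group
>         target_table text,    -- table the Event modifies
>         payload blob,         -- serialized modification
>         PRIMARY KEY ((shard), group_time, event_seq)
>     );
> 
>     -- The Cursor's position, one row per shard of the stream.
>     CREATE TABLE cursor_position (
>         shard int PRIMARY KEY,
>         last_applied_group timeuuid
>     );
> 
> The Cursor would read event_groups in clustering order and bump
> last_applied_group after executing each Group.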
> 
> #### Idea 2: Locking data ####
> - uses additional tables to acquire read/write locks (see the sketch below)
> - separate tables to append modifications to - as in "Rollback: Appending to
> a separate table" below
> - supports different isolation levels.
> - a more traditional approach, kind of a translation of traditional locking
> to Cassandra's reality.
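> 
> As a minimal sketch of the locking part (all names hypothetical): a lock
> table keyed by the locked row, with locks acquired via lightweight
> transactions. A TTL could additionally be used so that crashed clients do
> not hold locks forever.
> 
>     CREATE TABLE row_locks (
>         target_table text,
>         target_key text,    -- textual form of the locked row's key
>         lock_holder uuid,   -- id of the transaction holding the lock
>         lock_mode text,     -- 'read' or 'write'
>         PRIMARY KEY ((target_table, target_key))
>     );
> 
>     -- Acquire a write lock; [applied] = false means someone else holds it.
>     INSERT INTO row_locks (target_table, target_key, lock_holder, lock_mode)
>     VALUES ('users', 'user42', 50554d6e-29bb-11e5-b345-feff819cdc9f, 'write')
>     IF NOT EXISTS;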
> 
> 
> -------------------------------------------------------------------------------------------
> The part common to both ideas is the approach to doing commit and rollback.
> ### Doing rollback and commit ###
> I have two ideas for rollback. I like the 2nd one more, because it is simpler
> and potentially faster.
> 
> #### Rollback: Query rewriting ####
> It modifies the original data, but before that it copies the original data so
> that the state can be restored. Then, when a failure is detected, the
> modification query can be rewritten so that the original data is restored.
> 
> Query rewriting seems like complex functionality. I tried a few simple and a
> little bit more complex statements, and in general, for basic cases, the
> algorithm is not that complicated, but supporting everything CQL has to
> offer might be hard.
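> 
> As a tiny, purely hypothetical example of what rewriting means here (table
> and values made up): before applying the change, the old value is read and
> kept aside; on failure, a compensating statement built from it restores the
> state.
> 
>     -- original modification issued by the transaction:
>     UPDATE accounts SET balance = 80 WHERE account_id = 42;
> 
>     -- rewritten/compensating statement, generated from the value read
>     -- beforehand (balance was 100 before the transaction touched it):
>     UPDATE accounts SET balance = 100 WHERE account_id = 42;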
> 
> Still, such a transactional system might place some restrictions on the CQL
> statements used, because, first of all, when someone wants to have these
> transactions they already want something non-standard.
> 
> I will skip details of that approach for now.
> 
> #### Rollback: Appending to a separate table ####
> Imagine we have a table A that we want to have transactions on.
> This requires another table A_tx which has the same schema as A, but with
> *one more clustering column* and a few new columns. A_tx will be additionally
> clustered by the transaction id.
> The new columns are (a CQL sketch follows the list):
> - is_committed boolean
> - is_rolledback boolean
> - is_applied boolean
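> 
> As a minimal sketch, assuming A has a partition key id and a single value
> column value (names are illustrative only), A_tx could look like this:
> 
>     -- original table
>     CREATE TABLE a (
>         id int PRIMARY KEY,
>         value text
>     );
> 
>     -- transactional side table: same columns, additionally clustered by
>     -- the transaction id, plus the three flag columns
>     CREATE TABLE a_tx (
>         id int,
>         tx_id timeuuid,
>         value text,
>         is_committed boolean,
>         is_rolledback boolean,
>         is_applied boolean,
>         PRIMARY KEY ((id), tx_id)
>     );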
> 
> The general idea is:
> 1. During a transaction, append changes to the XXX_tx tables.
> 2. For rollback: nothing needs to be done (besides cleaning the XXX_tx tables
> of useless data scheduled for rollback).
> 3. For commit: rows in each XXX_tx are marked as committed. This can be
> done using a BATCH update (sketched below) so that all rows affected by the
> transaction are committed. These changes will eventually be merged back into
> the original row.
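> 
> Step 3 could, for instance, be a single logged batch (again illustrative,
> using the a_tx sketch above plus a second table b_tx, and a made-up
> transaction id):
> 
>     BEGIN BATCH
>         UPDATE a_tx SET is_committed = true
>             WHERE id = 1 AND tx_id = 50554d6e-29bb-11e5-b345-feff819cdc9f;
>         UPDATE b_tx SET is_committed = true
>             WHERE id = 7 AND tx_id = 50554d6e-29bb-11e5-b345-feff819cdc9f;
>     APPLY BATCH;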
> 
> Committed changes are visible during a query, because a query has to select
> from 2 tables. If you query the XXX table then you have to query that
> table, but also XXX_tx, get all committed data, merge the results and return
> them to the client.
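> 
> A read could then, roughly, be two selects whose results are merged before
> returning to the client (again using the illustrative a/a_tx schema):
> 
>     -- base data
>     SELECT id, value FROM a WHERE id = 1;
> 
>     -- committed, not-yet-applied transactional data for the same partition
>     SELECT id, tx_id, value FROM a_tx
>     WHERE id = 1 AND is_committed = true
>     ALLOW FILTERING;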
> 
> Here the committed data eventually lands in the proper row - during a read,
> or as a background process, for example (this is what the is_applied column
> is for): results are merged and inserted into the original row, and
> additionally the modifications can be marked as _applied_.
> Uncommitted data can also eventually be cleaned up.
> 
> *Important note:* since the partitioning key stays the same for the {{XXX}}
> table and the {{XXX_TX}} table, the data will reside on the same nodes so
> that queries and application of data can be done locally.
> 
> ### What happens next ###
> Assuming I get any feedback, I'll post more detailed descriptions of the two
> approaches.
> 
> I would love to hear your feedback on the whole subject, just to begin the
> discussion and pique your interest.
> 
> What do you think about having heavier transactions?
> Does this experiment make sense at all?
> 
> regards
> -- 
> Marek Lewandowski

—
Robert Stupp
@snazy
