Hey Robert,

thanks for the reply. Yes, you understood correctly, although I don't think
it is necessary to provide full ACID support; concretely, C could be
eventual. D is supported by Cassandra as long as there are replicas (I
think). I'd focus only on A and I for multiple statements, which can still
be a piece of work, but should be more or less possible.

Although I haven't worked with such big clusters, nor with smaller ones, I
understand the typical use of Cassandra in the wild and the unreliability of
the network, and I have tried to incorporate that into my thinking.

I also believe that any solution that gets developed has to be opt-in for
the end user. The end user has to accept degraded performance when doing
transactions, retries of transactions on their side, and so on. It won't
work without proper handling at the client level, and I mean at the
business-logic level.

So, to answer your question about what can be done when one transaction
manager is no longer reachable: abort the transaction and raise the issue up
to the client side. Let it crash. Let the client handle it, though there can
be some default handling at the driver level to cover common scenarios, like
retrying later after some timeout (a sketch of what that could look like is
below). I still need to think about the details, but the point is not to
hide everything from the client. The client knows they are doing a
distributed transaction and should be aware of its fallacies. That's my
opinion for now. What do you think about it?
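
A minimal sketch of what such driver-level default handling could look like
(TxSession, TxAborted and the backoff values are names I'm making up just to
illustrate the idea, not an existing API):

  object TxRetry {
    import scala.concurrent.{ExecutionContext, Future}
    import scala.concurrent.duration._
    import akka.actor.ActorSystem
    import akka.pattern.after

    // Hypothetical failure meaning "transaction manager unreachable,
    // transaction aborted".
    final case class TxAborted(reason: String) extends RuntimeException(reason)

    // Hypothetical client-facing transaction API.
    trait TxSession {
      def runTransaction(statements: Seq[String]): Future[Unit]
    }

    // Default handling: retry the whole transaction a few times with an
    // increasing delay, then surface the failure to business logic.
    def withRetry(session: TxSession, statements: Seq[String],
                  retries: Int = 3, delay: FiniteDuration = 500.millis)
                 (implicit system: ActorSystem, ec: ExecutionContext): Future[Unit] =
      session.runTransaction(statements).recoverWith {
        case _: TxAborted if retries > 0 =>
          after(delay, system.scheduler)(
            withRetry(session, statements, retries - 1, delay * 2))
      }
  }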

About the global lock / transaction manager: I agree. In both of my ideas
there is some central piece. It doesn't have to be a single point of
failure, though; but once this single node is elected to manage a
transaction (or however we call it), it will need to coordinate with the
other nodes, and I haven't found any way around that.

Anywho, my question is: is there anyone who would accompany me on the
journey of developing such a solution and give feedback on what I do? If
there is, that's awesome; I will gladly present a detailed solution for
discussion. If not, I'll just try to do it on my own, which in the end might
turn into a failure, but that's fine because it is an experiment.

Marek




2015-08-07 10:31 GMT+02:00 Robert Stupp <sn...@snazy.de>:

> Hey Marek,
>
> you’ve put a lot of effort in your proposal and first of all: Thanks for
> that!
>
> If I understood you right, your proposal is about full ACID support; able
> to handle multiple statements in a single transaction.
>
> Cassandra is a database used in distributed environments - a few servers
> sitting nearby to each other up to 1000+ servers on the other side of the
> globe connected via a more or probably less reliable network connection.
> Cassandra is also a database that can handle many millions of operations
> per second.
> Why do I say that?
> If you now imagine that _one_ transaction manager is no longer reachable
> by the client/coordinator or the involved replica nodes - what do you do?
> I mean, you might be able to implement the A and D - but what about I and
> C?
>
> IMO there is no way around a global lock/transaction manager for ACID. You
> could optimize some things and take some load off “the central” instance -
> but you end up coordinating stuff at that transaction manager.
> Implementing distributed locks is a nightmare - especially if the lock
> spans the whole world.
>
> Proposing transactions for C* (full ACID or not) would need a thorough
> concept covering all these “little nice” edge cases that can happen in a
> huge distributed system and also deals with the problems that arise with
> many 10k clients. Just think of: clients fail/stuck, nodes fail/stuck,
> network gets broken, partitioned, flappy etc etc.
>
> Robert
>
>
> > On 07 Aug 2015, at 09:18, Marek Lewandowski <marekmlewandow...@gmail.com>
> wrote:
> >
> > Hello everyone,
> >
> > *TL;DR:* I want to develop transactions (similar to relational ones)
> > for Cassandra. I have some ideas and I'd like to hear your feedback.
> >
> > *Long story short:* I want to develop a prototype of a solution that
> > features transactions spanning multiple Cassandra partitions, resembling
> > those in relational databases. I understand that in Cassandra's world
> > such transactions will be quite a bit different than in the relational
> > world. That prototype, or research if you will, is the subject of my
> > master's thesis.
> >
> > It's been some time since I've been dancing around that subject and
> > postponing it, but what I understood in the process is that I cannot do
> > anything useful while hidden from the community's feedback. I want you
> > to provide feedback on my ideas. You also have the power to change them
> > completely, because first and foremost _I want to develop something
> > actually useful, not only get my master's degree._
> >
> > To not scare anyone with a huge wall of text, for now I'll just post a
> > very brief description of my ideas and a longer block of text describing
> > how I see committing and rolling back data. I have more detailed
> > descriptions prepared already, but I don't want to share 4 pages of text
> > in one go.
> >
> > The scope of the project (assuming I'll do it alone) is an experiment,
> > not a production-ready solution.
> > Such an experiment can use whatever tools are needed to actually perform
> > it.
> >
> > The baseline for any idea is:
> > - asynchronous execution - think Akka and actors with non-blocking
> > execution and message passing
> > - it doesn't have to be completely transparent to the end user - the
> > solution may enforce a certain workflow of interacting with it and/or
> > introduce new concepts (like Akka messages instead of CQL and the binary
> > protocol).
> >
> > So far, after reading a lot and thinking even more, I have two ideas
> > that I'd like to share.
> >
> > ### The ideas (with brief descriptions; details will come later): ###
> >
> >
> > #### Idea 1: Event streaming ####
> > - Imagine that every modification is represented by an _Event_.
> > - Imagine you can group these events into _Event Groups_.
> > - Imagine that such groups are time-series data.
> > - Imagine you can read such groups as a stream of events (think reactive
> > streams).
> >
> > The idea is that you don't need to lock data when you are sure there is
> > no one else to compete with.
> >
> > There is a single component called the _Cursor_ that reads the Event
> > Stream and executes Event Groups one by one, advancing its position on
> > the stream when an Event Group has been executed (a minimal sketch of
> > such a cursor follows below).
> >
> > It seems like a system where you have only one transaction at any given
> > time, but there are many areas to optimize that and to allow more than
> > that. However, I'll stop here.
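> >
> > A minimal sketch of that cursor, assuming purely illustrative types
> > (Event, EventGroup, the execute/saveOffset hooks) that are not any
> > existing API:
> >
> >   // Hypothetical event model: an Event Group is the unit of atomic
> >   // execution.
> >   final case class Event(partition: String, mutation: String)
> >   final case class EventGroup(txId: java.util.UUID, events: Seq[Event])
> >
> >   // Hypothetical single-reader cursor: it applies groups in order and
> >   // only advances its stored position after a whole group has executed.
> >   final class Cursor(stream: Iterator[EventGroup],
> >                      execute: EventGroup => Unit,
> >                      saveOffset: java.util.UUID => Unit) {
> >     def run(): Unit =
> >       stream.foreach { group =>
> >         execute(group)         // apply every event of the group
> >         saveOffset(group.txId) // advance the cursor position
> >       }
> >   }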
> >
> > #### Idea 2: Locking data ####
> > - uses additional tables to acquire read/write locks (sketched after
> > this list)
> > - separate tables to append modifications - as in "Rollback: Appending
> > to a separate table"
> > - supports different isolation levels
> > - a more traditional approach, kind of a translation of traditional
> > locking to Cassandra reality
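> >
> > One way such a lock table could work (my assumption here: lightweight
> > transactions as the locking primitive; table and column names are made
> > up for illustration):
> >
> >   object LockTableSketch {
> >     // CQL for a hypothetical lock table, one row per locked key.
> >     val createLocks: String =
> >       """CREATE TABLE row_locks (
> >         |  locked_key text PRIMARY KEY,
> >         |  tx_id      uuid,
> >         |  lock_mode  text   -- 'read' or 'write'
> >         |)""".stripMargin
> >
> >     // Acquiring a write lock as a conditional insert; the [applied]
> >     // column of the result says whether the caller won the lock.
> >     val acquireWriteLock: String =
> >       """INSERT INTO row_locks (locked_key, tx_id, lock_mode)
> >         |VALUES (?, ?, 'write') IF NOT EXISTS""".stripMargin
> >   }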
> >
> >
> >
> > -------------------------------------------------------------------------------------------
> > The common part of the two ideas is the approach to doing commit and
> > rollback.
> > ### Doing rollback and commit ###
> > I have two ideas for rollback. I like the 2nd one more, because it is
> > simpler and potentially faster.
> >
> > #### Rollback: Query rewriting ####
> > It modifies the original data, but before that it copies the original
> > data so that the state can be restored. Then, when a failure is detected,
> > the modification query can be rewritten so that the original data can be
> > restored (a tiny example is sketched below).
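> >
> > A tiny, purely hypothetical illustration of what rewriting could mean
> > (table, columns and values are made up):
> >
> >   object RewriteSketch {
> >     // The statement issued inside the transaction.
> >     val original: String   = "UPDATE users SET balance = 50 WHERE id = 42"
> >     // Before applying it, the old value is copied: balance was 80.
> >     // On rollback the compensating statement is built from that copy.
> >     val compensate: String = "UPDATE users SET balance = 80 WHERE id = 42"
> >   }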
> >
> > Query rewriting seems like complex functionality. I tried a few simple
> > and a little more complex statements, and in general, for basic stuff,
> > the algorithm is not that complicated, but supporting everything CQL has
> > to offer might be hard.
> >
> > Still, such a transactional system might have some restrictions on the
> > CQL statements used, because when someone wants these transactions they
> > already want something non-standard.
> >
> > I will skip the details of that approach for now.
> >
> > #### Rollback: Appending to a separate table ####
> > Imagine we have a table A that we want to have transactions on.
> > This requires another table A_tx, which has the same schema as A but has
> > *1 more clustering column* and a few new columns. A_tx will additionally
> > be clustered by transaction id (a sketch of such a schema follows the
> > list of new columns below).
> > The new columns are:
> > - is_committed boolean
> > - is_rolledback boolean
> > - is_applied boolean
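> >
> > A sketch of what A_tx could look like, assuming for illustration that A
> > has partition key "pk", one clustering column "ck" and a single payload
> > column "value" (all names made up):
> >
> >   object TxTableSketch {
> >     val createATx: String =
> >       """CREATE TABLE a_tx (
> >         |  pk    text,        -- same partition key as A
> >         |  ck    text,        -- same clustering column(s) as A
> >         |  tx_id timeuuid,    -- the extra clustering column
> >         |  value text,        -- payload columns copied from A
> >         |  is_committed  boolean,
> >         |  is_rolledback boolean,
> >         |  is_applied    boolean,
> >         |  PRIMARY KEY ((pk), ck, tx_id)
> >         |)""".stripMargin
> >   }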
> >
> > The general idea is:
> > 1. During the transaction, append changes to the XXX_tx tables.
> > 2. For rollback: nothing needs to be done (besides cleaning the XXX_tx
> > tables of useless data scheduled for rollback).
> > 3. For commit: the rows in each XXX_tx are marked as committed. This can
> > be done using a BATCH update so that all rows affected by the
> > transaction are committed (see the sketch below). These changes will
> > eventually be merged back into the original row.
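> >
> > A sketch of the commit step, with made-up names, assuming a_tx and b_tx
> > are the staging tables touched by the transaction:
> >
> >   object CommitSketch {
> >     // Mark every staged row of one transaction as committed in a single
> >     // logged batch.
> >     val markCommitted: String =
> >       """BEGIN BATCH
> >         |  UPDATE a_tx SET is_committed = true WHERE pk = ? AND ck = ? AND tx_id = ?;
> >         |  UPDATE b_tx SET is_committed = true WHERE pk = ? AND ck = ? AND tx_id = ?;
> >         |APPLY BATCH""".stripMargin
> >   }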
> >
> > Committed changes are visible during a query, because the query has to
> > select from 2 tables. If you query the XXX table then you have to query
> > that table, but also XXX_TX, get all committed data, merge the results
> > and return them to the client (a merge is sketched below).
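> >
> > A sketch of that merge, with a purely illustrative Row type (not a
> > driver class):
> >
> >   object ReadMergeSketch {
> >     final case class Row(pk: String, ck: String, value: String)
> >
> >     // Committed staged rows override base rows with the same (pk, ck);
> >     // rows that exist only in the staging table are ignored here for
> >     // brevity.
> >     def merge(base: Seq[Row], committedStaged: Seq[Row]): Seq[Row] = {
> >       val overrides = committedStaged.map(r => (r.pk, r.ck) -> r).toMap
> >       base.map(r => overrides.getOrElse((r.pk, r.ck), r))
> >     }
> >   }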
> >
> > Here, committed data eventually lands in the proper row - during a read
> > or as a background process, for example (this is what the is_applied
> > column is for): the results are merged and inserted into the original
> > row, and additionally the modifications can be marked as _applied_.
> > Uncommitted data can also be eventually cleaned up.
> >
> > *Important note:* since the partitioning key stays the same for the
> > {{XXX}} table and the {{XXX_TX}} table, the data will reside on the same
> > nodes, so queries and the application of data can be done locally.
> >
> > ### What happens next ###
> > Assuming I get any feedback, I'll post more detailed descriptions of the
> > two approaches.
> >
> > I would love to hear your feedback on the whole subject, just to begin
> > the discussion and pique your interest.
> >
> > What do you think about having heavier transactions?
> > Does this experiment make sense at all?
> >
> > regards
> > --
> > Marek Lewandowski
>
> —
> Robert Stupp
> @snazy
>
>


-- 
Marek Lewandowski
