> And I notice in 0.7 roadmap there is a feature called "vector clock
> support"
The original plan was to implement vector clocks for Cassandra, but
Cassandra's data model actually provides an alternative solution that we'd
like to start recommending. If you know that you will be experiencing
unavoidable races on a particular column family, and would like to keep all
versions, you can store the versions with unique ids.

Super column families give you an additional level of nesting, which means
that the path for a column becomes (key, name1, name2, value). By storing
contentious columns in a super column with a path like ("key", "col",
"version", "value"), and never overwriting existing versions (only deleting
them), you can resolve some types of conflicts.

In order to read the value of "col", you get the slice of all versions it
contains, and then perform client side resolution. Once you've successfully
inserted a resolved version, you can delete the old versions.
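
As a rough sketch of that read path (this is not a Cassandra client API: the
nested dict below is only an in-memory stand-in for a super column family,
and write_version, read_versions and read_resolved are hypothetical names
made up for illustration):

```python
import uuid

# In-memory stand-in for a super column family:
#   store[key][col][version] = value
# This only illustrates the access pattern; it is not a Cassandra client.
store = {}

def write_version(key, col, value):
    """Insert a value under a fresh unique version id; never overwrite."""
    version = uuid.uuid4().hex
    store.setdefault(key, {}).setdefault(col, {})[version] = value
    return version

def read_versions(key, col):
    """Return a copy of the slice of all versions stored for the column."""
    return dict(store.get(key, {}).get(col, {}))

def read_resolved(key, col, resolve):
    """Read all versions, resolve on the client, write back, delete old ones."""
    versions = read_versions(key, col)
    if not versions:
        return None
    if len(versions) == 1:
        return next(iter(versions.values()))
    merged = resolve(list(versions.values()))
    write_version(key, col, merged)      # insert the resolved version first
    for old in versions:                 # then delete the versions it replaces
        del store[key][col][old]
    return merged

# Two racing writers each add their own version of the same column.
write_version("user1", "email", "old@example.com")
write_version("user1", "email", "new@example.com")
# The resolver is application specific; here it just picks the longest string.
winner = read_resolved("user1", "email", lambda values: max(values, key=len))
```

The resolved version is inserted before the old ones are deleted, so a reader
that arrives in between sees extra versions rather than none.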

Applied to Amazon's shopping cart example: assuming that you want to
represent a shopping cart within a single column (although honestly, there
are better ways to do this), you might write a cart with a version of
"a23df..." and then a conflicting cart with a version of "be241...". At read
time, you would see both versions, merge the carts, and insert a new version
"c87a9...". The content of the column representing the cart needs to be
designed specifically to support this use case though, using tombstones for
deleted items, etc.
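
To make the tombstone point concrete, here is a minimal sketch of a merge
function for two conflicting cart versions. The cart layout (item ->
{"qty", "deleted"}) and the merge policy (a tombstone always wins, otherwise
take the larger quantity) are assumptions picked for illustration; a real
cart would need whatever policy fits the application.

```python
def merge_carts(carts):
    """Merge conflicting cart versions into a single cart.

    Assumed policy: if any version has tombstoned an item, it stays deleted;
    otherwise the larger quantity is kept.
    """
    merged = {}
    for cart in carts:
        for item, entry in cart.items():
            current = merged.get(item)
            if current is None:
                merged[item] = dict(entry)
            elif entry["deleted"] or current["deleted"]:
                merged[item] = {"qty": 0, "deleted": True}
            else:
                qty = max(current["qty"], entry["qty"])
                merged[item] = {"qty": qty, "deleted": False}
    return merged

def visible_items(cart):
    """Items the user should actually see (tombstones filtered out)."""
    return {item: e["qty"] for item, e in cart.items() if not e["deleted"]}

# Version "a23df..." still contains the book; version "be241..." removed it.
cart_a = {"book": {"qty": 1, "deleted": False},
          "pen": {"qty": 2, "deleted": False}}
cart_b = {"book": {"qty": 0, "deleted": True},
          "lamp": {"qty": 1, "deleted": False}}

merged = merge_carts([cart_a, cart_b])
```

Without the tombstone for "book", the merge could not tell "removed in one
version" apart from "never added in that version", and the deleted item would
reappear in the merged cart.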

----

Long story short: don't try to use Cassandra in places where you are
expecting race conditions, unless you've planned ahead.

Thanks,
Stu

On Sun, Jan 9, 2011 at 3:12 AM, Peter Schuller
<peter.schul...@infidyne.com> wrote:

> > I'm very new to Cassandra and have just read some of the relevant
> > documentation. It seems that each time a client wants to insert data into
> > a Cassandra cluster, the client also needs to assign a timestamp. Cassandra
> > then keeps the timestamp and uses it to determine which copy is the latest
> > and should be returned, based on the CL, when a client issues a query, right?
>
> Yes (the same reconciliation logic is also used on e.g. anti-entropy
> (nodetool repair)).
>
> > My question is: if we have many clients, should all the clients be time
> > synchronized? Is that the clients' responsibility? If the clients are not
> > time synchronized, might Cassandra return the wrong row?
>
> Yes, yes, sort of. If clients are out of sync w.r.t. time, the wrong
> version of the data may end up getting stored by Cassandra. However,
> the clock on the client doing the *read* does not affect what it sees.
>
> Note however that the need for clock synchronization is often less of
> a problem than it might first appear; if you have strong
> synchronization requirements such that you cannot afford to have races
> at all, you will need a separate synchronization mechanism anyway (or
> change the data model to handle it). Clocks should be synchronized,
> yes, but no matter how well the clocks are synchronized, that alone
> will never give you any measurable ability to control "who wins" in
> the event of a race.
>
> (I don't know about plans for vector clock support; I haven't heard
> much about it lately. I'll let someone else respond to that.)
>
> --
> / Peter Schuller
>
