Re: why set replica placement strategy at keyspace level ?

Manu Zhang Tue, 29 Jan 2013 15:33:39 -0800

On Tue 29 Jan 2013 03:39:17 PM CST, aaron morton wrote:

  So If I write to CF Users with rowkey="dean"
and to CF Schedules with rowkey="dean", it is actually one row?

In my mental model that's correct.
A RowMutation is a row key and a collection of (internal) ColumnFamilies, each
of which contains the columns to write for a single CF.

This is the thing that is committed to the log, and then the changes in the 
ColumnFamilies are applied to each CF in an isolated way.
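
A minimal sketch of that flow, for illustration only. The class and
attribute names below (RowMutation as a dataclass, ToyNode, commit_log,
memtables) are assumptions made for this example and are not Cassandra's
actual internals; the sketch only shows "log once, then apply per CF".

```python
from dataclasses import dataclass, field


@dataclass
class RowMutation:
    """Toy stand-in: one row key plus the columns to write, grouped per CF."""
    row_key: str
    modifications: dict = field(default_factory=dict)  # CF name -> {column: value}

    def add(self, cf, column, value):
        # Collect a column write under the CF it belongs to.
        self.modifications.setdefault(cf, {})[column] = value


class ToyNode:
    """Hypothetical node with one commit log and one in-memory table per CF."""

    def __init__(self):
        self.commit_log = []  # one entry per RowMutation
        self.memtables = {}   # CF name -> {row key -> {column: value}}

    def apply(self, mutation):
        # The whole mutation is logged as a single entry first...
        self.commit_log.append(mutation)
        # ...then each CF's columns are applied to that CF on its own.
        for cf, columns in mutation.modifications.items():
            rows = self.memtables.setdefault(cf, {})
            rows.setdefault(mutation.row_key, {}).update(columns)
```

Writing to Users and Schedules with row key "dean" through one
RowMutation would leave a single commit log entry and two updated CFs.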

(must have missed that several times in the
documentation).

http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 9:28 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

"If you write to 4 CF's with the same row key that is considered one
mutation"

Hmmmmm, I never considered this, never knew either (very un-intuitive from
a user perspective IMHO).  So if I write to CF Users with rowkey="dean"
and to CF Schedules with rowkey="dean", it is actually one row?  (It's so
un-intuitive that I had to ask to make sure I am reading that correctly.)

I guess I really don't have that case since most of my row keys are GUID's
anyways, but very interesting and unexpected (not sure I really mind, was
just taken aback)

PS. Not sure I ever minded losing atomic commits to the same row across
CF's, as I never expected it in the first place, having used Cassandra for
more than a year (must have missed that several times in the
documentation).

Thanks,
Dean

On 1/28/13 12:41 PM, "aaron morton" <aa...@thelastpickle.com> wrote:


Another thing that's been confusing me is that when we talk about the
data model should the row key be inside or outside a column family?

My mental model is:

cluster == database
keyspace == table
row == a row in a table
CF == a family of columns in one row

(I think that's different to others, but it works for me)
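
As a rough illustration of that mental model (not of how Cassandra
stores data on disk), the nesting can be written as plain dictionaries.
The keyspace name, column names, and the read_cf helper are made up for
this sketch.

```python
# cluster -> keyspace -> row key -> CF -> columns (all names are hypothetical)
cluster = {
    "app_keyspace": {
        "dean": {
            "Users": {"email": "dean@example.com"},
            "Schedules": {"monday": "standup"},
        },
    },
}


def read_cf(keyspace, row_key, cf):
    """Look up one family of columns within one row of one keyspace."""
    return cluster[keyspace][row_key][cf]
```

In this picture the row key sits outside the column families: "dean" is
one row, and Users and Schedules are two families of columns in it.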

Is it important to store rows of different column families that share
the same row key to the same node?

Makes the failure models a little easier to understand, e.g. everything
under the key for user "amorton" is either available or not.

Meanwhile, what's the drawback of setting RPS and RF at column family
level?

Other than it's baked in?

We process all mutations for a row at the same time. If you write to 4
CF's with the same row key that is considered one mutation, for one row.
That one RowMutation is directed to the replicas using the
ReplicationStrategy and atomically applied to the commit log.

If you have RS per CF that one mutation would be split into 4, which
would then be sent to different replicas. Even if they went to the same
replicas they would be written to the commit log as different mutations.

So if you have RS per CF you lose atomic commits for writes to the same
row.
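
The difference can be sketched with a toy ring. Everything here is an
assumption for illustration: the node names, the MD5-based replicas()
placement, and both routing functions are hypothetical and do not
reflect Cassandra's real partitioner or strategy classes. Each item in a
returned list stands for one separate commit log entry.

```python
import hashlib
from collections import defaultdict

NODES = ["node1", "node2", "node3", "node4", "node5", "node6"]


def replicas(token_source, rf):
    """Toy placement: hash to a starting node, then take the next rf nodes."""
    start = int(hashlib.md5(token_source.encode()).hexdigest(), 16) % len(NODES)
    return tuple(NODES[(start + i) % len(NODES)] for i in range(rf))


def route_per_keyspace(row_key, writes, rf=3):
    """Keyspace-level strategy: one mutation, sent whole to one replica set."""
    return {replicas(row_key, rf): [dict(writes)]}


def route_per_cf(row_key, writes, cf_rf):
    """Hypothetical per-CF strategy: the mutation is split, one piece per CF."""
    routed = defaultdict(list)
    for cf, columns in writes.items():
        # Salting the key with the CF name stands in for "each CF has its
        # own strategy", so the pieces may land on different replica sets.
        target = replicas(cf + ":" + row_key, cf_rf[cf])
        routed[target].append({cf: columns})
    return dict(routed)
```

With four CFs, route_per_keyspace yields one entry holding all four,
while route_per_cf yields four single-CF entries, even when some of
them happen to share a replica set.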

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/01/2013, at 11:22 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:

On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:

The row is the unit of replication: all values with the same storage
engine row key in a KS are on the same nodes. If replication were set per
CF, this would not hold.

Not that it would be the end of the world, but that is the first thing
that comes to mind.

Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 27/01/2013, at 4:15 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:

Although I've known Cassandra for quite a while, this question
has only occurred to me recently:

Why are the replica placement strategy and replication factor set at the
keyspace level?

Would setting them at the column family level offer more flexibility?

Is this because it's easier for users to manage an application? Or is it
related to the internal implementation? Or is it just that I've overlooked
something?


Is it important to store rows of different column families that share
the same row key on the same node? AFAIK, Cassandra doesn't support getting
all of them in a single call.

Meanwhile, what's the drawback of setting RPS and RF at column family
level?

Another thing that's been confusing me is that when we talk about the
data model should the row key be inside or outside a column family?

Thanks

From that wiki page, "mutations against a single key are atomic but not
isolated". I think a row mutation is isolated now, but is it across
column families? By the way, the wiki page really needs updating.
