Re: why set replica placement strategy at keyspace level ?

Hiller, Dean Mon, 28 Jan 2013 12:28:58 -0800

"If you write to 4 CF's with the same row key that is considered one
mutation"


Hmmmmm, I never considered this, and never knew it either (very unintuitive
from a user perspective, IMHO). So if I write to CF Users with rowkey="dean"
and to CF Schedules with rowkey="dean", it is actually one row? (It's so
unintuitive that I had to ask to make sure I am reading that correctly.)

I guess I really don't have that case since most of my row keys are GUIDs
anyway, but it is very interesting and unexpected (not sure I really mind, I
was just taken aback).
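
To make the quoted claim concrete, here is a toy Python sketch of the
grouping Aaron describes. It is not Cassandra's code: the function names and
the sample columns are made up for illustration, and the only assumption
taken from the thread is that writes sharing a row key in one keyspace
become a single mutation, whereas a per-CF replication strategy would force
one mutation per CF.

```python
from collections import defaultdict


def group_by_row(writes):
    """Keyspace-level replication: one mutation per row key, spanning CFs."""
    mutations = defaultdict(dict)
    for cf, row_key, columns in writes:
        mutations[row_key].setdefault(cf, {}).update(columns)
    return dict(mutations)


def group_by_row_and_cf(writes):
    """Hypothetical per-CF replication: one mutation per (row key, CF)."""
    mutations = defaultdict(dict)
    for cf, row_key, columns in writes:
        mutations[(row_key, cf)].update(columns)
    return dict(mutations)


# Illustrative writes: (column family, row key, columns).
writes = [
    ("Users", "dean", {"email": "dean@example.com"}),
    ("Schedules", "dean", {"monday": "09:00"}),
]

per_keyspace = group_by_row(writes)
per_cf = group_by_row_and_cf(writes)

print(len(per_keyspace))  # 1: both CFs travel in one mutation for row "dean"
print(len(per_cf))        # 2: each CF becomes its own mutation
```

In this model the single entry for "dean" is what would be appended to the
commit log in one step; with two entries, each is logged separately, so one
can succeed while the other fails.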

PS. Not sure I ever minded losing atomic commits to the same row across
CFs, as I never expected it in the first place, having used Cassandra for
more than a year (I must have missed that several times in the
documentation).

Thanks,
Dean

On 1/28/13 12:41 PM, "aaron morton" <aa...@thelastpickle.com> wrote:

>> 
>> Another thing that's been confusing me is that when we talk about the
>>data model should the row key be inside or outside a column family?
>My mental model is:
>
>cluster == database
>keyspace == table
>row == a row in a table
>CF == a family of columns in one row
>
>(I think that's different to others, but it works for me)
>
>> Is it important to store rows of different column families that share
>>the same row key to the same node?
>Makes the failure models a little easier to understand, e.g. everything
>keyed by user "amorton" is either available or not.
>
>> Meanwhile, what's the drawback of setting RPS and RF at column family
>>level?
>Other than it's baked in?
>
>We process all mutations for a row at the same time. If you write to 4
>CF's with the same row key that is considered one mutation, for one row.
>That one RowMutation is directed to the replicas using the
>ReplicationStrategy and atomically applied to the commit log.
>
>If you have RS per CF that one mutation would be split into 4, which
>would then be sent to different replicas. Even if they went to the same
>replicas they would be written to the commit log as different mutations.
>
>So if you have RS per CF you lose atomic commits for writes to the same
>row.
>
>Cheers
>
>-----------------
>Aaron Morton
>Freelance Cassandra Developer
>New Zealand
>
>@aaronmorton
>http://www.thelastpickle.com
>
>On 28/01/2013, at 11:22 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>
>> On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:
>>> The row is the unit of replication: all values with the same storage
>>>engine row key in a KS are on the same nodes. If replication were set
>>>per CF, this would not hold.
>>> 
>>> Not that it would be the end of the world, but that is the first thing
>>>that comes to mind.
>>> 
>>> Cheers
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> New Zealand
>>> 
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 27/01/2013, at 4:15 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>>> 
>>>> Although I've known Cassandra for quite a while, this question
>>>>has only occurred to me recently:
>>>> 
>>>> Why are the replica placement strategy and replica factors set at the
>>>>keyspace level?
>>>> 
>>>> Would setting them at the column family level offer more flexibility?
>>>> 
>>>> Is this because it's easier for users to manage an application? Or is
>>>>it related to the internal implementation? Or is it just that I've
>>>>overlooked something?
>>> 
>> 
>> Is it important to store rows of different column families that share
>>the same row key to the same node? AFAIK, Cassandra doesn't support
>>getting all of them in a single call.
>> 
>> Meanwhile, what's the drawback of setting RPS and RF at column family
>>level?
>> 
>> Another thing that's been confusing me is that when we talk about the
>>data model should the row key be inside or outside a column family?
>> 
>> Thanks
>> 
>
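
The point made earlier in the thread, that all values with the same row key
in a keyspace land on the same nodes, can be illustrated with a toy token
ring. This is a simplified sketch, not Cassandra's implementation: the
four-node ring, the one-byte MD5 token, the node names, and the per-CF
replication factors are all assumptions chosen for illustration.

```python
import hashlib
from bisect import bisect_left

# Hypothetical ring: (token, node), sorted by token. Each node owns the
# tokens up to and including its own.
RING = [(0, "node-a"), (64, "node-b"), (128, "node-c"), (192, "node-d")]


def token(row_key):
    """Toy partitioner: first byte of the MD5 hash of the row key (0..255)."""
    return hashlib.md5(row_key.encode("utf-8")).digest()[0]


def replicas(row_key, rf):
    """Walk the ring clockwise from the owning node and take rf nodes."""
    tokens = [t for t, _ in RING]
    start = bisect_left(tokens, token(row_key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


KEYSPACE_RF = 2


def keyspace_replicas(cf, row_key):
    """Keyspace-level settings: the CF name plays no part in placement."""
    return replicas(row_key, KEYSPACE_RF)


# Hypothetical per-CF replication factors, as asked about in the thread.
PER_CF_RF = {"Users": 3, "Schedules": 1}


def per_cf_replicas(cf, row_key):
    """Per-CF settings: the same row key can map to different node sets."""
    return replicas(row_key, PER_CF_RF[cf])


print(keyspace_replicas("Users", "dean") == keyspace_replicas("Schedules", "dean"))
print(per_cf_replicas("Users", "dean"), per_cf_replicas("Schedules", "dean"))
```

With keyspace-level settings the two lookups always agree, which is what
makes "everything for this key is available or not" a usable failure model.
With per-CF settings the replica sets can differ, so parts of the same
logical row may be reachable while others are not.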
