Re: why set replica placement strategy at keyspace level ?

aaron morton Wed, 30 Jan 2013 16:56:12 -0800

>  I think a row mutation is isolated now, but is it across column families?
Correct, they are isolated, but only for an individual CF.
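
A minimal sketch of that model, as a toy in Python rather than Cassandra's actual code. All names here (`RowMutation`, `ToyStore`, `apply`) are hypothetical and exist only for illustration. One mutation carries a row key plus the column updates for each CF. It is appended to the commit log as a single entry, and the changes are then applied one CF at a time under that CF's own lock. A reader of one CF therefore never sees a half-applied update to that CF, but may see one CF updated before the other.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class RowMutation:
    """Toy stand-in: one row key plus the columns to write, grouped by CF."""
    key: str
    modifications: dict = field(default_factory=dict)  # cf name -> {column: value}

    def add(self, cf, column, value):
        self.modifications.setdefault(cf, {})[column] = value


class ToyStore:
    """Hypothetical single-node store, for illustration only."""

    def __init__(self, cf_names):
        self.commit_log = []  # one entry per RowMutation
        self.tables = {cf: {} for cf in cf_names}  # cf -> {row key: {column: value}}
        self.locks = {cf: threading.Lock() for cf in cf_names}

    def apply(self, mutation):
        # The whole mutation is logged as a single unit.
        self.commit_log.append(mutation)
        # Changes are then applied one CF at a time: each CF's columns
        # become visible together, but the CFs are not updated as one step.
        for cf, columns in mutation.modifications.items():
            with self.locks[cf]:
                row = self.tables[cf].setdefault(mutation.key, {})
                row.update(columns)


store = ToyStore(["Users", "Schedules"])
m = RowMutation("dean")
m.add("Users", "email", "dean@example.com")
m.add("Schedules", "monday", "09:00")
store.apply(m)
```

Writing to two CFs with the same row key produces a single commit log entry in this sketch, which is the sense in which it is "one mutation, for one row".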


> By the way, the wiki page really needs updating.
You can update it if you would like to.

Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 30/01/2013, at 12:33 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:

> On Tue 29 Jan 2013 03:39:17 PM CST, aaron morton wrote:
>> 
>>>  So If I write to CF Users with rowkey="dean"
>>> and to CF Schedules with rowkey="dean", it is actually one row?
>> In my mental model that's correct.
>> A RowMutation is a row key and a collection of (internal) ColumnFamilies 
>> which contain the columns to write for a single CF.
>> 
>> This is the thing that is committed to the log, and then the changes in the 
>> ColumnFamilies are applied to each CF in an isolated way.
>> 
>>> (must have missed that several times in the
>>> documentation).
>> http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 29/01/2013, at 9:28 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
>> 
>>> "If you write to 4 CF's with the same row key that is considered one
>>> mutation"
>>> 
>>> Hmmmmm, I never considered this, never knew either (very un-intuitive from
>>> a user perspective IMHO).  So if I write to CF Users with rowkey="dean"
>>> and to CF Schedules with rowkey="dean", it is actually one row?  (it's so
>>> un-intuitive that I had to ask to make sure I am reading that correctly).
>>> 
>>> I guess I really don't have that case since most of my row keys are GUID's
>>> anyways, but very interesting and unexpected (not sure I really mind, was
>>> just taken aback)
>>> 
>>> PS. Not sure I ever minded losing atomic commits to the same row across
>>> CF's, as I never expected it in the first place, having used Cassandra for
>>> more than a year (must have missed that several times in the
>>> documentation).
>>> 
>>> Thanks,
>>> Dean
>>> 
>>> On 1/28/13 12:41 PM, "aaron morton" <aa...@thelastpickle.com> wrote:
>>> 
>>>>> 
>>>>> Another thing that's been confusing me is that when we talk about the
>>>>> data model should the row key be inside or outside a column family?
>>>> My mental model is:
>>>> 
>>>> cluster == database
>>>> keyspace == table
>>>> row == a row in a table
>>>> CF == a family of columns in one row
>>>> 
>>>> (I think that's different to others, but it works for me)
>>>> 
>>>>> Is it important to store rows of different column families that share
>>>>> the same row key to the same node?
>>>> It makes the failure model a little easier to understand, e.g. everything
>>>> keyed by user "amorton" is either available or not.
>>>> 
>>>>> Meanwhile, what's the drawback of setting RPS and RF at column family
>>>>> level?
>>>> Other than it's baked in?
>>>> 
>>>> We process all mutations for a row at the same time. If you write to 4
>>>> CF's with the same row key that is considered one mutation, for one row.
>>>> That one RowMutation is directed to the replicas using the
>>>> ReplicationStrategy and atomically applied to the commit log.
>>>> 
>>>> If you have RS per CF that one mutation would be split into 4, which
>>>> would then be sent to different replicas. Even if they went to the same
>>>> replicas they would be written to the commit log as different mutations.
>>>> 
>>>> So if you have RS per CF you lose atomic commits for writes to the same
>>>> row.
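
The trade-off described above can be sketched with a toy ring in Python. Nothing here is Cassandra's real API: `replicas_for`, the node names, and the `offset` used to mimic a different per-CF placement are all assumptions made for illustration.

```python
import hashlib

NODES = ["node1", "node2", "node3", "node4", "node5", "node6"]  # hypothetical ring


def replicas_for(key, rf, offset=0):
    """Toy placement: hash the key to a ring position, then take the next rf nodes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = (int(digest, 16) + offset) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(rf)]


def route_keyspace_level(key, cfs, rf=3):
    """One strategy per keyspace: one mutation and one replica set for the row."""
    return {tuple(replicas_for(key, rf)): list(cfs)}


def route_cf_level(key, cf_settings):
    """Hypothetical per-CF strategy: the write is split into one mutation per CF."""
    return [(cf, replicas_for(key, rf, offset))
            for cf, (rf, offset) in cf_settings.items()]


# Keyspace-level settings: both CFs travel together to the same replicas.
one = route_keyspace_level("dean", ["Users", "Schedules"])

# Per-CF settings (rf, offset): two separate mutations, possibly different replicas.
split = route_cf_level("dean", {"Users": (3, 0), "Schedules": (2, 1)})
```

With keyspace-level settings the row key alone decides the replicas, so `one` has a single entry covering both CFs. In the per-CF variant each CF is routed on its own, so the write cannot be logged as one unit.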
>>>> 
>>>> Cheers
>>>> 
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> New Zealand
>>>> 
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>> 
>>>> On 28/01/2013, at 11:22 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>> 
>>>>> On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:
>>>>>> The row is the unit of replication: all values with the same storage
>>>>>> engine row key in a KS are on the same nodes. If the settings were per
>>>>>> CF, this would not hold.
>>>>>> 
>>>>>> Not that it would be the end of the world, but that is the first thing
>>>>>> that comes to mind.
>>>>>> 
>>>>>> Cheers
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Cassandra Developer
>>>>>> New Zealand
>>>>>> 
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>> 
>>>>>> On 27/01/2013, at 4:15 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>> 
>>>>>>> Although I've known Cassandra for quite a while, this question
>>>>>>> has only occurred to me recently:
>>>>>>> 
>>>>>>> Why are the replica placement strategy and replication factor set at
>>>>>>> the keyspace level?
>>>>>>> 
>>>>>>> Would setting them at the column family level offer more flexibility?
>>>>>>> 
>>>>>>> Is this because it's easier for a user to manage an application? Or is
>>>>>>> it related to the internal implementation? Or is it just that I've
>>>>>>> overlooked something?
>>>>>> 
>>>>> 
>>>>> Is it important to store rows of different column families that share
>>>>> the same row key on the same node? AFAIK, Cassandra doesn't support
>>>>> getting all of them in a single call.
>>>>> 
>>>>> Meanwhile, what's the drawback of setting RPS and RF at column family
>>>>> level?
>>>>> 
>>>>> Another thing that's been confusing me is that when we talk about the
>>>>> data model should the row key be inside or outside a column family?
>>>>> 
>>>>> Thanks
>>>>> 
>>>> 
>>> 
>> 
> 
> From that wiki page, "mutations against a single key are atomic but not 
> isolated". I think a row mutation is isolated now, but is it across column 
> families? By the way, the wiki page really needs updating.
