> I think a row mutation is isolated now, but is it across column families?

Correct, they are isolated, but only for an individual CF.

> By the way, the wiki page really needs updating.

You can update it if you would like to.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 30/01/2013, at 12:33 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:

> On Tue 29 Jan 2013 03:39:17 PM CST, aaron morton wrote:
>>
>>> So if I write to CF Users with rowkey="dean"
>>> and to CF Schedules with rowkey="dean", it is actually one row?
>> In my mental model that's correct.
>> A RowMutation is a row key and a collection of (internal) ColumnFamilies,
>> each of which contains the columns to write for a single CF.
>>
>> This is the thing that is committed to the log, and then the changes in the
>> ColumnFamilies are applied to each CF in an isolated way.
>>
>>> (must have missed that several times in the
>>> documentation).
>> http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 29/01/2013, at 9:28 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
>>
>>> "If you write to 4 CF's with the same row key that is considered one
>>> mutation"
>>>
>>> Hmmmmm, I never considered this, never knew either (very un-intuitive from
>>> a user perspective IMHO). So if I write to CF Users with rowkey="dean"
>>> and to CF Schedules with rowkey="dean", it is actually one row? (It's so
>>> un-intuitive that I had to ask to make sure I am reading that correctly.)
>>>
>>> I guess I really don't have that case since most of my row keys are GUIDs
>>> anyway, but very interesting and unexpected (not sure I really mind, was
>>> just taken aback).
>>>
>>> PS. Not sure I ever minded losing atomic commits to the same row across
>>> CF's, as I never expected it in the first place, having used Cassandra for
>>> more than a year (must have missed that several times in the
>>> documentation).
>>>
>>> Thanks,
>>> Dean
>>>
>>> On 1/28/13 12:41 PM, "aaron morton" <aa...@thelastpickle.com> wrote:
>>>
>>>>>
>>>>> Another thing that's been confusing me is that when we talk about the
>>>>> data model should the row key be inside or outside a column family?
>>>> My mental model is:
>>>>
>>>> cluster == database
>>>> keyspace == table
>>>> row == a row in a table
>>>> CF == a family of columns in one row
>>>>
>>>> (I think that's different to others, but it works for me)
>>>>
>>>>> Is it important to store rows of different column families that share
>>>>> the same row key on the same node?
>>>> Makes the failure models a little easier to understand, e.g. everything
>>>> keyed to user "amorton" is either available or not.
>>>>
>>>>> Meanwhile, what's the drawback of setting RPS and RF at column family
>>>>> level?
>>>> Other than it's baked in?
>>>>
>>>> We process all mutations for a row at the same time. If you write to 4
>>>> CF's with the same row key, that is considered one mutation, for one row.
>>>> That one RowMutation is directed to the replicas using the
>>>> ReplicationStrategy and atomically applied to the commit log.
>>>>
>>>> If you have RS per CF, that one mutation would be split into 4, which
>>>> would then be sent to different replicas. Even if they went to the same
>>>> replicas, they would be written to the commit log as different mutations.
>>>>
>>>> So if you have RS per CF you lose atomic commits for writes to the same
>>>> row.
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Developer
>>>> New Zealand
>>>>
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 28/01/2013, at 11:22 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>
>>>>> On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:
>>>>>> The row is the unit of replication: all values with the same storage
>>>>>> engine row key in a KS are on the same nodes. If they were per CF this
>>>>>> would not hold.
>>>>>>
>>>>>> Not that it would be the end of the world, but that is the first thing
>>>>>> that comes to mind.
>>>>>>
>>>>>> Cheers
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Cassandra Developer
>>>>>> New Zealand
>>>>>>
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>>
>>>>>> On 27/01/2013, at 4:15 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>>>
>>>>>>> Although I've known Cassandra for quite a while, this question
>>>>>>> has only occurred to me recently:
>>>>>>>
>>>>>>> Why are the replica placement strategy and replication factor set at the
>>>>>>> keyspace level?
>>>>>>>
>>>>>>> Would setting them at the column family level offer more flexibility?
>>>>>>>
>>>>>>> Is this because it's easier for users to manage an application? Or is it
>>>>>>> related to the internal implementation? Or is it just that I've overlooked
>>>>>>> something?
>>>>>>
>>>>>
>>>>> Is it important to store rows of different column families that share
>>>>> the same row key on the same node? AFAIK, Cassandra doesn't support getting
>>>>> all of them in a single call.
>>>>>
>>>>> Meanwhile, what's the drawback of setting RPS and RF at column family
>>>>> level?
>>>>>
>>>>> Another thing that's been confusing me is that when we talk about the
>>>>> data model should the row key be inside or outside a column family?
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>
>>
>
> From that wiki page, "mutations against a single key are atomic but not
> isolated". I think a row mutation is isolated now, but is it across column
> families? By the way, the wiki page really needs updating.
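
For anyone reading the thread later, the RowMutation point can be illustrated with a small toy model in Python. This is only a sketch of the mental model described above, not Cassandra code: the names ToyKeyspace and split_per_cf, and the column names and values, are made up for illustration, and the real commit log and storage engine are far more involved. It shows one mutation for row key "dean" touching two CFs being logged as a single entry, and how the same write would become two separate log entries if it had to be split per CF.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RowMutation:
    """Toy stand-in: one row key plus the columns to write, grouped by CF."""
    row_key: str
    # CF name -> {column name: value}
    changes: dict = field(default_factory=dict)

    def add(self, cf, column, value):
        self.changes.setdefault(cf, {})[column] = value


class ToyKeyspace:
    """Hypothetical keyspace model: one commit log, one store per CF."""

    def __init__(self):
        self.commit_log = []             # one entry per mutation
        self.stores = defaultdict(dict)  # CF name -> {row key: {column: value}}

    def apply(self, mutation):
        # The whole mutation goes into the log as a single entry ...
        self.commit_log.append(mutation)
        # ... and then the changes are applied to each CF separately.
        for cf, columns in mutation.changes.items():
            self.stores[cf].setdefault(mutation.row_key, {}).update(columns)


def split_per_cf(mutation):
    """Assumed effect of per-CF replication settings: one mutation per CF."""
    return [RowMutation(mutation.row_key, {cf: dict(cols)})
            for cf, cols in mutation.changes.items()]


m = RowMutation("dean")
m.add("Users", "email", "dean@example.com")
m.add("Schedules", "monday", "standup")

ks = ToyKeyspace()
ks.apply(m)
print(len(ks.commit_log))      # 1: a single log entry covers both CFs

per_cf = ToyKeyspace()
for part in split_per_cf(m):
    per_cf.apply(part)
print(len(per_cf.commit_log))  # 2: one log entry per CF
```

The stored data ends up the same in both cases; what differs is that the split version no longer has a single log entry covering the whole row, which is the loss of atomicity Aaron describes.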