"If you write to 4 CF's with the same row key that is considered one mutation"
Hmmmmm, I never considered this, and never knew it either (very unintuitive from a user perspective, IMHO). So if I write to CF Users with rowkey="dean" and to CF Schedules with rowkey="dean", it is actually one row? (It's so unintuitive that I had to ask to make sure I am reading that correctly.) I guess I don't really have that case, since most of my row keys are GUIDs anyway, but it is very interesting and unexpected (not sure I really mind, I was just taken aback).

P.S. Not sure I would ever have minded losing atomic commits to the same row across CFs, as I never expected them in the first place, having used Cassandra for more than a year (I must have missed that several times in the documentation).

Thanks,
Dean

On 1/28/13 12:41 PM, "aaron morton" <aa...@thelastpickle.com> wrote:

>>
>> Another thing that's been confusing me is that when we talk about the
>>data model should the row key be inside or outside a column family?
>My mental model is:
>
>cluster == database
>keyspace == table
>row == a row in a table
>CF == a family of columns in one row
>
>(I think that's different to others, but it works for me)
>
>> Is it important to store rows of different column families that share
>>the same row key to the same node?
>Makes the failure models a little easier to understand. e.g. Everything
>key for user "amorton" is either available or not.
>
>> Meanwhile, what's the drawback of setting RPS and RF at column family
>>level?
>Other than it's baked in?
>
>We process all mutations for a row at the same time. If you write to 4
>CF's with the same row key that is considered one mutation, for one row.
>That one RowMutation is directed to the replicas using the
>ReplicationStratagy and atomically applied to the commit log.
>
>If you have RS per CF that one mutation would be split into 4, which
>would then be sent to different replicas. Even if they went to the same
>replicas they would be written to the commit log as different mutations.
>
>So if you have RS per CF you lose atomic commits for writes to the same
>row.
>
>Cheers
>
>-----------------
>Aaron Morton
>Freelance Cassandra Developer
>New Zealand
>
>@aaronmorton
>http://www.thelastpickle.com
>
>On 28/01/2013, at 11:22 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>
>> On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:
>>> The row is the unit of replication, all values with the same storage
>>>engine row key in a KS are on the same nodes. if they were per CF this
>>>would not hold.
>>>
>>> Not that it would be the end of the world, but that is the first thing
>>>that comes to mind.
>>>
>>> Cheers
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 27/01/2013, at 4:15 PM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>
>>>> Although I've got to know Cassandra for quite a while, this question
>>>>only has occurred to me recently:
>>>>
>>>> Why are the replica placement strategy and replica factors set at the
>>>>keyspace level?
>>>>
>>>> Would setting them at the column family level offers more flexibility?
>>>>
>>>> Is this because it's easier for user to manage an application? Or
>>>>related to internal implementation? Or it's just that I've overlooked
>>>>something?
>>>
>>
>> Is it important to store rows of different column families that share
>>the same row key to the same node? AFAIK, Cassandra doesn't support get
>>all of them in a single call.
>>
>> Meanwhile, what's the drawback of setting RPS and RF at column family
>>level?
>>
>> Another thing that's been confusing me is that when we talk about the
>>data model should the row key be inside or outside a column family?
>>
>> Thanks
>>
>
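
For anyone following along, here is a rough Python sketch of the grouping described in the quoted message: writes to several CFs that share a row key become one mutation, whereas a per-CF replication strategy would split them into one mutation per CF. This is only a toy model under my own assumptions. The function names and the tuple layout are hypothetical and are not Cassandra's API, and nothing here models the commit log or replica selection.

```python
from collections import defaultdict

def group_by_row(writes):
    """Keyspace-level model: one mutation per row key, spanning all CFs."""
    mutations = defaultdict(dict)
    for cf, row_key, column, value in writes:
        # All CFs written under the same row key land in the same mutation.
        mutations[row_key].setdefault(cf, {})[column] = value
    return dict(mutations)

def group_by_row_and_cf(writes):
    """Hypothetical per-CF model: one mutation per (row key, CF) pair."""
    mutations = defaultdict(dict)
    for cf, row_key, column, value in writes:
        # Each CF gets its own mutation, so the write is no longer one unit.
        mutations[(row_key, cf)][column] = value
    return dict(mutations)

# Illustrative writes only: (column family, row key, column, value).
writes = [
    ("Users", "dean", "email", "dean@example.com"),
    ("Schedules", "dean", "monday", "standup"),
]

per_keyspace = group_by_row(writes)
per_cf = group_by_row_and_cf(writes)
```

In this toy model `per_keyspace` holds a single entry for "dean" covering both CFs, while `per_cf` holds two separate entries, which is the split that would cost the atomic commit.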