On Thu, Mar 31, 2011 at 3:52 AM, T Akhayo <t.akh...@gmail.com> wrote:
> Hi Aaron,
>
> Thank you for your reply, I appreciate the suggestions you made.
>
> Yesterday I managed to get everything (our main read) in one CF, with the
> use of a structure in a value like you suggested.
>
> Designing a new data model is different from what I'm used to, but if you
> keep in mind that you are designing for performance instead of flexibility,
> then everything gets a bit easier.
>
> Kind regards,
> T. Akhayo
>
> 2011/3/30 aaron morton <aa...@thelastpickle.com>
>>
>> I would go with the solution that means you only have to make one request
>> to serve your reads, so consider the super CF approach.
>> There are some downsides to super columns,
>> see http://wiki.apache.org/cassandra/CassandraLimitations, and they tend to
>> have a love-them-hate-them reputation.
>> One thing to consider is that you do not need to model every attribute of
>> your entity as a column in Cassandra, especially if you are always going to
>> pull back all the attributes. So you could do your super CF approach with a
>> standard CF: just pack the columns into some sort of structure such as JSON
>> and store them as a blob.
>> Or you can use a naming scheme in the column names with a standard CF,
>> e.g. uuid1.text and uuid2.text
>> Hope that helps.
>> Aaron
>> On 30 Mar 2011, at 01:05, T Akhayo wrote:
>>
>> Good afternoon,
>>
>> I'm making my data model from scratch for Cassandra; this means I can tune
>> and fine-tune it for performance.
>>
>> At this time I'm having problems choosing between 2 column families or 1
>> super column family. I will illustrate with an example.
>>
>> Sector: this defines a place and has one or two properties.
>> Entry: an entry that is bound to a sector; this is simply some text and a
>> few properties.
>>
>> I can model this with a super column family:
>>
>> sectors{ //super column family
>>   sector1{
>>     uid1{
>>       text: a text
>>       user: joop
>>     }
>>     uid2{
>>       text: more text
>>       user: piet
>>     }
>>   }
>>   sector2{
>>     uid10{
>>       text: even more text
>>       user: marie
>>     }
>>   }
>> }
>>
>> But I can also model this with 2 column families:
>>
>> sectors{ // column family
>>   sector1{
>>     textid1: null
>>     textid2: null
>>   }
>>   sector2{
>>     textid4: null
>>   }
>> }
>>
>> texts{ //column family
>>   textid1{
>>     text: a text
>>     user: joop
>>   }
>>   textid2{
>>     text: more text
>>     user: piet
>>   }
>> }
>>
>> With the super column family I can retrieve a list of texts for a specific
>> sector with only 1 request to Cassandra.
>>
>> With the 2 column families I need to send 2 requests to Cassandra:
>> 1. give me all textids from sector x. (returns x, y, z)
>> 2. give me all texts that have id x, y, z.
>>
>> In my final application it is likely that there will be somewhat more
>> writes than reads.
>>
>> I was wondering what the best approach is when it comes to performance. I
>> suspect that using super column families is slower compared to using column
>> families, but is it still slower when using 2 column families and 2
>> requests to Cassandra instead of 1 (with the super column family)?
>>
>> Kind regards,
>> T. Akhayo
>>
>
>
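To make the trade-off in the question concrete, here is a minimal Python sketch that models both layouts as plain dictionaries and counts simulated round trips. `FakeStore`, `get_row`, `multiget` and the `read_sector_*` helpers are hypothetical names, not a Cassandra client API, and a real cluster would return columns sorted by the CF's comparator rather than in dict insertion order.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for Cassandra column families; this is
# not a Cassandra client, it only counts simulated round trips.
Columns = Dict[str, str]
SuperCF = Dict[str, Dict[str, Columns]]  # row key -> super column -> columns
StandardCF = Dict[str, Columns]          # row key -> column name -> value


class FakeStore:
    """Counts simulated requests so the two read paths can be compared."""

    def __init__(self) -> None:
        self.requests = 0

    def get_row(self, cf: dict, key: str) -> dict:
        # One request: fetch all columns of a single row.
        self.requests += 1
        return cf.get(key, {})

    def multiget(self, cf: StandardCF, keys: List[str]) -> StandardCF:
        # One request: fetch several rows by key in a single call.
        self.requests += 1
        return {k: cf[k] for k in keys if k in cf}


def read_sector_super(store: FakeStore, sectors: SuperCF,
                      sector: str) -> List[Tuple[str, Columns]]:
    # Super CF layout: the sector row already holds every entry.
    return list(store.get_row(sectors, sector).items())


def read_sector_two_cf(store: FakeStore, sectors: StandardCF,
                       texts: StandardCF,
                       sector: str) -> List[Tuple[str, Columns]]:
    # Request 1: the index row in `sectors` lists the text ids (values unused).
    ids = list(store.get_row(sectors, sector).keys())
    # Request 2: fetch all referenced texts in one multiget.
    rows = store.multiget(texts, ids)
    # Ids with no matching row in `texts` are skipped.
    return [(tid, rows[tid]) for tid in ids if tid in rows]


# Sample data mirroring the example in the thread; "" stands in for null.
SUPER_SECTORS: SuperCF = {
    "sector1": {
        "uid1": {"text": "a text", "user": "joop"},
        "uid2": {"text": "more text", "user": "piet"},
    },
    "sector2": {"uid10": {"text": "even more text", "user": "marie"}},
}
SECTORS: StandardCF = {
    "sector1": {"textid1": "", "textid2": ""},
    "sector2": {"textid4": ""},
}
TEXTS: StandardCF = {
    "textid1": {"text": "a text", "user": "joop"},
    "textid2": {"text": "more text", "user": "piet"},
}

if __name__ == "__main__":
    one = FakeStore()
    print(read_sector_super(one, SUPER_SECTORS, "sector1"), one.requests)
    two = FakeStore()
    print(read_sector_two_cf(two, SECTORS, TEXTS, "sector1"), two.requests)
```

The sketch only captures the number of round trips (1 versus 2); it does not model the cost of each request, such as the super column limitations linked in the thread.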
I decided to write this up as a general guide to the question of whether or not to denormalize data into multiple CFs:
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/whytf_would_i_need_with
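Aaron's two alternatives to a super CF quoted above (packing an entry's attributes into a structure such as JSON stored as a blob, or encoding the entry id into column names like uuid1.text) both keep a sector's read in one standard-CF row. The following is a minimal Python sketch, assuming each sector is one row whose columns are held in a dict; it only shows how the columns would be encoded and decoded, and `pack_json`, `unpack_json`, `pack_named` and `unpack_named` are hypothetical helpers, not part of any Cassandra client.

```python
import json
from typing import Dict

Entries = Dict[str, Dict[str, str]]  # entry uid -> attribute -> value


def pack_json(entries: Entries) -> Dict[str, bytes]:
    """One column per entry; the value is the entry's attributes as JSON."""
    return {uid: json.dumps(attrs, sort_keys=True).encode("utf-8")
            for uid, attrs in entries.items()}


def unpack_json(row: Dict[str, bytes]) -> Entries:
    """Decode each column value back into an attribute dict."""
    return {uid: json.loads(blob.decode("utf-8")) for uid, blob in row.items()}


def pack_named(entries: Entries, sep: str = ".") -> Dict[str, str]:
    """One column per attribute, named '<uid><sep><attr>' (e.g. 'uid1.text')."""
    return {f"{uid}{sep}{attr}": value
            for uid, attrs in entries.items()
            for attr, value in attrs.items()}


def unpack_named(row: Dict[str, str], sep: str = ".") -> Entries:
    """Group '<uid><sep><attr>' columns back into per-entry dicts."""
    entries: Entries = {}
    for name, value in row.items():
        # Split on the first separator; assumes uids never contain `sep`.
        uid, attr = name.split(sep, 1)
        entries.setdefault(uid, {})[attr] = value
    return entries


if __name__ == "__main__":
    sector1 = {
        "uid1": {"text": "a text", "user": "joop"},
        "uid2": {"text": "more text", "user": "piet"},
    }
    json_row = pack_json(sector1)
    named_row = pack_named(sector1)
    print(sorted(json_row), unpack_json(json_row) == sector1)
    print(sorted(named_row), unpack_named(named_row) == sector1)
```

One design consequence worth weighing, given that the application expects somewhat more writes than reads: with the JSON variant each entry is a single column value, so changing one attribute means rewriting the whole blob, while the naming scheme keeps each attribute individually writable. With a lexical comparator, columns sharing a "uid1." prefix should also sort together in the row.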