Thanks, Sylvain! Can I vote for internally implementing supercolumn families as regular column families? (With a smooth upgrade process that doesn't require shutting down a live cluster.)

What if supercolumn families were supported as regular column families + an index (on what used to be supercolumn keys)? Would that solve some problems?

On Thu, Feb 3, 2011 at 2:00 PM, Sylvain Lebresne <sylv...@datastax.com> wrote:

> > Is there any advantage to using supercolumns
> > (columnFamilyName[superColumnName[columnName[val]]]) instead of regular
> > columns with concatenated keys
> > (columnFamilyName[superColumnName@columnName[val]])?
> >
> > When I designed my data model, I used supercolumns wherever I needed two
> > levels of key depth - just because they were there, and I figured that
> > they must be there for a reason.
> >
> > Now I see that in 0.7 secondary indexes don't work on supercolumns or
> > subcolumns (is that right?), which seems to me like a very serious
> > limitation of supercolumn families.
> >
> > It raises the question: Is there anything that supercolumn families are
> > good for?
>
> There is a bunch of queries that you cannot do (or can do only less
> conveniently) if you encode super columns using regular columns with
> concatenated keys:
>
> 1) If you use regular columns with concatenated keys, the count argument
> counts simple columns. With super columns it counts super columns. It
> means that you can't do "give me the 10 first super columns of this row".
>
> 2) If you need to get x super columns by name, you'll have to issue x
> get_slice queries (one for each super column). On the client side it
> sucks. Internally in Cassandra we could do it reasonably well though.
>
> 3) You cannot remove entire super columns since there is no support for
> range deletions.
>
> Moreover, the encoding with concatenated keys uses more disk space (and
> less disk used for the same information means fewer things to read, so it
> may have a slight impact on read performance too -- it's probably really
> slight for most usage, but nevertheless).
>
> > And here's a related question: Why can't Cassandra implement supercolumn
> > families as regular column families, internally, and give you that
> > functionality?
>
> For 1) and 2) above, we could deal with those internally fairly easily, I
> think, and rather well (which means it wouldn't be much worse
> performance-wise than with the actual implementation of super columns,
> not that it would be better). For 3), range deletes are harder and would
> require more significant changes (that doesn't mean that Cassandra will
> never have it). Even without that, there would be the disk space lost.
>
> --
> Sylvain
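
For anyone following along, here is a rough sketch of points 1) and 3) from the quoted message, in plain Python. It does not talk to Cassandra at all: the row is modelled as a sorted list of (name, value) pairs, the "@" separator is the one from the original question, and every function name below is made up for illustration. It assumes supercolumn names never contain the separator.

```python
SEP = "@"  # separator from the question above; assumed absent from names


def build_row(supercolumns):
    """Flatten {super: {col: val}} into sorted (super@col, val) pairs."""
    flat = {
        f"{s}{SEP}{c}": v
        for s, cols in supercolumns.items()
        for c, v in cols.items()
    }
    return sorted(flat.items())


def slice_columns(row, count):
    """Toy stand-in for a slice with a count: it counts simple columns."""
    return row[:count]


def first_supercolumns(row, n):
    """Point 1: 'first n supercolumns' needs grouping on top of the slice."""
    result = {}
    for name, value in row:
        s, c = name.split(SEP, 1)
        if s not in result and len(result) == n:
            break  # the (n+1)-th supercolumn has started
        result.setdefault(s, {})[c] = value
    return result


def remove_supercolumn(row, super_name):
    """Point 3: with no range delete, every subcolumn is dropped one by one."""
    prefix = super_name + SEP
    return [(name, value) for name, value in row if not name.startswith(prefix)]


data = {"alice": {"age": "30", "city": "Paris"}, "bob": {"age": "25"}}
row = build_row(data)

# count=2 returns two simple columns, both inside the "alice" supercolumn
two_columns = slice_columns(row, 2)
# grouping is needed to really get the first two supercolumns
two_supers = first_supercolumns(row, 2)
# removing "alice" touches each of its subcolumns
without_alice = remove_supercolumn(row, "alice")
```

In this toy model the grouping and the per-column removal happen on the client side, which is the inconvenience Sylvain describes; doing the same work inside Cassandra is what the "regular column families internally" idea would amount to.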