Please realize that I do not make any decisions here and I am not part of the 
core Cassandra developer team.

What has been said before is that supercolumns will most likely go away and, 
at least under the hood, be replaced by composite columns.
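
To make the composite column idea concrete, here is a minimal sketch in 
plain Python. It is a toy model, not Cassandra code: flatten() and 
read_group() are hypothetical names, and the sorted list only stands in for 
a row whose column names are (supercolumn, subcolumn) pairs.

```python
from bisect import bisect_left


def flatten(row):
    """Turn {supercolumn: {subcolumn: value}} into a sorted list of
    ((supercolumn, subcolumn), value) pairs, i.e. composite-style names."""
    return sorted(
        ((sc, sub), value)
        for sc, subs in row.items()
        for sub, value in subs.items()
    )


def read_group(columns, sc):
    """Return every subcolumn whose composite name starts with `sc`."""
    names = [name for name, _ in columns]
    # (sc,) sorts before every (sc, subcolumn), so this finds the group start.
    start = bisect_left(names, (sc,))
    group = {}
    for name, value in columns[start:]:
        if name[0] != sc:
            break  # left the contiguous range belonging to this group
        group[name[1]] = value
    return group


# Hypothetical example row with two "supercolumns".
row = {
    "address": {"city": "v1", "zip": "v2"},
    "name": {"first": "v3"},
}
columns = flatten(row)
address = read_group(columns, "address")
```

Because the names are sorted, one former supercolumn becomes one contiguous 
range of columns, which is why a single group can be read as a slice.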

Jonathan has, however, stated that he would like the supercolumn 
API/abstraction to remain, at least for backwards compatibility.

Please understand that under the hood, supercolumns are merely groups of 
columns serialized as a single block of data. 
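
As a rough illustration of "a group of columns serialized as a single 
block", here is a small Python sketch. The length-prefixed layout and the 
function names are assumptions made up for this example; this is not 
Cassandra's actual on-disk format.

```python
import struct


def pack_group(columns):
    """Serialize {name: value} (both bytes) into one length-prefixed block."""
    out = [struct.pack(">I", len(columns))]  # number of columns in the group
    for name, value in sorted(columns.items()):
        out.append(struct.pack(">H", len(name)) + name)
        out.append(struct.pack(">I", len(value)) + value)
    return b"".join(out)


def unpack_group(block):
    """Decode a block written by pack_group back into a dict."""
    (count,) = struct.unpack_from(">I", block, 0)
    pos = 4
    columns = {}
    for _ in range(count):
        (name_len,) = struct.unpack_from(">H", block, pos)
        pos += 2
        name = block[pos:pos + name_len]
        pos += name_len
        (value_len,) = struct.unpack_from(">I", block, pos)
        pos += 4
        columns[name] = block[pos:pos + value_len]
        pos += value_len
    return columns


def read_subcolumn(block, name):
    """In this toy layout, reading one subcolumn decodes the whole group."""
    return unpack_group(block).get(name)


block = pack_group({b"a": b"1", b"b": b"22"})
value = read_subcolumn(block, b"b")
```

In this model the whole group arrives in one read, but even a single 
subcolumn lookup pays for decoding every column in the block.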


The fact that there is a specialized, hardcoded way to serialize these 
column groups into supercolumns is a problem, however, and supercolumns as a 
special case should probably go away to make room for a more generic 
implementation that allows more flexible data structures and needs less code 
specific to one special data structure.

Today there is a lot of extra code to deal with the slight differences in 
serialization and features between supercolumns and columns, and hopefully 
most of that could go away if things were structured a bit differently.

I also hope that we keep APIs that allow simple access to groups of key/value 
pairs, since working with just columns can add a lot of application code that 
should not be needed.

If you almost always need all or mostly all of the columns in a supercolumn, 
and you normally update all of them at the same time, they will most likely be 
faster than normal columns.

Processing-wise, you will actually do a bit more work on 
serialization/deserialization of SCs, but the I/O will usually be better 
grouped and require fewer operations.

I think we ran some benchmarks on some heavy use cases with ~30 small columns 
per SC some time back, and we ended up with SCs being 10-20% faster.


Terje

On Jan 5, 2012, at 2:37 PM, Aklin_81 wrote:

> I have seen supercolumn usage discouraged most of the time.
> However sometimes the supercolumns seem to fit the scenario most
> appropriately not only in terms of how the data is stored but also in
> terms of how it is retrieved. Some of the queries supported by SCs are
> uniquely capable of doing the task which no other alternative schema
> could do. (Like recently I asked about getting the equivalent of
> retrieving a list of (full)supercolumns by name, through use of
> composite columns, unfortunately there was no way to do this without
> reading lots of extra columns).
> 
> So I am really confused whether:
> 
> 1. Should I really not use supercolumns in any case at all,
> however appropriate, or do I just need to be careful to make sure
> that supercolumns really fit my use case?
> 
> 2. Are there any performance concerns with supercolumns even in the
> cases where they are used most appropriately? Like when you need to
> retrieve entire supercolumns every time and the max. number of
> subcolumns varies between 0 and 10.
> (I don't write all the subcolumns inside supercolumn, at once though!
> Does this also matter?)
> 
> 3. What is their future? Are they going to be deprecated, or maybe
> enhanced later?
