Re: Do supercolumns have a purpose?

David Boxenhorn Sun, 13 Feb 2011 00:09:53 -0800

I agree, that is the way to go. Then each piece of new functionality will
not have to be implemented twice.
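
For concreteness, here is a rough sketch of what "compound column names" (Stu's phrase, quoted below) could look like from the client side. It is only an illustration under my own assumptions: plain Python dicts stand in for a row, and the names flatten, unflatten and get_supercolumn are hypothetical, not part of Cassandra or any client library.

```python
def flatten(super_row):
    """Turn {super_name: {sub_name: value}} into {(super_name, sub_name): value}."""
    return {
        (super_name, sub_name): value
        for super_name, subcolumns in super_row.items()
        for sub_name, value in subcolumns.items()
    }


def unflatten(flat_row):
    """Rebuild the two-level view from compound column names."""
    super_row = {}
    for (super_name, sub_name), value in sorted(flat_row.items()):
        super_row.setdefault(super_name, {})[sub_name] = value
    return super_row


def get_supercolumn(flat_row, super_name):
    """Emulate reading one supercolumn by matching the first name component."""
    return {
        sub_name: value
        for (name, sub_name), value in sorted(flat_row.items())
        if name == super_name
    }


# Hypothetical row, used only to exercise the helpers above.
row = {
    "address": {"city": "Springfield", "zip": "00000"},
    "name": {"first": "Alice", "last": "Example"},
}
flat = flatten(row)
```

Because the compound names sort by their first component, all subcolumns of one supercolumn stay adjacent, so the old two-level view can be rebuilt with unflatten(flat) without the storage layer knowing about supercolumns at all.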


On Sat, Feb 12, 2011 at 9:41 AM, Stu Hood <stuh...@gmail.com> wrote:

> I would like to continue to support super columns, but to slowly convert
> them into "compound column names", since that is all they really are.
>
>
> On Thu, Feb 10, 2011 at 10:16 AM, Frank LoVecchio <fr...@isidorey.com> wrote:
>
>> I've found super column families quite useful when using
>> RandomPartitioner on a low-maintenance cluster (as opposed to
>> Byte/Ordered), e.g. returning ordered data from a TimeUUID comparator type;
>> try doing that with one regular column family and secondary indexes (you
>> could obviously sort on the client side, but that is tedious and not logical
>> for older data).
>>
>> On Thu, Feb 10, 2011 at 12:32 AM, David Boxenhorn <da...@lookin2.com> wrote:
>>
>>> Mike, my problem is that I have a database and codebase that already
>>> uses supercolumns. If I had to do it over, it wouldn't use them, for the
>>> reasons you point out. In fact, I have a feeling that over time supercolumns
>>> will become deprecated de facto, if not de jure. That's why I would like to
>>> see them represented internally as regular columns, with an upgrade path for
>>> backward compatibility.
>>>
>>> I would love to do it myself! (I haven't looked at the code base, but I
>>> don't understand why it should be so hard.) But my employer has other
>>> ideas...
>>>
>>>
>>> On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone <m...@simplegeo.com> wrote:
>>>
>>>> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn <da...@lookin2.com> wrote:
>>>>
>>>>> Shaun, I agree with you, but marking them as deprecated is not good
>>>>> enough for me. I can't easily stop using supercolumns. I need an upgrade
>>>>> path.
>>>>>
>>>>
>>>> David,
>>>>
>>>> Cassandra is open source and community developed. The right thing to do
>>>> is what's best for the community, which sometimes conflicts with what's
>>>> best for individual users. Such strife should be minimized, but it will
>>>> never be eliminated. Luckily, because this is an open source, liberally
>>>> licensed project, if you feel strongly about something you should feel
>>>> free to add whatever features you want yourself. I'm sure other people in
>>>> your situation will thank you for it.
>>>>
>>>> At a minimum I think it would behoove you to re-read some of the
>>>> comments here re: why super columns aren't really needed and take another
>>>> look at your data model and code. I would actually be quite surprised to
>>>> find a use of super columns that could not be trivially converted to normal
>>>> columns. In fact, it should be possible to do at the framework/client
>>>> library layer - you probably wouldn't even need to change any application
>>>> code.
>>>>
>>>> Mike
>>>>
>>>> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts <sh...@cuttshome.net> wrote:
>>>>>
>>>>>>
>>>>>> I'm a newbie here, but, with apologies for my presumptuousness, I
>>>>>> think you should deprecate SuperColumns. They are already distracting
>>>>>> you, and as the years go by the cost of supporting them as you add more
>>>>>> and more functionality is only likely to get worse. It would be better
>>>>>> to concentrate on making the "core" column families better (and I'm sure
>>>>>> we can all think of lots of things we'd like).
>>>>>>
>>>>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>>>>> users like David who are currently using them. But if you mark them
>>>>>> clearly as deprecated and explain why and what to do instead (perhaps
>>>>>> putting a bit of effort into migration tools... or even a "virtual"
>>>>>> layer supporting arbitrary hierarchical data), then you can drop them in
>>>>>> a few years (when you get to 1.0, say), without people feeling betrayed.
>>>>>>
>>>>>> -- Shaun
>>>>>>
>>>>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>>>>
>>>>>> "My main point was to say that I think it is better to create
>>>>>> tickets for what you want, rather than for something else completely
>>>>>> different that would, as a by-product, give you what you want."
>>>>>>
>>>>>> Then let me say what I want: I want supercolumn families to have any
>>>>>> feature that regular column families have.
>>>>>>
>>>>>> My data model is full of supercolumns. I used them, even though I knew
>>>>>> I didn't *have to*, "because they were there", which implied to me that
>>>>>> I was supposed to use them for some good reason. Now I suspect that they
>>>>>> will gradually become less and less functional, as features are added to
>>>>>> regular column families and not supported for supercolumn families.
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>>>
>>>>>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <m...@simplegeo.com> wrote:
>>>>>>>
>>>>>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com> wrote:
>>>>>>>>>
>>>>>>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>>>>>>> families.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>>>>>>> supercolumn families and voting on it. This will be one or two orders
>>>>>>>>> of magnitude less work than getting rid of super columns internally,
>>>>>>>>> and probably a much better solution anyway.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I realize that this is largely subjective, and on such matters code
>>>>>>>> speaks louder than words, but I don't think I agree with you on the
>>>>>>>> issue of which alternative is less work, or even which is a better
>>>>>>>> solution.
>>>>>>>>
>>>>>>>
>>>>>>> You are right, I probably put too much emphasis in that sentence. My
>>>>>>> main point was to say that I think it is better to create tickets for
>>>>>>> what you want, rather than for something else completely different that
>>>>>>> would, as a by-product, give you what you want.
>>>>>>> Then I suspect that *if* the only goal is to get secondary indexes on
>>>>>>> super columns, then there is a good chance this would be less work than
>>>>>>> getting rid of super columns. But to be fair, secondary indexes on super
>>>>>>> columns may not make too much sense without #598, which itself would
>>>>>>> require quite some work, so clearly I spoke a bit quickly.
>>>>>>>
>>>>>>>
>>>>>>>> If the goal is to have a hierarchical model, limiting the depth to
>>>>>>>> two seems arbitrary. Why not go all the way and allow an arbitrarily
>>>>>>>> deep hierarchy?
>>>>>>>>
>>>>>>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>>>>>>> impractical, allowing a depth of two seems inconsistent and
>>>>>>>> unnecessary. It's pretty trivial to overlay a hierarchical model on
>>>>>>>> top of the map-of-sorted-maps model that Cassandra implements. Ed Anuff
>>>>>>>> has implemented a custom comparator that does the job [1]. Google's
>>>>>>>> Megastore has a similar architecture and goes even further [2].
>>>>>>>>
>>>>>>>> It seems to me that super columns are a historical artifact from
>>>>>>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>>>>>>> posting lists of messages, sharded by user. So that's what they built.
>>>>>>>> In my dealings with the Cassandra code, super columns end up making a
>>>>>>>> mess all over the place when algorithms need to be special cased and
>>>>>>>> branch based on the column/supercolumn distinction.
>>>>>>>>
>>>>>>>> I won't even mention what it does to the thrift interface.
>>>>>>>>
>>>>>>>
>>>>>>> Actually, I agree with you, more than you know. If I were to start
>>>>>>> coding Cassandra now, I wouldn't include super columns (and I would
>>>>>>> probably not go for an unlimited-depth hierarchical model either). But
>>>>>>> they're there and I'm not sure getting rid of them fully (meaning,
>>>>>>> including in thrift) is an option (it would be a big compatibility
>>>>>>> breakage). And (even though I certainly thought about this more than
>>>>>>> once :)) I'm slightly less enthusiastic about keeping them in thrift but
>>>>>>> encoding them in regular column families internally: it would still be a
>>>>>>> lot of work but we would still probably end up with nasty tricks to
>>>>>>> stick to the thrift api.
>>>>>>>
>>>>>>> --
>>>>>>> Sylvain
>>>>>>>
>>>>>>>
>>>>>>>> Mike
>>>>>>>>
>>>>>>>> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>>>>>>>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Frank LoVecchio
>> Senior Software Engineer | Isidorey, LLC
>> Google Voice +1.720.295.9179
>> isidorey.com | facebook.com/franklovecchio | franklovecchio.com |
>> rodsandricers.com
>>
>>
>
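
Mike's remark in the quoted thread, that a hierarchy of any depth can be overlaid on a map of sorted maps, can be sketched the same way. This is a toy model under my own assumptions, not Ed Anuff's comparator and not Cassandra code: the class SortedRow and its methods are hypothetical, column names are tuples of strings, and Python's built-in tuple ordering stands in for a custom comparator.

```python
import bisect


class SortedRow:
    """One row whose column names are tuples kept in sorted order."""

    def __init__(self):
        self._names = []   # sorted list of tuple column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        """Add or overwrite a column, keeping the names sorted."""
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice_prefix(self, prefix):
        """Return (name, value) pairs whose name starts with prefix, in order."""
        start = bisect.bisect_left(self._names, prefix)
        result = []
        for name in self._names[start:]:
            if name[:len(prefix)] != prefix:
                break  # sorted order means no later name can match
            result.append((name, self._values[name]))
        return result


# Hypothetical three-level hierarchy: folder / user / message id.
inbox = SortedRow()
inbox.insert(("inbox", "bob", "msg1"), "hey")
inbox.insert(("inbox", "alice", "msg2"), "hello")
inbox.insert(("sent", "alice", "msg1"), "bye")
inbox.insert(("inbox", "alice", "msg1"), "hi")

alice_inbox = inbox.slice_prefix(("inbox", "alice"))
whole_inbox = inbox.slice_prefix(("inbox",))
```

A prefix of length one behaves like a supercolumn read, and longer prefixes go deeper with no extra machinery, which is why a fixed depth of two looks arbitrary once names are compound.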
