Mike, my problem is that I have a database and codebase that already use supercolumns. If I had to do it over, I wouldn't use them, for the reasons you point out. In fact, I have a feeling that over time supercolumns will become deprecated de facto, if not de jure. That's why I would like to see them represented internally as regular columns, with an upgrade path for backward compatibility.
I would love to do it myself! (I haven't looked at the code base, but I don't understand why it should be so hard.) But my employer has other ideas...

On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone <m...@simplegeo.com> wrote:

> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn <da...@lookin2.com> wrote:
>
>> Shaun, I agree with you, but marking them as deprecated is not good enough
>> for me. I can't easily stop using supercolumns. I need an upgrade path.
>>
>
> David,
>
> Cassandra is open source and community developed. The right thing to do is
> what's best for the community, which sometimes conflicts with what's best
> for individual users. Such strife should be minimized, it will never be
> eliminated. Luckily, because this is an open source, liberal licensed
> project, if you feel strongly about something you should feel free to add
> whatever features you want yourself. I'm sure other people in your situation
> will thank you for it.
>
> At a minimum I think it would behoove you to re-read some of the comments
> here re: why super columns aren't really needed and take another look at
> your data model and code. I would actually be quite surprised to find a use
> of super columns that could not be trivially converted to normal columns. In
> fact, it should be possible to do at the framework/client library layer -
> you probably wouldn't even need to change any application code.
>
> Mike
>
> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts <sh...@cuttshome.net> wrote:
>>
>>>
>>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>>> you should deprecate SuperColumns. They are already distracting you, and as
>>> the years go by the cost of supporting them as you add more and more
>>> functionality is only likely to get worse. It would be better to concentrate
>>> on making the "core" column families better (and I'm sure we can all think
>>> of lots of things we'd like).
>>>
>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>> users like David who are currently using them. But if you mark them clearly
>>> as deprecated and explain why and what to do instead (perhaps putting a bit
>>> of effort into migration tools... or even a "virtual" layer supporting
>>> arbitrary hierarchical data), then you can drop them in a few years (when
>>> you get to 1.0, say), without people feeling betrayed.
>>>
>>> -- Shaun
>>>
>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>
>>> "My main point was to say that I think it is better to create tickets
>>> for what you want, rather than for something else completely different that
>>> would, as a by-product, give you what you want."
>>>
>>> Then let me say what I want: I want supercolumn families to have any
>>> feature that regular column families have.
>>>
>>> My data model is full of supercolumns. I used them, even though I knew I
>>> didn't *have to*, "because they were there", which implied to me that I was
>>> supposed to use them for some good reason. Now I suspect that they will
>>> gradually become less and less functional, as features are added to regular
>>> column families and not supported for supercolumn families.
>>>
>>>
>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>
>>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone <m...@simplegeo.com> wrote:
>>>>
>>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne <sylv...@datastax.com> wrote:
>>>>>
>>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn <da...@lookin2.com> wrote:
>>>>>>
>>>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>>>> families.
>>>>>>>
>>>>>>
>>>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>>>>> magnitude less work than getting rid of super columns internally, and
>>>>>> probably a much better solution anyway.
>>>>>>
>>>>>
>>>>> I realize that this is largely subjective, and on such matters code
>>>>> speaks louder than words, but I don't think I agree with you on the issue
>>>>> of which alternative is less work, or even which is a better solution.
>>>>>
>>>>
>>>> You are right, I probably put too much emphasis on that sentence. My main
>>>> point was to say that I think it is better to create tickets for what you
>>>> want, rather than for something else completely different that would, as a
>>>> by-product, give you what you want.
>>>> Then I suspect that *if* the only goal is to get secondary indexes on
>>>> super columns, then there is a good chance this would be less work than
>>>> getting rid of super columns. But to be fair, secondary indexes on super
>>>> columns may not make too much sense without #598, which itself would
>>>> require quite some work, so clearly I spoke a bit quickly.
>>>>
>>>>
>>>>> If the goal is to have a hierarchical model, limiting the depth to two
>>>>> seems arbitrary. Why not go all the way and allow an arbitrarily deep
>>>>> hierarchy?
>>>>>
>>>>> If a more sophisticated hierarchical model is deemed unnecessary, or
>>>>> impractical, allowing a depth of two seems inconsistent and
>>>>> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
>>>>> the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
>>>>> implemented a custom comparator that does the job [1]. Google's Megastore
>>>>> has a similar architecture and goes even further [2].
>>>>>
>>>>> It seems to me that super columns are a historical artifact from
>>>>> Cassandra's early life as Facebook's inbox storage system. They needed
>>>>> posting lists of messages, sharded by user. So that's what they built. In
>>>>> my dealings with the Cassandra code, super columns end up making a mess
>>>>> all over the place when algorithms need to be special cased and branch
>>>>> based on the column/supercolumn distinction.
>>>>>
>>>>> I won't even mention what it does to the thrift interface.
>>>>>
>>>>
>>>> Actually, I agree with you, more than you know. If I were to start
>>>> coding Cassandra now, I wouldn't include super columns (and I would
>>>> probably not go for a depth unlimited hierarchical model either). But
>>>> it's there and I'm not sure getting rid of them fully (meaning, including
>>>> in thrift) is an option (it would be a big compatibility breakage). And
>>>> (even though I certainly thought about this more than once :)) I'm slightly
>>>> less enthusiastic about keeping them in thrift but encoding them in regular
>>>> column families internally: it would still be a lot of work but we would
>>>> still probably end up with nasty tricks to stick to the thrift api.
>>>>
>>>> --
>>>> Sylvain
>>>>
>>>>
>>>>> Mike
>>>>>
>>>>> [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
>>>>> [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
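
To make the "convert super columns to normal columns at the client library layer" idea from this thread concrete, here is a minimal Python sketch. It is not Cassandra or Thrift code, and it is only one possible approach: the function names (flatten_super_row, slice_super_column, rebuild_super_row) are hypothetical, plain dicts stand in for rows, and Python's built-in tuple ordering stands in for whatever custom comparator would sort composite column names on the server.

```python
def flatten_super_row(super_row):
    """Encode {super_name: {sub_name: value}} as regular columns.

    Each regular column is named by the pair (super_name, sub_name).
    Sorting by that pair keeps all subcolumns of one supercolumn
    adjacent, which is the property a composite comparator would give.
    """
    flat = {}
    for super_name, subcolumns in super_row.items():
        for sub_name, value in subcolumns.items():
            flat[(super_name, sub_name)] = value
    return sorted(flat.items())


def slice_super_column(flat_row, super_name):
    """Read back one 'supercolumn' from the flattened row."""
    return {name[1]: value for name, value in flat_row if name[0] == super_name}


def rebuild_super_row(flat_row):
    """Decode the flattened row back into the nested supercolumn shape."""
    row = {}
    for (super_name, sub_name), value in flat_row:
        row.setdefault(super_name, {})[sub_name] = value
    return row


# Hypothetical example data: two messages stored as supercolumns.
row = {"msg2": {"from": "bob", "body": "hi"}, "msg1": {"from": "alice"}}
flat = flatten_super_row(row)
```

Because the encoding round-trips (rebuild_super_row(flatten_super_row(row)) gives back the original nested dict), a client library could in principle keep exposing the supercolumn shape to application code while storing only regular columns. An upgrade path for existing data, which is what David is asking for, is a separate problem this sketch does not address.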