Re: Is SuperColumn necessary?

Mike Malone Mon, 10 May 2010 14:20:50 -0700

On Mon, May 10, 2010 at 1:38 PM, AJ Chen <[email protected]> wrote:


> Could someone confirm this discussion is not about abandoning supercolumn
> family? I have found modeling data with supercolumn family is actually an
> advantage of Cassandra compared to relational databases. Hope you are not going to
> drop this important concept.  How it's implemented internally is a different
> matter.
>

SuperColumns are useful as a convenience mechanism. That's pretty much it.
There's _nothing_ (as far as I can tell) that you can do with SuperColumns
that you can't do by manually concatenating key names with a separator on
the client side and implementing a custom comparator on the server (as ugly
as that is).
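
To make that concrete, here is a rough sketch of the idea in plain Python rather than against the real Cassandra API. The separator, the helper names, and the in-memory "row" dict are all hypothetical, and the comparison function only stands in for what a custom server-side comparator would do: it orders the first part of the name numerically and the second part as a string, which is the "different comparator types" case.

```python
from functools import cmp_to_key

SEP = ":"  # hypothetical separator; it must never occur in the first part


def make_name(super_name: int, sub_name: str) -> str:
    """Concatenate the would-be SuperColumn name and the sub-column name."""
    return f"{super_name}{SEP}{sub_name}"


def split_name(name: str):
    """Split a concatenated name back into (numeric part, string part)."""
    super_part, sub_part = name.split(SEP, 1)
    return int(super_part), sub_part


def compare(a: str, b: str) -> int:
    """Stand-in for a custom comparator: numeric on part one, string on part two."""
    ka, kb = split_name(a), split_name(b)
    return (ka > kb) - (ka < kb)


def sub_columns(row: dict, super_name: int) -> list:
    """Return (sub_name, value) pairs under one emulated SuperColumn, in comparator order."""
    names = sorted(row, key=cmp_to_key(compare))
    return [(split_name(n)[1], row[n]) for n in names if split_name(n)[0] == super_name]


# A single flat "row" whose column names carry both levels.
row = {
    make_name(10, "b"): "v3",
    make_name(2, "z"): "v2",
    make_name(2, "a"): "v1",
}
ordered = sorted(row, key=cmp_to_key(compare))
```

With a plain string sort, the name for 10 would land before the names for 2, which is why the comparator has to understand the concatenated format. That parsing step is the ugly part.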

This discussion is about getting rid of SuperColumns and adding a more
generic mechanism that will actually be useful and interesting and will
continue to be convenient for the types of use cases for which people use
SuperColumns.

If there's a particular use case that you feel you can only implement with
SuperColumns, please share! I honestly can't think of any.

Mike


> On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook <[email protected]> wrote:
>
>> Agreed
>>
>> On Mon, May 10, 2010 at 12:01 PM, Mike Malone <[email protected]> wrote:
>> > On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook <[email protected]>
>> wrote:
>> >>
>> >> I have to disagree about the naming of things. The name of something
>> >> isn't just a literal identifier. It affects the way people think about
>> >> it. For new users, the whole naming thing has been a persistent
>> >> barrier.
>> >
>> > I'm saying we shouldn't be worried too much about coming up with names
>> and
>> > analogies until we've decided what it is we're naming.
>> >
>> >>
>> >> As for your suggestions, I'm all for simplifying or generalizing the
>> >> "how it works" part down to a more generalized set of operations. I'm
>> >> not sure it's a good idea to require users to think in terms of building
>> >> up a fluffy query structure just to thread it through a needle of an
>> >> API, even for the simplest of queries. At some point, the level of
>> >> generic boilerplate takes away from the semantic hand rails that
>> >> developers like. So I guess I'm suggesting that "how it works" and
>> >> "how we use it" are not always exactly the same. At least they should
>> >> both hinge on a common conceptual model, which is where the naming
>> >> becomes an important anchoring point.
>> >
>> > If things are done properly, client libraries could expose simplified
>> query
>> > interfaces without much effort. Most ORMs these days work by building a
>> > propositional directed acyclic graph that's serialized to SQL. This
>> would
>> > work the same way, but it wouldn't be converted into a 4GL.
>> > Mike
>> >
>> >>
>> >> Jonathan
>> >>
>> >> On Mon, May 10, 2010 at 11:37 AM, Mike Malone <[email protected]>
>> wrote:
>> >> > Maybe... but honestly, it doesn't affect the architecture or
>> interface
>> >> > at
>> >> > all. I'm more interested in thinking about how the system should work
>> >> > than
>> >> > what things are called. Naming things is important, but that can
>> happen
>> >> > later.
>> >> > Does anyone have any thoughts or comments on the architecture I
>> >> > suggested
>> >> > earlier?
>> >> >
>> >> > Mike
>> >> >
>> >> > On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <[email protected]>
>> >> > wrote:
>> >> >>
>> >> >> Yes, the "column" here is not appropriate.
>> >> >> Maybe we need not create new terms; in Google's Bigtable, the
>> term
>> >> >> "qualifier" is a good one.
>> >> >>
>> >> >> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <[email protected]>
>> >> >> wrote:
>> >> >>>
>> >> >>> That would be a good time to get rid of the confusing "column"
>> term,
>> >> >>> which incorrectly suggests a two-dimensional tabular structure.
>> >> >>>
>> >> >>> Suggestions:
>> >> >>>
>> >> >>> 1. A hypercube (or hypocube, if only two dimensions): replace "key"
>> >> >>> and
>> >> >>> "column" with "1st dimension", "2nd dimension", etc.
>> >> >>>
>> >> >>> 2. A file system: replace "key" and "column" with "directory" and
>> >> >>> "subdirectory"
>> >> >>>
>> >> >>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose
>> >> >>> value
>> >> >>> is the set of keys, whose value is the set of supercolumns of the
>> key,
>> >> >>> whose
>> >> >>> value is the set of columns for the supercolumn, etc.
>> >> >>>
>> >> >>> 4. Etc.
>> >> >>>
>> >> >>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <[email protected]>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> Nice, Ed, we're doing something very similar but less generic.
>> >> >>>> Now replace all of the various methods for querying with a simple
>> >> >>>> query
>> >> >>>> interface that takes a Predicate, allow the user to specify (in
>> >> >>>> storage-conf) which levels of the nested Columns should be
>> indexed,
>> >> >>>> and
>> >> >>>> completely remove Comparators and have people subclass Column /
>> >> >>>> implement
>> >> >>>> IColumn and we'd really be on to something ;).
>> >> >>>> Mock storage-conf.xml:
>> >> >>>>   <Column Name="ThingThatsNowKey" Indexed="True"
>> >> >>>> ClusterPartitioned="True" Type="UTF8">
>> >> >>>>     <Column Name="ThingThatsNowColumnFamily"
>> DiskPartitioned="True"
>> >> >>>> Type="UTF8">
>> >> >>>>       <Column Name="ThingThatsNowSuperColumnName" Type="Long">
>> >> >>>>         <Column Name="ThingThatsNowColumnName" Indexed="True"
>> >> >>>> Type="ASCII">
>> >> >>>>           <Column Name="ThingThatCantCurrentlyBeRepresented"/>
>> >> >>>>         </Column>
>> >> >>>>       </Column>
>> >> >>>>     </Column>
>> >> >>>>   </Column>
>> >> >>>> Thrift:
>> >> >>>>   struct NamePredicate {
>> >> >>>>     1: required list<binary> column_names,
>> >> >>>>   }
>> >> >>>>   struct SlicePredicate {
>> >> >>>>     1: required binary start,
>> >> >>>>     2: required binary end,
>> >> >>>>   }
>> >> >>>>   struct CountPredicate {
>> >> >>>>     1: required Predicate predicate,
>> >> >>>>     2: required i32 count=100,
>> >> >>>>   }
>> >> >>>>   struct AndPredicate {
>> >> >>>>     1: required Predicate left,
>> >> >>>>     2: required Predicate right,
>> >> >>>>   }
>> >> >>>>   struct SubColumnsPredicate {
>> >> >>>>     1: required Predicate columns,
>> >> >>>>     2: required Predicate subcolumns,
>> >> >>>>   }
>> >> >>>>   ... OrPredicate, OtherUsefulPredicates ...
>> >> >>>>   query(predicate, count, consistency_level) # Count here would be
>> >> >>>> total
>> >> >>>> count of leaf values returned, whereas CountPredicate specifies a
>> >> >>>> column
>> >> >>>> count for a particular sub-slice.
>> >> >>>> Not fully baked... but I think this could really simplify stuff
>> and
>> >> >>>> make
>> >> >>>> it more flexible. Downside is it may give people enough rope to
>> hang
>> >> >>>> themselves, but at least the predicate stuff is easily
>> distributable.
>> >> >>>> I'm thinking I'll play around with implementing some of this stuff
>> >> >>>> myself if I have any free time in the near future.
>> >> >>>> Mike
>> >> >>>>
>> >> >>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <[email protected]
>> >
>> >> >>>> wrote:
>> >> >>>>>
>> >> >>>>> Very interesting, thanks!
>> >> >>>>>
>> >> >>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <[email protected]> wrote:
>> >> >>>>> > Follow-up from last week's discussion, I've been playing around
>> >> >>>>> > with a
>> >> >>>>> > simple
>> >> >>>>> > column comparator for composite column names that I put up on
>> >> >>>>> > github.  I'd
>> >> >>>>> > be interested to hear what people think of this approach.
>> >> >>>>> >
>> >> >>>>> > http://github.com/edanuff/CassandraCompositeType
>> >> >>>>> >
>> >> >>>>> > Ed
>> >> >>>>> >
>> >> >>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <[email protected]>
>> wrote:
>> >> >>>>> >>
>> >> >>>>> >> It might make sense to create a CompositeType subclass of
>> >> >>>>> >> AbstractType for
>> >> >>>>> >> the purpose of constructing and comparing these types of
>> >> >>>>> >> "composite"
>> >> >>>>> >> column
>> >> >>>>> >> names so that you could more easily do that sort of thing
>> >> >>>>> >> rather
>> >> >>>>> >> than
>> >> >>>>> >> having to concatenate into one big string.
>> >> >>>>> >>
>> >> >>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone
>> >> >>>>> >> <[email protected]>
>> >> >>>>> >> wrote:
>> >> >>>>> >>>
>> >> >>>>> >>> The only thing SuperColumns appear to buy you (as someone
>> >> >>>>> >>> pointed
>> >> >>>>> >>> out to
>> >> >>>>> >>> me at the Cassandra meetup - I think it was Eric Florenzano)
>> is
>> >> >>>>> >>> that you can
>> >> >>>>> >>> use different comparator types for the Super/SubColumns, I
>> >> >>>>> >>> guess..?
>> >> >>>>> >>> But you
>> >> >>>>> >>> should be able to do the same thing by creating your own
>> Column
>> >> >>>>> >>> comparator.
>> >> >>>>> >>> I guess my point is that SuperColumns are mostly a
>> convenience
>> >> >>>>> >>> mechanism, as
>> >> >>>>> >>> far as I can tell.
>> >> >>>>> >>> Mike
>> >> >>>>> >
>> >> >>>>> >
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> Jonathan Ellis
>> >> >>>>> Project Chair, Apache Cassandra
>> >> >>>>> co-founder of Riptano, the source for professional Cassandra
>> support
>> >> >>>>> http://riptano.com
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >> >
>> >
>> >
>>
>
>
>
> --
> AJ Chen, PhD
> Chair, Semantic Web SIG, sdforum.org
> http://web2express.org
> twitter @web2express
> Palo Alto, CA, USA
>
