Re: Is SuperColumn necessary?

Schubert Zhang Tue, 11 May 2010 08:24:05 -0700

I would appreciate keeping Cassandra's core data model clear and pure.


On Tue, May 11, 2010 at 5:20 AM, Mike Malone <m...@simplegeo.com> wrote:

> On Mon, May 10, 2010 at 1:38 PM, AJ Chen <ajc...@web2express.org> wrote:
>
>> Could someone confirm this discussion is not about abandoning supercolumn
>> family? I have found that modeling data with a supercolumn family is actually
>> an advantage of Cassandra compared to a relational database. I hope you are
>> not going to drop this important concept. How it's implemented internally is
>> a different matter.
>>
>
> SuperColumns are useful as a convenience mechanism. That's pretty much it.
> There's _nothing_ (as far as I can tell) that you can do with SuperColumns
> that you can't do by manually concatenating key names with a separator on
> the client side and implementing a custom comparator on the server (as ugly
> as that is).
>
> This discussion is about getting rid of SuperColumns and adding a more
> generic mechanism that will actually be useful and interesting and will
> continue to be convenient for the types of use cases for which people use
> SuperColumns.
>
> If there's a particular use case that you feel you can only implement with
> SuperColumns, please share! I honestly can't think of any.
>
> Mike
>
>
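A minimal sketch of the concatenation approach described above, for readers following along. It is written in Python rather than as a real server-side comparator, and everything in it is an assumption made for illustration: the ":" separator, the helper names make_name and compare_names, and the choice of an integer first component. It only shows why a custom comparator is needed once names are concatenated: a plain string sort orders "10:a" before "9:z", while a component-aware comparison does not.

```python
from functools import cmp_to_key

SEPARATOR = ":"  # hypothetical separator; must not occur in the first component


def make_name(super_name: int, sub_name: str) -> str:
    """Build one flat column name from what would be a SuperColumn name
    and a sub-column name."""
    return f"{super_name}{SEPARATOR}{sub_name}"


def compare_names(a: str, b: str) -> int:
    """Compare two composite names component by component: the first part
    as an integer, the second part as a string."""
    a_super, a_sub = a.split(SEPARATOR, 1)
    b_super, b_sub = b.split(SEPARATOR, 1)
    left = (int(a_super), a_sub)
    right = (int(b_super), b_sub)
    return (left > right) - (left < right)


names = [make_name(10, "b"), make_name(9, "z"), make_name(10, "a")]

# Plain string ordering puts "10:..." before "9:...", which is not what a
# numeric SuperColumn comparator would give.
plain_order = sorted(names)

# Component-aware ordering, standing in for the custom comparator.
composite_order = sorted(names, key=cmp_to_key(compare_names))

# Everything "under" the first component 10, i.e. one former SuperColumn.
under_ten = [n for n in composite_order if n.startswith(f"10{SEPARATOR}")]
```

Because the names sort by first component and then by second, the entries that used to live in one SuperColumn stay contiguous, so they can still be read as a single range.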
>> On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook <jsh...@gmail.com> wrote:
>>
>>> Agreed
>>>
>>> On Mon, May 10, 2010 at 12:01 PM, Mike Malone <m...@simplegeo.com> wrote:
>>> > On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook <jsh...@gmail.com> wrote:
>>> >>
>>> >> I have to disagree about the naming of things. The name of something
>>> >> isn't just a literal identifier. It affects the way people think about
>>> >> it. For new users, the whole naming thing has been a persistent
>>> >> barrier.
>>> >
>>> > I'm saying we shouldn't be worried too much about coming up with names
>>> > and analogies until we've decided what it is we're naming.
>>> >
>>> >>
>>> >> As for your suggestions, I'm all for simplifying or generalizing the
>>> >> "how it works" part down to a more generalized set of operations. I'm
>>> >> not sure it's a good idea to require users to think in terms of building
>>> >> up a fluffy query structure just to thread it through a needle of an
>>> >> API, even for the simplest of queries. At some point, the level of
>>> >> generic boilerplate takes away from the semantic hand rails that
>>> >> developers like. So I guess I'm suggesting that "how it works" and
>>> >> "how we use it" are not always exactly the same. At least they should
>>> >> both hinge on a common conceptual model, which is where the naming
>>> >> becomes an important anchoring point.
>>> >
>>> > If things are done properly, client libraries could expose simplified
>>> > query interfaces without much effort. Most ORMs these days work by
>>> > building a propositional directed acyclic graph that's serialized to SQL.
>>> > This would work the same way, but it wouldn't be converted into a 4GL.
>>> > Mike
>>> >
>>> >>
>>> >> Jonathan
>>> >>
>>> >> On Mon, May 10, 2010 at 11:37 AM, Mike Malone <m...@simplegeo.com> wrote:
>>> >> > Maybe... but honestly, it doesn't affect the architecture or interface
>>> >> > at all. I'm more interested in thinking about how the system should
>>> >> > work than what things are called. Naming things is important, but that
>>> >> > can happen later.
>>> >> > Does anyone have any thoughts or comments on the architecture I
>>> >> > suggested earlier?
>>> >> >
>>> >> > Mike
>>> >> >
>>> >> > On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <zson...@gmail.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Yes, the term "column" is not appropriate here.
>>> >> >> Maybe we need not create new terms; in Google's Bigtable, the term
>>> >> >> "qualifier" is a good one.
>>> >> >>
>>> >> >> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <da...@lookin2.com>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> That would be a good time to get rid of the confusing "column" term,
>>> >> >>> which incorrectly suggests a two-dimensional tabular structure.
>>> >> >>>
>>> >> >>> Suggestions:
>>> >> >>>
>>> >> >>> 1. A hypercube (or hypocube, if only two dimensions): replace "key"
>>> >> >>> and "column" with "1st dimension", "2nd dimension", etc.
>>> >> >>>
>>> >> >>> 2. A file system: replace "key" and "column" with "directory" and
>>> >> >>> "subdirectory"
>>> >> >>>
>>> >> >>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose
>>> >> >>> value is the set of keys, whose value is the set of supercolumns of
>>> >> >>> the key, whose value is the set of columns for the supercolumn, etc.
>>> >> >>>
>>> >> >>> 4. Etc.
>>> >> >>>
>>> >> >>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <m...@simplegeo.com>
>>> >> >>> wrote:
>>> >> >>>>
>>> >> >>>> Nice, Ed, we're doing something very similar but less generic.
>>> >> >>>> Now replace all of the various methods for querying with a simple
>>> >> >>>> query interface that takes a Predicate, allow the user to specify
>>> >> >>>> (in storage-conf) which levels of the nested Columns should be
>>> >> >>>> indexed, and completely remove Comparators and have people subclass
>>> >> >>>> Column / implement IColumn and we'd really be on to something ;).
>>> >> >>>> Mock storage-conf.xml:
>>> >> >>>>   <Column Name="ThingThatsNowKey" Indexed="True"
>>> >> >>>>           ClusterPartitioned="True" Type="UTF8">
>>> >> >>>>     <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True"
>>> >> >>>>             Type="UTF8">
>>> >> >>>>       <Column Name="ThingThatsNowSuperColumnName" Type="Long">
>>> >> >>>>         <Column Name="ThingThatsNowColumnName" Indexed="True"
>>> >> >>>>                 Type="ASCII">
>>> >> >>>>           <Column Name="ThingThatCantCurrentlyBeRepresented"/>
>>> >> >>>>         </Column>
>>> >> >>>>       </Column>
>>> >> >>>>     </Column>
>>> >> >>>>   </Column>
>>> >> >>>> Thrift:
>>> >> >>>>   struct NamePredicate {
>>> >> >>>>     1: required list<binary> column_names,
>>> >> >>>>   }
>>> >> >>>>   struct SlicePredicate {
>>> >> >>>>     1: required binary start,
>>> >> >>>>     2: required binary end,
>>> >> >>>>   }
>>> >> >>>>   struct CountPredicate {
>>> >> >>>>     1: required Predicate predicate,
>>> >> >>>>     2: required i32 count=100,
>>> >> >>>>   }
>>> >> >>>>   struct AndPredicate {
>>> >> >>>>     1: required Predicate left,
>>> >> >>>>     2: required Predicate right,
>>> >> >>>>   }
>>> >> >>>>   struct SubColumnsPredicate {
>>> >> >>>>     1: required Predicate columns,
>>> >> >>>>     2: required Predicate subcolumns,
>>> >> >>>>   }
>>> >> >>>>   ... OrPredicate, OtherUsefulPredicates ...
>>> >> >>>>   query(predicate, count, consistency_level)
>>> >> >>>>   # Count here would be total count of leaf values returned,
>>> >> >>>>   # whereas CountPredicate specifies a column count for a
>>> >> >>>>   # particular sub-slice.
>>> >> >>>> Not fully baked... but I think this could really simplify stuff and
>>> >> >>>> make it more flexible. Downside is it may give people enough rope to
>>> >> >>>> hang themselves, but at least the predicate stuff is easily
>>> >> >>>> distributable.
>>> >> >>>> I'm thinking I'll play around with implementing some of this stuff
>>> >> >>>> myself if I have any free time in the near future.
>>> >> >>>> Mike
>>> >> >>>>
>>> >> >>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <jbel...@gmail.com>
>>> >> >>>> wrote:
>>> >> >>>>>
>>> >> >>>>> Very interesting, thanks!
>>> >> >>>>>
>>> >> >>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <e...@anuff.com> wrote:
>>> >> >>>>> > Follow-up from last week's discussion: I've been playing around
>>> >> >>>>> > with a simple column comparator for composite column names that
>>> >> >>>>> > I put up on GitHub. I'd be interested to hear what people think
>>> >> >>>>> > of this approach.
>>> >> >>>>> >
>>> >> >>>>> > http://github.com/edanuff/CassandraCompositeType
>>> >> >>>>> >
>>> >> >>>>> > Ed
>>> >> >>>>> >
>>> >> >>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <e...@anuff.com> wrote:
>>> >> >>>>> >>
>>> >> >>>>> >> It might make sense to create a CompositeType subclass of
>>> >> >>>>> >> AbstractType for the purpose of constructing and comparing these
>>> >> >>>>> >> types of "composite" column names, so that you could more easily
>>> >> >>>>> >> do that sort of thing rather than having to concatenate into one
>>> >> >>>>> >> big string.
>>> >> >>>>> >>
>>> >> >>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone
>>> >> >>>>> >> <m...@simplegeo.com>
>>> >> >>>>> >> wrote:
>>> >> >>>>> >>>
>>> >> >>>>> >>> The only thing SuperColumns appear to buy you (as someone pointed
>>> >> >>>>> >>> out to me at the Cassandra meetup - I think it was Eric
>>> >> >>>>> >>> Florenzano) is that you can use different comparator types for
>>> >> >>>>> >>> the Super/SubColumns, I guess..? But you should be able to do the
>>> >> >>>>> >>> same thing by creating your own Column comparator.
>>> >> >>>>> >>> I guess my point is that SuperColumns are mostly a convenience
>>> >> >>>>> >>> mechanism, as far as I can tell.
>>> >> >>>>> >>> Mike
>>> >> >>>>> >
>>> >> >>>>> >
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> --
>>> >> >>>>> Jonathan Ellis
>>> >> >>>>> Project Chair, Apache Cassandra
>>> >> >>>>> co-founder of Riptano, the source for professional Cassandra support
>>> >> >>>>> http://riptano.com
>>> >> >>>>
>>> >> >>>
>>> >> >>
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> AJ Chen, PhD
>> Chair, Semantic Web SIG, sdforum.org
>> http://web2express.org
>> twitter @web2express
>> Palo Alto, CA, USA
>>
>
>
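
For anyone who wants to experiment with the predicate idea quoted above, here is a rough sketch of how composable predicates might be evaluated against one row held in memory. It is not Cassandra code and not the proposed Thrift API. The class names mirror the mock structs in the thread, but the matches and query helpers, the inclusive slice bounds, and the dict used as a row are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class NamePredicate:
    column_names: List[bytes]  # match exactly these names


@dataclass(frozen=True)
class SlicePredicate:
    start: bytes  # assumed inclusive
    end: bytes    # assumed inclusive


@dataclass(frozen=True)
class AndPredicate:
    left: "Predicate"
    right: "Predicate"


Predicate = Union[NamePredicate, SlicePredicate, AndPredicate]


def matches(predicate: Predicate, name: bytes) -> bool:
    """Return True if a single column name satisfies the predicate."""
    if isinstance(predicate, NamePredicate):
        return name in predicate.column_names
    if isinstance(predicate, SlicePredicate):
        return predicate.start <= name <= predicate.end
    if isinstance(predicate, AndPredicate):
        return matches(predicate.left, name) and matches(predicate.right, name)
    raise TypeError(f"unsupported predicate: {predicate!r}")


def query(columns: Dict[bytes, int], predicate: Predicate,
          count: int = 100) -> List[Tuple[bytes, int]]:
    """Return at most `count` (name, value) pairs, in name order, whose
    names satisfy the predicate."""
    selected = [(name, value) for name, value in sorted(columns.items())
                if matches(predicate, name)]
    return selected[:count]


row = {b"a": 1, b"b": 2, b"c": 3, b"d": 4}
in_slice_and_named = AndPredicate(
    SlicePredicate(b"b", b"d"),
    NamePredicate([b"a", b"c", b"d"]),
)
all_hits = query(row, in_slice_and_named)
first_hit = query(row, in_slice_and_named, count=1)
```

Since each predicate is a plain value that refers only to other predicates, the whole tree could be serialized and evaluated wherever the data lives, which is one reading of the "easily distributable" remark in the thread.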
