Re: Is SuperColumn necessary?

Jonathan Shook Mon, 10 May 2010 10:08:31 -0700

Agreed


On Mon, May 10, 2010 at 12:01 PM, Mike Malone <m...@simplegeo.com> wrote:
> On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook <jsh...@gmail.com> wrote:
>>
>> I have to disagree about the naming of things. The name of something
>> isn't just a literal identifier. It affects the way people think about
>> it. For new users, the whole naming thing has been a persistent
>> barrier.
>
> I'm saying we shouldn't be worried too much about coming up with names and
> analogies until we've decided what it is we're naming.
>
>>
>> As for your suggestions, I'm all for simplifying or generalizing the
>> "how it works" part down to a more generalized set of operations. I'm
>> not sure it's a good idea to require users to think in terms of building
>> up a fluffy query structure just to thread it through a needle of an
>> API, even for the simplest of queries. At some point, the level of
>> generic boilerplate takes away from the semantic hand rails that
>> developers like. So I guess I'm suggesting that "how it works" and
>> "how we use it" are not always exactly the same. At least they should
>> both hinge on a common conceptual model, which is where the naming
>> becomes an important anchoring point.
>
> If things are done properly, client libraries could expose simplified query
> interfaces without much effort. Most ORMs these days work by building a
> propositional directed acyclic graph that's serialized to SQL. This would
> work the same way, but it wouldn't be converted into a 4GL.
> Mike
>
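[Editorial sketch, not part of the original message.] The point about client libraries can be illustrated with a small fluent helper that accumulates a predicate tree and hands it over as plain data, without rendering it into a query language. Everything here is hypothetical: the `Columns` class, its method names, and the nested-tuple encoding are assumptions made for illustration, not an existing Cassandra client API.

```python
class Columns:
    """Hypothetical fluent helper that accumulates a predicate tree."""

    def __init__(self, tree=None):
        # The tree is plain nested tuples, so it can be serialized as-is.
        self.tree = tree

    def _and(self, node):
        # Combine the new node with any predicate already present.
        if self.tree is None:
            return Columns(node)
        return Columns(("and", self.tree, node))

    def named(self, *names):
        # Select columns by exact name.
        return self._and(("names", list(names)))

    def between(self, start, end):
        # Select a contiguous range of column names.
        return self._and(("slice", start, end))

    def limit(self, count):
        # Cap how many columns the wrapped predicate may return.
        return Columns(("count", self.tree, count))

    def sub(self, other):
        # Apply another predicate one nesting level down.
        return Columns(("subcolumns", self.tree, other.tree))


# A simple query reads like a sentence but is still just a tree underneath.
q = Columns().between(b"a", b"m").limit(10).sub(Columns().named(b"email"))
print(q.tree)
```

The simple cases stay short for the caller, while the generic tree remains available to anyone who needs to build something unusual by hand.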
>>
>> Jonathan
>>
>> On Mon, May 10, 2010 at 11:37 AM, Mike Malone <m...@simplegeo.com> wrote:
>> > Maybe... but honestly, it doesn't affect the architecture or interface at
>> > all. I'm more interested in thinking about how the system should work than
>> > what things are called. Naming things is important, but that can happen
>> > later.
>> > Does anyone have any thoughts or comments on the architecture I suggested
>> > earlier?
>> >
>> > Mike
>> >
>> > On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <zson...@gmail.com>
>> > wrote:
>> >>
>> >> Yes, the term "column" is not appropriate here.
>> >> Maybe we don't need to create new terms; in Google's Bigtable, the term
>> >> "qualifier" is a good one.
>> >>
>> >> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <da...@lookin2.com>
>> >> wrote:
>> >>>
>> >>> That would be a good time to get rid of the confusing "column" term,
>> >>> which incorrectly suggests a two-dimensional tabular structure.
>> >>>
>> >>> Suggestions:
>> >>>
>> >>> 1. A hypercube (or hypocube, if only two dimensions): replace "key" and
>> >>> "column" with "1st dimension", "2nd dimension", etc.
>> >>>
>> >>> 2. A file system: replace "key" and "column" with "directory" and
>> >>> "subdirectory"
>> >>>
>> >>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose value
>> >>> is the set of keys, whose value is the set of supercolumns of the key,
>> >>> whose value is the set of columns for the supercolumn, etc.
>> >>>
>> >>> 4. Etc.
>> >>>
>> >>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <m...@simplegeo.com>
>> >>> wrote:
>> >>>>
>> >>>> Nice, Ed, we're doing something very similar but less generic.
>> >>>> Now replace all of the various methods for querying with a simple query
>> >>>> interface that takes a Predicate, allow the user to specify (in
>> >>>> storage-conf) which levels of the nested Columns should be indexed, and
>> >>>> completely remove Comparators and have people subclass Column / implement
>> >>>> IColumn and we'd really be on to something ;).
>> >>>> Mock storage-conf.xml:
>> >>>>   <Column Name="ThingThatsNowKey" Indexed="True"
>> >>>>           ClusterPartitioned="True" Type="UTF8">
>> >>>>     <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True"
>> >>>>             Type="UTF8">
>> >>>>       <Column Name="ThingThatsNowSuperColumnName" Type="Long">
>> >>>>         <Column Name="ThingThatsNowColumnName" Indexed="True"
>> >>>>                 Type="ASCII">
>> >>>>           <Column Name="ThingThatCantCurrentlyBeRepresented"/>
>> >>>>         </Column>
>> >>>>       </Column>
>> >>>>     </Column>
>> >>>>   </Column>
>> >>>> Thrift:
>> >>>>   struct NamePredicate {
>> >>>>     1: required list<binary> column_names,
>> >>>>   }
>> >>>>   struct SlicePredicate {
>> >>>>     1: required binary start,
>> >>>>     2: required binary end,
>> >>>>   }
>> >>>>   struct CountPredicate {
>> >>>>     1: required Predicate predicate,
>> >>>>     2: required i32 count=100,
>> >>>>   }
>> >>>>   struct AndPredicate {
>> >>>>     1: required Predicate left,
>> >>>>     2: required Predicate right,
>> >>>>   }
>> >>>>   struct SubColumnsPredicate {
>> >>>>     1: required Predicate columns,
>> >>>>     2: required Predicate subcolumns,
>> >>>>   }
>> >>>>   ... OrPredicate, OtherUsefulPredicates ...
>> >>>>   query(predicate, count, consistency_level) # Count here would be total
>> >>>>   # count of leaf values returned, whereas CountPredicate specifies a
>> >>>>   # column count for a particular sub-slice.
>> >>>> Not fully baked... but I think this could really simplify stuff and make
>> >>>> it more flexible. Downside is it may give people enough rope to hang
>> >>>> themselves, but at least the predicate stuff is easily distributable.
>> >>>> I'm thinking I'll play around with implementing some of this stuff
>> >>>> myself if I have any free time in the near future.
>> >>>> Mike
>> >>>>
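[Editorial sketch, not part of the original message.] To make the proposal above concrete, here is one way the predicate structs could be evaluated against nested columns held in memory. This is a minimal illustration under stated assumptions: the Python dataclasses mirror the mock Thrift structs, but `matching_names`, `run_query`, the inclusive slice bounds, and the sample data are all hypothetical and do not describe how Cassandra works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamePredicate:
    column_names: tuple


@dataclass(frozen=True)
class SlicePredicate:
    start: bytes
    end: bytes


@dataclass(frozen=True)
class CountPredicate:
    predicate: object
    count: int = 100


@dataclass(frozen=True)
class AndPredicate:
    left: object
    right: object


@dataclass(frozen=True)
class SubColumnsPredicate:
    columns: object
    subcolumns: object


def matching_names(names, pred):
    """Return the names (already sorted) selected by a one-level predicate."""
    if isinstance(pred, NamePredicate):
        wanted = set(pred.column_names)
        return [n for n in names if n in wanted]
    if isinstance(pred, SlicePredicate):
        # Assumption: both slice bounds are inclusive.
        return [n for n in names if pred.start <= n <= pred.end]
    if isinstance(pred, CountPredicate):
        return matching_names(names, pred.predicate)[:pred.count]
    if isinstance(pred, AndPredicate):
        right = set(matching_names(names, pred.right))
        return [n for n in matching_names(names, pred.left) if n in right]
    raise TypeError(f"unsupported predicate: {pred!r}")


def run_query(columns, pred):
    """Apply a predicate to a dict of columns, descending where requested."""
    if isinstance(pred, SubColumnsPredicate):
        selected = matching_names(sorted(columns), pred.columns)
        return {n: run_query(columns[n], pred.subcolumns) for n in selected}
    return {n: columns[n] for n in matching_names(sorted(columns), pred)}


# Hypothetical sample data: one row with two levels of nesting.
row = {
    b"2010-05-05": {b"author": b"ed", b"title": b"composite"},
    b"2010-05-06": {b"author": b"mike", b"title": b"predicates"},
    b"2010-05-10": {b"author": b"jonathan", b"title": b"naming"},
}
pred = SubColumnsPredicate(
    columns=CountPredicate(SlicePredicate(b"2010-05-05", b"2010-05-09"), count=2),
    subcolumns=NamePredicate((b"author",)),
)
result = run_query(row, pred)
print(result)
```

Because each predicate is plain data, the same tree could in principle be shipped to every node and evaluated locally, which is the "easily distributable" property mentioned above.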
>> >>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <jbel...@gmail.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Very interesting, thanks!
>> >>>>>
>> >>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <e...@anuff.com> wrote:
>> >>>>> > Follow-up from last week's discussion: I've been playing around with a
>> >>>>> > simple column comparator for composite column names that I put up on
>> >>>>> > github.  I'd be interested to hear what people think of this approach.
>> >>>>> >
>> >>>>> > http://github.com/edanuff/CassandraCompositeType
>> >>>>> >
>> >>>>> > Ed
>> >>>>> >
>> >>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <e...@anuff.com> wrote:
>> >>>>> >>
>> >>>>> >> It might make sense to create a CompositeType subclass of AbstractType
>> >>>>> >> for the purpose of constructing and comparing these types of
>> >>>>> >> "composite" column names, so that you could more easily do that sort
>> >>>>> >> of thing rather than having to concatenate into one big string.
>> >>>>> >>
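[Editorial sketch, not part of the original message.] The motivation for a composite comparator can be shown with a small example: when name components are concatenated into one string, numeric parts sort lexically, whereas comparing component by component keeps each part's natural order. This is an illustration only. `compare_composite` and the sample names are hypothetical, the sketch assumes components at the same position share a type, and it is not the encoding used by the CassandraCompositeType project.

```python
from functools import cmp_to_key


def compare_composite(a, b):
    """Compare two composite names (tuples) component by component."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # All shared components are equal, so the shorter name sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


names = [("user", 10), ("user", 9), ("user", 100)]

# Concatenating into one big string: "10" sorts before "9".
concatenated = sorted(f"{kind}:{number}" for kind, number in names)

# Comparing as composites: each component keeps its own ordering.
composite = sorted(names, key=cmp_to_key(compare_composite))

print(concatenated)
print(composite)
```

A real comparator would also need a byte-level encoding for the components, which this sketch leaves out.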
>> >>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone
>> >>>>> >> <m...@simplegeo.com>
>> >>>>> >> wrote:
>> >>>>> >>>
>> >>>>> >>> The only thing SuperColumns appear to buy you (as someone pointed out
>> >>>>> >>> to me at the Cassandra meetup - I think it was Eric Florenzano) is
>> >>>>> >>> that you can use different comparator types for the
>> >>>>> >>> Super/SubColumns, I guess..? But you should be able to do the same
>> >>>>> >>> thing by creating your own Column comparator.
>> >>>>> >>> I guess my point is that SuperColumns are mostly a convenience
>> >>>>> >>> mechanism, as far as I can tell.
>> >>>>> >>> Mike
>> >>>>> >
>> >>>>> >
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Jonathan Ellis
>> >>>>> Project Chair, Apache Cassandra
>> >>>>> co-founder of Riptano, the source for professional Cassandra support
>> >>>>> http://riptano.com
>> >>>>
>> >>>
>> >>
>> >
>> >
>
>
