Re: Is SuperColumn necessary?

Jonathan Shook Mon, 10 May 2010 09:53:20 -0700

I have to disagree about the naming of things. The name of something
isn't just a literal identifier. It affects the way people think about
it. For new users, the whole naming thing has been a persistent
barrier.


As for your suggestions, I'm all for simplifying or generalizing the
"how it works" part down to a more generalized set of operations. I'm
not sure it's a good idea to require users to think in terms of building
up a fluffy query structure just to thread it through a needle of an
API, even for the simplest of queries. At some point, the level of
generic boilerplate takes away from the semantic hand rails that
developers like. So I guess I'm suggesting that "how it works" and
"how we use it" are not always exactly the same. At least they should
both hinge on a common conceptual model, which is where the naming
becomes an important anchoring point.
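
To make the distinction concrete, here is a rough Python sketch, not a
proposal for the actual API. The predicate names are borrowed from Mike's
mock Thrift quoted below; the in-memory `row` dict and the functions
`query`, `get_columns` and `get_slice` are hypothetical names used only for
illustration. It assumes column names compare as raw bytes and that slices
include both endpoints.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class NamePredicate:
    column_names: Tuple[bytes, ...]


@dataclass(frozen=True)
class SlicePredicate:
    start: bytes
    end: bytes


@dataclass(frozen=True)
class CountPredicate:
    predicate: "Predicate"
    count: int = 100


Predicate = Union[NamePredicate, SlicePredicate, CountPredicate]
Columns = List[Tuple[bytes, bytes]]


def query(row: Dict[bytes, bytes], predicate: Predicate) -> Columns:
    """Generic "how it works" entry point: evaluate a predicate on one row."""
    if isinstance(predicate, NamePredicate):
        wanted = set(predicate.column_names)
        return [(n, v) for n, v in sorted(row.items()) if n in wanted]
    if isinstance(predicate, SlicePredicate):
        # Assumption: byte-wise ordering, both endpoints inclusive.
        return [(n, v) for n, v in sorted(row.items())
                if predicate.start <= n <= predicate.end]
    if isinstance(predicate, CountPredicate):
        return query(row, predicate.predicate)[:predicate.count]
    raise TypeError(f"unsupported predicate: {predicate!r}")


# Thin "how we use it" wrappers: same model, less boilerplate per call.
def get_columns(row: Dict[bytes, bytes], *names: bytes) -> Columns:
    return query(row, NamePredicate(tuple(names)))


def get_slice(row: Dict[bytes, bytes], start: bytes, end: bytes,
              count: int = 100) -> Columns:
    return query(row, CountPredicate(SlicePredicate(start, end), count))
```

Both wrappers go through the same `query` call, so the simple cases stay
simple for the caller without introducing a second conceptual model.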

Jonathan

On Mon, May 10, 2010 at 11:37 AM, Mike Malone <m...@simplegeo.com> wrote:
> Maybe... but honestly, it doesn't affect the architecture or interface at
> all. I'm more interested in thinking about how the system should work than
> what things are called. Naming things is important, but that can happen
> later.
> Does anyone have any thoughts or comments on the architecture I suggested
> earlier?
>
> Mike
>
> On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <zson...@gmail.com> wrote:
>>
>> Yes, the "column" here is not appropriate.
>> Maybe we need not create new terms; in Google's Bigtable, the term
>> "qualifier" is a good one.
>>
>> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <da...@lookin2.com> wrote:
>>>
>>> That would be a good time to get rid of the confusing "column" term,
>>> which incorrectly suggests a two-dimensional tabular structure.
>>>
>>> Suggestions:
>>>
>>> 1. A hypercube (or hypocube, if only two dimensions): replace "key" and
>>> "column" with "1st dimension", "2nd dimension", etc.
>>>
>>> 2. A file system: replace "key" and "column" with "directory" and
>>> "subdirectory"
>>>
>>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose value
>>> is the set of keys, whose value is the set of supercolumns of the key, whose
>>> value is the set of columns for the supercolumn, etc.
>>>
>>> 4. Etc.
>>>
>>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <m...@simplegeo.com> wrote:
>>>>
>>>> Nice, Ed, we're doing something very similar but less generic.
>>>> Now replace all of the various methods for querying with a simple query
>>>> interface that takes a Predicate, allow the user to specify (in
>>>> storage-conf) which levels of the nested Columns should be indexed, and
>>>> completely remove Comparators and have people subclass Column / implement
>>>> IColumn and we'd really be on to something ;).
>>>> Mock storage-conf.xml:
>>>>   <Column Name="ThingThatsNowKey" Indexed="True"
>>>> ClusterPartitioned="True" Type="UTF8">
>>>>     <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True"
>>>> Type="UTF8">
>>>>       <Column Name="ThingThatsNowSuperColumnName" Type="Long">
>>>>         <Column Name="ThingThatsNowColumnName" Indexed="True"
>>>> Type="ASCII">
>>>>           <Column Name="ThingThatCantCurrentlyBeRepresented"/>
>>>>         </Column>
>>>>       </Column>
>>>>     </Column>
>>>>   </Column>
>>>> Thrift:
>>>>   struct NamePredicate {
>>>>     1: required list<binary> column_names,
>>>>   }
>>>>   struct SlicePredicate {
>>>>     1: required binary start,
>>>>     2: required binary end,
>>>>   }
>>>>   struct CountPredicate {
>>>>     1: required Predicate predicate,
>>>>     2: required i32 count=100,
>>>>   }
>>>>   struct AndPredicate {
>>>>     1: required Predicate left,
>>>>     2: required Predicate right,
>>>>   }
>>>>   struct SubColumnsPredicate {
>>>>     1: required Predicate columns,
>>>>     2: required Predicate subcolumns,
>>>>   }
>>>>   ... OrPredicate, OtherUsefulPredicates ...
>>>>   query(predicate, count, consistency_level)
>>>>   # Count here would be the total count of leaf values returned, whereas
>>>>   # CountPredicate specifies a column count for a particular sub-slice.
>>>> Not fully baked... but I think this could really simplify stuff and make
>>>> it more flexible. Downside is it may give people enough rope to hang
>>>> themselves, but at least the predicate stuff is easily distributable.
>>>> I'm thinking I'll play around with implementing some of this stuff
>>>> myself if I have any free time in the near future.
>>>> Mike
>>>>
>>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <jbel...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Very interesting, thanks!
>>>>>
>>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <e...@anuff.com> wrote:
>>>>> > Follow-up from last week's discussion, I've been playing around with a
>>>>> > simple
>>>>> > column comparator for composite column names that I put up on
>>>>> > github.  I'd
>>>>> > be interested to hear what people think of this approach.
>>>>> >
>>>>> > http://github.com/edanuff/CassandraCompositeType
>>>>> >
>>>>> > Ed
>>>>> >
>>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <e...@anuff.com> wrote:
>>>>> >>
>>>>> >> It might make sense to create a CompositeType subclass of
>>>>> >> AbstractType for
>>>>> >> the purpose of constructing and comparing these types of "composite"
>>>>> >> column
>>>>> >> names so that you could more easily do that sort of thing rather
>>>>> >> than
>>>>> >> having to concatenate into one big string.
>>>>> >>
>>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone <m...@simplegeo.com>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> The only thing SuperColumns appear to buy you (as someone pointed
>>>>> >>> out to
>>>>> >>> me at the Cassandra meetup - I think it was Eric Florenzano) is
>>>>> >>> that you can
>>>>> >>> use different comparator types for the Super/SubColumns, I guess..?
>>>>> >>> But you
>>>>> >>> should be able to do the same thing by creating your own Column
>>>>> >>> comparator.
>>>>> >>> I guess my point is that SuperColumns are mostly a convenience
>>>>> >>> mechanism, as
>>>>> >>> far as I can tell.
>>>>> >>> Mike
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jonathan Ellis
>>>>> Project Chair, Apache Cassandra
>>>>> co-founder of Riptano, the source for professional Cassandra support
>>>>> http://riptano.com
>>>>
>>>
>>
>
>
