Re: Is SuperColumn necessary?

Mike Malone Tue, 11 May 2010 09:00:31 -0700

On Tue, May 11, 2010 at 7:46 AM, David Boxenhorn <da...@lookin2.com> wrote:


> I would like an API with a variable number of arguments. Using Java
> varargs, something like
>
> value = keyspace.get("articles", "cars", "John Smith", "2010-05-01",
> "comment-25");
>
> or
>
> valueArray = keyspace.get("articles", predicate1, predicate2, predicate3,
> predicate4);
>
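A minimal in-memory sketch of the varargs `get` described above. It treats the data as the sparse array `array[dim1]...[dimN] = value`, with one sorted map per level. `NestedStore`, `put` and `get` are hypothetical names and are not part of any Cassandra client API:

```java
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an N-level sparse array:
// array[dim1][dim2]...[dimN] = value. Not a Cassandra API.
class NestedStore {
    // Each level maps a name to either a nested TreeMap or a leaf String value.
    private final TreeMap<String, Object> root = new TreeMap<>();

    // Stores value at the given path; the last path element names the leaf.
    @SuppressWarnings("unchecked")
    void put(String value, String... path) {
        if (path.length == 0) throw new IllegalArgumentException("empty path");
        TreeMap<String, Object> level = root;
        for (int i = 0; i < path.length - 1; i++) {
            Object next = level.computeIfAbsent(path[i], k -> new TreeMap<String, Object>());
            if (!(next instanceof TreeMap)) throw new IllegalStateException("value stored at " + path[i]);
            level = (TreeMap<String, Object>) next;
        }
        level.put(path[path.length - 1], value);
    }

    // Java varargs: get("articles", "cars", "John Smith", ...) walks one level per argument.
    @SuppressWarnings("unchecked")
    String get(String... path) {
        Object node = root;
        for (String name : path) {
            if (!(node instanceof TreeMap)) return null;
            node = ((TreeMap<String, Object>) node).get(name);
            if (node == null) return null;
        }
        // A path that stops at an interior level is not a value.
        return node instanceof String ? (String) node : null;
    }
}
```

With this layout, `get` walks one level per argument. It returns `null` for a missing path, or for a path that ends at an interior level rather than at a value.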

Hrm. I haven't dug that deeply into the joys of predicate logic,
propositional DAGs, etc., but couldn't this also be represented as a nested
tree of predicates and other primitives? So it would be something like:

   SubColumns = Transformation that takes a predicate, applies it to a
   Column, then gets its SubColumns
   keyspace.get("articles", SubColumns(predicate1, SubColumns(predicate2,
       SubColumns(predicate3, predicate4))));

It's more functional-programming-ish, I suppose, but I think that model
might apply more cleanly here. FP does tend to produce clean algorithms
for manipulating large data sets.
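One way to read the nested `SubColumns(...)` idea as code is as composable functions over a tree of sorted columns. In this sketch, `Col`, `Query`, `where` and `subColumns` are hypothetical names. The innermost bare predicate (`predicate4` above) has to be wrapped in `where(...)`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical column tree: each column has a name, an optional value, and sorted sub-columns.
final class Col {
    final String name;
    final String value;
    final TreeMap<String, Col> children = new TreeMap<>();
    Col(String name, String value) { this.name = name; this.value = value; }
    Col add(Col child) { children.put(child.name, child); return this; }
}

// A query maps a group of sibling columns to the columns it selects.
interface Query {
    List<Col> apply(Iterable<Col> columns);

    // Leaf step: keep the sibling columns whose names match the predicate.
    static Query where(Predicate<String> p) {
        return cols -> {
            List<Col> out = new ArrayList<>();
            for (Col c : cols) if (p.test(c.name)) out.add(c);
            return out;
        };
    }

    // SubColumns(p, inner): select columns matching p, then run inner on each one's sub-columns.
    static Query subColumns(Predicate<String> p, Query inner) {
        return cols -> {
            List<Col> out = new ArrayList<>();
            for (Col c : where(p).apply(cols)) out.addAll(inner.apply(c.children.values()));
            return out;
        };
    }
}
```

Each step is just a function from sibling columns to selected columns. Levels therefore nest to any depth without a separate SuperColumn type, for example `Query.subColumns("cars"::equals, Query.where(n -> n.startsWith("John")))`.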

Mike


>
>
> The storage layout would be determined by the configuration, as below:
>
> <Column Name="ThingThatsNowKey" Indexed="True" ClusterPartitioned="True"
> ...
>
>
>
>
> On Tue, May 11, 2010 at 5:26 PM, Jonathan Shook <jsh...@gmail.com> wrote:
>
>> This is one of the sticking points with the key concatenation
>> argument. You can't simply access subpartitions of data along an
>> aggregate name using a concatenated key unless you can efficiently
>> address a range of the keys according to a property of a subset. I'm
>> hoping this will bear out with more of this discussion.
>>
>> Another facet of this issue is performance with respect to storage
>> layout. Presently, columns within a row are stored in sorted order, so
>> range operations over them are efficient. The key space is generally
>> not organized this way. I'm hoping to see some discussion of this, as well.
>>
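The following sketch makes the range point concrete. A `TreeMap` stands in for sorted storage (the way columns within a row are kept), and a `HashMap` stands in for keys placed by hash. The `CatID1:articleID3`-style keys and the `PrefixRange` helper are illustrative assumptions, not anything Cassandra provides:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch: a concatenated key like "CatID1:articleID3" is only range-addressable
// when keys are stored in sorted order.
class PrefixRange {
    // Smallest string greater than every string that starts with prefix
    // (assumes the last char of prefix is not Character.MAX_VALUE).
    static String prefixEnd(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    // Sorted storage: one bounded range read, results already in key order.
    static List<String> sortedRange(TreeMap<String, String> data, String prefix) {
        SortedMap<String, String> slice = data.subMap(prefix, prefixEnd(prefix));
        return new ArrayList<>(slice.keySet());
    }

    // Hashed placement: no ordering to exploit, so every key must be examined.
    static List<String> hashedScan(Map<String, String> data, String prefix) {
        List<String> out = new ArrayList<>();
        for (String k : data.keySet()) if (k.startsWith(prefix)) out.add(k);
        out.sort(null); // natural order has to be restored by hand
        return out;
    }
}
```

With sorted keys, everything under a prefix is one contiguous slice. With hashed placement, the same question needs a full scan plus a sort. In Cassandra terms, row keys are sorted only under an order-preserving partitioner, while column names within a row are always sorted by the column family's comparator.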
>> On Tue, May 11, 2010 at 6:17 AM, vd <vineetdan...@gmail.com> wrote:
>> > Hi
>> >
>> > Can we do a range search on keys in ID:ID format, or would the API
>> > treat such a key as a single ID rather than splitting it on ':'? If
>> > not, how can we avoid using supercolumns where we need to associate
>> > n rows with a single ID?
>> > For example:
>> >          CatID1-> articleID1
>> >          CatID1-> articleID2
>> >          CatID1-> articleID3
>> >          CatID1-> articleID4
>> > How can we map such scenarios with simple column families?
>> >
>> > Rgds.
>> >
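A common way to model this without supercolumns is a "wide row" in an ordinary column family. The category ID is the row key and each article ID is a column name, so a slice over the row returns that category's articles in sorted order. The sketch below models this in memory. `CategoryIndex`, `addArticle` and `articles` are hypothetical names, and the rule that an empty string means an open bound is this sketch's own choice:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of a plain (non-super) column family:
// row key -> (column name -> column value), column names kept sorted.
class CategoryIndex {
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // One column per article: the article ID is the column name; the value is left empty.
    void addArticle(String catId, String articleId) {
        rows.computeIfAbsent(catId, k -> new TreeMap<>()).put(articleId, "");
    }

    // Slice of one row: article IDs from start (inclusive) to end (exclusive),
    // at most count of them, in sorted order. An empty end means "to the end of the row".
    List<String> articles(String catId, String start, String end, int count) {
        List<String> out = new ArrayList<>();
        TreeMap<String, String> row = rows.get(catId);
        if (row == null) return out;
        NavigableMap<String, String> slice =
            end.isEmpty() ? row.tailMap(start, true) : row.subMap(start, true, end, false);
        for (String name : slice.keySet()) {
            if (out.size() == count) break;
            out.add(name);
        }
        return out;
    }
}
```

The `count` limit plus a moving `start` is enough to page through a large category one slice at a time.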
>> > On Tue, May 11, 2010 at 2:11 PM, Torsten Curdt <tcu...@vafer.org> wrote:
>> >> Exactly.
>> >>
>> >> On Tue, May 11, 2010 at 10:20, David Boxenhorn <da...@lookin2.com> wrote:
>> >>> Don't think of it as getting rid of supercolumns. Think of it as adding
>> >>> superdupercolumns, supertriplecolumns, etc. Or, in sparse array
>> >>> terminology:
>> >>> array[dim1][dim2][dim3]...[dimN] = value
>> >>>
>> >>> Or, as said above:
>> >>>
>> >>>   <Column Name="ThingThatsNowKey" Indexed="True" ClusterPartitioned="True" Type="UTF8">
>> >>>     <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True" Type="UTF8">
>> >>>       <Column Name="ThingThatsNowSuperColumnName" Type="Long">
>> >>>         <Column Name="ThingThatsNowColumnName" Indexed="True" Type="ASCII">
>> >>>           <Column Name="ThingThatCantCurrentlyBeRepresented"/>
>> >>>         </Column>
>> >>>       </Column>
>> >>>     </Column>
>> >>>   </Column>
>> >>
>> >
>>
>
>
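The nested `<Column>` configuration above implies one `get` argument per nesting level. The following is a minimal sketch that assumes the config has already been flattened into a list of levels. `Level`, `Schema` and `accepts` are hypothetical names, and only the `Long` and `ASCII` types from the example are checked. With those assumptions, it validates a varargs path as follows:

```java
import java.util.List;

// Hypothetical flattened form of the nested <Column> config: one Level per depth.
// Field names mirror the XML attributes; this is not an existing Cassandra API.
final class Level {
    final String name;
    final String type; // "UTF8", "Long", "ASCII", or null if the XML gives no Type
    Level(String name, String type) { this.name = name; this.type = type; }
}

final class Schema {
    private final List<Level> levels;
    Schema(List<Level> levels) { this.levels = levels; }

    // A fully specified get(...) needs exactly one argument per level,
    // and each argument must be valid for that level's declared type.
    boolean accepts(String... path) {
        if (path.length != levels.size()) return false;
        for (int i = 0; i < path.length; i++) {
            String type = levels.get(i).type;
            if ("Long".equals(type)) {
                try {
                    Long.parseLong(path[i]);
                } catch (NumberFormatException e) {
                    return false;
                }
            } else if ("ASCII".equals(type)) {
                if (!path[i].chars().allMatch(c -> c < 128)) return false;
            }
            // UTF8 and untyped levels accept any string in this sketch.
        }
        return true;
    }
}
```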
