Could someone confirm that this discussion is not about abandoning supercolumn families? I have found that modeling data with supercolumn families is actually an advantage of Cassandra compared to relational databases. I hope you are not going to drop this important concept. How it's implemented internally is a different matter.

-aj

On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook <jsh...@gmail.com> wrote:
> Agreed
>
> On Mon, May 10, 2010 at 12:01 PM, Mike Malone <m...@simplegeo.com> wrote:
> > On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook <jsh...@gmail.com> wrote:
> >>
> >> I have to disagree about the naming of things. The name of something
> >> isn't just a literal identifier. It affects the way people think about
> >> it. For new users, the whole naming thing has been a persistent
> >> barrier.
> >
> > I'm saying we shouldn't be worried too much about coming up with names and
> > analogies until we've decided what it is we're naming.
> >
> >>
> >> As for your suggestions, I'm all for simplifying or generalizing the
> >> "how it works" part down to a more generalized set of operations. I'm
> >> not sure it's a good idea to require users to think in terms building
> >> up a fluffy query structure just to thread it through a needle of an
> >> API, even for the simplest of queries. At some point, the level of
> >> generic boilerplate takes away from the semantic hand rails that
> >> developers like. So I guess I'm suggesting that "how it works" and
> >> "how we use it" are not always exactly the same. At least they should
> >> both hinge on a common conceptual model, which is where the naming
> >> becomes an important anchoring point.
> >
> > If things are done properly, client libraries could expose simplified query
> > interfaces without much effort. Most ORMs these days work by building a
> > propositional directed acyclic graph that's serialized to SQL. This would
> > work the same way, but it wouldn't be converted into a 4GL.
> > Mike
> >
> >>
> >> Jonathan
> >>
> >> On Mon, May 10, 2010 at 11:37 AM, Mike Malone <m...@simplegeo.com> wrote:
> >> > Maybe... but honestly, it doesn't affect the architecture or interface at
> >> > all. I'm more interested in thinking about how the system should work than
> >> > what things are called.
> >> > Naming things are important, but that can happen later.
> >> > Does anyone have any thoughts or comments on the architecture I suggested
> >> > earlier?
> >> >
> >> > Mike
> >> >
> >> > On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <zson...@gmail.com> wrote:
> >> >>
> >> >> Yes, the "column" here is not appropriate.
> >> >> Maybe we need not to create new terms, in Google's Bigtable, the term
> >> >> "qualifier" is a good one.
> >> >>
> >> >> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <da...@lookin2.com> wrote:
> >> >>>
> >> >>> That would be a good time to get rid of the confusing "column" term,
> >> >>> which incorrectly suggests a two-dimensional tabular structure.
> >> >>>
> >> >>> Suggestions:
> >> >>>
> >> >>> 1. A hypercube (or hypocube, if only two dimensions): replace "key" and
> >> >>> "column" with "1st dimension", "2nd dimension", etc.
> >> >>>
> >> >>> 2. A file system: replace "key" and "column" with "directory" and
> >> >>> "subdirectory"
> >> >>>
> >> >>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose value
> >> >>> is the set of keys, whose value is the set of supercolumns of the key, whose
> >> >>> value is the set of columns for the supercolumn, etc.
> >> >>>
> >> >>> 4. Etc.
> >> >>>
> >> >>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <m...@simplegeo.com> wrote:
> >> >>>>
> >> >>>> Nice, Ed, we're doing something very similar but less generic.
> >> >>>> Now replace all of the various methods for querying with a simple query
> >> >>>> interface that takes a Predicate, allow the user to specify (in
> >> >>>> storage-conf) which levels of the nested Columns should be indexed, and
> >> >>>> completely remove Comparators and have people subclass Column / implement
> >> >>>> IColumn and we'd really be on to something ;).
> >> >>>> Mock storage-conf.xml:
> >> >>>> <Column Name="ThingThatsNowKey" Indexed="True" ClusterPartitioned="True" Type="UTF8">
> >> >>>>   <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True" Type="UTF8">
> >> >>>>     <Column Name="ThingThatsNowSuperColumnName" Type="Long">
> >> >>>>       <Column Name="ThingThatsNowColumnName" Indexed="True" Type="ASCII">
> >> >>>>         <Column Name="ThingThatCantCurrentlyBeRepresented"/>
> >> >>>>       </Column>
> >> >>>>     </Column>
> >> >>>>   </Column>
> >> >>>> </Column>
> >> >>>> Thrift:
> >> >>>> struct NamePredicate {
> >> >>>>   1: required list<binary> column_names,
> >> >>>> }
> >> >>>> struct SlicePredicate {
> >> >>>>   1: required binary start,
> >> >>>>   2: required binary end,
> >> >>>> }
> >> >>>> struct CountPredicate {
> >> >>>>   1: required struct predicate,
> >> >>>>   2: required i32 count=100,
> >> >>>> }
> >> >>>> struct AndPredicate {
> >> >>>>   1: required Predicate left,
> >> >>>>   2: required Predicate right,
> >> >>>> }
> >> >>>> struct SubColumnsPredicate {
> >> >>>>   1: required Predicate columns,
> >> >>>>   2: required Predicate subcolumns,
> >> >>>> }
> >> >>>> ... OrPredicate, OtherUsefulPredicates ...
> >> >>>> query(predicate, count, consistency_level) # Count here would be total
> >> >>>> count of leaf values returned, whereas CountPredicate specifies a column
> >> >>>> count for a particular sub-slice.
> >> >>>> Not fully baked... but I think this could really simplify stuff and make
> >> >>>> it more flexible. Downside is it may give people enough rope to hang
> >> >>>> themselves, but at least the predicate stuff is easily distributable.
> >> >>>> I'm thinking I'll play around with implementing some of this stuff
> >> >>>> myself if I have any free time in the near future.
> >> >>>> Mike
> >> >>>>
> >> >>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >> >>>>>
> >> >>>>> Very interesting, thanks!
> >> >>>>>
> >> >>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <e...@anuff.com> wrote:
> >> >>>>> > Follow-up from last weeks discussion, I've been playing around with a simple
> >> >>>>> > column comparator for composite column names that I put up on github. I'd
> >> >>>>> > be interested to hear what people think of this approach.
> >> >>>>> >
> >> >>>>> > http://github.com/edanuff/CassandraCompositeType
> >> >>>>> >
> >> >>>>> > Ed
> >> >>>>> >
> >> >>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <e...@anuff.com> wrote:
> >> >>>>> >>
> >> >>>>> >> It might make sense to create a CompositeType subclass of AbstractType for
> >> >>>>> >> the purpose of constructing and comparing these types of "composite" column
> >> >>>>> >> names so that if you could more easily do that sort of thing rather than
> >> >>>>> >> having to concatenate into one big string.
> >> >>>>> >>
> >> >>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone <m...@simplegeo.com> wrote:
> >> >>>>> >>>
> >> >>>>> >>> The only thing SuperColumns appear to buy you (as someone pointed out to
> >> >>>>> >>> me at the Cassandra meetup - I think it was Eric Florenzano) is that you can
> >> >>>>> >>> use different comparator types for the Super/SubColumns, I guess..? But you
> >> >>>>> >>> should be able to do the same thing by creating your own Column comparator.
> >> >>>>> >>> I guess my point is that SuperColumns are mostly a convenience mechanism, as
> >> >>>>> >>> far as I can tell.
> >> >>>>> >>> Mike
> >> >>>>>
> >> >>>>> --
> >> >>>>> Jonathan Ellis
> >> >>>>> Project Chair, Apache Cassandra
> >> >>>>> co-founder of Riptano, the source for professional Cassandra support
> >> >>>>> http://riptano.com

--
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
twitter @web2express
Palo Alto, CA, USA