I appreciate keeping the Cassandra core data model clear and pure.
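As a rough illustration of the point quoted below (concatenated names plus a custom comparator instead of SuperColumns), here is a minimal Python sketch. It is not Cassandra code: `SEPARATOR`, `make_name`, and `compare_names` are made-up names, and in the real system the comparator would run on the server. It only shows why a custom comparator is needed once two differently typed components share one name.

```python
from functools import cmp_to_key

SEPARATOR = ":"  # hypothetical separator; must not occur inside a component


def make_name(super_name: str, sub_name: int) -> str:
    """Join what would be a SuperColumn name and a sub-column name."""
    return f"{super_name}{SEPARATOR}{sub_name}"


def compare_names(a: str, b: str) -> int:
    """Compare the first component as text and the second as an integer."""
    a_super, a_sub = a.split(SEPARATOR, 1)
    b_super, b_sub = b.split(SEPARATOR, 1)
    if a_super != b_super:
        return -1 if a_super < b_super else 1
    return (int(a_sub) > int(b_sub)) - (int(a_sub) < int(b_sub))


names = [make_name("user", 10), make_name("user", 9), make_name("item", 2)]

# Sort with the composite-aware comparator.
ordered = sorted(names, key=cmp_to_key(compare_names))

# A plain string sort compares "10" and "9" character by character instead.
naive = sorted(names)
```

With the comparator, "user:9" sorts before "user:10"; the plain string sort reverses them, which is the ugliness Mike mentions when everything is squeezed into one string.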
On Tue, May 11, 2010 at 5:20 AM, Mike Malone <m...@simplegeo.com> wrote:
> On Mon, May 10, 2010 at 1:38 PM, AJ Chen <ajc...@web2express.org> wrote:
>
>> Could someone confirm this discussion is not about abandoning supercolumn
>> family? I have found modeling data with supercolumn family is actually an
>> advantage of Cassandra compared to relational database. Hope you are not
>> going to drop this important concept. How it's implemented internally is a
>> different matter.
>>
>
> SuperColumns are useful as a convenience mechanism. That's pretty much it.
> There's _nothing_ (as far as I can tell) that you can do with SuperColumns
> that you can't do by manually concatenating key names with a separator on
> the client side and implementing a custom comparator on the server (as ugly
> as that is).
>
> This discussion is about getting rid of SuperColumns and adding a more
> generic mechanism that will actually be useful and interesting and will
> continue to be convenient for the types of use cases for which people use
> SuperColumns.
>
> If there's a particular use case that you feel you can only implement with
> SuperColumns, please share! I honestly can't think of any.
>
> Mike
>
>
>> On Mon, May 10, 2010 at 10:08 AM, Jonathan Shook <jsh...@gmail.com> wrote:
>>
>>> Agreed
>>>
>>> On Mon, May 10, 2010 at 12:01 PM, Mike Malone <m...@simplegeo.com> wrote:
>>> > On Mon, May 10, 2010 at 9:52 AM, Jonathan Shook <jsh...@gmail.com> wrote:
>>> >>
>>> >> I have to disagree about the naming of things. The name of something
>>> >> isn't just a literal identifier. It affects the way people think about
>>> >> it. For new users, the whole naming thing has been a persistent
>>> >> barrier.
>>> >
>>> > I'm saying we shouldn't be worried too much about coming up with names and
>>> > analogies until we've decided what it is we're naming.
>>> >
>>> >>
>>> >> As for your suggestions, I'm all for simplifying or generalizing the
>>> >> "how it works" part down to a more generalized set of operations. I'm
>>> >> not sure it's a good idea to require users to think in terms of building
>>> >> up a fluffy query structure just to thread it through a needle of an
>>> >> API, even for the simplest of queries. At some point, the level of
>>> >> generic boilerplate takes away from the semantic hand rails that
>>> >> developers like. So I guess I'm suggesting that "how it works" and
>>> >> "how we use it" are not always exactly the same. At least they should
>>> >> both hinge on a common conceptual model, which is where the naming
>>> >> becomes an important anchoring point.
>>> >
>>> > If things are done properly, client libraries could expose simplified query
>>> > interfaces without much effort. Most ORMs these days work by building a
>>> > propositional directed acyclic graph that's serialized to SQL. This would
>>> > work the same way, but it wouldn't be converted into a 4GL.
>>> > Mike
>>> >
>>> >>
>>> >> Jonathan
>>> >>
>>> >> On Mon, May 10, 2010 at 11:37 AM, Mike Malone <m...@simplegeo.com> wrote:
>>> >> > Maybe... but honestly, it doesn't affect the architecture or interface
>>> >> > at all. I'm more interested in thinking about how the system should work
>>> >> > than what things are called. Naming things is important, but that can
>>> >> > happen later.
>>> >> > Does anyone have any thoughts or comments on the architecture I
>>> >> > suggested earlier?
>>> >> >
>>> >> > Mike
>>> >> >
>>> >> > On Mon, May 10, 2010 at 8:36 AM, Schubert Zhang <zson...@gmail.com> wrote:
>>> >> >>
>>> >> >> Yes, the "column" here is not appropriate.
>>> >> >> Maybe we need not create new terms; in Google's Bigtable, the term
>>> >> >> "qualifier" is a good one.
>>> >> >>
>>> >> >> On Thu, May 6, 2010 at 3:04 PM, David Boxenhorn <da...@lookin2.com> wrote:
>>> >> >>>
>>> >> >>> That would be a good time to get rid of the confusing "column" term,
>>> >> >>> which incorrectly suggests a two-dimensional tabular structure.
>>> >> >>>
>>> >> >>> Suggestions:
>>> >> >>>
>>> >> >>> 1. A hypercube (or hypocube, if only two dimensions): replace "key" and
>>> >> >>> "column" with "1st dimension", "2nd dimension", etc.
>>> >> >>>
>>> >> >>> 2. A file system: replace "key" and "column" with "directory" and
>>> >> >>> "subdirectory"
>>> >> >>>
>>> >> >>> 3. A tuple tree: "Column family" replaced by top-level tuple, whose value
>>> >> >>> is the set of keys, whose value is the set of supercolumns of the key,
>>> >> >>> whose value is the set of columns for the supercolumn, etc.
>>> >> >>>
>>> >> >>> 4. Etc.
>>> >> >>>
>>> >> >>> On Thu, May 6, 2010 at 2:28 AM, Mike Malone <m...@simplegeo.com> wrote:
>>> >> >>>>
>>> >> >>>> Nice, Ed, we're doing something very similar but less generic.
>>> >> >>>> Now replace all of the various methods for querying with a simple query
>>> >> >>>> interface that takes a Predicate, allow the user to specify (in
>>> >> >>>> storage-conf) which levels of the nested Columns should be indexed, and
>>> >> >>>> completely remove Comparators and have people subclass Column / implement
>>> >> >>>> IColumn and we'd really be on to something ;).
>>> >> >>>> Mock storage-conf.xml:
>>> >> >>>> <Column Name="ThingThatsNowKey" Indexed="True"
>>> >> >>>>         ClusterPartitioned="True" Type="UTF8">
>>> >> >>>>   <Column Name="ThingThatsNowColumnFamily" DiskPartitioned="True"
>>> >> >>>>           Type="UTF8">
>>> >> >>>>     <Column Name="ThingThatsNowSuperColumnName" Type="Long">
>>> >> >>>>       <Column Name="ThingThatsNowColumnName" Indexed="True"
>>> >> >>>>               Type="ASCII">
>>> >> >>>>         <Column Name="ThingThatCantCurrentlyBeRepresented"/>
>>> >> >>>>       </Column>
>>> >> >>>>     </Column>
>>> >> >>>>   </Column>
>>> >> >>>> </Column>
>>> >> >>>> Thrift:
>>> >> >>>> struct NamePredicate {
>>> >> >>>>   1: required list<binary> column_names,
>>> >> >>>> }
>>> >> >>>> struct SlicePredicate {
>>> >> >>>>   1: required binary start,
>>> >> >>>>   2: required binary end,
>>> >> >>>> }
>>> >> >>>> struct CountPredicate {
>>> >> >>>>   1: required struct predicate,
>>> >> >>>>   2: required i32 count=100,
>>> >> >>>> }
>>> >> >>>> struct AndPredicate {
>>> >> >>>>   1: required Predicate left,
>>> >> >>>>   2: required Predicate right,
>>> >> >>>> }
>>> >> >>>> struct SubColumnsPredicate {
>>> >> >>>>   1: required Predicate columns,
>>> >> >>>>   2: required Predicate subcolumns,
>>> >> >>>> }
>>> >> >>>> ... OrPredicate, OtherUsefulPredicates ...
>>> >> >>>> query(predicate, count, consistency_level) # Count here would be total
>>> >> >>>> count of leaf values returned, whereas CountPredicate specifies a column
>>> >> >>>> count for a particular sub-slice.
>>> >> >>>> Not fully baked... but I think this could really simplify stuff and make
>>> >> >>>> it more flexible. Downside is it may give people enough rope to hang
>>> >> >>>> themselves, but at least the predicate stuff is easily distributable.
>>> >> >>>> I'm thinking I'll play around with implementing some of this stuff
>>> >> >>>> myself if I have any free time in the near future.
>>> >> >>>> Mike
>>> >> >>>>
>>> >> >>>> On Wed, May 5, 2010 at 2:04 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> >> >>>>>
>>> >> >>>>> Very interesting, thanks!
>>> >> >>>>>
>>> >> >>>>> On Wed, May 5, 2010 at 1:31 PM, Ed Anuff <e...@anuff.com> wrote:
>>> >> >>>>> > Follow-up from last week's discussion, I've been playing around with a
>>> >> >>>>> > simple column comparator for composite column names that I put up on
>>> >> >>>>> > github. I'd be interested to hear what people think of this approach.
>>> >> >>>>> >
>>> >> >>>>> > http://github.com/edanuff/CassandraCompositeType
>>> >> >>>>> >
>>> >> >>>>> > Ed
>>> >> >>>>> >
>>> >> >>>>> > On Wed, Apr 28, 2010 at 12:52 PM, Ed Anuff <e...@anuff.com> wrote:
>>> >> >>>>> >>
>>> >> >>>>> >> It might make sense to create a CompositeType subclass of
>>> >> >>>>> >> AbstractType for the purpose of constructing and comparing these types
>>> >> >>>>> >> of "composite" column names so that you could more easily do that sort
>>> >> >>>>> >> of thing rather than having to concatenate into one big string.
>>> >> >>>>> >>
>>> >> >>>>> >> On Wed, Apr 28, 2010 at 10:25 AM, Mike Malone <m...@simplegeo.com> wrote:
>>> >> >>>>> >>>
>>> >> >>>>> >>> The only thing SuperColumns appear to buy you (as someone pointed out
>>> >> >>>>> >>> to me at the Cassandra meetup - I think it was Eric Florenzano) is
>>> >> >>>>> >>> that you can use different comparator types for the Super/SubColumns,
>>> >> >>>>> >>> I guess..? But you should be able to do the same thing by creating
>>> >> >>>>> >>> your own Column comparator.
>>> >> >>>>> >>> I guess my point is that SuperColumns are mostly a convenience
>>> >> >>>>> >>> mechanism, as far as I can tell.
>>> >> >>>>> >>> Mike
>>> >> >>>>> >
>>> >> >>>>> >
>>> >> >>>>>
>>> >> >>>>> --
>>> >> >>>>> Jonathan Ellis
>>> >> >>>>> Project Chair, Apache Cassandra
>>> >> >>>>> co-founder of Riptano, the source for professional Cassandra support
>>> >> >>>>> http://riptano.com
>>> >> >>>>
>>> >> >>>
>>> >> >>
>>> >> >
>>> >
>>>
>>
>> --
>> AJ Chen, PhD
>> Chair, Semantic Web SIG, sdforum.org
>> http://web2express.org
>> twitter @web2express
>> Palo Alto, CA, USA
>>
>
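
To make the predicate idea from Mike's May 6 message above a bit more concrete, here is a small in-memory Python sketch of a nestable predicate tree. It mirrors only three of the mock Thrift structs (`NamePredicate`, `SlicePredicate`, `AndPredicate`). The `matches` helper, the two-argument-plus-count `query` signature, and the inclusive bytewise slice bounds are my own assumptions, not anything the proposal specifies, and there is no consistency level or storage layer here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class NamePredicate:
    column_names: List[bytes]


@dataclass
class SlicePredicate:
    start: bytes
    end: bytes


@dataclass
class AndPredicate:
    left: "Predicate"
    right: "Predicate"


Predicate = Union[NamePredicate, SlicePredicate, AndPredicate]


def matches(predicate: Predicate, name: bytes) -> bool:
    """Return True if a column name satisfies the predicate tree."""
    if isinstance(predicate, NamePredicate):
        return name in predicate.column_names
    if isinstance(predicate, SlicePredicate):
        # Assumption: both slice bounds are inclusive and compared bytewise.
        return predicate.start <= name <= predicate.end
    if isinstance(predicate, AndPredicate):
        return matches(predicate.left, name) and matches(predicate.right, name)
    raise TypeError(f"unsupported predicate: {predicate!r}")


def query(predicate: Predicate, names: List[bytes], count: int) -> List[bytes]:
    """Return at most `count` matching names, in bytewise sorted order."""
    return [n for n in sorted(names) if matches(predicate, n)][:count]


# Hypothetical row with four column names.
row = [b"a", b"b", b"c", b"d"]
wanted = AndPredicate(
    SlicePredicate(b"b", b"d"),
    NamePredicate([b"a", b"b", b"d"]),
)
result = query(wanted, row, count=100)
```

Because each predicate is plain data, a client library could build the tree from a friendlier interface and ship it to the server as is, which seems to be the point about predicates being easily distributable.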