I had complains that my preceding mail was unreadable (thanks gmailfor fucking my formatting up), so I've posted the same thing with nice formatting on the JIRA ticket.
-- Sylvain On Tue, Jan 3, 2012 at 7:08 PM, Sylvain Lebresne <sylv...@datastax.com> wrote: > Ok, I think I'm warming up to what we're getting at. I would change > thesyntax of the VALUE() thing however. Instead of:CREATE TABLE > timeline ( userid int, posted_at uuid, body string, PRIMARY > KEY(user_id, posted_at), VALUE(body))I would prefer:CREATE COMPACT > TABLE timeline ( userid int, posted_at uuid, body string, > PRIMARY KEY(user_id, posted_at),) > The reasons being that it really influences the implementation layout > of theCF in C*. Namely, the non-compact CF defined by CREATE TABLE > timeline ( userid int, posted_at uuid, body string, PRIMARY > KEY(user_id, posted_at),)would look in C* like:<userid> : { > <posted_at>:'body' -> <value>}while the COMPACT variant would > be:<userid> : { <posted_at> -> <value>}which is using the fact that > there is only 1 field not part of the key to"optimize" the layout. And > I believe making the COMPACT part of the CREATEemphasis better that > it's a property of the definition itself (that cannot bechanged) > rather than of that specific 'body' field. It also make the rule > forCOMPACT table rather simple: "a compact table should have only one > field notpart of the primary key"; you don't have to deal with errors > like someonedefining two VALUE() for instance. > > That being said, I'd like to try to resume where we're at (including > theCOMPACT change above) and add a few random ideas along the way. > Please correctme if I've got something wrong. > I think we have 4 different cases, 2 for 'standard' CF without > composites:- static CFs (the only case CQL handle really well today)- > dynamic CFs (wide rows, time series if you prefer)and 2 for CF with > composite column names:- 'dense' composite (typically time series but > where the key is naturally multi-parts)- 'sparse' composite (aka > super columns) > Let me try to take an example for which, with how it would > translateinternally and example queries. > > Standard "static" CF-------------------- > "For each user, holds his infos" > CREATE TABLE users ( userid uuid PRIMARY KEY, firstname text, > lastname text, age int) > In C*:<userid> : { 'firstname' -> <value> 'lastname' -> <value> > 'age' -> <value>} > Query:SELECT firstname, lastname FROM users WHERE userid = '...'; > Standard "dynamic" CF--------------------- > "For each user, keep each url he clicked on with the date of last click" > CREATE COMPACT TABLE clicks ( userid uuid, url text, > timestamp date PRIMARY KEY (userid, url)) > In C*:<userid> : { <url> -> <timestamp>} > Query:SELECT url, timestamp FROM clicks WHERE userid = '...';SELECT > timestamp FROM clicks WHERE userid = '...' and url = 'http://...'; > 'dense' composite----------------- > "For each user, keep ip and port from where he connected with the date > of lastconnection" > CREATE COMPACT TABLE connections ( userid uuid, ip binary, > port int, timestamp date PRIMARY KEY (userid, ip, port)) > In C*:<userid> : { <ip>:<port> -> <timestamp>} > Query:SELECT ip, port, timestamp FROM connections WHERE userid = '...'; > 'sparse' composite------------------ > "User timeline" > CREATE TABLE timeline ( userid uuid, posted_at date, body > text, posted_by text, PRIMARY KEY (user_id, posted_at),); > In C*:<userid> : { <posted_at>:'body' -> <value> > <posted_at>:'posted_by' -> <value>} > Query:SELECT body, posted_by FROM timeline WHERE userid = '...' and > posted_at = '2 janvier 2010' > Note: I think we really should also be able to do queries like:SELECT > posted_ad, body, posted_by FROM timeline WHERE userid = '...' and > posted_at > '2 janvier 2010'but that's more akin to the modification > of the syntax for slices. > > Random other ideas------------------ > 1) We could allow something like: CONSTRAINT key PRIMARY KEY > (userid, ip, port) which would then allow to write SELECT > timestamp FROM users WHERE key = ('...', 192.168.0.1, 80); (I believe > this is the 'standard' notation to name a 'composite' key in SQL) > - Above we're ony handling the use of composites for column names, but > they can be useful for value (and row keys) and it could be nice to > have an easy notation for that (clearly a following ticket however). > What about: > CREATE COMPACT TABLE timeline ( userid_part1 text, userid_part2 > int, posted_at date, posted_by uuid, body text header text > GROUP (userid_part1, userid_part2) AS userid, PRIMARY KEY > (userid, posted_at, posted_by) GROUP (header, body)) > In C*:<userid_part1>:<userid_part2> : { <posted_at>:<posted_by> -> > <header>:<body>} > Query:SELECT posted_at, posted_by, body, header FROM timeline WHERE > userid = ('john', 32) > > -- > Sylvain > On Mon, Jan 2, 2012 at 8:29 PM, Eric Evans <eev...@acunu.com> wrote: >> On Mon, Jan 2, 2012 at 12:55 PM, Jonathan Ellis <jbel...@gmail.com> wrote: >>> On Mon, Jan 2, 2012 at 10:53 AM, Eric Evans <eev...@acunu.com> wrote: >>>> In SQL, PRIMARY KEY is a modifier to a column spec, and here PRIMARY >>>> KEY(user_id, posted_at, posted_by) reads like a PRIMARY modifier >>>> applied to a KEY() function. It's also a little strange the way it >>>> appears in the grouping of column specs, when it's actually defining a >>>> grouping or relationship of them (maybe this is what you meant about >>>> using TRANSPOSED WITH <options> to emphasize the non-standard). >>> >>> Fear not, I can set your mind at ease. :) >>> >>> Personally I think the syntax works reasonably well in its own right, >>> but my main reason for the proposed syntax is that it is actually >>> standard SQL for composite primary keys at least as far back as SQL >>> 92, as a subcategory of table constraints. The SQL standard is not >>> freely linkable, but see >>> http://www.postgresql.org/docs/9.1/static/sql-createtable.html for a >>> real-world example. >> >> OK, I stand corrected (and my mind is at ease :) ). >> >> >> -- >> Eric Evans >> Acunu | http://www.acunu.com | @acunu