On Fri, Dec 30, 2011 at 12:30 PM, Eric Evans <eev...@acunu.com> wrote: >> CREATE TABLE timeline ( >> user_id int, >> posted_at uuid, >> body string, >> posted_by string, >> PRIMARY KEY(user_id, posted_at, posted_by), >> VALUE(body) >> ); > > I think the value declaration also helps in that it's one more thing > that provides cues as to the data model it creates (more expressive). > But this got me thinking, why not introduce something special for the > composite name as well? That way the PRIMARY KEY syntax (which comes > preloaded with meaning and expectations) could be kept more SQLish, > and the whole thing looks more like an extension to the language as > opposed to a modification. > > Say: > > CREATE TABLE timeline ( > user_id int PRIMARY KEY, > posted_at uuid, > body text, > posted_by text, > COMPOSITE_NAME(posted_at, posted_by), > COMPOSITE_VALUE(body) > )
I went back and forth on this mentally, but I come down as -0 on CN instead of PK. For two reasons: First, the composite PRIMARY KEY is a better description of what you can actually do with the data. In a relational model, a PK of user_id means there is only one (user_id, posted_at, body, posted_by) row with a given user_id. Which is not the case here. PK = (row key + composite components) captures exactly what is "immutable and unique" in a given object, so it's actually exactly what it's meant for and not an abuse at all. (It even fits nicely with the "queries involving the PK are always indexed" assumption that isn't required by the SQL standard but every other database does anyway because it makes the most sense.) The only place where we do violence to relational expectations by using PK this way is that "insert should raise an error if there's already a row with that PK" (instead of updating that row the way we do). But, this is already a quirk we inflict on the use of PK, and I don't think it's a big deal. Second, it feels like this bleeds too much of the implementation into the DDL. What if we move to a storage engine like CASSANDRA-678 and we represent this as more like the traditional relational rows than composite columns? Unlikely, I know, but the spirit of SQL is to specify what you want to do and let the engine figure out how to do it most efficiently. Following the second line of reasoning made me realize that using TRANSPOSED WITH <options> for things besides the PRIMARY KEY definition [i.e., the COMPOSITE VALUE option] has the advantage of emphasizing that it's a non-standard option that allows the implementation to bleed through into the DDL. But, I can see the advantage in regularity of having COMPOSITE VALUE adjacent to the PK definition as well. So I'm fine either way, but I wanted to point that out. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com