On Fri, Dec 30, 2011 at 12:30 PM, Eric Evans <eev...@acunu.com> wrote:
>> CREATE TABLE timeline (
>>    user_id int,
>>    posted_at uuid,
>>    body string,
>>    posted_by string,
>>    PRIMARY KEY(user_id, posted_at, posted_by),
>>    VALUE(body)
>> );
>
> I think the value declaration also helps in that it's one more thing
> that provides cues as to the data model it creates (more expressive).
> But this got me thinking, why not introduce something special for the
> composite name as well?  That way the PRIMARY KEY syntax (which comes
> preloaded with meaning and expectations) could be kept more SQLish,
> and the whole thing looks more like an extension to the language as
> opposed to a modification.
>
> Say:
>
> CREATE TABLE timeline (
>  user_id int PRIMARY KEY,
>  posted_at uuid,
>  body text,
>  posted_by text,
>  COMPOSITE_NAME(posted_at, posted_by),
>  COMPOSITE_VALUE(body)
> )

I went back and forth on this mentally, but I come down as -0 on CN
instead of PK.  For two reasons:

First, the composite PRIMARY KEY is a better description of what you
can actually do with the data.  In a relational model, a PK of user_id
means there is only one (user_id, posted_at, body, posted_by) row with
a given user_id.  Which is not the case here.  PK = (row key +
composite components) captures exactly what is "immutable and unique"
in a given object, so it's actually exactly what it's meant for and
not an abuse at all.  (It even fits nicely with the "queries involving
the PK are always indexed" assumption that isn't required by the SQL
standard but every other database does anyway because it makes the
most sense.)

The only place where we do violence to relational expectations by
using PK this way is that "insert should raise an error if there's
already a row with that PK" (instead of updating that row the way we
do).  But, this is already a quirk we inflict on the use of PK, and I
don't think it's a big deal.

Second, it feels like this bleeds too much of the implementation into
the DDL.  What if we move to a storage engine like CASSANDRA-678 and
we represent this as more like the traditional relational rows than
composite columns?  Unlikely, I know, but the spirit of SQL is to
specify what you want to do and let the engine figure out how to do it
most efficiently.

Following the second line of reasoning made me realize that using
TRANSPOSED WITH <options> for things besides the PRIMARY KEY
definition [i.e., the COMPOSITE VALUE option] has the advantage of
emphasizing that it's a non-standard option that allows the
implementation to bleed through into the DDL.  But, I can see the
advantage in regularity of having COMPOSITE VALUE adjacent to the PK
definition as well.  So I'm fine either way, but I wanted to point
that out.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Reply via email to