In CQL we need to decide between using ((customer_id,type),date) as the CQL 
primary key for a reporting table, versus ((customer_id,date),type).

We store reports for every day.  If we use (customer_id,type) as the partition 
key (physical key), then we have  a WIDE ROW where each date's data is stored 
in a different column. Over time, as new reports are added for different dates, 
the row will get wider and wider, and I thought that might cause more work for 
compaction.

So, would a partition key of (customer_id,date) yield better compaction 
behavior?

Again, if we use (customer_id,type) as the partition key, then over time, as 
new columns are added to that row for different dates, I'd think that 
compaction would have to merge new data for a given physical row from multiple 
sstables. That would make compaction expensive.  But if we use 
(customer_id,date) as the partition key, then new data will be added to new 
physical rows, and so compaction would have less work to do????

My question is really about how compaction interacts with partition keys.  
Someone on the Cassandra irc channel, 
http://webchat.freenode.net/?channels=#cassandra, said that when partition keys 
overlap between sstables, there's only "slightly" more work to do than when 
they don't, for merging sstables in compaction.  So he thought the first form,  
((customer_id,type),date),  would be better.

One advantage of the first form, ((customer_id,type),date) ,  is that we can 
get all report data for all dates for a given customer and type in a single 
wide row  -- and we do have a (uncommon) use case for such reports.

If we used a primary key of ((customer_id,type,date)), then the rows would be 
un-wide; that wouldn't take advantage of clustering columns and (like the 
second form) wouldn't support the (uncommon) use case mentioned in the previous 
paragraph.

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com<mailto:dona...@audiencescience.com>

[AudienceScience]

<<inline: image001.jpg>>

Reply via email to