Hi All,

New to Cassandra, so apologies if I don't fully grok stuff just yet.

My data is keyed by both a key and a date. I want to run a query that fetches
multiple keys across multiple contiguous date ranges in one go. I'm
currently storing the date as part of the row key, like this:

key1|2011-05-15 {  c1 : , c2 :,  c3 : ... }
key1|2011-05-16 {  c1 : , c2 :,  c3 : ... }
key2|2011-05-15 {  c1 : , c2 :,  c3 : ... }
key2|2011-05-16 {  c1 : , c2 :,  c3 : ... }
...

I generate all the key/date combinations that I'm interested in and use
multiget_slice to retrieve them, pulling in all the columns for each row (I
need all the data, but the number of columns is small: fewer than 100). The
total number of row keys retrieved will only be 100 or so.
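A minimal sketch of the key-generation step, assuming the `key|YYYY-MM-DD` row-key format shown above. The resulting list is what would be handed to multiget_slice as its keys; the client call itself is omitted, and `daterange` and `row_keys` are hypothetical helper names.

```python
from datetime import date, timedelta


def daterange(start, end):
    """Yield every date from start to end, inclusive."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)


def row_keys(keys, ranges):
    """Build a 'key|YYYY-MM-DD' row key for every key and every date in each (start, end) range."""
    out = []
    for k in keys:
        for start, end in ranges:
            for d in daterange(start, end):
                # isoformat() zero-pads month and day, e.g. 2011-05-15
                out.append(f"{k}|{d.isoformat()}")
    return out


if __name__ == "__main__":
    wanted = row_keys(["key1", "key2"], [(date(2011, 5, 15), date(2011, 5, 16))])
    print(wanted)
    # ['key1|2011-05-15', 'key1|2011-05-16', 'key2|2011-05-15', 'key2|2011-05-16']
```

With ~100 keys in total this stays a single multiget_slice call; the list just grows with (number of keys) x (days per range).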

Now it strikes me I could also store this using composite columns, like
this:

key1 {  2011-05-15|c1 : , 2011-05-16|c1 : , 2011-05-15|c2 : , 2011-05-16|c2 : ,
        2011-05-15|c3 : , 2011-05-16|c3 : , ... }
key2 {  2011-05-15|c1 : , 2011-05-16|c1 : , 2011-05-15|c2 : , 2011-05-16|c2 : ,
        2011-05-15|c3 : , 2011-05-16|c3 : , ... }
...

Then I'd use multiget_slice again (but with fewer keys), with a slice range to
retrieve only the dates I'm interested in.
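The slice range only works if the `date|column` names sort by date first, which holds for zero-padded ISO dates under a byte-wise comparator (assumed here, e.g. AsciiType). A minimal in-memory sketch of picking the start/finish names for an inclusive date range and applying them to a sorted row; `slice_bounds` and `slice_columns` are hypothetical helpers, and the `}` end marker is an assumption that relies on it being the ASCII character right after `|`.

```python
import bisect
from datetime import date


def slice_bounds(start, end):
    """Start/finish column names covering 'YYYY-MM-DD|col' names for an inclusive date range."""
    # The bare date sorts before every "date|col" name for that day.
    first = start.isoformat()
    # '}' is the ASCII character right after '|', so it sorts above every
    # "date|col" name for the end day but below any name for the next day.
    last = end.isoformat() + "}"
    return first, last


def slice_columns(sorted_names, first, last):
    """Inclusive [first, last] slice over already-sorted names, mimicking a SliceRange."""
    lo = bisect.bisect_left(sorted_names, first)
    hi = bisect.bisect_right(sorted_names, last)
    return sorted_names[lo:hi]


if __name__ == "__main__":
    days = ["2011-05-14", "2011-05-15", "2011-05-16", "2011-05-17"]
    # Sorted the way a byte-wise comparator would keep the column names in a row.
    names = sorted(f"{d}|c{i}" for d in days for i in (1, 2, 3))
    print(slice_columns(names, *slice_bounds(date(2011, 5, 15), date(2011, 5, 16))))
    # ['2011-05-15|c1', '2011-05-15|c2', '2011-05-15|c3',
    #  '2011-05-16|c1', '2011-05-16|c2', '2011-05-16|c3']
```

Zero-padding matters here: an unpadded name such as `2011-5-16|c1` sorts after every `2011-05-...` and even `2011-12-...` name, so it would fall outside the slice.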

Another alternative, I guess, would be to use the order-preserving
partitioner (OPP) with the first storage approach and get_range_slices, but as
I understand it this would not be great for performance, because adjacent keys
would be clustered together on a single node?
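A toy model of the clustering concern, assuming a hypothetical 4-node ring with hand-picked tokens and ignoring replication. Under an order-preserving partitioner the key itself acts as the token, so all `key1|2011-05-*` rows land in one node's range; a hashed token (RandomPartitioner hashes keys with MD5, simplified here to a modulo) scatters them. `owner`, `opp_owner` and `random_owner` are illustrative names, not Cassandra APIs.

```python
import bisect
import hashlib

RING = 2 ** 127  # RandomPartitioner tokens lie in 0..2**127


def owner(node_tokens, token):
    """Index of the node owning `token`: the first node token >= it, wrapping to node 0."""
    i = bisect.bisect_left(node_tokens, token)
    return i % len(node_tokens)


def opp_owner(key, node_tokens):
    # Order-preserving: the key itself is the token, so neighbouring keys share a node.
    return owner(node_tokens, key)


def random_owner(key, n_nodes):
    # Hash-based (simplified): MD5 of the key reduced into the token space.
    token = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING
    node_tokens = [RING * (i + 1) // n_nodes for i in range(n_nodes)]
    return owner(node_tokens, token)


if __name__ == "__main__":
    keys = [f"key1|2011-05-{day:02d}" for day in range(1, 32)]
    opp_tokens = ["f", "m", "s", "z"]  # hypothetical 4-node ring
    print("OPP nodes:", sorted({opp_owner(k, opp_tokens) for k in keys}))
    print("hashed nodes:", sorted({random_owner(k, 4) for k in keys}))
```

In this model a whole month of one key's rows hits a single node under OPP, which is the hot-spot risk; whether that matters in practice depends on how the key prefixes themselves are spread.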

So my question is: which approach is best? One downside to the composite-column
approach, I guess, is that the number of columns per row grows without bound
(although with 2 billion to play with this isn't gonna be a problem any time
soon). Also, multiget_slice supports only one slice predicate, so I guess I'd
have to use multiple queries to get multiple date ranges.
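Because each multiget_slice call carries a single slice predicate, the number of calls equals the number of distinct date ranges. A minimal sketch, assuming ranges are inclusive `datetime.date` pairs: the hypothetical `merge_ranges` helper coalesces overlapping or back-to-back ranges first, so fewer calls are needed.

```python
from datetime import date, timedelta


def merge_ranges(ranges):
    """Coalesce overlapping or back-to-back inclusive (start, end) date ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # Overlaps or touches the previous range: extend it instead of adding a query.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    wanted = [
        (date(2011, 5, 17), date(2011, 5, 20)),
        (date(2011, 5, 15), date(2011, 5, 16)),
        (date(2011, 6, 1), date(2011, 6, 3)),
    ]
    # One multiget_slice (one slice predicate) per merged range: 2 calls instead of 3.
    for start, end in merge_ranges(wanted):
        print(start.isoformat(), "->", end.isoformat())
```

If two ranges are separated by only a short gap, it may also be cheaper to fetch the gap in the same slice and discard those columns client-side than to issue another call.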

Anyway, any thoughts/tips appreciated.

Thanks,
Charles
