If your summary data is frequently accessed, you will probably be best off
storing the two sets of data separately (either in separate column families
or with different key-prefixes). This will give you the greatest
cache-locality for your summary data, which you say is popular. If your
summary data is very well cached, then it won't matter that it might
require two disk-seeks to get summary+details, because your summary data is
usually in cache anyhow.
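
To illustrate the key-prefix option, here is a rough sketch in Python. It
assumes your keys are stored in sorted order (for example, with an
order-preserving partitioner). The "s:" and "d:" prefixes and the row ids
are made up for the example, not anything Cassandra defines.

```python
# Hypothetical key scheme: the "s:" and "d:" prefixes are illustrative only.
def summary_key(row_id: str) -> str:
    return "s:" + row_id


def detail_key(row_id: str) -> str:
    return "d:" + row_id


row_ids = ["user1", "user2", "user3"]

# Simulate an ordered key space by sorting every key that would be written.
keys = sorted(
    [summary_key(r) for r in row_ids] + [detail_key(r) for r in row_ids]
)

# All summary keys end up adjacent to each other, so the blocks that hold
# them contain no (colder) detail data.
summary_run = [k for k in keys if k.startswith("s:")]
```

With this layout the hot summary rows sit next to each other instead of
being interleaved with detail rows, which is what gives them good cache
locality.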

If you want a more specific recommendation than that, we'd need to see
answers to the following questions:

(a) how big is the summary data (total, per row)? (average, max)
(b) how big is the detail data (total, per row)? (average, max)
(c) what is the read/write traffic to the summary data? to the detail data?

A side note about caches... IMO, you're better off getting the cache
behavior you want through physical ordering than through more explicit
caching. This is because most modern databases (Cassandra included) already
go through the OS buffer cache, so an application-level cache ends up
holding a second copy of data the OS is also caching. If your application
cache hit rate is very high (90%+) this can work out, but if it's lower
(around 50%) it can hurt the cache efficiency of both the application cache
and the OS buffer cache, because the same data is duplicated in both.
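
To put rough numbers on the duplication point, here is a toy model in
Python. It is only a sketch: the function name and the sizes are made up
for illustration, and it ignores everything except how much distinct data
the two caches hold between them.

```python
def unique_cached_mb(app_cache_mb: int, os_cache_mb: int, overlap_mb: int) -> int:
    """Distinct data held across both caches; overlapping data counts once."""
    if overlap_mb < 0 or overlap_mb > min(app_cache_mb, os_cache_mb):
        raise ValueError("overlap must be between 0 and the smaller cache size")
    return app_cache_mb + os_cache_mb - overlap_mb


# Illustrative sizes in MB, not measurements.
no_duplication = unique_cached_mb(4096, 4096, 0)
full_duplication = unique_cached_mb(4096, 4096, 4096)
```

In the worst case of this model, where everything in the application cache
is also sitting in the OS buffer cache, 8 GB of RAM holds only 4 GB of
distinct data.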
