On Mon, Oct 3, 2011 at 9:14 AM, Pierre-Yves Ritschard <p...@smallrivers.com> wrote:
> Unfortunately there's no way to do this in Cassandra right now, except
> by using another row as index, like you're doing right now.
>
> Of course you could also store by source_id.date and have a batch job
> iterate over all sources to compute the top 100. It would not be real
> time any more though.
Indexes trade off some write performance for read performance. The index you describe is optimal for reads, so writes take a hit. As Pierre says, the only way to maintain such an index in Cassandra is to read, delete and insert on every increment. This is how secondary indexes work under the hood in Cassandra, although they are not implemented for counters. It's more expensive for counters, though, since a counter read is in general more expensive than a regular read.

So to speed up inserts, you have to take the hit on reads. The other extreme is not to build an index at all: read in all the counters and sort on the client. Inserts are then optimal, but given you have 10,000s of counters, reads will be slow.

A batch job will work too, provided you are happy for the results to be non-real-time, or slightly out of date.

Richard.

--
Richard Low
Acunu | http://www.acunu.com | @acunu
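For what it's worth, the read-delete-insert pattern described above can be sketched in plain Python, simulating the counter column family and the index row in memory (no real Cassandra involved; all names here are illustrative, not an actual Cassandra API):

```python
from collections import defaultdict

# Simulated read-delete-insert maintenance of a "top counters" index.
# counters plays the role of the counter column family (id -> count).
# index plays the role of the index row: in Cassandra this would be a
# row whose column names sort by count, so reads of the top N are cheap.
counters = defaultdict(int)   # counter CF: counter_id -> count
index = defaultdict(set)      # index row: count -> {counter_ids}

def increment(counter_id, delta=1):
    # 1. Read the current value -- the extra counter read that makes
    #    this pattern expensive for counters.
    old = counters[counter_id]
    # 2. Delete the old index entry for this counter.
    index[old].discard(counter_id)
    if not index[old]:
        del index[old]
    # 3. Apply the increment and insert the new index entry.
    counters[counter_id] = old + delta
    index[old + delta].add(counter_id)

def top_n(n):
    # Reads are optimal: walk the index in descending count order.
    result = []
    for value in sorted(index, reverse=True):
        for cid in sorted(index[value]):
            result.append((cid, value))
            if len(result) == n:
                return result
    return result

increment("a"); increment("a"); increment("b")
print(top_n(2))   # [('a', 2), ('b', 1)]
```

Every increment costs a read plus two writes, which is exactly the write-side hit being traded for cheap top-N reads; dropping the index makes increments a single write but pushes the full scan-and-sort onto the client.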