Unfortunately, there's currently no way to do this in Cassandra except by using another row as an index, as you're already doing.
Of course you could also store by source_id.date and have a batch job
iterate over all sources to compute the top 100. It would not be real
time any more though.

  - pyr

On Sun, 02 Oct 2011 15:01:20 -0400
Mike Peters <cassan...@softwareprojects.com> wrote:

> Any ideas?
>
>
> Thanks,
> Mike Peters
>
> On 10/1/2011 1:19 AM, Mike Peters wrote:
> > Hi,
> >
> > We're using Cassandra 0.8 counters in production and loving it!
> >
> > One issue we're running into is we need an efficient mechanism to
> > retrieve the "top 100" results, sorted by count values.
> >
> > We have tens of thousands of counters growing rapidly (one counter
> > per each combination of date.source_id). What we're looking for
> > is, what's the best way to retrieve the top 100 "sources" for a
> > given date, without having to iterate through all counters created
> > for that date?
> >
> > Right now to accomplish this, we are managing an inverted index of
> > count values. This is very inefficient and kills our write
> > performance, because after every counter-increment, we have to read
> > its value and store it into an inverted index that looks like this:
> >
> > Key, CounterName
> > 000005 2011-10-01.source1
> > 000009 2011-10-01.source2
> > 000010 2011-10-01.source3
> >
> > If source2 just generated 100 "hits", we need to delete the row
> > with the key of "000009" from the inverted index and insert a new
> > one with the new counter value for source2:
> >
> > Key, CounterName
> > 000005 2011-10-01.source1
> > 000010 2011-10-01.source3
> > 000109 2011-10-01.source2
> >
> > The additional reads and deletes are killing our performance.
> >
> > Any one has any ideas about a more efficient way to utilize
> > counters and support "top 100" results?
> >
> > Looking forward to any ideas and feedback you can share.
> >
> >
> > Thanks,
> > Mike Peters
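For what it's worth, the selection step of the batch job suggested in the reply could look roughly like the sketch below. It is only an illustration under a few assumptions: counters are named source_id.date, the job has already read the (name, value) pairs out of Cassandra by whatever means your client offers, and the function name top_sources and the sample data are made up for the example. It uses a bounded heap, so it never has to sort every counter for the day.

```python
import heapq
from typing import Iterable, List, Tuple


def top_sources(
    counters: Iterable[Tuple[str, int]], date: str, n: int = 100
) -> List[Tuple[str, int]]:
    """Return the n (source_id, count) pairs with the highest counts for `date`.

    `counters` is any iterable of (counter_name, value) pairs, where
    counter_name is assumed to look like "source_id.date".
    """
    suffix = "." + date
    per_source = (
        (name[: -len(suffix)], value)
        for name, value in counters
        if name.endswith(suffix)
    )
    # nlargest keeps at most n items in memory while scanning the input once.
    return heapq.nlargest(n, per_source, key=lambda item: item[1])


# Hypothetical snapshot of counter values, standing in for a scan of the
# counter column family.
sample = [
    ("source1.2011-10-01", 5),
    ("source2.2011-10-01", 109),
    ("source3.2011-10-01", 10),
    ("source1.2011-10-02", 7),
]

print(top_sources(sample, "2011-10-01", n=2))
```

The job would then write the resulting list to a single row per date, so readers fetch the top 100 with one read, at the cost of the result being only as fresh as the last batch run.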