I haven't actually tried that schema yet; it was just my first idea. If we go with it, our app would have to read the whole table once a day or so to find the top 5,000 or so words.
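A rough sketch of what that daily pass might look like on the app side, assuming the full scan yields (word, count) pairs. The names top_words and sample_rows are made up for illustration, and the actual Cassandra read is not shown:

```python
import heapq
from typing import Iterable, List, Tuple


def top_words(rows: Iterable[Tuple[str, int]], n: int = 5000) -> List[Tuple[str, int]]:
    """Return the n most frequent (word, count) pairs, highest count first."""
    # heapq.nlargest streams over the rows and holds only n candidates
    # in memory, so the whole table never has to be sorted at once.
    return heapq.nlargest(n, rows, key=lambda row: row[1])


# Stand-in for the rows a full scan of word_count would return (hypothetical data).
sample_rows = [("the", 120), ("cassandra", 45), ("seek", 7), ("row", 30), ("redis", 3)]
top3 = top_words(sample_rows, n=3)
print(top3)
```

The scan would still touch every row, so this only keeps the app's memory use bounded; it does not make the read itself any cheaper.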
On Fri, Jan 17, 2014 at 2:49 PM, Jonathan Lacefield <jlacefi...@datastax.com> wrote:

> Hi David,
>
> How do you know that you are receiving a seek for each row? Are you
> querying for a specific word at a time or do the queries span multiple
> words, i.e. what's the query pattern? Also, what is your goal for read
> latency? Most customers can achieve microsecond partition key based query
> reads with Cassandra. This can be done through tuning, data modeling,
> and/or scaling. Please post a cfhistograms for this table as well as
> provide some details on the specific queries you are running.
>
> Thanks,
>
> Jonathan
>
> Jonathan Lacefield
> Solutions Architect, DataStax
> (404) 822 3487
>
> On Fri, Jan 17, 2014 at 1:41 AM, David Tinker <david.tin...@gmail.com> wrote:
>
>> I have an app that stores lots of bits of text in Cassandra. One of
>> the things I need to do is keep a global word frequency table.
>> Something like this:
>>
>> CREATE TABLE IF NOT EXISTS word_count (
>>     word text,
>>     count value,
>>     PRIMARY KEY (word)
>> );
>>
>> This is slow to read as the rows (100's of thousands of them) each
>> need a seek. Is there a better way to model this in Cassandra? I could
>> periodically snapshot the rows into a fat row in another table I
>> suppose.
>>
>> Or should I use Redis or something instead? I would prefer to keep it
>> all Cassandra if possible.

--
http://qdb.io/ Persistent Message Queues With Replay and #RabbitMQ Integration