Re: Tracking word frequencies

David Tinker Mon, 20 Jan 2014 04:59:43 -0800

I haven't actually tried that schema yet; it was just my first idea.
With that approach, our app would have to read the whole table about once
a day to find the top 5,000 or so words.
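
As a rough sketch of that daily pass (Python, purely illustrative): assuming
the app can stream (word, count) pairs out of a full scan of word_count, a
bounded heap picks the top N without sorting the whole table. The function
name top_words and the sample rows are hypothetical, not taken from any
existing code.

```python
import heapq
from typing import Iterable, List, Tuple


def top_words(rows: Iterable[Tuple[str, int]], n: int = 5000) -> List[Tuple[str, int]]:
    """Return the n most frequent (word, count) pairs, highest count first."""
    # heapq.nlargest consumes the iterable once and keeps at most n items,
    # so memory stays bounded even when the scan yields many rows.
    return heapq.nlargest(n, rows, key=lambda row: row[1])


# Hypothetical rows standing in for the result of a full table scan.
sample_rows = [("the", 120), ("cassandra", 45), ("redis", 7), ("seek", 19)]
top_two = top_words(sample_rows, n=2)
```

In a real job the rows would come from a paged query over the table rather
than from an in-memory list.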



On Fri, Jan 17, 2014 at 2:49 PM, Jonathan Lacefield <jlacefi...@datastax.com> wrote:

> Hi David,
>
>   How do you know that you are receiving a seek for each row?  Are you
> querying for a specific word at a time or do the queries span multiple
> words, i.e. what's the query pattern? Also, what is your goal for read
> latency?  Most customers can achieve microsecond partition-key-based query
> reads with Cassandra.  This can be done through tuning, data modeling,
> and/or scaling.  Please post the cfhistograms output for this table and
> provide some details on the specific queries you are running.
>
> Thanks,
>
> Jonathan
>
> Jonathan Lacefield
> Solutions Architect, DataStax
> (404) 822 3487
>
> On Fri, Jan 17, 2014 at 1:41 AM, David Tinker <david.tin...@gmail.com> wrote:
>
>> I have an app that stores lots of bits of text in Cassandra. One of
>> the things I need to do is keep a global word frequency table.
>> Something like this:
>>
>> CREATE TABLE IF NOT EXISTS word_count (
>>   word text,
>>   count counter,
>>   PRIMARY KEY (word)
>> );
>>
>> This is slow to read as the rows (hundreds of thousands of them) each
>> need a seek. Is there a better way to model this in Cassandra? I could
>> periodically snapshot the rows into a fat row in another table I
>> suppose.
>>
>> Or should I use Redis or something instead? I would prefer to keep it
>> all Cassandra if possible.
>>
>
>


-- 
http://qdb.io/ Persistent Message Queues With Replay and #RabbitMQ
Integration
