On Sat, May 1, 2010 at 6:34 AM, Rakesh Rajan <rakes...@gmail.com> wrote:
> I am evaluating cassandra to implement activity streams. We currently have
> over 1000000 feeds with total entries exceeding 320000000 implemented using
> redis (~320 entries / feed). Would like to hear from the community on how to
> use cassandra to solve the following cases:
>
> Ability to fetch entries by applying a few filters ( like show me only likes
> from a given user). This would include range query to support pagination. So
> this would mean indices on a few columns like the feed id, feed type etc.

Sounds like you've got it: for anything you need "filtered" server-side,
your app has to denormalize into additional column families (CFs), one
per query pattern.  Everything else you have to filter client-side.
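
To make that concrete, here is a rough sketch of the write-time
denormalization. It is an in-memory model only, not Cassandra client
code: each "CF" is a dict mapping a row key to a sorted list of columns,
and the names feed_all, feed_by_type, feed_by_actor, write_entry and page
are made up for illustration.

```python
from bisect import bisect_right, insort
from collections import defaultdict

# Each hypothetical "column family": row key -> columns sorted by (timestamp, entry_id).
feed_all = defaultdict(list)       # row key: feed_id
feed_by_type = defaultdict(list)   # row key: (feed_id, entry_type)
feed_by_actor = defaultdict(list)  # row key: (feed_id, actor_id, entry_type)


def write_entry(feed_id, actor_id, entry_type, timestamp, entry_id):
    """Write one activity entry into every CF that serves a filtered read."""
    column = (timestamp, entry_id)
    insort(feed_all[feed_id], column)
    insort(feed_by_type[(feed_id, entry_type)], column)
    insort(feed_by_actor[(feed_id, actor_id, entry_type)], column)


def page(row, start=None, count=20):
    """Return up to `count` columns strictly after `start`, oldest first."""
    first = 0 if start is None else bisect_right(row, start)
    return row[first:first + count]


write_entry("feed1", "alice", "like", 1, "e1")
write_entry("feed1", "bob", "like", 2, "e2")
write_entry("feed1", "alice", "comment", 3, "e3")
write_entry("feed1", "alice", "like", 4, "e4")

# "Show me only likes from alice", one entry per page.
alice_likes = feed_by_actor[("feed1", "alice", "like")]
first_page = page(alice_likes, count=1)
second_page = page(alice_likes, start=first_page[-1], count=1)
```

The point is that each filter becomes its own row, so a filtered,
paginated read is a plain range query over one row and the cost moves to
write time (three writes per entry here).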

> We have around 3 machines with 4GB RAM for this purpose and are thinking of
> having replication factor 2. Would 4GB * 3 be enough for cassandra for this
> kind of data? I read that cassandra does not keep all the data in memory but
> want to be sure that we have the right server config to handle this data
> using cassandra.

Depends on how much of the data is "hot."  Cassandra does not require
all data to be in memory, but of course if you request data faster
than the disk can keep up then that will be your bottleneck.
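
For a back-of-envelope feel, the arithmetic looks something like the
sketch below. The entry count, replication factor, node count and RAM
size are the figures quoted above; the 500 bytes per entry is purely an
assumed placeholder, so substitute a measured value. It also ignores
per-column overhead, indexes and compaction headroom, and only part of
the 4GB would be available for caching anyway.

```python
GIB = 2 ** 30


def per_node_bytes(entries, bytes_per_entry, replication_factor, nodes):
    """Raw data each node stores if replicas are spread evenly across nodes."""
    return entries * bytes_per_entry * replication_factor / nodes


ENTRIES = 320_000_000      # from the question
REPLICATION_FACTOR = 2     # from the question
NODES = 3                  # from the question
RAM_PER_NODE = 4 * GIB     # from the question
BYTES_PER_ENTRY = 500      # ASSUMPTION: hypothetical average entry size

on_disk = per_node_bytes(ENTRIES, BYTES_PER_ENTRY, REPLICATION_FACTOR, NODES)
ram_fraction = RAM_PER_NODE / on_disk

print(f"per-node data: {on_disk / GIB:.1f} GiB")
print(f"share that could fit in RAM at best: {ram_fraction:.1%}")
```

Under that assumed entry size each node holds on the order of 100 GiB,
so only a few percent of it can ever be in memory at once. Whether that
is enough comes back to how small the hot set is.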

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
