We are working on a system that has very heavy traffic during specific
windows... think of sporting events. At other times we get almost no
traffic. To handle the traffic during the events, we are planning on
scaling Cassandra out into a very large cluster. The size of our data
is still quite small: a single event's data might be 100 MB at most,
but we will be inserting that data very rapidly while reading it at
the same time.

Since we have very slow times, we run a replication factor of 2 on a
cluster of 2 nodes... it handles that traffic perfectly.
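
For concreteness, in modern CQL terms our keyspace would look roughly
like the following (illustrative only... our actual cluster is older
and configured differently, and the contact point and keyspace name
are placeholders):

    # Illustrative sketch using the DataStax python driver; the host
    # and keyspace name here are placeholders, not our real config.
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect()
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS events
        WITH replication = {'class': 'SimpleStrategy',
                            'replication_factor': 2}
    """)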

Since dataset size is not really an issue, what is the best way for us
to scale out? We are using an order-preserving partitioner so we can
do range scans, and the last time I tried to scale out the cluster we
ended up with very uneven load: the few nodes that owned the hot key
ranges were swamped, while the rest were barely touched.
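
Here is a toy sketch of the effect as I understand it (the node count,
range math, and key format are all made up, and md5 just stands in for
what a random partitioner would do):

    # Sequential keys all land in one node's range under an
    # order-preserving partitioner, while hashing spreads them out.
    import hashlib
    from collections import Counter

    NODES = 4

    def opp_node(key):
        # Order-preserving: placement follows the raw key's position
        # in the sorted key space (bucketed by first byte here).
        return key.encode()[0] * NODES // 256

    def hash_node(key):
        # Random-partitioner style: placement follows a hash of the key.
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % NODES

    keys = [f"event42:{seq:08d}" for seq in range(10000)]  # one hot event
    print("ordered:", Counter(opp_node(k) for k in keys))  # all on one node
    print("hashed: ", Counter(hash_node(k) for k in keys)) # spread evenly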

One other note: since we have very little data and lots of memory, we
turned the key and row caches up almost as high as they can go.
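
The back-of-envelope math there is simple (the heap size below is just
a hypothetical figure, not our real setting):

    # One event easily fits in memory, so caching the whole working
    # set is cheap. 100 MB is from our own numbers; 8 GB is made up.
    event_bytes = 100 * 1024 * 1024
    heap_bytes = 8 * 1024 ** 3
    print(f"one event is {event_bytes / heap_bytes:.1%} of the heap")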

So my question is this... if I bring in 20+ nodes, should I increase
the replication factor as well? It would seem to make sense that a
higher replication factor would help distribute load? Or does it just
mean that writes take even longer? What are some other suggestions for
scaling up (and then back down) a system that gets very high traffic
in known, short time windows?
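
My rough mental model, which may well be wrong and which assumes reads
at consistency level ONE can be served by any replica of a hot range
(the request rates below are invented):

    # If a hot range has rf replicas, reads can spread across them,
    # but every replica still applies every write.
    def hot_range_per_replica(rf, reads_per_sec, writes_per_sec):
        reads = reads_per_sec / rf    # reads divide across replicas
        writes = writes_per_sec       # writes hit all of them anyway
        return reads, writes

    for rf in (2, 3, 5):
        r, w = hot_range_per_replica(rf, 50_000, 50_000)
        print(f"RF={rf}: {r:,.0f} reads/s, {w:,.0f} writes/s per replica")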

Let me know if you need more info.

Thanks!
Ryan
