We are working on a system that has very heavy traffic during specific times... think of sporting events. Other times we get almost zero traffic. To handle the traffic during events, we are planning on scaling Cassandra out into a very large cluster. The size of our data is still quite small: a single event's data might be 100MB max, but we will be inserting that data very rapidly and reading it back at the same time.
Since we also have very quiet periods, we run a replication factor of 2 on a 2-node cluster, and that handles the off-peak traffic perfectly. Given that dataset size isn't really an issue for us, what is the best way to scale out?

We use an order-preserving partitioner so we can do range scans, and the last time I tried to scale out the cluster we ended up with very uneven load: the few nodes that held the hot event's data were swamped while the rest were barely touched. (There's a toy script below showing the kind of skew I mean.) One other note: since we have very little data and lots of memory, we've turned the key and row caches up about as high as they'll go.

So my question is this: if I bring in 20+ nodes, should I increase the replication factor as well? It seems like a higher replication factor would help distribute the read load, since reads can be served by any replica... or does it just mean that writes take even longer? (My back-of-envelope math on that trade-off is below as well.) And what are some other suggestions for scaling up (and then back down) a system that gets very high traffic in known, small time windows?
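To make the hot-spot problem concrete, here's a toy Python model of what I think is happening (the ring model and the "eventNNN:seq" key format are invented for illustration, not our real schema): with an order-preserving partitioner one event's keys are contiguous in the sorted keyspace and fall into a single token range, while a hashing partitioner would scatter them around the ring.

```python
import hashlib
from collections import Counter

NODES = 20

# Toy keyspace: 100 events x 1000 rows each, keys like "event042:0137".
# (Made-up key format, purely for illustration.)
all_keys = sorted(f"event{e:03d}:{i:04d}" for e in range(100) for i in range(1000))

# Order-preserving partitioner, idealized: the ring is split into NODES
# contiguous, equal-sized slices of the *sorted* keyspace.
per_node = len(all_keys) // NODES
opp_node = {k: min(i // per_node, NODES - 1) for i, k in enumerate(all_keys)}

# Random partitioner, idealized: the node is chosen by a hash of the key,
# so lexically adjacent keys scatter uniformly around the ring.
def rp_node(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NODES

# Tonight's hot event: all reads and writes target one event's keys.
hot = [k for k in all_keys if k.startswith("event042:")]

print("OPP nodes hit:", sorted(Counter(opp_node[k] for k in hot).items()))
print("RP  nodes hit:", sorted(Counter(rp_node(k) for k in hot).items()))
# OPP -> all 1000 hot keys land on 1 node (the event fits in one slice);
# RP  -> roughly 50 keys per node across all 20 nodes.
```

That matches what we saw: adding nodes just added idle nodes, because the whole event lives in one slice of the ring.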
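And here's the back-of-envelope math behind my replication factor question, as I understand it. The numbers are invented, and I'm assuming reads at consistency level ONE can be served by any replica while every write goes to all RF replicas... please correct me if I have that wrong:

```python
# Invented throughput numbers; the point is just the shape of the trade-off.
reads_per_sec = 50_000    # all aimed at one hot OPP range during an event
writes_per_sec = 20_000

for rf in (2, 3, 5, 10):
    # A hot contiguous range lives on rf nodes (primary + rf - 1 replicas),
    # so reads at CL.ONE can fan out over rf nodes...
    hot_reads_per_node = reads_per_sec / rf
    # ...but every logical write now costs rf replica writes cluster-wide.
    replica_writes_per_sec = writes_per_sec * rf
    print(f"RF={rf:2d}: ~{hot_reads_per_node:8,.0f} reads/s per hot node, "
          f"{replica_writes_per_sec:8,} replica writes/s cluster-wide")
```

So it looks to me like raising RF does spread a hot range's read load, at the price of multiplying the write work... which is exactly why I'm not sure it's the right lever versus fixing the partitioning itself.

Let me know if you need more info. Thanks!

Ryan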