On Sun, Mar 6, 2011 at 1:39 PM, Mimi Aluminium <mimi.alumin...@gmail.com> wrote:
> Are you familiar with Cassandra clusters that are installed in datacenters
> that are spread across the WAN? Can you comment on the performance of such
> an installation?
> What is the largest size of such a cluster you are aware of?
Digg operated a 40-node Cassandra cluster (0.6.1 through 0.6.6+1072) in two datacenters, one on each coast of the US. It worked fine, with the usual caveats that come with that sort of network latency.

We used a custom snitch and rackaware, but the implementation in 0.6.x was insufficiently robust and ended up disabled in favor of simple snitch and rack unaware, with the nodes simply alternating datacenters on the ring. As we only read from one half of the cluster, this meant that we often had only one local replica of our data.

We recently moved this cluster from two physical DCs to one. Other than the network trickery involved in keeping IP addresses the same, the only negative aspect of the WAN in this case was the bottleneck while copying the data.

As long as you have decent latency and throughput across the WAN link, Cassandra should be fine. This should be especially true in 0.7 and with the DynamicEndpointSnitch enabled.

I have clearly used a number of weasel words in this summary; you should of course test for your case, especially if that case involves more than two datacenters.

=Rob
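To make the "only one local replica" point concrete, here is a rough sketch of rack-unaware placement on a ring whose nodes alternate datacenters. It is an illustration only, not Cassandra code: the replication factor of 3 is an assumption (the message does not state it), the datacenter names "east" and "west" are made up, and every node is treated as owning one equal token range.

```python
from collections import Counter

def build_ring(num_nodes, datacenters=("east", "west")):
    """Return datacenter names in ring order, alternating between datacenters."""
    return [datacenters[i % len(datacenters)] for i in range(num_nodes)]

def replica_dcs(ring, primary_index, replication_factor):
    """Rack-unaware placement: the primary node plus the next RF-1 nodes clockwise."""
    return [ring[(primary_index + k) % len(ring)]
            for k in range(replication_factor)]

def local_replica_histogram(ring, local_dc, replication_factor):
    """Map 'number of replicas in local_dc' to 'number of token ranges'."""
    hist = Counter()
    for primary in range(len(ring)):
        placed = replica_dcs(ring, primary, replication_factor)
        hist[placed.count(local_dc)] += 1
    return dict(hist)

# 40 nodes as in the cluster described above; RF=3 is assumed, not from the source.
ring = build_ring(40)
print(local_replica_histogram(ring, "east", 3))
```

Under these assumptions, ranges whose primary sits in the other datacenter get a single replica in the local one, so a client that reads only from "east" has just one local replica for half of the ranges.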