On Sun, Mar 6, 2011 at 1:39 PM, Mimi Aluminium <mimi.alumin...@gmail.com> wrote:
> Are you familiar with a Cassandra cluster that is installed in datacenters
> that are spread across the WAN? Can you comment on the performance of such
> an installation?
> What is the largest size of such a cluster you are aware of?

Digg operated a 40-node Cassandra cluster (0.6.1 through 0.6.6+1072)
across two datacenters, one on each coast of the US. It worked fine,
with the usual caveats that come with that sort of network latency. We
used a custom snitch and rack-aware placement, but the implementation
in 0.6.x was insufficiently robust, so we ended up disabling it in favor
of the simple snitch and rack-unaware placement, with the nodes simply
alternating datacenters around the ring. Because we only read from one
half of the cluster, we often had only one local replica of our data.
We recently moved this cluster from two physical DCs to one. Other than
the network trickery involved in keeping IP addresses the same, the
only downside of the WAN in this case was the bottleneck while copying
the data.
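To illustrate why alternating datacenters on the ring can leave only one
local replica, here is a minimal Python sketch. It is not Digg's setup:
it assumes a hypothetical 8-node ring, a replication factor of 3
(neither number comes from this message), and rack-unaware placement
where a key's replicas go on its primary node plus the next RF-1 nodes
clockwise.

```python
# Hypothetical 8-node ring whose nodes alternate datacenters.
RING_SIZE = 8
REPLICATION_FACTOR = 3  # assumption: RF is not stated in the message


def datacenter(node: int) -> str:
    """Nodes alternate DCs around the ring: even -> east, odd -> west."""
    return "east" if node % 2 == 0 else "west"


def replicas(primary: int) -> list:
    """Rack-unaware placement: the primary plus the next RF-1 nodes clockwise."""
    return [(primary + i) % RING_SIZE for i in range(REPLICATION_FACTOR)]


def local_replica_count(primary: int, local_dc: str) -> int:
    """How many of a range's replicas live in the datacenter we read from."""
    return sum(1 for node in replicas(primary) if datacenter(node) == local_dc)


if __name__ == "__main__":
    counts = [local_replica_count(p, "east") for p in range(RING_SIZE)]
    print(counts)  # [2, 1, 2, 1, 2, 1, 2, 1]
```

With RF=3 and strict alternation, every other range has only one replica
in the datacenter serving reads, so losing that one node forces reads
across the WAN.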

As long as you have decent latency and throughput across the WAN link,
Cassandra should be fine. This should be especially true in 0.7 with
the DynamicEndpointSnitch enabled. I have clearly used a number of
weasel words in this summary; you should of course test for your own
case, especially if it involves more than two datacenters.

=Rob
