When you say decent latency and throughput, what numbers do you consider decent? I know throughput depends heavily on how many KB are pushed through the pipe, so I would expect throughput requirements to depend largely on the data actually stored in Cassandra.
Thanks for the info. I am getting ready for a multisite deployment, and any bit of information will help us qualify or disqualify network vendors against what we anticipate our needs to be.

Lewis

On Mar 7, 2011, at 10:25 AM, Robert Coli wrote:

> On Sun, Mar 6, 2011 at 1:39 PM, Mimi Aluminium <mimi.alumin...@gmail.com> wrote:
>> Are you familiar with Cassandra cluster that is installed in datacenters
>> that are spread across the WAN? can you comment on the performance of such
>> installation?
>> What is the largest size of such a cluster you are aware of?
>
> Digg operated a 40 node cassandra 0.6.1 to 0.6.6+1072 cluster in two
> datacenters, one on each coast of the US. It worked fine, with the
> usual caveats that come with that sort of network latency. We used a
> custom snitch and rackaware, but the implementation in 0.6.x was
> insufficiently robust and ended up disabled in favor of simple snitch
> and rack unaware with the nodes simply alternating data centers on the
> ring. As we only read from one half of the cluster, this meant that we
> often only had one local replica of our data. We recently moved this
> cluster from two physical DCs to one. Other than the network trickery
> involved in keeping IP addresses the same, the only negative aspect of
> the WAN in this case was the bottleneck while copying the data.
>
> As long as you have decent latency and throughput across the WAN link,
> cassandra should be fine. This should be especially true in 0.7 and
> with the DynamicEndpointSnitch enabled. I have clearly used a number
> of weasel words in this summary, you should of course test for your
> case, especially if that case involves more than two datacenters.
>
> =Rob