Well, to be honest, I was thinking of using that connection in
production, not for a backup node.
For production, there are several problems. The WAN adds network latency
that is inconsistent and varies greatly during the day; sometimes you will
face network lags that break the cluster for a while (about 1-2 minutes).
Network bandwidth is also a problem, especially during peak hours. It might
not be a problem if you don't have an interactive workload: an app can wait,
a human can't. Be sure to use connection pooling to different servers on
the client side. Over a WAN, available bandwidth can differ by about 4:1
between peak hours and night hours. You need to schedule anti-entropy
repairs at night.
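A minimal sketch of the two client-side pieces of advice above, assuming a Python client. RoundRobinPool and in_off_peak_window are hypothetical helpers, not part of any Cassandra driver; `connect` stands in for whatever call your driver uses to open a connection, and the 01:00-05:00 repair window is an assumed default, not a figure from this thread.

```python
import itertools
from datetime import datetime, time


class RoundRobinPool:
    """Hypothetical client-side pool that spreads requests over several servers.

    `connect` is any callable that opens a connection to a host and raises
    ConnectionError on failure; it stands in for your real driver call.
    """

    def __init__(self, hosts, connect):
        if not hosts:
            raise ValueError("need at least one host")
        self.hosts = list(hosts)
        self.connect = connect
        self._cycle = itertools.cycle(range(len(self.hosts)))

    def get_connection(self):
        # Try each host at most once, starting from the next one in rotation,
        # so a lagging WAN link to one server does not stall the client.
        errors = []
        for _ in range(len(self.hosts)):
            host = self.hosts[next(self._cycle)]
            try:
                return host, self.connect(host)
            except ConnectionError as exc:
                errors.append((host, exc))
        raise ConnectionError(f"all hosts failed: {errors}")


def in_off_peak_window(now: datetime, start=time(1, 0), end=time(5, 0)):
    """Return True if `now` falls inside the (assumed) night repair window.

    Also handles windows that wrap past midnight, e.g. 23:00-04:00.
    """
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end
```

A scheduler could check in_off_peak_window(datetime.now()) before starting a repair (for example `nodetool repair`), so that anti-entropy traffic stays out of peak hours.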
My Cassandra deployment works just like an expensive file caching and
replication system. I mean, all I use it for is to replicate some 5 million
files of 2 MB each across a few nodes, with intensive reads and writes.
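To put that workload in perspective against the WAN bandwidth concerns, here is a rough sizing calculation. The replication factor of 3 and the 100 Mbit/s link are hypothetical values chosen only for illustration, and "2M" is read as 2 MB in decimal units.

```python
# Back-of-the-envelope sizing for the workload described above.
# Hypothetical assumptions: decimal units (1 MB = 10**6 bytes),
# replication factor 3, and a 100 Mbit/s WAN link.

FILES = 5_000_000
FILE_SIZE_BYTES = 2 * 10**6        # "2M each", read as 2 MB
REPLICATION_FACTOR = 3             # hypothetical
WAN_BITS_PER_SEC = 100 * 10**6     # hypothetical 100 Mbit/s


def dataset_bytes(files, size_bytes):
    """Total raw bytes for `files` files of `size_bytes` each."""
    return files * size_bytes


def transfer_seconds(num_bytes, bits_per_sec):
    """Seconds to move `num_bytes` over a link, ignoring protocol overhead."""
    return num_bytes * 8 / bits_per_sec


raw = dataset_bytes(FILES, FILE_SIZE_BYTES)
stored = raw * REPLICATION_FACTOR
one_copy_days = transfer_seconds(raw, WAN_BITS_PER_SEC) / 86_400

print(f"raw data:          {raw / 10**12:.1f} TB")
print(f"with RF={REPLICATION_FACTOR}:          {stored / 10**12:.1f} TB")
print(f"one copy over WAN: {one_copy_days:.1f} days")
```

Under these assumptions it reports 10 TB of raw data, 30 TB stored, and about 9.3 days to push a single full copy over the link, before any peak-hour slowdown.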
For mass replication of large files, Hadoop is really better than
Cassandra because there are no compactions.
It's not only the files themselves: I also need to attach some tags to
each file (think of them as key=value pairs), so I thought of Hadoop but in
the end settled on Cassandra because of better consistency, community
support, no single point of failure, and more!
Hadoop is far better than Cassandra for batch processing if your batch
processing changes the majority of the data set. The SPOF is not a problem,
but it is way harder to write optimised applications for Hadoop; it's kind
of low level.