Well to be honest I was thinking of using that connection in production, not for a backup node.

For production there are several problems. The added network latency is inconsistent and varies greatly during the day; sometimes you will hit network lags that break the cluster for a while (about 1-2 minutes). Network bandwidth is also a problem, especially during peak hours. It might not be a problem if you don't have an interactive workload: an app can wait, a human can't. Be sure to use connection pooling to different servers on the client. Over WAN you can see about a 4:1 ratio in available bandwidth between night hours and peak hours, so you need to schedule anti-entropy repairs at night.
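As a rough illustration of keeping repairs inside the quiet hours, here is a minimal Python sketch of a check that a cron wrapper could run before starting a repair. The function name `in_night_window` and the 23:00 to 06:00 window are assumptions made for the example, not values measured on any real link.

```python
from datetime import time


def in_night_window(now, start=time(23, 0), end=time(6, 0)):
    """Return True if `now` falls inside the off-peak window [start, end).

    `start` and `end` are hypothetical defaults; pick them from your own
    bandwidth measurements.
    """
    if start <= end:
        # Window lies within a single day, e.g. 01:00 to 05:00.
        return start <= now < end
    # Window wraps past midnight, e.g. 23:00 to 06:00.
    return now >= start or now < end


if __name__ == "__main__":
    from datetime import datetime

    if in_night_window(datetime.now().time()):
        print("off-peak: ok to start repair")
    else:
        print("peak hours: skip repair for now")
```

A wrapper script could call this check and only then invoke `nodetool repair` on the node, so a repair that is triggered late does not start in the middle of peak hours.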

My Cassandra deployment works just like an expensive file caching and replication layer. I mean, all I use it for is to replicate some 5 million files of 2 MB each across a few nodes and read/write them intensively.
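To put that data volume next to the WAN bandwidth issue, here is a back-of-the-envelope Python sketch. It assumes "2M" means 2 MB (decimal) per file, and the 100 Mbit/s night-time link speed is purely hypothetical; only the file count and the 4:1 night/peak ratio come from this thread.

```python
def transfer_hours(total_bytes, link_bits_per_sec):
    """Hours needed to push total_bytes over a link, ignoring protocol overhead."""
    return total_bytes * 8 / link_bits_per_sec / 3600


FILES = 5_000_000
FILE_BYTES = 2 * 10**6           # assumption: "2M" = 2 MB (decimal)
TOTAL_BYTES = FILES * FILE_BYTES  # 10**13 bytes, i.e. 10 TB per full copy

NIGHT_BPS = 100 * 10**6          # hypothetical usable bandwidth at night
PEAK_BPS = NIGHT_BPS / 4         # 4:1 night/peak ratio mentioned above

if __name__ == "__main__":
    print(f"full copy: {TOTAL_BYTES / 10**12:.0f} TB")
    print(f"at night:  {transfer_hours(TOTAL_BYTES, NIGHT_BPS):.0f} h")
    print(f"at peak:   {transfer_hours(TOTAL_BYTES, PEAK_BPS):.0f} h")
```

Under these assumptions a single full copy is about 10 TB, so streaming it to a remote replica takes days even at night-time bandwidth, and four times as long at peak.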

For mass replication of large files, Hadoop is really better than Cassandra because there are no compactions.

Not only the files themselves, but I also need to attach some tags to each file (think of them as key=value), so I thought of Hadoop but in the end settled for Cassandra because of better consistency, community support, no single point of failure and then some!

Hadoop is far better than Cassandra for batch processing if your batch processing changes the majority of the data set. The SPOF is not a problem, but it is much harder to write optimized applications for Hadoop; it's kind of low level.
