I'm setting up a Cassandra cluster for my company. For a variety of reasons, the servers that run our Hadoop jobs are in our local office, while our production machines are in a colocated data center. We don't want to run Hadoop jobs against Cassandra servers on the other side of the US from us, and we also don't want those jobs impacting performance in production. What's the best way to handle this?
My first instinct is to add some servers in our local office to the cluster and use NetworkTopologyStrategy. That way, the local servers automatically receive the latest changes, and we get a bit of extra redundancy for our production machines. Of course, the glaring weakness of this strategy is that our stats servers aren't in a data center with any kind of production guarantees. The network connection is relatively slow and unreliable, the servers may go down at any time, and I generally don't want to tie our production performance or reliability to these servers. Is this as dumb an idea as I suspect it is, or can it be made to work? :-) Are there better ways to accomplish what I'm trying to do?
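To make the idea concrete, here is a rough sketch of the kind of keyspace definition I have in mind. It only builds the CQL string and doesn't connect to anything. The keyspace name `stats_demo`, the data center names `prod` and `office`, and the replica counts are all placeholders I made up for illustration, not our real setup; the data center names would have to match whatever the snitch reports.

```python
def build_keyspace_cql(keyspace, replicas_per_dc):
    """Return a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a data center name to the number of replicas
    that data center should hold.
    """
    if not replicas_per_dc:
        raise ValueError("need at least one data center")

    # The replication map always starts with the strategy class,
    # followed by one entry per data center.
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc_name, replica_count in replicas_per_dc.items():
        options.append("'%s': %d" % (dc_name, int(replica_count)))

    replication_map = "{" + ", ".join(options) + "}"
    return "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = %s;" % (
        keyspace,
        replication_map,
    )


# Hypothetical layout: three replicas in production, one in the office.
cql = build_keyspace_cql("stats_demo", {"prod": 3, "office": 1})
print(cql)
```

The intent would be that the Hadoop jobs only ever talk to the `office` nodes, while the application only talks to `prod`.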