Hi Sujee,

HDFS today doesn't give much consideration to data center-level reliability (the topology model is supposed to extend to the data center layer, but that layer is never honored by the replica placement, balancer, or task scheduling policies), and performance is part of the concern with crossing data centers (assuming cross-DC bandwidth is lower than bandwidth within a data center). However, in the future, I think we should deliver a solution that enables data center-level disaster recovery even if performance is degraded. My experience from several years of delivering enterprise software is that it is best to let the customer make the trade-off decision between performance and reliability; the engineering effort should be to provide the options. BTW, HDFS HA protects key nodes from being a SPOF, but it does not handle a whole data center shutdown.
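For context on the topology layer mentioned above: HDFS resolves node locations through a pluggable script configured via `net.topology.script.file.name`, and a data-center-aware deployment could map hosts into a /datacenter/rack hierarchy. Below is a minimal sketch of such a script; the host table and location paths are hypothetical examples, not part of any existing Hadoop deployment.

```python
# Sketch of a data-center-aware topology script, of the kind HDFS invokes
# via net.topology.script.file.name. The host-to-location table here is
# hypothetical; a real deployment would derive it from inventory data.
import sys

# Hypothetical mapping of hosts/IPs to /datacenter/rack locations.
HOST_LOCATIONS = {
    "10.1.0.11": "/dc1/rack1",
    "10.1.0.12": "/dc1/rack2",
    "10.2.0.11": "/dc2/rack1",
}

DEFAULT_LOCATION = "/default-dc/default-rack"

def resolve(host):
    """Return the /datacenter/rack path for a host (default if unknown)."""
    return HOST_LOCATIONS.get(host, DEFAULT_LOCATION)

if __name__ == "__main__":
    # HDFS passes one or more hosts as arguments and expects one
    # location per host on stdout.
    print(" ".join(resolve(h) for h in sys.argv[1:]))
```

Note that today only the rack level of such a hierarchy actually influences placement decisions, which is exactly the gap described above.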
Thanks,
Junping

----- Original Message -----
From: "Sujee Maniyam" <su...@sujee.net>
To: "hdfs-dev" <hdfs-dev@hadoop.apache.org>
Sent: Tuesday, September 11, 2012 7:29:39 AM
Subject: data center aware hadoop?

Hi devs,

Now that HDFS HA is a reality, how about HDFS spanning multiple data centers? Are there any discussions / work going on in this area? It could be a single cluster spanning multiple data centers, or a 'standby cluster' in another data center.

Curious, and thanks for your time!

Regards,
Sujee Maniyam
http://sujee.net