> tweaked Hadoop to allow the datanodes to get the entire list

Are you referring to datanodes or DFS clients here?
The client already gets the entire list of replica locations for a block from the namenode, and one could always develop a DFS client that is free to choose whatever locations it decides to pick up the data from, isn't it? (A short illustrative sketch of that client API is appended at the end of this message.)

thanks,
dhruba

On Fri, Jan 22, 2010 at 7:32 AM, Steve Loughran <ste...@apache.org> wrote:

> Stack wrote:
>
> I'm being 0 on this
>
> -I would worry if the exclusion list was used by the NN to do its
> blacklisting; I'm glad to see this isn't happening. Yes, you could pick up
> datanode failure faster, but you would also be vulnerable to a user doing a
> DoS against the cluster by reporting every DN as failing.
>
> -Russ Perry's work on high-speed Hadoop rendering [1] tweaked Hadoop to
> allow the datanodes to get the entire list of nodes holding the data, and
> allowed them to make their own decision about where to get the data from.
> This:
> 1. pushed the policy of handling failure down to the clients, so there is
> less need to talk to the NN about it.
> 2. lets you do something very fancy where you deliberately choose data
> from different DNs, so that you can then pull data off the cluster at the
> full bandwidth of every disk.
>
> Long term, I would like to see Russ's addition go in, so I worry whether
> the HDFS-630 patch would be useful long term. Maybe it's a more fundamental
> issue: where does the decision making go, into the clients or into the NN?
>
> -steve
>
> [1] http://www.hpl.hp.com/techreports/2009/HPL-2009-345.html

--
Connect to me at http://www.facebook.com/dhruba
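[Editor's sketch] A minimal, illustrative example of the point above, assuming the public FileSystem#getFileBlockLocations API (the class name and argument handling here are hypothetical, not from the thread): the namenode already returns every replica location for each block of a file, so a custom client is free to pick whichever datanode it likes, or even read different blocks from different datanodes in parallel.

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical example class; not part of Hadoop itself.
    public class ListReplicaLocations {
      public static void main(String[] args) throws Exception {
        Path file = new Path(args[0]);   // e.g. a path on the default hdfs:// filesystem

        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(file);

        // One BlockLocation per block in the requested byte range; each entry
        // lists the hosts of all replicas of that block, so the choice of which
        // datanode to read from is entirely up to the client.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation b : blocks) {
          System.out.println("offset=" + b.getOffset()
              + " length=" + b.getLength()
              + " replicas=" + Arrays.toString(b.getHosts()));
        }
        fs.close();
      }
    }

A scheduler like the one described in Russ Perry's report could use exactly this per-block replica list to spread reads across distinct datanodes and approach the aggregate disk bandwidth of the cluster.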