+1 for making this patch go into 0.21.

thanks,
dhruba

On Fri, Jan 22, 2010 at 10:25 AM, Todd Lipcon <t...@cloudera.com> wrote:

> Hi Steve,
>
> All of the below may be good ideas, but I don't think they're relevant to
> the discussion at hand. Specifically, none of them can enter 0.21 without a
> vote as they'd be new features, and it doesn't even sound like there's a
> JIRA out for them yet. Let's not put off a well-known improvement patch
> waiting for one that doesn't even exist yet. If we want to get the ideas
> below into 22 or a later version, let's open a JIRA and discuss there rather
> than using this vote thread.
>
> As for the patch, I'm +1. It certainly is a large improvement on small
> clusters - without it, in a three node cluster, you cannot successfully kill
> a DN while doing an fs -put, even if your min.replication is 1. As Ryan
> mentioned above, this is a huge problem since new users may evaluate Hadoop
> on a 3-node cluster, figure "hey, let's see fault tolerance in action" and
> then be entirely put off when their kill -9 takes the cluster to a
> screeching halt.
>
> Thanks
> -Todd
>
> On Fri, Jan 22, 2010 at 7:32 AM, Steve Loughran <ste...@apache.org> wrote:
>
> > Stack wrote:
> >
> > I'm being 0 on this
> >
> > -I would worry if the exclusion list was used by the NN to do its
> > blacklisting, I'm glad to see this isn't happening. Yes, you could pick up
> > datanode failure faster, but you would also be vulnerable to a user doing a
> > DoS against the cluster by reporting every DN as failing
> >
> > -Russ Perry's work on high-speed Hadoop rendering [1] tweaked Hadoop to
> > allow the datanodes to get the entire list of nodes holding the data, and
> > allowed them to make their own decision about where to get the data from.
> > This
> > 1. pushed the policy of handling failure down to the clients, less need to
> > talk to the NN about it.
> > 2. lets you do something very fancy where you deliberately choose data
> > from different DNs, so that you can then pull data off the cluster at the
> > full bandwidth of every disk
> >
> > Long term, I would like to see Russ's addition go in, so worry if the
> > HDFS-630 patch would be useful long term. Maybe its a more fundamental
> > issue: where does the decision making go, into the clients or into the NN?
> >
> > -steve
> >
> > [1] http://www.hpl.hp.com/techreports/2009/HPL-2009-345.html

-- Connect to me at http://www.facebook.com/dhruba