I'm moving this to hdfs-dev, where it belongs. There are so many failure scenarios that trying to shoehorn in this functionality just isn't going to work in any realistic way.

What happens when the local DataNode process dies? What happens when the local DataNode process suffers partial thread death, so that it is stuck in a pseudo-alive state? What happens when the local DataNode process can't write the requested blocks because it is out of space? ... There are lots of reasons why requesting that specific blocks be stored locally for a task/application/whatever is full of danger.

If this feature is absolutely required, then the best bet is likely to set the default replication factor to match the HDFS node count. This has significant scaling limitations, however.
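A minimal sketch of that replicate-everywhere workaround, assuming a five-node cluster; the node count, path, and class name are illustrative, not from the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicateEverywhere {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Default replication for files created through this client;
            // cluster-wide, this is normally set in hdfs-site.xml.
            conf.setInt("dfs.replication", 5); // 5 = assumed DataNode count

            FileSystem fs = FileSystem.get(conf);
            // Raise replication on an already-existing file; the NameNode
            // schedules the additional copies asynchronously.
            fs.setReplication(new Path("/data/index/segment-0"), (short) 5);
            fs.close();
        }
    }

The shell equivalent is "hadoop fs -setrep -w 5 <path>". Either way, every node ends up holding a copy, which is exactly where the scaling limitation comes from.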
On May 26, 2011, at 12:49 PM, Jason Rutherglen wrote:

>> Keep in mind there's a fair bit of subtlety to it -- e.g., what happens
>> if you have two racks: A with two replicas, and B with one replica. A
>> node in rack A requests a local replica. In this case we have to make
>> sure that we move one of the A replicas and not the B replica (i.e., we
>> must respect the NN's rack replication policy).
>
> Yes, good point. Also, I wonder how HDFS handles what will then be
> over-replication of the file (meaning, will it try to delete the
> over-replicated blocks, in which case we'd need to ensure [somehow]
> that this doesn't happen).
>
> On Thu, May 26, 2011 at 12:30 PM, Todd Lipcon <t...@cloudera.com> wrote:
>> On Thu, May 26, 2011 at 12:02 PM, Jason Rutherglen
>> <jason.rutherg...@gmail.com> wrote:
>>> Todd, thanks!
>>>
>>>> In general, though, keep in mind that whenever you write data, you'll
>>>> get a local copy first, if the writer is in the cluster. That's how
>>>> HBase gets locality for most of its accesses.
>>>
>>> Right. However, in the failover scenario where a node goes down
>>> (hardware failure, or either of the processes, such as the DataNode,
>>> RegionServer, etc.), I think the new RS will not have local data.
>>> We could first request that all necessary HDFS files go local before
>>> the new RS becomes available. At least for search to work, this is a
>>> requirement.
>>
>> Yep, we've thrown this idea around before, but I'm not sure whether
>> there's an HBASE JIRA for it or not.
>>
>>>
>>>> There are some non-public APIs to do this -- have a look at how the
>>>> Balancer works; the dispatch() function is the guts you're looking
>>>> for. It might be nice to expose this functionality as a "limited
>>>> private evolving" API.
>>>
>>> Perhaps simply mark them as 'expert', or make them package-private?
>>> I'll work on a patch.
>>
>> Sounds good.
>>
>> Keep in mind there's a fair bit of subtlety to it -- e.g., what happens
>> if you have two racks: A with two replicas, and B with one replica. A
>> node in rack A requests a local replica. In this case we have to make
>> sure that we move one of the A replicas and not the B replica (i.e., we
>> must respect the NN's rack replication policy).
>>
>> -Todd
>>
>>> On Thu, May 26, 2011 at 11:40 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>>> Hey Jason,
>>>>
>>>> There are some non-public APIs to do this -- have a look at how the
>>>> Balancer works; the dispatch() function is the guts you're looking
>>>> for. It might be nice to expose this functionality as a "limited
>>>> private evolving" API.
>>>>
>>>> In general, though, keep in mind that whenever you write data, you'll
>>>> get a local copy first, if the writer is in the cluster. That's how
>>>> HBase gets locality for most of its accesses.
>>>>
>>>> -Todd
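Short of exposing the Balancer's block-move path, one workaround that follows from Todd's write-locality point is to re-copy the file from the node that wants the data: since an in-cluster writer gets the first replica of each block, the rewritten file comes out local. A rough sketch against the public FileSystem API; the class and temp-file naming are made up, and the delete/rename swap is not atomic with respect to readers:

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class LocalizeByRewrite {
        // Run on the node that needs local replicas (e.g., the new RS).
        public static void localize(FileSystem fs, Path src) throws Exception {
            Path tmp = new Path(src.getParent(), src.getName() + ".localize");
            FSDataInputStream in = fs.open(src);
            FSDataOutputStream out = fs.create(tmp, true);
            try {
                // The first replica of every block written here lands on
                // this node's DataNode, because the writer is in-cluster.
                IOUtils.copyBytes(in, out, 64 * 1024);
            } finally {
                out.close();
                in.close();
            }
            fs.delete(src, false);  // readers can race this window
            fs.rename(tmp, src);
        }
    }

This costs a full rewrite of the data, which is why a targeted replicate-to-node API would still be preferable.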
>>>> On Thu, May 26, 2011 at 11:36 AM, Jason Rutherglen
>>>> <jason.rutherg...@gmail.com> wrote:
>>>>> Is there a way to send a request to the name node to replicate
>>>>> block(s) to a specific DataNode? If not, what would be a way to do
>>>>> this? -Thanks
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
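On the over-replication question raised above: the NameNode does treat replicas beyond a file's replication factor as excess and schedules them for deletion, and the client does not control which copy gets removed. So any scheme that copies a block to a chosen DataNode would first need to raise the file's replication factor so the new copy isn't reclaimed. A sketch, with a hypothetical helper name:

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationGuard {
        // Hypothetical helper: bump the target replication before adding
        // a replica, so the NameNode doesn't count the new copy as excess.
        public static short bumpBeforeCopy(FileSystem fs, Path file)
                throws Exception {
            FileStatus st = fs.getFileStatus(file);
            short target = (short) (st.getReplication() + 1);
            fs.setReplication(file, target);
            // ...then place the extra replica on the desired node
            // (the Balancer-style dispatch discussed above).
            return target;
        }
    }

Note this still doesn't pin the extra copy to a particular node: if the replication factor later drops back, the NameNode, not the client, picks which replica to delete.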