Kannan Rajah created HADOOP-11905:
-------------------------------------

             Summary: Abstraction for LocalDirAllocator
                 Key: HADOOP-11905
                 URL: https://issues.apache.org/jira/browse/HADOOP-11905
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs
    Affects Versions: 2.5.2
            Reporter: Kannan Rajah
            Assignee: Kannan Rajah
             Fix For: 2.7.1


Two abstractions are currently used to write data to local disk:
LocalDirAllocator: Allocates paths from a set of configured local directories.
LocalFileSystem/RawLocalFileSystem: Reads and writes using java.io.* and java.nio.*

In the current implementation, the local disk is managed by the guest OS and 
not by HDFS. The proposal is to provide a new abstraction that encapsulates 
the two abstractions above and hides who manages the local disks. This lets us 
provide an alternate implementation in which a DFS manages the local disks and 
they are accessed through HDFS APIs. In that case, the DFS maintains a 
namespace for node-local directories and can create paths that are guaranteed 
to reside on a specific node.

Here is an example use case for Shuffle: When a mapper writes intermediate data 
using this new implementation, it will continue to write to local disk. When a 
reducer needs to access data from a remote node, it can use HDFS APIs with a 
path that points to that node’s local namespace instead of using an HTTP 
server to transfer the data across nodes.

New Abstractions
1. LocalDiskPathAllocator
Interface to get file/directory paths from the local disk namespace.
It contains all the APIs currently supported by LocalDirAllocator, so we only 
need to change LocalDirAllocator to implement this new interface.

2. LocalDiskUtil
Helper class to get a handle to LocalDiskPathAllocator and the FileSystem
that is used to manage those paths.
By default, it will return LocalDirAllocator and LocalFileSystem.
A supporting DFS can return DFSLocalDirAllocator and an instance of DFS.

3. DFSLocalDirAllocator
This is a generic implementation. An allocator is created for a specific node. 
It uses the Configuration object to get the user-configured base directory and 
appends the node hostname to it, so the returned paths lie within the 
node-local namespace.
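
The path construction described for DFSLocalDirAllocator can be sketched as 
follows. This is only an illustration under stated assumptions, not the 
proposed patch: the interface is reduced to a single method modeled on 
LocalDirAllocator.getLocalPathForWrite(String, Configuration), plain Strings 
and a Map stand in for Hadoop's Path and Configuration so the sketch compiles 
on its own, and the configuration key name is hypothetical (the issue does not 
name one).

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the proposed LocalDiskPathAllocator interface.
// Assumption: the real interface would expose every LocalDirAllocator API
// and use Hadoop's Path and Configuration types instead of String and Map.
interface LocalDiskPathAllocator {
    String getLocalPathForWrite(String pathStr, Map<String, String> conf);
}

// Sketch of DFSLocalDirAllocator: one allocator per node, returning paths
// of the form <configured base directory>/<node hostname>/<relative path>.
final class DFSLocalDirAllocator implements LocalDiskPathAllocator {
    // Hypothetical configuration key, chosen for this example only.
    static final String BASE_DIR_KEY = "example.dfs.local.base.dir";

    private final String hostname;

    DFSLocalDirAllocator(String hostname) {
        this.hostname = hostname;
    }

    @Override
    public String getLocalPathForWrite(String pathStr, Map<String, String> conf) {
        String baseDir = conf.get(BASE_DIR_KEY);
        if (baseDir == null) {
            throw new IllegalStateException("Missing configuration: " + BASE_DIR_KEY);
        }
        // Normalize slashes so the three components join with exactly one "/".
        String base = baseDir.replaceAll("/+$", "");
        String relative = pathStr.replaceAll("^/+", "");
        return base + "/" + hostname + "/" + relative;
    }
}

// Small driver showing how a caller might obtain a node-scoped path.
final class AllocatorSketch {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DFSLocalDirAllocator.BASE_DIR_KEY, "/var/dfs-local");

        LocalDiskPathAllocator allocator =
            new DFSLocalDirAllocator("node1.example.com");
        System.out.println(
            allocator.getLocalPathForWrite("shuffle/map_0.out", conf));
    }
}
```

Because the hostname is part of every returned path, a reader on another node 
(for example, a reducer) could in principle address the same file through the 
DFS namespace, which is the property the Shuffle use case relies on.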



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
