BELUGA BEHR created HDFS-13448:
----------------------------------
Summary: HDFS Block Placement - Ignore Locality
Key: HDFS-13448
URL: https://issues.apache.org/jira/browse/HDFS-13448
Project: Hadoop HDFS
Issue Type: New Feature
Components: block placement, hdfs-client
Affects Versions: 3.0.1, 2.9.0
Reporter: BELUGA BEHR
According to the HDFS block placement rules:
{quote}
/**
* The replica placement strategy is that if the writer is on a datanode,
* the 1st replica is placed on the local machine,
* otherwise a random datanode. The 2nd replica is placed on a datanode
* that is on a different rack. The 3rd replica is placed on a datanode
* which is on a different node of the rack as the second replica.
*/
{quote}
However, the hdfs-client can pass a hint asking that the block placement
request not put a block replica on the local DataNode, _where 'local' means
the same host as the client is being run on._
{quote}
/**
* Advise that a block replica NOT be written to the local DataNode where
* 'local' means the same host as the client is being run on.
*
* @see CreateFlag#NO_LOCAL_WRITE
*/
{quote}
I propose that we add a new flag that allows the hdfs-client to request that
the first block replica be placed on a random DataNode in the cluster. The
subsequent block replicas should follow the normal block placement rules.
The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block
replica is not placed on the local node, but it is still placed on the local
rack. This matters when you have, for example, a Flume agent that is loading
data into HDFS.
If the Flume agent is running on a DataNode, then by default, the DataNode
local to the Flume agent will always get the first block replica. This leads
to uneven block placement, with the local node always filling up faster than
any other node in the cluster.
Modifying this example, if the DataNode is removed from the host where the
Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then
the default block placement policy will still prefer the local rack. This
remedies the situation only partially: the first block replica will now always
be placed on a DataNode on the local rack.
This new flag would allow a single Flume agent to distribute the blocks
randomly, evenly, over the entire cluster instead of hot-spotting the local
node or the local rack.
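
To illustrate the three behaviours described above, here is a minimal,
self-contained sketch. It is a toy model only: it does not call any HDFS
code, it models only the first replica, and the names {{Mode}},
{{IGNORE_LOCALITY}}, {{chooseFirstReplica}} and {{simulate}} are hypothetical
and chosen for this example. It assumes equally sized racks and a writer that
runs on a DataNode.

```java
import java.util.Arrays;
import java.util.Random;

public class FirstReplicaSketch {

    /** Hypothetical modes. IGNORE_LOCALITY stands in for the proposed flag; it is not an existing CreateFlag. */
    public enum Mode { DEFAULT, NO_LOCAL_WRITE, IGNORE_LOCALITY }

    /**
     * Picks the node for the first replica in a toy cluster.
     * Nodes are numbered 0..racks*nodesPerRack-1 and node n sits on rack n / nodesPerRack.
     * The writer is assumed to run on the node with index 'writer'.
     */
    public static int chooseFirstReplica(Mode mode, int writer, int racks,
                                         int nodesPerRack, Random rnd) {
        if (nodesPerRack < 2) {
            throw new IllegalArgumentException("toy model needs at least 2 nodes per rack");
        }
        switch (mode) {
            case DEFAULT:
                // Writer is on a DataNode, so the first replica stays local.
                return writer;
            case NO_LOCAL_WRITE: {
                // Skip the local node but stay on the writer's rack.
                int rackStart = (writer / nodesPerRack) * nodesPerRack;
                int pick;
                do {
                    pick = rackStart + rnd.nextInt(nodesPerRack);
                } while (pick == writer);
                return pick;
            }
            default:
                // Proposed behaviour: any node in the cluster, chosen uniformly.
                return rnd.nextInt(racks * nodesPerRack);
        }
    }

    /** Writes 'blocks' blocks from one writer and counts first replicas per node. */
    public static int[] simulate(Mode mode, int writer, int racks,
                                 int nodesPerRack, int blocks, long seed) {
        Random rnd = new Random(seed);
        int[] counts = new int[racks * nodesPerRack];
        for (int i = 0; i < blocks; i++) {
            counts[chooseFirstReplica(mode, writer, racks, nodesPerRack, rnd)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 3 racks of 4 nodes, writer on node 0, 12000 blocks written.
        for (Mode m : Mode.values()) {
            System.out.println(m + " " + Arrays.toString(simulate(m, 0, 3, 4, 12000, 42L)));
        }
    }
}
```

In this model, {{DEFAULT}} puts every first replica on node 0,
{{NO_LOCAL_WRITE}} spreads them over the other nodes of rack 0 only, and the
hypothetical {{IGNORE_LOCALITY}} mode spreads them over all racks.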