Andre Araujo created HDFS-12643:
-----------------------------------

             Summary: HDFS maintenance state behaviour is confusing and not well documented
                 Key: HDFS-12643
                 URL: https://issues.apache.org/jira/browse/HDFS-12643
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: documentation, namenode
            Reporter: Andre Araujo


The current implementation of the HDFS maintenance state feature is confusing 
and error-prone. The documentation is missing important information that's 
required for the correct use of the feature.

For example, if the Hadoop admin wants to put a single node in maintenance 
state, they can add a single entry to the maintenance file with the contents:

{code}
{
   "hostName": "host-1.example.com",
   "adminState": "IN_MAINTENANCE",
   "maintenanceExpireTimeInMS": 1507663698000
}
{code}

Now suppose that the actual maintenance finished well before the set 
expiration time and the Hadoop admin wants to bring the node back to NORMAL 
state. It would be natural to simply change the state of the node, as shown 
below, and run another refresh:

{code}
{
   "hostName": "host-1.example.com",
   "adminState": "NORMAL"
}
{code}

The configuration file above, though, not only takes the node {{host-1}} out of 
maintenance state but also *blacklists all the other DataNodes*. This 
behaviour seems inconsistent to me and is due to {{emptyInServiceNodeLists}} 
being set to {{false}} 
[here|https://github.com/apache/hadoop/blob/230b85d5865b7e08fb7aaeab45295b5b966011ef/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CombinedHostFileManager.java#L80]
 only when there is at least one node with {{adminState = NORMAL}} listed in 
the file.
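
The inclusion rule described above can be modelled with a short sketch. This is a simplified illustration based only on the description in this issue, not the actual {{CombinedHostFileManager}} code: the function name {{is_included_current}} and the dictionary representation of the file entries are assumptions made for the example, and hosts are matched by name only.

```python
def is_included_current(host, entries):
    """Simplified model of the behaviour described in this issue.

    Not the actual Hadoop implementation: entries are plain dicts and
    hosts are compared by name only (both are assumptions).
    """
    # The flag stays True until at least one entry with adminState NORMAL
    # is present in the file.
    empty_in_service_node_lists = not any(
        e["adminState"] == "NORMAL" for e in entries
    )
    listed = any(e["hostName"] == host for e in entries)
    # While the flag is True, every node is treated as included.
    return empty_in_service_node_lists or listed


maintenance_only = [{
    "hostName": "host-1.example.com",
    "adminState": "IN_MAINTENANCE",
    "maintenanceExpireTimeInMS": 1507663698000,
}]
back_to_normal = [{"hostName": "host-1.example.com", "adminState": "NORMAL"}]

# host-2 is not listed in either file.
print(is_included_current("host-2.example.com", maintenance_only))  # True
print(is_included_current("host-2.example.com", back_to_normal))    # False
```

In this model, changing the only entry from {{IN_MAINTENANCE}} to {{NORMAL}} flips the result for every unlisted host, which is the surprising side effect reported here.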

I believe that it would be more consistent, and less error-prone, to simply 
implement the following:
* If the dfs.hosts file is empty, all nodes are allowed and in normal state
* If the file is not empty, any host *not* listed in the file is *blacklisted*, 
regardless of the state of the hosts listed in the file.
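
The two rules proposed above could be sketched as follows. This is only an illustration of the proposal, not a patch: the function name {{is_included_proposed}} and the dictionary representation of the file entries are hypothetical, and hosts are matched by name only.

```python
def is_included_proposed(host, entries):
    """Sketch of the proposed rule (hypothetical helper, not Hadoop code)."""
    # Rule 1: an empty file allows every node.
    if not entries:
        return True
    # Rule 2: otherwise only listed hosts are allowed, whatever their state.
    return any(e["hostName"] == host for e in entries)


maintenance_only = [{
    "hostName": "host-1.example.com",
    "adminState": "IN_MAINTENANCE",
    "maintenanceExpireTimeInMS": 1507663698000,
}]
back_to_normal = [{"hostName": "host-1.example.com", "adminState": "NORMAL"}]

print(is_included_proposed("host-2.example.com", []))                # True
print(is_included_proposed("host-2.example.com", maintenance_only))  # False
print(is_included_proposed("host-2.example.com", back_to_normal))    # False
```

Under this rule the result for an unlisted host no longer depends on the {{adminState}} of the listed hosts, so editing the state of one entry cannot change the status of other DataNodes.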

Whether or not the implementation is changed, the documentation also needs 
to be updated so that readers are aware of the caveats mentioned above.



