dfs.*.dir should not default to /tmp (or other typically volatile storage)
--------------------------------------------------------------------------
Key: HDFS-1960
URL: https://issues.apache.org/jira/browse/HDFS-1960
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node
Affects Versions: 0.20.2
Environment: *nix systems
Reporter: philo vivero
Priority: Critical
The hdfs-site.xml file may omit one or both of the following parameters:
dfs.name.dir
dfs.data.dir
If they are not specified, data is stored in /tmp. This is extremely dangerous.
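For reference, the shipped defaults chain through hadoop.tmp.dir: hdfs-default.xml sets
dfs.name.dir and dfs.data.dir to ${hadoop.tmp.dir}/dfs/name and ${hadoop.tmp.dir}/dfs/data,
and core-default.xml sets hadoop.tmp.dir to /tmp/hadoop-${user.name}. A minimal sketch
(resource names and classpath assumed as in a stock 0.20.2 install) that prints the
resolved values:

    import org.apache.hadoop.conf.Configuration;

    public class ShowDfsDirs {
      public static void main(String[] args) {
        // new Configuration() picks up core-default.xml from the classpath,
        // which supplies hadoop.tmp.dir = /tmp/hadoop-${user.name}.
        Configuration conf = new Configuration();
        // Pull in the HDFS defaults that apply when hdfs-site.xml is silent.
        conf.addResource("hdfs-default.xml");
        // With no site overrides, both of these resolve to paths under /tmp:
        System.out.println("dfs.name.dir = " + conf.get("dfs.name.dir"));
        System.out.println("dfs.data.dir = " + conf.get("dfs.data.dir"));
      }
    }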
Rationale: the cluster will work fine for days, possibly even weeks, before
blocks start to go missing. Rebooting a datanode on common Linux systems
clears all the data from that node, since /tmp is typically wiped at boot.
There is no documented way (that I'm aware of) to recover from this
situation; the cluster must be completely obliterated and rebuilt from scratch.
Better reactions to the missing configuration parameters:
1. The DataNode dies on startup and asks that these parameters be defined
(see the sketch below).
2. The default becomes /var/db/hadoop (or some other non-volatile storage location).
Naturally, an inability to write to that directory would cause the DataNode
to die on startup, logging an error.
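To make option 1 concrete, here is a rough sketch of a fail-fast guard, assuming the
/tmp fallback were dropped from hdfs-default.xml so an unset parameter actually comes
back null. The method name and placement are hypothetical, not the current DataNode code:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;

    public class DataDirCheck {
      // Hypothetical startup guard; illustrative only.
      static void checkDataDirs(Configuration conf) throws IOException {
        String dirs = conf.get("dfs.data.dir");
        if (dirs == null || dirs.trim().isEmpty()) {
          // Option 1: refuse to start rather than silently using /tmp.
          throw new IOException("dfs.data.dir is not set; refusing to start "
              + "the DataNode. Set dfs.data.dir in hdfs-site.xml to "
              + "persistent storage.");
        }
        // dfs.data.dir is a comma-separated list of directories.
        for (String dir : dirs.split(",")) {
          // Stricter variant: also reject paths that resolve under /tmp.
          if (dir.trim().startsWith("/tmp")) {
            throw new IOException("dfs.data.dir entry '" + dir.trim()
                + "' is under /tmp (volatile storage); refusing to start.");
          }
        }
      }
    }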
The first solution would most likely be preferred by typical enterprise
sysadmins. The second solution is suboptimal (since /var/db/hadoop might not
be the optimal location for the data) but is still preferable to the current
implementation, since it less often leads to an irretrievably corrupt
cluster.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira