Aaron T. Myers created HDFS-3835:
------------------------------------
Summary: Long-lived 2NN cannot perform a checkpoint if security is
enabled and the NN restarts without outstanding delegation tokens
Key: HDFS-3835
URL: https://issues.apache.org/jira/browse/HDFS-3835
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node, security
Affects Versions: 2.0.0-alpha
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
When the 2NN wants to perform a checkpoint, it determines the highest
transaction ID of the fsimage files on the NN. If the 2NN has a copy of that
fsimage file (because it created that merged fsimage file the last time it did
a checkpoint), then the 2NN won't download the fsimage file from the NN, and
instead only gets the new edits files from the NN. In this case, the 2NN also
doesn't reload the fsimage file it has from disk, since it already has all of
the namespace state in memory. This all works just fine.
When the 2NN _doesn't_ have a copy of the relevant fsimage file (for example,
if the NN had restarted since the last checkpoint), the 2NN discards its
in-memory namespace state, downloads the fsimage file from the NN, and loads
the newly-downloaded fsimage file from disk. The bug is that when the 2NN
clears its in-memory state, it only resets the namespace, not the delegation
token map, so delegation token state from the previous image is still in
memory when the new fsimage is loaded.
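The two checkpoint paths described above can be sketched as a small decision function. This is only an illustration, not Hadoop's code: the names `CheckpointPathSketch`, `Action`, and `choose` are hypothetical, and transaction IDs are modelled as plain longs.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; none of these names come from Hadoop.
final class CheckpointPathSketch {
    enum Action {
        FETCH_EDITS_ONLY,          // 2NN already holds the NN's newest fsimage
        DISCARD_STATE_AND_RELOAD   // 2NN must download and load the fsimage
    }

    // Pick a path by checking whether the 2NN has an fsimage whose
    // transaction ID matches the highest one reported by the NN.
    static Action choose(long highestImageTxIdOnNN, Set<Long> imageTxIdsOn2NN) {
        return imageTxIdsOn2NN.contains(highestImageTxIdOnNN)
                ? Action.FETCH_EDITS_ONLY
                : Action.DISCARD_STATE_AND_RELOAD;
    }

    public static void main(String[] args) {
        Set<Long> local = new TreeSet<>();
        local.add(100L); // image written by the 2NN's last checkpoint
        System.out.println(choose(100L, local)); // NN's newest image is the same one
        System.out.println(choose(250L, local)); // NN has an image the 2NN lacks
    }
}
```

The second call corresponds to the problematic path: the 2NN has to throw away its in-memory state before loading the downloaded image.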
The fix is pretty simple: clear the delegation token map along with the
namespace state whenever a running 2NN needs to load a new fsimage from disk.
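To make the difference concrete, here is a toy model of the 2NN's in-memory state. It is a sketch under loud assumptions: `ToyCheckpointerState` and its methods are hypothetical, both the namespace and the delegation token map are reduced to plain `HashMap`s, and the model only shows stale token entries surviving a partial reset. It does not try to reproduce how the real checkpoint fails.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the 2NN's in-memory state; not Hadoop's classes.
final class ToyCheckpointerState {
    final Map<String, String> namespace = new HashMap<>();         // path -> metadata
    final Map<Integer, String> delegationTokens = new HashMap<>(); // sequence number -> owner

    // Behaviour described in the report: only the namespace is reset.
    void clearNamespaceOnly() {
        namespace.clear();
    }

    // Proposed behaviour: reset everything the fsimage will repopulate.
    void clearAll() {
        namespace.clear();
        delegationTokens.clear();
    }

    // Loading an image adds the image's contents to both maps.
    void loadImage(Map<String, String> ns, Map<Integer, String> tokens) {
        namespace.putAll(ns);
        delegationTokens.putAll(tokens);
    }

    public static void main(String[] args) {
        Map<String, String> oldNs = Collections.singletonMap("/a", "file");
        Map<Integer, String> oldTokens = Collections.singletonMap(1, "alice");
        Map<String, String> newNs = Collections.singletonMap("/b", "file");
        Map<Integer, String> noTokens = Collections.emptyMap(); // NN restarted with no tokens

        ToyCheckpointerState buggy = new ToyCheckpointerState();
        buggy.loadImage(oldNs, oldTokens);
        buggy.clearNamespaceOnly();
        buggy.loadImage(newNs, noTokens);
        System.out.println("partial reset, tokens left: " + buggy.delegationTokens.size());

        ToyCheckpointerState fixed = new ToyCheckpointerState();
        fixed.loadImage(oldNs, oldTokens);
        fixed.clearAll();
        fixed.loadImage(newNs, noTokens);
        System.out.println("full reset, tokens left: " + fixed.delegationTokens.size());
    }
}
```

With the partial reset, the token from the old image is still present even though the new image contains none, so the in-memory state no longer matches the image just loaded. With the full reset, the two agree.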
Credit to Stephen Chu for identifying this issue.