NN should verify images and edit logs on startup
------------------------------------------------
Key: HDFS-903
URL: https://issues.apache.org/jira/browse/HDFS-903
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Critical
I was experimenting with corrupting the fsimage and edits logs when multiple
directories are specified in dfs.name.dir. I noticed that:
* As long as the corruption does not make the image invalid (e.g. by changing
an opcode to an invalid one), HDFS doesn't notice and happily uses the corrupt
image or applies the corrupt edit.
* If the first image in dfs.name.dir is "valid", it replaces the copies in the
other name dirs, even if they differ from it. That is, if the first image is
actually invalid, old, or corrupt metadata, then you've lost your valid
metadata, which can result in data loss if the namenode garbage collects
blocks that it thinks are no longer used.
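To illustrate the second point, here is a minimal sketch (in Python, not
NameNode code) of how divergent copies could be detected before any one of them
is trusted. The function names, the use of SHA-256, and the relative path
"current/fsimage" are assumptions made for the example, not the actual HDFS
implementation.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def group_copies_by_digest(name_dirs, rel_path="current/fsimage"):
    """Map each distinct digest to the storage directories holding that copy.

    rel_path is a hypothetical location of the image inside each directory.
    """
    groups = {}
    for d in name_dirs:
        digest = file_digest(Path(d) / rel_path)
        groups.setdefault(digest, []).append(str(d))
    return groups


def copies_agree(name_dirs, rel_path="current/fsimage"):
    """True only if every directory holds a byte-identical copy."""
    return len(group_copies_by_digest(name_dirs, rel_path)) == 1
```

If copies_agree() returns False, a startup routine could stop and report the
groups instead of overwriting the other copies with the first one. Note that
this only detects disagreement; it cannot say which copy is the good one.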
How about we maintain a checksum as part of the image and edit log, check
it on startup, and refuse to start up if it does not match? Or at least
provide a configuration option to do so, if people are worried about the
overhead of maintaining checksums of these files. Even if we assume
dfs.name.dir is reliable storage, this guards against operator errors.
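A rough sketch of the proposal, in Python for brevity rather than as a patch:
a checksum is stored when a file is saved and compared on startup, with a flag
standing in for the configuration option. The sidecar ".sha256" file, the
verify flag, and all names here are hypothetical; the proposal above suggests
keeping the checksum as part of the image and edit log themselves.

```python
import hashlib
from pathlib import Path


class CorruptMetadataError(Exception):
    """Raised when a metadata file does not match its recorded checksum."""


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_path(path):
    """Hypothetical sidecar file that holds the checksum for path."""
    return Path(str(path) + ".sha256")


def write_checksum(path):
    """Record the checksum; would be called whenever the file is saved."""
    checksum_path(path).write_text(sha256_of(path))


def verify_on_startup(paths, verify=True):
    """Refuse to start (raise) if any file differs from its checksum.

    verify=False models the opt-out for those worried about the overhead.
    """
    if not verify:
        return
    for p in paths:
        sidecar = checksum_path(p)
        if not sidecar.exists():
            raise CorruptMetadataError(f"no checksum recorded for {p}")
        expected = sidecar.read_text().strip()
        actual = sha256_of(p)
        if actual != expected:
            raise CorruptMetadataError(
                f"{p}: expected {expected}, found {actual}")
```

With a check like this, a copy that was modified outside the namenode (by
corruption or by an operator restoring the wrong file) fails startup instead of
being silently loaded.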