[ https://issues.apache.org/jira/browse/HDFS-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Foley resolved HDFS-1687.
------------------------------
Resolution: Fixed
> HDFS Federation: DirectoryScanner changes for federation
> --------------------------------------------------------
>
> Key: HDFS-1687
> URL: https://issues.apache.org/jira/browse/HDFS-1687
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node
> Affects Versions: Federation Branch
> Reporter: Matt Foley
> Assignee: Matt Foley
> Fix For: Federation Branch
>
> Attachments: HDFS-1687_DirScan_v1.patch
>
>
> DirectoryScanner currently scans nearly the entire directory tree of each
> volume. It needs to be extended to work with block pools in Federation.
> Design notes:
> 1. Only the subdirectories of active bpids will be scanned. Active bpids are
> those associated with currently connected Namenodes. Each Volume knows its
> set of active bpids, via volume.map.keySet(). I'll add a package-private
> accessor in FSVolume that returns this set, for use by DirectoryScanner,
> DataBlockScanner, etc. DirectoryScanner will ignore the subdirectories of
> inactive bpids; see item 3.
> 2. There is no need to compare each volume's set of active bpids with the
> global set, because under normal operation the code does not let them
> diverge. If differences do arise, they will be fixed automatically by the
> next restart of either the Datanode or the Namenode.
> 3. Inactive bpids will be ignored. Until we are connected to the owning
> Namenode, we cannot know whether a bpid subdirectory is correctly formatted,
> has snapshot data, etc. So it doesn't make sense to try to manage the data
> under an inactive bpid.
> 4. DirectoryScanner is currently instantiated and periodically triggered by
> DataBlockScanner. Other than both being "scanners", these two modules have
> little in common, and the triggering code is confusing. (DirectoryScanner
> scans filesystem directory trees every hour, to detect and fix
> inconsistencies between disk directories and ReplicasMap. DataBlockScanner
> runs every 3 weeks, and traverses all block files, actually reading them out
> and checksumming them to detect block corruption.)
> Separating them, and running DirectoryScanner under its own periodic
> scheduler, is a small change that will make the code much clearer.
> DirectoryScanner already runs on its own FixedThreadPool Executor, so it is
> easy to switch it to a ScheduledThreadPool and to instantiate it from
> DataNode.postStartInit(), at the same point where initBlockScanner() is
> called.
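
The design notes above can be sketched roughly as follows. This is a minimal illustration, not the code in the attached patch: the class DirectoryScannerSketch, the Volume interface and its getRoot()/getActiveBpids() methods, and the scannedDirs counter are all hypothetical names. The sketch assumes each volume root holds one subdirectory per bpid, and it uses only standard java.util.concurrent calls for the periodic scheduling.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: none of these names come from the HDFS-1687 patch. */
public class DirectoryScannerSketch implements Runnable {

    /** Hypothetical stand-in for the accessor proposed for FSVolume. */
    public interface Volume {
        File getRoot();               // assumed: root holding one subdirectory per bpid
        Set<String> getActiveBpids(); // assumed: the volume's current set of active bpids
    }

    private final List<Volume> volumes;
    // A scheduled pool lets the scanner trigger itself instead of being
    // driven by another component.
    private final ScheduledExecutorService scheduler =
            Executors.newScheduledThreadPool(1);
    // Counts directories visited, only so the sketch has an observable effect.
    private final AtomicInteger scannedDirs = new AtomicInteger();

    public DirectoryScannerSketch(List<Volume> volumes) {
        this.volumes = volumes;
    }

    /** Returns the bpid subdirectories a scan would visit on one volume. */
    public static List<File> dirsToScan(Volume volume) {
        List<File> result = new ArrayList<>();
        for (String bpid : volume.getActiveBpids()) {
            File dir = new File(volume.getRoot(), bpid);
            // Subdirectories of inactive bpids are never looked up, so they are ignored.
            if (dir.isDirectory()) {
                result.add(dir);
            }
        }
        return result;
    }

    /** One scan pass over the active bpid subdirectories of every volume. */
    @Override
    public void run() {
        for (Volume volume : volumes) {
            for (File dir : dirsToScan(volume)) {
                // Real code would reconcile the on-disk files under 'dir'
                // with the in-memory replica map here.
                scannedDirs.incrementAndGet();
            }
        }
    }

    /** Starts periodic scanning; the first pass runs immediately. */
    public ScheduledFuture<?> start(long periodSeconds) {
        return scheduler.scheduleAtFixedRate(this, 0, periodSeconds, TimeUnit.SECONDS);
    }

    public int getScannedDirs() {
        return scannedDirs.get();
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }
}
```

With an hourly period as described in item 4, a caller would use something like new DirectoryScannerSketch(volumes).start(3600) and call shutdown() when the Datanode stops. One caveat of scheduleAtFixedRate is that an uncaught exception in run() suppresses all later executions, so a real scanner would need to catch and log failures inside the pass.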