[ 
https://issues.apache.org/jira/browse/HDFS-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley resolved HDFS-1687.
------------------------------

    Resolution: Fixed

> HDFS Federation: DirectoryScanner changes for federation
> --------------------------------------------------------
>
>                 Key: HDFS-1687
>                 URL: https://issues.apache.org/jira/browse/HDFS-1687
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: Federation Branch
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>             Fix For: Federation Branch
>
>         Attachments: HDFS-1687_DirScan_v1.patch
>
>
> DirectoryScanner scans nearly the entire directory tree of each volume.  It
> needs to be extended to work with block pools in Federation.
> Design notes:
> 1. The subdirectories of active bpids will be scanned.  Active bpids are
> those associated with currently connected Namenodes.  Each Volume knows the
> set of all active bpids, via volume.map.keySet().  I'll add a
> package-private accessor in FSVolume to return the set of active bpids for
> use by DirectoryScanner, DataBlockScanner, etc.  DirectoryScanner will ignore
> the subdirectories of inactive bpids; see item 3.
> 2. There is no need to compare each volume's set of active bpids with the
> global set, because the way the code works, the two should not differ.
> If differences do arise, they will be fixed automatically by the next
> restart of either the Datanode or the Namenode.
> 3. Inactive bpids will be ignored.  Until we are connected to the owning
> Namenode, we cannot know whether a bpid subdirectory is correctly formatted,
> has snapshot data, etc.  So it doesn't make sense to try to manage the data
> under an inactive bpid.
> 4. DirectoryScanner is currently instantiated and periodically triggered by 
> DataBlockScanner.  Other than both being "scanners", these two modules have 
> little in common, and the triggering code is confusing.  (DirectoryScanner 
> scans filesystem directory trees every hour, to detect and fix 
> inconsistencies between disk directories and ReplicasMap.  DataBlockScanner 
> runs every 3 weeks, and traverses all block files, actually reading them out 
> and checksumming them to detect block corruption.)
> Separating them, and running DirectoryScanner under its own periodic 
> scheduler, is a small change that will make the code much clearer.  It 
> already runs on its own FixedThreadPool Executor, so it is easy to change it 
> to a ScheduledThreadPool, and instantiate it from DataNode.postStartInit() at 
> the same time as initBlockScanner() is called.
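
Design notes 1 and 4 can be illustrated with a minimal Java sketch. It is not
taken from the attached patch: VolumeSketch, getActiveBlockPoolIds(),
getBlockPoolDir() and DirectoryScannerSketch are hypothetical names standing
in for FSVolume, its proposed package-private accessor, and DirectoryScanner,
and the "scan" here only collects the directories that would be visited.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for FSVolume: maps each active bpid to its subdirectory.
class VolumeSketch {
    private final Map<String, String> map = new ConcurrentHashMap<String, String>();

    void addBlockPool(String bpid, String dir) { map.put(bpid, dir); }

    void removeBlockPool(String bpid) { map.remove(bpid); }

    // Package-private accessor (item 1): a sorted snapshot of the active bpids.
    Set<String> getActiveBlockPoolIds() {
        return new TreeSet<String>(map.keySet());
    }

    String getBlockPoolDir(String bpid) { return map.get(bpid); }
}

// Hypothetical stand-in for DirectoryScanner, run by its own scheduler (item 4).
class DirectoryScannerSketch implements Runnable {
    private final List<VolumeSketch> volumes;
    private final ScheduledExecutorService scheduler =
            Executors.newScheduledThreadPool(1);

    DirectoryScannerSketch(List<VolumeSketch> volumes) {
        this.volumes = new ArrayList<VolumeSketch>(volumes);
    }

    // Visits only subdirectories of active bpids; inactive ones never appear
    // in the volume's map, so they are ignored without any extra check.
    List<String> scanOnce() {
        List<String> visited = new ArrayList<String>();
        for (VolumeSketch volume : volumes) {
            for (String bpid : volume.getActiveBlockPoolIds()) {
                String dir = volume.getBlockPoolDir(bpid);
                if (dir != null) {  // bpid may have been removed since the snapshot
                    visited.add(dir);
                }
            }
        }
        return visited;
    }

    @Override
    public void run() { scanOnce(); }

    // Periodic trigger owned by the scanner itself, independent of any
    // block scanner.
    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(this, periodSeconds, periodSeconds,
                TimeUnit.SECONDS);
    }

    void shutdown() { scheduler.shutdownNow(); }
}
```

In this sketch the caller would construct the scanner once at startup and call
start(3600) for an hourly scan; the real DataNode wiring may differ.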

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
