Hongbing Wang created HDFS-15987:
------------------------------------

             Summary: Improve oiv tool to parse fsimage file in parallel with delimited format
                 Key: HDFS-15987
                 URL: https://issues.apache.org/jira/browse/HDFS-15987
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Hongbing Wang


The purpose of this Jira is to improve the oiv tool so that it parses an 
fsimage file with sub-sections (see -HDFS-14617-) in parallel when using the 
Delimited format.

1. Serial parsing is time-consuming

Serially parsing a large fsimage with the Delimited format (e.g. `hdfs 
oiv -p Delimited -t <tmp> ...`) spends its time in the following stages:
{code:java}
1) Loading string table:                 -> Not time consuming.
2) Loading inode references:             -> Not time consuming
3) Loading directories in INode section: -> Slightly time consuming (3%)
4) Loading INode directory section:      -> A bit time consuming (11%)
5) Output:                               -> Very time consuming (86%){code}
Therefore, output is the stage that benefits most from parallelization.
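As a rough back-of-the-envelope check, Amdahl's law bounds the end-to-end speedup when only the output stage (about 86% of serial time) is parallelized. This is a minimal sketch: the class name `AmdahlEstimate` is hypothetical, and it assumes the output stage scales perfectly across threads while ignoring merge and scheduling cost.

{code:java}
/** Hypothetical helper: Amdahl's-law bound when only the output stage is parallelized. */
final class AmdahlEstimate {
    private AmdahlEstimate() {}

    /**
     * @param parallelFraction share of the serial run time that can run in parallel (0..1)
     * @param threads number of worker threads (>= 1)
     * @return ideal overall speedup, ignoring merge and scheduling overhead
     */
    static double speedup(double parallelFraction, int threads) {
        if (parallelFraction < 0 || parallelFraction > 1 || threads < 1) {
            throw new IllegalArgumentException("invalid arguments");
        }
        double serialPart = 1.0 - parallelFraction;
        return 1.0 / (serialPart + parallelFraction / threads);
    }

    public static void main(String[] args) {
        // Output is ~86% of serial time according to the stage breakdown above.
        for (int n : new int[] {1, 2, 4, 8}) {
            System.out.printf("threads=%d  ideal speedup=%.2fx%n", n, speedup(0.86, n));
        }
    }
}
{code}

With 4 threads this gives an ideal of roughly 2.8x, and no thread count can push the total beyond about 7.1x (1/0.14), because the remaining 14% of the work stays serial.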

2. How to output in parallel

The sub-sections are split, in their original order, into contiguous groups. 
Each thread processes one group and writes its output to its own file; 
finally, the per-thread output files are merged into a single output file.
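A minimal Java sketch of this scheme, assuming each sub-section has already been decoded into delimited text lines. The class and method names (`ParallelDelimitedWriter`, `groupInOrder`, `write`) are hypothetical and are not the oiv tool's actual API; the real tool decodes INodes from the fsimage instead of receiving strings.

{code:java}
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch of grouped parallel output. Each "sub-section" is modeled as a
 * list of already-formatted delimited lines (an assumption made for illustration).
 */
final class ParallelDelimitedWriter {
    private ParallelDelimitedWriter() {}

    /** Splits items into at most n contiguous, non-empty groups, preserving order. */
    static <T> List<List<T>> groupInOrder(List<T> items, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<List<T>> groups = new ArrayList<>();
        int size = items.size();
        for (int g = 0; g < n; g++) {
            int from = (int) ((long) size * g / n);
            int to = (int) ((long) size * (g + 1) / n);
            if (from < to) {
                groups.add(new ArrayList<>(items.subList(from, to)));
            }
        }
        return groups;
    }

    /** Each thread writes one group to its own temp file; the files are then merged in order. */
    static void write(List<List<String>> subSections, int threads, Path output)
            throws IOException, InterruptedException, ExecutionException {
        List<List<List<String>>> groups = groupInOrder(subSections, threads);
        List<Path> parts = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            parts.add(Files.createTempFile("oiv-part-" + i + "-", ".tmp"));
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < groups.size(); i++) {
                List<List<String>> group = groups.get(i);
                Path part = parts.get(i);
                // No writer is shared between threads, so no locking is needed.
                futures.add(pool.submit(() -> {
                    try (BufferedWriter w = Files.newBufferedWriter(part)) {
                        for (List<String> subSection : group) {
                            for (String line : subSection) {
                                w.write(line);
                                w.newLine();
                            }
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows any worker failure as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
        // Concatenate the part files in group order so the output keeps sub-section order.
        try (OutputStream out = Files.newOutputStream(output)) {
            for (Path part : parts) {
                Files.copy(part, out);
                Files.delete(part);
            }
        }
    }
}
{code}

Writing to separate files avoids synchronizing on a single output stream; the price is an extra sequential merge pass, which corresponds to the MergeTime column in the test results.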

3. Test results
{code:java}
 input fsimage file info:
 3.4G, 12 sub-sections, 55976500 INodes
 -----------------------------------------
 Threads  TotalTime  OutputTime  MergeTime
 1        18m37s     16m18s      –
 4        8m7s       4m49s       41s{code}
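To read the table as speedups, the durations can be converted to seconds. This small helper sketches that arithmetic; the class name `OivSpeedup` is hypothetical, and it only accepts the "XmYs" style used in the table.

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that turns "XmYs" durations from the table into seconds and speedups. */
final class OivSpeedup {
    private static final Pattern DURATION = Pattern.compile("(?:(\\d+)m)?(?:(\\d+)s)?");

    private OivSpeedup() {}

    /** Parses strings such as "18m37s", "8m7s" or "41s" into a number of seconds. */
    static int seconds(String text) {
        String t = text.trim();
        Matcher m = DURATION.matcher(t);
        if (t.isEmpty() || !m.matches()) {
            throw new IllegalArgumentException("not a duration: " + text);
        }
        int minutes = m.group(1) == null ? 0 : Integer.parseInt(m.group(1));
        int secs = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        return minutes * 60 + secs;
    }

    /** Ratio of the 1-thread time to the N-thread time. */
    static double speedup(String serial, String parallel) {
        return (double) seconds(serial) / seconds(parallel);
    }

    public static void main(String[] args) {
        System.out.printf("total speedup (1 -> 4 threads):  %.2fx%n", speedup("18m37s", "8m7s"));
        System.out.printf("output speedup (1 -> 4 threads): %.2fx%n", speedup("16m18s", "4m49s"));
    }
}
{code}

It reports a total speedup of about 2.29x and an output-stage speedup of about 3.38x for 4 threads versus 1; the added 41s merge step and the loading stages that stay serial are part of why the end-to-end gain is smaller than the output-stage gain.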

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
