Hongbing Wang created HDFS-15987:
------------------------------------

             Summary: Improve oiv tool to parse fsimage file in parallel with delimited format
                 Key: HDFS-15987
                 URL: https://issues.apache.org/jira/browse/HDFS-15987
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Hongbing Wang

The purpose of this Jira is to improve the oiv tool so that it parses an fsimage file with sub-sections (see HDFS-14617) in parallel when using the delimited format.

1. Serial parsing is time-consuming

The time needed to serially parse a large fsimage with the delimited format (e.g. `hdfs oiv -p Delimited -t <tmp> ...`) breaks down as follows:
{code:java}
1) Loading string table:                 -> Not time consuming
2) Loading inode references:             -> Not time consuming
3) Loading directories in INode section: -> Slightly time consuming (3%)
4) Loading INode directory section:      -> A bit time consuming (11%)
5) Output:                               -> Very time consuming (86%){code}
Therefore, output is the stage that benefits most from parallelization.

2. How to output in parallel

The sub-sections are grouped in order. Each thread processes one group and writes it to its own output file, and the per-thread output files are merged at the end.

3. The result of a test
{code:java}
input fsimage file info:
3.4G, 12 sub-sections, 55976500 INodes
-----------------------------------------
Threads  TotalTime  OutputTime  MergeTime
1        18m37s     16m18s      –
4        8m7s       4m49s       41s{code}
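
The group, write and merge approach described in section 2 can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual patch or the oiv tool's code: the class `ParallelDelimitedOutput`, the methods `groupInOrder` and `writeInParallel`, and the representation of a sub-section as a list of already-formatted text lines are all hypothetical.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: names and data model are assumptions, not oiv internals.
final class ParallelDelimitedOutput {

    /** Splits items into at most `threads` contiguous groups, preserving order. */
    static <T> List<List<T>> groupInOrder(List<T> items, int threads) {
        List<List<T>> groups = new ArrayList<>();
        int n = items.size();
        int groupCount = Math.min(threads, n);
        int start = 0;
        for (int g = 0; g < groupCount; g++) {
            // Spread the remainder over the first (n % groupCount) groups.
            int size = n / groupCount + (g < n % groupCount ? 1 : 0);
            groups.add(items.subList(start, start + size));
            start += size;
        }
        return groups;
    }

    /**
     * Writes each group of sub-sections to its own temporary file in parallel,
     * then concatenates the temporary files into `out` in group order.
     * Each sub-section is modelled here as a list of output lines (assumption).
     */
    static void writeInParallel(List<List<String>> subSections, int threads, Path out)
            throws Exception {
        List<List<List<String>>> groups = groupInOrder(subSections, threads);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        List<Path> parts = new ArrayList<>();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<List<String>> group : groups) {
                Path part = Files.createTempFile("oiv-part-", ".txt");
                parts.add(part);
                pending.add(pool.submit(() -> {
                    // One thread owns one file, so no locking is needed here.
                    List<String> lines = new ArrayList<>();
                    for (List<String> subSection : group) {
                        lines.addAll(subSection);
                    }
                    Files.write(part, lines, StandardCharsets.UTF_8);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and propagate any failure
            }
        } finally {
            pool.shutdown();
        }
        // Merge step: append the per-thread files in their original order.
        try (OutputStream merged = Files.newOutputStream(out)) {
            for (Path part : parts) {
                Files.copy(part, merged);
                Files.delete(part);
            }
        }
    }
}
```

Calling `ParallelDelimitedOutput.writeInParallel(subSections, 4, outputPath)` with 12 sub-sections would give each of the 4 threads a contiguous group of 3. Because the groups are contiguous and merged in the same order, the merged file keeps the same line order as a serial run would produce. The merge itself stays serial, which matches the separate MergeTime column in the test results.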