[ https://issues.apache.org/jira/browse/HDFS-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaoqiao He resolved HDFS-15987.
--------------------------------
    Fix Version/s: 3.4.0
     Hadoop Flags: Reviewed
       Resolution: Fixed

Committed to trunk. Thanks [~wanghongbing] for your contributions!

> Improve oiv tool to parse fsimage file in parallel with delimited format
> ------------------------------------------------------------------------
>
>                 Key: HDFS-15987
>                 URL: https://issues.apache.org/jira/browse/HDFS-15987
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Hongbing Wang
>            Assignee: Hongbing Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>         Attachments: Improve_oiv_tool_001.pdf
>
>          Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The purpose of this Jira is to improve the oiv tool so that it parses fsimage
> files with sub-sections (see HDFS-14617) in parallel when using the Delimited
> output format.
>
> 1. Serial parsing is time-consuming
> The time taken to serially parse a large fsimage with the Delimited processor
> (e.g. `hdfs oiv -p Delimited -t <tmp> ...`) breaks down as follows:
> {code:java}
> 1) Loading string table:                 -> Not time-consuming
> 2) Loading inode references:             -> Not time-consuming
> 3) Loading directories in INode section: -> Slightly time-consuming (3%)
> 4) Loading INode directory section:      -> A bit time-consuming (11%)
> 5) Output:                               -> Very time-consuming (86%)
> {code}
> Therefore, output is the stage that benefits most from parallelization.
>
> 2. How to output in parallel
> The sub-sections are split, in order, into groups. Each thread processes one
> group and writes its output to its own file, and the per-thread files are
> finally merged into the output file.
>
> 3. The result of a test
> {code:java}
> input fsimage file info:
> 3.4G, 12 sub-sections, 55976500 INodes
> -----------------------------------------
> Threads  TotalTime  OutputTime  MergeTime
> 1        18m37s     16m18s      –
> 4        8m7s       4m49s       41s
> {code}
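Item 2 describes the parallel output step only in prose. The Java sketch below shows that scheme under stated assumptions. It is not the oiv implementation from the pull request. `ParallelDelimitedOutput`, `groupInOrder` and `write` are hypothetical names. A sub-section is modeled as an in-memory list of records, tab is used as the delimiter, and the per-thread files are plain temp files. Sub-sections are assigned to threads in contiguous groups, so concatenating the per-thread files in group order should reproduce the serial output order.

```java
import java.io.BufferedWriter;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of "group in order, write per thread, then merge".
// A sub-section is modeled as an in-memory list of records (String[] fields);
// the real oiv tool reads sub-sections from the fsimage instead.
class ParallelDelimitedOutput {

    // Split the sub-sections into at most `threads` contiguous groups,
    // preserving their original order.
    static <T> List<List<T>> groupInOrder(List<T> subSections, int threads) {
        int n = subSections.size();
        int groupCount = Math.max(1, Math.min(threads, n));
        List<List<T>> groups = new ArrayList<>();
        for (int g = 0; g < groupCount; g++) {
            int from = n * g / groupCount;
            int to = n * (g + 1) / groupCount;
            groups.add(subSections.subList(from, to));
        }
        return groups;
    }

    // Each worker writes its group to a private temp file; the part files
    // are then concatenated in group order, so the merged output keeps the
    // same record order as a single-threaded run.
    static void write(List<List<String[]>> subSections, int threads, Path out)
            throws Exception {
        List<List<List<String[]>>> groups = groupInOrder(subSections, threads);
        ExecutorService pool = Executors.newFixedThreadPool(groups.size());
        try {
            List<Future<Path>> parts = new ArrayList<>();
            for (List<List<String[]>> group : groups) {
                parts.add(pool.submit(() -> {
                    Path part = Files.createTempFile("oiv-part-", ".tsv");
                    try (BufferedWriter w =
                             Files.newBufferedWriter(part, StandardCharsets.UTF_8)) {
                        for (List<String[]> section : group) {
                            for (String[] record : section) {
                                w.write(String.join("\t", record)); // tab-delimited line
                                w.write('\n');
                            }
                        }
                    }
                    return part;
                }));
            }
            // Merge step: append each part file in submission (group) order.
            try (OutputStream os = Files.newOutputStream(out)) {
                for (Future<Path> f : parts) {
                    Path part = f.get(); // waits for that worker to finish
                    Files.copy(part, os);
                    Files.delete(part);
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // 12 sub-sections of 3 records each, written with 4 threads.
        List<List<String[]>> sections = new ArrayList<>();
        for (int s = 0; s < 12; s++) {
            List<String[]> records = new ArrayList<>();
            for (int r = 0; r < 3; r++) {
                records.add(new String[] {"/dir" + s + "/file" + r, String.valueOf(r)});
            }
            sections.add(records);
        }
        Path out = Files.createTempFile("oiv-merged-", ".tsv");
        write(sections, 4, out);
        System.out.println(Files.readAllLines(out).size() + " lines in " + out);
    }
}
```

Workers run concurrently, but the merge loop reads the futures in submission order. This keeps the output deterministic. In this sketch the concatenation is the only serial step added by parallelism, which loosely corresponds to the MergeTime column in the test result.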
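As a rough check on the figures in item 3, the speedups can be computed directly from the table. This is a worked calculation added for illustration and is not part of the original report. It also includes an Amdahl's-law bound, which assumes the 86% output share from the profile in item 1 is the parallel fraction p and that the other stages stay serial:

```latex
\begin{aligned}
S_{\text{total}}  &= \frac{18\text{m}\,37\text{s}}{8\text{m}\,7\text{s}} = \frac{1117\ \text{s}}{487\ \text{s}} \approx 2.29 \\
S_{\text{output}} &= \frac{16\text{m}\,18\text{s}}{4\text{m}\,49\text{s}} = \frac{978\ \text{s}}{289\ \text{s}} \approx 3.38 \\
S_{\text{Amdahl}} &= \frac{1}{(1-p) + p/N} = \frac{1}{0.14 + 0.86/4} = \frac{1}{0.355} \approx 2.82 \qquad (p = 0.86,\ N = 4)
\end{aligned}
```

The measured total speedup of about 2.29x stays below this bound. Two factors plausibly explain most of the gap: the single-threaded run has no 41 s merge step, and the output stage itself scales by about 3.38x rather than 4x on 4 threads.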