Sungdong Kim created HDFS-17573:
-----------------------------------

             Summary: Add test code for FSImage parallelization and compression
                 Key: HDFS-17573
                 URL: https://issues.apache.org/jira/browse/HDFS-17573
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs, namenode
    Affects Versions: 3.4.1
            Reporter: Sungdong Kim
             Fix For: 3.4.1

The feature added in HDFS-14617 ("Improve FSImage load time by writing sub-sections to the FSImage index", by [Stephen O'Donnell|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sodonnell]) makes loading the FSImage much faster. However, this option cannot be enabled when dfs.image.compress=true is set.

In my opinion, larger clusters need both settings at the same time. For example, the cluster I'm using has approximately 6 million file system objects, and its FSImage is approximately 11GB with dfs.image.compress=true. If dfs.image.compress is turned off, the FSImage is expected to exceed 30GB, in which case transferring it from the standby to the active NameNode will take a long time and consume a lot of network bandwidth.

HDFS-16147 (by [kinit|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei]) showed that parallel FSImage loading and FSImage compression can be turned on at the same time. (It also worked well in my environment.)

I created this new Jira and PR because the discussion in HDFS-16147 ended in 2021, and I would like the change to be officially included in the next release instead of remaining in Patch Available status.

The actual code of the patch was written by [kinit|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei]. I resolved the empty sub-section problem (see the comments on HDFS-16147) and added test code.

If this is not the proper way to do this, please let me know how else I can contribute.

Thanks.
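
For readers unfamiliar with the idea, here is a minimal, self-contained Java sketch of why an index of sub-sections and compression do not have to exclude each other. It is a toy illustration only, not the code from HDFS-16147 or HDFS-14617, and it rests on an assumption that is not stated in this issue: that each sub-section is written as its own independently decodable compressed stream, so that an index offset always points at the start of a stream. The names SubSectionSketch, IndexEntry, save, loadOne, loadParallel and roundTrip are hypothetical, and gzip from java.util.zip stands in for whatever codec a real cluster would configure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Toy model only: strings stand in for sub-sections, a byte array for the image file.
class SubSectionSketch {

    /** Hypothetical index entry: start and length of one compressed sub-section. */
    static final class IndexEntry {
        final int offset;
        final int length;

        IndexEntry(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Writes every sub-section as a separate gzip stream and records where it starts. */
    static byte[] save(List<String> subSections, List<IndexEntry> index) {
        ByteArrayOutputStream image = new ByteArrayOutputStream();
        for (String section : subSections) {
            int start = image.size();
            // Closing the gzip stream writes its trailer; closing a
            // ByteArrayOutputStream is a no-op, so the buffer stays usable.
            try (GZIPOutputStream gz = new GZIPOutputStream(image)) {
                gz.write(section.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            index.add(new IndexEntry(start, image.size() - start));
        }
        return image.toByteArray();
    }

    /** Decompresses one sub-section using only its index entry. */
    static String loadOne(byte[] image, IndexEntry entry) {
        ByteArrayInputStream slice = new ByteArrayInputStream(image, entry.offset, entry.length);
        try (GZIPInputStream gz = new GZIPInputStream(slice)) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads all sub-sections on a thread pool, returning them in index order. */
    static List<String> loadParallel(byte[] image, List<IndexEntry> index, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (IndexEntry entry : index) {
                Callable<String> task = () -> loadOne(image, entry);
                futures.add(pool.submit(task));
            }
            List<String> loaded = new ArrayList<>();
            for (Future<String> future : futures) {
                loaded.add(future.get());
            }
            return loaded;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Saves and reloads the given sub-sections; the result should equal the input. */
    static List<String> roundTrip(List<String> subSections) {
        List<IndexEntry> index = new ArrayList<>();
        byte[] image = save(subSections, index);
        return loadParallel(image, index, 4);
    }

    public static void main(String[] args) {
        // The middle sub-section is deliberately empty.
        System.out.println(roundTrip(List.of("inodes 0..999", "", "inodes 1000..1999")));
    }
}
```

In this toy model an empty sub-section still produces a valid (header and trailer only) gzip stream, so it round-trips like any other entry. Whether the real patch handles empty sub-sections this way is not something this sketch claims; the comments on HDFS-16147 are the place to check.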