[jira] [Resolved] (HDFS-17573) Allow turn on both FSImage parallelization and compression

Xiaoqiao He (Jira) Sun, 25 Aug 2024 02:53:05 -0700


     [ 
https://issues.apache.org/jira/browse/HDFS-17573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Xiaoqiao He resolved HDFS-17573.
--------------------------------
       Fix Version/s:     (was: 3.4.1)
        Hadoop Flags: Reviewed
    Target Version/s:   (was: 3.4.1, 3.5.0)
          Resolution: Fixed

> Allow turn on both FSImage parallelization and compression
> ----------------------------------------------------------
>
>                 Key: HDFS-17573
>                 URL: https://issues.apache.org/jira/browse/HDFS-17573
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>    Affects Versions: 3.4.1
>            Reporter: Sungdong Kim
>            Assignee: Sungdong Kim
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.5.0
>
>         Attachments: compressed-image-load-serial.png, 
> compressed-subsection-image-load-parallel.png, 
> compressed-subsection-image-load-serial.png
>
>
> The feature added in HDFS-14617 (Improve FSImage load time by writing 
> sub-sections to the FSImage index, by [Stephen 
> O'Donnell|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sodonnell])
>  makes loading the FSImage much faster.
>  
> However, this option cannot be enabled when dfs.image.compress=true is set.
> In my opinion, larger clusters need both settings at the same time.
> For example, the cluster I'm using has approximately 6 million file system 
> objects, and its FSImage is approximately 11GB with dfs.image.compress=true.
> If dfs.image.compress is turned off, the FSImage is expected to exceed 30GB, 
> in which case transferring it from the standby to the active namenode will 
> take a long time and consume a lot of network bandwidth.
>  
> HDFS-16147 (by 
> [kinit|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei]) 
> showed that parallel FSImage loading and FSImage compression can be turned 
> on at the same time. (It also worked well in my environment.)
> I created this new Jira and PR because the discussion in HDFS-16147 ended in 
> 2021, and I want the change to be officially included in the next release 
> rather than remaining only as an available patch.
> The actual code of the patch was written by 
> [kinit|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mofei]; 
> I resolved the empty sub-section problem (see the comments on HDFS-16147) 
> and added test code.
> If this is not the proper way to contribute, please let me know a better one.
> Thanks.
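
For reference, the combination requested in the description would look roughly like the hdfs-site.xml fragment below. This is only a sketch: dfs.image.compress is the property named in the description, while dfs.image.parallel.load is assumed here to be the switch for the HDFS-14617 parallel loading feature. Neither entry is taken from the patch itself.

```xml
<!-- hdfs-site.xml fragment (illustrative sketch, not taken from the patch) -->
<configuration>
  <!-- Compress the FSImage when it is saved (property named in the description). -->
  <property>
    <name>dfs.image.compress</name>
    <value>true</value>
  </property>
  <!-- Assumed name of the HDFS-14617 switch for sub-section (parallel) image loading. -->
  <property>
    <name>dfs.image.parallel.load</name>
    <value>true</value>
  </property>
</configuration>
```

According to the description, parallel loading cannot be activated while compression is on in releases without this change, so the fragment would only have the intended effect on a release that contains the fix (Fix For: 3.5.0).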



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
