kuper created HDFS-17231:
----------------------------

             Summary: HA: Safemode should exit when resources go from low to available
                 Key: HDFS-17231
                 URL: https://issues.apache.org/jira/browse/HDFS-17231
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
    Affects Versions: 3.3.6, 3.3.4
            Reporter: kuper
         Attachments: 企业微信截图_75d15d37-26b7-4d88-ac0c-8d77e358761b.png

The NameNodeResourceMonitor automatically puts the NameNode into safemode when it detects that resources are insufficient. When ZKFC detects the resource problem, it triggers a failover. Consider the following scenario:
Initially, nn01 is active and nn02 is standby. Because of insufficient resources under dfs.namenode.name.dir, the NameNodeResourceMonitor detects the problem and puts nn01 into safemode. Subsequently, ZKFC triggers a failover.
 * At this point, nn01 is in safemode (ON) and standby, while nn02 is in safemode (OFF) and active.
 * After some time, the resources under nn01's dfs.namenode.name.dir recover; the brief instability triggers another failover.
 * Now nn01 is in safemode (ON) and active, while nn02 is in safemode (OFF) and standby.
 * However, since nn01 is active but still in safemode (ON), HDFS can no longer be read from or written to (see the sketch after the screenshot below).

!企业微信截图_75d15d37-26b7-4d88-ac0c-8d77e358761b.png!
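
For illustration only, here is a minimal, self-contained sketch of the pattern described above, assuming a periodic check loop similar to the one NameNodeResourceMonitor runs (the class, fields, and method names below are invented for the sketch and are not the real FSNamesystem API). Today only the "enter safemode" branch exists; the second branch is the exit-on-recovery behaviour that the summary of this issue asks for.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Illustrative stand-in for the resource monitor loop (not the real
 * FSNamesystem code): it enters safemode when available space drops below
 * the reserved threshold and, per this issue's proposal, leaves safemode
 * again once space recovers, provided safemode was entered for resources.
 */
public class ResourceMonitorSketch {
  // stand-ins for dfs.namenode.resource.du.reserved and the current free space
  static final long DU_RESERVED_BYTES = 100L * 1024 * 1024;
  static final AtomicLong availableBytes = new AtomicLong(500L * 1024 * 1024);

  static final AtomicBoolean inSafeMode = new AtomicBoolean(false);
  static final AtomicBoolean safeModeDueToResources = new AtomicBoolean(false);

  static boolean hasResourcesAvailable() {
    return availableBytes.get() > DU_RESERVED_BYTES;
  }

  static void monitorOnce() {
    if (!hasResourcesAvailable()) {
      // existing behaviour: low resources => enter safemode
      if (inSafeMode.compareAndSet(false, true)) {
        safeModeDueToResources.set(true);
        System.out.println("NameNode low on available disk space. Entering safemode.");
      }
    } else if (inSafeMode.get() && safeModeDueToResources.get()) {
      // missing today: resources went from low back to available, so the
      // safemode that the monitor itself entered should be left again
      inSafeMode.set(false);
      safeModeDueToResources.set(false);
      System.out.println("Disk resources available again. Leaving safemode.");
    }
  }

  public static void main(String[] args) {
    monitorOnce();                            // plenty of space: nothing happens
    availableBytes.set(10L * 1024 * 1024);    // the volume fills up (e.g. via dd)
    monitorOnce();                            // -> enters safemode
    availableBytes.set(500L * 1024 * 1024);   // space is freed again
    monitorOnce();                            // -> proposed behaviour: leave safemode
    System.out.println("inSafeMode=" + inSafeMode.get());
  }
}
{code}

Without that second branch, the safemode that the monitor itself turned on never gets turned off, so after the failback the now-active nn01 stays read-only, which matches the stuck state in the screenshot.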

*Reproduction* (example settings are sketched after this list)
 # Increase dfs.namenode.resource.du.reserved.
 # Increase ha.health-monitor.check-interval.ms so that the NameNode does not immediately switch to standby and stop the NameNodeResourceMonitor thread; the NameNodeResourceMonitor must get the chance to enter safemode before the switch to standby happens.
 # On the active node nn01, use the dd command to create a file large enough to exceed the threshold, triggering the "low on available disk space" condition.
 # If the nn01 NameNode process stays alive, nn01 ends up in safemode (ON) while in the standby state.
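
For reference, a hedged example of the settings used for steps 1-3; the sizes, interval, path, and file size below are illustrative and need to be adapted to the cluster. dfs.namenode.resource.du.reserved belongs in hdfs-site.xml and ha.health-monitor.check-interval.ms in core-site.xml.

{code:xml}
<!-- hdfs-site.xml: raise the reserved space so that one large file can push
     the dfs.namenode.name.dir volume below the threshold -->
<property>
  <name>dfs.namenode.resource.du.reserved</name>
  <!-- example: reserve 10 GB (value is in bytes; the default is 100 MB) -->
  <value>10737418240</value>
</property>

<!-- core-site.xml: check health less often so the NameNodeResourceMonitor
     can enter safemode before ZKFC switches the node to standby -->
<property>
  <name>ha.health-monitor.check-interval.ms</name>
  <value>60000</value>
</property>

<!-- step 3, on the active nn01 (path and size are only an example):
     dd if=/dev/zero of=/data/nn/name/bigfile bs=1M count=20480 -->
{code}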


