Arjun Mohnot created YARN-11730:
-----------------------------------
Summary: Resourcemanager node reporting enhancement for
unregistered hosts
Key: YARN-11730
URL: https://issues.apache.org/jira/browse/YARN-11730
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager, yarn
Affects Versions: 3.4.0
Environment: Tested on multiple environments:
A. *Docker Environment:*
* Base OS: *Ubuntu 20.04*
* *Java 8* installed from OpenJDK.
* Docker image includes Hadoop binaries, user configurations, and ports for
YARN services.
* Verified behavior using a Hadoop snapshot in a containerized environment.
* Performed NameNode formatting and validated service interactions through
exposed ports.
* Repo reference:
[arjunmohnot/hadoop-yarn-docker|https://github.com/arjunmohnot/hadoop-yarn-docker/tree/main]
B. *Bare-metal Distributed Setup (Red Hat Linux):*
* Running *Java 8* in a High-Availability (HA) configuration with *ZooKeeper*
as the locking mechanism.
* Two ResourceManagers (RMs) in HA: failover tested between the HA1 and HA2 RM
nodes, including state retention and proper node state transitions.
* Verified node state transitions during RM failover, ensuring nodes moved
between LOST, ACTIVE, and other states as expected.
Reporter: Arjun Mohnot
Fix For: 3.5.0
h3. Issue Overview
When the ResourceManager (RM) starts, nodes listed in the _"include"_ file are
not reported until their corresponding NodeManagers (NMs) send their first
heartbeat. Nodes in the _"exclude"_ file, by contrast, are immediately
reflected in the _"Decommissioned Hosts"_ section with a port value of -1.
This design creates several challenges:
* {*}Untracked NodeManagers{*}: During a ResourceManager HA failover or a
standalone RM restart, some nodes may never report back, even though they are
listed in the _"include"_ file. These nodes neither appear in the _LOST_ state
nor are represented in the RM's JMX metrics. They are left untracked, which
makes their status difficult to monitor. In HDFS, by contrast, nodes in the
equivalent situation are reported as {_}"DEAD"{_}.
* {*}Monitoring Gaps{*}: Nodes in the _"include"_ file are not visible until
they send their first heartbeat. This delay undermines real-time cluster
monitoring: until then, the RM's view of the cluster omits these nodes, so its
total node count is understated.
* {*}Operational Impact{*}: These unreported nodes cause operational
difficulties, particularly in automated workflows such as OS Upgrade Automation
(OSUA), node recovery automation, and similar tooling, where validation depends
on nodes being reflected in JMX as {_}LOST{_}, {_}UNHEALTHY{_},
{_}DECOMMISSIONED{_}, etc. Nodes that never report require ad hoc workarounds
to determine their actual status.
h3. Proposed Solution
To address these issues, we propose automatically assigning the _LOST_ state,
as the initial state, to any node listed in the _"include"_ file that has not
yet registered at RM startup or HA failover. Such a node is marked with a
special port value of {_}-2{_}, signaling that it is considered LOST but has
never reported. When a heartbeat is received for that
{color:#de350b}nodeID{color}, the node transitions from _LOST_ to
{_}RUNNING{_}, {_}UNHEALTHY{_}, or whichever state is appropriate.
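The selection rule in the proposal can be illustrated with a minimal Java sketch. This is not Hadoop code: the class UnregisteredHostReport, its ReportedNode type, and the method unregisteredIncludedHosts are hypothetical names, and the sketch assumes hosts are matched by exact name (real YARN host matching of hostnames, IPs, and ports is more involved). It only shows which included hosts would be surfaced as LOST placeholders with port -2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch, not Hadoop code: which included hosts would be
// surfaced as LOST placeholders under the proposal. All names are hypothetical.
final class UnregisteredHostReport {
    // Proposed marker for "LOST, never reported" (excluded hosts already use -1).
    static final int UNREGISTERED_PORT = -2;

    // Hypothetical report entry: host name, sentinel port, state name.
    static final class ReportedNode {
        final String host;
        final int port;
        final String state;

        ReportedNode(String host, int port, String state) {
            this.host = host;
            this.port = port;
            this.state = state;
        }

        @Override
        public String toString() {
            return host + ":" + port + " " + state;
        }
    }

    // Assumes exact host-name matching between the include/exclude files
    // and the set of hosts that have already registered with the RM.
    static List<ReportedNode> unregisteredIncludedHosts(
            Set<String> includeHosts, Set<String> excludeHosts, Set<String> registeredHosts) {
        List<ReportedNode> lost = new ArrayList<>();
        for (String host : includeHosts) {
            if (excludeHosts.contains(host)) {
                continue; // already reported as DECOMMISSIONED with port -1
            }
            if (registeredHosts.contains(host)) {
                continue; // tracked through its real NodeId
            }
            lost.add(new ReportedNode(host, UNREGISTERED_PORT, "LOST"));
        }
        return lost;
    }

    public static void main(String[] args) {
        Set<String> include = new LinkedHashSet<>(Arrays.asList("nm1", "nm2", "nm3"));
        Set<String> exclude = new LinkedHashSet<>(Arrays.asList("nm3"));
        Set<String> registered = new LinkedHashSet<>(Arrays.asList("nm1"));
        System.out.println(unregisteredIncludedHosts(include, exclude, registered));
    }
}
```

Running main prints [nm2:-2 LOST]: nm1 has registered, nm3 is excluded, so only nm2 becomes a LOST placeholder.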
h3. Key Implementation Points
* {*}Mark Unreported Nodes as LOST{*}: Nodes in the _"include"_ file that are
not part of the RM's active node context should be automatically marked as
{_}LOST{_}. This can be achieved by modifying the
{color:#de350b}refreshHostsReader{color} method of _NodesListManager_, which is
invoked during failover and manual node refresh operations. This logic should
move every unregistered node to the _LOST_ state, with port _-2_ indicating
that the node is untracked.
* For non-HA setups, the same process can be triggered during RM service
startup to mark such nodes as _LOST_ initially; each node then transitions to
its actual state once its first heartbeat is received.
* {*}Handle Node Heartbeat and Transition{*}: When a node sends its first
heartbeat, the RM should check whether the node is listed in
{color:#de350b}getInactiveRMNodes(){color}. If the node is present in the
_LOST_ state, the RM should remove it from the inactive list, decrement the
_LOST_ node count, and move it back into the active node set.
* This logic can be placed in the state transition handling within
{color:#de350b}RMNodeImpl.java{color}, so that nodes transition from _NEW_ to
_LOST_ and recover gracefully from _LOST_ once the RM receives their first
heartbeat.
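The lifecycle described in these points can be modeled with a small Java sketch. It is a toy model under stated assumptions, not RMNodeImpl or NodesListManager: NodeTracker, markUntracked, onHeartbeat, and stateOf are hypothetical names, the inactive map merely plays the role of getInactiveRMNodes(), and only the LOST placeholder path (port -2) is modeled. It shows the two bookkeeping rules: the refresh path adds each untracked included host once and increments the LOST count, and the first heartbeat removes the placeholder, decrements the count, and activates the node.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy model, not RMNodeImpl: NEW -> LOST (port -2) on refresh,
// LOST -> RUNNING on first heartbeat. All names are hypothetical.
final class NodeTracker {
    enum State { NEW, RUNNING, LOST }

    static final int UNTRACKED_PORT = -2;

    // Hosts in the active set and their current state.
    private final Map<String, State> active = new HashMap<>();
    // Placeholder entries standing in for getInactiveRMNodes(); host -> port.
    private final Map<String, Integer> inactive = new HashMap<>();
    private int lostCount = 0;

    // Refresh/startup path: record each untracked included host as LOST.
    void markUntracked(Iterable<String> includeHosts) {
        for (String host : includeHosts) {
            if (active.containsKey(host) || inactive.containsKey(host)) {
                continue; // already tracked; avoid double-counting
            }
            inactive.put(host, UNTRACKED_PORT);
            lostCount++;
        }
    }

    // Heartbeat path: drop the placeholder, fix the counter, activate the node.
    void onHeartbeat(String host) {
        Integer port = inactive.get(host);
        if (port != null && port == UNTRACKED_PORT) {
            inactive.remove(host);
            lostCount--;
        }
        active.put(host, State.RUNNING);
    }

    State stateOf(String host) {
        if (active.containsKey(host)) {
            return active.get(host);
        }
        // Every inactive entry in this toy model is a LOST placeholder;
        // a host that is neither active nor a placeholder is modeled as NEW.
        return inactive.containsKey(host) ? State.LOST : State.NEW;
    }

    int lostCount() {
        return lostCount;
    }

    boolean isInactive(String host) {
        return inactive.containsKey(host);
    }

    public static void main(String[] args) {
        NodeTracker tracker = new NodeTracker();
        tracker.markUntracked(Arrays.asList("nm1", "nm2"));
        tracker.onHeartbeat("nm1");
        System.out.println(tracker.stateOf("nm1") + " " + tracker.stateOf("nm2")
                + " lost=" + tracker.lostCount());
    }
}
```

Running main prints RUNNING LOST lost=1. Calling markUntracked again after the heartbeat leaves the count unchanged, which is the property a repeated refresh (for example, a second failover) would need.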
h3. Benefits
* {*}Improved Cluster Monitoring{*}: Automatically assigning the _LOST_ state
to nodes that are listed in the _"include"_ file but have not reported ensures
that every node in the cluster has a well-defined state ({_}ACTIVE{_},
{_}LOST{_}, {_}DECOMMISSIONED{_}, {_}UNHEALTHY{_}, etc.). This closes gaps in
cluster node visibility and simplifies operational monitoring.
* {*}Better Recovery Management{*}: Because unreported nodes are marked as
{_}LOST{_}, automation can quickly identify which nodes need attention when
restoring cluster health. This removes the ambiguity between unreachable nodes
and untracked nodes, improving recovery accuracy.
* {*}Enhanced Cluster Stability{*}: This approach improves operational
stability by preventing nodes from slipping into an untracked or unknown state.
It keeps the RM aware of every node listed in the _"include"_ file, reducing
issues during RM failover or restart scenarios.
h3. Additional Considerations
* {*}Feature Flag Control{*}: This feature can be enabled or disabled via a
configuration flag, allowing users to choose the behavior that suits their
requirements. It is disabled ({_}false{_}) by default.
* {*}Validation{*}: The approach has been tested on both non-HA and HA setups,
and a Docker-based
[setup|https://github.com/arjunmohnot/hadoop-yarn-docker/tree/main] was
created to reproduce the behavior. Unit tests covering the new behavior have
been added. A demo
[video|https://drive.google.com/file/d/1okiPe7uMNVMRUnNYtz-B8Igf8FMGr-SJ/view?usp=sharing]
of this change is also available.
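Gating the behavior behind a flag that defaults to false might look like the following Java sketch. The property key is purely hypothetical and is not the key used by the actual change; java.util.Properties stands in for the Hadoop configuration object so the example runs without Hadoop on the classpath. Inside the RM, such a flag would more typically be read with Hadoop's Configuration.getBoolean(name, defaultValue).

```java
import java.util.Properties;

// Sketch of an opt-in flag that defaults to false.
// The property key below is hypothetical, not the key used by the patch.
final class UntrackedNodeFeature {
    static final String ENABLED_KEY = "example.rm.track-unregistered-nodes.enabled";
    static final boolean ENABLED_DEFAULT = false;

    // Returns true only when the key is explicitly set to "true" (case-insensitive).
    static boolean isEnabled(Properties conf) {
        String value = conf.getProperty(ENABLED_KEY, Boolean.toString(ENABLED_DEFAULT));
        return Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("default: " + isEnabled(conf));
        conf.setProperty(ENABLED_KEY, "true");
        System.out.println("after opt-in: " + isEnabled(conf));
    }
}
```

Running main prints "default: false" followed by "after opt-in: true". Only when the flag is on would the refresh path mark unregistered included hosts as LOST; otherwise the existing behavior is unchanged.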
Any thoughts/suggestions/feedback are welcome!