arjunmohnot commented on code in PR #7049:
URL: https://github.com/apache/hadoop/pull/7049#discussion_r1763997947


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java:
##########
@@ -1608,9 +1608,19 @@ protected void serviceStart() throws Exception {
     // Non HA case, start after RM services are started.
     if (!this.rmContext.isHAEnabled()) {
       transitionToActive();
+
+      // Refresh node state at the service startup to reflect the unregistered
+      // nodemanagers as LOST if the tracking for unregistered nodes flag is enabled.
+      // For HA setup, refreshNodes is already being called during the transition.
+      Configuration yarnConf = getConfig();
+      if (yarnConf.getBoolean(
+          YarnConfiguration.ENABLE_TRACKING_FOR_UNREGISTERED_NODES,
+          YarnConfiguration.DEFAULT_ENABLE_TRACKING_FOR_UNREGISTERED_NODES)) {
+        this.rmContext.getNodesListManager().refreshNodes(yarnConf);

Review Comment:
   Hey @zeekling, thanks for your question! After reviewing potential edge 
cases and comparing the existing implementation, here’s a summary of the 
different scenarios:
   
   ### Unregistered Lost Node Definition
   - A node is marked as _LOST_ when it is listed in the _"include"_ file, is not registered in the ResourceManager's active node context, and is not part of the _"exclude"_ file during startup or HA failover. An unregistered lost node is identified by a port value of -2.
   
   ### **Case 1**: Node Marked as LOST, Heartbeat Received
   - When a node is marked as **LOST**, a lost event is dispatched, adding the node to the **active** and **inactive** node maps of the RMContext.
   - If the node later sends a heartbeat, the transition method in `RMNodeImpl` checks whether an entry exists for the same hostname with **port -2** (LOST state).
   - If found, that nodeID is removed from the `rmContext` and the node is re-registered with the NM's actual port.
   - The transition also decrements the LOST node counter and increments the ACTIVE node counter, ensuring clean state transitions.
   
   ### **Case 2**: **Rare Scenario** - Race Condition
   - A race condition may occur if the **ResourceTrackerService** starts before 
the RM starts processing the unregistered lost nodes, and the NodeManager (NM) 
sends its heartbeat **quickly** in parallel.
     - **Example**:
       - When fetching nodes from `rmContext`, an NM (say **host1**) may not initially be present in the context.
       - Before this operation completes, **host1** may send a heartbeat and get registered with a valid port.
       - Meanwhile, the RM could still mark the same NM (host1) as LOST with port -2, because it was not yet registered when the context was queried, resulting in two entries for the same host: one ACTIVE and one LOST.
   
   ### Details on Case-2
   - I reviewed the code and found that a node can only register once the `ResourceTracker` service has started. In HA mode, nodes only register after the RM becomes active.
   - The current implementation for HA calls the `refreshNodes` function before the `transitionToActive` method, which rules out the race condition for HA setups, since all unregistered nodes are dispatched first.
     
   For standalone setups, there was a slight oversight. During testing I did not encounter or replicate this issue, since the NM heartbeat takes time to arrive and the nodes were already marked as LOST by then. With the recent changes, the `refreshNodes` operation now runs before the service starts. This ensures that unregistered nodes are consistently marked as LOST with **port -2** first; only afterwards can NMs register themselves with a proper port, once the `ResourceTrackerService` starts its server (triggered during RM service start) and heartbeats are received.
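   The ordering argument can be sketched as a tiny simulation. The method names here are illustrative placeholders, not the actual `ResourceManager` methods; the point is only that the LOST dispatch strictly precedes the moment heartbeats become possible:

   ```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the fixed standalone startup ordering: unregistered
// nodes are marked LOST before the tracker server can accept any heartbeat,
// so a host can never end up with both an ACTIVE and a LOST entry.
public class StartupOrdering {

  private final List<String> events = new ArrayList<>();

  void refreshNodes() {
    // Dispatches LOST events for unregistered nodes (port -2).
    events.add("mark-unregistered-lost");
  }

  void startTrackerServer() {
    // Only from this point on can NM heartbeats arrive.
    events.add("tracker-server-start");
  }

  void serviceStart(boolean trackUnregisteredNodes) {
    if (trackUnregisteredNodes) {
      refreshNodes();   // runs strictly before heartbeats are possible
    }
    startTrackerServer();
  }

  List<String> events() {
    return events;
  }
}
   ```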
   
   ### Conclusion
   The updated change should guarantee that nodes are marked as LOST **before** any heartbeats are processed, which eliminates the chance of falsely reporting nodes as LOST. I've validated this behavior through logs, which show that the lost event is dispatched first and the heartbeats from NMs are received only after service start.
   
   Let me know if this clears things up! 😊
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

