featzhang opened a new pull request, #27716:
URL: https://github.com/apache/flink/pull/27716

   ## What is the purpose of the change
   
   This PR adds a new **Node Health** tab to the Flink Web UI under the Job 
Manager page, allowing operators to observe which nodes are currently 
quarantined by the `NodeHealthManager`.
   
   The page calls the existing `GET /cluster/blocklist` REST API (introduced in 
PR-3) and renders a table showing:
   - **Node ID** – the ResourceID of the quarantined node
   - **Cause** – the reason the node was quarantined
   - **Expiration Time** – when the quarantine expires (or "Never" for a 
permanent quarantine)
   - **Status** – a tag indicating `Quarantined` (active) or `Expired`
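
As a rough illustration of how one blocklist entry could map onto these four columns, here is a minimal sketch. The field names (`nodeId`, `cause`, `endTimestamp`), the use of `null` for a permanent quarantine, and the helper `toRow` are assumptions for illustration only; the PR text does not specify the payload shape or where the status is computed.

```typescript
// Hypothetical entry shape; field names are assumptions, not taken from the PR.
interface BlockedNodeEntry {
  nodeId: string;
  cause: string;
  // Epoch milliseconds; null is assumed here to mean "never expires".
  endTimestamp: number | null;
}

type NodeStatus = 'Quarantined' | 'Expired';

// One table row: Node ID, Cause, Expiration Time, Status.
interface NodeHealthRow {
  nodeId: string;
  cause: string;
  expiration: string;
  status: NodeStatus;
}

// Derives the displayed expiration text and status tag for one entry.
function toRow(entry: BlockedNodeEntry, nowMillis: number): NodeHealthRow {
  const end = entry.endTimestamp;
  if (end === null) {
    // A permanent quarantine is always active and shows "Never".
    return { nodeId: entry.nodeId, cause: entry.cause, expiration: 'Never', status: 'Quarantined' };
  }
  return {
    nodeId: entry.nodeId,
    cause: entry.cause,
    expiration: new Date(end).toISOString(),
    status: end > nowMillis ? 'Quarantined' : 'Expired'
  };
}
```

Passing the current time as a parameter keeps the function pure, so the `Quarantined` to `Expired` transition can be checked without waiting for a real quarantine to lapse.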
   
   ## Brief change log
   
   - Add `BlockedNodeInfo` and `BlocklistResponse` TypeScript interfaces 
(`node-health.ts`)
   - Add `loadBlocklist()` method to `JobManagerService` calling `GET 
/cluster/blocklist`
   - Add `JobManagerNodeHealthComponent` with an `nz-table` displaying node 
health status
   - Register new route `node-health` under job-manager routes
   - Add **Node Health** navigation tab to `JobManagerComponent`
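
For orientation, the interfaces and the service call listed above might look roughly like the sketch below. Only the names `BlockedNodeInfo`, `BlocklistResponse`, `loadBlocklist`, and the `GET /cluster/blocklist` path come from this description; the field names, the `blockedNodes` wrapper, the `JsonGetter` type, and the `toBlocklistResponse` helper are assumptions. The sketch is written as a standalone function with an injected getter rather than as the actual Angular service method.

```typescript
// Hypothetical field names; the actual interfaces in node-health.ts may differ.
interface BlockedNodeInfo {
  nodeId: string;
  cause: string;
  endTimestamp: number | null; // epoch millis; null assumed to mean permanent
}

interface BlocklistResponse {
  blockedNodes: BlockedNodeInfo[];
}

// Stand-in for an HTTP client's GET call that resolves to parsed JSON.
type JsonGetter = (url: string) => Promise<unknown>;

// Normalizes an untyped payload so the table always receives an array.
function toBlocklistResponse(raw: unknown): BlocklistResponse {
  const nodes = (raw as { blockedNodes?: unknown } | null | undefined)?.blockedNodes;
  return { blockedNodes: Array.isArray(nodes) ? (nodes as BlockedNodeInfo[]) : [] };
}

// Fetches the blocklist from the endpoint named in this PR.
async function loadBlocklist(get: JsonGetter, baseUrl = ''): Promise<BlocklistResponse> {
  return toBlocklistResponse(await get(`${baseUrl}/cluster/blocklist`));
}
```

Injecting the getter keeps the sketch independent of any particular HTTP client; in an Angular service the same call would typically go through `HttpClient.get` and return an `Observable` instead of a `Promise`.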
   
   ## Verifying this change
   
   1. Start a Flink cluster with `node.health.enabled: true`
   2. Use the REST API to quarantine a node:
      ```
      POST /cluster/nodes/{nodeId}/quarantine
      { "reason": "manual test", "duration": "10 min" }
      ```
   3. Open the Flink Web UI → Job Manager → **Node Health** tab
   4. Verify the quarantined node appears in the table with the correct cause 
and expiration time
   5. After expiration, verify the status tag changes to **Expired**
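
Step 2 can also be scripted. The sketch below only builds the request described above; the helper name `buildQuarantineRequest`, the `Content-Type` header, and the base URL used in the usage note are assumptions, while the path and the `reason`/`duration` body fields are taken from the example in step 2.

```typescript
// Plain description of the HTTP call; nothing is sent by this helper.
interface QuarantineRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Builds the POST /cluster/nodes/{nodeId}/quarantine request from step 2.
function buildQuarantineRequest(
  baseUrl: string,
  nodeId: string,
  reason: string,
  duration: string
): QuarantineRequest {
  return {
    // Encode the node ID so unusual characters cannot break the path.
    url: `${baseUrl}/cluster/nodes/${encodeURIComponent(nodeId)}/quarantine`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' }, // assumed; not stated in the PR
    body: JSON.stringify({ reason, duration })
  };
}
```

Assuming the REST endpoint is reachable at `http://localhost:8081`, the result of `buildQuarantineRequest('http://localhost:8081', nodeId, 'manual test', '10 min')` can be split with `const { url, ...init } = req` and sent with `fetch(url, init)`, after which the Node Health tab should list the node.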
   
   ## Does this pull request potentially affect one of the following parts?
   
   - [ ] Dependencies (does it add or upgrade a dependency): **no**
   - [ ] The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
   - [ ] The serializers: **no**
   - [ ] The runtime per-record code paths (performance sensitive): **no**
   - [ ] Anything that affects the Flink WebUI: **yes**
   
   ## Documentation
   
   - Does this pull request introduce a new feature? **yes** (Web UI 
observability for node health)
   - If yes, how is the feature documented? Visible in the Web UI; REST API 
documented in PR-3.
   
   ## Depends On
   
   This PR depends on:
   - PR-3: `[FLINK-39176][Runtime] Add REST API for Node Quarantine` – provides 
`GET /cluster/blocklist` endpoint
   - PR-1, PR-2, PR-4, PR-5: NodeHealthManager infrastructure


