[ 
https://issues.apache.org/jira/browse/FLINK-39617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18078728#comment-18078728
 ] 

Herbert Wang commented on FLINK-39617:
--------------------------------------

I am opening this ticket to discuss the API shape before submitting any 
implementation.

>   Add batch REST endpoints for aggregated subtask metrics across multiple job 
> vertices
> --------------------------------------------------------------------------------------
>
>                 Key: FLINK-39617
>                 URL: https://issues.apache.org/jira/browse/FLINK-39617
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics, Runtime / REST
>            Reporter: Herbert Wang
>            Priority: Major
>              Labels: Metrics, metrics, rest_api
>
> The JobManager REST API currently exposes aggregated subtask metrics per job 
> vertex via:
> {code}
> GET /jobs/:jobid/vertices/:vertexid/subtasks/metrics
> {code}
> Clients that need the same metric set for many vertices, such as autoscalers 
> or monitoring integrations, must issue one request per vertex for metric-name 
> discovery and another request per vertex for metric values. For jobs with 
> many vertices this creates avoidable REST fan-out, repeated MetricFetcher 
> updates, and large repeated payloads.
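To make the fan-out concrete, here is a rough sketch of the URLs a client builds today. It assumes the existing endpoint returns metric names when called without query parameters and accepts `get` and `agg` query parameters for values. The helper name `per_vertex_requests` and the base URL are hypothetical.

```python
from urllib.parse import urlencode


def per_vertex_requests(base_url, job_id, vertex_ids, metrics, aggs):
    """Return the GET URLs a client would issue today: two per vertex."""
    urls = []
    for vertex_id in vertex_ids:
        endpoint = f"{base_url}/jobs/{job_id}/vertices/{vertex_id}/subtasks/metrics"
        # Request 1 (assumption): no query parameters, used for name discovery.
        urls.append(endpoint)
        # Request 2 (assumption): `get` and `agg` select metrics and aggregations.
        query = urlencode({"get": ",".join(metrics), "agg": ",".join(aggs)})
        urls.append(f"{endpoint}?{query}")
    return urls
```

For N vertices this yields 2N requests, whereas the proposed batch endpoints would need two in total.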
> h2. Proposal
> Add two batch JobManager REST endpoints for aggregated subtask metrics across 
> multiple vertices:
> {code}
> POST /jobs/:jobid/vertices/subtasks/metrics/names
> POST /jobs/:jobid/vertices/subtasks/metrics/values
> {code}
> The existing single-vertex endpoint should remain unchanged for compatibility.
> The endpoints are intentionally split rather than using one POST endpoint 
> with mode-switching behavior, so OpenAPI schemas, code generation, and 
> capability detection remain straightforward.
> h3. Name discovery endpoint
> Request:
> {code:json}
> {
>   "vertexIds": ["<jobVertexId>", "<jobVertexId>"],
>   "regex": [".*busyTime.*", ".*numRecords.*"]
> }
> {code}
> Response:
> {code:json}
> [
>   {
>     "vertexId": "<jobVertexId>",
>     "metrics": [{ "id": "busyTimeMsPerSecond" }]
>   }
> ]
> {code}
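A minimal client-side sketch for the name discovery shapes above, assuming the field names stay exactly as proposed. The helper names are hypothetical, and how the server combines multiple `regex` entries is not specified here, so the sketch only serializes and parses.

```python
import json


def build_names_request(vertex_ids, patterns):
    """Serialize the proposed name-discovery request body."""
    return json.dumps({"vertexIds": list(vertex_ids), "regex": list(patterns)})


def parse_names_response(body):
    """Map each vertexId to the list of metric ids reported for it."""
    return {
        entry["vertexId"]: [metric["id"] for metric in entry["metrics"]]
        for entry in json.loads(body)
    }
```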
> h3. Value aggregation endpoint
> Request:
> {code:json}
> {
>   "vertices": [
>     { "vertexId": "<jobVertexId>", "metrics": ["busyTimeMsPerSecond"] },
>     { "vertexId": "<jobVertexId>", "metrics": ["numRecordsInPerSecond"] }
>   ],
>   "agg": ["min", "max", "avg"]
> }
> {code}
> Response:
> {code:json}
> [
>   {
>     "vertexId": "<jobVertexId>",
>     "metrics": [{ "id": "busyTimeMsPerSecond", "min": 0.0, "max": 1.0, "avg": 0.5 }]
>   }
> ]
> {code}
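A similar sketch for the value aggregation shapes, again assuming the proposed field names. The helper names are hypothetical. The response is indexed by vertex and metric id so callers can look up a single aggregate directly.

```python
import json


def build_values_request(metrics_by_vertex, aggs):
    """Serialize the proposed value-aggregation request body."""
    vertices = [
        {"vertexId": vertex_id, "metrics": list(names)}
        for vertex_id, names in metrics_by_vertex.items()
    ]
    return json.dumps({"vertices": vertices, "agg": list(aggs)})


def index_values_response(body):
    """Return {vertexId: {metricId: {aggregation: value}}}."""
    result = {}
    for entry in json.loads(body):
        per_metric = {}
        for metric in entry["metrics"]:
            # Every key except "id" is treated as an aggregation result.
            per_metric[metric["id"]] = {k: v for k, v in metric.items() if k != "id"}
        result[entry["vertexId"]] = per_metric
    return result
```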
> h2. Compatibility
> This is additive. The existing endpoint remains unchanged:
> {code}
> GET /jobs/:jobid/vertices/:vertexid/subtasks/metrics
> {code}
> Clients can feature-detect the new endpoints and fall back to the existing 
> per-vertex endpoint when they are unavailable. Alternatively, the change 
> could be cherry-picked to earlier 2.x release branches.
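The feature-detection fallback could look roughly like the sketch below. It assumes an older JobManager answers the unknown batch path with 404 or 405, which is not confirmed in this ticket. `post_batch` and `get_single` are hypothetical caller-supplied transport functions, so the sketch stays independent of any HTTP library.

```python
def fetch_aggregates(post_batch, get_single, vertices, aggs):
    """Try the batch endpoint first, then fall back to per-vertex requests.

    post_batch(payload) -> (status_code, parsed_body)
    get_single(vertex_id, metrics, aggs) -> list of metric dicts
    """
    status, body = post_batch({"vertices": vertices, "agg": aggs})
    if status == 200:
        return body
    # Assumption: 404/405 means the batch endpoint does not exist on this version.
    if status not in (404, 405):
        raise RuntimeError(f"batch metrics request failed with HTTP {status}")
    return [
        {
            "vertexId": vertex["vertexId"],
            "metrics": get_single(vertex["vertexId"], vertex["metrics"], aggs),
        }
        for vertex in vertices
    ]
```

Both paths return the same list shape, so callers do not need to know which one served the request.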



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
