Hi all, I hope that everyone has had sufficient time to review the discussion and the updates on the FLIP doc. If there are no objections, I'll start a voting thread in 2 days.
Best, Mason On Thu, Jan 18, 2024 at 2:39 PM Mason Chen <mas.chen6...@gmail.com> wrote: > Hi Lijie, > > That's also a possibility but I would prefer to keep it consistent with > how the existing metric APIs are used. > > For example, in the current metric APIs [1], there is no way to figure out > the vertexid and subtaskindex without getting the job graph from > `/jobs/<jobid>/plan` and correspondingly there are no APIs to return a map > of all metrics for every vertex and to return a map of all metrics for > every subtask. Essentially, the plan API is already required to use the > finer-grained metric apis. > > In addition, keeping the design similar lends itself better for the > implementation. The metric handler utilities [2] assume a > ComponentMetricStore is returned rather than a Map. > > I've updated the FLIP doc ( > https://cwiki.apache.org/confluence/display/FLINK/FLIP-417%3A+Expose+JobManagerOperatorMetrics+via+REST+API) > with our discussion so far. > > [1] > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#rest-api-integration > [2] > https://github.com/apache/flink/blob/a41229b24d82e8c561350c42d8a98dfb865c3f69/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/metrics/AbstractMetricsHandler.java#L109 > > Best, > Mason > > On Wed, Jan 17, 2024 at 8:28 AM Lijie Wang <wangdachui9...@gmail.com> > wrote: > >> Hi Mason, >> >> Thanks for driving the discussion. +1 for the proposal. >> >> How about we return all operator metrics in a vertex? (the response is a >> map of operatorId/operatorName -> operator-metrics). Correspondingly, the >> url may be changed to /jobs/<jobid>/vertices/<vertexid>/operator-metrics >> >> In this way, users can skip the step of obtaining the operator id. >> >> Best, >> Lijie >> >> Hang Ruan <ruanhang1...@gmail.com> 于2024年1月17日周三 10:31写道: >> >> > Hi, Mason. >> > >> > The field `operatorName` in JobManagerOperatorQueryScopeInfo refers to >> the >> > fields in OperatorQueryScopeInfo and chooses the operatorName instead of >> > OperatorID. >> > It is fine by my side to change from opertorName to operatorID in this >> > FLIP. >> > >> > Best, >> > Hang >> > >> > Mason Chen <mas.chen6...@gmail.com> 于2024年1月17日周三 09:39写道: >> > >> > > Hi Xuyang and Hang, >> > > >> > > Thanks for your support and feedback! See my responses below: >> > > >> > > 1. IIRC, in a sense, operator ID and vertex ID are the same thing. The >> > > > operator ID can >> > > > be converted from the vertex ID[1]. Therefore, it is somewhat >> strange >> > to >> > > > have both vertex >> > > > ID and operator ID in a single URL. >> > > > >> > > I think Hang explained it perfectly. Essentially, a vertix may contain >> > one >> > > or more operators so the operator ID is required to distinguish this >> > case. >> > > >> > > 2. If I misunderstood the semantics of operator IDs here, then what is >> > the >> > > > relationship >> > > > between vertex ID and operator ID, and do we need a url like >> > > > `/jobs/<jobid>/vertices/<vertexid>/operators/` >> > > > to list all operator ids under this vertices? >> > > > >> > > Good question, we definitely need expose operator IDs through the REST >> > API >> > > to make this usable. I'm looking at how users would currently discover >> > the >> > > vertex id to query. From the supported REST APIs [1], you can >> currently >> > > obtain it from >> > > >> > > 1. `/jobs/<jobid>` >> > > 2. `/jobs/<jobid>/plan` >> > > >> > > From the response of both these APIs, they include the vertex ids (the >> > > vertices AND nodes fields), but not the operator ids. We would need to >> > add >> > > the logic to the plan generation [2]. The response is a little >> confusing >> > > because there is a field in the vertices called "operator name". I >> > propose >> > > to add a new field called "operators" to the vertex response object, >> > which >> > > would be a list of objects with the structure >> > > >> > > Operator >> > > { >> > > "id": "THE-FLINK-GENERATED-ID" >> > > }. >> > > >> > > The JobManagerOperatorQueryScopeInfo has three fields: jobID, vertexID >> > and >> > > > operatorName. So we should use the operator name in the API. >> > > > If you think we should use the operator id, there need be more >> changes >> > > > about it. >> > > > >> > > I think we should use operator id since it uniquely identifies an >> > > operator--on the contrary, the operator name does not (it may be >> empty or >> > > repeated between operators by the user). I actually had a question on >> > that >> > > since you implemented the metric group. What's the reason we use >> operator >> > > name currently? Could it also use operator id so we can match against >> the >> > > id? >> > > >> > > [1] >> > > >> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/ >> > > [2] >> > > >> > > >> > >> https://github.com/apache/flink/blob/416cb7aaa02c176e01485ff11ab4269f76b5e9e2/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/jsonplan/JsonPlanGenerator.java#L53 >> > > >> > > Best, >> > > Mason >> > > >> > > >> > > On Thu, Jan 11, 2024 at 10:54 PM Hang Ruan <ruanhang1...@gmail.com> >> > wrote: >> > > >> > > > Hi, Mason. >> > > > >> > > > Thanks for driving this FLIP. >> > > > >> > > > The JobManagerOperatorQueryScopeInfo has three fields: jobID, >> vertexID >> > > and >> > > > operatorName. So we should use the operator name in the API. >> > > > If you think we should use the operator id, there need be more >> changes >> > > > about it. >> > > > >> > > > About the Xuyang's questions, we add both vertexID and operatorID >> > > > information because of the operator chain. >> > > > A operator chain has a vertexID and contains many different >> operators. >> > > The >> > > > operator information helps to distinguish them in the same operator >> > > chain. >> > > > >> > > > Best, >> > > > Hang >> > > > >> > > > >> > > > Xuyang <xyzhong...@163.com> 于2024年1月12日周五 10:21写道: >> > > > >> > > > > Hi, Mason. >> > > > > Thanks for driving this Flip. I think it's important for external >> > > system >> > > > > to be able to >> > > > > perceive the metric of the operator coordinator. +1 for it. >> > > > > >> > > > > >> > > > > I just have the following minor questions and am looking forward >> to >> > > your >> > > > > reply. Please forgive >> > > > > me if I have some misunderstandings. >> > > > > >> > > > > >> > > > > 1. IIRC, in a sense, operator ID and vertex ID are the same thing. >> > The >> > > > > operator ID can >> > > > > be converted from the vertex ID[1]. Therefore, it is somewhat >> strange >> > > to >> > > > > have both vertex >> > > > > ID and operator ID in a single URL. >> > > > > >> > > > > >> > > > > 2. If I misunderstood the semantics of operator IDs here, then >> what >> > is >> > > > the >> > > > > relationship >> > > > > between vertex ID and operator ID, and do we need a url like >> > > > > `/jobs/<jobid>/vertices/<vertexid>/operators/` >> > > > > to list all operator ids under this vertices? >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > [1] >> > > > > >> > > > >> > > >> > >> https://github.com/apache/flink/blob/7bebd2d9fac517c28afc24c0c034d77cfe2b43a6/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/OperatorID.java#L40C27-L40C27 >> > > > > >> > > > > -- >> > > > > >> > > > > Best! >> > > > > Xuyang >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > At 2024-01-12 04:20:03, "Mason Chen" <mas.chen6...@gmail.com> >> wrote: >> > > > > >Hi Devs, >> > > > > > >> > > > > >I'm opening this thread to discuss a short FLIP for exposing >> > > > > >JobManagerOperatorMetrics via REST API [1]. >> > > > > > >> > > > > >The current set of REST APIs make it impossible to query >> coordinator >> > > > > >metrics. This FLIP proposes a new REST API to query the >> > > > > >JobManagerOperatorMetrics. >> > > > > > >> > > > > >[1] >> > > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-417%3A+Expose+JobManagerOperatorMetrics+via+REST+API >> > > > > > >> > > > > >Best, >> > > > > >Mason >> > > > > >> > > > >> > > >> > >> >