Hi all,

I hope that everyone has had sufficient time to review the discussion and
the updates on the FLIP doc. If there are no objections, I'll start a
voting thread in 2 days.

Best,
Mason

On Thu, Jan 18, 2024 at 2:39 PM Mason Chen <mas.chen6...@gmail.com> wrote:

> Hi Lijie,
>
> That's also a possibility but I would prefer to keep it consistent with
> how the existing metric APIs are used.
>
> For example, in the current metric APIs [1], there is no way to figure out
> the vertexid and subtaskindex without getting the job graph from
> `/jobs/<jobid>/plan` and correspondingly there are no APIs to return a map
> of all metrics for every vertex and to return a map of all metrics for
> every subtask. Essentially, the plan API is already required to use the
> finer-grained metric apis.
>
> In addition, keeping the design similar lends itself better for the
> implementation. The metric handler utilities [2] assume a
> ComponentMetricStore is returned rather than a Map.
>
> I've updated the FLIP doc (
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-417%3A+Expose+JobManagerOperatorMetrics+via+REST+API)
> with our discussion so far.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#rest-api-integration
> [2]
> https://github.com/apache/flink/blob/a41229b24d82e8c561350c42d8a98dfb865c3f69/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/metrics/AbstractMetricsHandler.java#L109
>
> Best,
> Mason
>
> On Wed, Jan 17, 2024 at 8:28 AM Lijie Wang <wangdachui9...@gmail.com>
> wrote:
>
>> Hi Mason,
>>
>> Thanks for driving the discussion. +1 for the proposal.
>>
>> How about we return all operator metrics in a vertex? (the response is a
>> map of operatorId/operatorName -> operator-metrics). Correspondingly, the
>> url may be changed to /jobs/<jobid>/vertices/<vertexid>/operator-metrics
>>
>> In this way, users can skip the step of obtaining the operator id.
>>
>> Best,
>> Lijie
>>
>> Hang Ruan <ruanhang1...@gmail.com> 于2024年1月17日周三 10:31写道:
>>
>> > Hi, Mason.
>> >
>> > The field `operatorName` in JobManagerOperatorQueryScopeInfo refers to
>> the
>> > fields in OperatorQueryScopeInfo and chooses the operatorName instead of
>> > OperatorID.
>> > It is fine by my side to change from opertorName to operatorID in this
>> > FLIP.
>> >
>> > Best,
>> > Hang
>> >
>> > Mason Chen <mas.chen6...@gmail.com> 于2024年1月17日周三 09:39写道:
>> >
>> > > Hi Xuyang and Hang,
>> > >
>> > > Thanks for your support and feedback! See my responses below:
>> > >
>> > > 1. IIRC, in a sense, operator ID and vertex ID are the same thing. The
>> > > > operator ID can
>> > > > be converted from the vertex ID[1]. Therefore, it is somewhat
>> strange
>> > to
>> > > > have both vertex
>> > > > ID and operator ID in a single URL.
>> > > >
>> > > I think Hang explained it perfectly. Essentially, a vertix may contain
>> > one
>> > > or more operators so the operator ID is required to distinguish this
>> > case.
>> > >
>> > > 2. If I misunderstood the semantics of operator IDs here, then what is
>> > the
>> > > > relationship
>> > > > between vertex ID and operator ID, and do we need a url like
>> > > > `/jobs/<jobid>/vertices/<vertexid>/operators/`
>> > > > to list all operator ids under this vertices?
>> > > >
>> > > Good question, we definitely need expose operator IDs through the REST
>> > API
>> > > to make this usable. I'm looking at how users would currently discover
>> > the
>> > > vertex id to query. From the supported REST APIs [1], you can
>> currently
>> > > obtain it from
>> > >
>> > > 1. `/jobs/<jobid>`
>> > > 2. `/jobs/<jobid>/plan`
>> > >
>> > > From the response of both these APIs, they include the vertex ids (the
>> > > vertices AND nodes fields), but not the operator ids. We would need to
>> > add
>> > > the logic to the plan generation [2]. The response is a little
>> confusing
>> > > because there is a field in the vertices called "operator name". I
>> > propose
>> > > to add a new field called "operators" to the vertex response object,
>> > which
>> > > would be a list of objects with the structure
>> > >
>> > > Operator
>> > > {
>> > >   "id": "THE-FLINK-GENERATED-ID"
>> > > }.
>> > >
>> > > The JobManagerOperatorQueryScopeInfo has three fields: jobID, vertexID
>> > and
>> > > > operatorName. So we should use the operator name in the API.
>> > > > If you think we should use the operator id, there need be more
>> changes
>> > > > about it.
>> > > >
>> > > I think we should use operator id since it uniquely identifies an
>> > > operator--on the contrary, the operator name does not (it may be
>> empty or
>> > > repeated between operators by the user). I actually had a question on
>> > that
>> > > since you implemented the metric group. What's the reason we use
>> operator
>> > > name currently? Could it also use operator id so we can match against
>> the
>> > > id?
>> > >
>> > > [1]
>> > >
>> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/
>> > > [2]
>> > >
>> > >
>> >
>> https://github.com/apache/flink/blob/416cb7aaa02c176e01485ff11ab4269f76b5e9e2/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/jsonplan/JsonPlanGenerator.java#L53
>> > >
>> > > Best,
>> > > Mason
>> > >
>> > >
>> > > On Thu, Jan 11, 2024 at 10:54 PM Hang Ruan <ruanhang1...@gmail.com>
>> > wrote:
>> > >
>> > > > Hi, Mason.
>> > > >
>> > > > Thanks for driving this FLIP.
>> > > >
>> > > > The JobManagerOperatorQueryScopeInfo has three fields: jobID,
>> vertexID
>> > > and
>> > > > operatorName. So we should use the operator name in the API.
>> > > > If you think we should use the operator id, there need be more
>> changes
>> > > > about it.
>> > > >
>> > > > About the Xuyang's questions, we add both vertexID and operatorID
>> > > > information because of the operator chain.
>> > > > A operator chain has a vertexID and contains many different
>> operators.
>> > > The
>> > > > operator information helps to distinguish them in the same operator
>> > > chain.
>> > > >
>> > > > Best,
>> > > > Hang
>> > > >
>> > > >
>> > > > Xuyang <xyzhong...@163.com> 于2024年1月12日周五 10:21写道:
>> > > >
>> > > > > Hi, Mason.
>> > > > > Thanks for driving this Flip. I think it's important for external
>> > > system
>> > > > > to be able to
>> > > > > perceive the metric of the operator coordinator. +1 for it.
>> > > > >
>> > > > >
>> > > > > I just have the following minor questions and am looking forward
>> to
>> > > your
>> > > > > reply. Please forgive
>> > > > > me if I have some misunderstandings.
>> > > > >
>> > > > >
>> > > > > 1. IIRC, in a sense, operator ID and vertex ID are the same thing.
>> > The
>> > > > > operator ID can
>> > > > > be converted from the vertex ID[1]. Therefore, it is somewhat
>> strange
>> > > to
>> > > > > have both vertex
>> > > > > ID and operator ID in a single URL.
>> > > > >
>> > > > >
>> > > > > 2. If I misunderstood the semantics of operator IDs here, then
>> what
>> > is
>> > > > the
>> > > > > relationship
>> > > > > between vertex ID and operator ID, and do we need a url like
>> > > > > `/jobs/<jobid>/vertices/<vertexid>/operators/`
>> > > > > to list all operator ids under this vertices?
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > [1]
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/flink/blob/7bebd2d9fac517c28afc24c0c034d77cfe2b43a6/flink-runtime/src/main/java/org/apache/flink/runtime/jobgraph/OperatorID.java#L40C27-L40C27
>> > > > >
>> > > > > --
>> > > > >
>> > > > >     Best!
>> > > > >     Xuyang
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > At 2024-01-12 04:20:03, "Mason Chen" <mas.chen6...@gmail.com>
>> wrote:
>> > > > > >Hi Devs,
>> > > > > >
>> > > > > >I'm opening this thread to discuss a short FLIP for exposing
>> > > > > >JobManagerOperatorMetrics via REST API [1].
>> > > > > >
>> > > > > >The current set of REST APIs make it impossible to query
>> coordinator
>> > > > > >metrics. This FLIP proposes a new REST API to query the
>> > > > > >JobManagerOperatorMetrics.
>> > > > > >
>> > > > > >[1]
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-417%3A+Expose+JobManagerOperatorMetrics+via+REST+API
>> > > > > >
>> > > > > >Best,
>> > > > > >Mason
>> > > > >
>> > > >
>> > >
>> >
>>
>

Reply via email to