Re: [DISCUSS] FLIP-274 : Introduce metric group for OperatorCoordinator

Hang Ruan Thu, 22 Dec 2022 23:21:25 -0800

Hi everyone,

As for the Zhu Zhu's problem, I think we should keep the common metrics,
which will help to observe incoming and outgoing events. What do you
think, @Zhu
Zhu <[email protected]> ?
And @Chesnay, are there any other issues you are more concerned about?
Looking forward to your reply.


Thanks for all the comments. If there are no further comments, we plan to
start the voting thread next week.

Best,
Hang

Hang Ruan <[email protected]> 于2022年12月15日周四 16:49写道：

> Hi, Zhu Zhu,
>
> Thanks for your feedback!
>
> The OperatorCoordinator implementations are different. And their metrics
> are much different too. We try to find the common metrics and put them in
> the OperatorCoordinatorMetricGroup. If most developers think we do not need
> these common metrics, removing the common metrics is also acceptable.
>
> Best,
> Hang
>
> Zhu Zhu <[email protected]> 于2022年12月14日周三 22:09写道：
>
>> Hi Hang & MengYue,
>>
>> Thanks for creating this FLIP!
>>
>> I think it is very useful, mainly in two aspects:
>> 1. Enables OperatorCoordinators to register metrics. Currently
>> the coordinators has no way to do this. And operator coordinator
>> metric group further enables the SplitEnumerator to have access
>> to a registered metric group (via the existing public interface
>> SplitEnumeratorContext#metricGroup()), which is null at the moment.
>>
>> 2. Defines the scope of operator coordinator metrics. A clear definition
>> makes it easy for users to find their wanted metrics. The definition
>> also helps to avoid conflicts of metrics from multiple
>> OperatorCoordinators
>> of the same kind. E.g. each SourceCoordinator may have its own
>> numSourceSplits metric, these metrics should not be directly registered
>> to the job metric group.
>>
>> What I'm a bit concerned is the necessity of the introduced common metrics
>> numEventsInCounter & numEventsOutCounter. If there any case which strongly
>> requires them?
>>
>> Regarding the concerns of Chesnay,
>> > A dedicated coordinator MG implementation is overkill
>> Directly using the job metric group can result in metric conflicts, as
>> mentioned
>> in above #2.
>>
>> Thanks,
>> Zhu
>>
>> Dong Lin <[email protected]> 于2022年12月10日周六 14:16写道：
>>
>> >
>> > Hi Chesney,
>> >
>> > Just to double check with you, OperatorCoordinatorMetricGroup
>> (annotated as
>> > @PublicEvolving) has already been introduced into Flink by FLIP-179
>> > <
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
>> >.
>> > And that FLIP has got you +1.. Do you mean we should remove this
>> > OperatorCoordinatorMetricGroup?
>> >
>> > Regards,
>> > Dong
>> >
>> > On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler <[email protected]>
>> wrote:
>> >
>> > > As a whole I feel like this FLIP is overly complicated. A dedicated
>> > > coordinator MG implementation is overkill; it could just re-use the
>> > > existing Task/OperatorMGs to create the same structure we have on TMs,
>> > > similar to what we did with the Job MG.
>> > >
>> > > However, I'm not convinced that this is required anyway, because all
>> the
>> > > example metrics you listed can be implemented on the TM side +
>> > > aggregating them in the external metrics backend.
>> > >
>> > > Since I'm on holidays soon, just so no one tries to pull a fast one on
>> > > me, if this were to go to a vote as-is I'd be against it.
>> > >
>> > >
>> > > On 09/12/2022 15:30, Dong Lin wrote:
>> > > > Hi Hang,
>> > > >
>> > > > Thanks for the FLIP! The FLIP looks good and it is pretty
>> informative.
>> > > >
>> > > > I have just two minor comments regarding names:
>> > > > - Would it be useful to rename the config key as
>> > > > *metrics.scope.jm.job.operator-coordinator* for consistency with
>> > > > *metrics.scope.jm.job
>> > > > *(which is not named as *jm-job)?
>> > > > - Maybe rename the variable as SCOPE_NAMING_OPERATOR_COORDINATOR for
>> > > > simplicity and consistency with SCOPE_NAMING_OPERATOR (which is not
>> named
>> > > > as SCOPE_NAMING_TM_JOB_OPERATOR)?
>> > > >
>> > > > Cheers,
>> > > > Dong
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan <[email protected]>
>> wrote:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> MengYue and I created FLIP-274[1] Introduce metric group for
>> > > >> OperatorCoordinator. OperatorCoordinator is the coordinator for
>> runtime
>> > > >> operators and running on Job Manager. The coordination mechanism is
>> > > >> operator events between OperatorCoordinator and its all operators,
>> the
>> > > >> coordination is more and more using in Flink, for example many
>> Sources
>> > > and
>> > > >> Sinks depend on the mechanism to assign splits and coordinate
>> commits to
>> > > >> external systems. The OperatorCoordinator is widely using in flink
>> kafka
>> > > >> connector, flink pulsar connector, flink cdc connector, flink hudi
>> > > >> connector and so on.
>> > > >>
>> > > >> But there is not a suitable metric group scope for the
>> > > OperatorCoordinator
>> > > >> and not an implementation for the interface
>> > > OperatorCoordinatorMetricGroup.
>> > > >> These metrics in OperatorCoordinator could be how many
>> splits/partitions
>> > > >> have been assigned to source readers, how many files have been
>> written
>> > > out
>> > > >> by sink writers, these metrics not only help users to know the job
>> > > progress
>> > > >> but also make big job maintaining easier. Thus we propose the
>> FLIP-274
>> > > to
>> > > >> introduce a new metric group scope for OperatorCoordinator and
>> provide
>> > > an
>> > > >> internal implementation for OperatorCoordinatorMetricGroup.
>> > > >>
>> > > >> Could you help review this FLIP when you get time? Any feedback is
>> > > >> appreciated!
>> > > >>
>> > > >> Best,
>> > > >> Hang
>> > > >>
>> > > >> [1]
>> > > >>
>> > > >>
>> > >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator
>> > > >>
>> > >
>> > >
>>
>

Re: [DISCUSS] FLIP-274 : Introduce metric group for OperatorCoordinator

Reply via email to