Hi Hang & MengYue,

Thanks for creating this FLIP!

I think it is very useful, mainly in two aspects:
1. Enables OperatorCoordinators to register metrics. Currently
the coordinators has no way to do this. And operator coordinator
metric group further enables the SplitEnumerator to have access
to a registered metric group (via the existing public interface
SplitEnumeratorContext#metricGroup()), which is null at the moment.

2. Defines the scope of operator coordinator metrics. A clear definition
makes it easy for users to find their wanted metrics. The definition
also helps to avoid conflicts of metrics from multiple OperatorCoordinators
of the same kind. E.g. each SourceCoordinator may have its own
numSourceSplits metric, these metrics should not be directly registered
to the job metric group.

What I'm a bit concerned is the necessity of the introduced common metrics
numEventsInCounter & numEventsOutCounter. If there any case which strongly
requires them?

Regarding the concerns of Chesnay,
> A dedicated coordinator MG implementation is overkill
Directly using the job metric group can result in metric conflicts, as mentioned
in above #2.

Thanks,
Zhu

Dong Lin <lindon...@gmail.com> 于2022年12月10日周六 14:16写道:

>
> Hi Chesney,
>
> Just to double check with you, OperatorCoordinatorMetricGroup (annotated as
> @PublicEvolving) has already been introduced into Flink by FLIP-179
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics>.
> And that FLIP has got you +1.. Do you mean we should remove this
> OperatorCoordinatorMetricGroup?
>
> Regards,
> Dong
>
> On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler <ches...@apache.org> wrote:
>
> > As a whole I feel like this FLIP is overly complicated. A dedicated
> > coordinator MG implementation is overkill; it could just re-use the
> > existing Task/OperatorMGs to create the same structure we have on TMs,
> > similar to what we did with the Job MG.
> >
> > However, I'm not convinced that this is required anyway, because all the
> > example metrics you listed can be implemented on the TM side +
> > aggregating them in the external metrics backend.
> >
> > Since I'm on holidays soon, just so no one tries to pull a fast one on
> > me, if this were to go to a vote as-is I'd be against it.
> >
> >
> > On 09/12/2022 15:30, Dong Lin wrote:
> > > Hi Hang,
> > >
> > > Thanks for the FLIP! The FLIP looks good and it is pretty informative.
> > >
> > > I have just two minor comments regarding names:
> > > - Would it be useful to rename the config key as
> > > *metrics.scope.jm.job.operator-coordinator* for consistency with
> > > *metrics.scope.jm.job
> > > *(which is not named as *jm-job)?
> > > - Maybe rename the variable as SCOPE_NAMING_OPERATOR_COORDINATOR for
> > > simplicity and consistency with SCOPE_NAMING_OPERATOR (which is not named
> > > as SCOPE_NAMING_TM_JOB_OPERATOR)?
> > >
> > > Cheers,
> > > Dong
> > >
> > >
> > >
> > > On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan <ruanhang1...@gmail.com> wrote:
> > >
> > >> Hi all,
> > >>
> > >> MengYue and I created FLIP-274[1] Introduce metric group for
> > >> OperatorCoordinator. OperatorCoordinator is the coordinator for runtime
> > >> operators and running on Job Manager. The coordination mechanism is
> > >> operator events between OperatorCoordinator and its all operators, the
> > >> coordination is more and more using in Flink, for example many Sources
> > and
> > >> Sinks depend on the mechanism to assign splits and coordinate commits to
> > >> external systems. The OperatorCoordinator is widely using in flink kafka
> > >> connector, flink pulsar connector, flink cdc connector, flink hudi
> > >> connector and so on.
> > >>
> > >> But there is not a suitable metric group scope for the
> > OperatorCoordinator
> > >> and not an implementation for the interface
> > OperatorCoordinatorMetricGroup.
> > >> These metrics in OperatorCoordinator could be how many splits/partitions
> > >> have been assigned to source readers, how many files have been written
> > out
> > >> by sink writers, these metrics not only help users to know the job
> > progress
> > >> but also make big job maintaining easier. Thus we propose the FLIP-274
> > to
> > >> introduce a new metric group scope for OperatorCoordinator and provide
> > an
> > >> internal implementation for OperatorCoordinatorMetricGroup.
> > >>
> > >> Could you help review this FLIP when you get time? Any feedback is
> > >> appreciated!
> > >>
> > >> Best,
> > >> Hang
> > >>
> > >> [1]
> > >>
> > >>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator
> > >>
> >
> >

Reply via email to