Hi, @ches...@apache.org <ches...@apache.org> ,

Do you have time to help to review this FLIP again after the modification?
Looking forward to your reply.
This FLIP will add a new configuration for the
OperatorCoordinatorMetricGroup scope format. It provides an internal
implementation and is added as a component to the JobManagerJobMetricGroup.
If something doesn't make sense, could you provide some advice? It will be
very helpful. Thanks a lot for your help.

Best,
Hang

Martijn Visser <martijnvis...@apache.org> 于2023年1月11日周三 16:34写道:

> Hi Hang,
>
> I'm a bit surprised that this has gone to a vote, given that Chesnay
> deliberately mentioned that he would vote against it as-is. I would expect
> that before going to a vote, he has had the opportunity to participate in
> this discussion.
>
> Best regards,
>
> Martijn
>
> On Tue, Jan 3, 2023 at 12:53 PM Jark Wu <imj...@gmail.com> wrote:
>
> > Hi Dong,
> >
> > Regarding “SplitEnumeratorContext#metricGroup”, my only concern is that
> > this is a core interface for the FLIP. It’s hard to tell how sources use
> > metric group without mentioning this interface. Even if this is an
> existing
> > API, I think it’s worth introducing the interface again and declaring
> that
> > we will implement the interface instead of a no-op method in this FLIP.
> >
> > Anyway, this is a minor problem and shouldn’t block this FLIP. I’m +1 to
> > start a vote.
> >
> > Best,
> > Jark
> >
> >
> > > 2023年1月3日 10:03,Hang Ruan <ruanhang1...@gmail.com> 写道:
> > >
> > > Hi, Jark and Dong,
> > >
> > > Thanks for your comments. Sorry for my late reply.
> > >
> > > For suggestion 1, I plan to implement the SplitEnumeratorMetricGroup in
> > > another issue, and it is not contained in this FLIP. I will add some
> > > description about this part.
> > > For suggestion 2, changes about OperatorCoordinator#metricGroup has
> > already
> > > been documented in the proposed change section.
> > >
> > > Best,
> > > Hang
> > >
> > > Dong Lin <lindon...@gmail.com> 于2023年1月1日周日 09:45写道:
> > >
> > >> Let me chime-in and add comments regarding the public interface
> section.
> > >> Please see my comments inline.
> > >>
> > >> On Thu, Dec 29, 2022 at 6:08 PM Jark Wu <imj...@gmail.com> wrote:
> > >>
> > >>> Hi Hang,
> > >>>
> > >>> Thanks for driving this discussion. I think this is a very useful
> > feature
> > >>> for connectors.
> > >>>
> > >>> The FLIP looks quite good to me, and I just have two suggestions.
> > >>>
> > >>> 1. In the "Public Interface" section, mention that the implementation
> > >>> behavior of "SplitEnumeratorContext#metricGroup" is changed from
> > >> returning
> > >>> null to returning a concrete SplitEnumeratorMetricGroup instance.
> Even
> > >>> though the API is already there, the behavior change can also be
> > >> considered
> > >>> a public change.
> > >>>
> > >>
> > >> SplitEnumeratorContext#metricGroup is an interface and this FLIP does
> > not
> > >> seem to change its semantics/behavior. The FLIP does change the
> > >> implementation/behavior of SourceCoordinatorContext#metricGroup, which
> > is
> > >> marked @Internal.
> > >>
> > >> Thus it might seem a bit weird to add in the public interface section
> > >> saying that we change the interface SplitEnumeratorContext#metricGroup
> > from
> > >> returning null to non-null object.
> > >>
> > >> 2. Mention the newly added interface of
> > "OperatorCoordinator#metricGroup"
> > >>> in the "Proposed Changes" section or "Public Interface" section. As
> the
> > >>> FLIP said, OperatorCoordinator is widely used in many connectors.
> > Though
> > >> it
> > >>> is still an @Internal API, I think it is worth mentioning the change
> in
> > >> the
> > >>> FLIP.
> > >>
> > >>
> > >> Since OperatorCoordinator is an internal API, it seems reasonable to
> > >> explain it in the proposed change section. The FLIP seems to have
> > >> documented this in point 5 of the proposed change section.
> > >>
> > >> BTW, if we think there are @internal classes that are important enough
> > to
> > >> be added in the public interface section, it might be useful to
> > explicitly
> > >> discuss this topic and document it in the "*What are the "public
> > >> interfaces" of the project*" in this
> > >> <
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > >>>
> > >> wiki.
> > >>
> > >>
> > >>> Best,
> > >>> Jark
> > >>>
> > >>>
> > >>> On Mon, 26 Dec 2022 at 18:06, Hang Ruan <ruanhang1...@gmail.com>
> > wrote:
> > >>>
> > >>>> Hi, thanks for the feedback, Zhu Zhu and Qingsheng.
> > >>>> After combining everyone's comments, the main concerns and
> > >> corresponding
> > >>>> adjustments are as follows.
> > >>>>
> > >>>> Q1: Common metrics are not quite useful.
> > >>>> numEventsIn and numEventsOut counters will be removed from the
> > >>>> OperatorCoordinatorMetricGroup. These common metrics do not provide
> > >>> enough
> > >>>> information for users. The users are more willing to get the number
> of
> > >>>> events of the specified type instead of the total number. And this
> > >> metric
> > >>>> is calculated differently. The implementation could register the
> > metric
> > >>> by
> > >>>> themselves.
> > >>>>
> > >>>> Q2: This FLIP is overly complicated.
> > >>>> This FLIP will become concise after these modifications.
> > >>>> OperatorCoordinatorMetricGroup has already been introduced into
> Flink
> > >> by
> > >>>> FLIP-179<
> > >>>>
> > >>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
> > >>>>> .
> > >>>> And
> > >>>> this FLIP will not change it. This FLIP only provides a new metric
> > >> option
> > >>>> and a new metric group scope. The changes in proposed changes
> provide
> > >> the
> > >>>> details about the modifications for the internal classes, which
> might
> > >>> make
> > >>>> it look complicated.
> > >>>>
> > >>>> Thanks for all the comments again. If there are no further comments,
> > we
> > >>>> plan to start the voting thread this week.
> > >>>>
> > >>>> Best,
> > >>>> Hang
> > >>>>
> > >>>> Qingsheng Ren <renqs...@gmail.com> 于2022年12月26日周一 16:48写道:
> > >>>>
> > >>>>> Thanks for the FLIP, Hang!
> > >>>>>
> > >>>>> This FLIP overall looks good to me. Actually I share the same
> concern
> > >>>> with
> > >>>>> Zhu that numEventsIn and numEventsOut counters are not quite useful
> > >> to
> > >>>> end
> > >>>>> users. OperatorEvent is a quite low-level abstraction, which
> requires
> > >>>>> instantialization in order to be practical to users and developers,
> > >> so
> > >>>>> maybe it's better to exclude them from the FLIP.
> > >>>>>
> > >>>>> Best,
> > >>>>> Qingsheng
> > >>>>> Ververica (Alibaba)
> > >>>>>
> > >>>>> On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu <reed...@gmail.com>
> wrote:
> > >>>>>
> > >>>>>> Hi Hang,
> > >>>>>> I still see no strong reason why we need numEventsIn/numEventsOut
> > >>>>> metrics.
> > >>>>>> In the discussion in FLINK-29801, I can see the same concern from
> > >>>> others.
> > >>>>>> So I prefer to exclude them from this FLIP to avoid over-extending
> > >>> the
> > >>>>>> scope.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Zhu
> > >>>>>>
> > >>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月23日周五 15:21写道:
> > >>>>>>>
> > >>>>>>> Hi everyone,
> > >>>>>>>
> > >>>>>>> As for the Zhu Zhu's problem, I think we should keep the common
> > >>>>> metrics,
> > >>>>>> which will help to observe incoming and outgoing events. What do
> > >> you
> > >>>>> think,
> > >>>>>> @Zhu Zhu ?
> > >>>>>>> And @Chesnay, are there any other issues you are more concerned
> > >>>> about?
> > >>>>>> Looking forward to your reply.
> > >>>>>>>
> > >>>>>>> Thanks for all the comments. If there are no further comments, we
> > >>>> plan
> > >>>>>> to start the voting thread next week.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Hang
> > >>>>>>>
> > >>>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月15日周四 16:49写道:
> > >>>>>>>>
> > >>>>>>>> Hi, Zhu Zhu,
> > >>>>>>>>
> > >>>>>>>> Thanks for your feedback!
> > >>>>>>>>
> > >>>>>>>> The OperatorCoordinator implementations are different. And their
> > >>>>>> metrics are much different too. We try to find the common metrics
> > >> and
> > >>>> put
> > >>>>>> them in the OperatorCoordinatorMetricGroup. If most developers
> > >> think
> > >>> we
> > >>>>> do
> > >>>>>> not need these common metrics, removing the common metrics is also
> > >>>>>> acceptable.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Hang
> > >>>>>>>>
> > >>>>>>>> Zhu Zhu <reed...@gmail.com> 于2022年12月14日周三 22:09写道:
> > >>>>>>>>>
> > >>>>>>>>> Hi Hang & MengYue,
> > >>>>>>>>>
> > >>>>>>>>> Thanks for creating this FLIP!
> > >>>>>>>>>
> > >>>>>>>>> I think it is very useful, mainly in two aspects:
> > >>>>>>>>> 1. Enables OperatorCoordinators to register metrics. Currently
> > >>>>>>>>> the coordinators has no way to do this. And operator
> > >> coordinator
> > >>>>>>>>> metric group further enables the SplitEnumerator to have access
> > >>>>>>>>> to a registered metric group (via the existing public interface
> > >>>>>>>>> SplitEnumeratorContext#metricGroup()), which is null at the
> > >>> moment.
> > >>>>>>>>>
> > >>>>>>>>> 2. Defines the scope of operator coordinator metrics. A clear
> > >>>>>> definition
> > >>>>>>>>> makes it easy for users to find their wanted metrics. The
> > >>>> definition
> > >>>>>>>>> also helps to avoid conflicts of metrics from multiple
> > >>>>>> OperatorCoordinators
> > >>>>>>>>> of the same kind. E.g. each SourceCoordinator may have its own
> > >>>>>>>>> numSourceSplits metric, these metrics should not be directly
> > >>>>> registered
> > >>>>>>>>> to the job metric group.
> > >>>>>>>>>
> > >>>>>>>>> What I'm a bit concerned is the necessity of the introduced
> > >>> common
> > >>>>>> metrics
> > >>>>>>>>> numEventsInCounter & numEventsOutCounter. If there any case
> > >> which
> > >>>>>> strongly
> > >>>>>>>>> requires them?
> > >>>>>>>>>
> > >>>>>>>>> Regarding the concerns of Chesnay,
> > >>>>>>>>>> A dedicated coordinator MG implementation is overkill
> > >>>>>>>>> Directly using the job metric group can result in metric
> > >>> conflicts,
> > >>>>> as
> > >>>>>> mentioned
> > >>>>>>>>> in above #2.
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Zhu
> > >>>>>>>>>
> > >>>>>>>>> Dong Lin <lindon...@gmail.com> 于2022年12月10日周六 14:16写道:
> > >>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Hi Chesney,
> > >>>>>>>>>>
> > >>>>>>>>>> Just to double check with you, OperatorCoordinatorMetricGroup
> > >>>>>> (annotated as
> > >>>>>>>>>> @PublicEvolving) has already been introduced into Flink by
> > >>>> FLIP-179
> > >>>>>>>>>> <
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
> > >>>>>>> .
> > >>>>>>>>>> And that FLIP has got you +1.. Do you mean we should remove
> > >>> this
> > >>>>>>>>>> OperatorCoordinatorMetricGroup?
> > >>>>>>>>>>
> > >>>>>>>>>> Regards,
> > >>>>>>>>>> Dong
> > >>>>>>>>>>
> > >>>>>>>>>> On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler <
> > >>>>> ches...@apache.org>
> > >>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> As a whole I feel like this FLIP is overly complicated. A
> > >>>>> dedicated
> > >>>>>>>>>>> coordinator MG implementation is overkill; it could just
> > >>> re-use
> > >>>>> the
> > >>>>>>>>>>> existing Task/OperatorMGs to create the same structure we
> > >>> have
> > >>>> on
> > >>>>>> TMs,
> > >>>>>>>>>>> similar to what we did with the Job MG.
> > >>>>>>>>>>>
> > >>>>>>>>>>> However, I'm not convinced that this is required anyway,
> > >>>> because
> > >>>>>> all the
> > >>>>>>>>>>> example metrics you listed can be implemented on the TM
> > >> side
> > >>> +
> > >>>>>>>>>>> aggregating them in the external metrics backend.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Since I'm on holidays soon, just so no one tries to pull a
> > >>> fast
> > >>>>>> one on
> > >>>>>>>>>>> me, if this were to go to a vote as-is I'd be against it.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On 09/12/2022 15:30, Dong Lin wrote:
> > >>>>>>>>>>>> Hi Hang,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thanks for the FLIP! The FLIP looks good and it is pretty
> > >>>>>> informative.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I have just two minor comments regarding names:
> > >>>>>>>>>>>> - Would it be useful to rename the config key as
> > >>>>>>>>>>>> *metrics.scope.jm.job.operator-coordinator* for
> > >> consistency
> > >>>>> with
> > >>>>>>>>>>>> *metrics.scope.jm.job
> > >>>>>>>>>>>> *(which is not named as *jm-job)?
> > >>>>>>>>>>>> - Maybe rename the variable as
> > >>>>> SCOPE_NAMING_OPERATOR_COORDINATOR
> > >>>>>> for
> > >>>>>>>>>>>> simplicity and consistency with SCOPE_NAMING_OPERATOR
> > >>> (which
> > >>>> is
> > >>>>>> not named
> > >>>>>>>>>>>> as SCOPE_NAMING_TM_JOB_OPERATOR)?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>> Dong
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan <
> > >>>>> ruanhang1...@gmail.com>
> > >>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi all,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> MengYue and I created FLIP-274[1] Introduce metric group
> > >>> for
> > >>>>>>>>>>>>> OperatorCoordinator. OperatorCoordinator is the
> > >>> coordinator
> > >>>>> for
> > >>>>>> runtime
> > >>>>>>>>>>>>> operators and running on Job Manager. The coordination
> > >>>>>> mechanism is
> > >>>>>>>>>>>>> operator events between OperatorCoordinator and its all
> > >>>>>> operators, the
> > >>>>>>>>>>>>> coordination is more and more using in Flink, for
> > >> example
> > >>>> many
> > >>>>>> Sources
> > >>>>>>>>>>> and
> > >>>>>>>>>>>>> Sinks depend on the mechanism to assign splits and
> > >>>> coordinate
> > >>>>>> commits to
> > >>>>>>>>>>>>> external systems. The OperatorCoordinator is widely
> > >> using
> > >>> in
> > >>>>>> flink kafka
> > >>>>>>>>>>>>> connector, flink pulsar connector, flink cdc connector,
> > >>>> flink
> > >>>>>> hudi
> > >>>>>>>>>>>>> connector and so on.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> But there is not a suitable metric group scope for the
> > >>>>>>>>>>> OperatorCoordinator
> > >>>>>>>>>>>>> and not an implementation for the interface
> > >>>>>>>>>>> OperatorCoordinatorMetricGroup.
> > >>>>>>>>>>>>> These metrics in OperatorCoordinator could be how many
> > >>>>>> splits/partitions
> > >>>>>>>>>>>>> have been assigned to source readers, how many files
> > >> have
> > >>>> been
> > >>>>>> written
> > >>>>>>>>>>> out
> > >>>>>>>>>>>>> by sink writers, these metrics not only help users to
> > >> know
> > >>>> the
> > >>>>>> job
> > >>>>>>>>>>> progress
> > >>>>>>>>>>>>> but also make big job maintaining easier. Thus we
> > >> propose
> > >>>> the
> > >>>>>> FLIP-274
> > >>>>>>>>>>> to
> > >>>>>>>>>>>>> introduce a new metric group scope for
> > >> OperatorCoordinator
> > >>>> and
> > >>>>>> provide
> > >>>>>>>>>>> an
> > >>>>>>>>>>>>> internal implementation for
> > >>> OperatorCoordinatorMetricGroup.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Could you help review this FLIP when you get time? Any
> > >>>>> feedback
> > >>>>>> is
> > >>>>>>>>>>>>> appreciated!
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Best,
> > >>>>>>>>>>>>> Hang
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> [1]
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> >
>

Reply via email to