Hi, @ches...@apache.org <ches...@apache.org> , Do you have time to help to review this FLIP again after the modification? Looking forward to your reply. This FLIP will add a new configuration for the OperatorCoordinatorMetricGroup scope format. It provides an internal implementation and is added as a component to the JobManagerJobMetricGroup. If something doesn't make sense, could you provide some advice? It will be very helpful. Thanks a lot for your help.
Best, Hang Martijn Visser <martijnvis...@apache.org> 于2023年1月11日周三 16:34写道: > Hi Hang, > > I'm a bit surprised that this has gone to a vote, given that Chesnay > deliberately mentioned that he would vote against it as-is. I would expect > that before going to a vote, he has had the opportunity to participate in > this discussion. > > Best regards, > > Martijn > > On Tue, Jan 3, 2023 at 12:53 PM Jark Wu <imj...@gmail.com> wrote: > > > Hi Dong, > > > > Regarding “SplitEnumeratorContext#metricGroup”, my only concern is that > > this is a core interface for the FLIP. It’s hard to tell how sources use > > metric group without mentioning this interface. Even if this is an > existing > > API, I think it’s worth introducing the interface again and declaring > that > > we will implement the interface instead of a no-op method in this FLIP. > > > > Anyway, this is a minor problem and shouldn’t block this FLIP. I’m +1 to > > start a vote. > > > > Best, > > Jark > > > > > > > 2023年1月3日 10:03,Hang Ruan <ruanhang1...@gmail.com> 写道: > > > > > > Hi, Jark and Dong, > > > > > > Thanks for your comments. Sorry for my late reply. > > > > > > For suggestion 1, I plan to implement the SplitEnumeratorMetricGroup in > > > another issue, and it is not contained in this FLIP. I will add some > > > description about this part. > > > For suggestion 2, changes about OperatorCoordinator#metricGroup has > > already > > > been documented in the proposed change section. > > > > > > Best, > > > Hang > > > > > > Dong Lin <lindon...@gmail.com> 于2023年1月1日周日 09:45写道: > > > > > >> Let me chime-in and add comments regarding the public interface > section. > > >> Please see my comments inline. > > >> > > >> On Thu, Dec 29, 2022 at 6:08 PM Jark Wu <imj...@gmail.com> wrote: > > >> > > >>> Hi Hang, > > >>> > > >>> Thanks for driving this discussion. I think this is a very useful > > feature > > >>> for connectors. > > >>> > > >>> The FLIP looks quite good to me, and I just have two suggestions. > > >>> > > >>> 1. In the "Public Interface" section, mention that the implementation > > >>> behavior of "SplitEnumeratorContext#metricGroup" is changed from > > >> returning > > >>> null to returning a concrete SplitEnumeratorMetricGroup instance. > Even > > >>> though the API is already there, the behavior change can also be > > >> considered > > >>> a public change. > > >>> > > >> > > >> SplitEnumeratorContext#metricGroup is an interface and this FLIP does > > not > > >> seem to change its semantics/behavior. The FLIP does change the > > >> implementation/behavior of SourceCoordinatorContext#metricGroup, which > > is > > >> marked @Internal. > > >> > > >> Thus it might seem a bit weird to add in the public interface section > > >> saying that we change the interface SplitEnumeratorContext#metricGroup > > from > > >> returning null to non-null object. > > >> > > >> 2. Mention the newly added interface of > > "OperatorCoordinator#metricGroup" > > >>> in the "Proposed Changes" section or "Public Interface" section. As > the > > >>> FLIP said, OperatorCoordinator is widely used in many connectors. > > Though > > >> it > > >>> is still an @Internal API, I think it is worth mentioning the change > in > > >> the > > >>> FLIP. > > >> > > >> > > >> Since OperatorCoordinator is an internal API, it seems reasonable to > > >> explain it in the proposed change section. The FLIP seems to have > > >> documented this in point 5 of the proposed change section. > > >> > > >> BTW, if we think there are @internal classes that are important enough > > to > > >> be added in the public interface section, it might be useful to > > explicitly > > >> discuss this topic and document it in the "*What are the "public > > >> interfaces" of the project*" in this > > >> < > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals > > >>> > > >> wiki. > > >> > > >> > > >>> Best, > > >>> Jark > > >>> > > >>> > > >>> On Mon, 26 Dec 2022 at 18:06, Hang Ruan <ruanhang1...@gmail.com> > > wrote: > > >>> > > >>>> Hi, thanks for the feedback, Zhu Zhu and Qingsheng. > > >>>> After combining everyone's comments, the main concerns and > > >> corresponding > > >>>> adjustments are as follows. > > >>>> > > >>>> Q1: Common metrics are not quite useful. > > >>>> numEventsIn and numEventsOut counters will be removed from the > > >>>> OperatorCoordinatorMetricGroup. These common metrics do not provide > > >>> enough > > >>>> information for users. The users are more willing to get the number > of > > >>>> events of the specified type instead of the total number. And this > > >> metric > > >>>> is calculated differently. The implementation could register the > > metric > > >>> by > > >>>> themselves. > > >>>> > > >>>> Q2: This FLIP is overly complicated. > > >>>> This FLIP will become concise after these modifications. > > >>>> OperatorCoordinatorMetricGroup has already been introduced into > Flink > > >> by > > >>>> FLIP-179< > > >>>> > > >>>> > > >>> > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics > > >>>>> . > > >>>> And > > >>>> this FLIP will not change it. This FLIP only provides a new metric > > >> option > > >>>> and a new metric group scope. The changes in proposed changes > provide > > >> the > > >>>> details about the modifications for the internal classes, which > might > > >>> make > > >>>> it look complicated. > > >>>> > > >>>> Thanks for all the comments again. If there are no further comments, > > we > > >>>> plan to start the voting thread this week. > > >>>> > > >>>> Best, > > >>>> Hang > > >>>> > > >>>> Qingsheng Ren <renqs...@gmail.com> 于2022年12月26日周一 16:48写道: > > >>>> > > >>>>> Thanks for the FLIP, Hang! > > >>>>> > > >>>>> This FLIP overall looks good to me. Actually I share the same > concern > > >>>> with > > >>>>> Zhu that numEventsIn and numEventsOut counters are not quite useful > > >> to > > >>>> end > > >>>>> users. OperatorEvent is a quite low-level abstraction, which > requires > > >>>>> instantialization in order to be practical to users and developers, > > >> so > > >>>>> maybe it's better to exclude them from the FLIP. > > >>>>> > > >>>>> Best, > > >>>>> Qingsheng > > >>>>> Ververica (Alibaba) > > >>>>> > > >>>>> On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu <reed...@gmail.com> > wrote: > > >>>>> > > >>>>>> Hi Hang, > > >>>>>> I still see no strong reason why we need numEventsIn/numEventsOut > > >>>>> metrics. > > >>>>>> In the discussion in FLINK-29801, I can see the same concern from > > >>>> others. > > >>>>>> So I prefer to exclude them from this FLIP to avoid over-extending > > >>> the > > >>>>>> scope. > > >>>>>> > > >>>>>> Thanks, > > >>>>>> Zhu > > >>>>>> > > >>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月23日周五 15:21写道: > > >>>>>>> > > >>>>>>> Hi everyone, > > >>>>>>> > > >>>>>>> As for the Zhu Zhu's problem, I think we should keep the common > > >>>>> metrics, > > >>>>>> which will help to observe incoming and outgoing events. What do > > >> you > > >>>>> think, > > >>>>>> @Zhu Zhu ? > > >>>>>>> And @Chesnay, are there any other issues you are more concerned > > >>>> about? > > >>>>>> Looking forward to your reply. > > >>>>>>> > > >>>>>>> Thanks for all the comments. If there are no further comments, we > > >>>> plan > > >>>>>> to start the voting thread next week. > > >>>>>>> > > >>>>>>> Best, > > >>>>>>> Hang > > >>>>>>> > > >>>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月15日周四 16:49写道: > > >>>>>>>> > > >>>>>>>> Hi, Zhu Zhu, > > >>>>>>>> > > >>>>>>>> Thanks for your feedback! > > >>>>>>>> > > >>>>>>>> The OperatorCoordinator implementations are different. And their > > >>>>>> metrics are much different too. We try to find the common metrics > > >> and > > >>>> put > > >>>>>> them in the OperatorCoordinatorMetricGroup. If most developers > > >> think > > >>> we > > >>>>> do > > >>>>>> not need these common metrics, removing the common metrics is also > > >>>>>> acceptable. > > >>>>>>>> > > >>>>>>>> Best, > > >>>>>>>> Hang > > >>>>>>>> > > >>>>>>>> Zhu Zhu <reed...@gmail.com> 于2022年12月14日周三 22:09写道: > > >>>>>>>>> > > >>>>>>>>> Hi Hang & MengYue, > > >>>>>>>>> > > >>>>>>>>> Thanks for creating this FLIP! > > >>>>>>>>> > > >>>>>>>>> I think it is very useful, mainly in two aspects: > > >>>>>>>>> 1. Enables OperatorCoordinators to register metrics. Currently > > >>>>>>>>> the coordinators has no way to do this. And operator > > >> coordinator > > >>>>>>>>> metric group further enables the SplitEnumerator to have access > > >>>>>>>>> to a registered metric group (via the existing public interface > > >>>>>>>>> SplitEnumeratorContext#metricGroup()), which is null at the > > >>> moment. > > >>>>>>>>> > > >>>>>>>>> 2. Defines the scope of operator coordinator metrics. A clear > > >>>>>> definition > > >>>>>>>>> makes it easy for users to find their wanted metrics. The > > >>>> definition > > >>>>>>>>> also helps to avoid conflicts of metrics from multiple > > >>>>>> OperatorCoordinators > > >>>>>>>>> of the same kind. E.g. each SourceCoordinator may have its own > > >>>>>>>>> numSourceSplits metric, these metrics should not be directly > > >>>>> registered > > >>>>>>>>> to the job metric group. > > >>>>>>>>> > > >>>>>>>>> What I'm a bit concerned is the necessity of the introduced > > >>> common > > >>>>>> metrics > > >>>>>>>>> numEventsInCounter & numEventsOutCounter. If there any case > > >> which > > >>>>>> strongly > > >>>>>>>>> requires them? > > >>>>>>>>> > > >>>>>>>>> Regarding the concerns of Chesnay, > > >>>>>>>>>> A dedicated coordinator MG implementation is overkill > > >>>>>>>>> Directly using the job metric group can result in metric > > >>> conflicts, > > >>>>> as > > >>>>>> mentioned > > >>>>>>>>> in above #2. > > >>>>>>>>> > > >>>>>>>>> Thanks, > > >>>>>>>>> Zhu > > >>>>>>>>> > > >>>>>>>>> Dong Lin <lindon...@gmail.com> 于2022年12月10日周六 14:16写道: > > >>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> Hi Chesney, > > >>>>>>>>>> > > >>>>>>>>>> Just to double check with you, OperatorCoordinatorMetricGroup > > >>>>>> (annotated as > > >>>>>>>>>> @PublicEvolving) has already been introduced into Flink by > > >>>> FLIP-179 > > >>>>>>>>>> < > > >>>>>> > > >>>>> > > >>>> > > >>> > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics > > >>>>>>> . > > >>>>>>>>>> And that FLIP has got you +1.. Do you mean we should remove > > >>> this > > >>>>>>>>>> OperatorCoordinatorMetricGroup? > > >>>>>>>>>> > > >>>>>>>>>> Regards, > > >>>>>>>>>> Dong > > >>>>>>>>>> > > >>>>>>>>>> On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler < > > >>>>> ches...@apache.org> > > >>>>>> wrote: > > >>>>>>>>>> > > >>>>>>>>>>> As a whole I feel like this FLIP is overly complicated. A > > >>>>> dedicated > > >>>>>>>>>>> coordinator MG implementation is overkill; it could just > > >>> re-use > > >>>>> the > > >>>>>>>>>>> existing Task/OperatorMGs to create the same structure we > > >>> have > > >>>> on > > >>>>>> TMs, > > >>>>>>>>>>> similar to what we did with the Job MG. > > >>>>>>>>>>> > > >>>>>>>>>>> However, I'm not convinced that this is required anyway, > > >>>> because > > >>>>>> all the > > >>>>>>>>>>> example metrics you listed can be implemented on the TM > > >> side > > >>> + > > >>>>>>>>>>> aggregating them in the external metrics backend. > > >>>>>>>>>>> > > >>>>>>>>>>> Since I'm on holidays soon, just so no one tries to pull a > > >>> fast > > >>>>>> one on > > >>>>>>>>>>> me, if this were to go to a vote as-is I'd be against it. > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> On 09/12/2022 15:30, Dong Lin wrote: > > >>>>>>>>>>>> Hi Hang, > > >>>>>>>>>>>> > > >>>>>>>>>>>> Thanks for the FLIP! The FLIP looks good and it is pretty > > >>>>>> informative. > > >>>>>>>>>>>> > > >>>>>>>>>>>> I have just two minor comments regarding names: > > >>>>>>>>>>>> - Would it be useful to rename the config key as > > >>>>>>>>>>>> *metrics.scope.jm.job.operator-coordinator* for > > >> consistency > > >>>>> with > > >>>>>>>>>>>> *metrics.scope.jm.job > > >>>>>>>>>>>> *(which is not named as *jm-job)? > > >>>>>>>>>>>> - Maybe rename the variable as > > >>>>> SCOPE_NAMING_OPERATOR_COORDINATOR > > >>>>>> for > > >>>>>>>>>>>> simplicity and consistency with SCOPE_NAMING_OPERATOR > > >>> (which > > >>>> is > > >>>>>> not named > > >>>>>>>>>>>> as SCOPE_NAMING_TM_JOB_OPERATOR)? > > >>>>>>>>>>>> > > >>>>>>>>>>>> Cheers, > > >>>>>>>>>>>> Dong > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan < > > >>>>> ruanhang1...@gmail.com> > > >>>>>> wrote: > > >>>>>>>>>>>> > > >>>>>>>>>>>>> Hi all, > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> MengYue and I created FLIP-274[1] Introduce metric group > > >>> for > > >>>>>>>>>>>>> OperatorCoordinator. OperatorCoordinator is the > > >>> coordinator > > >>>>> for > > >>>>>> runtime > > >>>>>>>>>>>>> operators and running on Job Manager. The coordination > > >>>>>> mechanism is > > >>>>>>>>>>>>> operator events between OperatorCoordinator and its all > > >>>>>> operators, the > > >>>>>>>>>>>>> coordination is more and more using in Flink, for > > >> example > > >>>> many > > >>>>>> Sources > > >>>>>>>>>>> and > > >>>>>>>>>>>>> Sinks depend on the mechanism to assign splits and > > >>>> coordinate > > >>>>>> commits to > > >>>>>>>>>>>>> external systems. The OperatorCoordinator is widely > > >> using > > >>> in > > >>>>>> flink kafka > > >>>>>>>>>>>>> connector, flink pulsar connector, flink cdc connector, > > >>>> flink > > >>>>>> hudi > > >>>>>>>>>>>>> connector and so on. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> But there is not a suitable metric group scope for the > > >>>>>>>>>>> OperatorCoordinator > > >>>>>>>>>>>>> and not an implementation for the interface > > >>>>>>>>>>> OperatorCoordinatorMetricGroup. > > >>>>>>>>>>>>> These metrics in OperatorCoordinator could be how many > > >>>>>> splits/partitions > > >>>>>>>>>>>>> have been assigned to source readers, how many files > > >> have > > >>>> been > > >>>>>> written > > >>>>>>>>>>> out > > >>>>>>>>>>>>> by sink writers, these metrics not only help users to > > >> know > > >>>> the > > >>>>>> job > > >>>>>>>>>>> progress > > >>>>>>>>>>>>> but also make big job maintaining easier. Thus we > > >> propose > > >>>> the > > >>>>>> FLIP-274 > > >>>>>>>>>>> to > > >>>>>>>>>>>>> introduce a new metric group scope for > > >> OperatorCoordinator > > >>>> and > > >>>>>> provide > > >>>>>>>>>>> an > > >>>>>>>>>>>>> internal implementation for > > >>> OperatorCoordinatorMetricGroup. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Could you help review this FLIP when you get time? Any > > >>>>> feedback > > >>>>>> is > > >>>>>>>>>>>>> appreciated! > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Best, > > >>>>>>>>>>>>> Hang > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> [1] > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>> > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator > > >>>>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>> > > >> > > > > >