Hi Dong, Regarding “SplitEnumeratorContext#metricGroup”, my only concern is that this is a core interface for the FLIP. It’s hard to tell how sources use metric group without mentioning this interface. Even if this is an existing API, I think it’s worth introducing the interface again and declaring that we will implement the interface instead of a no-op method in this FLIP.
Anyway, this is a minor problem and shouldn’t block this FLIP. I’m +1 to start a vote. Best, Jark > 2023年1月3日 10:03,Hang Ruan <ruanhang1...@gmail.com> 写道: > > Hi, Jark and Dong, > > Thanks for your comments. Sorry for my late reply. > > For suggestion 1, I plan to implement the SplitEnumeratorMetricGroup in > another issue, and it is not contained in this FLIP. I will add some > description about this part. > For suggestion 2, changes about OperatorCoordinator#metricGroup has already > been documented in the proposed change section. > > Best, > Hang > > Dong Lin <lindon...@gmail.com> 于2023年1月1日周日 09:45写道: > >> Let me chime-in and add comments regarding the public interface section. >> Please see my comments inline. >> >> On Thu, Dec 29, 2022 at 6:08 PM Jark Wu <imj...@gmail.com> wrote: >> >>> Hi Hang, >>> >>> Thanks for driving this discussion. I think this is a very useful feature >>> for connectors. >>> >>> The FLIP looks quite good to me, and I just have two suggestions. >>> >>> 1. In the "Public Interface" section, mention that the implementation >>> behavior of "SplitEnumeratorContext#metricGroup" is changed from >> returning >>> null to returning a concrete SplitEnumeratorMetricGroup instance. Even >>> though the API is already there, the behavior change can also be >> considered >>> a public change. >>> >> >> SplitEnumeratorContext#metricGroup is an interface and this FLIP does not >> seem to change its semantics/behavior. The FLIP does change the >> implementation/behavior of SourceCoordinatorContext#metricGroup, which is >> marked @Internal. >> >> Thus it might seem a bit weird to add in the public interface section >> saying that we change the interface SplitEnumeratorContext#metricGroup from >> returning null to non-null object. >> >> 2. Mention the newly added interface of "OperatorCoordinator#metricGroup" >>> in the "Proposed Changes" section or "Public Interface" section. As the >>> FLIP said, OperatorCoordinator is widely used in many connectors. Though >> it >>> is still an @Internal API, I think it is worth mentioning the change in >> the >>> FLIP. >> >> >> Since OperatorCoordinator is an internal API, it seems reasonable to >> explain it in the proposed change section. The FLIP seems to have >> documented this in point 5 of the proposed change section. >> >> BTW, if we think there are @internal classes that are important enough to >> be added in the public interface section, it might be useful to explicitly >> discuss this topic and document it in the "*What are the "public >> interfaces" of the project*" in this >> < >> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals >>> >> wiki. >> >> >>> Best, >>> Jark >>> >>> >>> On Mon, 26 Dec 2022 at 18:06, Hang Ruan <ruanhang1...@gmail.com> wrote: >>> >>>> Hi, thanks for the feedback, Zhu Zhu and Qingsheng. >>>> After combining everyone's comments, the main concerns and >> corresponding >>>> adjustments are as follows. >>>> >>>> Q1: Common metrics are not quite useful. >>>> numEventsIn and numEventsOut counters will be removed from the >>>> OperatorCoordinatorMetricGroup. These common metrics do not provide >>> enough >>>> information for users. The users are more willing to get the number of >>>> events of the specified type instead of the total number. And this >> metric >>>> is calculated differently. The implementation could register the metric >>> by >>>> themselves. >>>> >>>> Q2: This FLIP is overly complicated. >>>> This FLIP will become concise after these modifications. >>>> OperatorCoordinatorMetricGroup has already been introduced into Flink >> by >>>> FLIP-179< >>>> >>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics >>>>> . >>>> And >>>> this FLIP will not change it. This FLIP only provides a new metric >> option >>>> and a new metric group scope. The changes in proposed changes provide >> the >>>> details about the modifications for the internal classes, which might >>> make >>>> it look complicated. >>>> >>>> Thanks for all the comments again. If there are no further comments, we >>>> plan to start the voting thread this week. >>>> >>>> Best, >>>> Hang >>>> >>>> Qingsheng Ren <renqs...@gmail.com> 于2022年12月26日周一 16:48写道: >>>> >>>>> Thanks for the FLIP, Hang! >>>>> >>>>> This FLIP overall looks good to me. Actually I share the same concern >>>> with >>>>> Zhu that numEventsIn and numEventsOut counters are not quite useful >> to >>>> end >>>>> users. OperatorEvent is a quite low-level abstraction, which requires >>>>> instantialization in order to be practical to users and developers, >> so >>>>> maybe it's better to exclude them from the FLIP. >>>>> >>>>> Best, >>>>> Qingsheng >>>>> Ververica (Alibaba) >>>>> >>>>> On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu <reed...@gmail.com> wrote: >>>>> >>>>>> Hi Hang, >>>>>> I still see no strong reason why we need numEventsIn/numEventsOut >>>>> metrics. >>>>>> In the discussion in FLINK-29801, I can see the same concern from >>>> others. >>>>>> So I prefer to exclude them from this FLIP to avoid over-extending >>> the >>>>>> scope. >>>>>> >>>>>> Thanks, >>>>>> Zhu >>>>>> >>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月23日周五 15:21写道: >>>>>>> >>>>>>> Hi everyone, >>>>>>> >>>>>>> As for the Zhu Zhu's problem, I think we should keep the common >>>>> metrics, >>>>>> which will help to observe incoming and outgoing events. What do >> you >>>>> think, >>>>>> @Zhu Zhu ? >>>>>>> And @Chesnay, are there any other issues you are more concerned >>>> about? >>>>>> Looking forward to your reply. >>>>>>> >>>>>>> Thanks for all the comments. If there are no further comments, we >>>> plan >>>>>> to start the voting thread next week. >>>>>>> >>>>>>> Best, >>>>>>> Hang >>>>>>> >>>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月15日周四 16:49写道: >>>>>>>> >>>>>>>> Hi, Zhu Zhu, >>>>>>>> >>>>>>>> Thanks for your feedback! >>>>>>>> >>>>>>>> The OperatorCoordinator implementations are different. And their >>>>>> metrics are much different too. We try to find the common metrics >> and >>>> put >>>>>> them in the OperatorCoordinatorMetricGroup. If most developers >> think >>> we >>>>> do >>>>>> not need these common metrics, removing the common metrics is also >>>>>> acceptable. >>>>>>>> >>>>>>>> Best, >>>>>>>> Hang >>>>>>>> >>>>>>>> Zhu Zhu <reed...@gmail.com> 于2022年12月14日周三 22:09写道: >>>>>>>>> >>>>>>>>> Hi Hang & MengYue, >>>>>>>>> >>>>>>>>> Thanks for creating this FLIP! >>>>>>>>> >>>>>>>>> I think it is very useful, mainly in two aspects: >>>>>>>>> 1. Enables OperatorCoordinators to register metrics. Currently >>>>>>>>> the coordinators has no way to do this. And operator >> coordinator >>>>>>>>> metric group further enables the SplitEnumerator to have access >>>>>>>>> to a registered metric group (via the existing public interface >>>>>>>>> SplitEnumeratorContext#metricGroup()), which is null at the >>> moment. >>>>>>>>> >>>>>>>>> 2. Defines the scope of operator coordinator metrics. A clear >>>>>> definition >>>>>>>>> makes it easy for users to find their wanted metrics. The >>>> definition >>>>>>>>> also helps to avoid conflicts of metrics from multiple >>>>>> OperatorCoordinators >>>>>>>>> of the same kind. E.g. each SourceCoordinator may have its own >>>>>>>>> numSourceSplits metric, these metrics should not be directly >>>>> registered >>>>>>>>> to the job metric group. >>>>>>>>> >>>>>>>>> What I'm a bit concerned is the necessity of the introduced >>> common >>>>>> metrics >>>>>>>>> numEventsInCounter & numEventsOutCounter. If there any case >> which >>>>>> strongly >>>>>>>>> requires them? >>>>>>>>> >>>>>>>>> Regarding the concerns of Chesnay, >>>>>>>>>> A dedicated coordinator MG implementation is overkill >>>>>>>>> Directly using the job metric group can result in metric >>> conflicts, >>>>> as >>>>>> mentioned >>>>>>>>> in above #2. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Zhu >>>>>>>>> >>>>>>>>> Dong Lin <lindon...@gmail.com> 于2022年12月10日周六 14:16写道: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Chesney, >>>>>>>>>> >>>>>>>>>> Just to double check with you, OperatorCoordinatorMetricGroup >>>>>> (annotated as >>>>>>>>>> @PublicEvolving) has already been introduced into Flink by >>>> FLIP-179 >>>>>>>>>> < >>>>>> >>>>> >>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics >>>>>>> . >>>>>>>>>> And that FLIP has got you +1.. Do you mean we should remove >>> this >>>>>>>>>> OperatorCoordinatorMetricGroup? >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Dong >>>>>>>>>> >>>>>>>>>> On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler < >>>>> ches...@apache.org> >>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> As a whole I feel like this FLIP is overly complicated. A >>>>> dedicated >>>>>>>>>>> coordinator MG implementation is overkill; it could just >>> re-use >>>>> the >>>>>>>>>>> existing Task/OperatorMGs to create the same structure we >>> have >>>> on >>>>>> TMs, >>>>>>>>>>> similar to what we did with the Job MG. >>>>>>>>>>> >>>>>>>>>>> However, I'm not convinced that this is required anyway, >>>> because >>>>>> all the >>>>>>>>>>> example metrics you listed can be implemented on the TM >> side >>> + >>>>>>>>>>> aggregating them in the external metrics backend. >>>>>>>>>>> >>>>>>>>>>> Since I'm on holidays soon, just so no one tries to pull a >>> fast >>>>>> one on >>>>>>>>>>> me, if this were to go to a vote as-is I'd be against it. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 09/12/2022 15:30, Dong Lin wrote: >>>>>>>>>>>> Hi Hang, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for the FLIP! The FLIP looks good and it is pretty >>>>>> informative. >>>>>>>>>>>> >>>>>>>>>>>> I have just two minor comments regarding names: >>>>>>>>>>>> - Would it be useful to rename the config key as >>>>>>>>>>>> *metrics.scope.jm.job.operator-coordinator* for >> consistency >>>>> with >>>>>>>>>>>> *metrics.scope.jm.job >>>>>>>>>>>> *(which is not named as *jm-job)? >>>>>>>>>>>> - Maybe rename the variable as >>>>> SCOPE_NAMING_OPERATOR_COORDINATOR >>>>>> for >>>>>>>>>>>> simplicity and consistency with SCOPE_NAMING_OPERATOR >>> (which >>>> is >>>>>> not named >>>>>>>>>>>> as SCOPE_NAMING_TM_JOB_OPERATOR)? >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> Dong >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan < >>>>> ruanhang1...@gmail.com> >>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi all, >>>>>>>>>>>>> >>>>>>>>>>>>> MengYue and I created FLIP-274[1] Introduce metric group >>> for >>>>>>>>>>>>> OperatorCoordinator. OperatorCoordinator is the >>> coordinator >>>>> for >>>>>> runtime >>>>>>>>>>>>> operators and running on Job Manager. The coordination >>>>>> mechanism is >>>>>>>>>>>>> operator events between OperatorCoordinator and its all >>>>>> operators, the >>>>>>>>>>>>> coordination is more and more using in Flink, for >> example >>>> many >>>>>> Sources >>>>>>>>>>> and >>>>>>>>>>>>> Sinks depend on the mechanism to assign splits and >>>> coordinate >>>>>> commits to >>>>>>>>>>>>> external systems. The OperatorCoordinator is widely >> using >>> in >>>>>> flink kafka >>>>>>>>>>>>> connector, flink pulsar connector, flink cdc connector, >>>> flink >>>>>> hudi >>>>>>>>>>>>> connector and so on. >>>>>>>>>>>>> >>>>>>>>>>>>> But there is not a suitable metric group scope for the >>>>>>>>>>> OperatorCoordinator >>>>>>>>>>>>> and not an implementation for the interface >>>>>>>>>>> OperatorCoordinatorMetricGroup. >>>>>>>>>>>>> These metrics in OperatorCoordinator could be how many >>>>>> splits/partitions >>>>>>>>>>>>> have been assigned to source readers, how many files >> have >>>> been >>>>>> written >>>>>>>>>>> out >>>>>>>>>>>>> by sink writers, these metrics not only help users to >> know >>>> the >>>>>> job >>>>>>>>>>> progress >>>>>>>>>>>>> but also make big job maintaining easier. Thus we >> propose >>>> the >>>>>> FLIP-274 >>>>>>>>>>> to >>>>>>>>>>>>> introduce a new metric group scope for >> OperatorCoordinator >>>> and >>>>>> provide >>>>>>>>>>> an >>>>>>>>>>>>> internal implementation for >>> OperatorCoordinatorMetricGroup. >>>>>>>>>>>>> >>>>>>>>>>>>> Could you help review this FLIP when you get time? Any >>>>> feedback >>>>>> is >>>>>>>>>>>>> appreciated! >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Hang >>>>>>>>>>>>> >>>>>>>>>>>>> [1] >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>> >>>>> >>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>> >>>>> >>>> >>> >>