Hi Dong,

Regarding “SplitEnumeratorContext#metricGroup”, my only concern is that this is 
a core interface for the FLIP. It’s hard to tell how sources use metric group 
without mentioning this interface. Even if this is an existing API, I think 
it’s worth introducing the interface again and declaring that we will implement 
the interface instead of a no-op method in this FLIP. 

Anyway, this is a minor problem and shouldn’t block this FLIP. I’m +1 to start 
a vote. 

Best,
Jark


> 2023年1月3日 10:03,Hang Ruan <ruanhang1...@gmail.com> 写道:
> 
> Hi, Jark and Dong,
> 
> Thanks for your comments. Sorry for my late reply.
> 
> For suggestion 1, I plan to implement the SplitEnumeratorMetricGroup in
> another issue, and it is not contained in this FLIP. I will add some
> description about this part.
> For suggestion 2, changes about OperatorCoordinator#metricGroup has already
> been documented in the proposed change section.
> 
> Best,
> Hang
> 
> Dong Lin <lindon...@gmail.com> 于2023年1月1日周日 09:45写道:
> 
>> Let me chime-in and add comments regarding the public interface section.
>> Please see my comments inline.
>> 
>> On Thu, Dec 29, 2022 at 6:08 PM Jark Wu <imj...@gmail.com> wrote:
>> 
>>> Hi Hang,
>>> 
>>> Thanks for driving this discussion. I think this is a very useful feature
>>> for connectors.
>>> 
>>> The FLIP looks quite good to me, and I just have two suggestions.
>>> 
>>> 1. In the "Public Interface" section, mention that the implementation
>>> behavior of "SplitEnumeratorContext#metricGroup" is changed from
>> returning
>>> null to returning a concrete SplitEnumeratorMetricGroup instance. Even
>>> though the API is already there, the behavior change can also be
>> considered
>>> a public change.
>>> 
>> 
>> SplitEnumeratorContext#metricGroup is an interface and this FLIP does not
>> seem to change its semantics/behavior. The FLIP does change the
>> implementation/behavior of SourceCoordinatorContext#metricGroup, which is
>> marked @Internal.
>> 
>> Thus it might seem a bit weird to add in the public interface section
>> saying that we change the interface SplitEnumeratorContext#metricGroup from
>> returning null to non-null object.
>> 
>> 2. Mention the newly added interface of "OperatorCoordinator#metricGroup"
>>> in the "Proposed Changes" section or "Public Interface" section. As the
>>> FLIP said, OperatorCoordinator is widely used in many connectors. Though
>> it
>>> is still an @Internal API, I think it is worth mentioning the change in
>> the
>>> FLIP.
>> 
>> 
>> Since OperatorCoordinator is an internal API, it seems reasonable to
>> explain it in the proposed change section. The FLIP seems to have
>> documented this in point 5 of the proposed change section.
>> 
>> BTW, if we think there are @internal classes that are important enough to
>> be added in the public interface section, it might be useful to explicitly
>> discuss this topic and document it in the "*What are the "public
>> interfaces" of the project*" in this
>> <
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>>> 
>> wiki.
>> 
>> 
>>> Best,
>>> Jark
>>> 
>>> 
>>> On Mon, 26 Dec 2022 at 18:06, Hang Ruan <ruanhang1...@gmail.com> wrote:
>>> 
>>>> Hi, thanks for the feedback, Zhu Zhu and Qingsheng.
>>>> After combining everyone's comments, the main concerns and
>> corresponding
>>>> adjustments are as follows.
>>>> 
>>>> Q1: Common metrics are not quite useful.
>>>> numEventsIn and numEventsOut counters will be removed from the
>>>> OperatorCoordinatorMetricGroup. These common metrics do not provide
>>> enough
>>>> information for users. The users are more willing to get the number of
>>>> events of the specified type instead of the total number. And this
>> metric
>>>> is calculated differently. The implementation could register the metric
>>> by
>>>> themselves.
>>>> 
>>>> Q2: This FLIP is overly complicated.
>>>> This FLIP will become concise after these modifications.
>>>> OperatorCoordinatorMetricGroup has already been introduced into Flink
>> by
>>>> FLIP-179<
>>>> 
>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
>>>>> .
>>>> And
>>>> this FLIP will not change it. This FLIP only provides a new metric
>> option
>>>> and a new metric group scope. The changes in proposed changes provide
>> the
>>>> details about the modifications for the internal classes, which might
>>> make
>>>> it look complicated.
>>>> 
>>>> Thanks for all the comments again. If there are no further comments, we
>>>> plan to start the voting thread this week.
>>>> 
>>>> Best,
>>>> Hang
>>>> 
>>>> Qingsheng Ren <renqs...@gmail.com> 于2022年12月26日周一 16:48写道:
>>>> 
>>>>> Thanks for the FLIP, Hang!
>>>>> 
>>>>> This FLIP overall looks good to me. Actually I share the same concern
>>>> with
>>>>> Zhu that numEventsIn and numEventsOut counters are not quite useful
>> to
>>>> end
>>>>> users. OperatorEvent is a quite low-level abstraction, which requires
>>>>> instantialization in order to be practical to users and developers,
>> so
>>>>> maybe it's better to exclude them from the FLIP.
>>>>> 
>>>>> Best,
>>>>> Qingsheng
>>>>> Ververica (Alibaba)
>>>>> 
>>>>> On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu <reed...@gmail.com> wrote:
>>>>> 
>>>>>> Hi Hang,
>>>>>> I still see no strong reason why we need numEventsIn/numEventsOut
>>>>> metrics.
>>>>>> In the discussion in FLINK-29801, I can see the same concern from
>>>> others.
>>>>>> So I prefer to exclude them from this FLIP to avoid over-extending
>>> the
>>>>>> scope.
>>>>>> 
>>>>>> Thanks,
>>>>>> Zhu
>>>>>> 
>>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月23日周五 15:21写道:
>>>>>>> 
>>>>>>> Hi everyone,
>>>>>>> 
>>>>>>> As for the Zhu Zhu's problem, I think we should keep the common
>>>>> metrics,
>>>>>> which will help to observe incoming and outgoing events. What do
>> you
>>>>> think,
>>>>>> @Zhu Zhu ?
>>>>>>> And @Chesnay, are there any other issues you are more concerned
>>>> about?
>>>>>> Looking forward to your reply.
>>>>>>> 
>>>>>>> Thanks for all the comments. If there are no further comments, we
>>>> plan
>>>>>> to start the voting thread next week.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Hang
>>>>>>> 
>>>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月15日周四 16:49写道:
>>>>>>>> 
>>>>>>>> Hi, Zhu Zhu,
>>>>>>>> 
>>>>>>>> Thanks for your feedback!
>>>>>>>> 
>>>>>>>> The OperatorCoordinator implementations are different. And their
>>>>>> metrics are much different too. We try to find the common metrics
>> and
>>>> put
>>>>>> them in the OperatorCoordinatorMetricGroup. If most developers
>> think
>>> we
>>>>> do
>>>>>> not need these common metrics, removing the common metrics is also
>>>>>> acceptable.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Hang
>>>>>>>> 
>>>>>>>> Zhu Zhu <reed...@gmail.com> 于2022年12月14日周三 22:09写道:
>>>>>>>>> 
>>>>>>>>> Hi Hang & MengYue,
>>>>>>>>> 
>>>>>>>>> Thanks for creating this FLIP!
>>>>>>>>> 
>>>>>>>>> I think it is very useful, mainly in two aspects:
>>>>>>>>> 1. Enables OperatorCoordinators to register metrics. Currently
>>>>>>>>> the coordinators has no way to do this. And operator
>> coordinator
>>>>>>>>> metric group further enables the SplitEnumerator to have access
>>>>>>>>> to a registered metric group (via the existing public interface
>>>>>>>>> SplitEnumeratorContext#metricGroup()), which is null at the
>>> moment.
>>>>>>>>> 
>>>>>>>>> 2. Defines the scope of operator coordinator metrics. A clear
>>>>>> definition
>>>>>>>>> makes it easy for users to find their wanted metrics. The
>>>> definition
>>>>>>>>> also helps to avoid conflicts of metrics from multiple
>>>>>> OperatorCoordinators
>>>>>>>>> of the same kind. E.g. each SourceCoordinator may have its own
>>>>>>>>> numSourceSplits metric, these metrics should not be directly
>>>>> registered
>>>>>>>>> to the job metric group.
>>>>>>>>> 
>>>>>>>>> What I'm a bit concerned is the necessity of the introduced
>>> common
>>>>>> metrics
>>>>>>>>> numEventsInCounter & numEventsOutCounter. If there any case
>> which
>>>>>> strongly
>>>>>>>>> requires them?
>>>>>>>>> 
>>>>>>>>> Regarding the concerns of Chesnay,
>>>>>>>>>> A dedicated coordinator MG implementation is overkill
>>>>>>>>> Directly using the job metric group can result in metric
>>> conflicts,
>>>>> as
>>>>>> mentioned
>>>>>>>>> in above #2.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Zhu
>>>>>>>>> 
>>>>>>>>> Dong Lin <lindon...@gmail.com> 于2022年12月10日周六 14:16写道:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Hi Chesney,
>>>>>>>>>> 
>>>>>>>>>> Just to double check with you, OperatorCoordinatorMetricGroup
>>>>>> (annotated as
>>>>>>>>>> @PublicEvolving) has already been introduced into Flink by
>>>> FLIP-179
>>>>>>>>>> <
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
>>>>>>> .
>>>>>>>>>> And that FLIP has got you +1.. Do you mean we should remove
>>> this
>>>>>>>>>> OperatorCoordinatorMetricGroup?
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Dong
>>>>>>>>>> 
>>>>>>>>>> On Sat, Dec 10, 2022 at 1:33 AM Chesnay Schepler <
>>>>> ches...@apache.org>
>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> As a whole I feel like this FLIP is overly complicated. A
>>>>> dedicated
>>>>>>>>>>> coordinator MG implementation is overkill; it could just
>>> re-use
>>>>> the
>>>>>>>>>>> existing Task/OperatorMGs to create the same structure we
>>> have
>>>> on
>>>>>> TMs,
>>>>>>>>>>> similar to what we did with the Job MG.
>>>>>>>>>>> 
>>>>>>>>>>> However, I'm not convinced that this is required anyway,
>>>> because
>>>>>> all the
>>>>>>>>>>> example metrics you listed can be implemented on the TM
>> side
>>> +
>>>>>>>>>>> aggregating them in the external metrics backend.
>>>>>>>>>>> 
>>>>>>>>>>> Since I'm on holidays soon, just so no one tries to pull a
>>> fast
>>>>>> one on
>>>>>>>>>>> me, if this were to go to a vote as-is I'd be against it.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 09/12/2022 15:30, Dong Lin wrote:
>>>>>>>>>>>> Hi Hang,
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for the FLIP! The FLIP looks good and it is pretty
>>>>>> informative.
>>>>>>>>>>>> 
>>>>>>>>>>>> I have just two minor comments regarding names:
>>>>>>>>>>>> - Would it be useful to rename the config key as
>>>>>>>>>>>> *metrics.scope.jm.job.operator-coordinator* for
>> consistency
>>>>> with
>>>>>>>>>>>> *metrics.scope.jm.job
>>>>>>>>>>>> *(which is not named as *jm-job)?
>>>>>>>>>>>> - Maybe rename the variable as
>>>>> SCOPE_NAMING_OPERATOR_COORDINATOR
>>>>>> for
>>>>>>>>>>>> simplicity and consistency with SCOPE_NAMING_OPERATOR
>>> (which
>>>> is
>>>>>> not named
>>>>>>>>>>>> as SCOPE_NAMING_TM_JOB_OPERATOR)?
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Dong
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan <
>>>>> ruanhang1...@gmail.com>
>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> MengYue and I created FLIP-274[1] Introduce metric group
>>> for
>>>>>>>>>>>>> OperatorCoordinator. OperatorCoordinator is the
>>> coordinator
>>>>> for
>>>>>> runtime
>>>>>>>>>>>>> operators and running on Job Manager. The coordination
>>>>>> mechanism is
>>>>>>>>>>>>> operator events between OperatorCoordinator and its all
>>>>>> operators, the
>>>>>>>>>>>>> coordination is more and more using in Flink, for
>> example
>>>> many
>>>>>> Sources
>>>>>>>>>>> and
>>>>>>>>>>>>> Sinks depend on the mechanism to assign splits and
>>>> coordinate
>>>>>> commits to
>>>>>>>>>>>>> external systems. The OperatorCoordinator is widely
>> using
>>> in
>>>>>> flink kafka
>>>>>>>>>>>>> connector, flink pulsar connector, flink cdc connector,
>>>> flink
>>>>>> hudi
>>>>>>>>>>>>> connector and so on.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> But there is not a suitable metric group scope for the
>>>>>>>>>>> OperatorCoordinator
>>>>>>>>>>>>> and not an implementation for the interface
>>>>>>>>>>> OperatorCoordinatorMetricGroup.
>>>>>>>>>>>>> These metrics in OperatorCoordinator could be how many
>>>>>> splits/partitions
>>>>>>>>>>>>> have been assigned to source readers, how many files
>> have
>>>> been
>>>>>> written
>>>>>>>>>>> out
>>>>>>>>>>>>> by sink writers, these metrics not only help users to
>> know
>>>> the
>>>>>> job
>>>>>>>>>>> progress
>>>>>>>>>>>>> but also make big job maintaining easier. Thus we
>> propose
>>>> the
>>>>>> FLIP-274
>>>>>>>>>>> to
>>>>>>>>>>>>> introduce a new metric group scope for
>> OperatorCoordinator
>>>> and
>>>>>> provide
>>>>>>>>>>> an
>>>>>>>>>>>>> internal implementation for
>>> OperatorCoordinatorMetricGroup.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Could you help review this FLIP when you get time? Any
>>>>> feedback
>>>>>> is
>>>>>>>>>>>>> appreciated!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Hang
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 

Reply via email to