Hi all, I make a mistake about the default value of the scope format JobManagerOperatorScopeFormat. It should be <host>.jobmanager.<job_name>.<operator_name>. I have fixed it in the FLIP. And the variables (task id and task name) have been added to the list.
Best, Hang Hang Ruan <ruanhang1...@gmail.com> 于2023年2月1日周三 14:11写道: > Hi all, > > > IIUC, there is only an OperatorCoordinatorMetricGroup interface without > implementations (it never worked). There were no coordinator metrics > implemented and registered. @Hang Ruan please correct me if I'm wrong. > Yes, we shouldn't have to worry about backwards-compatibility as Chesnay > said. > > After the discussion, I think the FLIP should be modified as follow. > 1. Add new metric group JobManagerOperatorMetricGroup, whose scope > format is JobManagerOperatorScopeFormat. The config key of > JobManagerOperatorScopeFormat > is metrics.scope.jm-operator, and the default value is > <host>.jobmanager.<job_name>.<task_name>.<operator_name>. > 2. JobManagerOperatorMetricGroup will be sub components of > JobManagerJobMetricGroup. > 3. InternalOperatorCoordinatorMetricGroup will extend ProxyMetricGroup, > which registered under JobManagerOperatorMetricGroup by > addGroup("coordinator"). > > I have updated the FLIP-274. @Chesnay Schepler <ches...@apache.org> @Jark > Wu <imj...@gmail.com> , please help to take a look at the modified FLIP > and make sure that I do not miss something. > > If there are no more problems, we plan to start the voting thread later. > > Best, > Hang > > Chesnay Schepler <ches...@apache.org> 于2023年1月31日周二 20:09写道: > >> > IIUC, there is only an OperatorCoordinatorMetricGroup interface without >> implementations (it never worked). There were no coordinator metrics >> implemented and registered. @Hang Ruan <ruanhang1...@gmail.com> please >> correct me if I'm wrong. >> >> Indeed. There are interfaces, and theoretically a metric group is exposed >> to user-code via the SplitEnumeratorContext, but the metric group is always >> null in the SourceCoordinatorContext. So we shouldn't have to worry about >> backwards-compatibility. >> >> > However, the config key "metrics.scope.jm.job" is deprecated and is >> replaced by "metrics.scope.jm-job". Similarly, "metrics.scope.tm.job" is >> replaced with "metrics.scope.tm-job". That's why I though >> "metrics.scope.jm-operator" might be more consistent. >> >> HA! Good catch. I only looked at the 1.16 docs where they weren't >> deprecated yet. >> >> Then it should be "jm-operator" indeed. >> >> On 31/01/2023 10:51, Jark Wu wrote: >> >> Hi Chesnay, >> >> > The new option must default to the configured value of >> metrics.scope.jm.job because otherwise it isn't backwards-compatible with >> existing coordinator metrics. >> IIUC, there is only an OperatorCoordinatorMetricGroup interface without >> implementations (it never worked). There were no coordinator metrics >> implemented and registered. @Hang Ruan <ruanhang1...@gmail.com> please >> correct me if I'm wrong. >> >> > Intuitively I think the config key should be >> metrics.scope.jm.operator., similar to metrics.scope.jm.job. >> However, the config key "metrics.scope.jm.job" is deprecated and is >> replaced by "metrics.scope.jm-job". Similarly, "metrics.scope.tm.job" is >> replaced with "metrics.scope.tm-job". That's why I though >> "metrics.scope.jm-operator" might be more consistent. >> >> > We would only support operator_name, operator_id, task_name and task_id >> (+ everything else from the jm/job groups of course). >> +1 >> >> Best, >> Jark >> >> >> >> On Mon, 30 Jan 2023 at 20:09, Chesnay Schepler <ches...@apache.org> >> wrote: >> >>> The new option must default to the configured value of >>> metrics.scope.jm.job because otherwise it isn't backwards-compatible with >>> existing coordinator metrics. >>> According to some earlier response in this thread there are already >>> coordinator metrics. >>> >>> If that weren't the case you'd be correct, and we'd keep the default >>> consistent with the default for tm operator metrics. >>> >>> Intuitively I think the config key should be metrics.scope.jm.operator., >>> similar to metrics.scope.jm.job. >>> For symmetry we should also add new metrics.scope.tm.operator/task and >>> eventually phase out metrics.scope.operator. >>> >>> > Regarding "coordinator metrics don't *belong to* any tasks/vertexes", >>> I mean the coordinator doesn't run on specific tasks, so it doesn't have >>> runtime information of tasks. >>> >>> That is true. I intentionally only referred to task id/name, as things >>> like subtask index or attempt ids don't make sense for coordinators as you >>> said. >>> We would only support operator_name, operator_id, task_name and task_id >>> (+ everything else from the jm/job groups of course). >>> (Maybe in the future attempt_num could be interesting though) >>> >>> On 30/01/2023 10:39, Jark Wu wrote: >>> >>> Hi Chesnay, >>> >>> IIUC, our only disagreement is whether to support task info in the >>> metric format. But we all agree with the default metric format >>> "metrics.scope.jm-operator=<host>.jobmanager.<job_name>.<operator_name>" >>> which doesn't include task info, right? >>> >>> Regarding "coordinator metrics don't *belong to* any tasks/vertexes", I >>> mean the coordinator doesn't run on specific tasks, so it doesn't have >>> runtime information of tasks. My concern about adding task info to the >>> coordinator metric is that >>> 1) if we register coordinator metrics for every subtask, the number of >>> coordinator metrics will be exploded when task parallelism is large. This >>> is an overhead to JM & metric systems, and the duplicated metrics are >>> useless for users. >>> 2) Correct me if I'm wrong, MetricGroup#counter only supports >>> registering one metric. It seems it can't register a metric reporting for >>> multiple metric names (e.g. multiple subtasks). >>> >>> Best, >>> Jark >>> >>> >>> On Mon, 30 Jan 2023 at 16:11, Chesnay Schepler <ches...@apache.org> >>> wrote: >>> >>>> I hadn't looked in detail in to how the task info can be introduced, >>>> but >>>> given that the coordinates are created by the execution graph, where we >>>> only work on tasks, it should be possible and rather simple. >>>> >>>> The OperatorCoordinatorHolder gets access to the ExecutionJobVertex >>>> from >>>> which it can extract everything and create the task/operator groups. >>>> Exposure to the coordinator could happen via the coordinator context as >>>> originally planned. >>>> >>>> > As we know, coordinator metrics are just like JMJobMetricGroup, >>>> which >>>> don't belong to any tasks/vertexes. >>>> >>>> I don't think this is true, every coordinator is scoped to a specific >>>> operator, and every operator is associated with a particular vertex. >>>> >>>> From the javadocs: >>>> >>>> " A coordinator for runtime operators. The OperatorCoordinator runs on >>>> the master, associated with >>>> the job vertex of the operator. It communicates with operators via >>>> sending operator events." >>>> >>>> On 28/01/2023 09:15, Jark Wu wrote: >>>> > Hi all, >>>> > >>>> > IIUC, Chesnay means we should have a more general metric group for >>>> > **operator** >>>> > not for **coordinator** in JM, this would be useful to extend >>>> > other operator-specific >>>> > metrics in the future. That means the new scope format should be >>>> designed >>>> > for >>>> > the operator, >>>> > e.g., >>>> metrics.scope.jm-operator=<host>.jobmanager.<job_name>.<operator_name> >>>> > The coordinator metric is a subgroup (a constant "coordinator" >>>> suffix) of >>>> > the JMOperatorMG. >>>> > >>>> > I think this is a nice design. However, I have a question about >>>> adding task >>>> > id/name to this list. >>>> > How to get the task id/name when reporting a coordinator metric? As we >>>> > know, coordinator metrics >>>> > are just like JMJobMetricGroup, which don't belong to any >>>> tasks/vertexes. >>>> > Do you mean reporting >>>> > coordinator metrics for every task id under the operator? >>>> > >>>> > Best, >>>> > Jark >>>> > >>>> > >>>> > On Thu, 19 Jan 2023 at 17:33, Chesnay Schepler <ches...@apache.org> >>>> wrote: >>>> > >>>> >> > First, I do not understand why users have to configure the new >>>> scope >>>> >> format, which has a default value. >>>> >> >>>> >> If you don't use scope formats, sure. If you do use scope formats, >>>> e.g. >>>> >> to add a common prefix (which is the case for datadog users for >>>> >> example), then the current default in the FLIP is insufficient and >>>> >> requires the user to update the configuration. >>>> >> >>>> >> > I usually do not need to change these configuration of scope >>>> formats >>>> >> when submitting the flink job. >>>> >> >>>> >> Scope formats as a whole are quite a power-user feature, but that >>>> >> doesn't mean we should ignore it. >>>> >> >>>> >> > I try to let it extend the ComponentMetricGroup >>>> >> >>>> >> This isn't inherently required just because it is a "component". >>>> >> Component metric groups should _only_ be used for cases where such a >>>> >> component provides several bits of metadata. >>>> >> For example, tasks provide vertex ids, task names, attempted IDs >>>> etc., >>>> >> and we'd like users to have the option on how this metadata ends up >>>> >> being used (via scope formats). This currently can't be built with >>>> the >>>> >> addGroup() methods, hence why the component groups exist. >>>> >> However, the operator coordinator _itself_does not provide several >>>> bits >>>> >> of metadata. Logically, the _operator_ does, not the coordinator. >>>> >> >>>> >> > This make me decide to add a new scope format. >>>> >> >>>> >> And this is fine; just don't add a new scope format for the >>>> coordinator >>>> >> but logically for operators on the JM side, such that we can extend >>>> the >>>> >> set of operator-specific metrics in the future without having to add >>>> yet >>>> >> another scope format. >>>> >> >>>> >> > The relationship between the Operator and OperatorCoordinator is >>>> >> maintained in the OperatorCoordinatorHolder. In the POC, the operator >>>> >> name/id could be found and the OperatorCoordinatorMetricGroup will be >>>> >> created here. >>>> >> >>>> >> That's all irrelevant. >>>> >> >>>> >> > The registered metrics of OperatorCoordinator always are those >>>> which >>>> >> can not be aggregated from the tasks, like the number of unassigned >>>> >> splits in the SourceCoordinator. Actually we have not encountered the >>>> >> scenario you mentioned. I think the OperatorCoordinatorMetricGroup >>>> >> should only contain the metrics for itself instead of its subtasks. >>>> >> >>>> >> You are misunderstanding the problem. >>>> >> >>>> >> This isn't about aggregating metrics on our side or exposing metrics >>>> >> from TMs on the JM side or anything like that. >>>> >> >>>> >> It's purely about the available metadata for operator metrics on the >>>> JM >>>> >> side. Currently you suggest operator name / id; what I'm proposing >>>> is to >>>> >> add task(==vertex!) id / name to this list. >>>> >> >>>> >> The benefit here is simple. I can look up all metrics where >>>> task_id==XXX >>>> >> and get _everything_ related to that vertex, including all metrics >>>> >> associated with operators that are part of that vertex, including the >>>> >> coordinator. >>>> >> >>>> >> > Using metrics.scope.jm.job is not enough to distinguish the >>>> different >>>> >> coordinators. >>>> >> >>>> >> I did not suggest just using metrics.scope.jm.job. I suggested that >>>> as >>>> >> the default for the jm operator scope format, that is used for the >>>> >> proposed JobManagerOperatorMetricGroup, which will have some plain >>>> >> metric groups as children for coordinators. >>>> >> Aka, you have the JMOperatorMG, then you call >>>> addGroup("coordinators") >>>> >> and then addGroup(coordinator_name) for each coordinator. >>>> >> >>>> >> > So there are two choice for the >>>> >> InternalOperatorCoordinatorMetricGroup: 1. add and improve the new >>>> scope >>>> >> format; 2. use the metrics.scope.jm.job and ProxyMetricGroup. Which >>>> one >>>> >> is better? >>>> >> >>>> >> There are more options than that. >>>> >> I feel like there are a lot of misunderstandings in this discussion. >>>> >> Please let me know if I could clear things up. If not I can also >>>> provide >>>> >> a PoC based on your PoC if that can speed things up. It's not that >>>> >> different anyway. >>>> >> >>>> >> On 19/01/2023 07:39, Hang Ruan wrote: >>>> >>> Hi, chesnay, >>>> >>> >>>> >>> Thanks for your reply. I still have some doubts about the questions >>>> >>> you raised. >>>> >>> >>>> >>> > Extending the set of ScopeFormats is problematic because it >>>> in >>>> >>> practice >>>> >>> it breaks the config if users actively rely on it, since >>>> there's now >>>> >>> another key that they _must_ set for it to be >>>> >>> consistent/compatible with >>>> >>> their existing setup. >>>> >>> Unfortunately due to how powerful scope formats are we can't >>>> derive a >>>> >>> default value that matches their existing setup. >>>> >>> Hence we should try to do this as rarely as possible. >>>> >>> >>>> >>> >>>> >>> > This FLIP does not adhere to that since it proposes a >>>> dedicated >>>> >>> format >>>> >>> for coordinators; next time we want to expose operator-specific >>>> >>> metrics >>>> >>> (e.g., in the scheduler) we'd have to add another one to >>>> support it. >>>> >>> >>>> >>> >>>> >>> First, I do not understand why users have to configure the new scope >>>> >>> format, which has a default value. I usually do not need to change >>>> >>> these configuration of scope formats when submitting the flink job. >>>> >>> IMO, the OperatorCoordinator is a component running at the >>>> JobMaster. >>>> >>> I try to let it extend the ComponentMetricGroup, but I cannot find >>>> >>> a suitable scope. This make me decide to add a new scope format. >>>> >>> >>>> >>> > Additionally, the configurable variables (operator name/id) >>>> are >>>> >>> logically not attached to the coordinator, but operators, so >>>> to me it >>>> >>> just doesn't make sense to structure it like this. >>>> >>> >>>> >>> >>>> >>> The relationship between the Operator and OperatorCoordinator is >>>> >>> maintained in the OperatorCoordinatorHolder. In the POC, the >>>> operator >>>> >>> name/id could be found and the OperatorCoordinatorMetricGroup will >>>> be >>>> >>> created here. >>>> >>> >>>> >>> > Another thing I'm concerned about is that, because we don't >>>> include >>>> >>> tasks in the hierarchy, users wishing to collect all metrics >>>> for a >>>> >>> particular task (in this case ==vertex) now have to go >>>> >>> significantly out >>>> >>> of their way to get them, since they can no longer just filter >>>> by the >>>> >>> task ID but have to be filter for _all_ operators that are >>>> part of >>>> >>> the task. >>>> >>> >>>> >>> >>>> >>> The registered metrics of OperatorCoordinator always are those which >>>> >>> can not be aggregated from the tasks, like the number of unassigned >>>> >>> splits in the SourceCoordinator. Actually we have not encountered >>>> the >>>> >>> scenario you mentioned. I think the OperatorCoordinatorMetricGroup >>>> >>> should only contain the metrics for itself instead of its subtasks. >>>> >>> >>>> >>> Using metrics.scope.jm.job is not enough to distinguish the >>>> different >>>> >>> coordinators. The operator id/name is necessary. Then >>>> >>> the implementation should be like this. But users can not change the >>>> >>> scope in this way. This is also acceptable. >>>> >>> @Internal public class InternalOperatorCoordinatorMetricGroup >>>> >>> extends ProxyMetricGroup<MetricGroup> >>>> >>> implements OperatorCoordinatorMetricGroup { >>>> >>> public InternalOperatorCoordinatorMetricGroup( >>>> >>> JobManagerJobMetricGroup parent, OperatorID >>>> operatorID, >>>> >> String operatorName) { >>>> >>> super(parent.addGroup(operatorID + >>>> >> operatorName).addGroup("coordinator")); } >>>> >>> } >>>> >>> So there are two choice for the >>>> >>> InternalOperatorCoordinatorMetricGroup: 1. add and improve the new >>>> >>> scope format; 2. use the metrics.scope.jm.job and ProxyMetricGroup. >>>> >>> Which one is better? >>>> >>> >>>> >>> Best, >>>> >>> Hang >>>> >>> >>>> >>> >>>> >>> Chesnay Schepler <ches...@apache.org> 于2023年1月18日周三 17:03写道: >>>> >>> >>>> >>> You're misunderstanding the problem. >>>> >>> >>>> >>> Metric groups form a tree with each group providing certain >>>> metadata. >>>> >>> >>>> >>> E.g., on the taskmanager we have a TM metric group that >>>> provides info >>>> >>> about the TM, that has child task metric groups, that have >>>> child >>>> >>> operator metric groups etc. The operator metric group again has >>>> >>> children, where we are now mostly in the user/connector land, >>>> like a >>>> >>> kafka metric group providing partition info or something. >>>> >>> >>>> >>> What is being proposed is to create a coordinator metric group >>>> on >>>> >>> the JM >>>> >>> side that provides operator metadata. >>>> >>> >>>> >>> A more appropriate structure is to create an operator group on >>>> the JM >>>> >>> side that provides this info, with the coordinator metric group >>>> >>> being a >>>> >>> child. >>>> >>> >>>> >>> On 17/01/2023 19:56, Steven Wu wrote: >>>> >>> >> Additionally, the configurable variables (operator name/id) >>>> are >>>> >>> > logically not attached to the coordinator, but operators, so >>>> to >>>> >>> me it >>>> >>> > just doesn't make sense to structure it like this. >>>> >>> > >>>> >>> > Chesnay, maybe we should clarify the terminology. To me, >>>> >>> pperators (like >>>> >>> > FLIP-27 source) can have two parts (coordinator and >>>> >>> reader/subtask). I >>>> >>> > think it is fine to include operator name/id for coordinator >>>> >>> metrics. >>>> >>> > >>>> >>> > On Mon, Jan 16, 2023 at 2:13 AM Chesnay Schepler >>>> >>> <ches...@apache.org> wrote: >>>> >>> > >>>> >>> >> Slight correction: Using metrics.scope.jm.job as the default >>>> >>> should be >>>> >>> >> safe. >>>> >>> >> >>>> >>> >> On 16/01/2023 10:18, Chesnay Schepler wrote: >>>> >>> >>> The proposed ScopeFormat is still problematic for a few >>>> reasons. >>>> >>> >>> >>>> >>> >>> Extending the set of ScopeFormats is problematic because >>>> it in >>>> >>> >>> practice it breaks the config if users actively rely on >>>> it, since >>>> >>> >>> there's now another key that they _must_ set for it to be >>>> >>> >>> consistent/compatible with their existing setup. >>>> >>> >>> Unfortunately due to how powerful scope formats are we >>>> can't >>>> >>> derive a >>>> >>> >>> default value that matches their existing setup. >>>> >>> >>> Hence we should try to do this as rarely as possible. >>>> >>> >>> >>>> >>> >>> This FLIP does not adhere to that since it proposes a >>>> >>> dedicated format >>>> >>> >>> for coordinators; next time we want to expose >>>> operator-specific >>>> >>> >>> metrics (e.g., in the scheduler) we'd have to add another >>>> one to >>>> >>> >>> support it. >>>> >>> >>> >>>> >>> >>> Additionally, the configurable variables (operator >>>> name/id) are >>>> >>> >>> logically not attached to the coordinator, but operators, >>>> so >>>> >>> to me it >>>> >>> >>> just doesn't make sense to structure it like this. >>>> >>> >>> >>>> >>> >>> Another thing I'm concerned about is that, because we don't >>>> >>> include >>>> >>> >>> tasks in the hierarchy, users wishing to collect all >>>> metrics for >>>> >> a >>>> >>> >>> particular task (in this case ==vertex) now have to go >>>> >>> significantly >>>> >>> >>> out of their way to get them, since they can no longer just >>>> >>> filter by >>>> >>> >>> the task ID but have to be filter for _all_ operators that >>>> are >>>> >>> part of >>>> >>> >>> the task. >>>> >>> >>> >>>> >>> >>> On 16/01/2023 03:09, Hang Ruan wrote: >>>> >>> >>>> Hi, @ches...@apache.org <ches...@apache.org> , >>>> >>> >>>> >>>> >>> >>>> Do you have time to help to review this FLIP again after >>>> the >>>> >>> >>>> modification? >>>> >>> >>>> Looking forward to your reply. >>>> >>> >>>> This FLIP will add a new configuration for the >>>> >>> >>>> OperatorCoordinatorMetricGroup scope format. It provides >>>> an >>>> >>> internal >>>> >>> >>>> implementation and is added as a component to the >>>> >>> >>>> JobManagerJobMetricGroup. >>>> >>> >>>> If something doesn't make sense, could you provide some >>>> >>> advice? It >>>> >>> >>>> will be >>>> >>> >>>> very helpful. Thanks a lot for your help. >>>> >>> >>>> >>>> >>> >>>> Best, >>>> >>> >>>> Hang >>>> >>> >>>> >>>> >>> >>>> Martijn Visser <martijnvis...@apache.org> 于2023年1月11日周三 >>>> >>> 16:34写道: >>>> >>> >>>> >>>> >>> >>>> >>>> >>> >>>>> Hi Hang, >>>> >>> >>>>> >>>> >>> >>>>> I'm a bit surprised that this has gone to a vote, given >>>> that >>>> >>> Chesnay >>>> >>> >>>>> deliberately mentioned that he would vote against it >>>> as-is. >>>> >>> I would >>>> >>> >>>>> expect >>>> >>> >>>>> that before going to a vote, he has had the opportunity >>>> to >>>> >>> >>>>> participate in >>>> >>> >>>>> this discussion. >>>> >>> >>>>> >>>> >>> >>>>> Best regards, >>>> >>> >>>>> >>>> >>> >>>>> Martijn >>>> >>> >>>>> >>>> >>> >>>>> On Tue, Jan 3, 2023 at 12:53 PM Jark Wu < >>>> imj...@gmail.com> >>>> >>> wrote: >>>> >>> >>>>> >>>> >>> >>>>>> Hi Dong, >>>> >>> >>>>>> >>>> >>> >>>>>> Regarding “SplitEnumeratorContext#metricGroup”, my only >>>> >>> concern is >>>> >>> >>>>>> that >>>> >>> >>>>>> this is a core interface for the FLIP. It’s hard to >>>> tell how >>>> >>> >>>>>> sources use >>>> >>> >>>>>> metric group without mentioning this interface. Even if >>>> >>> this is an >>>> >>> >>>>> existing >>>> >>> >>>>>> API, I think it’s worth introducing the interface again >>>> and >>>> >>> declaring >>>> >>> >>>>> that >>>> >>> >>>>>> we will implement the interface instead of a no-op >>>> method >>>> >>> in this >>>> >>> >>>>>> FLIP. >>>> >>> >>>>>> >>>> >>> >>>>>> Anyway, this is a minor problem and shouldn’t block this >>>> >>> FLIP. I’m >>>> >>> >>>>>> +1 to >>>> >>> >>>>>> start a vote. >>>> >>> >>>>>> >>>> >>> >>>>>> Best, >>>> >>> >>>>>> Jark >>>> >>> >>>>>> >>>> >>> >>>>>> >>>> >>> >>>>>>> 2023年1月3日 10:03,Hang Ruan <ruanhang1...@gmail.com> 写道: >>>> >>> >>>>>>> >>>> >>> >>>>>>> Hi, Jark and Dong, >>>> >>> >>>>>>> >>>> >>> >>>>>>> Thanks for your comments. Sorry for my late reply. >>>> >>> >>>>>>> >>>> >>> >>>>>>> For suggestion 1, I plan to implement the >>>> >>> >>>>>>> SplitEnumeratorMetricGroup in >>>> >>> >>>>>>> another issue, and it is not contained in this FLIP. I >>>> >>> will add some >>>> >>> >>>>>>> description about this part. >>>> >>> >>>>>>> For suggestion 2, changes about >>>> >>> OperatorCoordinator#metricGroup has >>>> >>> >>>>>> already >>>> >>> >>>>>>> been documented in the proposed change section. >>>> >>> >>>>>>> >>>> >>> >>>>>>> Best, >>>> >>> >>>>>>> Hang >>>> >>> >>>>>>> >>>> >>> >>>>>>> Dong Lin <lindon...@gmail.com> 于2023年1月1日周日 09:45写道: >>>> >>> >>>>>>> >>>> >>> >>>>>>>> Let me chime-in and add comments regarding the public >>>> >>> interface >>>> >>> >>>>> section. >>>> >>> >>>>>>>> Please see my comments inline. >>>> >>> >>>>>>>> >>>> >>> >>>>>>>> On Thu, Dec 29, 2022 at 6:08 PM Jark Wu >>>> >>> <imj...@gmail.com> wrote: >>>> >>> >>>>>>>> >>>> >>> >>>>>>>>> Hi Hang, >>>> >>> >>>>>>>>> >>>> >>> >>>>>>>>> Thanks for driving this discussion. I think this is a >>>> >>> very useful >>>> >>> >>>>>> feature >>>> >>> >>>>>>>>> for connectors. >>>> >>> >>>>>>>>> >>>> >>> >>>>>>>>> The FLIP looks quite good to me, and I just have two >>>> >>> suggestions. >>>> >>> >>>>>>>>> >>>> >>> >>>>>>>>> 1. In the "Public Interface" section, mention that >>>> the >>>> >>> >>>>>>>>> implementation >>>> >>> >>>>>>>>> behavior of "SplitEnumeratorContext#metricGroup" is >>>> >>> changed from >>>> >>> >>>>>>>> returning >>>> >>> >>>>>>>>> null to returning a concrete >>>> SplitEnumeratorMetricGroup >>>> >>> instance. >>>> >>> >>>>> Even >>>> >>> >>>>>>>>> though the API is already there, the behavior change >>>> can >>>> >>> also be >>>> >>> >>>>>>>> considered >>>> >>> >>>>>>>>> a public change. >>>> >>> >>>>>>>>> >>>> >>> >>>>>>>> SplitEnumeratorContext#metricGroup is an interface and >>>> >>> this FLIP >>>> >>> >>>>>>>> does >>>> >>> >>>>>> not >>>> >>> >>>>>>>> seem to change its semantics/behavior. The FLIP does >>>> >>> change the >>>> >>> >>>>>>>> implementation/behavior of >>>> >>> SourceCoordinatorContext#metricGroup, >>>> >>> >>>>>>>> which >>>> >>> >>>>>> is >>>> >>> >>>>>>>> marked @Internal. >>>> >>> >>>>>>>> >>>> >>> >>>>>>>> Thus it might seem a bit weird to add in the public >>>> >> interface >>>> >>> >>>>>>>> section >>>> >>> >>>>>>>> saying that we change the interface >>>> >>> >>>>>>>> SplitEnumeratorContext#metricGroup >>>> >>> >>>>>> from >>>> >>> >>>>>>>> returning null to non-null object. >>>> >>> >>>>>>>> >>>> >>> >>>>>>>> 2. Mention the newly added interface of >>>> >>> >>>>>> "OperatorCoordinator#metricGroup" >>>> >>> >>>>>>>>> in the "Proposed Changes" section or "Public >>>> Interface" >>>> >>> section. As >>>> >>> >>>>> the >>>> >>> >>>>>>>>> FLIP said, OperatorCoordinator is widely used in many >>>> >>> connectors. >>>> >>> >>>>>> Though >>>> >>> >>>>>>>> it >>>> >>> >>>>>>>>> is still an @Internal API, I think it is worth >>>> >>> mentioning the >>>> >>> >>>>>>>>> change >>>> >>> >>>>> in >>>> >>> >>>>>>>> the >>>> >>> >>>>>>>>> FLIP. >>>> >>> >>>>>>>> Since OperatorCoordinator is an internal API, it seems >>>> >>> reasonable to >>>> >>> >>>>>>>> explain it in the proposed change section. The FLIP >>>> seems >>>> >>> to have >>>> >>> >>>>>>>> documented this in point 5 of the proposed change >>>> section. >>>> >>> >>>>>>>> >>>> >>> >>>>>>>> BTW, if we think there are @internal classes that are >>>> >>> important >>>> >>> >>>>>>>> enough >>>> >>> >>>>>> to >>>> >>> >>>>>>>> be added in the public interface section, it might be >>>> >>> useful to >>>> >>> >>>>>> explicitly >>>> >>> >>>>>>>> discuss this topic and document it in the "*What are >>>> the >>>> >>> "public >>>> >>> >>>>>>>> interfaces" of the project*" in this >>>> >>> >>>>>>>> < >>>> >>> >>>>>>>> >>>> >>> >> >>>> >>> >>>> >> >>>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals >>>> >>> >>>>>>>> wiki. >>>> >>> >>>>>>>> >>>> >>> >>>>>>>> >>>> >>> >>>>>>>>> Best, >>>> >>> >>>>>>>>> Jark >>>> >>> >>>>>>>>> >>>> >>> >>>>>>>>> >>>> >>> >>>>>>>>> On Mon, 26 Dec 2022 at 18:06, Hang Ruan >>>> >>> <ruanhang1...@gmail.com> >>>> >>> >>>>>> wrote: >>>> >>> >>>>>>>>>> Hi, thanks for the feedback, Zhu Zhu and Qingsheng. >>>> >>> >>>>>>>>>> After combining everyone's comments, the main >>>> concerns and >>>> >>> >>>>>>>> corresponding >>>> >>> >>>>>>>>>> adjustments are as follows. >>>> >>> >>>>>>>>>> >>>> >>> >>>>>>>>>> Q1: Common metrics are not quite useful. >>>> >>> >>>>>>>>>> numEventsIn and numEventsOut counters will be >>>> removed >>>> >>> from the >>>> >>> >>>>>>>>>> OperatorCoordinatorMetricGroup. These common >>>> metrics do >>>> >> not >>>> >>> >>>>>>>>>> provide >>>> >>> >>>>>>>>> enough >>>> >>> >>>>>>>>>> information for users. The users are more willing to >>>> >>> get the >>>> >>> >>>>>>>>>> number >>>> >>> >>>>> of >>>> >>> >>>>>>>>>> events of the specified type instead of the total >>>> >>> number. And this >>>> >>> >>>>>>>> metric >>>> >>> >>>>>>>>>> is calculated differently. The implementation could >>>> >>> register the >>>> >>> >>>>>> metric >>>> >>> >>>>>>>>> by >>>> >>> >>>>>>>>>> themselves. >>>> >>> >>>>>>>>>> >>>> >>> >>>>>>>>>> Q2: This FLIP is overly complicated. >>>> >>> >>>>>>>>>> This FLIP will become concise after these >>>> modifications. >>>> >>> >>>>>>>>>> OperatorCoordinatorMetricGroup has already been >>>> >>> introduced into >>>> >>> >>>>> Flink >>>> >>> >>>>>>>> by >>>> >>> >>>>>>>>>> FLIP-179< >>>> >>> >>>>>>>>>> >>>> >>> >>>>>>>>>> >>>> >>> >> >>>> >>> >>>> >> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics >>>> >>> >>>>>>>>>>> . >>>> >>> >>>>>>>>>> And >>>> >>> >>>>>>>>>> this FLIP will not change it. This FLIP only >>>> provides a >>>> >>> new metric >>>> >>> >>>>>>>> option >>>> >>> >>>>>>>>>> and a new metric group scope. The changes in >>>> proposed >>>> >>> changes >>>> >>> >>>>> provide >>>> >>> >>>>>>>> the >>>> >>> >>>>>>>>>> details about the modifications for the internal >>>> >>> classes, which >>>> >>> >>>>> might >>>> >>> >>>>>>>>> make >>>> >>> >>>>>>>>>> it look complicated. >>>> >>> >>>>>>>>>> >>>> >>> >>>>>>>>>> Thanks for all the comments again. If there are no >>>> further >>>> >>> >>>>>>>>>> comments, >>>> >>> >>>>>> we >>>> >>> >>>>>>>>>> plan to start the voting thread this week. >>>> >>> >>>>>>>>>> >>>> >>> >>>>>>>>>> Best, >>>> >>> >>>>>>>>>> Hang >>>> >>> >>>>>>>>>> >>>> >>> >>>>>>>>>> Qingsheng Ren <renqs...@gmail.com> 于2022年12月26日周一 >>>> >>> 16:48写道: >>>> >>> >>>>>>>>>> >>>> >>> >>>>>>>>>> >>>> >>> >>>>>>>>>>> Thanks for the FLIP, Hang! >>>> >>> >>>>>>>>>>> >>>> >>> >>>>>>>>>>> This FLIP overall looks good to me. Actually I >>>> share >>>> >>> the same >>>> >>> >>>>> concern >>>> >>> >>>>>>>>>> with >>>> >>> >>>>>>>>>>> Zhu that numEventsIn and numEventsOut counters are >>>> not >>>> >>> quite >>>> >>> >>>>>>>>>>> useful >>>> >>> >>>>>>>> to >>>> >>> >>>>>>>>>> end >>>> >>> >>>>>>>>>>> users. OperatorEvent is a quite low-level >>>> abstraction, >>>> >>> which >>>> >>> >>>>> requires >>>> >>> >>>>>>>>>>> instantialization in order to be practical to >>>> users and >>>> >>> >>>>>>>>>>> developers, >>>> >>> >>>>>>>> so >>>> >>> >>>>>>>>>>> maybe it's better to exclude them from the FLIP. >>>> >>> >>>>>>>>>>> >>>> >>> >>>>>>>>>>> Best, >>>> >>> >>>>>>>>>>> Qingsheng >>>> >>> >>>>>>>>>>> Ververica (Alibaba) >>>> >>> >>>>>>>>>>> >>>> >>> >>>>>>>>>>> On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu >>>> >>> <reed...@gmail.com> >>>> >>> >>>>> wrote: >>>> >>> >>>>>>>>>>>> Hi Hang, >>>> >>> >>>>>>>>>>>> I still see no strong reason why we need >>>> >>> >>>>>>>>>>>> numEventsIn/numEventsOut >>>> >>> >>>>>>>>>>> metrics. >>>> >>> >>>>>>>>>>>> In the discussion in FLINK-29801, I can see the >>>> same >>>> >>> concern >>>> >>> >>>>>>>>>>>> from >>>> >>> >>>>>>>>>> others. >>>> >>> >>>>>>>>>>>> So I prefer to exclude them from this FLIP to >>>> avoid >>>> >>> >>>>>>>>>>>> over-extending >>>> >>> >>>>>>>>> the >>>> >>> >>>>>>>>>>>> scope. >>>> >>> >>>>>>>>>>>> >>>> >>> >>>>>>>>>>>> Thanks, >>>> >>> >>>>>>>>>>>> Zhu >>>> >>> >>>>>>>>>>>> >>>> >>> >>>>>>>>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月23日周五 >>>> >>> >>>>>>>>>>>> 15:21写道: >>>> >>> >>>>>>>>>>>>> Hi everyone, >>>> >>> >>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>> As for the Zhu Zhu's problem, I think we should >>>> keep >>>> >>> the common >>>> >>> >>>>>>>>>>> metrics, >>>> >>> >>>>>>>>>>>> which will help to observe incoming and outgoing >>>> >>> events. What do >>>> >>> >>>>>>>> you >>>> >>> >>>>>>>>>>> think, >>>> >>> >>>>>>>>>>>> @Zhu Zhu ? >>>> >>> >>>>>>>>>>>>> And @Chesnay, are there any other issues you are >>>> >>> more concerned >>>> >>> >>>>>>>>>> about? >>>> >>> >>>>>>>>>>>> Looking forward to your reply. >>>> >>> >>>>>>>>>>>>> Thanks for all the comments. If there are no >>>> further >>>> >>> >>>>>>>>>>>>> comments, we >>>> >>> >>>>>>>>>> plan >>>> >>> >>>>>>>>>>>> to start the voting thread next week. >>>> >>> >>>>>>>>>>>>> Best, >>>> >>> >>>>>>>>>>>>> Hang >>>> >>> >>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>> Hang Ruan <ruanhang1...@gmail.com> >>>> 于2022年12月15日周四 >>>> >>> >>>>>>>>>>>>> 16:49写道: >>>> >>> >>>>>>>>>>>>>> Hi, Zhu Zhu, >>>> >>> >>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>> Thanks for your feedback! >>>> >>> >>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>> The OperatorCoordinator implementations are >>>> >>> different. And >>>> >>> >>>>>>>>>>>>>> their >>>> >>> >>>>>>>>>>>> metrics are much different too. We try to find the >>>> >> common >>>> >>> >>>>>>>>>>>> metrics >>>> >>> >>>>>>>> and >>>> >>> >>>>>>>>>> put >>>> >>> >>>>>>>>>>>> them in the OperatorCoordinatorMetricGroup. If >>>> most >>>> >>> developers >>>> >>> >>>>>>>> think >>>> >>> >>>>>>>>> we >>>> >>> >>>>>>>>>>> do >>>> >>> >>>>>>>>>>>> not need these common metrics, removing the common >>>> >>> metrics is >>>> >>> >>>>>>>>>>>> also >>>> >>> >>>>>>>>>>>> acceptable. >>>> >>> >>>>>>>>>>>>>> Best, >>>> >>> >>>>>>>>>>>>>> Hang >>>> >>> >>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>> Zhu Zhu <reed...@gmail.com> 于2022年12月14日周三 >>>> >>> 22:09写道: >>>> >>> >>>>>>>>>>>>>>> Hi Hang & MengYue, >>>> >>> >>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>> Thanks for creating this FLIP! >>>> >>> >>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>> I think it is very useful, mainly in two >>>> aspects: >>>> >>> >>>>>>>>>>>>>>> 1. Enables OperatorCoordinators to register >>>> metrics. >>>> >>> >>>>>>>>>>>>>>> Currently >>>> >>> >>>>>>>>>>>>>>> the coordinators has no way to do this. And >>>> operator >>>> >>> >>>>>>>> coordinator >>>> >>> >>>>>>>>>>>>>>> metric group further enables the >>>> SplitEnumerator >>>> >>> to have >>>> >>> >>>>>>>>>>>>>>> access >>>> >>> >>>>>>>>>>>>>>> to a registered metric group (via the existing >>>> public >>>> >>> >>>>>>>>>>>>>>> interface >>>> >>> >>>>>>>>>>>>>>> SplitEnumeratorContext#metricGroup()), which is >>>> >>> null at the >>>> >>> >>>>>>>>> moment. >>>> >>> >>>>>>>>>>>>>>> 2. Defines the scope of operator coordinator >>>> >>> metrics. A clear >>>> >>> >>>>>>>>>>>> definition >>>> >>> >>>>>>>>>>>>>>> makes it easy for users to find their wanted >>>> >>> metrics. The >>>> >>> >>>>>>>>>> definition >>>> >>> >>>>>>>>>>>>>>> also helps to avoid conflicts of metrics from >>>> >> multiple >>>> >>> >>>>>>>>>>>> OperatorCoordinators >>>> >>> >>>>>>>>>>>>>>> of the same kind. E.g. each SourceCoordinator >>>> may >>>> >>> have its >>>> >>> >>>>>>>>>>>>>>> own >>>> >>> >>>>>>>>>>>>>>> numSourceSplits metric, these metrics should >>>> not >>>> >>> be directly >>>> >>> >>>>>>>>>>> registered >>>> >>> >>>>>>>>>>>>>>> to the job metric group. >>>> >>> >>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>> What I'm a bit concerned is the necessity of >>>> the >>>> >>> introduced >>>> >>> >>>>>>>>> common >>>> >>> >>>>>>>>>>>> metrics >>>> >>> >>>>>>>>>>>>>>> numEventsInCounter & numEventsOutCounter. If >>>> there >>>> >>> any case >>>> >>> >>>>>>>> which >>>> >>> >>>>>>>>>>>> strongly >>>> >>> >>>>>>>>>>>>>>> requires them? >>>> >>> >>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>> Regarding the concerns of Chesnay, >>>> >>> >>>>>>>>>>>>>>>> A dedicated coordinator MG implementation is >>>> >> overkill >>>> >>> >>>>>>>>>>>>>>> Directly using the job metric group can result >>>> in >>>> >>> metric >>>> >>> >>>>>>>>> conflicts, >>>> >>> >>>>>>>>>>> as >>>> >>> >>>>>>>>>>>> mentioned >>>> >>> >>>>>>>>>>>>>>> in above #2. >>>> >>> >>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>> Thanks, >>>> >>> >>>>>>>>>>>>>>> Zhu >>>> >>> >>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>> Dong Lin <lindon...@gmail.com> 于2022年12月10日周六 >>>> >>> >>>>>>>>>>>>>>> 14:16写道: >>>> >>> >>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>> Hi Chesney, >>>> >>> >>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>> Just to double check with you, >>>> >>> >>>>>>>>>>>>>>>> OperatorCoordinatorMetricGroup >>>> >>> >>>>>>>>>>>> (annotated as >>>> >>> >>>>>>>>>>>>>>>> @PublicEvolving) has already been introduced >>>> into >>>> >>> Flink by >>>> >>> >>>>>>>>>> FLIP-179 >>>> >>> >>>>>>>>>>>>>>>> < >>>> >>> >> >>>> >>> >>>> >> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics >>>> >>> >>>>>>>>>>>>> . >>>> >>> >>>>>>>>>>>>>>>> And that FLIP has got you +1.. Do you mean we >>>> >>> should remove >>>> >>> >>>>>>>>> this >>>> >>> >>>>>>>>>>>>>>>> OperatorCoordinatorMetricGroup? >>>> >>> >>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>> Regards, >>>> >>> >>>>>>>>>>>>>>>> Dong >>>> >>> >>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>> On Sat, Dec 10, 2022 at 1:33 AM Chesnay >>>> Schepler < >>>> >>> >>>>>>>>>>> ches...@apache.org> >>>> >>> >>>>>>>>>>>> wrote: >>>> >>> >>>>>>>>>>>>>>>>> As a whole I feel like this FLIP is overly >>>> >>> complicated. A >>>> >>> >>>>>>>>>>> dedicated >>>> >>> >>>>>>>>>>>>>>>>> coordinator MG implementation is overkill; it >>>> >>> could just >>>> >>> >>>>>>>>> re-use >>>> >>> >>>>>>>>>>> the >>>> >>> >>>>>>>>>>>>>>>>> existing Task/OperatorMGs to create the same >>>> >>> structure we >>>> >>> >>>>>>>>> have >>>> >>> >>>>>>>>>> on >>>> >>> >>>>>>>>>>>> TMs, >>>> >>> >>>>>>>>>>>>>>>>> similar to what we did with the Job MG. >>>> >>> >>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>> However, I'm not convinced that this is >>>> required >>>> >>> anyway, >>>> >>> >>>>>>>>>> because >>>> >>> >>>>>>>>>>>> all the >>>> >>> >>>>>>>>>>>>>>>>> example metrics you listed can be >>>> implemented on >>>> >>> the TM >>>> >>> >>>>>>>> side >>>> >>> >>>>>>>>> + >>>> >>> >>>>>>>>>>>>>>>>> aggregating them in the external metrics >>>> backend. >>>> >>> >>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>> Since I'm on holidays soon, just so no one >>>> tries >>>> >>> to pull a >>>> >>> >>>>>>>>> fast >>>> >>> >>>>>>>>>>>> one on >>>> >>> >>>>>>>>>>>>>>>>> me, if this were to go to a vote as-is I'd be >>>> >>> against it. >>>> >>> >>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>> On 09/12/2022 15:30, Dong Lin wrote: >>>> >>> >>>>>>>>>>>>>>>>>> Hi Hang, >>>> >>> >>>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>>> Thanks for the FLIP! The FLIP looks good >>>> and it >>>> >>> is pretty >>>> >>> >>>>>>>>>>>> informative. >>>> >>> >>>>>>>>>>>>>>>>>> I have just two minor comments regarding >>>> names: >>>> >>> >>>>>>>>>>>>>>>>>> - Would it be useful to rename the config >>>> key as >>>> >>> >>>>>>>>>>>>>>>>>> *metrics.scope.jm.job.operator-coordinator* >>>> for >>>> >>> >>>>>>>> consistency >>>> >>> >>>>>>>>>>> with >>>> >>> >>>>>>>>>>>>>>>>>> *metrics.scope.jm.job >>>> >>> >>>>>>>>>>>>>>>>>> *(which is not named as *jm-job)? >>>> >>> >>>>>>>>>>>>>>>>>> - Maybe rename the variable as >>>> >>> >>>>>>>>>>> SCOPE_NAMING_OPERATOR_COORDINATOR >>>> >>> >>>>>>>>>>>> for >>>> >>> >>>>>>>>>>>>>>>>>> simplicity and consistency with >>>> >>> SCOPE_NAMING_OPERATOR >>>> >>> >>>>>>>>> (which >>>> >>> >>>>>>>>>> is >>>> >>> >>>>>>>>>>>> not named >>>> >>> >>>>>>>>>>>>>>>>>> as SCOPE_NAMING_TM_JOB_OPERATOR)? >>>> >>> >>>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>>> Cheers, >>>> >>> >>>>>>>>>>>>>>>>>> Dong >>>> >>> >>>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>>> On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan < >>>> >>> >>>>>>>>>>> ruanhang1...@gmail.com> >>>> >>> >>>>>>>>>>>> wrote: >>>> >>> >>>>>>>>>>>>>>>>>>> Hi all, >>>> >>> >>>>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>>>> MengYue and I created FLIP-274[1] Introduce >>>> >>> metric group >>>> >>> >>>>>>>>> for >>>> >>> >>>>>>>>>>>>>>>>>>> OperatorCoordinator. OperatorCoordinator >>>> is the >>>> >>> >>>>>>>>> coordinator >>>> >>> >>>>>>>>>>> for >>>> >>> >>>>>>>>>>>> runtime >>>> >>> >>>>>>>>>>>>>>>>>>> operators and running on Job Manager. The >>>> >>> coordination >>>> >>> >>>>>>>>>>>> mechanism is >>>> >>> >>>>>>>>>>>>>>>>>>> operator events between OperatorCoordinator >>>> >>> and its all >>>> >>> >>>>>>>>>>>> operators, the >>>> >>> >>>>>>>>>>>>>>>>>>> coordination is more and more using in >>>> Flink, for >>>> >>> >>>>>>>> example >>>> >>> >>>>>>>>>> many >>>> >>> >>>>>>>>>>>> Sources >>>> >>> >>>>>>>>>>>>>>>>> and >>>> >>> >>>>>>>>>>>>>>>>>>> Sinks depend on the mechanism to assign >>>> splits >>>> >> and >>>> >>> >>>>>>>>>> coordinate >>>> >>> >>>>>>>>>>>> commits to >>>> >>> >>>>>>>>>>>>>>>>>>> external systems. The OperatorCoordinator >>>> is >>>> >>> widely >>>> >>> >>>>>>>> using >>>> >>> >>>>>>>>> in >>>> >>> >>>>>>>>>>>> flink kafka >>>> >>> >>>>>>>>>>>>>>>>>>> connector, flink pulsar connector, flink >>>> cdc >>>> >>> connector, >>>> >>> >>>>>>>>>> flink >>>> >>> >>>>>>>>>>>> hudi >>>> >>> >>>>>>>>>>>>>>>>>>> connector and so on. >>>> >>> >>>>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>>>> But there is not a suitable metric group >>>> scope >>>> >>> for the >>>> >>> >>>>>>>>>>>>>>>>> OperatorCoordinator >>>> >>> >>>>>>>>>>>>>>>>>>> and not an implementation for the interface >>>> >>> >>>>>>>>>>>>>>>>> OperatorCoordinatorMetricGroup. >>>> >>> >>>>>>>>>>>>>>>>>>> These metrics in OperatorCoordinator could >>>> be >>>> >>> how many >>>> >>> >>>>>>>>>>>> splits/partitions >>>> >>> >>>>>>>>>>>>>>>>>>> have been assigned to source readers, how >>>> many >>>> >>> files >>>> >>> >>>>>>>> have >>>> >>> >>>>>>>>>> been >>>> >>> >>>>>>>>>>>> written >>>> >>> >>>>>>>>>>>>>>>>> out >>>> >>> >>>>>>>>>>>>>>>>>>> by sink writers, these metrics not only >>>> help >>>> >>> users to >>>> >>> >>>>>>>> know >>>> >>> >>>>>>>>>> the >>>> >>> >>>>>>>>>>>> job >>>> >>> >>>>>>>>>>>>>>>>> progress >>>> >>> >>>>>>>>>>>>>>>>>>> but also make big job maintaining easier. >>>> Thus we >>>> >>> >>>>>>>> propose >>>> >>> >>>>>>>>>> the >>>> >>> >>>>>>>>>>>> FLIP-274 >>>> >>> >>>>>>>>>>>>>>>>> to >>>> >>> >>>>>>>>>>>>>>>>>>> introduce a new metric group scope for >>>> >>> >>>>>>>> OperatorCoordinator >>>> >>> >>>>>>>>>> and >>>> >>> >>>>>>>>>>>> provide >>>> >>> >>>>>>>>>>>>>>>>> an >>>> >>> >>>>>>>>>>>>>>>>>>> internal implementation for >>>> >>> >>>>>>>>> OperatorCoordinatorMetricGroup. >>>> >>> >>>>>>>>>>>>>>>>>>> Could you help review this FLIP when you >>>> get >>>> >>> time? Any >>>> >>> >>>>>>>>>>> feedback >>>> >>> >>>>>>>>>>>> is >>>> >>> >>>>>>>>>>>>>>>>>>> appreciated! >>>> >>> >>>>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>>>> Best, >>>> >>> >>>>>>>>>>>>>>>>>>> Hang >>>> >>> >>>>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>>>> [1] >>>> >>> >>>>>>>>>>>>>>>>>>> >>>> >>> >>>>>>>>>>>>>>>>>>> >>>> >>> >> >>>> >>> >>>> >> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator >>>> >>> >> >>>> >>> >>>> >>>> >>> >>