Hi all,

I make a mistake about the default value of the  scope format
JobManagerOperatorScopeFormat. It should be
<host>.jobmanager.<job_name>.<operator_name>.
I have fixed it in the FLIP.
And the variables (task id and task name) have been added to the list.

Best,
Hang


Hang Ruan <ruanhang1...@gmail.com> 于2023年2月1日周三 14:11写道:

> Hi all,
>
> > IIUC, there is only an OperatorCoordinatorMetricGroup interface without
> implementations (it never worked). There were no coordinator metrics
> implemented and registered. @Hang Ruan please correct me if I'm wrong.
> Yes,  we shouldn't have to worry about backwards-compatibility as Chesnay
> said.
>
> After the discussion, I think the FLIP should be modified as follow.
> 1.  Add new metric group JobManagerOperatorMetricGroup, whose scope
> format is JobManagerOperatorScopeFormat. The config key of 
> JobManagerOperatorScopeFormat
> is metrics.scope.jm-operator, and the default value is
> <host>.jobmanager.<job_name>.<task_name>.<operator_name>.
> 2. JobManagerOperatorMetricGroup will be sub components of
> JobManagerJobMetricGroup.
> 3. InternalOperatorCoordinatorMetricGroup will extend ProxyMetricGroup,
> which registered under JobManagerOperatorMetricGroup by
> addGroup("coordinator").
>
> I have updated the FLIP-274. @Chesnay Schepler <ches...@apache.org> @Jark
> Wu <imj...@gmail.com> , please help to take a look at the modified FLIP
> and make sure that I do not miss  something.
>
> If there are no more problems, we plan to start the voting thread later.
>
> Best,
> Hang
>
> Chesnay Schepler <ches...@apache.org> 于2023年1月31日周二 20:09写道:
>
>> > IIUC, there is only an OperatorCoordinatorMetricGroup interface without
>> implementations (it never worked). There were no coordinator metrics
>> implemented and registered. @Hang Ruan <ruanhang1...@gmail.com> please
>> correct me if I'm wrong.
>>
>> Indeed. There are interfaces, and theoretically a metric group is exposed
>> to user-code via the SplitEnumeratorContext, but the metric group is always
>> null in the SourceCoordinatorContext. So we shouldn't have to worry about
>> backwards-compatibility.
>>
>> > However, the config key "metrics.scope.jm.job" is deprecated and is
>> replaced by "metrics.scope.jm-job". Similarly, "metrics.scope.tm.job" is
>> replaced with "metrics.scope.tm-job". That's why I though
>> "metrics.scope.jm-operator" might be more consistent.
>>
>> HA! Good catch. I only looked at the 1.16 docs where they weren't
>> deprecated yet.
>>
>> Then it should be "jm-operator" indeed.
>>
>> On 31/01/2023 10:51, Jark Wu wrote:
>>
>> Hi Chesnay,
>>
>> > The new option must default to the configured value of
>> metrics.scope.jm.job because otherwise it isn't backwards-compatible with
>> existing coordinator metrics.
>> IIUC, there is only an OperatorCoordinatorMetricGroup interface without
>> implementations (it never worked). There were no coordinator metrics
>> implemented and registered. @Hang Ruan <ruanhang1...@gmail.com> please
>> correct me if I'm wrong.
>>
>> > Intuitively I think the config key should be
>> metrics.scope.jm.operator., similar to metrics.scope.jm.job.
>> However, the config key "metrics.scope.jm.job" is deprecated and is
>> replaced by "metrics.scope.jm-job". Similarly, "metrics.scope.tm.job" is
>> replaced with "metrics.scope.tm-job". That's why I though
>> "metrics.scope.jm-operator" might be more consistent.
>>
>> > We would only support operator_name, operator_id, task_name and task_id
>> (+ everything else from the jm/job groups of course).
>> +1
>>
>> Best,
>> Jark
>>
>>
>>
>> On Mon, 30 Jan 2023 at 20:09, Chesnay Schepler <ches...@apache.org>
>> wrote:
>>
>>> The new option must default to the configured value of
>>> metrics.scope.jm.job because otherwise it isn't backwards-compatible with
>>> existing coordinator metrics.
>>> According to some earlier response in this thread there are already
>>> coordinator metrics.
>>>
>>> If that weren't the case you'd be correct, and we'd keep the default
>>> consistent with the default for tm operator metrics.
>>>
>>> Intuitively I think the config key should be metrics.scope.jm.operator.,
>>> similar to metrics.scope.jm.job.
>>> For symmetry we should also add new metrics.scope.tm.operator/task and
>>> eventually phase out metrics.scope.operator.
>>>
>>> > Regarding "coordinator metrics don't *belong to* any tasks/vertexes",
>>> I mean the coordinator doesn't run on specific tasks, so it doesn't have
>>> runtime information of tasks.
>>>
>>> That is true. I intentionally only referred to task id/name, as things
>>> like subtask index or attempt ids don't make sense for coordinators as you
>>> said.
>>> We would only support operator_name, operator_id, task_name and task_id
>>> (+ everything else from the jm/job groups of course).
>>> (Maybe in the future attempt_num could be interesting though)
>>>
>>> On 30/01/2023 10:39, Jark Wu wrote:
>>>
>>> Hi Chesnay,
>>>
>>> IIUC, our only disagreement is whether to support task info in the
>>> metric format. But we all agree with the default metric format
>>> "metrics.scope.jm-operator=<host>.jobmanager.<job_name>.<operator_name>"
>>> which doesn't include task info, right?
>>>
>>> Regarding "coordinator metrics don't *belong to* any tasks/vertexes", I
>>> mean the coordinator doesn't run on specific tasks, so it doesn't have
>>> runtime information of tasks. My concern about adding task info to the
>>> coordinator metric is that
>>> 1) if we register coordinator metrics for every subtask, the number of
>>> coordinator metrics will be exploded when task parallelism is large. This
>>> is an overhead to JM & metric systems, and the duplicated metrics are
>>> useless for users.
>>> 2) Correct me if I'm wrong, MetricGroup#counter only supports
>>> registering one metric. It seems it can't register a metric reporting for
>>> multiple metric names (e.g. multiple subtasks).
>>>
>>> Best,
>>> Jark
>>>
>>>
>>> On Mon, 30 Jan 2023 at 16:11, Chesnay Schepler <ches...@apache.org>
>>> wrote:
>>>
>>>> I hadn't looked in detail in to how the task info can be introduced,
>>>> but
>>>> given that the coordinates are created by the execution graph, where we
>>>> only work on tasks, it should be possible and rather simple.
>>>>
>>>> The OperatorCoordinatorHolder gets access to the ExecutionJobVertex
>>>> from
>>>> which it can extract everything and create the task/operator groups.
>>>> Exposure to the coordinator could happen via the coordinator context as
>>>> originally planned.
>>>>
>>>>  > As we know, coordinator metrics are just like JMJobMetricGroup,
>>>> which
>>>> don't belong to any tasks/vertexes.
>>>>
>>>> I don't think this is true, every coordinator is scoped to a specific
>>>> operator, and every operator is associated with a particular vertex.
>>>>
>>>>  From the javadocs:
>>>>
>>>> " A coordinator for runtime operators. The OperatorCoordinator runs on
>>>> the master, associated with
>>>>   the job vertex of the operator. It communicates with operators via
>>>> sending operator events."
>>>>
>>>> On 28/01/2023 09:15, Jark Wu wrote:
>>>> > Hi all,
>>>> >
>>>> > IIUC, Chesnay means we should have a more general metric group for
>>>> > **operator**
>>>> > not for **coordinator** in JM, this would be useful to extend
>>>> > other operator-specific
>>>> > metrics in the future. That means the new scope format should be
>>>> designed
>>>> > for
>>>> > the operator,
>>>> > e.g.,
>>>> metrics.scope.jm-operator=<host>.jobmanager.<job_name>.<operator_name>
>>>> > The coordinator metric is a subgroup (a constant "coordinator"
>>>> suffix) of
>>>> > the JMOperatorMG.
>>>> >
>>>> > I think this is a nice design. However, I have a question about
>>>> adding task
>>>> > id/name to this list.
>>>> > How to get the task id/name when reporting a coordinator metric? As we
>>>> > know, coordinator metrics
>>>> > are just like JMJobMetricGroup, which don't belong to any
>>>> tasks/vertexes.
>>>> > Do you mean reporting
>>>> > coordinator metrics for every task id under the operator?
>>>> >
>>>> > Best,
>>>> > Jark
>>>> >
>>>> >
>>>> > On Thu, 19 Jan 2023 at 17:33, Chesnay Schepler <ches...@apache.org>
>>>> wrote:
>>>> >
>>>> >>   > First, I do not understand why users have to configure the new
>>>> scope
>>>> >> format, which has a default value.
>>>> >>
>>>> >> If you don't use scope formats, sure. If you do use scope formats,
>>>> e.g.
>>>> >> to add a common prefix (which is the case for datadog users for
>>>> >> example), then the current default in the FLIP is insufficient and
>>>> >> requires the user to update the configuration.
>>>> >>
>>>> >>   > I usually do not need to change these configuration of scope
>>>> formats
>>>> >> when submitting the flink job.
>>>> >>
>>>> >> Scope formats as a whole are quite a power-user feature, but that
>>>> >> doesn't mean we should ignore it.
>>>> >>
>>>> >>   > I try to let it extend the ComponentMetricGroup
>>>> >>
>>>> >> This isn't inherently required just because it is a "component".
>>>> >> Component metric groups should _only_ be used for cases where such a
>>>> >> component provides several bits of metadata.
>>>> >> For example, tasks provide vertex ids, task names, attempted IDs
>>>> etc.,
>>>> >> and we'd like users to have the option on how this metadata ends up
>>>> >> being used (via scope formats). This currently can't be built with
>>>> the
>>>> >> addGroup() methods, hence why the component groups exist.
>>>> >> However, the operator coordinator _itself_does not provide several
>>>> bits
>>>> >> of metadata. Logically, the _operator_ does, not the coordinator.
>>>> >>
>>>> >>   > This make me decide to add a new scope format.
>>>> >>
>>>> >> And this is fine; just don't add a new scope format for the
>>>> coordinator
>>>> >> but logically for operators on the JM side, such that we can extend
>>>> the
>>>> >> set of operator-specific metrics in the future without having to add
>>>> yet
>>>> >> another scope format.
>>>> >>
>>>> >>   > The relationship between the Operator and OperatorCoordinator is
>>>> >> maintained in the OperatorCoordinatorHolder. In the POC, the operator
>>>> >> name/id could be found and the OperatorCoordinatorMetricGroup will be
>>>> >> created here.
>>>> >>
>>>> >> That's all irrelevant.
>>>> >>
>>>> >>   > The registered metrics of OperatorCoordinator always are those
>>>> which
>>>> >> can not be aggregated from the tasks, like the number of unassigned
>>>> >> splits in the SourceCoordinator. Actually we have not encountered the
>>>> >> scenario you mentioned. I think the OperatorCoordinatorMetricGroup
>>>> >> should only contain the metrics for itself instead of its subtasks.
>>>> >>
>>>> >> You are misunderstanding the problem.
>>>> >>
>>>> >> This isn't about aggregating metrics on our side or exposing metrics
>>>> >> from TMs on the JM side or anything like that.
>>>> >>
>>>> >> It's purely about the available metadata for operator metrics on the
>>>> JM
>>>> >> side. Currently you suggest operator name / id; what I'm proposing
>>>> is to
>>>> >> add task(==vertex!) id / name to this list.
>>>> >>
>>>> >> The benefit here is simple. I can look up all metrics where
>>>> task_id==XXX
>>>> >> and get _everything_ related to that vertex, including all metrics
>>>> >> associated with operators that are part of that vertex, including the
>>>> >> coordinator.
>>>> >>
>>>> >>   > Using metrics.scope.jm.job is not enough to distinguish the
>>>> different
>>>> >> coordinators.
>>>> >>
>>>> >> I did not suggest just using metrics.scope.jm.job. I suggested that
>>>> as
>>>> >> the default for the jm operator scope format, that is used for the
>>>> >> proposed JobManagerOperatorMetricGroup, which will have some plain
>>>> >> metric groups as children for coordinators.
>>>> >> Aka, you have the JMOperatorMG, then you call
>>>> addGroup("coordinators")
>>>> >> and then addGroup(coordinator_name) for each coordinator.
>>>> >>
>>>> >>   > So there are two choice for the
>>>> >> InternalOperatorCoordinatorMetricGroup: 1. add and improve the new
>>>> scope
>>>> >> format; 2. use the metrics.scope.jm.job and ProxyMetricGroup. Which
>>>> one
>>>> >> is better?
>>>> >>
>>>> >> There are more options than that.
>>>> >> I feel like there are a lot of misunderstandings in this discussion.
>>>> >> Please let me know if I could clear things up. If not I can also
>>>> provide
>>>> >> a PoC based on your PoC if that can speed things up. It's not that
>>>> >> different anyway.
>>>> >>
>>>> >> On 19/01/2023 07:39, Hang Ruan wrote:
>>>> >>> Hi, chesnay,
>>>> >>>
>>>> >>> Thanks for your reply. I still have some doubts about the questions
>>>> >>> you raised.
>>>> >>>
>>>> >>>      > Extending the set of ScopeFormats is problematic because it
>>>> in
>>>> >>>      practice
>>>> >>>      it breaks the config if users actively rely on it, since
>>>> there's now
>>>> >>>      another key that they _must_ set for it to be
>>>> >>>      consistent/compatible with
>>>> >>>      their existing setup.
>>>> >>>      Unfortunately due to how powerful scope formats are we can't
>>>> derive a
>>>> >>>      default value that matches their existing setup.
>>>> >>>      Hence we should try to do this as rarely as possible.
>>>> >>>
>>>> >>>
>>>> >>>      > This FLIP does not adhere to that since it proposes a
>>>> dedicated
>>>> >>>      format
>>>> >>>      for coordinators; next time we want to expose operator-specific
>>>> >>>      metrics
>>>> >>>      (e.g., in the scheduler) we'd have to add another one to
>>>> support it.
>>>> >>>
>>>> >>>
>>>> >>> First, I do not understand why users have to configure the new scope
>>>> >>> format, which has a default value. I usually do not need to change
>>>> >>> these configuration of scope formats when submitting the flink job.
>>>> >>> IMO, the OperatorCoordinator is a component running at the
>>>> JobMaster.
>>>> >>> I try to let it extend the ComponentMetricGroup, but I cannot find
>>>> >>> a suitable scope. This make me decide to add a new scope format.
>>>> >>>
>>>> >>>      > Additionally, the configurable variables (operator name/id)
>>>> are
>>>> >>>      logically not attached to the coordinator, but operators, so
>>>> to me it
>>>> >>>      just doesn't make sense to structure it like this.
>>>> >>>
>>>> >>>
>>>> >>> The relationship between the Operator and OperatorCoordinator is
>>>> >>> maintained in the OperatorCoordinatorHolder. In the POC, the
>>>> operator
>>>> >>> name/id could be found and the OperatorCoordinatorMetricGroup will
>>>> be
>>>> >>> created here.
>>>> >>>
>>>> >>>      > Another thing I'm concerned about is that, because we don't
>>>> include
>>>> >>>      tasks in the hierarchy, users wishing to collect all metrics
>>>> for a
>>>> >>>      particular task (in this case ==vertex) now have to go
>>>> >>>      significantly out
>>>> >>>      of their way to get them, since they can no longer just filter
>>>> by the
>>>> >>>      task ID but have to be filter for _all_ operators that are
>>>> part of
>>>> >>>      the task.
>>>> >>>
>>>> >>>
>>>> >>> The registered metrics of OperatorCoordinator always are those which
>>>> >>> can not be aggregated from the tasks, like the number of unassigned
>>>> >>> splits in the SourceCoordinator. Actually we have not encountered
>>>> the
>>>> >>> scenario you mentioned. I think the OperatorCoordinatorMetricGroup
>>>> >>> should only contain the metrics for itself instead of its subtasks.
>>>> >>>
>>>> >>> Using metrics.scope.jm.job is not enough to distinguish the
>>>> different
>>>> >>> coordinators. The operator id/name is necessary. Then
>>>> >>> the implementation should be like this. But users can not change the
>>>> >>> scope in this way. This is also acceptable.
>>>> >>> @Internal public class InternalOperatorCoordinatorMetricGroup
>>>> >>>           extends ProxyMetricGroup<MetricGroup>
>>>> >>>           implements OperatorCoordinatorMetricGroup {
>>>> >>>       public InternalOperatorCoordinatorMetricGroup(
>>>> >>>               JobManagerJobMetricGroup parent, OperatorID
>>>> operatorID,
>>>> >> String operatorName) {
>>>> >>>           super(parent.addGroup(operatorID +
>>>> >> operatorName).addGroup("coordinator")); }
>>>> >>> }
>>>> >>> So there are two choice for the
>>>> >>> InternalOperatorCoordinatorMetricGroup: 1. add and improve the new
>>>> >>> scope format; 2. use the metrics.scope.jm.job and ProxyMetricGroup.
>>>> >>> Which one is better?
>>>> >>>
>>>> >>> Best,
>>>> >>> Hang
>>>> >>>
>>>> >>>
>>>> >>> Chesnay Schepler <ches...@apache.org> 于2023年1月18日周三 17:03写道:
>>>> >>>
>>>> >>>      You're misunderstanding the problem.
>>>> >>>
>>>> >>>      Metric groups form a tree with each group providing certain
>>>> metadata.
>>>> >>>
>>>> >>>      E.g., on the taskmanager we have a TM metric group that
>>>> provides info
>>>> >>>      about the TM, that has child task metric groups, that have
>>>> child
>>>> >>>      operator metric groups etc. The operator metric group again has
>>>> >>>      children, where we are now mostly in the user/connector land,
>>>> like a
>>>> >>>      kafka metric group providing partition info or something.
>>>> >>>
>>>> >>>      What is being proposed is to create a coordinator metric group
>>>> on
>>>> >>>      the JM
>>>> >>>      side that provides operator metadata.
>>>> >>>
>>>> >>>      A more appropriate structure is to create an operator group on
>>>> the JM
>>>> >>>      side that provides this info, with the coordinator metric group
>>>> >>>      being a
>>>> >>>      child.
>>>> >>>
>>>> >>>      On 17/01/2023 19:56, Steven Wu wrote:
>>>> >>>      >> Additionally, the configurable variables (operator name/id)
>>>> are
>>>> >>>      > logically not attached to the coordinator, but operators, so
>>>> to
>>>> >>>      me it
>>>> >>>      > just doesn't make sense to structure it like this.
>>>> >>>      >
>>>> >>>      > Chesnay, maybe we should clarify the terminology. To me,
>>>> >>>      pperators (like
>>>> >>>      > FLIP-27 source) can have two parts (coordinator and
>>>> >>>      reader/subtask). I
>>>> >>>      > think it is fine to include operator name/id for coordinator
>>>> >>>      metrics.
>>>> >>>      >
>>>> >>>      > On Mon, Jan 16, 2023 at 2:13 AM Chesnay Schepler
>>>> >>>      <ches...@apache.org> wrote:
>>>> >>>      >
>>>> >>>      >> Slight correction: Using metrics.scope.jm.job as the default
>>>> >>>      should be
>>>> >>>      >> safe.
>>>> >>>      >>
>>>> >>>      >> On 16/01/2023 10:18, Chesnay Schepler wrote:
>>>> >>>      >>> The proposed ScopeFormat is still problematic for a few
>>>> reasons.
>>>> >>>      >>>
>>>> >>>      >>> Extending the set of ScopeFormats is problematic because
>>>> it in
>>>> >>>      >>> practice it breaks the config if users actively rely on
>>>> it, since
>>>> >>>      >>> there's now another key that they _must_ set for it to be
>>>> >>>      >>> consistent/compatible with their existing setup.
>>>> >>>      >>> Unfortunately due to how powerful scope formats are we
>>>> can't
>>>> >>>      derive a
>>>> >>>      >>> default value that matches their existing setup.
>>>> >>>      >>> Hence we should try to do this as rarely as possible.
>>>> >>>      >>>
>>>> >>>      >>> This FLIP does not adhere to that since it proposes a
>>>> >>>      dedicated format
>>>> >>>      >>> for coordinators; next time we want to expose
>>>> operator-specific
>>>> >>>      >>> metrics (e.g., in the scheduler) we'd have to add another
>>>> one to
>>>> >>>      >>> support it.
>>>> >>>      >>>
>>>> >>>      >>> Additionally, the configurable variables (operator
>>>> name/id) are
>>>> >>>      >>> logically not attached to the coordinator, but operators,
>>>> so
>>>> >>>      to me it
>>>> >>>      >>> just doesn't make sense to structure it like this.
>>>> >>>      >>>
>>>> >>>      >>> Another thing I'm concerned about is that, because we don't
>>>> >>>      include
>>>> >>>      >>> tasks in the hierarchy, users wishing to collect all
>>>> metrics for
>>>> >> a
>>>> >>>      >>> particular task (in this case ==vertex) now have to go
>>>> >>>      significantly
>>>> >>>      >>> out of their way to get them, since they can no longer just
>>>> >>>      filter by
>>>> >>>      >>> the task ID but have to be filter for _all_ operators that
>>>> are
>>>> >>>      part of
>>>> >>>      >>> the task.
>>>> >>>      >>>
>>>> >>>      >>> On 16/01/2023 03:09, Hang Ruan wrote:
>>>> >>>      >>>> Hi, @ches...@apache.org <ches...@apache.org> ,
>>>> >>>      >>>>
>>>> >>>      >>>> Do you have time to help to review this FLIP again after
>>>> the
>>>> >>>      >>>> modification?
>>>> >>>      >>>> Looking forward to your reply.
>>>> >>>      >>>> This FLIP will add a new configuration for the
>>>> >>>      >>>> OperatorCoordinatorMetricGroup scope format. It provides
>>>> an
>>>> >>>      internal
>>>> >>>      >>>> implementation and is added as a component to the
>>>> >>>      >>>> JobManagerJobMetricGroup.
>>>> >>>      >>>> If something doesn't make sense, could you provide some
>>>> >>>      advice? It
>>>> >>>      >>>> will be
>>>> >>>      >>>> very helpful. Thanks a lot for your help.
>>>> >>>      >>>>
>>>> >>>      >>>> Best,
>>>> >>>      >>>> Hang
>>>> >>>      >>>>
>>>> >>>      >>>> Martijn Visser <martijnvis...@apache.org> 于2023年1月11日周三
>>>> >>>      16:34写道:
>>>> >>>      >>>>
>>>> >>>      >>>>
>>>> >>>      >>>>> Hi Hang,
>>>> >>>      >>>>>
>>>> >>>      >>>>> I'm a bit surprised that this has gone to a vote, given
>>>> that
>>>> >>>      Chesnay
>>>> >>>      >>>>> deliberately mentioned that he would vote against it
>>>> as-is.
>>>> >>>      I would
>>>> >>>      >>>>> expect
>>>> >>>      >>>>> that before going to a vote, he has had the opportunity
>>>> to
>>>> >>>      >>>>> participate in
>>>> >>>      >>>>> this discussion.
>>>> >>>      >>>>>
>>>> >>>      >>>>> Best regards,
>>>> >>>      >>>>>
>>>> >>>      >>>>> Martijn
>>>> >>>      >>>>>
>>>> >>>      >>>>> On Tue, Jan 3, 2023 at 12:53 PM Jark Wu <
>>>> imj...@gmail.com>
>>>> >>>      wrote:
>>>> >>>      >>>>>
>>>> >>>      >>>>>> Hi Dong,
>>>> >>>      >>>>>>
>>>> >>>      >>>>>> Regarding “SplitEnumeratorContext#metricGroup”, my only
>>>> >>>      concern is
>>>> >>>      >>>>>> that
>>>> >>>      >>>>>> this is a core interface for the FLIP. It’s hard to
>>>> tell how
>>>> >>>      >>>>>> sources use
>>>> >>>      >>>>>> metric group without mentioning this interface. Even if
>>>> >>>      this is an
>>>> >>>      >>>>> existing
>>>> >>>      >>>>>> API, I think it’s worth introducing the interface again
>>>> and
>>>> >>>      declaring
>>>> >>>      >>>>> that
>>>> >>>      >>>>>> we will implement the interface instead of a no-op
>>>> method
>>>> >>>      in this
>>>> >>>      >>>>>> FLIP.
>>>> >>>      >>>>>>
>>>> >>>      >>>>>> Anyway, this is a minor problem and shouldn’t block this
>>>> >>>      FLIP. I’m
>>>> >>>      >>>>>> +1 to
>>>> >>>      >>>>>> start a vote.
>>>> >>>      >>>>>>
>>>> >>>      >>>>>> Best,
>>>> >>>      >>>>>> Jark
>>>> >>>      >>>>>>
>>>> >>>      >>>>>>
>>>> >>>      >>>>>>> 2023年1月3日 10:03,Hang Ruan <ruanhang1...@gmail.com> 写道:
>>>> >>>      >>>>>>>
>>>> >>>      >>>>>>> Hi, Jark and Dong,
>>>> >>>      >>>>>>>
>>>> >>>      >>>>>>> Thanks for your comments. Sorry for my late reply.
>>>> >>>      >>>>>>>
>>>> >>>      >>>>>>> For suggestion 1, I plan to implement the
>>>> >>>      >>>>>>> SplitEnumeratorMetricGroup in
>>>> >>>      >>>>>>> another issue, and it is not contained in this FLIP. I
>>>> >>>      will add some
>>>> >>>      >>>>>>> description about this part.
>>>> >>>      >>>>>>> For suggestion 2, changes about
>>>> >>>      OperatorCoordinator#metricGroup has
>>>> >>>      >>>>>> already
>>>> >>>      >>>>>>> been documented in the proposed change section.
>>>> >>>      >>>>>>>
>>>> >>>      >>>>>>> Best,
>>>> >>>      >>>>>>> Hang
>>>> >>>      >>>>>>>
>>>> >>>      >>>>>>> Dong Lin <lindon...@gmail.com> 于2023年1月1日周日 09:45写道:
>>>> >>>      >>>>>>>
>>>> >>>      >>>>>>>> Let me chime-in and add comments regarding the public
>>>> >>>      interface
>>>> >>>      >>>>> section.
>>>> >>>      >>>>>>>> Please see my comments inline.
>>>> >>>      >>>>>>>>
>>>> >>>      >>>>>>>> On Thu, Dec 29, 2022 at 6:08 PM Jark Wu
>>>> >>>      <imj...@gmail.com> wrote:
>>>> >>>      >>>>>>>>
>>>> >>>      >>>>>>>>> Hi Hang,
>>>> >>>      >>>>>>>>>
>>>> >>>      >>>>>>>>> Thanks for driving this discussion. I think this is a
>>>> >>>      very useful
>>>> >>>      >>>>>> feature
>>>> >>>      >>>>>>>>> for connectors.
>>>> >>>      >>>>>>>>>
>>>> >>>      >>>>>>>>> The FLIP looks quite good to me, and I just have two
>>>> >>>      suggestions.
>>>> >>>      >>>>>>>>>
>>>> >>>      >>>>>>>>> 1. In the "Public Interface" section, mention that
>>>> the
>>>> >>>      >>>>>>>>> implementation
>>>> >>>      >>>>>>>>> behavior of "SplitEnumeratorContext#metricGroup" is
>>>> >>>      changed from
>>>> >>>      >>>>>>>> returning
>>>> >>>      >>>>>>>>> null to returning a concrete
>>>> SplitEnumeratorMetricGroup
>>>> >>>      instance.
>>>> >>>      >>>>> Even
>>>> >>>      >>>>>>>>> though the API is already there, the behavior change
>>>> can
>>>> >>>      also be
>>>> >>>      >>>>>>>> considered
>>>> >>>      >>>>>>>>> a public change.
>>>> >>>      >>>>>>>>>
>>>> >>>      >>>>>>>> SplitEnumeratorContext#metricGroup is an interface and
>>>> >>>      this FLIP
>>>> >>>      >>>>>>>> does
>>>> >>>      >>>>>> not
>>>> >>>      >>>>>>>> seem to change its semantics/behavior. The FLIP does
>>>> >>>      change the
>>>> >>>      >>>>>>>> implementation/behavior of
>>>> >>>      SourceCoordinatorContext#metricGroup,
>>>> >>>      >>>>>>>> which
>>>> >>>      >>>>>> is
>>>> >>>      >>>>>>>> marked @Internal.
>>>> >>>      >>>>>>>>
>>>> >>>      >>>>>>>> Thus it might seem a bit weird to add in the public
>>>> >> interface
>>>> >>>      >>>>>>>> section
>>>> >>>      >>>>>>>> saying that we change the interface
>>>> >>>      >>>>>>>> SplitEnumeratorContext#metricGroup
>>>> >>>      >>>>>> from
>>>> >>>      >>>>>>>> returning null to non-null object.
>>>> >>>      >>>>>>>>
>>>> >>>      >>>>>>>> 2. Mention the newly added interface of
>>>> >>>      >>>>>> "OperatorCoordinator#metricGroup"
>>>> >>>      >>>>>>>>> in the "Proposed Changes" section or "Public
>>>> Interface"
>>>> >>>      section. As
>>>> >>>      >>>>> the
>>>> >>>      >>>>>>>>> FLIP said, OperatorCoordinator is widely used in many
>>>> >>>      connectors.
>>>> >>>      >>>>>> Though
>>>> >>>      >>>>>>>> it
>>>> >>>      >>>>>>>>> is still an @Internal API, I think it is worth
>>>> >>>      mentioning the
>>>> >>>      >>>>>>>>> change
>>>> >>>      >>>>> in
>>>> >>>      >>>>>>>> the
>>>> >>>      >>>>>>>>> FLIP.
>>>> >>>      >>>>>>>> Since OperatorCoordinator is an internal API, it seems
>>>> >>>      reasonable to
>>>> >>>      >>>>>>>> explain it in the proposed change section. The FLIP
>>>> seems
>>>> >>>      to have
>>>> >>>      >>>>>>>> documented this in point 5 of the proposed change
>>>> section.
>>>> >>>      >>>>>>>>
>>>> >>>      >>>>>>>> BTW, if we think there are @internal classes that are
>>>> >>>      important
>>>> >>>      >>>>>>>> enough
>>>> >>>      >>>>>> to
>>>> >>>      >>>>>>>> be added in the public interface section, it might be
>>>> >>>      useful to
>>>> >>>      >>>>>> explicitly
>>>> >>>      >>>>>>>> discuss this topic and document it in the "*What are
>>>> the
>>>> >>>      "public
>>>> >>>      >>>>>>>> interfaces" of the project*" in this
>>>> >>>      >>>>>>>> <
>>>> >>>      >>>>>>>>
>>>> >>>      >>
>>>> >>>
>>>> >>
>>>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>>>> >>>      >>>>>>>> wiki.
>>>> >>>      >>>>>>>>
>>>> >>>      >>>>>>>>
>>>> >>>      >>>>>>>>> Best,
>>>> >>>      >>>>>>>>> Jark
>>>> >>>      >>>>>>>>>
>>>> >>>      >>>>>>>>>
>>>> >>>      >>>>>>>>> On Mon, 26 Dec 2022 at 18:06, Hang Ruan
>>>> >>>      <ruanhang1...@gmail.com>
>>>> >>>      >>>>>> wrote:
>>>> >>>      >>>>>>>>>> Hi, thanks for the feedback, Zhu Zhu and Qingsheng.
>>>> >>>      >>>>>>>>>> After combining everyone's comments, the main
>>>> concerns and
>>>> >>>      >>>>>>>> corresponding
>>>> >>>      >>>>>>>>>> adjustments are as follows.
>>>> >>>      >>>>>>>>>>
>>>> >>>      >>>>>>>>>> Q1: Common metrics are not quite useful.
>>>> >>>      >>>>>>>>>> numEventsIn and numEventsOut counters will be
>>>> removed
>>>> >>>      from the
>>>> >>>      >>>>>>>>>> OperatorCoordinatorMetricGroup. These common
>>>> metrics do
>>>> >> not
>>>> >>>      >>>>>>>>>> provide
>>>> >>>      >>>>>>>>> enough
>>>> >>>      >>>>>>>>>> information for users. The users are more willing to
>>>> >>>      get the
>>>> >>>      >>>>>>>>>> number
>>>> >>>      >>>>> of
>>>> >>>      >>>>>>>>>> events of the specified type instead of the total
>>>> >>>      number. And this
>>>> >>>      >>>>>>>> metric
>>>> >>>      >>>>>>>>>> is calculated differently. The implementation could
>>>> >>>      register the
>>>> >>>      >>>>>> metric
>>>> >>>      >>>>>>>>> by
>>>> >>>      >>>>>>>>>> themselves.
>>>> >>>      >>>>>>>>>>
>>>> >>>      >>>>>>>>>> Q2: This FLIP is overly complicated.
>>>> >>>      >>>>>>>>>> This FLIP will become concise after these
>>>> modifications.
>>>> >>>      >>>>>>>>>> OperatorCoordinatorMetricGroup has already been
>>>> >>>      introduced into
>>>> >>>      >>>>> Flink
>>>> >>>      >>>>>>>> by
>>>> >>>      >>>>>>>>>> FLIP-179<
>>>> >>>      >>>>>>>>>>
>>>> >>>      >>>>>>>>>>
>>>> >>>      >>
>>>> >>>
>>>> >>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
>>>> >>>      >>>>>>>>>>> .
>>>> >>>      >>>>>>>>>> And
>>>> >>>      >>>>>>>>>> this FLIP will not change it. This FLIP only
>>>> provides a
>>>> >>>      new metric
>>>> >>>      >>>>>>>> option
>>>> >>>      >>>>>>>>>> and a new metric group scope. The changes in
>>>> proposed
>>>> >>>      changes
>>>> >>>      >>>>> provide
>>>> >>>      >>>>>>>> the
>>>> >>>      >>>>>>>>>> details about the modifications for the internal
>>>> >>>      classes, which
>>>> >>>      >>>>> might
>>>> >>>      >>>>>>>>> make
>>>> >>>      >>>>>>>>>> it look complicated.
>>>> >>>      >>>>>>>>>>
>>>> >>>      >>>>>>>>>> Thanks for all the comments again. If there are no
>>>> further
>>>> >>>      >>>>>>>>>> comments,
>>>> >>>      >>>>>> we
>>>> >>>      >>>>>>>>>> plan to start the voting thread this week.
>>>> >>>      >>>>>>>>>>
>>>> >>>      >>>>>>>>>> Best,
>>>> >>>      >>>>>>>>>> Hang
>>>> >>>      >>>>>>>>>>
>>>> >>>      >>>>>>>>>> Qingsheng Ren <renqs...@gmail.com> 于2022年12月26日周一
>>>> >>>      16:48写道:
>>>> >>>      >>>>>>>>>>
>>>> >>>      >>>>>>>>>>
>>>> >>>      >>>>>>>>>>> Thanks for the FLIP, Hang!
>>>> >>>      >>>>>>>>>>>
>>>> >>>      >>>>>>>>>>> This FLIP overall looks good to me. Actually I
>>>> share
>>>> >>>      the same
>>>> >>>      >>>>> concern
>>>> >>>      >>>>>>>>>> with
>>>> >>>      >>>>>>>>>>> Zhu that numEventsIn and numEventsOut counters are
>>>> not
>>>> >>>      quite
>>>> >>>      >>>>>>>>>>> useful
>>>> >>>      >>>>>>>> to
>>>> >>>      >>>>>>>>>> end
>>>> >>>      >>>>>>>>>>> users. OperatorEvent is a quite low-level
>>>> abstraction,
>>>> >>>      which
>>>> >>>      >>>>> requires
>>>> >>>      >>>>>>>>>>> instantialization in order to be practical to
>>>> users and
>>>> >>>      >>>>>>>>>>> developers,
>>>> >>>      >>>>>>>> so
>>>> >>>      >>>>>>>>>>> maybe it's better to exclude them from the FLIP.
>>>> >>>      >>>>>>>>>>>
>>>> >>>      >>>>>>>>>>> Best,
>>>> >>>      >>>>>>>>>>> Qingsheng
>>>> >>>      >>>>>>>>>>> Ververica (Alibaba)
>>>> >>>      >>>>>>>>>>>
>>>> >>>      >>>>>>>>>>> On Mon, Dec 26, 2022 at 12:08 PM Zhu Zhu
>>>> >>>      <reed...@gmail.com>
>>>> >>>      >>>>> wrote:
>>>> >>>      >>>>>>>>>>>> Hi Hang,
>>>> >>>      >>>>>>>>>>>> I still see no strong reason why we need
>>>> >>>      >>>>>>>>>>>> numEventsIn/numEventsOut
>>>> >>>      >>>>>>>>>>> metrics.
>>>> >>>      >>>>>>>>>>>> In the discussion in FLINK-29801, I can see the
>>>> same
>>>> >>>      concern
>>>> >>>      >>>>>>>>>>>> from
>>>> >>>      >>>>>>>>>> others.
>>>> >>>      >>>>>>>>>>>> So I prefer to exclude them from this FLIP to
>>>> avoid
>>>> >>>      >>>>>>>>>>>> over-extending
>>>> >>>      >>>>>>>>> the
>>>> >>>      >>>>>>>>>>>> scope.
>>>> >>>      >>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>> Thanks,
>>>> >>>      >>>>>>>>>>>> Zhu
>>>> >>>      >>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>> Hang Ruan <ruanhang1...@gmail.com> 于2022年12月23日周五
>>>> >>>      >>>>>>>>>>>> 15:21写道:
>>>> >>>      >>>>>>>>>>>>> Hi everyone,
>>>> >>>      >>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>> As for the Zhu Zhu's problem, I think we should
>>>> keep
>>>> >>>      the common
>>>> >>>      >>>>>>>>>>> metrics,
>>>> >>>      >>>>>>>>>>>> which will help to observe incoming and outgoing
>>>> >>>      events. What do
>>>> >>>      >>>>>>>> you
>>>> >>>      >>>>>>>>>>> think,
>>>> >>>      >>>>>>>>>>>> @Zhu Zhu ?
>>>> >>>      >>>>>>>>>>>>> And @Chesnay, are there any other issues you are
>>>> >>>      more concerned
>>>> >>>      >>>>>>>>>> about?
>>>> >>>      >>>>>>>>>>>> Looking forward to your reply.
>>>> >>>      >>>>>>>>>>>>> Thanks for all the comments. If there are no
>>>> further
>>>> >>>      >>>>>>>>>>>>> comments, we
>>>> >>>      >>>>>>>>>> plan
>>>> >>>      >>>>>>>>>>>> to start the voting thread next week.
>>>> >>>      >>>>>>>>>>>>> Best,
>>>> >>>      >>>>>>>>>>>>> Hang
>>>> >>>      >>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>> Hang Ruan <ruanhang1...@gmail.com>
>>>> 于2022年12月15日周四
>>>> >>>      >>>>>>>>>>>>> 16:49写道:
>>>> >>>      >>>>>>>>>>>>>> Hi, Zhu Zhu,
>>>> >>>      >>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>> Thanks for your feedback!
>>>> >>>      >>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>> The OperatorCoordinator implementations are
>>>> >>>      different. And
>>>> >>>      >>>>>>>>>>>>>> their
>>>> >>>      >>>>>>>>>>>> metrics are much different too. We try to find the
>>>> >> common
>>>> >>>      >>>>>>>>>>>> metrics
>>>> >>>      >>>>>>>> and
>>>> >>>      >>>>>>>>>> put
>>>> >>>      >>>>>>>>>>>> them in the OperatorCoordinatorMetricGroup. If
>>>> most
>>>> >>>      developers
>>>> >>>      >>>>>>>> think
>>>> >>>      >>>>>>>>> we
>>>> >>>      >>>>>>>>>>> do
>>>> >>>      >>>>>>>>>>>> not need these common metrics, removing the common
>>>> >>>      metrics is
>>>> >>>      >>>>>>>>>>>> also
>>>> >>>      >>>>>>>>>>>> acceptable.
>>>> >>>      >>>>>>>>>>>>>> Best,
>>>> >>>      >>>>>>>>>>>>>> Hang
>>>> >>>      >>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>> Zhu Zhu <reed...@gmail.com> 于2022年12月14日周三
>>>> >>>      22:09写道:
>>>> >>>      >>>>>>>>>>>>>>> Hi Hang & MengYue,
>>>> >>>      >>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>> Thanks for creating this FLIP!
>>>> >>>      >>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>> I think it is very useful, mainly in two
>>>> aspects:
>>>> >>>      >>>>>>>>>>>>>>> 1. Enables OperatorCoordinators to register
>>>> metrics.
>>>> >>>      >>>>>>>>>>>>>>> Currently
>>>> >>>      >>>>>>>>>>>>>>> the coordinators has no way to do this. And
>>>> operator
>>>> >>>      >>>>>>>> coordinator
>>>> >>>      >>>>>>>>>>>>>>> metric group further enables the
>>>> SplitEnumerator
>>>> >>>      to have
>>>> >>>      >>>>>>>>>>>>>>> access
>>>> >>>      >>>>>>>>>>>>>>> to a registered metric group (via the existing
>>>> public
>>>> >>>      >>>>>>>>>>>>>>> interface
>>>> >>>      >>>>>>>>>>>>>>> SplitEnumeratorContext#metricGroup()), which is
>>>> >>>      null at the
>>>> >>>      >>>>>>>>> moment.
>>>> >>>      >>>>>>>>>>>>>>> 2. Defines the scope of operator coordinator
>>>> >>>      metrics. A clear
>>>> >>>      >>>>>>>>>>>> definition
>>>> >>>      >>>>>>>>>>>>>>> makes it easy for users to find their wanted
>>>> >>>      metrics. The
>>>> >>>      >>>>>>>>>> definition
>>>> >>>      >>>>>>>>>>>>>>> also helps to avoid conflicts of metrics from
>>>> >> multiple
>>>> >>>      >>>>>>>>>>>> OperatorCoordinators
>>>> >>>      >>>>>>>>>>>>>>> of the same kind. E.g. each SourceCoordinator
>>>> may
>>>> >>>      have its
>>>> >>>      >>>>>>>>>>>>>>> own
>>>> >>>      >>>>>>>>>>>>>>> numSourceSplits metric, these metrics should
>>>> not
>>>> >>>      be directly
>>>> >>>      >>>>>>>>>>> registered
>>>> >>>      >>>>>>>>>>>>>>> to the job metric group.
>>>> >>>      >>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>> What I'm a bit concerned is the necessity of
>>>> the
>>>> >>>      introduced
>>>> >>>      >>>>>>>>> common
>>>> >>>      >>>>>>>>>>>> metrics
>>>> >>>      >>>>>>>>>>>>>>> numEventsInCounter & numEventsOutCounter. If
>>>> there
>>>> >>>      any case
>>>> >>>      >>>>>>>> which
>>>> >>>      >>>>>>>>>>>> strongly
>>>> >>>      >>>>>>>>>>>>>>> requires them?
>>>> >>>      >>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>> Regarding the concerns of Chesnay,
>>>> >>>      >>>>>>>>>>>>>>>> A dedicated coordinator MG implementation is
>>>> >> overkill
>>>> >>>      >>>>>>>>>>>>>>> Directly using the job metric group can result
>>>> in
>>>> >>>      metric
>>>> >>>      >>>>>>>>> conflicts,
>>>> >>>      >>>>>>>>>>> as
>>>> >>>      >>>>>>>>>>>> mentioned
>>>> >>>      >>>>>>>>>>>>>>> in above #2.
>>>> >>>      >>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>> Thanks,
>>>> >>>      >>>>>>>>>>>>>>> Zhu
>>>> >>>      >>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>> Dong Lin <lindon...@gmail.com> 于2022年12月10日周六
>>>> >>>      >>>>>>>>>>>>>>> 14:16写道:
>>>> >>>      >>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>> Hi Chesney,
>>>> >>>      >>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>> Just to double check with you,
>>>> >>>      >>>>>>>>>>>>>>>> OperatorCoordinatorMetricGroup
>>>> >>>      >>>>>>>>>>>> (annotated as
>>>> >>>      >>>>>>>>>>>>>>>> @PublicEvolving) has already been introduced
>>>> into
>>>> >>>      Flink by
>>>> >>>      >>>>>>>>>> FLIP-179
>>>> >>>      >>>>>>>>>>>>>>>> <
>>>> >>>      >>
>>>> >>>
>>>> >>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
>>>> >>>      >>>>>>>>>>>>> .
>>>> >>>      >>>>>>>>>>>>>>>> And that FLIP has got you +1.. Do you mean we
>>>> >>>      should remove
>>>> >>>      >>>>>>>>> this
>>>> >>>      >>>>>>>>>>>>>>>> OperatorCoordinatorMetricGroup?
>>>> >>>      >>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>> Regards,
>>>> >>>      >>>>>>>>>>>>>>>> Dong
>>>> >>>      >>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>> On Sat, Dec 10, 2022 at 1:33 AM Chesnay
>>>> Schepler <
>>>> >>>      >>>>>>>>>>> ches...@apache.org>
>>>> >>>      >>>>>>>>>>>> wrote:
>>>> >>>      >>>>>>>>>>>>>>>>> As a whole I feel like this FLIP is overly
>>>> >>>      complicated. A
>>>> >>>      >>>>>>>>>>> dedicated
>>>> >>>      >>>>>>>>>>>>>>>>> coordinator MG implementation is overkill; it
>>>> >>>      could just
>>>> >>>      >>>>>>>>> re-use
>>>> >>>      >>>>>>>>>>> the
>>>> >>>      >>>>>>>>>>>>>>>>> existing Task/OperatorMGs to create the same
>>>> >>>      structure we
>>>> >>>      >>>>>>>>> have
>>>> >>>      >>>>>>>>>> on
>>>> >>>      >>>>>>>>>>>> TMs,
>>>> >>>      >>>>>>>>>>>>>>>>> similar to what we did with the Job MG.
>>>> >>>      >>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>> However, I'm not convinced that this is
>>>> required
>>>> >>>      anyway,
>>>> >>>      >>>>>>>>>> because
>>>> >>>      >>>>>>>>>>>> all the
>>>> >>>      >>>>>>>>>>>>>>>>> example metrics you listed can be
>>>> implemented on
>>>> >>>      the TM
>>>> >>>      >>>>>>>> side
>>>> >>>      >>>>>>>>> +
>>>> >>>      >>>>>>>>>>>>>>>>> aggregating them in the external metrics
>>>> backend.
>>>> >>>      >>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>> Since I'm on holidays soon, just so no one
>>>> tries
>>>> >>>      to pull a
>>>> >>>      >>>>>>>>> fast
>>>> >>>      >>>>>>>>>>>> one on
>>>> >>>      >>>>>>>>>>>>>>>>> me, if this were to go to a vote as-is I'd be
>>>> >>>      against it.
>>>> >>>      >>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>> On 09/12/2022 15:30, Dong Lin wrote:
>>>> >>>      >>>>>>>>>>>>>>>>>> Hi Hang,
>>>> >>>      >>>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>>> Thanks for the FLIP! The FLIP looks good
>>>> and it
>>>> >>>      is pretty
>>>> >>>      >>>>>>>>>>>> informative.
>>>> >>>      >>>>>>>>>>>>>>>>>> I have just two minor comments regarding
>>>> names:
>>>> >>>      >>>>>>>>>>>>>>>>>> - Would it be useful to rename the config
>>>> key as
>>>> >>>      >>>>>>>>>>>>>>>>>> *metrics.scope.jm.job.operator-coordinator*
>>>> for
>>>> >>>      >>>>>>>> consistency
>>>> >>>      >>>>>>>>>>> with
>>>> >>>      >>>>>>>>>>>>>>>>>> *metrics.scope.jm.job
>>>> >>>      >>>>>>>>>>>>>>>>>> *(which is not named as *jm-job)?
>>>> >>>      >>>>>>>>>>>>>>>>>> - Maybe rename the variable as
>>>> >>>      >>>>>>>>>>> SCOPE_NAMING_OPERATOR_COORDINATOR
>>>> >>>      >>>>>>>>>>>> for
>>>> >>>      >>>>>>>>>>>>>>>>>> simplicity and consistency with
>>>> >>>      SCOPE_NAMING_OPERATOR
>>>> >>>      >>>>>>>>> (which
>>>> >>>      >>>>>>>>>> is
>>>> >>>      >>>>>>>>>>>> not named
>>>> >>>      >>>>>>>>>>>>>>>>>> as SCOPE_NAMING_TM_JOB_OPERATOR)?
>>>> >>>      >>>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>>> Cheers,
>>>> >>>      >>>>>>>>>>>>>>>>>> Dong
>>>> >>>      >>>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>>> On Thu, Dec 8, 2022 at 3:28 PM Hang Ruan <
>>>> >>>      >>>>>>>>>>> ruanhang1...@gmail.com>
>>>> >>>      >>>>>>>>>>>> wrote:
>>>> >>>      >>>>>>>>>>>>>>>>>>> Hi all,
>>>> >>>      >>>>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>>>> MengYue and I created FLIP-274[1] Introduce
>>>> >>>      metric group
>>>> >>>      >>>>>>>>> for
>>>> >>>      >>>>>>>>>>>>>>>>>>> OperatorCoordinator. OperatorCoordinator
>>>> is the
>>>> >>>      >>>>>>>>> coordinator
>>>> >>>      >>>>>>>>>>> for
>>>> >>>      >>>>>>>>>>>> runtime
>>>> >>>      >>>>>>>>>>>>>>>>>>> operators and running on Job Manager. The
>>>> >>>      coordination
>>>> >>>      >>>>>>>>>>>> mechanism is
>>>> >>>      >>>>>>>>>>>>>>>>>>> operator events between OperatorCoordinator
>>>> >>>      and its all
>>>> >>>      >>>>>>>>>>>> operators, the
>>>> >>>      >>>>>>>>>>>>>>>>>>> coordination is more and more using in
>>>> Flink, for
>>>> >>>      >>>>>>>> example
>>>> >>>      >>>>>>>>>> many
>>>> >>>      >>>>>>>>>>>> Sources
>>>> >>>      >>>>>>>>>>>>>>>>> and
>>>> >>>      >>>>>>>>>>>>>>>>>>> Sinks depend on the mechanism to assign
>>>> splits
>>>> >> and
>>>> >>>      >>>>>>>>>> coordinate
>>>> >>>      >>>>>>>>>>>> commits to
>>>> >>>      >>>>>>>>>>>>>>>>>>> external systems. The OperatorCoordinator
>>>> is
>>>> >>>      widely
>>>> >>>      >>>>>>>> using
>>>> >>>      >>>>>>>>> in
>>>> >>>      >>>>>>>>>>>> flink kafka
>>>> >>>      >>>>>>>>>>>>>>>>>>> connector, flink pulsar connector, flink
>>>> cdc
>>>> >>>      connector,
>>>> >>>      >>>>>>>>>> flink
>>>> >>>      >>>>>>>>>>>> hudi
>>>> >>>      >>>>>>>>>>>>>>>>>>> connector and so on.
>>>> >>>      >>>>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>>>> But there is not a suitable metric group
>>>> scope
>>>> >>>      for the
>>>> >>>      >>>>>>>>>>>>>>>>> OperatorCoordinator
>>>> >>>      >>>>>>>>>>>>>>>>>>> and not an implementation for the interface
>>>> >>>      >>>>>>>>>>>>>>>>> OperatorCoordinatorMetricGroup.
>>>> >>>      >>>>>>>>>>>>>>>>>>> These metrics in OperatorCoordinator could
>>>> be
>>>> >>>      how many
>>>> >>>      >>>>>>>>>>>> splits/partitions
>>>> >>>      >>>>>>>>>>>>>>>>>>> have been assigned to source readers, how
>>>> many
>>>> >>>      files
>>>> >>>      >>>>>>>> have
>>>> >>>      >>>>>>>>>> been
>>>> >>>      >>>>>>>>>>>> written
>>>> >>>      >>>>>>>>>>>>>>>>> out
>>>> >>>      >>>>>>>>>>>>>>>>>>> by sink writers, these metrics not only
>>>> help
>>>> >>>      users to
>>>> >>>      >>>>>>>> know
>>>> >>>      >>>>>>>>>> the
>>>> >>>      >>>>>>>>>>>> job
>>>> >>>      >>>>>>>>>>>>>>>>> progress
>>>> >>>      >>>>>>>>>>>>>>>>>>> but also make big job maintaining easier.
>>>> Thus we
>>>> >>>      >>>>>>>> propose
>>>> >>>      >>>>>>>>>> the
>>>> >>>      >>>>>>>>>>>> FLIP-274
>>>> >>>      >>>>>>>>>>>>>>>>> to
>>>> >>>      >>>>>>>>>>>>>>>>>>> introduce a new metric group scope for
>>>> >>>      >>>>>>>> OperatorCoordinator
>>>> >>>      >>>>>>>>>> and
>>>> >>>      >>>>>>>>>>>> provide
>>>> >>>      >>>>>>>>>>>>>>>>> an
>>>> >>>      >>>>>>>>>>>>>>>>>>> internal implementation for
>>>> >>>      >>>>>>>>> OperatorCoordinatorMetricGroup.
>>>> >>>      >>>>>>>>>>>>>>>>>>> Could you help review this FLIP when you
>>>> get
>>>> >>>      time? Any
>>>> >>>      >>>>>>>>>>> feedback
>>>> >>>      >>>>>>>>>>>> is
>>>> >>>      >>>>>>>>>>>>>>>>>>> appreciated!
>>>> >>>      >>>>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>>>> Best,
>>>> >>>      >>>>>>>>>>>>>>>>>>> Hang
>>>> >>>      >>>>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>>>> [1]
>>>> >>>      >>>>>>>>>>>>>>>>>>>
>>>> >>>      >>>>>>>>>>>>>>>>>>>
>>>> >>>      >>
>>>> >>>
>>>> >>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-274%3A+Introduce+metric+group+for+OperatorCoordinator
>>>> >>>      >>
>>>> >>>
>>>>
>>>>
>>>
>>

Reply via email to