As a supplement, considering it could be a big reconstruction
redefining internal and external traffic and touching metric names in
almost all operators, this requires a lot of discussions and we might
do it finally in Flink 2.0. I think compatibility is a bigger blocker
in front of us, as the output of sink is a metric that users care a
lot about.

Thanks,
Qingsheng

On Wed, Oct 12, 2022 at 6:20 PM Qingsheng Ren <re...@apache.org> wrote:
>
> Thanks Chesnay for the reply. +1 for making a unified and clearer
> metric definition distinguishing internal and external data transfers.
> As you described, having IO in operators is quite common such as
> dimension tables in Table/SQL API. This definitely deserves a FLIP and
> an overall design.
>
> However I think it's necessary to change the metric back to
> numRecordsOut instead of sticking with numRecordsSend in 1.15 and
> 1.16. The most important argument is for compatibility as I mentioned
> in my previous email, otherwise all users have to modify their configs
> of metric systems after upgrading to Flink 1.15+, and all custom
> connectors have to change their implementations to migrate to the new
> metric name. I believe other ones participating and approving this
> proposal share the same concern about compatibility too. Also
> considering this issue is blocking the release of 1.16, maybe we could
> fix this asap, and as for defining a new metric for internal data
> transfers we can have an in-depth discussion later. WDYT?
>
> Best,
> Qingsheng
>
> On Tue, Oct 11, 2022 at 6:06 PM Chesnay Schepler <ches...@apache.org> wrote:
> >
> > Currently I think that would be a mistake.
> >
> > Ultimately what we have here is the culmination of us never really 
> > considering how the numRecordsOut metric should behave for operators that 
> > emit data to other operators _and_ external systems. This goes beyond sinks.
> > This even applies to numRecordsIn, for cases where functions query/write 
> > data from/to the outside, (e.g., Async IO).
> >
> > Having 2 separate metrics for that, 1 exclusively for internal data 
> > transfers, and 1 exclusively for external data transfers, is the only way 
> > to get a consistent metric definition in the long-run.
> > We can jump back-and-forth now or just commit to it.
> >
> > I don't think we can really judge this based on FLIP-33. It was IIRC 
> > written before the two phase sinks were added, which heavily blurred the 
> > lines of what a sink even is. Because it definitely is _not_ the last 
> > operator in a chain anymore.
> >
> > What I would suggest is to stick with what we got (although I despise the 
> > name numRecordsSend), and alias the numRecordsOut metric for all 
> > non-TwoPhaseCommittingSink.
> >
> > On 11/10/2022 05:54, Qingsheng Ren wrote:
> >
> > Thanks for the details Chesnay!
> >
> > By “alias” I mean to respect the original definition made in FLIP-33 for 
> > numRecordsOut, which is the number of records written to the external 
> > system, and keep numRecordsSend as the same value as numRecordsOut for 
> > compatibility.
> >
> > I think keeping numRecordsOut for the output to the external system is more 
> > intuitive to end users because in most cases the metric of data flow output 
> > is more essential. I agree with you that a new metric is required, but 
> > considering compatibility and users’ intuition I prefer to keep the initial 
> > definition of numRecordsOut in FLIP-33 and name a new metric for sink 
> > writer’s output to downstream operators. This might be against consistency 
> > with metrics in other operators in Flink but maybe it’s acceptable to have 
> > the sink as a special case.
> >
> > Best,
> > Qingsheng
> > On Oct 10, 2022, 19:13 +0800, Chesnay Schepler <ches...@apache.org>, wrote:
> >
> > > I’m with Xintong’s idea to treat numXXXSend as an alias of numXXXOut
> >
> > But that's not possible. If it were that simple there would have never been 
> > a need to introduce another metric in the first place.
> >
> > It's a rather fundamental issue with how the new sinks work, in that they 
> > emit data to the external system (usually considered as "numRecordsOut" of 
> > sinks) while _also_ sending data to a downstream operator (usually 
> > considered as "numRecordsOut" of tasks).
> > The original issue was that the numRecordsOut of the sink counted both 
> > (which is completely wrong).
> >
> > A new metric was always required; otherwise you inevitably end up breaking 
> > some semantic.
> > Adding a new metric for what the sink writes to the external system is, for 
> > better or worse, more consistent with how these metrics usually work in 
> > Flink.
> >
> > On 10/10/2022 12:45, Qingsheng Ren wrote:
> >
> > Thanks everyone for joining the discussion!
> >
> > > Do you have any idea what has happened in the process here?
> >
> > The discussion in this PR [1] shows some details and could be helpful to 
> > understand the original motivation of the renaming. We do have a test case 
> > for guarding metrics but unfortunaly the case was also modified so the 
> > defense was broken.
> >
> > I think the reason why both the developer and the reviewer forgot to 
> > trigger an discussion and gave a green pass on the change is that metrics 
> > are quite “trivial” to be noticed as public APIs. As mentioned by Martijn I 
> > couldn’t find a place noting that metrics are public APIs and should be 
> > treated carefully while contributing and reviewing.
> >
> > IMHO three actions could be made to prevent this kind of changes in the 
> > future:
> >
> > a. Add test case for metrics (which we already have in SinkMetricsITCase)
> > b. We emphasize that any public-interface breaking changes should be 
> > proposed by a FLIP or discussed in mailing list, and should be listed in 
> > the release note.
> > c. We remind contributors and reviewers about what should be considered as 
> > public API, and include metric names in it.
> >
> > For b and c these two pages [2][3] might be proper places.
> >
> > About the patch to revert this, it looks like we have a consensus on 1.16. 
> > As of 1.15 I think it’s worthy to trigger a minor version. I didn’t see 
> > complaints about this for now so it should be OK to save the situation 
> > asap. I’m with Xintong’s idea to treat numXXXSend as an alias of numXXXOut 
> > considering there could possibly some users have already adapted their 
> > system to the new naming, and have another internal metric for reflecting 
> > number of outgoing committable batches (actually the numRecordsIn of sink 
> > committer operator should be carrying this info already).
> >
> > [1] https://github.com/apache/flink/pull/18825
> > [2] https://flink.apache.org/contributing/contribute-code.html
> > [3] https://flink.apache.org/contributing/reviewing-prs.html
> >
> > Best,
> > Qingsheng
> > On Oct 10, 2022, 17:40 +0800, Xintong Song <tonysong...@gmail.com>, wrote:
> >
> > +1 for reverting these changes in Flink 1.16.
> >
> > For 1.15.3, can we make these metrics available via both names (numXXXOut 
> > and numXXXSend)? In this way we don't break it for those who already 
> > migrated to 1.15 and numXXXSend. That means we still need to change 
> > SinkWriterOperator to use another metric name in 1.15.3, which IIUC is 
> > internal to Flink sink.
> >
> > I'm overall +1 to change numXXXOut back to its original semantics. AFAIK 
> > (from meetup / flink-forward questionaires), most users do not migrate to a 
> > new Flink release immediately, until the next 1-2 major releases are out.
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Mon, Oct 10, 2022 at 5:26 PM Martijn Visser <martijnvis...@apache.org> 
> > wrote:
> >>
> >> Hi Qingsheng,
> >>
> >> Do you have any idea what has happened in the process here? Do we know why
> >> they were changed? I was under the impression that these metric names were
> >> newly introduced due to the new interfaces and because it still depends on
> >> each connector implementing these.
> >>
> >> Sidenote: metric names are not mentioned in the FLIP process as a public
> >> API. Might make sense to have a separate follow-up to add that to the list
> >> (I do think we should list them there).
> >>
> >> +1 for reverting this and make this change in Flink 1.16
> >>
> >> I'm not in favour of releasing a Flink 1.15.3 with this change: I think the
> >> impact is too big for a patch version, especially given how long Flink 1.15
> >> is already out there.
> >>
> >> Best regards,
> >>
> >> Martijn
> >>
> >> On Mon, Oct 10, 2022 at 11:13 AM Leonard Xu <xbjt...@gmail.com> wrote:
> >>
> >> > Thanks Qingsheng for starting this thread.
> >> >
> >> > +1 on reverting sink metric name and releasing 1.15.3 to fix this
> >> > inconsistent behavior.
> >> >
> >> >
> >> > Best,
> >> > Leonard
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > 2022年10月10日 下午3:06,Jark Wu <imj...@gmail.com> 写道:
> >> >
> >> > Thanks for discovering this problem, Qingsheng!
> >> >
> >> > I'm also +1 for reverting the breaking changes.
> >> >
> >> > IIUC, currently, the behavior of "numXXXOut" metrics of the new and old
> >> > sink is inconsistent.
> >> > We have to break one of them to have consistent behavior. Sink V2 is an
> >> > evolving API which is just introduced in 1.15.
> >> > I think it makes sense to break the unstable API instead of the stable 
> >> > API
> >> > which many connectors and users depend on.
> >> >
> >> > Best,
> >> > Jark
> >> >
> >> >
> >> >
> >> > On Mon, 10 Oct 2022 at 11:36, Jingsong Li <jingsongl...@gmail.com> wrote:
> >> >
> >> >> Thanks for driving, Qingsheng.
> >> >>
> >> >> +1 for reverting sink metric name.
> >> >>
> >> >> We often forget that metric is also one of the important APIs.
> >> >>
> >> >> +1 for releasing 1.15.3 to fix this.
> >> >>
> >> >> Best,
> >> >> Jingsong
> >> >>
> >> >> On Sun, Oct 9, 2022 at 11:35 PM Becket Qin <becket....@gmail.com> wrote:
> >> >> >
> >> >> > Thanks for raising the discussion, Qingsheng,
> >> >> >
> >> >> > +1 on reverting the breaking changes.
> >> >> >
> >> >> > In addition, we might want to release a 1.15.3 to fix this and update
> >> >> the previous release docs with this known issue, so that users can 
> >> >> upgrade
> >> >> to 1.15.3 when they hit it. It would also be good to add some backwards
> >> >> compatibility tests on metrics to avoid unintended breaking changes like
> >> >> this in the future.
> >> >> >
> >> >> > Thanks,
> >> >> >
> >> >> > Jiangjie (Becket) Qin
> >> >> >
> >> >> > On Sun, Oct 9, 2022 at 10:35 AM Qingsheng Ren <re...@apache.org> 
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi devs and users,
> >> >> >>
> >> >> >> I’d like to start a discussion about reverting a breaking change 
> >> >> >> about
> >> >> sink metrics made in 1.15 by FLINK-26126 [1] and FLINK-26492 [2].
> >> >> >>
> >> >> >> TL;DR
> >> >> >>
> >> >> >> All sink metrics with name “numXXXOut” defined in FLIP-33 are replace
> >> >> by “numXXXSend” in FLINK-26126 and FLINK-26492. Considering metric names
> >> >> are public APIs, this is a breaking change to end users and not backward
> >> >> compatible. Also unfortunately this breaking change was not discussed in
> >> >> the mailing list before.
> >> >> >>
> >> >> >> Background
> >> >> >>
> >> >> >> As defined previously in FLIP-33 (the FLIP page has been changed so
> >> >> please refer to the old version [3] ), metric “numRecordsOut” is used 
> >> >> for
> >> >> reporting the total number of output records since the sink started 
> >> >> (number
> >> >> of records written to the external system), and similarly for
> >> >> “numRecordsOutPerSecond”, “numBytesOut”, “numBytesOutPerSecond” and
> >> >> “numRecordsOutError”. Most sinks are following this naming and 
> >> >> definition.
> >> >> However, these metrics are ambiguous in the new Sink API as “numXXXOut”
> >> >> could be used by the output of SinkWriterOperator for reporting number 
> >> >> of
> >> >> Committables delivered to SinkCommitterOperator. In order to resolve the
> >> >> conflict, FLINK-26126 and FLINK-26492 changed names of these metrics 
> >> >> with
> >> >> “numXXXSend”.
> >> >> >>
> >> >> >> Necessity of reverting this change
> >> >> >>
> >> >> >> - Metric names are actually public API, as end users need to 
> >> >> >> configure
> >> >> metric collecting and alerting system with metric names. Users have to
> >> >> reset all configurations related to affected metrics.
> >> >> >> - This could also affect custom and external sinks not maintained by
> >> >> Flink, which might have implemented with numXXXOut metrics.
> >> >> >> - The number of records sent to external system is way more important
> >> >> than the number of Committables sent to SinkCommitterOperator, as the
> >> >> latter one is just an internal implementation of sink. We could have a 
> >> >> new
> >> >> metric name for the latter one instead.
> >> >> >> - We could avoid splitting the project by version (like “plz use
> >> >> numXXXOut before 1.15 and use numXXXSend after”) if we revert it ASAP,
> >> >> cosidering 1.16 is still not released for now.
> >> >> >>
> >> >> >> As a consequence, I’d like to hear from devs and users about your
> >> >> opinion on changing these metrics back to “numXXXOut”.
> >> >> >>
> >> >> >> Looking forward to your reply!
> >> >> >>
> >> >> >> [1] https://issues.apache.org/jira/browse/FLINK-26126
> >> >> >> [2] https://issues.apache.org/jira/browse/FLINK-26492
> >> >> >> [1] FLIP-33, version 18:
> >> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883136
> >> >> >>
> >> >> >> Best,
> >> >> >> Qingsheng
> >> >>
> >> >
> >> >
> >
> >
> >

Reply via email to