Thanks for the details Chesnay! By “alias” I mean to respect the original definition made in FLIP-33 for numRecordsOut, which is the number of records written to the external system, and keep numRecordsSend as the same value as numRecordsOut for compatibility.
I think keeping numRecordsOut for the output to the external system is more intuitive to end users because in most cases the metric of data flow output is more essential. I agree with you that a new metric is required, but considering compatibility and users’ intuition I prefer to keep the initial definition of numRecordsOut in FLIP-33 and name a new metric for sink writer’s output to downstream operators. This might be against consistency with metrics in other operators in Flink but maybe it’s acceptable to have the sink as a special case. Best, Qingsheng On Oct 10, 2022, 19:13 +0800, Chesnay Schepler <ches...@apache.org>, wrote: > > I’m with Xintong’s idea to treat numXXXSend as an alias of numXXXOut > > But that's not possible. If it were that simple there would have never been a > need to introduce another metric in the first place. > > It's a rather fundamental issue with how the new sinks work, in that they > emit data to the external system (usually considered as "numRecordsOut" of > sinks) while _also_ sending data to a downstream operator (usually considered > as "numRecordsOut" of tasks). > The original issue was that the numRecordsOut of the sink counted both (which > is completely wrong). > > A new metric was always required; otherwise you inevitably end up breaking > some semantic. > Adding a new metric for what the sink writes to the external system is, for > better or worse, more consistent with how these metrics usually work in Flink. > > On 10/10/2022 12:45, Qingsheng Ren wrote: > > Thanks everyone for joining the discussion! > > > > > Do you have any idea what has happened in the process here? > > > > The discussion in this PR [1] shows some details and could be helpful to > > understand the original motivation of the renaming. We do have a test case > > for guarding metrics but unfortunaly the case was also modified so the > > defense was broken. > > > > I think the reason why both the developer and the reviewer forgot to > > trigger an discussion and gave a green pass on the change is that metrics > > are quite “trivial” to be noticed as public APIs. As mentioned by Martijn I > > couldn’t find a place noting that metrics are public APIs and should be > > treated carefully while contributing and reviewing. > > > > IMHO three actions could be made to prevent this kind of changes in the > > future: > > > > a. Add test case for metrics (which we already have in SinkMetricsITCase) > > b. We emphasize that any public-interface breaking changes should be > > proposed by a FLIP or discussed in mailing list, and should be listed in > > the release note. > > c. We remind contributors and reviewers about what should be considered as > > public API, and include metric names in it. > > > > For b and c these two pages [2][3] might be proper places. > > > > About the patch to revert this, it looks like we have a consensus on 1.16. > > As of 1.15 I think it’s worthy to trigger a minor version. I didn’t see > > complaints about this for now so it should be OK to save the situation > > asap. I’m with Xintong’s idea to treat numXXXSend as an alias of numXXXOut > > considering there could possibly some users have already adapted their > > system to the new naming, and have another internal metric for reflecting > > number of outgoing committable batches (actually the numRecordsIn of sink > > committer operator should be carrying this info already). > > > > [1] https://github.com/apache/flink/pull/18825 > > [2] https://flink.apache.org/contributing/contribute-code.html > > [3] https://flink.apache.org/contributing/reviewing-prs.html > > > > Best, > > Qingsheng > > On Oct 10, 2022, 17:40 +0800, Xintong Song <tonysong...@gmail.com>, wrote: > > > +1 for reverting these changes in Flink 1.16. > > > > > > For 1.15.3, can we make these metrics available via both names (numXXXOut > > > and numXXXSend)? In this way we don't break it for those who already > > > migrated to 1.15 and numXXXSend. That means we still need to change > > > SinkWriterOperator to use another metric name in 1.15.3, which IIUC is > > > internal to Flink sink. > > > > > > I'm overall +1 to change numXXXOut back to its original semantics. AFAIK > > > (from meetup / flink-forward questionaires), most users do not migrate to > > > a new Flink release immediately, until the next 1-2 major releases are > > > out. > > > > > > Best, > > > Xintong > > > > > > > > > > On Mon, Oct 10, 2022 at 5:26 PM Martijn Visser > > > > <martijnvis...@apache.org> wrote: > > > > > Hi Qingsheng, > > > > > > > > > > Do you have any idea what has happened in the process here? Do we > > > > > know why > > > > > they were changed? I was under the impression that these metric names > > > > > were > > > > > newly introduced due to the new interfaces and because it still > > > > > depends on > > > > > each connector implementing these. > > > > > > > > > > Sidenote: metric names are not mentioned in the FLIP process as a > > > > > public > > > > > API. Might make sense to have a separate follow-up to add that to the > > > > > list > > > > > (I do think we should list them there). > > > > > > > > > > +1 for reverting this and make this change in Flink 1.16 > > > > > > > > > > I'm not in favour of releasing a Flink 1.15.3 with this change: I > > > > > think the > > > > > impact is too big for a patch version, especially given how long > > > > > Flink 1.15 > > > > > is already out there. > > > > > > > > > > Best regards, > > > > > > > > > > Martijn > > > > > > > > > > On Mon, Oct 10, 2022 at 11:13 AM Leonard Xu <xbjt...@gmail.com> wrote: > > > > > > > > > > > Thanks Qingsheng for starting this thread. > > > > > > > > > > > > +1 on reverting sink metric name and releasing 1.15.3 to fix this > > > > > > inconsistent behavior. > > > > > > > > > > > > > > > > > > Best, > > > > > > Leonard > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2022年10月10日 下午3:06,Jark Wu <imj...@gmail.com> 写道: > > > > > > > > > > > > Thanks for discovering this problem, Qingsheng! > > > > > > > > > > > > I'm also +1 for reverting the breaking changes. > > > > > > > > > > > > IIUC, currently, the behavior of "numXXXOut" metrics of the new and > > > > > > old > > > > > > sink is inconsistent. > > > > > > We have to break one of them to have consistent behavior. Sink V2 > > > > > > is an > > > > > > evolving API which is just introduced in 1.15. > > > > > > I think it makes sense to break the unstable API instead of the > > > > > > stable API > > > > > > which many connectors and users depend on. > > > > > > > > > > > > Best, > > > > > > Jark > > > > > > > > > > > > > > > > > > > > > > > > On Mon, 10 Oct 2022 at 11:36, Jingsong Li <jingsongl...@gmail.com> > > > > > > wrote: > > > > > > > > > > > >> Thanks for driving, Qingsheng. > > > > > >> > > > > > >> +1 for reverting sink metric name. > > > > > >> > > > > > >> We often forget that metric is also one of the important APIs. > > > > > >> > > > > > >> +1 for releasing 1.15.3 to fix this. > > > > > >> > > > > > >> Best, > > > > > >> Jingsong > > > > > >> > > > > > >> On Sun, Oct 9, 2022 at 11:35 PM Becket Qin <becket....@gmail.com> > > > > > >> wrote: > > > > > >> > > > > > > >> > Thanks for raising the discussion, Qingsheng, > > > > > >> > > > > > > >> > +1 on reverting the breaking changes. > > > > > >> > > > > > > >> > In addition, we might want to release a 1.15.3 to fix this and > > > > > >> > update > > > > > >> the previous release docs with this known issue, so that users can > > > > > >> upgrade > > > > > >> to 1.15.3 when they hit it. It would also be good to add some > > > > > >> backwards > > > > > >> compatibility tests on metrics to avoid unintended breaking > > > > > >> changes like > > > > > >> this in the future. > > > > > >> > > > > > > >> > Thanks, > > > > > >> > > > > > > >> > Jiangjie (Becket) Qin > > > > > >> > > > > > > >> > On Sun, Oct 9, 2022 at 10:35 AM Qingsheng Ren <re...@apache.org> > > > > > >> > wrote: > > > > > >> >> > > > > > >> >> Hi devs and users, > > > > > >> >> > > > > > >> >> I’d like to start a discussion about reverting a breaking > > > > > >> >> change about > > > > > >> sink metrics made in 1.15 by FLINK-26126 [1] and FLINK-26492 [2]. > > > > > >> >> > > > > > >> >> TL;DR > > > > > >> >> > > > > > >> >> All sink metrics with name “numXXXOut” defined in FLIP-33 are > > > > > >> >> replace > > > > > >> by “numXXXSend” in FLINK-26126 and FLINK-26492. Considering metric > > > > > >> names > > > > > >> are public APIs, this is a breaking change to end users and not > > > > > >> backward > > > > > >> compatible. Also unfortunately this breaking change was not > > > > > >> discussed in > > > > > >> the mailing list before. > > > > > >> >> > > > > > >> >> Background > > > > > >> >> > > > > > >> >> As defined previously in FLIP-33 (the FLIP page has been > > > > > >> >> changed so > > > > > >> please refer to the old version [3] ), metric “numRecordsOut” is > > > > > >> used for > > > > > >> reporting the total number of output records since the sink > > > > > >> started (number > > > > > >> of records written to the external system), and similarly for > > > > > >> “numRecordsOutPerSecond”, “numBytesOut”, “numBytesOutPerSecond” and > > > > > >> “numRecordsOutError”. Most sinks are following this naming and > > > > > >> definition. > > > > > >> However, these metrics are ambiguous in the new Sink API as > > > > > >> “numXXXOut” > > > > > >> could be used by the output of SinkWriterOperator for reporting > > > > > >> number of > > > > > >> Committables delivered to SinkCommitterOperator. In order to > > > > > >> resolve the > > > > > >> conflict, FLINK-26126 and FLINK-26492 changed names of these > > > > > >> metrics with > > > > > >> “numXXXSend”. > > > > > >> >> > > > > > >> >> Necessity of reverting this change > > > > > >> >> > > > > > >> >> - Metric names are actually public API, as end users need to > > > > > >> >> configure > > > > > >> metric collecting and alerting system with metric names. Users > > > > > >> have to > > > > > >> reset all configurations related to affected metrics. > > > > > >> >> - This could also affect custom and external sinks not > > > > > >> >> maintained by > > > > > >> Flink, which might have implemented with numXXXOut metrics. > > > > > >> >> - The number of records sent to external system is way more > > > > > >> >> important > > > > > >> than the number of Committables sent to SinkCommitterOperator, as > > > > > >> the > > > > > >> latter one is just an internal implementation of sink. We could > > > > > >> have a new > > > > > >> metric name for the latter one instead. > > > > > >> >> - We could avoid splitting the project by version (like “plz use > > > > > >> numXXXOut before 1.15 and use numXXXSend after”) if we revert it > > > > > >> ASAP, > > > > > >> cosidering 1.16 is still not released for now. > > > > > >> >> > > > > > >> >> As a consequence, I’d like to hear from devs and users about > > > > > >> >> your > > > > > >> opinion on changing these metrics back to “numXXXOut”. > > > > > >> >> > > > > > >> >> Looking forward to your reply! > > > > > >> >> > > > > > >> >> [1] https://issues.apache.org/jira/browse/FLINK-26126 > > > > > >> >> [2] https://issues.apache.org/jira/browse/FLINK-26492 > > > > > >> >> [1] FLIP-33, version 18: > > > > > >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883136 > > > > > >> >> > > > > > >> >> Best, > > > > > >> >> Qingsheng > > > > > >> > > > > > > > > > > > > >