Hi Dong and Martijin,
I'll create a FLIP for this public API change and put related links in this discussion. Best regards, Wencong At 2023-01-10 09:34:05, "Dong Lin" <lindon...@gmail.com> wrote: >Hi Martijn, > >No worries. Thanks for the discussion! > >Suppose we agree to deprecate the existing SourceReaderBase public >constructor, it means we plan to remove this construct (which could be >backward incompatible) sooner or later. I think it makes sense to at least >update all the externalized connectors listed under >github.com/apache/flink-connector-* to use the new constructor, before we >remove the existing constructor. > >Maybe Wencong can create a FLIP for this public API change and specify this >plan "Compatibility, Deprecation, and Migration Plan" section? The changes >are probably straightforward to make for each externalized connectors. I am >happy to make the change if it is not made after 2 Flink release cycles. > >Cheers, >Dong > > > > >On Mon, Jan 9, 2023 at 4:27 PM Martijn Visser <martijnvis...@apache.org> >wrote: > >> Hi Dong, >> >> I think the key part is indeed your sentence that "FLIP-33 intends to >> standard the convention of metric name/semantic, but not the >> implementation." >> When I read FLIP-33, that's not how I interpreted it. So I've been reading >> back in the FLIP-33 discussion thread, there were some more clarifications >> in that thread that it is indeed only meant for standardizing metric names. >> Should have done that earlier, sorry for that. >> >> As part of this proposal, does that also include updating the existing >> connectors (given that a lot of them have been externalized) to not use the >> to-be-deprecated SourceReaderBase constructor? >> >> Best regards, >> >> Martijn >> >> On Mon, Jan 9, 2023 at 1:39 AM Dong Lin <lindon...@gmail.com> wrote: >> >> > Hi Martijn, >> > >> > By saying "Each connector defines their own metrics at the moment", what >> > FLIP-33 intends to avoid is that connectA defines a metric named >> > numRecordsIn and connectB defines a metric named numOfRecordsIn. In this >> > case, these two metrics would have the same semantic meaning but >> different >> > name in the dashboard, which complicates the monitoring for users. >> FLIP-33 >> > intends to standard the convention of metric name/semantic, but not the >> > implementation. >> > >> > It is mentioned in FLIP-33 that "A connector implementation does not have >> > to report all the defined metrics. But if a connector reports a metric of >> > the same semantic defined below, the implementation should follow the >> > convention". >> > >> > What is proposed in this thread will only affect the connection >> > implementation, i.e. SourceReaderBase would no longer increment >> > numRecordsIn. But it would not change the name/semantic of numRecordsIn. >> > >> > Note that even if we don't change SourceReaderBase, a subclass of >> > SourceReaderBase (or Source in general) can still update numRecordsIn in >> a >> > way that breaks the semantic of this metric mentioned in FLIP-33. There >> is >> > no way we can programmatically prevent that from happening. Developers of >> > respective connectors need to be aware of the convention and enforce it >> > properly. >> > >> > Regards, >> > Dong >> > >> > >> > >> > On Mon, Jan 9, 2023 at 6:06 AM Martijn Visser <martijnvis...@apache.org> >> > wrote: >> > >> > > Hi Dong, >> > > >> > > In the proposal from Wencong is stated "If source reader want to report >> > to >> > > metric by self, it can set source.reader.metric.num_records_in.override >> > to >> > > true, which disables the registration of numRecordsIn in >> SourceReaderBase >> > > and let the actual implementation to report the metric instead." >> > > >> > > While in FLIP-33 is stated "Each connector defines their own metrics at >> > the >> > > moment. This complicates operation and monitoring. Admittedly, >> different >> > > connectors may have different metrics, but some commonly used metrics >> can >> > > probably be standardized. This FLIP proposes a set of standard >> connector >> > > metrics that each connector should emit if applicable." >> > > >> > > With this proposal, the actual implementation can be disabled in >> > > SourceReaderBase and the connector can define its own metric again. >> With >> > > how I read FLIP-33, the purpose was exactly to avoid that. So you can >> > have >> > > a connector that has its own definition for numRecordsIn, which it >> > > submits/sends as numRecordsIn, which basically breaks it being a >> default >> > > metric. >> > > >> > > Best regards, >> > > >> > > Martijn >> > > >> > > On Sun, Jan 8, 2023 at 3:26 PM Dong Lin <lindon...@gmail.com> wrote: >> > > >> > > > Hi Martijn, >> > > > >> > > > I think the change proposed in this thread would *not* break the idea >> > in >> > > > FLIP-33. FLIP-33 standardized metrics such as numRecordsIn, by >> > specifying >> > > > the name/type/semantics of those metrics so that all connectors can >> > > follow. >> > > > The name/type/semantics of numRecordsIn metric would be the same >> before >> > > and >> > > > after the proposed change. >> > > > >> > > > Does this make sense? Or could you explain which part of FLIP-33 >> would >> > be >> > > > broken by the proposed change? >> > > > >> > > > Regards, >> > > > Dong >> > > > >> > > > >> > > > On Sun, Jan 8, 2023 at 9:33 PM Martijn Visser < >> > martijnvis...@apache.org> >> > > > wrote: >> > > > >> > > > > Hi all, >> > > > > >> > > > > This feels like we purposely want to abandon the idea that was >> > > introduced >> > > > > with FLIP-33 on standardizing metrics [1]. From this proposal, I >> > don't >> > > > see >> > > > > why we want to abandon that idea. Next to that, if you override >> > > > > the numRecordsIn logic, you also touch on the other metrics that >> rely >> > > on >> > > > > this value. >> > > > > >> > > > > Best regards, >> > > > > >> > > > > Martijn >> > > > > >> > > > > [1] >> > > > > >> > > > > >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics >> > > > > >> > > > > On Sun, Jan 8, 2023 at 12:43 PM Wencong Liu <liuwencle...@163.com> >> > > > wrote: >> > > > > >> > > > > > Thanks for the discussion, Dong and John >> > > > > > >> > > > > > >> > > > > > Considering the extra cost of method calls, the approach of >> adding >> > an >> > > > > > extra SourceReaderBase constructor >> > > > > > should be a better choice. If there is no further discussion, I >> > will >> > > > > > follow this plan. >> > > > > > >> > > > > > >> > > > > > Best, >> > > > > > Wencong >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > >> > > > > > At 2023-01-08 01:05:07, "John Roesler" <vvcep...@apache.org> >> > wrote: >> > > > > > >Thanks, for the discussion, Dong. >> > > > > > > >> > > > > > >To answer your question: I was unclear if the desire was to >> > > deprecate >> > > > > the >> > > > > > metric itself, to be replaced with some other metric, or whether >> we >> > > > just >> > > > > > wanted to deprecate the way the superclass manages it. It’s clear >> > now >> > > > > that >> > > > > > we want to keep the metric and only change the superclass logic. >> > > > > > > >> > > > > > >I think we’re on the same page now. >> > > > > > > >> > > > > > >Thanks! >> > > > > > >-John >> > > > > > > >> > > > > > >On Sat, Jan 7, 2023, at 07:21, Dong Lin wrote: >> > > > > > >> Hi John, >> > > > > > >> >> > > > > > >> Not sure if I understand the difference between "deprecate >> that >> > > > > metric" >> > > > > > and >> > > > > > >> "deprecate the private counter mechanism". I think what we >> want >> > is >> > > > to >> > > > > > >> update SourceReaderBase's implementation so that it no longer >> > > > > explicitly >> > > > > > >> increments this metric. But we still need to expose this >> metric >> > to >> > > > > user. >> > > > > > >> And methods such as RecordEmitter#emitRecord (which can be >> > invoked >> > > > by >> > > > > > >> SourceReaderBase#pollNext(...)) can potentially increment this >> > > > metric. >> > > > > > >> >> > > > > > >> I like your approach of adding an extra SourceReaderBase >> > > > constructor. >> > > > > > That >> > > > > > >> appears simpler than adding a deprecated config. We can mark >> the >> > > > > > existing >> > > > > > >> SourceReaderBase constructor as @deprecated. >> > > > > > SourceReaderBase#pollNext(...) >> > > > > > >> will not increment the counter if a subclass uses the newly >> > added >> > > > > > >> constructor. >> > > > > > >> >> > > > > > >> Cheers, >> > > > > > >> Dong >> > > > > > >> >> > > > > > >> >> > > > > > >> On Sat, Jan 7, 2023 at 4:47 AM John Roesler < >> > vvcep...@apache.org> >> > > > > > wrote: >> > > > > > >> >> > > > > > >>> Thanks for the replies, Dong and Wencong! >> > > > > > >>> >> > > > > > >>> That’s a good point about the overhead of the extra method. >> > > > > > >>> >> > > > > > >>> Is the desire to actually deprecate that metric in a >> > user-facing >> > > > way, >> > > > > > or >> > > > > > >>> just to deprecate the private counter mechanism? >> > > > > > >>> >> > > > > > >>> It seems like if the desire is to deprecate the existing >> > private >> > > > > > counter, >> > > > > > >>> we can accomplish it by deprecating the current constructor >> and >> > > > > > offering >> > > > > > >>> another that is documented not to track the metric. This >> seems >> > > > better >> > > > > > than >> > > > > > >>> the config option, since the concern is purely about the >> > division >> > > > of >> > > > > > >>> responsibilities between the sub- and super-class. >> > > > > > >>> >> > > > > > >>> Another option, which might be better if we wish to keep a >> > > > uniformly >> > > > > > named >> > > > > > >>> metric, would be to simply make the counter protected. That >> > would >> > > > be >> > > > > > better >> > > > > > >>> if we’re generally happy with the metric and counter, but a >> few >> > > > > special >> > > > > > >>> source connectors need to emit records in other ways. >> > > > > > >>> >> > > > > > >>> And finally, if we really want to get rid of the metric >> itself, >> > > > then >> > > > > I >> > > > > > >>> agree, a config is the way to do it. >> > > > > > >>> >> > > > > > >>> Thanks, >> > > > > > >>> John >> > > > > > >>> >> > > > > > >>> On Fri, Jan 6, 2023, at 00:55, Dong Lin wrote: >> > > > > > >>> > Hi John and Wencong, >> > > > > > >>> > >> > > > > > >>> > Thanks for the reply! >> > > > > > >>> > >> > > > > > >>> > It is nice that optional-2 can address the problem without >> > > > > affecting >> > > > > > the >> > > > > > >>> > existing source connectors as far as functionality is >> > > concerned. >> > > > > One >> > > > > > >>> > potential concern with this approach is that it might >> > increase >> > > > the >> > > > > > Flink >> > > > > > >>> > runtime overhead by adding one more virtual functional call >> > to >> > > > the >> > > > > > >>> > per-record runtime call stack. >> > > > > > >>> > >> > > > > > >>> > Since Java's default MaxInlineLevel is 12-18, I believe it >> is >> > > > easy >> > > > > > for an >> > > > > > >>> > operator chain of 5+ operators to exceed this limit. In >> this >> > > > case. >> > > > > > And >> > > > > > >>> > option-2 would incur one more virtual table lookup to >> produce >> > > > each >> > > > > > >>> record. >> > > > > > >>> > It is not clear how much this overhead would show up for >> jobs >> > > > with >> > > > > a >> > > > > > >>> chain >> > > > > > >>> > of lightweight operators. I am recently working on >> > FLINK-30531 >> > > > > > >>> > <https://issues.apache.org/jira/browse/FLINK-30531> to >> > reduce >> > > > > > runtime >> > > > > > >>> > overhead which might be related to this discussion. >> > > > > > >>> > >> > > > > > >>> > In comparison to option-2, the option-3 provided in my >> > earlier >> > > > > email >> > > > > > >>> would >> > > > > > >>> > not add this extra overhead. I think it might be worthwhile >> > to >> > > > > > invest in >> > > > > > >>> > the long-term performance (and simpler runtime infra) and >> pay >> > > for >> > > > > the >> > > > > > >>> > short-term cost of deprecating this metric in >> > > SourceOperatorBase. >> > > > > > What do >> > > > > > >>> > you think? >> > > > > > >>> > >> > > > > > >>> > Regards, >> > > > > > >>> > Dong >> > > > > > >>> > >> > > > > > >>> > >> > > > > > >>> > On Thu, Jan 5, 2023 at 10:10 PM Wencong Liu < >> > > > liuwencle...@163.com> >> > > > > > >>> wrote: >> > > > > > >>> > >> > > > > > >>> >> Hi, All >> > > > > > >>> >> >> > > > > > >>> >> >> > > > > > >>> >> Thanks for the reply! >> > > > > > >>> >> >> > > > > > >>> >> >> > > > > > >>> >> I think both John and Dong's opinions are reasonable. >> John's >> > > > > > Suggestion >> > > > > > >>> 2 >> > > > > > >>> >> is a good implementation. >> > > > > > >>> >> It does not affect the existing source connectors, but >> also >> > > > > provides >> > > > > > >>> >> support >> > > > > > >>> >> for custom counter in the future implementation. >> > > > > > >>> >> >> > > > > > >>> >> >> > > > > > >>> >> WDYT? >> > > > > > >>> >> >> > > > > > >>> >> >> > > > > > >>> >> Best, >> > > > > > >>> >> >> > > > > > >>> >> Wencong Liu >> > > > > > >>> >> > > > > > >> > > > > >> > > > >> > > >> > >>