Thanks the discussion of Dong and Martijn!
Since the constructor is only marked with @deprecated now, it's compatible with
the existing connectors.
This proposal mainly provides support for the connectors that need override
numRecordsIn report logic, which
can be updated according to their release cadence.
WDYT?
Best regards,
Wencong
At 2023-01-09 16:26:18, "Martijn Visser" <martijnvis...@apache.org> wrote:
>Hi Dong,
>
>I think the key part is indeed your sentence that "FLIP-33 intends to
>standard the convention of metric name/semantic, but not the
>implementation."
>When I read FLIP-33, that's not how I interpreted it. So I've been reading
>back in the FLIP-33 discussion thread, there were some more clarifications
>in that thread that it is indeed only meant for standardizing metric names.
>Should have done that earlier, sorry for that.
>
>As part of this proposal, does that also include updating the existing
>connectors (given that a lot of them have been externalized) to not use the
>to-be-deprecated SourceReaderBase constructor?
>
>Best regards,
>
>Martijn
>
>On Mon, Jan 9, 2023 at 1:39 AM Dong Lin <lindon...@gmail.com> wrote:
>
>> Hi Martijn,
>>
>> By saying "Each connector defines their own metrics at the moment", what
>> FLIP-33 intends to avoid is that connectA defines a metric named
>> numRecordsIn and connectB defines a metric named numOfRecordsIn. In this
>> case, these two metrics would have the same semantic meaning but different
>> name in the dashboard, which complicates the monitoring for users. FLIP-33
>> intends to standard the convention of metric name/semantic, but not the
>> implementation.
>>
>> It is mentioned in FLIP-33 that "A connector implementation does not have
>> to report all the defined metrics. But if a connector reports a metric of
>> the same semantic defined below, the implementation should follow the
>> convention".
>>
>> What is proposed in this thread will only affect the connection
>> implementation, i.e. SourceReaderBase would no longer increment
>> numRecordsIn. But it would not change the name/semantic of numRecordsIn.
>>
>> Note that even if we don't change SourceReaderBase, a subclass of
>> SourceReaderBase (or Source in general) can still update numRecordsIn in a
>> way that breaks the semantic of this metric mentioned in FLIP-33. There is
>> no way we can programmatically prevent that from happening. Developers of
>> respective connectors need to be aware of the convention and enforce it
>> properly.
>>
>> Regards,
>> Dong
>>
>>
>>
>> On Mon, Jan 9, 2023 at 6:06 AM Martijn Visser <martijnvis...@apache.org>
>> wrote:
>>
>> > Hi Dong,
>> >
>> > In the proposal from Wencong is stated "If source reader want to report
>> to
>> > metric by self, it can set source.reader.metric.num_records_in.override
>> to
>> > true, which disables the registration of numRecordsIn in SourceReaderBase
>> > and let the actual implementation to report the metric instead."
>> >
>> > While in FLIP-33 is stated "Each connector defines their own metrics at
>> the
>> > moment. This complicates operation and monitoring. Admittedly, different
>> > connectors may have different metrics, but some commonly used metrics can
>> > probably be standardized. This FLIP proposes a set of standard connector
>> > metrics that each connector should emit if applicable."
>> >
>> > With this proposal, the actual implementation can be disabled in
>> > SourceReaderBase and the connector can define its own metric again. With
>> > how I read FLIP-33, the purpose was exactly to avoid that. So you can
>> have
>> > a connector that has its own definition for numRecordsIn, which it
>> > submits/sends as numRecordsIn, which basically breaks it being a default
>> > metric.
>> >
>> > Best regards,
>> >
>> > Martijn
>> >
>> > On Sun, Jan 8, 2023 at 3:26 PM Dong Lin <lindon...@gmail.com> wrote:
>> >
>> > > Hi Martijn,
>> > >
>> > > I think the change proposed in this thread would *not* break the idea
>> in
>> > > FLIP-33. FLIP-33 standardized metrics such as numRecordsIn, by
>> specifying
>> > > the name/type/semantics of those metrics so that all connectors can
>> > follow.
>> > > The name/type/semantics of numRecordsIn metric would be the same before
>> > and
>> > > after the proposed change.
>> > >
>> > > Does this make sense? Or could you explain which part of FLIP-33 would
>> be
>> > > broken by the proposed change?
>> > >
>> > > Regards,
>> > > Dong
>> > >
>> > >
>> > > On Sun, Jan 8, 2023 at 9:33 PM Martijn Visser <
>> martijnvis...@apache.org>
>> > > wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > This feels like we purposely want to abandon the idea that was
>> > introduced
>> > > > with FLIP-33 on standardizing metrics [1]. From this proposal, I
>> don't
>> > > see
>> > > > why we want to abandon that idea. Next to that, if you override
>> > > > the numRecordsIn logic, you also touch on the other metrics that rely
>> > on
>> > > > this value.
>> > > >
>> > > > Best regards,
>> > > >
>> > > > Martijn
>> > > >
>> > > > [1]
>> > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
>> > > >
>> > > > On Sun, Jan 8, 2023 at 12:43 PM Wencong Liu <liuwencle...@163.com>
>> > > wrote:
>> > > >
>> > > > > Thanks for the discussion, Dong and John
>> > > > >
>> > > > >
>> > > > > Considering the extra cost of method calls, the approach of adding
>> an
>> > > > > extra SourceReaderBase constructor
>> > > > > should be a better choice. If there is no further discussion, I
>> will
>> > > > > follow this plan.
>> > > > >
>> > > > >
>> > > > > Best,
>> > > > > Wencong
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > At 2023-01-08 01:05:07, "John Roesler" <vvcep...@apache.org>
>> wrote:
>> > > > > >Thanks, for the discussion, Dong.
>> > > > > >
>> > > > > >To answer your question: I was unclear if the desire was to
>> > deprecate
>> > > > the
>> > > > > metric itself, to be replaced with some other metric, or whether we
>> > > just
>> > > > > wanted to deprecate the way the superclass manages it. It’s clear
>> now
>> > > > that
>> > > > > we want to keep the metric and only change the superclass logic.
>> > > > > >
>> > > > > >I think we’re on the same page now.
>> > > > > >
>> > > > > >Thanks!
>> > > > > >-John
>> > > > > >
>> > > > > >On Sat, Jan 7, 2023, at 07:21, Dong Lin wrote:
>> > > > > >> Hi John,
>> > > > > >>
>> > > > > >> Not sure if I understand the difference between "deprecate that
>> > > > metric"
>> > > > > and
>> > > > > >> "deprecate the private counter mechanism". I think what we want
>> is
>> > > to
>> > > > > >> update SourceReaderBase's implementation so that it no longer
>> > > > explicitly
>> > > > > >> increments this metric. But we still need to expose this metric
>> to
>> > > > user.
>> > > > > >> And methods such as RecordEmitter#emitRecord (which can be
>> invoked
>> > > by
>> > > > > >> SourceReaderBase#pollNext(...)) can potentially increment this
>> > > metric.
>> > > > > >>
>> > > > > >> I like your approach of adding an extra SourceReaderBase
>> > > constructor.
>> > > > > That
>> > > > > >> appears simpler than adding a deprecated config. We can mark the
>> > > > > existing
>> > > > > >> SourceReaderBase constructor as @deprecated.
>> > > > > SourceReaderBase#pollNext(...)
>> > > > > >> will not increment the counter if a subclass uses the newly
>> added
>> > > > > >> constructor.
>> > > > > >>
>> > > > > >> Cheers,
>> > > > > >> Dong
>> > > > > >>
>> > > > > >>
>> > > > > >> On Sat, Jan 7, 2023 at 4:47 AM John Roesler <
>> vvcep...@apache.org>
>> > > > > wrote:
>> > > > > >>
>> > > > > >>> Thanks for the replies, Dong and Wencong!
>> > > > > >>>
>> > > > > >>> That’s a good point about the overhead of the extra method.
>> > > > > >>>
>> > > > > >>> Is the desire to actually deprecate that metric in a
>> user-facing
>> > > way,
>> > > > > or
>> > > > > >>> just to deprecate the private counter mechanism?
>> > > > > >>>
>> > > > > >>> It seems like if the desire is to deprecate the existing
>> private
>> > > > > counter,
>> > > > > >>> we can accomplish it by deprecating the current constructor and
>> > > > > offering
>> > > > > >>> another that is documented not to track the metric. This seems
>> > > better
>> > > > > than
>> > > > > >>> the config option, since the concern is purely about the
>> division
>> > > of
>> > > > > >>> responsibilities between the sub- and super-class.
>> > > > > >>>
>> > > > > >>> Another option, which might be better if we wish to keep a
>> > > uniformly
>> > > > > named
>> > > > > >>> metric, would be to simply make the counter protected. That
>> would
>> > > be
>> > > > > better
>> > > > > >>> if we’re generally happy with the metric and counter, but a few
>> > > > special
>> > > > > >>> source connectors need to emit records in other ways.
>> > > > > >>>
>> > > > > >>> And finally, if we really want to get rid of the metric itself,
>> > > then
>> > > > I
>> > > > > >>> agree, a config is the way to do it.
>> > > > > >>>
>> > > > > >>> Thanks,
>> > > > > >>> John
>> > > > > >>>
>> > > > > >>> On Fri, Jan 6, 2023, at 00:55, Dong Lin wrote:
>> > > > > >>> > Hi John and Wencong,
>> > > > > >>> >
>> > > > > >>> > Thanks for the reply!
>> > > > > >>> >
>> > > > > >>> > It is nice that optional-2 can address the problem without
>> > > > affecting
>> > > > > the
>> > > > > >>> > existing source connectors as far as functionality is
>> > concerned.
>> > > > One
>> > > > > >>> > potential concern with this approach is that it might
>> increase
>> > > the
>> > > > > Flink
>> > > > > >>> > runtime overhead by adding one more virtual functional call
>> to
>> > > the
>> > > > > >>> > per-record runtime call stack.
>> > > > > >>> >
>> > > > > >>> > Since Java's default MaxInlineLevel is 12-18, I believe it is
>> > > easy
>> > > > > for an
>> > > > > >>> > operator chain of 5+ operators to exceed this limit. In this
>> > > case.
>> > > > > And
>> > > > > >>> > option-2 would incur one more virtual table lookup to produce
>> > > each
>> > > > > >>> record.
>> > > > > >>> > It is not clear how much this overhead would show up for jobs
>> > > with
>> > > > a
>> > > > > >>> chain
>> > > > > >>> > of lightweight operators. I am recently working on
>> FLINK-30531
>> > > > > >>> > <https://issues.apache.org/jira/browse/FLINK-30531> to
>> reduce
>> > > > > runtime
>> > > > > >>> > overhead which might be related to this discussion.
>> > > > > >>> >
>> > > > > >>> > In comparison to option-2, the option-3 provided in my
>> earlier
>> > > > email
>> > > > > >>> would
>> > > > > >>> > not add this extra overhead. I think it might be worthwhile
>> to
>> > > > > invest in
>> > > > > >>> > the long-term performance (and simpler runtime infra) and pay
>> > for
>> > > > the
>> > > > > >>> > short-term cost of deprecating this metric in
>> > SourceOperatorBase.
>> > > > > What do
>> > > > > >>> > you think?
>> > > > > >>> >
>> > > > > >>> > Regards,
>> > > > > >>> > Dong
>> > > > > >>> >
>> > > > > >>> >
>> > > > > >>> > On Thu, Jan 5, 2023 at 10:10 PM Wencong Liu <
>> > > liuwencle...@163.com>
>> > > > > >>> wrote:
>> > > > > >>> >
>> > > > > >>> >> Hi, All
>> > > > > >>> >>
>> > > > > >>> >>
>> > > > > >>> >> Thanks for the reply!
>> > > > > >>> >>
>> > > > > >>> >>
>> > > > > >>> >> I think both John and Dong's opinions are reasonable. John's
>> > > > > Suggestion
>> > > > > >>> 2
>> > > > > >>> >> is a good implementation.
>> > > > > >>> >> It does not affect the existing source connectors, but also
>> > > > provides
>> > > > > >>> >> support
>> > > > > >>> >> for custom counter in the future implementation.
>> > > > > >>> >>
>> > > > > >>> >>
>> > > > > >>> >> WDYT?
>> > > > > >>> >>
>> > > > > >>> >>
>> > > > > >>> >> Best,
>> > > > > >>> >>
>> > > > > >>> >> Wencong Liu
>> > > > > >>>
>> > > > >
>> > > >
>> > >
>> >
>>