Hi Dong and Martijin,

I'll create a FLIP for this public API change and put related links in this 
discussion.


Best regards,
Wencong

















At 2023-01-10 09:34:05, "Dong Lin" <lindon...@gmail.com> wrote:
>Hi Martijn,
>
>No worries. Thanks for the discussion!
>
>Suppose we agree to deprecate the existing SourceReaderBase public
>constructor, it means we plan to remove this construct (which could be
>backward incompatible) sooner or later. I think it makes sense to at least
>update all the externalized connectors listed under
>github.com/apache/flink-connector-* to use the new constructor, before we
>remove the existing constructor.
>
>Maybe Wencong can create a FLIP for this public API change and specify this
>plan "Compatibility, Deprecation, and Migration Plan" section? The changes
>are probably straightforward to make for each externalized connectors. I am
>happy to make the change if it is not made after 2 Flink release cycles.
>
>Cheers,
>Dong
>
>
>
>
>On Mon, Jan 9, 2023 at 4:27 PM Martijn Visser <martijnvis...@apache.org>
>wrote:
>
>> Hi Dong,
>>
>> I think the key part is indeed your sentence that "FLIP-33 intends to
>> standard the convention of metric name/semantic, but not the
>> implementation."
>> When I read FLIP-33, that's not how I interpreted it. So I've been reading
>> back in the FLIP-33 discussion thread, there were some more clarifications
>> in that thread that it is indeed only meant for standardizing metric names.
>> Should have done that earlier, sorry for that.
>>
>> As part of this proposal, does that also include updating the existing
>> connectors (given that a lot of them have been externalized) to not use the
>> to-be-deprecated SourceReaderBase constructor?
>>
>> Best regards,
>>
>> Martijn
>>
>> On Mon, Jan 9, 2023 at 1:39 AM Dong Lin <lindon...@gmail.com> wrote:
>>
>> > Hi Martijn,
>> >
>> > By saying "Each connector defines their own metrics at the moment", what
>> > FLIP-33 intends to avoid is that connectA defines a metric named
>> > numRecordsIn and connectB defines a metric named numOfRecordsIn. In this
>> > case, these two metrics would have the same semantic meaning but
>> different
>> > name in the dashboard, which complicates the monitoring for users.
>> FLIP-33
>> > intends to standard the convention of metric name/semantic, but not the
>> > implementation.
>> >
>> > It is mentioned in FLIP-33 that "A connector implementation does not have
>> > to report all the defined metrics. But if a connector reports a metric of
>> > the same semantic defined below, the implementation should follow the
>> > convention".
>> >
>> > What is proposed in this thread will only affect the connection
>> > implementation, i.e. SourceReaderBase would no longer increment
>> > numRecordsIn. But it would not change the name/semantic of numRecordsIn.
>> >
>> > Note that even if we don't change SourceReaderBase, a subclass of
>> > SourceReaderBase (or Source in general) can still update numRecordsIn in
>> a
>> > way that breaks the semantic of this metric mentioned in FLIP-33. There
>> is
>> > no way we can programmatically prevent that from happening. Developers of
>> > respective connectors need to be aware of the convention and enforce it
>> > properly.
>> >
>> > Regards,
>> > Dong
>> >
>> >
>> >
>> > On Mon, Jan 9, 2023 at 6:06 AM Martijn Visser <martijnvis...@apache.org>
>> > wrote:
>> >
>> > > Hi Dong,
>> > >
>> > > In the proposal from Wencong is stated "If source reader want to report
>> > to
>> > > metric by self, it can set source.reader.metric.num_records_in.override
>> > to
>> > > true, which disables the registration of numRecordsIn in
>> SourceReaderBase
>> > > and let the actual implementation to report the metric instead."
>> > >
>> > > While in FLIP-33 is stated "Each connector defines their own metrics at
>> > the
>> > > moment. This complicates operation and monitoring. Admittedly,
>> different
>> > > connectors may have different metrics, but some commonly used metrics
>> can
>> > > probably be standardized. This FLIP proposes a set of standard
>> connector
>> > > metrics that each connector should emit if applicable."
>> > >
>> > > With this proposal, the actual implementation can be disabled in
>> > > SourceReaderBase and the connector can define its own metric again.
>> With
>> > > how I read FLIP-33, the purpose was exactly to avoid that. So you can
>> > have
>> > > a connector that has its own definition for numRecordsIn, which it
>> > > submits/sends as numRecordsIn, which basically breaks it being a
>> default
>> > > metric.
>> > >
>> > > Best regards,
>> > >
>> > > Martijn
>> > >
>> > > On Sun, Jan 8, 2023 at 3:26 PM Dong Lin <lindon...@gmail.com> wrote:
>> > >
>> > > > Hi Martijn,
>> > > >
>> > > > I think the change proposed in this thread would *not* break the idea
>> > in
>> > > > FLIP-33. FLIP-33 standardized metrics such as numRecordsIn, by
>> > specifying
>> > > > the name/type/semantics of those metrics so that all connectors can
>> > > follow.
>> > > > The name/type/semantics of numRecordsIn metric would be the same
>> before
>> > > and
>> > > > after the proposed change.
>> > > >
>> > > > Does this make sense? Or could you explain which part of FLIP-33
>> would
>> > be
>> > > > broken by the proposed change?
>> > > >
>> > > > Regards,
>> > > > Dong
>> > > >
>> > > >
>> > > > On Sun, Jan 8, 2023 at 9:33 PM Martijn Visser <
>> > martijnvis...@apache.org>
>> > > > wrote:
>> > > >
>> > > > > Hi all,
>> > > > >
>> > > > > This feels like we purposely want to abandon the idea that was
>> > > introduced
>> > > > > with FLIP-33 on standardizing metrics [1]. From this proposal, I
>> > don't
>> > > > see
>> > > > > why we want to abandon that idea. Next to that, if you override
>> > > > > the numRecordsIn logic, you also touch on the other metrics that
>> rely
>> > > on
>> > > > > this value.
>> > > > >
>> > > > > Best regards,
>> > > > >
>> > > > > Martijn
>> > > > >
>> > > > > [1]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
>> > > > >
>> > > > > On Sun, Jan 8, 2023 at 12:43 PM Wencong Liu <liuwencle...@163.com>
>> > > > wrote:
>> > > > >
>> > > > > > Thanks for the discussion, Dong and John
>> > > > > >
>> > > > > >
>> > > > > > Considering the extra cost of method calls, the approach of
>> adding
>> > an
>> > > > > > extra SourceReaderBase constructor
>> > > > > > should be a better choice. If there is no further discussion, I
>> > will
>> > > > > > follow this plan.
>> > > > > >
>> > > > > >
>> > > > > > Best,
>> > > > > > Wencong
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > At 2023-01-08 01:05:07, "John Roesler" <vvcep...@apache.org>
>> > wrote:
>> > > > > > >Thanks, for the discussion, Dong.
>> > > > > > >
>> > > > > > >To answer your question: I was unclear if the desire was to
>> > > deprecate
>> > > > > the
>> > > > > > metric itself, to be replaced with some other metric, or whether
>> we
>> > > > just
>> > > > > > wanted to deprecate the way the superclass manages it. It’s clear
>> > now
>> > > > > that
>> > > > > > we want to keep the metric and only change the superclass logic.
>> > > > > > >
>> > > > > > >I think we’re on the same page now.
>> > > > > > >
>> > > > > > >Thanks!
>> > > > > > >-John
>> > > > > > >
>> > > > > > >On Sat, Jan 7, 2023, at 07:21, Dong Lin wrote:
>> > > > > > >> Hi John,
>> > > > > > >>
>> > > > > > >> Not sure if I understand the difference between "deprecate
>> that
>> > > > > metric"
>> > > > > > and
>> > > > > > >> "deprecate the private counter mechanism". I think what we
>> want
>> > is
>> > > > to
>> > > > > > >> update SourceReaderBase's implementation so that it no longer
>> > > > > explicitly
>> > > > > > >> increments this metric. But we still need to expose this
>> metric
>> > to
>> > > > > user.
>> > > > > > >> And methods such as RecordEmitter#emitRecord (which can be
>> > invoked
>> > > > by
>> > > > > > >> SourceReaderBase#pollNext(...)) can potentially increment this
>> > > > metric.
>> > > > > > >>
>> > > > > > >> I like your approach of adding an extra SourceReaderBase
>> > > > constructor.
>> > > > > > That
>> > > > > > >> appears simpler than adding a deprecated config. We can mark
>> the
>> > > > > > existing
>> > > > > > >> SourceReaderBase constructor as @deprecated.
>> > > > > > SourceReaderBase#pollNext(...)
>> > > > > > >> will not increment the counter if a subclass uses the newly
>> > added
>> > > > > > >> constructor.
>> > > > > > >>
>> > > > > > >> Cheers,
>> > > > > > >> Dong
>> > > > > > >>
>> > > > > > >>
>> > > > > > >> On Sat, Jan 7, 2023 at 4:47 AM John Roesler <
>> > vvcep...@apache.org>
>> > > > > > wrote:
>> > > > > > >>
>> > > > > > >>> Thanks for the replies, Dong and Wencong!
>> > > > > > >>>
>> > > > > > >>> That’s a good point about the overhead of the extra method.
>> > > > > > >>>
>> > > > > > >>> Is the desire to actually deprecate that metric in a
>> > user-facing
>> > > > way,
>> > > > > > or
>> > > > > > >>> just to deprecate the private counter mechanism?
>> > > > > > >>>
>> > > > > > >>> It seems like if the desire is to deprecate the existing
>> > private
>> > > > > > counter,
>> > > > > > >>> we can accomplish it by deprecating the current constructor
>> and
>> > > > > > offering
>> > > > > > >>> another that is documented not to track the metric. This
>> seems
>> > > > better
>> > > > > > than
>> > > > > > >>> the config option, since the concern is purely about the
>> > division
>> > > > of
>> > > > > > >>> responsibilities between the sub- and super-class.
>> > > > > > >>>
>> > > > > > >>> Another option, which might be better if we wish to keep a
>> > > > uniformly
>> > > > > > named
>> > > > > > >>> metric, would be to simply make the counter protected. That
>> > would
>> > > > be
>> > > > > > better
>> > > > > > >>> if we’re generally happy with the metric and counter, but a
>> few
>> > > > > special
>> > > > > > >>> source connectors need to emit records in other ways.
>> > > > > > >>>
>> > > > > > >>> And finally, if we really want to get rid of the metric
>> itself,
>> > > > then
>> > > > > I
>> > > > > > >>> agree, a config is the way to do it.
>> > > > > > >>>
>> > > > > > >>> Thanks,
>> > > > > > >>> John
>> > > > > > >>>
>> > > > > > >>> On Fri, Jan 6, 2023, at 00:55, Dong Lin wrote:
>> > > > > > >>> > Hi John and Wencong,
>> > > > > > >>> >
>> > > > > > >>> > Thanks for the reply!
>> > > > > > >>> >
>> > > > > > >>> > It is nice that optional-2 can address the problem without
>> > > > > affecting
>> > > > > > the
>> > > > > > >>> > existing source connectors as far as functionality is
>> > > concerned.
>> > > > > One
>> > > > > > >>> > potential concern with this approach is that it might
>> > increase
>> > > > the
>> > > > > > Flink
>> > > > > > >>> > runtime overhead by adding one more virtual functional call
>> > to
>> > > > the
>> > > > > > >>> > per-record runtime call stack.
>> > > > > > >>> >
>> > > > > > >>> > Since Java's default MaxInlineLevel is 12-18, I believe it
>> is
>> > > > easy
>> > > > > > for an
>> > > > > > >>> > operator chain of 5+ operators to exceed this limit. In
>> this
>> > > > case.
>> > > > > > And
>> > > > > > >>> > option-2 would incur one more virtual table lookup to
>> produce
>> > > > each
>> > > > > > >>> record.
>> > > > > > >>> > It is not clear how much this overhead would show up for
>> jobs
>> > > > with
>> > > > > a
>> > > > > > >>> chain
>> > > > > > >>> > of lightweight operators. I am recently working on
>> > FLINK-30531
>> > > > > > >>> > <https://issues.apache.org/jira/browse/FLINK-30531> to
>> > reduce
>> > > > > > runtime
>> > > > > > >>> > overhead which might be related to this discussion.
>> > > > > > >>> >
>> > > > > > >>> > In comparison to option-2, the option-3 provided in my
>> > earlier
>> > > > > email
>> > > > > > >>> would
>> > > > > > >>> > not add this extra overhead. I think it might be worthwhile
>> > to
>> > > > > > invest in
>> > > > > > >>> > the long-term performance (and simpler runtime infra) and
>> pay
>> > > for
>> > > > > the
>> > > > > > >>> > short-term cost of deprecating this metric in
>> > > SourceOperatorBase.
>> > > > > > What do
>> > > > > > >>> > you think?
>> > > > > > >>> >
>> > > > > > >>> > Regards,
>> > > > > > >>> > Dong
>> > > > > > >>> >
>> > > > > > >>> >
>> > > > > > >>> > On Thu, Jan 5, 2023 at 10:10 PM Wencong Liu <
>> > > > liuwencle...@163.com>
>> > > > > > >>> wrote:
>> > > > > > >>> >
>> > > > > > >>> >> Hi, All
>> > > > > > >>> >>
>> > > > > > >>> >>
>> > > > > > >>> >> Thanks for the reply!
>> > > > > > >>> >>
>> > > > > > >>> >>
>> > > > > > >>> >> I think both John and Dong's opinions are reasonable.
>> John's
>> > > > > > Suggestion
>> > > > > > >>> 2
>> > > > > > >>> >> is a good implementation.
>> > > > > > >>> >> It does not affect the existing source connectors, but
>> also
>> > > > > provides
>> > > > > > >>> >> support
>> > > > > > >>> >> for custom counter in the future implementation.
>> > > > > > >>> >>
>> > > > > > >>> >>
>> > > > > > >>> >> WDYT?
>> > > > > > >>> >>
>> > > > > > >>> >>
>> > > > > > >>> >> Best,
>> > > > > > >>> >>
>> > > > > > >>> >> Wencong Liu
>> > > > > > >>>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>

Reply via email to