Re: [DISCUSS] FLIP-33: Standardize connector metrics

Stephan Ewen Wed, 24 Apr 2019 04:29:16 -0700

I think this sounds reasonable.

Let's keep the "reconfiguration without stopping the job" out of this,
because that would be a super big effort and if we approach that, then in
a more generic way rather than specific to connector metrics.


I would suggest looking at the following things before starting with any
implementation work:

  - Try and find a committer to support this, otherwise it will be hard to
make progress
  - Start with defining a smaller set of "core metrics" and extend the set
later. I think that is easier than blocking now on reaching consensus on a
large group of metrics.
  - Find a solution to the problem Chesnay mentioned, that the "records in"
metric is somehow overloaded and exists already in the IO Metric group.


On Mon, Mar 25, 2019 at 7:16 AM Becket Qin <becket....@gmail.com> wrote:

> Hi Stephan,
>
> Thanks a lot for the feedback. All makes sense.
>
> It is a good suggestion to simply have an onRecord(numBytes, eventTime)
> method for connector writers. It should meet most of the requirements,
> individual
>
> The configurable metrics feature is something really useful, especially if
> we can somehow make it dynamically configurable without stopping the jobs.
> It might be better to make it a separate discussion because it is a more
> generic feature, not one specific to connectors.
>
> So in order to make some progress, in this FLIP we can limit the discussion
> scope to the connector related items:
>
> - the standard connector metric names and types.
> - the abstract ConnectorMetricHandler interface
>
> I'll start a separate thread to discuss other general metric related
> enhancement items including:
>
> - optional metrics
> - dynamic metric configuration
> - potential combination with rate limiter
>
> Does this plan sound reasonable?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Sat, Mar 23, 2019 at 5:53 AM Stephan Ewen <se...@apache.org> wrote:
>
> > Ignoring for a moment implementation details, this connector metrics work
> > is a really good thing to do, in my opinion
> >
> > The question "oh, my job seems to be doing nothing, I am looking at the UI
> > and the 'records in' value is still zero" is among the top three support
> > questions I have been asked personally.
> > Introspection into "how far is the consumer lagging behind" (event time
> > fetch latency) came up many times as well.
> >
> > So big +1 to solving this problem.
> >
> > About the exact design - I would try to go for the following properties:
> >
> >   - keep complexity out of connectors. Ideally the metrics handler has a
> > single onRecord(numBytes, eventTime) method or so, and everything else is
> > internal to the handler. That makes it dead simple for the connector. We
> > can also think of an extensible scheme for connector-specific metrics.
> >
> >   - make it configurable on the job or cluster level which metrics the
> > handler internally creates when that method is invoked.
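A minimal sketch of the single-method handler described above, written in Java since that is Flink's language. Everything in it is hypothetical: the `ConnectorMetricHandler` and `ConnectorMetric` names, the `NO_EVENT_TIME` marker, and the assumption that job or cluster configuration would supply the set of enabled metrics are illustrations, not an existing Flink API. Plain `AtomicLong` fields stand in for real metric objects so the example compiles on its own.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical metric identifiers; illustrative only, not part of any Flink API.
enum ConnectorMetric { NUM_RECORDS, NUM_BYTES, EVENT_TIME_LAG }

// Hypothetical handler: the connector only calls onRecord(), everything else is internal.
final class ConnectorMetricHandler {
    static final long NO_EVENT_TIME = Long.MIN_VALUE; // marker for records without an event time

    private final Set<ConnectorMetric> enabled;        // assumed to come from job/cluster config
    private final LongSupplier clockMs;
    private final AtomicLong numRecords = new AtomicLong();
    private final AtomicLong numBytes = new AtomicLong();
    private volatile long eventTimeLagMs = -1L;        // -1 until a record with an event time is seen

    ConnectorMetricHandler(Set<ConnectorMetric> enabled, LongSupplier clockMs) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, hence the explicit branch.
        this.enabled = enabled.isEmpty()
                ? EnumSet.noneOf(ConnectorMetric.class)
                : EnumSet.copyOf(enabled);
        this.clockMs = clockMs;
    }

    /** The single entry point a connector calls once per record. */
    void onRecord(long recordBytes, long eventTimeMs) {
        if (enabled.contains(ConnectorMetric.NUM_RECORDS)) {
            numRecords.incrementAndGet();
        }
        if (enabled.contains(ConnectorMetric.NUM_BYTES)) {
            numBytes.addAndGet(recordBytes);
        }
        if (enabled.contains(ConnectorMetric.EVENT_TIME_LAG) && eventTimeMs != NO_EVENT_TIME) {
            eventTimeLagMs = clockMs.getAsLong() - eventTimeMs;
        }
    }

    long numRecords() { return numRecords.get(); }
    long numBytes() { return numBytes.get(); }
    long eventTimeLagMs() { return eventTimeLagMs; }
}
```

The connector never decides which metrics exist; whoever constructs the handler does, which is the property the second bullet asks for.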
> >
> > What do you think?
> >
> > Best,
> > Stephan
> >
> >
> > On Thu, Mar 21, 2019 at 10:42 AM Chesnay Schepler <ches...@apache.org>
> > wrote:
> >
> > > As I said before, I believe this to be over-engineered and have no
> > > interest in this implementation.
> > >
> > > There are conceptual issues like defining a duplicate numBytesIn(PerSec)
> > > metric that already exists for each operator.
> > >
> > > On 21.03.2019 06:13, Becket Qin wrote:
> > > > A few updates to the thread. I uploaded a patch[1] as a complete
> > > > example of how users can use the metrics in the future.
> > > >
> > > > Some thoughts below after taking a look at the AbstractMetricGroup and
> > > > its subclasses.
> > > >
> > > > This patch intends to provide convenience for Flink connector
> > > > implementations to follow the metrics standards proposed in FLIP-33. It
> > > > also tries to enhance metric management in a general way to help users
> > > > with:
> > > >
> > > >  1. metric definition
> > > >  2. metric dependencies check
> > > >  3. metric validation
> > > >  4. metric control (turn on / off particular metrics)
> > > >
> > > > This patch wraps |MetricGroup| to extend the functionality of
> > > > |AbstractMetricGroup| and its subclasses. The
> > > > |AbstractMetricGroup| mainly focuses on the metric group hierarchy, but
> > > > does not really manage the metrics other than keeping them in a Map.
> > > >
> > > > Ideally we should only have one entry point for the metrics.
> > > >
> > > > Right now the entry point is |AbstractMetricGroup|. However, besides
> > > > the missing functionality mentioned above, |AbstractMetricGroup| seems
> > > > deeply rooted in the Flink runtime. We could extract it out to
> > > > flink-metrics in order to use it for generic purposes. There will be
> > > > some work, though.
> > > >
> > > > Another approach is to make |AbstractMetrics| in this patch the
> > > > metric entry point. It wraps the metric group and provides the missing
> > > > functionality. Then we can roll out this pattern to runtime
> > > > components gradually as well.
> > > >
> > > > My first thought is that the latter approach gives a smoother
> > > > migration. But I am also OK with doing a refactoring on the
> > > > |AbstractMetricGroup| family.
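To make the wrapper idea more concrete, here is a rough sketch of an object that owns metric definitions, rejects registrations of undefined names (validation) and honors per-metric on/off switches (control). The class name `DefinedMetrics` and its methods are hypothetical; this is not the code from the linked patch, dependency checks are omitted, and `Supplier` stands in for a gauge.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical wrapper that owns metric definitions; not an existing Flink class.
final class DefinedMetrics {
    private static final class Definition {
        final String description;
        final boolean enabledByDefault;

        Definition(String description, boolean enabledByDefault) {
            this.description = description;
            this.enabledByDefault = enabledByDefault;
        }
    }

    private final Map<String, Definition> definitions = new LinkedHashMap<>();
    private final Map<String, Boolean> overrides = new HashMap<>();
    private final Map<String, Supplier<?>> registered = new LinkedHashMap<>();

    /** Definition: every metric is declared once, with a default on/off state. */
    DefinedMetrics define(String name, String description, boolean enabledByDefault) {
        if (definitions.putIfAbsent(name, new Definition(description, enabledByDefault)) != null) {
            throw new IllegalArgumentException("Metric defined twice: " + name);
        }
        return this;
    }

    /** Control: turn a defined metric on or off (e.g. from configuration). */
    void setEnabled(String name, boolean enabled) {
        requireDefined(name);
        overrides.put(name, enabled);
    }

    boolean isEnabled(String name) {
        requireDefined(name);
        return overrides.getOrDefault(name, definitions.get(name).enabledByDefault);
    }

    /** Validation: only defined metrics can be registered; returns false if switched off. */
    boolean register(String name, Supplier<?> value) {
        if (!isEnabled(name)) {
            return false;
        }
        if (registered.putIfAbsent(name, value) != null) {
            throw new IllegalStateException("Metric registered twice: " + name);
        }
        return true;
    }

    /** Current values of all registered metrics, in registration order. */
    Map<String, Object> snapshot() {
        Map<String, Object> values = new LinkedHashMap<>();
        registered.forEach((name, supplier) -> values.put(name, supplier.get()));
        return values;
    }

    private void requireDefined(String name) {
        if (!definitions.containsKey(name)) {
            throw new IllegalArgumentException("Undefined metric: " + name);
        }
    }
}
```

Because all definitions live in one place, a typo in a metric name fails at registration time instead of silently producing a new metric.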
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > [1] https://github.com/becketqin/flink/pull/1
> > > >
> > > > On Mon, Feb 25, 2019 at 2:32 PM Becket Qin <becket....@gmail.com
> > > > <mailto:becket....@gmail.com>> wrote:
> > > >
> > > >     Hi Chesnay,
> > > >
> > > >     It might be easier to discuss some implementation details in the
> > > >     PR review instead of in the FLIP discussion thread. I have a patch
> > > >     for Kafka connectors ready but haven't submitted the PR yet.
> > > >     Hopefully that will help explain a bit more.
> > > >
> > > >     ** Re: metric type binding
> > > >     This is a valid point that is worth discussing. If I understand
> > > >     correctly, there are two points:
> > > >
> > > >     1. Metric type / interface does not matter as long as the metric
> > > >     semantic is clearly defined.
> > > >     Conceptually speaking, I agree that as long as the metric semantic
> > > >     is defined, metric type does not matter. To some extent, Gauge /
> > > >     Counter / Meter / Histogram themselves can be thought of as some
> > > >     well-recognized semantics, if you wish. In Flink, these metric
> > > >     semantics have their associated interface classes. In practice,
> > > >     such semantic-to-interface binding seems necessary for different
> > > >     components to communicate. Simply standardizing the semantics of the
> > > >     connector metrics does not seem sufficient for people to build an
> > > >     ecosystem on top of. At the end of the day, we still need to have
> > > >     some embodiment of the metric semantics that people can program
> > > >     against.
> > > >
> > > >     2. Sometimes the same metric semantic can be exposed using
> > > >     different metric types / interfaces.
> > > >     This is a good point. Counter and Gauge-as-a-Counter are pretty
> > > >     much interchangeable. This is more of a trade-off between the user
> > > >     experience of metric producers and consumers. The metric producers
> > > >     want to use Counter or Gauge depending on whether the counter is
> > > >     already tracked in code, while ideally the metric consumers only
> > > >     want to see a single metric type for each metric. I am leaning
> > > >     towards making the metric producers happy, i.e. allowing Gauge /
> > > >     Counter metric types and letting the metric consumers handle the
> > > >     type variation. The reason is that in practice, there might be more
> > > >     connector implementations than metric reporter implementations. We
> > > >     could also provide some helper methods to facilitate reading from
> > > >     such variable metric types.
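A sketch of the kind of helper mentioned in point 2, assuming a reporter may receive either a counter or a numeric gauge for the same count-like metric. `Counter` and `Gauge` below are simplified local stand-ins named after Flink's interfaces (the real ones in `org.apache.flink.metrics` have more methods), and `MetricReads.readCount` is a hypothetical helper name.

```java
// Simplified stand-ins for Flink's Counter and Gauge, kept minimal so the example is self-contained.
interface Counter { long getCount(); }

interface Gauge<T> { T getValue(); }

// Hypothetical helper for metric consumers: hides whether a count was exposed as Counter or Gauge.
final class MetricReads {
    private MetricReads() {}

    static long readCount(Object metric) {
        if (metric instanceof Counter) {
            return ((Counter) metric).getCount();
        }
        if (metric instanceof Gauge) {
            Object value = ((Gauge<?>) metric).getValue();
            if (value instanceof Number) {
                return ((Number) value).longValue(); // Gauge-as-a-Counter: any numeric value
            }
        }
        throw new IllegalArgumentException("Not a count-like metric: " + metric);
    }
}
```

This keeps the flexibility on the producer side (connectors pick whichever type matches what they already track) while giving reporters a single code path.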
> > > >
> > > >
> > > >     Just some quick replies to the comments around implementation details.
> > > >
> > > >         4) single place where metrics are registered except connector-specific
> > > >         ones (which we can't really avoid).
> > > >
> > > >     Registering connector-specific ones in a single place is actually
> > > >     something that I want to achieve.
> > > >
> > > >         2) I'm talking about time-series databases like Prometheus. We
> > > >         would only have a gauge metric exposing the last fetchTime/emitTime
> > > >         that is regularly reported to the backend (Prometheus), where a user
> > > >         could build a histogram of his choosing when/if he wants it.
> > > >
> > > >     Not sure if such downsampling works. As an example, suppose a user
> > > >     complains that there are some intermittent latency spikes (maybe a
> > > >     few records in 10 seconds) in their processing system. Having a
> > > >     Gauge sampling instantaneous latency seems unlikely to be useful.
> > > >     However, looking at the actual 99.9th percentile latency might help.
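The argument can be checked with a small simulation under assumed numbers: 1,000 records with a 5 ms latency, five of them spiking to 500 ms, and a gauge that reports the latest value once every 100 records. The class and method names are hypothetical, and the percentile uses the simple nearest-rank definition rather than any particular histogram implementation.

```java
import java.util.Arrays;

// Hypothetical helpers comparing a sampled last-value gauge with a percentile.
final class LatencySpikeDemo {
    private LatencySpikeDemo() {}

    /** Values a last-value gauge would report if read once every recordsPerReport records. */
    static long[] lastValueAtReports(long[] latenciesMs, int recordsPerReport) {
        int reports = latenciesMs.length / recordsPerReport;
        long[] seen = new long[reports];
        for (int r = 0; r < reports; r++) {
            seen[r] = latenciesMs[(r + 1) * recordsPerReport - 1];
        }
        return seen;
    }

    /** Nearest-rank percentile, p in (0, 100], computed over a sorted copy of the samples. */
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        rank = Math.max(1, Math.min(rank, sorted.length));
        return sorted[rank - 1];
    }
}
```

With spikes at records 100, 300, 500, 700 and 900, none of the ten gauge reports (records 99, 199, ..., 999) lands on a spike, while the 99.9th percentile is 500 ms. The caveat is that the spikes here make up 0.5% of the records; spikes rarer than 0.1% would not move the 99.9th percentile either.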
> > > >
> > > >     Thanks,
> > > >
> > > >     Jiangjie (Becket) Qin
> > > >
> > > >
> > > >     On Fri, Feb 22, 2019 at 9:30 PM Chesnay Schepler
> > > >     <ches...@apache.org <mailto:ches...@apache.org>> wrote:
> > > >
> > > >         Re: over complication of implementation.
> > > >
> > > >         I think I understand better now what you're shooting for,
> > > >         effectively something like the OperatorIOMetricGroup.
> > > >         But still, re-define setupConnectorMetrics() to accept a set of flags
> > > >         for counters/meters (and _possibly_ histograms) along with a set of
> > > >         well-defined Optional<Gauge<?>>, and return the group.
> > > >
> > > >         Solves all issues as far as I can tell:
> > > >         1) no metrics must be created manually (except Gauges, which are
> > > >         effectively just Suppliers and you can't get around this),
> > > >         2) additional metrics can be registered on the returned group,
> > > >         3) see 1),
> > > >         4) single place where metrics are registered except connector-specific
> > > >         ones (which we can't really avoid).
> > > >
> > > >         Re: Histogram
> > > >
> > > >         1) As an example, whether "numRecordsIn" is exposed as a Counter or a
> > > >         Gauge should be irrelevant. So far we're using the metric type that is
> > > >         the most convenient at exposing a given value. If there is some backing
> > > >         data-structure that we want to expose some data from, we typically opt
> > > >         for a Gauge, as otherwise we're just mucking around with the
> > > >         Meter/Counter API to get it to match. Similarly, if we want to count
> > > >         something but no current count exists, we typically add a Counter.
> > > >         That's why attaching semantics to metric types makes little sense (but
> > > >         unfortunately several reporters already do it); for counters/meters
> > > >         certainly, but the majority of metrics are gauges.
> > > >
> > > >         2) I'm talking about time-series databases like Prometheus. We
> > > >         would only have a gauge metric exposing the last fetchTime/emitTime
> > > >         that is regularly reported to the backend (Prometheus), where a user
> > > >         could build a histogram of his choosing when/if he wants it.
> > > >
> > > >         On 22.02.2019 13:57, Becket Qin wrote:
> > > >         > Hi Chesnay,
> > > >         >
> > > >         > Thanks for the explanation.
> > > >         >
> > > >         > ** Re: FLIP
> > > >         > I might have misunderstood this, but it seems that "major changes" are well
> > > >         > defined in the FLIP description. The full content is the following:
> > > >         > What is considered a "major change" that needs a FLIP?
> > > >         >
> > > >         > Any of the following should be considered a major change:
> > > >         >
> > > >         >     - Any major new feature, subsystem, or piece of functionality
> > > >         >     - *Any change that impacts the public interfaces of the project*
> > > >         >
> > > >         > What are the "public interfaces" of the project?
> > > >         >
> > > >         > *All of the following are public interfaces* that people build around:
> > > >         >
> > > >         >     - DataStream and DataSet API, including classes related to that, such as
> > > >         >     StreamExecutionEnvironment
> > > >         >     - Classes marked with the @Public annotation
> > > >         >     - On-disk binary formats, such as checkpoints/savepoints
> > > >         >     - User-facing scripts/command-line tools, i.e. bin/flink, Yarn scripts,
> > > >         >     Mesos scripts
> > > >         >     - Configuration settings
> > > >         >     - *Exposed monitoring information*
> > > >         >
> > > >         >
> > > >         > So any monitoring information change is considered a public interface
> > > >         > change, and any public interface change is considered a "major change".
> > > >         >
> > > >         >
> > > >         > ** Re: over complication of implementation.
> > > >         >
> > > >         > Although this is more of an implementation detail that is not covered by
> > > >         > the FLIP, it may be worth discussing.
> > > >         >
> > > >         > First of all, I completely agree that we should use the simplest way to
> > > >         > achieve our goal. To me the goal is the following:
> > > >         > 1. Clear connector conventions and interfaces.
> > > >         > 2. Ease of creating a connector.
> > > >         >
> > > >         > Both of them are important to the prosperity of the connector ecosystem. So
> > > >         > I'd rather abstract as much as possible on our side to make the connector
> > > >         > developer's work lighter. Given this goal, a static util method approach
> > > >         > might have a few drawbacks:
> > > >         > 1. Users still have to construct the metrics by themselves. (And note that
> > > >         > this might be error-prone by itself. For example, a custom wrapper around
> > > >         > a Dropwizard meter may be used instead of MeterView.)
> > > >         > 2. When connector-specific metrics are added, it is difficult to enforce
> > > >         > the scope to be the same as for the standard metrics.
> > > >         > 3. It seems that method proliferation is inevitable if we want to apply
> > > >         > sanity checks, e.g. that the metric numBytesIn was not registered as a meter.
> > > >         > 4. Metrics are still defined in random places and are hard to track.
> > > >         >
> > > >         > The current PR I had was inspired by the Config system in Kafka, which I
> > > >         > found pretty handy. In fact it is used not only by Kafka itself but also by
> > > >         > some other projects that depend on Kafka. I am not saying this approach is
> > > >         > perfect. But I think it is worth it to save work for connector writers and
> > > >         > encourage a more systematic implementation. That being said, I am fully open
> > > >         > to suggestions.
> > > >         >
> > > >         >
> > > >         > Re: Histogram
> > > >         > I think there are two orthogonal questions around those metrics:
> > > >         >
> > > >         > 1. Regardless of the metric type, by just looking at the meaning of a
> > > >         > metric, is it generic to all connectors? If the answer is yes, we should
> > > >         > include the metric in the convention. No matter whether we include it
> > > >         > in the convention or not, some connector implementations will emit such a
> > > >         > metric. It is better to have a convention than to let each connector do
> > > >         > random things.
> > > >         >
> > > >         > 2. If a standard metric is a histogram, what should we do?
> > > >         > I agree that we should make it clear that using histograms carries a
> > > >         > performance risk. But I do see that histograms are useful in some fine-grained
> > > >         > debugging where one does not have the luxury to stop the system and inject
> > > >         > more inspection code. So the workaround I am thinking of is to provide some
> > > >         > implementation suggestions. Assume later on we have a mechanism for
> > > >         > selective metrics. In the abstract metrics class we can disable those
> > > >         > metrics by default, so individual connector writers do not have to do
> > > >         > anything (this is another advantage of having an AbstractMetrics instead of
> > > >         > static util methods).
> > > >         >
> > > >         > I am not sure I fully understand the histogram-in-the-backend approach. Can
> > > >         > you explain a bit more? Do you mean emitting the raw data, e.g. fetchTime
> > > >         > and emitTime with each record, and letting the histogram computation happen
> > > >         > in the background? Or letting the processing thread put the values into a
> > > >         > queue and having a separate thread poll from the queue and add them into
> > > >         > the histogram?
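The second option described above (the processing thread enqueues, another thread builds the histogram) could look roughly like the sketch below. It is an illustration under assumptions, not a proposed design: the class name `OffThreadHistogram`, the fixed bucket bounds and the drop-when-full policy are hypothetical choices made so that the processing thread never blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: record() runs on the processing thread, drainOnce() on a background thread.
final class OffThreadHistogram {
    private final BlockingQueue<Long> pending;
    private final long[] bucketUpperBounds; // inclusive upper bounds, ascending
    private final long[] bucketCounts;      // last bucket counts values above the largest bound
    private final AtomicLong dropped = new AtomicLong();

    OffThreadHistogram(int capacity, long... bucketUpperBounds) {
        this.pending = new ArrayBlockingQueue<>(capacity);
        this.bucketUpperBounds = bucketUpperBounds.clone();
        this.bucketCounts = new long[bucketUpperBounds.length + 1];
    }

    /** Processing thread: never blocks; the sample is dropped (and counted) if the queue is full. */
    void record(long value) {
        if (!pending.offer(value)) {
            dropped.incrementAndGet();
        }
    }

    /** Background thread: moves all queued samples into the buckets; returns how many were moved. */
    synchronized int drainOnce() {
        List<Long> batch = new ArrayList<>();
        pending.drainTo(batch);
        for (long v : batch) {
            int i = 0;
            while (i < bucketUpperBounds.length && v > bucketUpperBounds[i]) {
                i++;
            }
            bucketCounts[i]++;
        }
        return batch.size();
    }

    /** Reporter thread: a copy of the current bucket counts. */
    synchronized long[] bucketCounts() { return bucketCounts.clone(); }

    long dropped() { return dropped.get(); }
}
```

In practice, something like a `ScheduledExecutorService` could call `drainOnce()` at a fixed rate. Dropping samples under back-pressure keeps the hot path cheap, at the cost of a histogram that undercounts exactly when the system is overloaded.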
> > > >         >
> > > >         > Thanks,
> > > >         >
> > > >         > Jiangjie (Becket) Qin
> > > >         >
> > > >         >
> > > >         >
> > > >         >
> > > >         >
> > > >         > On Fri, Feb 22, 2019 at 4:34 PM Chesnay Schepler
> > > >         <ches...@apache.org <mailto:ches...@apache.org>> wrote:
> > > >         >
> > > >         >> Re: Flip
> > > >         >> The very first line under both the main header and the Purpose section
> > > >         >> describes FLIPs as "major changes", which this isn't.
> > > >         >>
> > > >         >> Re: complication
> > > >         >> I'm not arguing against standardization, but again against an over-complicated
> > > >         >> implementation when a static utility method would be sufficient.
> > > >         >>
> > > >         >> public static void setupConnectorMetrics(
> > > >         >> MetricGroup operatorMetricGroup,
> > > >         >> String connectorName,
> > > >         >> Optional<Gauge<Long>> numRecordsIn,
> > > >         >> ...)
> > > >         >>
> > > >         >> This gives you all you need:
> > > >         >> * a well-defined set of metrics for a connector to opt in to
> > > >         >> * standardized naming schemes for scope and individual metrics
> > > >         >> * standardized metric types (although personally I'm not interested in that
> > > >         >> since metric types should be considered syntactic sugar)
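A runnable sketch of the static-utility approach, assuming gauges are modeled as `Supplier`s (Chesnay notes elsewhere in the thread that gauges are effectively suppliers). `SimpleMetricGroup` is a hypothetical stand-in for Flink's `MetricGroup`, the metric names are illustrative, and only two optional metrics are shown where the original signature elides the rest with `...`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical stand-in for a metric group: a scope string plus a registry shared with subgroups.
final class SimpleMetricGroup {
    private final String scope;
    private final Map<String, Supplier<?>> registry;

    SimpleMetricGroup(String scope) { this(scope, new LinkedHashMap<>()); }

    private SimpleMetricGroup(String scope, Map<String, Supplier<?>> registry) {
        this.scope = scope;
        this.registry = registry;
    }

    SimpleMetricGroup addGroup(String name) { return new SimpleMetricGroup(scope + "." + name, registry); }

    void gauge(String name, Supplier<?> gauge) { registry.put(scope + "." + name, gauge); }

    Map<String, Supplier<?>> registered() { return Collections.unmodifiableMap(registry); }
}

// Hypothetical static utility: fixed names and a fixed scope, each standard metric is opt-in.
final class ConnectorMetricsUtil {
    static final String NUM_RECORDS_IN = "numRecordsIn";
    static final String NUM_BYTES_IN = "numBytesIn";

    private ConnectorMetricsUtil() {}

    static SimpleMetricGroup setupConnectorMetrics(
            SimpleMetricGroup operatorMetricGroup,
            String connectorName,
            Optional<Supplier<Long>> numRecordsIn,
            Optional<Supplier<Long>> numBytesIn) {
        SimpleMetricGroup group = operatorMetricGroup.addGroup(connectorName);
        numRecordsIn.ifPresent(g -> group.gauge(NUM_RECORDS_IN, g));
        numBytesIn.ifPresent(g -> group.gauge(NUM_BYTES_IN, g));
        return group; // connector-specific metrics can be added to the returned group
    }
}
```

Connector-specific metrics are then registered on the returned group, which keeps them under the same scope as the standard ones.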
> > > >         >>
> > > >         >> Re: Configurable Histogram
> > > >         >> If anything, they _must_ be turned off by default, but the metric system is
> > > >         >> already exposing so many options that I'm not too keen on adding even more.
> > > >         >> You have also only addressed my first argument against histograms
> > > >         >> (performance); the second one still stands (calculate histograms in metric
> > > >         >> backends instead).
> > > >         >>
> > > >         >> On 21.02.2019 16:27, Becket Qin wrote:
> > > >         >>> Hi Chesnay,
> > > >         >>>
> > > >         >>> Thanks for the comments. I think this is worthy of a FLIP because it is
> > > >         >>> public API. According to the FLIP description, a FLIP is required in case
> > > >         >>> of:
> > > >         >>>      - Any change that impacts the public interfaces of the project
> > > >         >>>
> > > >         >>> and the following entry is found in the definition of "public interface":
> > > >         >>>
> > > >         >>>      - Exposed monitoring information
> > > >         >>>
> > > >         >>> Metrics are critical to any production system, so a clear metric definition
> > > >         >>> is important for any serious user. For an organization with a large Flink
> > > >         >>> installation, a change in metrics means a great amount of work. So such
> > > >         >>> changes do need to be fully discussed and documented.
> > > >         >>>
> > > >         >>> ** Re: Histogram.
> > > >         >>> We can discuss whether there is a better way to expose metrics that are
> > > >         >>> suitable for histograms. My micro-benchmark on various histogram
> > > >         >>> implementations also indicates that they are significantly slower than
> > > >         >>> other metric types. But I don't think that means we should never use
> > > >         >>> histograms; it means we should use them with caution. For example, we can
> > > >         >>> suggest that implementations turn them off by default and only turn them on
> > > >         >>> for a short time when performing some micro-debugging.
> > > >         >>>
> > > >         >>> ** Re: complication:
> > > >         >>> Connector conventions are essential for the Flink ecosystem. The pool of Flink
> > > >         >>> connectors is probably the most important part of Flink, just like in any other
> > > >         >>> data system. Clear connector conventions will help build the Flink ecosystem
> > > >         >>> in a more organic way.
> > > >         >>> Take the metrics convention as an example: imagine someone has developed a
> > > >         >>> Flink connector for system foo, and another developer has developed a
> > > >         >>> monitoring and diagnostic framework for Flink which analyzes Flink job
> > > >         >>> performance based on metrics. With a clear metric convention, those two
> > > >         >>> projects could be developed independently. Once users put them together,
> > > >         >>> they would work without additional modifications. This cannot be easily
> > > >         >>> achieved by just defining a few constants.
> > > >         >>>
> > > >         >>> ** Re: selective metrics:
> > > >         >>> Sure, we can discuss that in a separate thread.
> > > >         >>>
> > > >         >>> @Dawid
> > > >         >>>
> > > >         >>> ** Re: latency / fetchedLatency
> > > >         >>> The primary purpose of establishing such a convention is to help developers
> > > >         >>> write connectors in a more compatible way. The convention is supposed to be
> > > >         >>> defined proactively. So when looking at the convention, it seems more
> > > >         >>> important to see whether the concept is applicable to connectors in general.
> > > >         >>> It might be true that so far only the Kafka connector reports latency. But
> > > >         >>> there might be hundreds of other connector implementations in the Flink
> > > >         >>> ecosystem, though not in the Flink repo, and some of them also emit latency.
> > > >         >>> I think a lot of other sources actually also have an append timestamp, e.g.
> > > >         >>> database bin logs and some K-V stores. So I wouldn't be surprised if some
> > > >         >>> database connectors can also emit latency metrics.
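As an illustration of how such latency metrics could be derived from an append timestamp, here is a sketch assuming that "fetch lag" means fetch time minus event time and "emit lag" means emit time minus event time. Those definitions, the class name `EventTimeLagTracker` and the last-value gauge semantics are assumptions, not the FLIP's wording. Records without an event time simply leave the values untouched.

```java
import java.util.OptionalLong;

// Hypothetical tracker; definitions of fetch/emit lag are assumptions for illustration.
final class EventTimeLagTracker {
    private volatile long lastFetchLagMs = -1L; // -1 until a record with an event time is fetched
    private volatile long lastEmitLagMs = -1L;  // -1 until a record with an event time is emitted

    /** Lag between an event (append) timestamp and a later wall-clock instant, if an event time exists. */
    static OptionalLong lag(OptionalLong eventTimeMs, long nowMs) {
        return eventTimeMs.isPresent()
                ? OptionalLong.of(nowMs - eventTimeMs.getAsLong())
                : OptionalLong.empty();
    }

    void onFetched(OptionalLong eventTimeMs, long fetchTimeMs) {
        lag(eventTimeMs, fetchTimeMs).ifPresent(l -> lastFetchLagMs = l);
    }

    void onEmitted(OptionalLong eventTimeMs, long emitTimeMs) {
        lag(eventTimeMs, emitTimeMs).ifPresent(l -> lastEmitLagMs = l);
    }

    long fetchLagMs() { return lastFetchLagMs; }
    long emitLagMs() { return lastEmitLagMs; }
}
```

It also illustrates the backlog concern raised in this thread: when a backlog is replayed, a record appended an hour ago reports an hour of lag even if the pipeline is keeping up.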
> > > >         >>>
> > > >         >>> Thanks,
> > > >         >>>
> > > >         >>> Jiangjie (Becket) Qin
> > > >         >>>
> > > >         >>>
> > > >         >>> On Thu, Feb 21, 2019 at 10:14 PM Chesnay Schepler
> > > >         <ches...@apache.org <mailto:ches...@apache.org>>
> > > >         >>> wrote:
> > > >         >>>
> > > >         >>>> Regarding 2) It doesn't make sense to investigate this as part of this
> > > >         >>>> FLIP. This is something that could be of interest for the entire metric
> > > >         >>>> system, and should be designed for as such.
> > > >         >>>>
> > > >         >>>> Regarding the proposal as a whole:
> > > >         >>>>
> > > >         >>>> Histogram metrics shall not be added to the core of Flink. They are
> > > >         >>>> significantly more expensive than other metrics, and calculating
> > > >         >>>> histograms in the application is regarded as an anti-pattern by several
> > > >         >>>> metric backends, which instead recommend exposing the raw data and
> > > >         >>>> calculating the histogram in the backend.
> > > >         >>>>
> > > >         >>>> Second, this seems overly complicated. Given that we already established
> > > >         >>>> that not all connectors will export all metrics, we are effectively
> > > >         >>>> reducing this down to a consistent naming scheme. We don't need anything
> > > >         >>>> sophisticated for that; basically just a few constants that all
> > > >         >>>> connectors use.
> > > >         >>>>
> > > >         >>>> I'm not convinced that this is worthy of a FLIP.
> > > >         >>>>
> > > >         >>>> On 21.02.2019 14:26, Dawid Wysakowicz wrote:
> > > >         >>>>> Hi,
> > > >         >>>>>
> > > >         >>>>> Ad 1. In general I understand and I agree. But those particular metrics
> > > >         >>>>> (latency, fetchLatency) right now would only be reported if the user uses
> > > >         >>>>> KafkaConsumer with an internal timestampAssigner with StreamCharacteristic
> > > >         >>>>> set to EventTime, right? That sounds like a very specific case. I am not
> > > >         >>>>> sure if we should introduce a generic metric that will be
> > > >         >>>>> disabled/absent for most implementations.
> > > >         >>>>>
> > > >         >>>>> Ad 2. That sounds like an orthogonal issue that might make sense to
> > > >         >>>>> investigate in the future.
> > > >         >>>>>
> > > >         >>>>> Best,
> > > >         >>>>>
> > > >         >>>>> Dawid
> > > >         >>>>>
> > > >         >>>>> On 21/02/2019 13:20, Becket Qin wrote:
> > > >         >>>>>> Hi Dawid,
> > > >         >>>>>>
> > > >         >>>>>> Thanks for the feedback. That makes sense to me. There are two cases
> > > >         >>>>>> to be addressed.
> > > >         >>>>>>
> > > >         >>>>>> 1. The metrics are supposed to be guidance. It is likely that a connector
> > > >         >>>>>> only supports some but not all of the metrics. In that case, each connector
> > > >         >>>>>> implementation should have the freedom to decide which metrics are
> > > >         >>>>>> reported. For the metrics that are supported, the guidance should be
> > > >         >>>>>> followed.
> > > >         >>>>>>
> > > >         >>>>>> 2. Sometimes users may want to disable certain metrics for some reason
> > > >         >>>>>> (e.g. performance / reprocessing of data). A generic mechanism should be
> > > >         >>>>>> provided to allow users to choose which metrics are reported. This mechanism
> > > >         >>>>>> should also be honored by the connector implementations.
> > > >         >>>>>>
> > > >         >>>>>> Does this sound reasonable to you?
> > > >         >>>>>>
> > > >         >>>>>> Thanks,
> > > >         >>>>>>
> > > >         >>>>>> Jiangjie (Becket) Qin
> > > >         >>>>>>
> > > >         >>>>>>
> > > >         >>>>>>
> > > >         >>>>>> On Thu, Feb 21, 2019 at 4:22 PM Dawid Wysakowicz <
> > > >         >>>> dwysakow...@apache.org <mailto:dwysakow...@apache.org>>
> > > >         >>>>>> wrote:
> > > >         >>>>>>
> > > >         >>>>>>> Hi,
> > > >         >>>>>>>
> > > >         >>>>>>> Generally I like the idea of having a unified, standard set of metrics for
> > > >         >>>>>>> all connectors. I have some slight concerns about fetchLatency and
> > > >         >>>>>>> latency though. They are computed based on EventTime, which is not a purely
> > > >         >>>>>>> technical feature. It often depends on some business logic, and might be
> > > >         >>>>>>> absent or defined after the source. Those metrics could also behave in a
> > > >         >>>>>>> weird way when replaying a backlog. Therefore I am not sure if we should
> > > >         >>>>>>> include those metrics by default. Maybe we could at least introduce a
> > > >         >>>>>>> feature switch for them? What do you think?
> > > >         >>>>>>>
> > > >         >>>>>>> Best,
> > > >         >>>>>>>
> > > >         >>>>>>> Dawid
> > > >         >>>>>>> On 21/02/2019 03:13, Becket Qin wrote:
> > > >         >>>>>>>
> > > >         >>>>>>> Bump. If there are no objections to the proposed metrics, I'll start a
> > > >         >>>>>>> voting thread later today.
> > > >         >>>>>>>
> > > >         >>>>>>> Thanks,
> > > >         >>>>>>>
> > > >         >>>>>>> Jiangjie (Becket) Qin
> > > >         >>>>>>>
> > > >         >>>>>>> On Mon, Feb 11, 2019 at 8:17 PM Becket Qin
> > > >         >>>>>>> <becket....@gmail.com <mailto:becket....@gmail.com>> wrote:
> > > >         >>>>>>>
> > > >         >>>>>>> Hi folks,
> > > >         >>>>>>>
> > > >         >>>>>>> I would like to start the FLIP discussion thread about standardizing the
> > > >         >>>>>>> connector metrics.
> > > >         >>>>>>>
> > > >         >>>>>>> In short, we would like to provide a convention for Flink connector
> > > >         >>>>>>> metrics. It will help simplify the monitoring and alerting on Flink jobs.
> > > >         >>>>>>> The FLIP link is as follows:
> > > >         >>>>>>>
> > > >         >>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
> > > >         >>>>>>> Thanks,
> > > >         >>>>>>>
> > > >         >>>>>>> Jiangjie (Becket) Qin
> > > >         >>>>>>>
> > > >         >>>>>>>
> > > >         >>>>>>>
> > > >         >>
> > > >
> > >
> > >
> >
>
