I think this sounds reasonable. Let's keep the "reconfiguration without stopping the job" out of this, because that would be a super big effort and if we approach that, then in more generic way rather than specific to connector metrics.
I would suggest to look at the following things before starting with any implementation work: - Try and find a committer to support this, otherwise it will be hard to make progress - Start with defining a smaller set of "core metrics" and extend the set later. I think that is easier than now blocking on reaching consensus on a large group of metrics. - Find a solution to the problem Chesnay mentioned, that the "records in" metric is somehow overloaded and exists already in the IO Metric group. On Mon, Mar 25, 2019 at 7:16 AM Becket Qin <becket....@gmail.com> wrote: > Hi Stephan, > > Thanks a lot for the feedback. All makes sense. > > It is a good suggestion to simply have an onRecord(numBytes, eventTime) > method for connector writers. It should meet most of the requirements, > individual > > The configurable metrics feature is something really useful, especially if > we can somehow make it dynamically configurable without stopping the jobs. > It might be better to make it a separate discussion because it is a more > generic feature instead of only for connectors. > > So in order to make some progress, in this FLIP we can limit the discussion > scope to the connector related items: > > - the standard connector metric names and types. > - the abstract ConnectorMetricHandler interface > > I'll start a separate thread to discuss other general metric related > enhancement items including: > > - optional metrics > - dynamic metric configuration > - potential combination with rate limiter > > Does this plan sound reasonable? > > Thanks, > > Jiangjie (Becket) Qin > > On Sat, Mar 23, 2019 at 5:53 AM Stephan Ewen <se...@apache.org> wrote: > > > Ignoring for a moment implementation details, this connector metrics work > > is a really good thing to do, in my opinion > > > > The questions "oh, my job seems to be doing nothing, I am looking at the > UI > > and the 'records in' value is still zero" is in the top three support > > questions I have been asked personally. > > Introspection into "how far is the consumer lagging behind" (event time > > fetch latency) came up many times as well. > > > > So big +1 to solving this problem. > > > > About the exact design - I would try to go for the following properties: > > > > - keep complexity of of connectors. Ideally the metrics handler has a > > single onRecord(numBytes, eventTime) method or so, and everything else is > > internal to the handler. That makes it dead simple for the connector. We > > can also think of an extensive scheme for connector specific metrics. > > > > - make it configurable on the job it cluster level which metrics the > > handler internally creates when that method is invoked. > > > > What do you think? > > > > Best, > > Stephan > > > > > > On Thu, Mar 21, 2019 at 10:42 AM Chesnay Schepler <ches...@apache.org> > > wrote: > > > > > As I said before, I believe this to be over-engineered and have no > > > interest in this implementation. > > > > > > There are conceptual issues like defining a duplicate > numBytesIn(PerSec) > > > metric that already exists for each operator. > > > > > > On 21.03.2019 06:13, Becket Qin wrote: > > > > A few updates to the thread. I uploaded a patch[1] as a complete > > > > example of how users can use the metrics in the future. > > > > > > > > Some thoughts below after taking a look at the AbstractMetricGroup > and > > > > its subclasses. > > > > > > > > This patch intends to provide convenience for Flink connector > > > > implementations to follow metrics standards proposed in FLIP-33. It > > > > also try to enhance the metric management in general way to help > users > > > > with: > > > > > > > > 1. metric definition > > > > 2. metric dependencies check > > > > 3. metric validation > > > > 4. metric control (turn on / off particular metrics) > > > > > > > > This patch wraps |MetricGroup| to extend the functionality of > > > > |AbstractMetricGroup| and its subclasses. The > > > > |AbstractMetricGroup| mainly focus on the metric group hierarchy, but > > > > does not really manage the metrics other than keeping them in a Map. > > > > > > > > Ideally we should only have one entry point for the metrics. > > > > > > > > Right now the entry point is |AbstractMetricGroup|. However, besides > > > > the missing functionality mentioned above, |AbstractMetricGroup| > seems > > > > deeply rooted in Flink runtime. We could extract it out to > > > > flink-metrics in order to use it for generic purpose. There will be > > > > some work, though. > > > > > > > > Another approach is to make |AbstractMetrics| in this patch as the > > > > metric entry point. It wraps metric group and provides the missing > > > > functionalities. Then we can roll out this pattern to runtime > > > > components gradually as well. > > > > > > > > My first thought is that the latter approach gives a more smooth > > > > migration. But I am also OK with doing a refactoring on the > > > > |AbstractMetricGroup| family. > > > > > > > > > > > > Thanks, > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > [1] https://github.com/becketqin/flink/pull/1 > > > > > > > > On Mon, Feb 25, 2019 at 2:32 PM Becket Qin <becket....@gmail.com > > > > <mailto:becket....@gmail.com>> wrote: > > > > > > > > Hi Chesnay, > > > > > > > > It might be easier to discuss some implementation details in the > > > > PR review instead of in the FLIP discussion thread. I have a > patch > > > > for Kafka connectors ready but haven't submitted the PR yet. > > > > Hopefully that will help explain a bit more. > > > > > > > > ** Re: metric type binding > > > > This is a valid point that worths discussing. If I understand > > > > correctly, there are two points: > > > > > > > > 1. Metric type / interface does not matter as long as the metric > > > > semantic is clearly defined. > > > > Conceptually speaking, I agree that as long as the metric > semantic > > > > is defined, metric type does not matter. To some extent, Gauge / > > > > Counter / Meter / Histogram themselves can be think of as some > > > > well-recognized semantics, if you wish. In Flink, these metric > > > > semantics have their associated interface classes. In practice, > > > > such semantic to interface binding seems necessary for different > > > > components to communicate. Simply standardize the semantic of > the > > > > connector metrics seems not sufficient for people to build > > > > ecosystem on top of. At the end of the day, we still need to have > > > > some embodiment of the metric semantics that people can program > > > > against. > > > > > > > > 2. Sometimes the same metric semantic can be exposed using > > > > different metric types / interfaces. > > > > This is a good point. Counter and Gauge-as-a-Counter are pretty > > > > much interchangeable. This is more of a trade-off between the > user > > > > experience of metric producers and consumers. The metric > producers > > > > want to use Counter or Gauge depending on whether the counter is > > > > already tracked in code, while ideally the metric consumers only > > > > want to see a single metric type for each metric. I am leaning > > > > towards to make the metric producers happy, i.e. allow Gauge / > > > > Counter metric type, and the the metric consumers handle the type > > > > variation. The reason is that in practice, there might be more > > > > connector implementations than metric reporter implementations. > We > > > > could also provide some helper method to facilitate reading from > > > > such variable metric type. > > > > > > > > > > > > Just some quick replies to the comments around implementation > > > details. > > > > > > > > 4) single place where metrics are registered except > > > > connector-specific > > > > ones (which we can't really avoid). > > > > > > > > Register connector specific ones in a single place is actually > > > > something that I want to achieve. > > > > > > > > 2) I'm talking about time-series databases like Prometheus. > We > > > > would > > > > only have a gauge metric exposing the last fetchTime/emitTime > > > > that is > > > > regularly reported to the backend (Prometheus), where a user > > > > could build > > > > a histogram of his choosing when/if he wants it. > > > > > > > > Not sure if such downsampling works. As an example, if a user > > > > complains that there are some intermittent latency spikes (maybe > a > > > > few records in 10 seconds) in their processing system. Having a > > > > Gauge sampling instantaneous latency seems unlikely useful. > > > > However by looking at actual 99.9 percentile latency might help. > > > > > > > > Thanks, > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > On Fri, Feb 22, 2019 at 9:30 PM Chesnay Schepler > > > > <ches...@apache.org <mailto:ches...@apache.org>> wrote: > > > > > > > > Re: over complication of implementation. > > > > > > > > I think I get understand better know what you're shooting > for, > > > > effectively something like the OperatorIOMetricGroup. > > > > But still, re-define setupConnectorMetrics() to accept a set > > > > of flags > > > > for counters/meters(ans _possibly_ histograms) along with a > > > > set of > > > > well-defined Optional<Gauge<?>>, and return the group. > > > > > > > > Solves all issues as far as i can tell: > > > > 1) no metrics must be created manually (except Gauges, which > > are > > > > effectively just Suppliers and you can't get around this), > > > > 2) additional metrics can be registered on the returned > group, > > > > 3) see 1), > > > > 4) single place where metrics are registered except > > > > connector-specific > > > > ones (which we can't really avoid). > > > > > > > > Re: Histogram > > > > > > > > 1) As an example, whether "numRecordsIn" is exposed as a > > > > Counter or a > > > > Gauge should be irrelevant. So far we're using the metric > type > > > > that is > > > > the most convenient at exposing a given value. If there is > > > > some backing > > > > data-structure that we want to expose some data from we > > > > typically opt > > > > for a Gauge, as otherwise we're just mucking around with the > > > > Meter/Counter API to get it to match. Similarly, if we want > to > > > > count > > > > something but no current count exists we typically added a > > > > Counter. > > > > That's why attaching semantics to metric types makes little > > > > sense (but > > > > unfortunately several reporters already do it); for > > > > counters/meters > > > > certainly, but the majority of metrics are gauges. > > > > > > > > 2) I'm talking about time-series databases like Prometheus. > We > > > > would > > > > only have a gauge metric exposing the last fetchTime/emitTime > > > > that is > > > > regularly reported to the backend (Prometheus), where a user > > > > could build > > > > a histogram of his choosing when/if he wants it. > > > > > > > > On 22.02.2019 13:57, Becket Qin wrote: > > > > > Hi Chesnay, > > > > > > > > > > Thanks for the explanation. > > > > > > > > > > ** Re: FLIP > > > > > I might have misunderstood this, but it seems that "major > > > > changes" are well > > > > > defined in FLIP. The full contents is following: > > > > > What is considered a "major change" that needs a FLIP? > > > > > > > > > > Any of the following should be considered a major change: > > > > > > > > > > - Any major new feature, subsystem, or piece of > > > > functionality > > > > > - *Any change that impacts the public interfaces of the > > > > project* > > > > > > > > > > What are the "public interfaces" of the project? > > > > > > > > > > > > > > > > > > > > *All of the following are public interfaces *that people > > > > build around: > > > > > > > > > > - DataStream and DataSet API, including classes related > > > > to that, such as > > > > > StreamExecutionEnvironment > > > > > > > > > > > > > > > - Classes marked with the @Public annotation > > > > > > > > > > > > > > > - On-disk binary formats, such as > checkpoints/savepoints > > > > > > > > > > > > > > > - User-facing scripts/command-line tools, i.e. > > > > bin/flink, Yarn scripts, > > > > > Mesos scripts > > > > > > > > > > > > > > > - Configuration settings > > > > > > > > > > > > > > > - *Exposed monitoring information* > > > > > > > > > > > > > > > So any monitoring information change is considered as > public > > > > interface, and > > > > > any public interface change is considered as a "major > > change". > > > > > > > > > > > > > > > ** Re: over complication of implementation. > > > > > > > > > > Although this is more of implementation details that is not > > > > covered by the > > > > > FLIP. But it may be worth discussing. > > > > > > > > > > First of all, I completely agree that we should use the > > > > simplest way to > > > > > achieve our goal. To me the goal is the following: > > > > > 1. Clear connector conventions and interfaces. > > > > > 2. The easiness of creating a connector. > > > > > > > > > > Both of them are important to the prosperity of the > > > > connector ecosystem. So > > > > > I'd rather abstract as much as possible on our side to make > > > > the connector > > > > > developer's work lighter. Given this goal, a static util > > > > method approach > > > > > might have a few drawbacks: > > > > > 1. Users still have to construct the metrics by themselves. > > > > (And note that > > > > > this might be erroneous by itself. For example, a customer > > > > wrapper around > > > > > dropwizard meter maybe used instead of MeterView). > > > > > 2. When connector specific metrics are added, it is > > > > difficult to enforce > > > > > the scope to be the same as standard metrics. > > > > > 3. It seems that a method proliferation is inevitable if we > > > > want to apply > > > > > sanity checks. e.g. The metric of numBytesIn was not > > > > registered for a meter. > > > > > 4. Metrics are still defined in random places and hard to > > > track. > > > > > > > > > > The current PR I had was inspired by the Config system in > > > > Kafka, which I > > > > > found pretty handy. In fact it is not only used by Kafka > > > > itself but even > > > > > some other projects that depend on Kafka. I am not saying > > > > this approach is > > > > > perfect. But I think it worths to save the work for > > > > connector writers and > > > > > encourage more systematic implementation. That being said, > I > > > > am fully open > > > > > to suggestions. > > > > > > > > > > > > > > > Re: Histogram > > > > > I think there are two orthogonal questions around those > > > metrics: > > > > > > > > > > 1. Regardless of the metric type, by just looking at the > > > > meaning of a > > > > > metric, is generic to all connectors? If the answer is yes, > > > > we should > > > > > include the metric into the convention. No matter whether > we > > > > include it > > > > > into the convention or not, some connector implementations > > > > will emit such > > > > > metric. It is better to have a convention than letting each > > > > connector do > > > > > random things. > > > > > > > > > > 2. If a standard metric is a histogram, what should we do? > > > > > I agree that we should make it clear that using histograms > > > > will have > > > > > performance risk. But I do see histogram is useful in some > > > > fine-granularity > > > > > debugging where one do not have the luxury to stop the > > > > system and inject > > > > > more inspection code. So the workaround I am thinking is to > > > > provide some > > > > > implementation suggestions. Assume later on we have a > > > > mechanism of > > > > > selective metrics. In the abstract metrics class we can > > > > disable those > > > > > metrics by default individual connector writers does not > > > > have to do > > > > > anything (this is another advantage of having an > > > > AbstractMetrics instead of > > > > > static util methods.) > > > > > > > > > > I am not sure I fully understand the histogram in the > > > > backend approach. Can > > > > > you explain a bit more? Do you mean emitting the raw data, > > > > e.g. fetchTime > > > > > and emitTime with each record and let the histogram > > > > computation happen in > > > > > the background? Or let the processing thread putting the > > > > values into a > > > > > queue and have a separate thread polling from the queue and > > > > add them into > > > > > the histogram? > > > > > > > > > > Thanks, > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Feb 22, 2019 at 4:34 PM Chesnay Schepler > > > > <ches...@apache.org <mailto:ches...@apache.org>> wrote: > > > > > > > > > >> Re: Flip > > > > >> The very first line under both the main header and Purpose > > > > section > > > > >> describe Flips as "major changes", which this isn't. > > > > >> > > > > >> Re: complication > > > > >> I'm not arguing against standardization, but again an > > > > over-complicated > > > > >> implementation when a static utility method would be > > > > sufficient. > > > > >> > > > > >> public static void setupConnectorMetrics( > > > > >> MetricGroup operatorMetricGroup, > > > > >> String connectorName, > > > > >> Optional<Gauge<Long>> numRecordsIn, > > > > >> ...) > > > > >> > > > > >> This gives you all you need: > > > > >> * a well-defined set of metrics for a connector to opt-in > > > > >> * standardized naming schemes for scope and individual > > metrics > > > > >> * standardize metric types (although personally I'm not > > > > interested in that > > > > >> since metric types should be considered syntactic sugar) > > > > >> > > > > >> Re: Configurable Histogram > > > > >> If anything they _must_ be turned off by default, but the > > > > metric system is > > > > >> already exposing so many options that I'm not too keen on > > > > adding even more. > > > > >> You have also only addressed my first argument against > > > > histograms > > > > >> (performance), the second one still stands (calculate > > > > histogram in metric > > > > >> backends instead). > > > > >> > > > > >> On 21.02.2019 16:27, Becket Qin wrote: > > > > >>> Hi Chesnay, > > > > >>> > > > > >>> Thanks for the comments. I think this is worthy of a FLIP > > > > because it is > > > > >>> public API. According to the FLIP description a FlIP is > > > > required in case > > > > >> of: > > > > >>> - Any change that impacts the public interfaces of > > > > the project > > > > >>> > > > > >>> and the following entry is found in the definition of > > > > "public interface". > > > > >>> > > > > >>> - Exposed monitoring information > > > > >>> > > > > >>> Metrics are critical to any production system. So a clear > > > > metric > > > > >> definition > > > > >>> is important for any serious users. For an organization > > > > with large Flink > > > > >>> installation, change in metrics means great amount of > > > > work. So such > > > > >> changes > > > > >>> do need to be fully discussed and documented. > > > > >>> > > > > >>> ** Re: Histogram. > > > > >>> We can discuss whether there is a better way to expose > > > > metrics that are > > > > >>> suitable for histograms. My micro-benchmark on various > > > > histogram > > > > >>> implementations also indicates that they are > significantly > > > > slower than > > > > >>> other metric types. But I don't think that means never > use > > > > histogram, but > > > > >>> means use it with caution. For example, we can suggest > the > > > > >> implementations > > > > >>> to turn them off by default and only turn it on for a > > > > small amount of > > > > >> time > > > > >>> when performing some micro-debugging. > > > > >>> > > > > >>> ** Re: complication: > > > > >>> Connector conventions are essential for Flink ecosystem. > > > > Flink connectors > > > > >>> pool is probably the most important part of Flink, just > > > > like any other > > > > >> data > > > > >>> system. Clear conventions of connectors will help build > > > > Flink ecosystem > > > > >> in > > > > >>> a more organic way. > > > > >>> Take the metrics convention as an example, imagine > someone > > > > has developed > > > > >> a > > > > >>> Flink connector for System foo, and another developer may > > > > have developed > > > > >> a > > > > >>> monitoring and diagnostic framework for Flink which > > > > analyzes the Flink > > > > >> job > > > > >>> performance based on metrics. With a clear metric > > > > convention, those two > > > > >>> projects could be developed independently. Once users put > > > > them together, > > > > >>> it would work without additional modifications. This > > > > cannot be easily > > > > >>> achieved by just defining a few constants. > > > > >>> > > > > >>> ** Re: selective metrics: > > > > >>> Sure, we can discuss that in a separate thread. > > > > >>> > > > > >>> @Dawid > > > > >>> > > > > >>> ** Re: latency / fetchedLatency > > > > >>> The primary purpose of establish such a convention is to > > > > help developers > > > > >>> write connectors in a more compatible way. The convention > > > > is supposed to > > > > >> be > > > > >>> defined more proactively. So when look at the convention, > > > > it seems more > > > > >>> important to see if the concept is applicable to > > > > connectors in general. > > > > >> It > > > > >>> might be true so far only Kafka connector reports > latency. > > > > But there > > > > >> might > > > > >>> be hundreds of other connector implementations in the > > > > Flink ecosystem, > > > > >>> though not in the Flink repo, and some of them also emits > > > > latency. I > > > > >> think > > > > >>> a lot of other sources actually also has an append > > > > timestamp. e.g. > > > > >> database > > > > >>> bin logs and some K-V stores. So I wouldn't be surprised > > > > if some database > > > > >>> connector can also emit latency metrics. > > > > >>> > > > > >>> Thanks, > > > > >>> > > > > >>> Jiangjie (Becket) Qin > > > > >>> > > > > >>> > > > > >>> On Thu, Feb 21, 2019 at 10:14 PM Chesnay Schepler > > > > <ches...@apache.org <mailto:ches...@apache.org>> > > > > >>> wrote: > > > > >>> > > > > >>>> Regarding 2) It doesn't make sense to investigate this > as > > > > part of this > > > > >>>> FLIP. This is something that could be of interest for > the > > > > entire metric > > > > >>>> system, and should be designed for as such. > > > > >>>> > > > > >>>> Regarding the proposal as a whole: > > > > >>>> > > > > >>>> Histogram metrics shall not be added to the core of > > > > Flink. They are > > > > >>>> significantly more expensive than other metrics, and > > > > calculating > > > > >>>> histograms in the application is regarded as an > > > > anti-pattern by several > > > > >>>> metric backends, who instead recommend to expose the raw > > > > data and > > > > >>>> calculate the histogram in the backend. > > > > >>>> > > > > >>>> Second, this seems overly complicated. Given that we > > > > already established > > > > >>>> that not all connectors will export all metrics we are > > > > effectively > > > > >>>> reducing this down to a consistent naming scheme. We > > > > don't need anything > > > > >>>> sophisticated for that; basically just a few constants > > > > that all > > > > >>>> connectors use. > > > > >>>> > > > > >>>> I'm not convinced that this is worthy of a FLIP. > > > > >>>> > > > > >>>> On 21.02.2019 14:26, Dawid Wysakowicz wrote: > > > > >>>>> Hi, > > > > >>>>> > > > > >>>>> Ad 1. In general I undestand and I agree. But those > > > > particular metrics > > > > >>>>> (latency, fetchLatency), right now would only be > > > > reported if user uses > > > > >>>>> KafkaConsumer with internal timestampAssigner with > > > > StreamCharacteristic > > > > >>>>> set to EventTime, right? That sounds like a very > > > > specific case. I am > > > > >> not > > > > >>>>> sure if we should introduce a generic metric that will > be > > > > >>>>> disabled/absent for most of implementations. > > > > >>>>> > > > > >>>>> Ad.2 That sounds like an orthogonal issue, that might > > > > make sense to > > > > >>>>> investigate in the future. > > > > >>>>> > > > > >>>>> Best, > > > > >>>>> > > > > >>>>> Dawid > > > > >>>>> > > > > >>>>> On 21/02/2019 13:20, Becket Qin wrote: > > > > >>>>>> Hi Dawid, > > > > >>>>>> > > > > >>>>>> Thanks for the feedback. That makes sense to me. There > > > > are two cases > > > > >> to > > > > >>>> be > > > > >>>>>> addressed. > > > > >>>>>> > > > > >>>>>> 1. The metrics are supposed to be a guidance. It is > > > > likely that a > > > > >>>> connector > > > > >>>>>> only supports some but not all of the metrics. In that > > > > case, each > > > > >>>> connector > > > > >>>>>> implementation should have the freedom to decide which > > > > metrics are > > > > >>>>>> reported. For the metrics that are supported, the > > > > guidance should be > > > > >>>>>> followed. > > > > >>>>>> > > > > >>>>>> 2. Sometimes users may want to disable certain metrics > > > > for some reason > > > > >>>>>> (e.g. performance / reprocessing of data). A generic > > > > mechanism should > > > > >> be > > > > >>>>>> provided to allow user choose which metrics are > > > > reported. This > > > > >> mechanism > > > > >>>>>> should also be honored by the connector > implementations. > > > > >>>>>> > > > > >>>>>> Does this sound reasonable to you? > > > > >>>>>> > > > > >>>>>> Thanks, > > > > >>>>>> > > > > >>>>>> Jiangjie (Becket) Qin > > > > >>>>>> > > > > >>>>>> > > > > >>>>>> > > > > >>>>>> On Thu, Feb 21, 2019 at 4:22 PM Dawid Wysakowicz < > > > > >>>> dwysakow...@apache.org <mailto:dwysakow...@apache.org>> > > > > >>>>>> wrote: > > > > >>>>>> > > > > >>>>>>> Hi, > > > > >>>>>>> > > > > >>>>>>> Generally I like the idea of having a unified, > > > > standard set of > > > > >> metrics > > > > >>>> for > > > > >>>>>>> all connectors. I have some slight concerns about > > > > fetchLatency and > > > > >>>>>>> latency though. They are computed based on EventTime > > > > which is not a > > > > >>>> purely > > > > >>>>>>> technical feature. It depends often on some business > > > > logic, might be > > > > >>>> absent > > > > >>>>>>> or defined after source. Those metrics could also > > > > behave in a weird > > > > >>>> way in > > > > >>>>>>> case of replaying backlog. Therefore I am not sure if > > > > we should > > > > >> include > > > > >>>>>>> those metrics by default. Maybe we could at least > > > > introduce a feature > > > > >>>>>>> switch for them? What do you think? > > > > >>>>>>> > > > > >>>>>>> Best, > > > > >>>>>>> > > > > >>>>>>> Dawid > > > > >>>>>>> On 21/02/2019 03:13, Becket Qin wrote: > > > > >>>>>>> > > > > >>>>>>> Bump. If there is no objections to the proposed > > > > metrics. I'll start a > > > > >>>>>>> voting thread later toady. > > > > >>>>>>> > > > > >>>>>>> Thanks, > > > > >>>>>>> > > > > >>>>>>> Jiangjie (Becket) Qin > > > > >>>>>>> > > > > >>>>>>> On Mon, Feb 11, 2019 at 8:17 PM Becket Qin > > > > <becket....@gmail.com <mailto:becket....@gmail.com>> < > > > > >>>> becket....@gmail.com <mailto:becket....@gmail.com>> > > wrote: > > > > >>>>>>> Hi folks, > > > > >>>>>>> > > > > >>>>>>> I would like to start the FLIP discussion thread > about > > > > standardize > > > > >> the > > > > >>>>>>> connector metrics. > > > > >>>>>>> > > > > >>>>>>> In short, we would like to provide a convention of > > > > Flink connector > > > > >>>>>>> metrics. It will help simplify the monitoring and > > > > alerting on Flink > > > > >>>> jobs. > > > > >>>>>>> The FLIP link is following: > > > > >>>>>>> > > > > >>>>>>> > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics > > > > >>>>>>> Thanks, > > > > >>>>>>> > > > > >>>>>>> Jiangjie (Becket) Qin > > > > >>>>>>> > > > > >>>>>>> > > > > >>>>>>> > > > > >> > > > > > > > > > > > > >