Thanks for the reply, Enrico. Completely agree. This made me realize my TL;DR didn't cover export, so I added this to it:
--- Pulsar OTel Metrics will support exporting via a Prometheus HTTP endpoint (`/metrics`, but on a different port) for backward compatibility, and also via OTLP, so you can push the metrics to an OTel Collector and from there ship them to any destination. ---

OTel supports two kinds of exporters: Prometheus (HTTP pull) and OTLP (push). We'll simply configure Pulsar to use them.

On Mon, May 15, 2023 at 10:35 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> Asaf,
> thanks for contributing to this area.
> Metrics are a fundamental feature of Pulsar.
>
> Currently I find it very awkward to maintain metrics, and I also see
> it as a problem to support only Prometheus.
>
> Regarding your proposal, IIRC in the past someone else proposed to
> support other metrics systems, and they were advised to use a sidecar
> approach, that is, to add something next to the Pulsar services that
> serves the metrics in the preferred format/way.
> I find the sidecar approach too inefficient and I am not proposing it
> (but I wanted to add this reference for the benefit of new people on
> the list).
>
> I wonder if it would be possible to keep compatibility with the
> current Prometheus-based metrics.
> Pulsar has now reached a point at which it is widely used by many
> companies, often with big clusters. Telling people that they have to
> rework all the infrastructure related to metrics because we no longer
> support Prometheus, or because we radically changed the way we publish
> metrics, is a step that seems too hard from my point of view.
>
> Currently I believe that compatibility is more important than
> versatility, and if we want to introduce new (and far better) features
> we must take that into account.
>
> So my point is that I generally support the idea of opening the way to
> OpenTelemetry, but we must have a way to not force all of our users to
> throw away their alerting systems, dashboards, and know-how in
> troubleshooting Pulsar problems in production and dev.
>
> Best regards
> Enrico
>
> On Mon, May 15, 2023 at 02:17 Dave Fisher <wave4d...@comcast.net> wrote:
> >
> > On May 10, 2023, at 1:01 AM, Asaf Mesika <asaf.mes...@gmail.com> wrote:
> > >
> > > On Tue, May 9, 2023 at 11:29 PM Dave Fisher <w...@apache.org> wrote:
> > >>
> > >> On May 8, 2023, at 2:49 AM, Asaf Mesika <asaf.mes...@gmail.com> wrote:
> > >>>
> > >>> Your feedback made me realize I need to add a "TL;DR" section, which I
> > >>> just added.
> > >>>
> > >>> I'm quoting it here. It gives a brief summary of the proposal, which
> > >>> requires up to 5 minutes of reading time, helping you get a high-level
> > >>> picture before you dive into the background/motivation/solution.
> > >>>
> > >>> ----------------------
> > >>> TL;DR
> > >>>
> > >>> Working with metrics today, as a user or a developer, is hard and has
> > >>> many severe issues.
> > >>>
> > >>> From the user perspective:
> > >>>
> > >>> - One of Pulsar's strongest features is "cheap" topics, so you can
> > >>> easily have 10k - 100k topics per broker. Once you do that, you quickly
> > >>> learn that the amount of metrics you export via "/metrics"
> > >>> (Prometheus-style endpoint) becomes really big. The cost to store them
> > >>> becomes too high, queries time out, or even the "/metrics" endpoint
> > >>> itself times out.
> > >>> The only option Pulsar gives you today is all-or-nothing filtering and
> > >>> very crude aggregation: you switch metrics from topic aggregation level
> > >>> to namespace aggregation level. Also, you can turn off producer- and
> > >>> consumer-level metrics.
> > >>> You end up doing it all, leaving you "blind", looking at the metrics
> > >>> from a namespace level, which is too high-level. You end up conjuring
> > >>> all kinds of scripts on top of the topic stats endpoint to glue
> > >>> together some aggregated metrics view for the topics you need.
> > >>> - Summaries (a metric type giving you quantiles like p95), which are
> > >>> used in Pulsar, can't be aggregated across topics / brokers due to
> > >>> their inherent design.
> > >>> - Plugin authors spend too much time on defining and exposing metrics
> > >>> to Pulsar, since the only interface Pulsar offers is writing your
> > >>> metrics yourself, as UTF-8 bytes in Prometheus Text Format, to a byte
> > >>> stream interface given to you.
> > >>> - Pulsar histograms are exported in a way that is not conformant with
> > >>> Prometheus, which means you can't get the p95 quantile on such
> > >>> histograms, making them very hard to use in day-to-day life.
> > >>
> > >> What version of DataSketches is used to produce the histogram? Is it
> > >> still an old Yahoo one, or are we using an updated one from Apache
> > >> DataSketches?
> > >>
> > >> Seems like this is a single PR/small PIP for 3.1?
> > >
> > > Histograms are a list of buckets, each a counter.
> > > A Summary is a collection of values collected over a time window, from
> > > which at the end you get a calculation of the quantiles of those values:
> > > p95, p50; and those are exported from Pulsar.
> > >
> > > Pulsar histograms do not use DataSketches.
> >
> > Bookkeeper Metrics wraps Yahoo DataSketches last I checked.
> >
> > > They are just counters.
> > > They do not adhere to Prometheus since:
> > > a. The counter is expected to be cumulative, but Pulsar resets each
> > > bucket counter to 0 every 1 min
> > > b. The bucket upper range is expected to be written as an attribute "le",
> > > but today it is encoded in the name of the metric itself.
> > >
> > > This is a breaking change, hence hard to make in any small release.
> > > This is why it's part of this PIP: so many things will break, and all of
> > > them will break on a separate layer (OTel metrics), hence not breaking
> > > anyone without their consent.
> >
> > If this change will break existing Grafana dashboards and other
> > operational monitoring already in place, then it will break guarantees we
> > have made about safely being able to downgrade from a bad upgrade.
> >
> > >>> - Too many metrics are rates, which also delta-reset every interval
> > >>> you configure in Pulsar and on restart, instead of relying on
> > >>> cumulative (ever-growing) counters and letting Prometheus use its rate
> > >>> function.
> > >>> - and many more issues
> > >>>
> > >>> From the developer perspective:
> > >>>
> > >>> - There are 4 different ways to define and record metrics in Pulsar:
> > >>> Pulsar's own metrics library, Prometheus Java Client, the Bookkeeper
> > >>> metrics library, and plain native Java SDK objects (AtomicLong, ...).
> > >>> It's very confusing for the developer and creates inconsistencies for
> > >>> the end user (e.g. Summary is different in each).
> > >>> - Patching your metrics into the "/metrics" Prometheus endpoint is
> > >>> confusing, cumbersome and error-prone.
> > >>> - many more
> > >>>
> > >>> This proposal offers several key changes to solve that:
> > >>>
> > >>> - Cardinality (supporting 10k-100k topics per broker) is solved by
> > >>> introducing a new aggregation level for metrics called Topic Metric
> > >>> Group. Using configuration, you specify for each topic its group (using
> > >>> wildcard/regex). This allows you to "zoom out" to a granularity level
> > >>> more detailed than namespaces: groups. Since you control how many
> > >>> groups you have, this solves the cardinality issue without sacrificing
> > >>> the level of detail too much.
> > >>> - A fine-grained, dynamic filtering mechanism. You'll have rule-based
> > >>> dynamic configuration, allowing you to specify per
> > >>> namespace/topic/group which metrics you'd like to keep/drop. Rules
> > >>> allow you to set the default to have a small amount of metrics, at
> > >>> group and namespace level only, and drop the rest. When needed, you can
> > >>> add an override rule to "open up" a certain group to have more metrics
> > >>> at higher granularity (topic or even consumer/producer level). Since
> > >>> it's dynamic, you "open" such a group when you see it's misbehaving,
> > >>> see it at topic level, and when all is resolved, you can "close" it. A
> > >>> somewhat similar experience to logging levels in Log4j or Logback,
> > >>> where you set a default and override per class/package.
> > >>>
> > >>> Aggregation and filtering combined solve the cardinality issue without
> > >>> sacrificing the level of detail when needed and, most importantly, you
> > >>> determine which topic/group/namespace it happens on.
> > >>>
> > >>> Since this change is so invasive, it requires a single metrics library
> > >>> to implement all of it on top of; hence the third big change is
> > >>> consolidating all four ways to define and record metrics into a single,
> > >>> new one: OpenTelemetry Metrics (the Java SDK, and also Python and Go
> > >>> for the Pulsar Function runners).
> > >>> Introducing OpenTelemetry (OTel) also solves the biggest pain point
> > >>> from the developer perspective, since it's a superb metrics library
> > >>> offering everything you need, and there is going to be a single way -
> > >>> only it. Also, it solves the robustness issue for plugin authors, who
> > >>> will use OpenTelemetry. It so happens that it also solves all the
> > >>> numerous problems described in the doc itself.
> > >>>
> > >>> The solution will be introduced as another layer with feature toggles,
> > >>> so you can work with the existing system, and/or OTel, until gradually
> > >>> deprecating the existing system.
> > >>>
> > >>> It's a big breaking change for Pulsar users on many fronts: names,
> > >>> semantics, configuration. Read the end of this doc to learn exactly
> > >>> what will change for the user (at a high level).
> > >>>
> > >>> In my opinion, it will make the Pulsar user experience so much better
> > >>> that they will want to migrate to it, despite the breaking change.
> > >>>
> > >>> This was a very short summary. You are most welcome to read the full
> > >>> design document below and express feedback, so we can make it better.
> > >>>
> > >>> On Sun, May 7, 2023 at 7:52 PM Asaf Mesika <asaf.mes...@gmail.com>
> > >>> wrote:
> > >>>>
> > >>>> On Sun, May 7, 2023 at 4:23 PM Yunze Xu <y...@streamnative.io.invalid>
> > >>>> wrote:
> > >>>>
> > >>>>> I was excited to learn much more about metrics when I started reading
> > >>>>> this proposal. But I became more and more frustrated when I found
> > >>>>> there was still too much content left even after I had already spent
> > >>>>> much time reading it. I'm wondering how much time you expected
> > >>>>> reviewers to spend reading through this proposal? I just recalled the
> > >>>>> discussion you started before [1]. Did you expect each PMC member who
> > >>>>> gives his/her +1 to read only parts of this proposal?
> > >>>>
> > >>>> I estimated around 2 hours needed for a reviewer.
> > >>>> I hate it being so long, but I simply couldn't find a way to downsize
> > >>>> it more. Furthermore, I consulted with my colleagues, including
> > >>>> Matteo, but we couldn't see a way to scope it down.
> > >>>> Why? Because once you begin this journey, you need to know how it's
> > >>>> going to end.
> > >>>> What I ended up doing is writing all the crucial details needed for
> > >>>> review in the High Level Design section.
> > >>>> It's still a big, hefty section, but I don't think I can leave it out,
> > >>>> or let anyone change Pulsar so invasively without showing the full
> > >>>> extent of the change.
> > >>>>
> > >>>> I don't think it's wise to read only parts.
> > >>>> I did my very best to minimize it, but the scope is simply big.
> > >>>> Open to suggestions, but it requires reading all of the PIP :)
> > >>>>
> > >>>> Thanks a lot, Yunze, for dedicating your time to it.
> > >>>>
> > >>>>> Let's get back to the proposal. For now, what I mainly learned and
> > >>>>> am mostly concerned about are:
> > >>>>> 1. Pulsar has many ways to expose metrics. It's not unified and is
> > >>>>> confusing.
> > >>>>> 2. The current metrics system cannot support a large amount of
> > >>>>> topics.
> > >>>>> 3. It's hard for plugin authors to integrate metrics. (For example,
> > >>>>> KoP [2] integrates metrics by implementing the
> > >>>>> PrometheusRawMetricsProvider interface and it indeed needs much work)
> > >>>>>
> > >>>>> Regarding the 1st issue, this proposal chooses OpenTelemetry (OTel).
> > >>>>>
> > >>>>> Regarding the 2nd issue, I scrolled to the "Why OpenTelemetry?"
> > >>>>> section. It's still frustrating to see no answer. Eventually, I found
> > >>>>
> > >>>> OpenTelemetry isn't the solution for a large amount of topics.
> > >>>> The solution is described in the
> > >>>> "Aggregate and Filtering to solve cardinality issues" section.
> > >>>>
> > >>>>> the explanation in the "What we need to fix in OpenTelemetry -
> > >>>>> Performance" section. It seems that we still need some enhancements
> > >>>>> in OTel. In other words, currently OTel is not ready to resolve all
> > >>>>> the issues listed in the proposal, but we believe it will be.
> > >>>>
> > >>>> Let me rephrase "believe" --> we are working together with the
> > >>>> maintainers to do it, yes.
> > >>>> I am open to any other suggestion.
> > >>>>
> > >>>>> As for the 3rd issue, from the "Integrating with Pulsar Plugins"
> > >>>>> section, plugin authors still need to implement the new OTel
> > >>>>> interfaces. Is it much easier than using the existing ways to expose
> > >>>>> metrics? Could metrics still be easily integrated with Grafana?
> > >>>>
> > >>>> Yes, it's way easier.
> > >>>> Basically, you get the objects of a full-fledged metrics library:
> > >>>> Meter, Gauge, Histogram, Counter.
> > >>>> No more Raw Metrics Provider, writing UTF-8 bytes in Prometheus
> > >>>> format.
> > >>>> You get namespacing for free with the Meter name and version.
> > >>>> It's way better than the current solution and any other library.
> > >>>>
> > >>>>> That's all I am concerned about at the moment. I understand, and
> > >>>>> appreciate, that you've spent much time studying and explaining all
> > >>>>> these things. But this proposal is still too huge.
> > >>>>
> > >>>> I appreciate your effort a lot!
> > >>>>
> > >>>>> [1] https://lists.apache.org/thread/04jxqskcwwzdyfghkv4zstxxmzn154kf
> > >>>>> [2] https://github.com/streamnative/kop/blob/master/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/stats/PrometheusMetricsProvider.java
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Yunze
> > >>>>>
> > >>>>> On Sun, May 7, 2023 at 5:53 PM Asaf Mesika <asaf.mes...@gmail.com>
> > >>>>> wrote:
> > >>>>>>
> > >>>>>> I'm very appreciative of feedback from multiple Pulsar users and
> > >>>>>> devs on this PIP, since it suggests dramatic changes and quite an
> > >>>>>> extensive positive change for the users.
> > >>>>>>
> > >>>>>> On Thu, Apr 27, 2023 at 7:32 PM Asaf Mesika <asaf.mes...@gmail.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Hi all,
> > >>>>>>>
> > >>>>>>> I'm very excited to release a PIP I've been working on for the past
> > >>>>>>> 11 months, which I think will be immensely valuable to Pulsar,
> > >>>>>>> which I like so much.
> > >>>>>>>
> > >>>>>>> PIP: https://github.com/apache/pulsar/issues/20197
> > >>>>>>>
> > >>>>>>> I'm quoting the preface here:
> > >>>>>>>
> > >>>>>>> === QUOTE START ===
> > >>>>>>>
> > >>>>>>> Roughly 11 months ago, I started working on solving the biggest
> > >>>>>>> issue with Pulsar metrics: the lack of ability to monitor a Pulsar
> > >>>>>>> broker with a large topic count: 10k, 100k, and future support of
> > >>>>>>> 1M. This started by mapping the existing functionality and then
> > >>>>>>> enumerating all the problems I saw (all documented in this doc
> > >>>>>>> <https://docs.google.com/document/d/1vke4w1nt7EEgOvEerPEUS-Al3aqLTm9cl2wTBkKNXUA/edit?usp=sharing>
> >
> > I thought we were going to stop using Google docs for PIPs.
> >
> > >>>>>>> ).
> > >>>>>>>
> > >>>>>>> This PIP is a parent PIP. It aims to gradually solve (using
> > >>>>>>> sub-PIPs) all the current metric system's problems and provide the
> > >>>>>>> ability to monitor a broker with a large topic count, which is
> > >>>>>>> currently lacking. As a parent PIP, it will describe each problem
> > >>>>>>> and its solution at a high level, leaving fine-grained details to
> > >>>>>>> the sub-PIPs. The parent PIP ensures all solutions align and do not
> > >>>>>>> contradict each other.
> > >>>>>>>
> > >>>>>>> The basic building block to solve the monitoring of a large topic
> > >>>>>>> count is aggregating internally (into topic groups) and adding
> > >>>>>>> fine-grained filtering.
> > >>>>>>> We could have shoehorned it into the existing metric system, but
> > >>>>>>> we thought adding that to a system already ingrained with many
> > >>>>>>> problems would be wrong and hard to do gradually, as so many things
> > >>>>>>> would break. This is why the second-biggest design decision
> > >>>>>>> presented here is consolidating all existing metric libraries into
> > >>>>>>> a single one - OpenTelemetry <https://opentelemetry.io/>. The
> > >>>>>>> parent PIP will explain why OpenTelemetry was chosen out of the
> > >>>>>>> existing solutions and why it far exceeds all other options. I've
> > >>>>>>> been working closely with the OpenTelemetry community over the past
> > >>>>>>> eight months: brainstorming this integration and raising issues, in
> > >>>>>>> an effort to remove serious blockers and make this migration
> > >>>>>>> successful.
> > >>>>>>>
> > >>>>>>> I made every effort to summarize this document so that it can be
> > >>>>>>> concise yet clear. I understand it is an effort to read it and,
> > >>>>>>> more so, to provide meaningful feedback on such a large document;
> > >>>>>>> hence I'm very grateful to each individual who does so.
> > >>>>>>>
> > >>>>>>> I think this design will help improve the user experience
> > >>>>>>> immensely, so it is worth the time spent reading it.
> > >>>>>>>
> > >>>>>>> === QUOTE END ===
> > >>>>>>>
> > >>>>>>> Thanks!
> > >>>>>>>
> > >>>>>>> Asaf Mesika
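[Editor's illustration, not part of the thread.] The TL;DR above argues that Summaries (precomputed p95/p50) cannot be aggregated across topics or brokers, while Prometheus-conformant histograms with cumulative `le` buckets can. A minimal, self-contained sketch of why (bucket bounds and counts are invented; the interpolation roughly mirrors what Prometheus' `histogram_quantile()` does, not any actual Pulsar code):

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative Prometheus-style buckets.

    buckets: list of (le_upper_bound, cumulative_count) sorted by bound,
    ending with (float('inf'), total_count). Uses linear interpolation
    inside the matching bucket, like Prometheus' histogram_quantile().
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                # Quantile falls in the open-ended bucket; best we can
                # return is the highest finite bound seen so far.
                return prev_bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

def merge(a, b):
    """Cumulative bucket counts from two brokers simply add up bucket by
    bucket -- this is what makes histograms aggregatable, whereas two
    already-computed p95 values from Summaries cannot be combined."""
    return [(bound, ca + cb) for (bound, ca), (_, cb) in zip(a, b)]

# Invented per-broker write-latency buckets (ms) for illustration only.
broker1 = [(5.0, 60), (10.0, 90), (float("inf"), 100)]
broker2 = [(5.0, 20), (10.0, 80), (float("inf"), 100)]
combined = merge(broker1, broker2)
p50 = histogram_quantile(0.5, combined)   # interpolated within the 5-10ms bucket
p95 = histogram_quantile(0.95, combined)  # falls in the open-ended bucket
```

Note that merging per-broker p95 values directly (e.g. averaging them) would be meaningless; only the underlying bucket counters compose, which is also why the bucket bound must live in an `le` label rather than in the metric name.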