Thanks for the reply, Enrico. Completely agree. This made me realize my TL;DR didn't cover export, so I added this to it:
--- Pulsar OTel Metrics will support exporting via a Prometheus HTTP endpoint (`/metrics`, but on a different port) for backward compatibility, and also via OTLP, so you can push the metrics to an OTel Collector and from there ship them to any destination. ---

OTel supports two kinds of exporters: Prometheus (HTTP pull) and OTLP (push). We'll simply configure Pulsar to use them.

On Mon, May 15, 2023 at 10:35 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> Asaf,
> thanks for contributing to this area.
> Metrics are a fundamental feature of Pulsar.
>
> Currently I find it very awkward to maintain metrics, and I also see
> it as a problem to support only Prometheus.
>
> Regarding your proposal, IIRC in the past someone else proposed to
> support other metrics systems, and they were advised to use a sidecar
> approach, that is, to add something next to the Pulsar services that
> serves the metrics in the preferred format/way.
> I find the sidecar approach too inefficient and I am not proposing it
> (but I wanted to add this reference for the benefit of new people on
> the list).
>
> I wonder if it would be possible to keep compatibility with the
> current Prometheus-based metrics.
> Pulsar has now reached a point at which it is widely used by many
> companies, often with big clusters. Telling people that they have to
> rework all the infrastructure related to metrics because we no longer
> support Prometheus, or because we radically changed the way we publish
> metrics, is a step that seems too hard from my point of view.
>
> Currently I believe that compatibility is more important than
> versatility, and if we want to introduce new (and far better) features
> we must take that into account.
>
> So my point is that I generally support the idea of opening the way to
> OpenTelemetry, but we must have a way to not force all of our users to
> throw away their alerting systems, dashboards, and know-how in
> troubleshooting Pulsar problems in production and dev.
>
> Best regards
> Enrico
>
> On Mon, May 15, 2023 at 02:17 Dave Fisher <wave4d...@comcast.net> wrote:
> >
> > On May 10, 2023, at 1:01 AM, Asaf Mesika <asaf.mes...@gmail.com> wrote:
> > >
> > > On Tue, May 9, 2023 at 11:29 PM Dave Fisher <w...@apache.org> wrote:
> > >>
> > >> On May 8, 2023, at 2:49 AM, Asaf Mesika <asaf.mes...@gmail.com> wrote:
> > >>>
> > >>> Your feedback made me realize I need to add a "TL;DR" section, which I
> > >>> just added.
> > >>>
> > >>> I'm quoting it here. It gives a brief summary of the proposal, which
> > >>> requires up to 5 minutes of reading time, helping you get a high-level
> > >>> picture before you dive into the background/motivation/solution.
> > >>>
> > >>> ----------------------
> > >>> TL;DR
> > >>>
> > >>> Working with metrics today, as a user or a developer, is hard and has
> > >>> many severe issues.
> > >>>
> > >>> From the user perspective:
> > >>>
> > >>> - One of Pulsar's strongest features is "cheap" topics, so you can
> > >>> easily have 10k - 100k topics per broker. Once you do that, you quickly
> > >>> learn that the amount of metrics you export via "/metrics"
> > >>> (Prometheus-style endpoint) becomes really big. The cost to store them
> > >>> becomes too high, queries time out, or even the "/metrics" endpoint
> > >>> itself times out.
> > >>> The only option Pulsar gives you today is all-or-nothing filtering and
> > >>> very crude aggregation: you switch metrics from topic aggregation level
> > >>> to namespace aggregation level. Also, you can turn off producer- and
> > >>> consumer-level metrics.
> > >>> You end up doing it all, leaving you "blind", looking at the metrics
> > >>> from a namespace level, which is too high-level. You end up conjuring
> > >>> all kinds of scripts on top of the topic stats endpoint to glue
> > >>> together some aggregated metrics view for the topics you need.
> > >>> - Summaries (a metric type giving you quantiles like p95), which are
> > >>> used in Pulsar, can't be aggregated across topics / brokers due to
> > >>> their inherent design.
> > >>> - Plugin authors spend too much time on defining and exposing metrics
> > >>> to Pulsar, since the only interface Pulsar offers is writing your
> > >>> metrics yourself, as UTF-8 bytes in Prometheus Text Format, to a byte
> > >>> stream interface given to you.
> > >>> - Pulsar histograms are exported in a way that is not conformant with
> > >>> Prometheus, which means you can't get the p95 quantile on such
> > >>> histograms, making them very hard to use in day-to-day life.
> > >>
> > >> What version of DataSketches is used to produce the histogram? Is it
> > >> still an old Yahoo one, or are we using an updated one from Apache
> > >> DataSketches?
> > >>
> > >> Seems like this is a single PR/small PIP for 3.1?
> > >
> > > Histograms are a list of buckets, each a counter.
> > > A Summary is a collection of values collected over a time window, from
> > > which at the end you get a calculation of the quantiles of those values:
> > > p95, p50; and those are exported from Pulsar.
> > >
> > > Pulsar histograms do not use DataSketches.
> >
> > Bookkeeper Metrics wraps Yahoo DataSketches last I checked.
> >
> > > They are just counters.
> > > They do not adhere to Prometheus since:
> > > a. The counter is expected to be cumulative, but Pulsar resets each
> > > bucket counter to 0 every 1 min
> > > b. The bucket upper range is expected to be written as an attribute "le",
> > > but today it is encoded in the name of the metric itself.
> > >
> > > This is a breaking change, hence hard to make in any small release.
> > > This is why it's part of this PIP: so many things will break, and all of
> > > them will break on a separate layer (OTel metrics), hence not breaking
> > > anyone without their consent.
> >
> > If this change will break existing Grafana dashboards and other
> > operational monitoring already in place, then it will break guarantees we
> > have made about safely being able to downgrade from a bad upgrade.
> >
> > >>> - Too many metrics are rates, which also delta-reset every interval
> > >>> you configure in Pulsar and on restart, instead of relying on
> > >>> cumulative (ever-growing) counters and letting Prometheus use its rate
> > >>> function.
> > >>> - and many more issues
> > >>>
> > >>> From the developer perspective:
> > >>>
> > >>> - There are 4 different ways to define and record metrics in Pulsar:
> > >>> Pulsar's own metrics library, Prometheus Java Client, the Bookkeeper
> > >>> metrics library, and plain native Java SDK objects (AtomicLong, ...).
> > >>> It's very confusing for the developer and creates inconsistencies for
> > >>> the end user (e.g. Summary is different in each).
> > >>> - Patching your metrics into the "/metrics" Prometheus endpoint is
> > >>> confusing, cumbersome and error-prone.
> > >>> - many more
> > >>>
> > >>> This proposal offers several key changes to solve that:
> > >>>
> > >>> - Cardinality (supporting 10k-100k topics per broker) is solved by
> > >>> introducing a new aggregation level for metrics called Topic Metric
> > >>> Group. Using configuration, you specify for each topic its group (using
> > >>> wildcard/regex). This allows you to "zoom out" to a granularity level
> > >>> more detailed than namespaces: groups. Since you control how many
> > >>> groups you have, this solves the cardinality issue without sacrificing
> > >>> the level of detail too much.
> > >>> - A fine-grained, dynamic filtering mechanism. You'll have rule-based
> > >>> dynamic configuration, allowing you to specify per
> > >>> namespace/topic/group which metrics you'd like to keep/drop. Rules
> > >>> allow you to set the default to have a small amount of metrics, at
> > >>> group and namespace level only, and drop the rest. When needed, you can
> > >>> add an override rule to "open up" a certain group to have more metrics
> > >>> at higher granularity (topic or even consumer/producer level). Since
> > >>> it's dynamic, you "open" such a group when you see it's misbehaving,
> > >>> see it at topic level, and when all is resolved, you can "close" it. A
> > >>> somewhat similar experience to logging levels in Log4j or Logback,
> > >>> where you set a default and override per class/package.
> > >>>
> > >>> Aggregation and filtering combined solve the cardinality issue without
> > >>> sacrificing the level of detail when needed and, most importantly, you
> > >>> determine which topic/group/namespace it happens on.
> > >>>
> > >>> Since this change is so invasive, it requires a single metrics library
> > >>> to implement all of it on top of; hence the third big change is
> > >>> consolidating all four ways to define and record metrics into a single,
> > >>> new one: OpenTelemetry Metrics (the Java SDK, and also Python and Go
> > >>> for the Pulsar Function runners).
> > >>> Introducing OpenTelemetry (OTel) also solves the biggest pain point
> > >>> from the developer perspective, since it's a superb metrics library
> > >>> offering everything you need, and there is going to be a single way -
> > >>> only it. Also, it solves the robustness issue for plugin authors, who
> > >>> will use OpenTelemetry. It so happens that it also solves all the
> > >>> numerous problems described in the doc itself.
> > >>>
> > >>> The solution will be introduced as another layer with feature toggles,
> > >>> so you can work with the existing system, and/or OTel, until gradually
> > >>> deprecating the existing system.
> > >>>
> > >>> It's a big breaking change for Pulsar users on many fronts: names,
> > >>> semantics, configuration. Read the end of this doc to learn exactly
> > >>> what will change for the user (at a high level).
> > >>>
> > >>> In my opinion, it will make the Pulsar user experience so much better
> > >>> that they will want to migrate to it, despite the breaking change.
> > >>>
> > >>> This was a very short summary. You are most welcome to read the full
> > >>> design document below and express feedback, so we can make it better.
> > >>>
> > >>> On Sun, May 7, 2023 at 7:52 PM Asaf Mesika <asaf.mes...@gmail.com>
> > >>> wrote:
> > >>>>
> > >>>> On Sun, May 7, 2023 at 4:23 PM Yunze Xu <y...@streamnative.io.invalid>
> > >>>> wrote:
> > >>>>
> > >>>>> I was excited to learn much more about metrics when I started reading
> > >>>>> this proposal. But I became more and more frustrated when I found
> > >>>>> there was still too much content left even after I had already spent
> > >>>>> much time reading it. I'm wondering how much time you expected
> > >>>>> reviewers to spend reading through this proposal? I just recalled the
> > >>>>> discussion you started before [1]. Did you expect each PMC member who
> > >>>>> gives his/her +1 to read only parts of this proposal?
> > >>>>
> > >>>> I estimated around 2 hours needed for a reviewer.
> > >>>> I hate it being so long, but I simply couldn't find a way to downsize
> > >>>> it more. Furthermore, I consulted with my colleagues, including
> > >>>> Matteo, but we couldn't see a way to scope it down.
> > >>>> Why? Because once you begin this journey, you need to know how it's
> > >>>> going to end.
> > >>>> What I ended up doing is writing all the crucial details needed for
> > >>>> review in the High Level Design section.
> > >>>> It's still a big, hefty section, but I don't think I can leave it out,
> > >>>> or let anyone change Pulsar so invasively without showing the full
> > >>>> extent of the change.
> > >>>>
> > >>>> I don't think it's wise to read only parts.
> > >>>> I did my very best to minimize it, but the scope is simply big.
> > >>>> Open to suggestions, but it requires reading all of the PIP :)
> > >>>>
> > >>>> Thanks a lot, Yunze, for dedicating your time to it.
> > >>>>
> > >>>>> Let's get back to the proposal. For now, what I mainly learned and
> > >>>>> am mostly concerned about are:
> > >>>>> 1. Pulsar has many ways to expose metrics. It's not unified and is
> > >>>>> confusing.
> > >>>>> 2. The current metrics system cannot support a large amount of
> > >>>>> topics.
> > >>>>> 3. It's hard for plugin authors to integrate metrics. (For example,
> > >>>>> KoP [2] integrates metrics by implementing the
> > >>>>> PrometheusRawMetricsProvider interface and it indeed needs much work)
> > >>>>>
> > >>>>> Regarding the 1st issue, this proposal chooses OpenTelemetry (OTel).
> > >>>>>
> > >>>>> Regarding the 2nd issue, I scrolled to the "Why OpenTelemetry?"
> > >>>>> section. It's still frustrating to see no answer. Eventually, I found
> > >>>>
> > >>>> OpenTelemetry isn't the solution for a large amount of topics.
> > >>>> The solution is described in the
> > >>>> "Aggregate and Filtering to solve cardinality issues" section.
> > >>>>
> > >>>>> the explanation in the "What we need to fix in OpenTelemetry -
> > >>>>> Performance" section. It seems that we still need some enhancements
> > >>>>> in OTel. In other words, currently OTel is not ready to resolve all
> > >>>>> the issues listed in the proposal, but we believe it will be.
> > >>>>
> > >>>> Let me rephrase "believe" --> we are working together with the
> > >>>> maintainers to do it, yes.
> > >>>> I am open to any other suggestion.
> > >>>>
> > >>>>> As for the 3rd issue, from the "Integrating with Pulsar Plugins"
> > >>>>> section, plugin authors still need to implement the new OTel
> > >>>>> interfaces. Is it much easier than using the existing ways to expose
> > >>>>> metrics? Could metrics still be easily integrated with Grafana?
> > >>>>
> > >>>> Yes, it's way easier.
> > >>>> Basically, you get the objects of a full-fledged metrics library:
> > >>>> Meter, Gauge, Histogram, Counter.
> > >>>> No more Raw Metrics Provider, writing UTF-8 bytes in Prometheus
> > >>>> format.
> > >>>> You get namespacing for free with the Meter name and version.
> > >>>> It's way better than the current solution and any other library.
> > >>>>
> > >>>>> That's all I am concerned about at the moment. I understand, and
> > >>>>> appreciate, that you've spent much time studying and explaining all
> > >>>>> these things. But this proposal is still too huge.
> > >>>>
> > >>>> I appreciate your effort a lot!
> > >>>>
> > >>>>> [1] https://lists.apache.org/thread/04jxqskcwwzdyfghkv4zstxxmzn154kf
> > >>>>> [2] https://github.com/streamnative/kop/blob/master/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/stats/PrometheusMetricsProvider.java
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Yunze
> > >>>>>
> > >>>>> On Sun, May 7, 2023 at 5:53 PM Asaf Mesika <asaf.mes...@gmail.com>
> > >>>>> wrote:
> > >>>>>>
> > >>>>>> I'm very appreciative of feedback from multiple Pulsar users and
> > >>>>>> devs on this PIP, since it suggests dramatic changes and quite an
> > >>>>>> extensive positive change for the users.
> > >>>>>>
> > >>>>>> On Thu, Apr 27, 2023 at 7:32 PM Asaf Mesika <asaf.mes...@gmail.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Hi all,
> > >>>>>>>
> > >>>>>>> I'm very excited to release a PIP I've been working on for the past
> > >>>>>>> 11 months, which I think will be immensely valuable to Pulsar,
> > >>>>>>> which I like so much.
> > >>>>>>>
> > >>>>>>> PIP: https://github.com/apache/pulsar/issues/20197
> > >>>>>>>
> > >>>>>>> I'm quoting the preface here:
> > >>>>>>>
> > >>>>>>> === QUOTE START ===
> > >>>>>>>
> > >>>>>>> Roughly 11 months ago, I started working on solving the biggest
> > >>>>>>> issue with Pulsar metrics: the lack of ability to monitor a Pulsar
> > >>>>>>> broker with a large topic count: 10k, 100k, and future support of
> > >>>>>>> 1M. This started by mapping the existing functionality and then
> > >>>>>>> enumerating all the problems I saw (all documented in this doc
> > >>>>>>> <https://docs.google.com/document/d/1vke4w1nt7EEgOvEerPEUS-Al3aqLTm9cl2wTBkKNXUA/edit?usp=sharing>
> >
> > I thought we were going to stop using Google docs for PIPs.
> >
> > >>>>>>> ).
> > >>>>>>>
> > >>>>>>> This PIP is a parent PIP. It aims to gradually solve (using
> > >>>>>>> sub-PIPs) all the current metric system's problems and provide the
> > >>>>>>> ability to monitor a broker with a large topic count, which is
> > >>>>>>> currently lacking. As a parent PIP, it will describe each problem
> > >>>>>>> and its solution at a high level, leaving fine-grained details to
> > >>>>>>> the sub-PIPs. The parent PIP ensures all solutions align and do not
> > >>>>>>> contradict each other.
> > >>>>>>>
> > >>>>>>> The basic building block to solve the monitoring of a large topic
> > >>>>>>> count is aggregating internally (into topic groups) and adding
> > >>>>>>> fine-grained filtering.
> > >>>>>>> We could have shoehorned it into the existing metric system, but
> > >>>>>>> we thought adding that to a system already ingrained with many
> > >>>>>>> problems would be wrong and hard to do gradually, as so many things
> > >>>>>>> would break. This is why the second-biggest design decision
> > >>>>>>> presented here is consolidating all existing metric libraries into
> > >>>>>>> a single one - OpenTelemetry <https://opentelemetry.io/>. The
> > >>>>>>> parent PIP will explain why OpenTelemetry was chosen out of the
> > >>>>>>> existing solutions and why it far exceeds all other options. I've
> > >>>>>>> been working closely with the OpenTelemetry community over the past
> > >>>>>>> eight months: brainstorming this integration and raising issues, in
> > >>>>>>> an effort to remove serious blockers and make this migration
> > >>>>>>> successful.
> > >>>>>>>
> > >>>>>>> I made every effort to summarize this document so that it can be
> > >>>>>>> concise yet clear. I understand it is an effort to read it and,
> > >>>>>>> more so, to provide meaningful feedback on such a large document;
> > >>>>>>> hence I'm very grateful to each individual who does so.
> > >>>>>>>
> > >>>>>>> I think this design will help improve the user experience
> > >>>>>>> immensely, so it is worth the time spent reading it.
> > >>>>>>>
> > >>>>>>> === QUOTE END ===
> > >>>>>>>
> > >>>>>>> Thanks!
> > >>>>>>>
> > >>>>>>> Asaf Mesika
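[Editor's illustration, not part of the thread.] The TL;DR above argues that Summaries (precomputed p95/p50) cannot be aggregated across topics or brokers, while Prometheus-conformant histograms with cumulative `le` buckets can. A minimal, self-contained sketch of why (bucket bounds and counts are invented; the interpolation roughly mirrors what Prometheus' `histogram_quantile()` does, not any actual Pulsar code):

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative Prometheus-style buckets.

    buckets: list of (le_upper_bound, cumulative_count) sorted by bound,
    ending with (float('inf'), total_count). Uses linear interpolation
    inside the matching bucket, like Prometheus' histogram_quantile().
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                # Quantile falls in the open-ended bucket; best we can
                # return is the highest finite bound seen so far.
                return prev_bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

def merge(a, b):
    """Cumulative bucket counts from two brokers simply add up bucket by
    bucket -- this is what makes histograms aggregatable, whereas two
    already-computed p95 values from Summaries cannot be combined."""
    return [(bound, ca + cb) for (bound, ca), (_, cb) in zip(a, b)]

# Invented per-broker write-latency buckets (ms) for illustration only.
broker1 = [(5.0, 60), (10.0, 90), (float("inf"), 100)]
broker2 = [(5.0, 20), (10.0, 80), (float("inf"), 100)]
combined = merge(broker1, broker2)
p50 = histogram_quantile(0.5, combined)   # interpolated within the 5-10ms bucket
p95 = histogram_quantile(0.95, combined)  # falls in the open-ended bucket
```

Note that merging per-broker p95 values directly (e.g. averaging them) would be meaningless; only the underlying bucket counters compose, which is also why the bucket bound must live in an `le` label rather than in the metric name.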