> Basically you have a full fledged metrics library objects: Meter, Gauge, Histogram, Counter.
It sounds good, but not so attractive. Currently KoP implements its own metrics library objects. So after that, we need to leverage the similar classes from OTel.

I want to talk a little more beyond that. IIUC, this proposal wants to replace the current metrics systems with OTel. But for most developers and maintainers, the most important thing they care about might be how many changes it would bring. For example, currently the Grafana dashboards have been widely used. How many changes could it bring? Do users need to learn completely different dashboards? I asked this question before, but it was not answered.

Then I found the "Breaking changes" section. So many breaking changes are usually not acceptable. I see you listed a lot of problems for the current design. I think each of them needs a PIP or at least a PR to resolve if a breaking change could be made. Why not solve them one by one in Pulsar?

Thanks,
Yunze

On Mon, May 8, 2023 at 12:53 AM Asaf Mesika <asaf.mes...@gmail.com> wrote:
>
> On Sun, May 7, 2023 at 4:23 PM Yunze Xu <y...@streamnative.io.invalid> wrote:
>
> > I'm excited to learn much more about metrics when I started reading this proposal. But I became more and more frustrated when I found there is still too much content left even if I've already spent much time reading this proposal. I'm wondering how much time did you expect reviewers to read through this proposal? I just recalled the discussion you started before [1]. Did you expect each PMC member that gives his/her +1 to read only parts of this proposal?
> >
>
> I estimated around 2 hours needed for a reviewer.
> I hate it being so long, but I simply couldn't find a way to downsize it more. Furthermore, I consulted with my colleagues including Matteo, but we couldn't see a way to scope it down.
> Why? Because once you begin this journey, you need to know how it's going to end.
> What I ended up doing, is writing all the crucial details for review in the High Level Design section.
> It's still a big, hefty section, but I don't think I can step out or let anyone else change Pulsar so invasively without the full extent of the change.
>
> I don't think it's wise to read parts.
> I did my very best effort to minimize it, but the scope is simply big. Open for suggestions, but it requires reading all the PIP :)
>
> Thanks a lot Yunze for dedicating any time to it.
>
> > Let's talk back to the proposal, for now, what I mainly learned and are concerned about mostly are:
> > 1. Pulsar has many ways to expose metrics. It's not unified and confusing.
> > 2. The current metrics system cannot support a large amount of topics.
> > 3. It's hard for plugin authors to integrate metrics. (For example, KoP [2] integrates metrics by implementing the PrometheusRawMetricsProvider interface and it indeed needs much work)
> >
> > Regarding the 1st issue, this proposal chooses OpenTelemetry (OTel).
> >
> > Regarding the 2nd issue, I scrolled to the "Why OpenTelemetry?" section. It's still frustrating to see no answer. Eventually, I found
> >
>
> OpenTelemetry isn't the solution for large amount of topic.
> The solution is described at "Aggregate and Filtering to solve cardinality issues" section.
>
> > the explanation in the "What we need to fix in OpenTelemetry - Performance" section. It seems that we still need some enhancements in OTel. In other words, currently OTel is not ready for resolving all these issues listed in the proposal but we believe it will.
> >
>
> Let me rephrase "believe" --> we work together with the maintainers to do it, yes.
> I am open for any other suggestion.
>
> > As for the 3rd issue, from the "Integrating with Pulsar Plugins" section, the plugin authors still need to implement the new OTel interfaces.
> > Is it much easier than using the existing ways to expose metrics? Could metrics still be easily integrated with Grafana?
> >
>
> Yes, it's way easier.
> Basically you have a full fledged metrics library objects: Meter, Gauge, Histogram, Counter.
> No more Raw Metrics Provider, writing UTF-8 bytes in Prometheus format.
> You get namespacing for free with Meter name and version.
> It's way better than current solution and any other library.
>
> > That's all I am concerned about at the moment. I understand, and appreciate that you've spent much time studying and explaining all these things. But, this proposal is still too huge.
> >
>
> I appreciate your effort a lot!
>
> > [1] https://lists.apache.org/thread/04jxqskcwwzdyfghkv4zstxxmzn154kf
> > [2] https://github.com/streamnative/kop/blob/master/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/stats/PrometheusMetricsProvider.java
> >
> > Thanks,
> > Yunze
> >
> > On Sun, May 7, 2023 at 5:53 PM Asaf Mesika <asaf.mes...@gmail.com> wrote:
> > >
> > > I'm very appreciative for feedback from multiple pulsar users and devs on this PIP, since it has dramatic changes suggested and quite extensive positive change for the users.
> > >
> > > On Thu, Apr 27, 2023 at 7:32 PM Asaf Mesika <asaf.mes...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'm very excited to release a PIP I've been working on in the past 11 months, which I think will be immensely valuable to Pulsar, which I like so much.
> > > >
> > > > PIP: https://github.com/apache/pulsar/issues/20197
> > > >
> > > > I'm quoting here the preface:
> > > >
> > > > === QUOTE START ===
> > > >
> > > > Roughly 11 months ago, I started working on solving the biggest issue with Pulsar metrics: the lack of ability to monitor a pulsar broker with a large topic count: 10k, 100k, and future support of 1M.
> > > > This started by mapping the existing functionality and then enumerating all the problems I saw (all documented in this doc <https://docs.google.com/document/d/1vke4w1nt7EEgOvEerPEUS-Al3aqLTm9cl2wTBkKNXUA/edit?usp=sharing>).
> > > >
> > > > This PIP is a parent PIP. It aims to gradually solve (using sub-PIPs) all the current metric system's problems and provide the ability to monitor a broker with a large topic count, which is currently lacking. As a parent PIP, it will describe each problem and its solution at a high level, leaving fine-grained details to the sub-PIPs. The parent PIP ensures all solutions align and does not contradict each other.
> > > >
> > > > The basic building block to solve the monitoring ability of large topic count is aggregating internally (to topic groups) and adding fine-grained filtering. We could have shoe-horned it into the existing metric system, but we thought adding that to a system already ingrained with many problems would be wrong and hard to do gradually, as so many things will break. This is why the second-biggest design decision presented here is consolidating all existing metric libraries into a single one - OpenTelemetry <https://opentelemetry.io/>. The parent PIP will explain why OpenTelemetry was chosen out of existing solutions and why it far exceeds all other options. I’ve been working closely with the OpenTelemetry community in the past eight months: brain-storming this integration, and raising issues, in an effort to remove serious blockers to make this migration successful.
> > > >
> > > > I made every effort to summarize this document so that it can be concise yet clear.
> > > > I understand it is an effort to read it and, more so, provide meaningful feedback on such a large document; hence I’m very grateful for each individual who does so.
> > > >
> > > > I think this design will help improve the user experience immensely, so it is worth the time spent reading it.
> > > >
> > > > === QUOTE END ===
> > > >
> > > > Thanks!
> > > >
> > > > Asaf Mesika