Den fre 18 juni 2021 kl 22:32 skrev Travis Bischel <travis.bisc...@gmail.com >:
> H Colin (and Magnus), > > Thanks for the replies! > > I think the biggest concern I have is the cardinality bits. I'm > sympathetic to the aspect of this making it easier for Kafka brokers to > understand *every* aspect of the kafka ecoystem. I am not sure this will > 100% solve the need there, though: if a client is unable to connect to a > broker, visibility disappears immediately, no? > At the end of the day this is an unsolvable problem, but what the proposed approach gives us is a channel that is operational when Kafka is operational, regardless of external systems. If a Kafka client can't connect to Kafka, its internal Kafka metrics are not the main interest, but rather on the connectivity/networking side. > > I do still think that the problem of difficulty of monitoring within an > organization results from issues within organizations themselves: orgs > should have proper processes in place such that anything talking to Kafka > has the org's plug-in monitoring libraries. Kafka operators can define > those libraries, such that all clients in the org have the libraries the > operators require. This satisfies the same goals this KIP aims to provide, > albeit with the increased org cost of not just having something defined to > be plugged in. > Yeah, that would be great, and some orgs do indeed come close to this. But most don't, and then there's the case of multi-org; where the client developers and cluster operators reside in different organizations. > > If Kafka operators themselves can which metrics they want, so that the > broker can tell the client "only send these metrics", then my biggest > concern is removed. > That's indeed how it works, the metrics that a client pushes are set up by the cluster operator (et.al) by configuring metrics subscriptions. The client will not send any metrics that have not been centrally requested/subscribed, it is all controlled from the cluster; what clients sends what metrics at what interval. > > I do still think that hooks can be a cleaner abstraction to this same > goal, and then pre-provided libraries (say, "this library provides X,Y,Z > and sends to prometheus from your client") could exist that more exactly > satisfy what this KIP aims to provide. This would also avoid the > kitchen-sink vs. not-comprehensive-enough issue I brought up previously. > This would also avoid require KIPs for any supported metrics. > This defeats the general availability always-on goal of the KIP though: client metrics available on demand out of the box. Thanks for your comments Travis. /Magnus > On 2021/06/16 22:27:55, "Colin McCabe" <cmcc...@apache.org> wrote: > > On Sun, Jun 13, 2021, at 21:51, Travis Bischel wrote: > > > Hi! I have a few thoughts on this KIP. First, I'd like to thank you > for > > > the writeup, > > > clearly a lot of thought has gone into it and it is very thorough. > > > However, I'm not > > > convinced it's the right approach from a fundamental level. > > > > > > Fundamentally, this KIP seems like somewhat of a solution to an > organizational > > > problem. Metrics are organizational concerns, not Kafka operator > concerns. > > > > Hi Travis, > > > > Metrics are certainly Kafka operator concerns. It is very important for > cluster operators to know things like how many clients there are, what they > clients are doing, and so forth. This information is needed to administer > Kafka. Therefore it certainly falls in the domain of the Kafka operations > team (and the Kafka development team.) > > > > We have added many metrics in the past to make it easier to monitor > clients. I think this is just another step in that direction. > > > > > Clients should make it easy to plug in metrics (this is the approach I > take in > > > my own client), and organizations should have processes such that all > clients > > > gather and ship metrics how that organization desires. > > > > > > If an organization is set up correctly, there is no reason for metrics > to be > > > forwarded through Kafka. This feels like a solution to an organization > not > > > properly setting up how processes ship metrics, and in some ways, it's > an > > > overbroad solution, and in other ways, it doesn't cover the entire > problem. > > > > I think the reason was explained pretty clearly: many admins find it > difficult to set up monitoring for every client in the organization. In > general the team which maintains a Kafka cluster is often separate from the > teams that use the cluster. Therefore rolling out monitoring for clients > can be very difficult to coordinate. > > > > No metrics will ever cover every possible use-case, but the set proposed > here does seem useful. > > > > > > > > From the perspective of Kafka operators, it is easy to see that this > KIP is > > > nice in that it just dictates what clients should support for metrics > and that > > > the metrics should ship through Kafka. But, from the perspective of an > > > observability team, this workflow is basically hijacking the standard > flow that > > > organizations may have. I would rather have applications collect > metrics and > > > ship them the same way every other application does. I'd rather not > have to > > > configure additional plugins within Kafka to take metrics and forward > them. > > > > This change doesn't remove any functionality. If you don't want to use > KIP-714 metrics collection, you can simply turn it off and continue > collecting metrics the way you always have. > > > > > > > > More importantly, this KIP prescibes cardinality problems, requires > that to > > > officially support the KIP a client must support all relevant metrics > within > > > the KIP, and requires that a client cannot support other metrics > unless those > > > other metrics also go through a KIP process. It is difficult to > imagine all of > > > these metrics being relevant to every organization, and there is no > way for an > > > organization to filter what is relevant within the client. Instead, the > > > filtering is pushed downwards, meaning more network IO and more CPU > costs to > > > filter what is irrelevant and aggregate what needs to be aggregated, > and more > > > time for an organization to setup whatever it is that will be doing > this > > > filtering and aggregating. Contrast this with a client that enables > hooking in > > > to capture numbers that are relevant within an org itself: the org can > gather > > > what they want, ship only want they want, and ship directly to the > > > observability system they have already set up. As an aside, it may > also be > > > wise to avoid shipping metrics through Kafka about client interaction > with > > > Kafka, because if Kafka is having problems, then orgs lose insight > into those > > > problems. This would be like statuspage using itself for status on its > own > > > systems. > > > > > > Another downside is that by dictating the important metrics, this KIP > either > > > has two choices: try to choose what is important to every org, and > inevitably > > > leave out something important to somebody else, or just add everything > and let > > > the orgs filter. This KIP mostly looks to go with the latter approach, > meaning > > > orgs will be shipping & filtering. With hooks, an org would be able to > gather > > > exactly what they want. > > > > I actually do agree with this criticism to some extent. It would be good > if the broker could specify what metrics it wants, and the clients would > send only those metrics. > > > > More generally, I'd like to see this split up into several RPCs rather > than one mega-RPC. > > > > Maybe something like > > 1. RegisterClient{Request,Response} > > 2. ClientMetricsReport{Request,Response} > > 3. UnregisterClient{Request,Response} > > > > Then the broker can communicate which metrics it wants in > RegisterClientResponse. It can also assign a client instance ID (which I > think should be a UUID, not another string). > > > > > > > > As well, I expect that org applications have metrics on the state of > the > > > applications outside of the Kafka client. Applications are already > sending > > > non-Kafka-client related metrics outbound to observability systems. If > a Kafka > > > client provided hooks, then users could just gather the additional > relevant > > > Kafka client metrics and ship those metrics the same way they do all > of their > > > other metrics. It feels a bit odd for a Kafka client to have its own > separate > > > way of forwarding metrics. Another benefit hooks in clients is that > > > organizations do not _have_ to set up additional plugins to forward > metrics > > > from Kafka. Hooks avoid extra organizational work. > > > > Again, if you want to continue collecting metrics directly from clients, > you can simply do that. Nothing has to change for you as a result of this > KIP. > > > > > > > > The option that the KIP provides for users of clients to opt out of > metrics may > > > avoid some of the above issues (by just disabling things at the user > level), > > > but that's not really great from the perspective of client authors, > because the > > > existence of this KIP forces authors to either just not implement the > KIP, or > > > increase complexity within the KIP. Further, from an operator > perspective, if I > > > would prefer clients to ship metrics through the systems they already > have in > > > place, now I have to expect that anything that uses librdkafka or the > official > > > Java client will be shipping me metrics that I have to deal with > (since the KIP > > > is default enabled). > > > > It is clear that we want to avoid unnecessary complexity. However, the > ability to easily gather metrics from clients without doing a lot of > client-side configuration is not an ability that we currently have, and > this KIP changes that. So I think it's very well worth it. > > > > > > > > Lastly, I'm a little wary that this KIP may stem from a product goal of > > > Confluent: since most everything uses librdkafka or the Java client, > then by > > > defaulting clients sending metrics, Confluent gets an easy way to > provide > > > metric panels for a nice cloud UI. If any client does not want to > support these > > > metrics, and then a user wonders why these hypothetical panels have no > metrics, > > > then Confluent can just reply "use a supported client". Even if this > > > (potentially unlikely) scenario is true, then hooks would still be a > great > > > alternative, because then Confluent could provide drop-in hooks for > any client > > > and the end result of easy-panels would be the same. > > > > > > > In general, if a feature provides a benefit to users and operators, > that's a reason to put it in, not a reason to leave it out. We want to make > Kafka better, which includes making it better for vendors. There is nothing > Confluent-specific in the proposal. > > > > > In summary, > > > > > > - Metrics are more of an organizational concern, not specifically a > broker > > > operator concern. > > > > > > - The proposal seems to hijack how metrics are gathered within > organizations > > > > > > - I don't think KIPs should dictate which metrics should be gathered > and which > > > should not. Clients instead should make it easy for users to gather > anything > > > they could be interested in, and ignore anything they are not. > > > > KIPs have always dictated which metrics are gathered and which should > not. This is very intentionally part of the KIP process (since metrics are > public API). > > > > best, > > Colin > > > > > > > > - I think hooks are more extensible, more exact, and fit better into > > > organizational workflows. > > > > > > On 2021/06/02 12:45:45, Magnus Edenhill <mag...@edenhill.se> wrote: > > > > Hey all, > > > > > > > > I'm proposing KIP-714 to add remote Client metrics and observability. > > > > This functionality will allow centralized monitoring and > troubleshooting of > > > > clients and their internals. > > > > > > > > Please see > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability > > > > > > > > Looking forward to your feedback! > > > > > > > > Regards, > > > > Magnus > > > > > > > > > >