Re: [DISCUSS] KIP-714: Client metrics and observability

Travis Bischel Mon, 14 Jun 2021 11:31:49 -0700

Apologies for this duplicate reply; I did not notice the success confirmation
on the first submission.


On 2021/06/14 04:52:11, Travis Bischel <travis.bisc...@gmail.com> wrote: 
> Hi! I have a few thoughts on this KIP. First, I'd like to thank you for your
> work and writeup; it's clear that a lot of thought went into this and it's
> very thorough! However, I'm not convinced it's the right approach at a
> fundamental level.
> 
> Fundamentally, this KIP seems like somewhat of a solution to an organizational
> problem. Metrics are organizational concerns, not Kafka operator concerns.
> Clients should make it easy to plug in metrics (this is the approach I take in
> my own client), and organizations should have processes such that all clients
> gather and ship metrics how that organization desires. If an organization is
> set up correctly, there is no reason for metrics to be forwarded through
> Kafka. This feels like a solution to an organization not properly setting up
> how processes ship metrics, and in some ways, it's an overbroad solution, and
> in other ways, it doesn't cover the entire problem.
> 
> From the perspective of Kafka operators, it is easy to see that this KIP is
> nice in that it just dictates what clients should support for metrics and that
> the metrics should ship through Kafka. But, from the perspective of an
> observability team, this workflow is basically hijacking the standard flow
> that organizations may have. I would rather have applications collect metrics
> and ship them the same way every other application does. I'd rather not have
> to configure additional plugins within Kafka to take metrics and forward them.
> 
> More importantly, this KIP prescribes cardinality problems, requires that to
> officially support the KIP a client must support all relevant metrics within
> the KIP, and requires that a client cannot support other metrics unless those
> other metrics also go through a KIP process. It is difficult to imagine all of
> these metrics being relevant to every organization, and there is no way for an
> organization to filter what is relevant within the client. Instead, the
> filtering is pushed downwards, meaning more network IO and more CPU costs to
> filter what is irrelevant and aggregate what needs to be aggregated, and more
> time for an organization to set up whatever it is that will be doing this
> filtering and aggregating. Contrast this with a client that enables hooking in
> to capture numbers that are relevant within an org itself: the org can gather
> what they want, ship only what they want, and ship directly to the
> observability system they have already set up. As an aside, it may also be
> wise to avoid shipping metrics through Kafka about client interaction with
> Kafka, because if Kafka is having problems, then orgs lose insight into those
> problems. This would be like statuspage using itself for status on its own
> systems.
> 
> Another downside is that by dictating the important metrics, this KIP has
> two choices: either try to choose what is important to every org, and
> inevitably leave out something important to somebody else, or just add
> everything and let the orgs filter. This KIP mostly looks to go with the
> latter approach, meaning orgs will be shipping & filtering. With hooks, an
> org would be able to gather exactly what they want.
> 
> As well, I expect that org applications have metrics on the state of the
> applications outside of the Kafka client. Applications are already sending
> non-Kafka-client related metrics outbound to observability systems. If a Kafka
> client provided hooks, then users could just gather the additional relevant
> Kafka client metrics and ship those metrics the same way they do all of their
> other metrics. It feels a bit odd for a Kafka client to have its own separate
> way of forwarding metrics. Another benefit of hooks in clients is that
> organizations do not _have_ to set up additional plugins to forward metrics
> from Kafka. Hooks avoid extra organizational work.
> 
> The option that the KIP provides for users of clients to opt out of metrics
> may avoid some of the above issues (by just disabling things at the user
> level), but that's not really great from the perspective of client authors,
> because the existence of this KIP forces authors to either just not implement
> the KIP, or increase complexity within the KIP. Further, from an operator
> perspective, if I would prefer clients to ship metrics through the systems
> they already have in place, now I have to expect that anything that uses
> librdkafka or the official Java client will be shipping me metrics that I
> have to deal with (since the KIP is default enabled).
> 
> Lastly, I'm a little wary that this KIP may stem from a product goal of
> Confluent: since most everything uses librdkafka or the Java client, then by
> defaulting clients to sending metrics, Confluent gets an easy way to provide
> metric panels for a nice cloud UI. If any client does not want to support
> these metrics, and then a user wonders why these hypothetical panels have no
> metrics, then Confluent can just reply "use a supported client". Even if this
> (potentially unlikely) scenario is true, then hooks would still be a great
> alternative, because then Confluent could provide drop-in hooks for any client
> and the end result of easy-panels would be the same.
> 
> In summary,
> 
> - Metrics are more of an organizational concern, not specifically a broker
>   operator concern.
> 
> - The proposal seems to hijack how metrics are gathered within organizations.
> 
> - I don't think KIPs should dictate which metrics should be gathered and which
>   should not. Clients instead should make it easy for users to gather anything
>   they could be interested in, and ignore anything they are not.
> 
> - I think hooks are more extensible, more exact, and fit better into
>   organizational workflows.
> 
> On 2021/06/02 12:45:45, Magnus Edenhill <mag...@edenhill.se> wrote: 
> > Hey all,
> > 
> > I'm proposing KIP-714 to add remote Client metrics and observability.
> > This functionality will allow centralized monitoring and troubleshooting of
> > clients and their internals.
> > 
> > Please see
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
> > 
> > Looking forward to your feedback!
> > 
> > Regards,
> > Magnus
> > 
>
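
For illustration only, the hook-based alternative argued for in the quoted
message might look roughly like the minimal Python sketch below. The names
ProduceHook, ByteCounterHook, and SketchClient are hypothetical and invented
here; they are not the API of any real Kafka client, and the latency value is
a placeholder rather than a real measurement.

```python
from typing import Protocol


class ProduceHook(Protocol):
    """Hypothetical hook interface; not taken from any real Kafka client."""

    def on_produce(self, topic: str, num_bytes: int, latency_ms: float) -> None:
        ...


class ByteCounterHook:
    """Collects only what this org cares about: bytes produced per topic."""

    def __init__(self, topics_of_interest):
        self.topics_of_interest = set(topics_of_interest)
        self.bytes_by_topic = {}

    def on_produce(self, topic, num_bytes, latency_ms):
        # Filtering happens inside the client process, so irrelevant
        # numbers are never shipped anywhere.
        if topic in self.topics_of_interest:
            self.bytes_by_topic[topic] = (
                self.bytes_by_topic.get(topic, 0) + num_bytes
            )


class SketchClient:
    """Stand-in for a Kafka client that exposes hooks (assumption)."""

    def __init__(self, hooks):
        self.hooks = list(hooks)

    def produce(self, topic, payload):
        # A real client would write to a broker and time the request here.
        latency_ms = 0.0  # placeholder, not a real measurement
        for hook in self.hooks:
            hook.on_produce(topic, len(payload), latency_ms)
```

An application would register only the hooks it cares about and forward
bytes_by_topic through whatever metrics pipeline it already uses for its other
metrics, so nothing needs to be filtered or aggregated downstream.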
