Re: [DISCUSS] KIP-714: Client metrics and observability

Magnus Edenhill Tue, 29 Jun 2021 07:31:42 -0700

Den fre 18 juni 2021 kl 22:32 skrev Travis Bischel <travis.bisc...@gmail.com
>:


> H Colin (and Magnus),
>
> Thanks for the replies!
>
> I think the biggest concern I have is the cardinality bits. I'm
> sympathetic to the aspect of this making it easier for Kafka brokers to
> understand *every* aspect of the kafka ecoystem. I am not sure this will
> 100% solve the need there, though: if a client is unable to connect to a
> broker, visibility disappears immediately, no?
>

At the end of the day this is an unsolvable problem, but what the proposed
approach gives us is a channel that is operational when Kafka is
operational, regardless of external systems.
If a Kafka client can't connect to Kafka, its internal Kafka metrics are
not the main interest, but rather on the connectivity/networking side.



>
> I do still think that the problem of difficulty of monitoring within an
> organization results from issues within organizations themselves: orgs
> should have proper processes in place such that anything talking to Kafka
> has the org's plug-in monitoring libraries. Kafka operators can define
> those libraries, such that all clients in the org have the libraries the
> operators require. This satisfies the same goals this KIP aims to provide,
> albeit with the increased org cost of not just having something defined to
> be plugged in.
>

Yeah, that would be great, and some orgs do indeed come close to this. But
most don't, and then there's the case of multi-org; where the client
developers and cluster operators reside in different organizations.



>
> If Kafka operators themselves can which metrics they want, so that the
> broker can tell the client "only send these metrics", then my biggest
> concern is removed.
>

That's indeed how it works, the metrics that a client pushes are set up by
the cluster operator (et.al) by configuring metrics subscriptions. The
client will not
send any metrics that have not been centrally requested/subscribed, it is
all controlled from the cluster; what clients sends what metrics at what
interval.



>
> I do still think that hooks can be a cleaner abstraction to this same
> goal, and then pre-provided libraries (say, "this library provides X,Y,Z
> and sends to prometheus from your client") could exist that more exactly
> satisfy what this KIP aims to provide. This would also avoid the
> kitchen-sink vs. not-comprehensive-enough issue I brought up previously.
> This would also avoid require KIPs for any supported metrics.
>


This defeats the general availability always-on goal of the KIP though:
client metrics available on demand out of the box.


Thanks for your comments Travis.

/Magnus





> On 2021/06/16 22:27:55, "Colin McCabe" <cmcc...@apache.org> wrote:
> > On Sun, Jun 13, 2021, at 21:51, Travis Bischel wrote:
> > > Hi! I have a few thoughts on this KIP. First, I'd like to thank you
> for
> > > the writeup,
> > > clearly a lot of thought has gone into it and it is very thorough.
> > > However, I'm not
> > > convinced it's the right approach from a fundamental level.
> > >
> > > Fundamentally, this KIP seems like somewhat of a solution to an
> organizational
> > > problem. Metrics are organizational concerns, not Kafka operator
> concerns.
> >
> > Hi Travis,
> >
> > Metrics are certainly Kafka operator concerns. It is very important for
> cluster operators to know things like how many clients there are, what they
> clients are doing, and so forth. This information is needed to administer
> Kafka. Therefore it certainly falls in the domain of the Kafka operations
> team (and the Kafka development team.)
> >
> > We have added many metrics in the past to make it easier to monitor
> clients. I think this is just another step in that direction.
> >
> > > Clients should make it easy to plug in metrics (this is the approach I
> take in
> > > my own client), and organizations should have processes such that all
> clients
> > > gather and ship metrics how that organization desires.
> > >
> > > If an organization is set up correctly, there is no reason for metrics
> to be
> > > forwarded through Kafka. This feels like a solution to an organization
> not
> > > properly setting up how processes ship metrics, and in some ways, it's
> an
> > > overbroad solution, and in other ways, it doesn't cover the entire
> problem.
> >
> > I think the reason was explained pretty clearly: many admins find it
> difficult to set up monitoring for every client in the organization. In
> general the team which maintains a Kafka cluster is often separate from the
> teams that use the cluster. Therefore rolling out monitoring for clients
> can be very difficult to coordinate.
> >
> > No metrics will ever cover every possible use-case, but the set proposed
> here does seem useful.
> >
> > >
> > > From the perspective of Kafka operators, it is easy to see that this
> KIP is
> > > nice in that it just dictates what clients should support for metrics
> and that
> > > the metrics should ship through Kafka. But, from the perspective of an
> > > observability team, this workflow is basically hijacking the standard
> flow that
> > > organizations may have. I would rather have applications collect
> metrics and
> > > ship them the same way every other application does. I'd rather not
> have to
> > > configure additional plugins within Kafka to take metrics and forward
> them.
> >
> > This change doesn't remove any functionality. If you don't want to use
> KIP-714 metrics collection, you can simply turn it off and continue
> collecting metrics the way you always have.
> >
> > >
> > > More importantly, this KIP prescibes cardinality problems, requires
> that to
> > > officially support the KIP a client must support all relevant metrics
> within
> > > the KIP, and requires that a client cannot support other metrics
> unless those
> > > other metrics also go through a KIP process. It is difficult to
> imagine all of
> > > these metrics being relevant to every organization, and there is no
> way for an
> > > organization to filter what is relevant within the client. Instead, the
> > > filtering is pushed downwards, meaning more network IO and more CPU
> costs to
> > > filter what is irrelevant and aggregate what needs to be aggregated,
> and more
> > > time for an organization to setup whatever it is that will be doing
> this
> > > filtering and aggregating. Contrast this with a client that enables
> hooking in
> > > to capture numbers that are relevant within an org itself: the org can
> gather
> > > what they want, ship only want they want, and ship directly to the
> > > observability system they have already set up. As an aside, it may
> also be
> > > wise to avoid shipping metrics through Kafka about client interaction
> with
> > > Kafka, because if Kafka is having problems, then orgs lose insight
> into those
> > > problems. This would be like statuspage using itself for status on its
> own
> > > systems.
> > >
> > > Another downside is that by dictating the important metrics, this KIP
> either
> > > has two choices: try to choose what is important to every org, and
> inevitably
> > > leave out something important to somebody else, or just add everything
> and let
> > > the orgs filter. This KIP mostly looks to go with the latter approach,
> meaning
> > > orgs will be shipping & filtering. With hooks, an org would be able to
> gather
> > > exactly what they want.
> >
> > I actually do agree with this criticism to some extent. It would be good
> if the broker could specify what metrics it wants, and the clients would
> send only those metrics.
> >
> > More generally, I'd like to see this split up into several RPCs rather
> than one mega-RPC.
> >
> > Maybe something like
> > 1. RegisterClient{Request,Response}
> > 2. ClientMetricsReport{Request,Response}
> > 3. UnregisterClient{Request,Response}
> >
> > Then the broker can communicate which metrics it wants in
> RegisterClientResponse. It can also assign a client instance ID (which I
> think should be a UUID, not another string).
> >
> > >
> > > As well, I expect that org applications have metrics on the state of
> the
> > > applications outside of the Kafka client. Applications are already
> sending
> > > non-Kafka-client related metrics outbound to observability systems. If
> a Kafka
> > > client provided hooks, then users could just gather the additional
> relevant
> > > Kafka client metrics and ship those metrics the same way they do all
> of their
> > > other metrics. It feels a bit odd for a Kafka client to have its own
> separate
> > > way of forwarding metrics. Another benefit hooks in clients is that
> > > organizations do not _have_ to set up additional plugins to forward
> metrics
> > > from Kafka. Hooks avoid extra organizational work.
> >
> > Again, if you want to continue collecting metrics directly from clients,
> you can simply do that. Nothing has to change for you as a result of this
> KIP.
> >
> > >
> > > The option that the KIP provides for users of clients to opt out of
> metrics may
> > > avoid some of the above issues (by just disabling things at the user
> level),
> > > but that's not really great from the perspective of client authors,
> because the
> > > existence of this KIP forces authors to either just not implement the
> KIP, or
> > > increase complexity within the KIP. Further, from an operator
> perspective, if I
> > > would prefer clients to ship metrics through the systems they already
> have in
> > > place, now I have to expect that anything that uses librdkafka or the
> official
> > > Java client will be shipping me metrics that I have to deal with
> (since the KIP
> > > is default enabled).
> >
> > It is clear that we want to avoid unnecessary complexity. However, the
> ability to easily gather metrics from clients without doing a lot of
> client-side configuration is not an ability that we currently have, and
> this KIP changes that. So I think it's very well worth it.
> >
> > >
> > > Lastly, I'm a little wary that this KIP may stem from a product goal of
> > > Confluent: since most everything uses librdkafka or the Java client,
> then by
> > > defaulting clients sending metrics, Confluent gets an easy way to
> provide
> > > metric panels for a nice cloud UI. If any client does not want to
> support these
> > > metrics, and then a user wonders why these hypothetical panels have no
> metrics,
> > > then Confluent can just reply "use a supported client".  Even if this
> > > (potentially unlikely) scenario is true, then hooks would still be a
> great
> > > alternative, because then Confluent could provide drop-in hooks for
> any client
> > > and the end result of easy-panels would be the same.
> > >
> >
> > In general, if a feature provides a benefit to users and operators,
> that's a reason to put it in, not a reason to leave it out. We want to make
> Kafka better, which includes making it better for vendors. There is nothing
> Confluent-specific in the proposal.
> >
> > > In summary,
> > >
> > > - Metrics are more of an organizational concern, not specifically a
> broker
> > >   operator concern.
> > >
> > > - The proposal seems to hijack how metrics are gathered within
> organizations
> > >
> > > - I don't think KIPs should dictate which metrics should be gathered
> and which
> > >   should not. Clients instead should make it easy for users to gather
> anything
> > >   they could be interested in, and ignore anything they are not.
> >
> > KIPs have always dictated which metrics are gathered and which should
> not. This is very intentionally part of the KIP process (since metrics are
> public API).
> >
> > best,
> > Colin
> >
> > >
> > > - I think hooks are more extensible, more exact, and fit better into
> > >   organizational workflows.
> > >
> > > On 2021/06/02 12:45:45, Magnus Edenhill <mag...@edenhill.se> wrote:
> > > > Hey all,
> > > >
> > > > I'm proposing KIP-714 to add remote Client metrics and observability.
> > > > This functionality will allow centralized monitoring and
> troubleshooting of
> > > > clients and their internals.
> > > >
> > > > Please see
> > > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
> > > >
> > > > Looking forward to your feedback!
> > > >
> > > > Regards,
> > > > Magnus
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-714: Client metrics and observability

Reply via email to