Re: [DISCUSS] KIP-714: Client metrics and observability

Tom Bentley Tue, 08 Mar 2022 01:11:30 -0800

Hi Ashenafi,

You'll need to unsubscribe from the dev mailing list by sending an email to
dev-unsubscr...@kafka.apache.org. No one else can do this for you.


Kind regards,

Tom

On Tue, 8 Mar 2022 at 04:40, Ashenafi Marcos <ashenafi...@gmail.com> wrote:

> Hi,
> Can you please take out my email I’d so that will not be able to receive
> any mail from you.
> Thank you
>
> On Tue, Oct 19, 2021 at 1:30 PM Mickael Maison <mickael.mai...@gmail.com>
> wrote:
>
> > Hi Magnus,
> >
> > Thanks for the proposal.
> >
> > 1. Looking at the protocol section, isn't "ClientInstanceId" expected
> > to be a field in GetTelemetrySubscriptionsResponseV0? Otherwise, how
> > does a client retrieve this value?
> >
> > 2. In the client API section, you mention a new method
> > "clientInstanceId()". Can you clarify which interfaces are affected?
> > Is it only Consumer and Producer?
> >
> > 3. I'm a bit concerned this is enabled by default. Even if the data
> > collected is supposed to be not sensitive, I think this can be
> > problematic in some environments. Also users don't seem to have the
> > choice to only expose some metrics. Knowing how much data transit
> > through some applications can be considered critical.
> >
> > 4. As a user, how do you know if your application is actively sending
> > metrics? Are there new metrics exposing what's going on, like how much
> > data is being sent?
> >
> > 5. If all metrics are enabled on a regular Consumer or Producer, do
> > you have an idea how much throughput this would use?
> >
> > Thanks
> >
> > On Tue, Oct 19, 2021 at 5:06 PM Magnus Edenhill <mag...@edenhill.se>
> > wrote:
> > >
> > > Den tis 19 okt. 2021 kl 13:22 skrev Tom Bentley <tbent...@redhat.com>:
> > >
> > > > Hi Magnus,
> > > >
> > > > I reviewed the KIP since you called the vote (sorry for not reviewing
> > when
> > > > you announced your intention to call the vote). I have a few
> questions
> > on
> > > > some of the details.
> > > >
> > > > 1. There's no Javadoc on ClientTelemetryPayload.data(), so I don't
> know
> > > > whether the payload is exposed through this method as compressed or
> > not.
> > > > Later on you say "Decompression of the payloads will be handled by
> the
> > > > broker metrics plugin, the broker should expose a suitable
> > decompression
> > > > API to the metrics plugin for this purpose.", which suggests it's the
> > > > compressed data in the buffer, but then we don't know which codec was
> > used,
> > > > nor the API via which the plugin should decompress it if required for
> > > > forwarding to the ultimate metrics store. Should the
> > ClientTelemetryPayload
> > > > expose a method to get the compression and a decompressor?
> > > >
> > >
> > > Good point, updated.
> > >
> > >
> > >
> > > > 2. The client-side API is expressed as StringOrError
> > > > ClientInstance::ClientInstanceId(int timeout_ms). I understand that
> > you're
> > > > thinking about the librdkafka implementation, but it would be good to
> > show
> > > > the API as it would appear on the Apache Kafka clients.
> > > >
> > >
> > > This was meant as pseudo-code, but I changed it to Java.
> > >
> > >
> > > > 3. "PushTelemetryRequest|Response - protocol request used by the
> > client to
> > > > send metrics to any broker it is connected to." To be clear, this
> means
> > > > that the client can choose any of the connected brokers and push to
> > just
> > > > one of them? What should a supporting client do if it gets an error
> > when
> > > > pushing metrics to a broker, retry sending to the same broker or try
> > > > pushing to another broker, or drop the metrics? Should supporting
> > clients
> > > > send successive requests to a single broker, or round robin, or is
> > that up
> > > > to the client author? I'm guessing the behaviour should be sticky to
> > > > support the rate limiting features, but I think it would be good for
> > client
> > > > authors if this section were explicit on the recommended behaviour.
> > > >
> > >
> > > You are right, I've updated the KIP to make this clearer.
> > >
> > >
> > > > 4. "Mapping the client instance id to an actual application instance
> > > > running on a (virtual) machine can be done by inspecting the metrics
> > > > resource labels, such as the client source address and source port,
> or
> > > > security principal, all of which are added by the receiving broker.
> > This
> > > > will allow the operator together with the user to identify the actual
> > > > application instance." Is this really always true? The source IP and
> > port
> > > > might be a loadbalancer/proxy in some setups. The principal, as
> already
> > > > mentioned in the KIP, might be shared between multiple applications.
> > So at
> > > > worst the organization running the clients might have to consult the
> > logs
> > > > of a set of client applications, right?
> > > >
> > >
> > > Yes, that's correct. There's no guaranteed mapping from
> > client_instance_id
> > > to
> > > an actual instance, that's why the KIP recommends client
> implementations
> > to
> > > log the client instance id
> > > upon retrieval, and also provide an API for the application to retrieve
> > the
> > > instance id programmatically
> > > if it has a better way of exposing it.
> > >
> > >
> > > 5. "Tests indicate that a compression ratio up to 10x is possible for
> the
> > > > standard metrics." Client authors might appreciate your mentioning
> > which
> > > > compression codec got these results.
> > > >
> > >
> > > Good point. Updated.
> > >
> > >
> > > > 6. "Should the client send a push request prior to expiry of the
> > previously
> > > > calculated PushIntervalMs the broker will discard the metrics and
> > return a
> > > > PushTelemetryResponse with the ErrorCode set to RateLimited." Is this
> > > > RATE_LIMITED a new error code? It's not mentioned in the "New Error
> > Codes"
> > > > section.
> > > >
> > >
> > > That's a leftover, it should be using the standard ThrottleTime
> > mechanism.
> > > Fixed.
> > >
> > >
> > > > 7. In the section "Standard client resource labels" application_id is
> > > > described as Kafka Streams only, but the section of "Client
> > Identification"
> > > > talks about "application instance id as an optional future
> nice-to-have
> > > > that may be included as a metrics label if it has been set by the
> > user", so
> > > > I'm confused whether non-Kafka Streams clients should set an
> > application_id
> > > > or not.
> > > >
> > >
> > > I'll clarify this in the KIP, but basically we would need to add an `
> > > application.id` config
> > > property for non-streams clients for this purpose, and that's outside
> the
> > > scope of this KIP since we want to make it zero-conf:ish on the client
> > side.
> > >
> > >
> > > >
> > > > Kind regards,
> > > >
> > > > Tom
> > > >
> > >
> > > Thanks for the review,
> > > Magnus
> > >
> > >
> > >
> > > >
> > > > On Thu, Oct 7, 2021 at 5:26 PM Magnus Edenhill <mag...@edenhill.se>
> > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I've updated the KIP following our recent discussions on the
> mailing
> > > > list:
> > > > >  - split the protocol in two, one for getting the metrics
> > subscriptions,
> > > > > and one for pushing the metrics.
> > > > >  - simplifications: initially only one supported metrics format, no
> > > > > client.id in the instance id, etc.
> > > > >  - made CLIENT_METRICS subscription configuration entries more
> > structured
> > > > >    and allowing better client matching selectors (not only on the
> > > > instance
> > > > > id, but also the other
> > > > >    client resource labels, such as client_software_name, etc.).
> > > > >
> > > > > Unless there are further comments I'll call the vote in a day or
> two.
> > > > >
> > > > > Regards,
> > > > > Magnus
> > > > >
> > > > >
> > > > >
> > > > > Den mån 4 okt. 2021 kl 20:57 skrev Magnus Edenhill <
> > mag...@edenhill.se>:
> > > > >
> > > > > > Hi Gwen,
> > > > > >
> > > > > > I'm finishing up the KIP based on the last couple of discussion
> > points
> > > > in
> > > > > > this thread
> > > > > > and will call the Vote later this week.
> > > > > >
> > > > > > Best,
> > > > > > Magnus
> > > > > >
> > > > > > Den lör 2 okt. 2021 kl 02:01 skrev Gwen Shapira
> > > > > <g...@confluent.io.invalid
> > > > > > >:
> > > > > >
> > > > > >> Hey,
> > > > > >>
> > > > > >> I noticed that there was no discussion for the last 10 days,
> but I
> > > > > >> couldn't
> > > > > >> find the vote thread. Is there one that I'm missing?
> > > > > >>
> > > > > >> Gwen
> > > > > >>
> > > > > >> On Wed, Sep 22, 2021 at 4:58 AM Magnus Edenhill <
> > mag...@edenhill.se>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Den tis 21 sep. 2021 kl 06:58 skrev Colin McCabe <
> > > > cmcc...@apache.org
> > > > > >:
> > > > > >> >
> > > > > >> > > On Mon, Sep 20, 2021, at 17:35, Feng Min wrote:
> > > > > >> > > > Thanks Magnus & Colin for the discussion.
> > > > > >> > > >
> > > > > >> > > > Based on KIP-714's stateless design, Client can pretty
> much
> > use
> > > > > any
> > > > > >> > > > connection to any broker to send metrics. We are not
> > associating
> > > > > >> > > connection
> > > > > >> > > > with client metric state. Is my understanding correct? If
> > yes,
> > > > > how
> > > > > >> > about
> > > > > >> > > > the following two scenarios
> > > > > >> > > >
> > > > > >> > > > 1) One Client (Client-ID) registers two different client
> > > > instance
> > > > > id
> > > > > >> > via
> > > > > >> > > > separate registration. Is it permitted? If OK, how to
> > > > distinguish
> > > > > >> them
> > > > > >> > > from
> > > > > >> > > > the case 2 below.
> > > > > >> > > >
> > > > > >> > >
> > > > > >> > > Hi Feng,
> > > > > >> > >
> > > > > >> > > My understanding, which Magnus can clarify I guess, is that
> > you
> > > > > could
> > > > > >> > have
> > > > > >> > > something like two Producer instances running with the same
> > > > > client.id
> > > > > >> > > (perhaps because they're using the same config file, for
> > example).
> > > > > >> They
> > > > > >> > > could even be in the same process. But they would get
> separate
> > > > > UUIDs.
> > > > > >> > >
> > > > > >> > > I believe Magnus used the term client to mean "Producer or
> > > > > Consumer".
> > > > > >> So
> > > > > >> > > if you have both a Producer and a Consumer in your
> > application I
> > > > > would
> > > > > >> > > expect you'd get separate UUIDs for both. Again Magnus can
> > chime
> > > > in
> > > > > >> > here, I
> > > > > >> > > guess.
> > > > > >> > >
> > > > > >> >
> > > > > >> > That's correct.
> > > > > >> >
> > > > > >> >
> > > > > >> > >
> > > > > >> > > > 2) How about the client restarting? What's the
> expectation?
> > > > Should
> > > > > >> the
> > > > > >> > > > server expect the client to carry a persisted client
> > instance id
> > > > > or
> > > > > >> > > should
> > > > > >> > > > the client be treated as a new instance?
> > > > > >> > >
> > > > > >> > > The KIP doesn't describe any mechanism for persistence, so I
> > would
> > > > > >> assume
> > > > > >> > > that when you restart the client you get a new UUID. I agree
> > that
> > > > it
> > > > > >> > would
> > > > > >> > > be good to spell this out.
> > > > > >> > >
> > > > > >> > >
> > > > > >> > Right, it will not be persisted since a client instance can't
> be
> > > > > >> restarted.
> > > > > >> >
> > > > > >> > Will update the KIP to make this clearer.
> > > > > >> >
> > > > > >> > /Magnus
> > > > > >> >
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> Gwen Shapira
> > > > > >> Engineering Manager | Confluent
> > > > > >> 650.450.2760 | @gwenshap
> > > > > >> Follow us: Twitter | blog
> > > > > >>
> > > > > >
> > > > >
> > > >
> >
>

Re: [DISCUSS] KIP-714: Client metrics and observability

Reply via email to