Hi, Matthias, 130. Yes, throwing an exception sounds reasonable. It would be useful to document this.
131. I was thinking that we could just return all consumers (including the global consumer) through Map<String, String> consumerInstanceIds() and use keys to identify each consumer instance. The benefit is that the implementation (whether to use a separate global consumer or not) could change in the future, but the API can remain the same. Another slight benefit is that there is no need for returning Optional<String>. If the global consumer is not used, it just won't be included in the map. Thanks, Jun On Thu, Oct 12, 2023 at 9:30 AM Matthias J. Sax <mj...@apache.org> wrote: > Thanks Sophie and Jun. > > `clientInstanceIds()` is fine with me -- was not sure about the double > plural myself. > > Sorry if my comments was confusing. I was trying to say, that adding a > overload to `KafkaStreams` that does not take a timeout parameter does > not make sense, because there is no `default.api.timeout.ms` config for > Kafka Streams, so users always need to pass in a timeout. (Same for > producer.) > > For the implementation, I think KS would always call > `client.clientInstanceId(timeout)` and never rely on > `default.api.timeout.ms` though, so we can stay in control -- if a > timeout is passed by the user, it would always overwrite > `default.api.timeout.ms` on the consumer/admin and thus we should follow > the same semantics in Kafka Streams, and overwrite it explicitly when > calling `client.clientInstanceId()`. > > The proposed API also makes sense to me. I was just wondering if we want > to extend it for client users -- for KS we won't need/use the > timeout-less overloads. > > > > 130) My intent was to throw a TimeoutException if we cannot get all > instanceIds, because it's the standard contract for timeouts. It would > also be hard to tell for a user, if a full or partial result was > returned (or we add a method `boolean isPartialResult()` to make it > easier for users). > > If there is concerns/objections, I am also ok to return a partial result > -- it would require a change to the newly added `ClientInstanceIds` > return type -- for `adminInstanceId` we only return a `String` right now > -- we might need to change this to `Optional<String>` so we are able to > return a partial result? > > > 131) Of course we could, but I am not sure what we would gain? In the > end, implementation details would always leak because if we change the > number of consumer we use, we would return different keys in the `Map`. > Atm, the proposal implies that the same key might be used for the "main" > and "restore" consumer of the same thread -- but we can make keys unique > by adding a `-restore` suffix to the restore-consumer key if we merge > both maps. -- Curious to hear what others think. I am very open to do it > differently than currently proposed. > > > -Matthias > > > On 10/12/23 8:39 AM, Jun Rao wrote: > > Hi, Matthias, > > > > Thanks for the reply. > > > > 130. What would be the semantic? If the timeout has expired and only some > > of the client instances' id have been retrieved, does the call return the > > partial result or throw an exception? > > > > 131. Could we group all consumer instances in a single method since we > are > > returning the key for each instance already? This probably also avoids > > exposing implementation details that could change over time. > > > > Thanks, > > > > Jun > > > > On Thu, Oct 12, 2023 at 12:00 AM Sophie Blee-Goldman < > sop...@responsive.dev> > > wrote: > > > >> Regarding the naming, I personally think `clientInstanceId` makes sense > for > >> the plain clients > >> -- especially if we might later introduce the notion of an > >> `applicationInstanceId`. > >> > >> I'm not a huge fan of `clientsInstanceIds` for the Kafka Streams API, > >> though, can we use > >> `clientInstanceIds` instead? (The difference being the placement of the > >> plural 's') > >> I would similarly rename the class to just ClientInstanceIds > >> > >> we can also not have a timeout-less overload, because `KafkaStreams` > does > >>> not have a `default.api.timeout.ms` config either > >> > >> With respect to the timeout for the Kafka Streams API, I'm a bit > confused > >> by the > >> doubletriple-negative of Matthias' comment here, but I was thinking > about > >> this > >> earlier and this was my take: with the current proposal, we would allow > >> users to pass > >> in an absolute timeout as a parameter that would apply to the method as > a > >> whole. > >> Meanwhile within the method we would issue separate calls to each of the > >> clients using > >> the default or user-configured value of their `default.api.timeout.ms` > as > >> the timeout > >> parameter. > >> > >> So the API as proposed makes sense to me. > >> > >> > >> On Wed, Oct 11, 2023 at 6:48 PM Matthias J. Sax <mj...@apache.org> > wrote: > >> > >>> In can answer 130 and 131. > >>> > >>> 130) We cannot guarantee that all clients are already initialized due > to > >>> race conditions. We plan to not allow calling > >>> `KafkaStreams#clientsInstanceIds()` when the state is not RUNNING (or > >>> REBALANCING) though -- guess this slipped on the KIP and should be > >>> added? But because StreamThreads can be added dynamically (and producer > >>> might be created dynamically at runtime; cf below), we still cannot > >>> guarantee that all clients are already initialized when the method is > >>> called. Of course, we assume that all clients are most likely > initialize > >>> on the happy path, and blocking calls to `client.clientInstanceId()` > >>> should be rare. > >>> > >>> To address the worst case, we won't do a naive implementation and just > >>> loop over all clients, but fan-out the call to the different > >>> StreamThreads (and GlobalStreamThread if it exists), and use Futures to > >>> gather the results. > >>> > >>> Currently, `StreamThreads` has 3 clients (if ALOS or EOSv2 is used), so > >>> we might do 3 blocking calls in the worst case (for EOSv1 we get a > >>> producer per tasks, and we might end up doing more blocking calls if > the > >>> producers are not initialized yet). Note that EOSv1 is already > >>> deprecated, and we are also working on thread refactoring that will > >>> reduce the number of client on StreamThread to 2 -- and we have more > >>> refactoring planned to reduce the number of clients even further. > >>> > >>> Inside `KafakStreams#clientsInstanceIds()` we might only do single > >>> blocking call for the admin client (ie, `admin.clientInstanceId()`). > >>> > >>> I agree that we need to do some clever timeout management, but it seems > >>> to be more of an implementation detail? > >>> > >>> Do you have any particular concerns, or does the proposed > implementation > >>> as sketched above address your question? > >>> > >>> > >>> 130) If the Topology does not have a global-state-store, there won't be > >>> a GlobalThread and thus not global consumer. Thus, we return an > Optional. > >>> > >>> > >>> > >>> On three related question for Andrew. > >>> > >>> (1) Why is the method called `clientInstanceId()` and not just plain > >>> `instanceId()`? > >>> > >>> (2) Why so we return a `String` while but not a UUID type? The added > >>> protocol request/response classes use UUIDs. > >>> > >>> (3) Would it make sense to have an overloaded `clientInstanceId()` > >>> method that does not take any parameter but uses `default.api.timeout` > >>> config (this config does no exist on the producer though, so we could > >>> only have it for consumer and admin at this point). We could of course > >>> also add overloads like this later if user request them (and/or add > >>> `default.api.timeout.ms` to the producer, too). > >>> > >>> Btw: For KafkaStreams, I think `clientsInstanceIds` still makes sense > as > >>> a method name though, as `KafkaStreams` itself does not have an > >>> `instanceId` -- we can also not have a timeout-less overload, because > >>> `KafkaStreams` does not have a `default.api.timeout.ms` config either > >>> (and I don't think it make sense to add). > >>> > >>> > >>> > >>> -Matthias > >>> > >>> On 10/11/23 5:07 PM, Jun Rao wrote: > >>>> Hi, Andrew, > >>>> > >>>> Thanks for the updated KIP. Just a few more minor comments. > >>>> > >>>> 130. KafkaStreams.clientsInstanceId(Duration timeout): Does it wait > for > >>> all > >>>> consumer/producer/adminClient instances to be initialized? Are all > >> those > >>>> instances created during KafkaStreams initialization? > >>>> > >>>> 131. Why does globalConsumerInstanceId() return Optional<String> while > >>>> other consumer instances don't return Optional? > >>>> > >>>> 132. ClientMetricsSubscriptionRequestCount: Do we need this since we > >>> have a > >>>> set of generic metrics > >>>> (kafka.network:type=RequestMetrics,name=RequestsPerSec,request=*) that > >>>> report Request rate for every request type? > >>>> > >>>> Thanks, > >>>> > >>>> Jun > >>>> > >>>> On Wed, Oct 11, 2023 at 1:47 PM Matthias J. Sax <mj...@apache.org> > >>> wrote: > >>>> > >>>>> Thanks! > >>>>> > >>>>> On 10/10/23 11:31 PM, Andrew Schofield wrote: > >>>>>> Matthias, > >>>>>> Yes, I think that’s a sensible way forward and the interface you > >>> propose > >>>>> looks good. I’ll update the KIP accordingly. > >>>>>> > >>>>>> Thanks, > >>>>>> Andrew > >>>>>> > >>>>>>> On 10 Oct 2023, at 23:01, Matthias J. Sax <mj...@apache.org> > wrote: > >>>>>>> > >>>>>>> Andrew, > >>>>>>> > >>>>>>> yes I would like to get this change into KIP-714 right way. Seems > to > >>> be > >>>>> important, as we don't know if/when a follow-up KIP for Kafka Streams > >>> would > >>>>> land. > >>>>>>> > >>>>>>> I was also thinking (and discussed with a few others) how to expose > >>> it, > >>>>> and we would propose the following: > >>>>>>> > >>>>>>> We add a new method to `KafkaStreams` class: > >>>>>>> > >>>>>>> public ClientsInstanceIds clientsInstanceIds(Duration > timeout); > >>>>>>> > >>>>>>> The returned object is like below: > >>>>>>> > >>>>>>> public class ClientsInstanceIds { > >>>>>>> // we only have a single admin client per KS instance > >>>>>>> String adminInstanceId(); > >>>>>>> > >>>>>>> // we only have a single global consumer per KS instance (if > >> any) > >>>>>>> // Optional<> because we might not have global-thread > >>>>>>> Optional<String> globalConsumerInstanceId(); > >>>>>>> > >>>>>>> // return a <threadKey -> ClientInstanceId> mapping > >>>>>>> // for the underlying (restore-)consumers/producers > >>>>>>> Map<String, String> mainConsumerInstanceIds(); > >>>>>>> Map<String, String> restoreConsumerInstanceIds(); > >>>>>>> Map<String, String> producerInstanceIds(); > >>>>>>> } > >>>>>>> > >>>>>>> For the `threadKey`, we would use some pattern like this: > >>>>>>> > >>>>>>> [Stream|StateUpdater]Thread-<threadIdx> > >>>>>>> > >>>>>>> > >>>>>>> Would this work from your POV? > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -Matthias > >>>>>>> > >>>>>>> > >>>>>>> On 10/9/23 2:15 AM, Andrew Schofield wrote: > >>>>>>>> Hi Matthias, > >>>>>>>> Good point. Makes sense to me. > >>>>>>>> Is this something that can also be included in the proposed Kafka > >>>>> Streams follow-on KIP, or would you prefer that I add it to KIP-714? > >>>>>>>> I have a slight preference for the former to put all of the KS > >>>>> enhancements into a separate KIP. > >>>>>>>> Thanks, > >>>>>>>> Andrew > >>>>>>>>> On 7 Oct 2023, at 02:12, Matthias J. Sax <mj...@apache.org> > >> wrote: > >>>>>>>>> > >>>>>>>>> Thanks Andrew. SGTM. > >>>>>>>>> > >>>>>>>>> One point you did not address is the idea to add a method to > >>>>> `KafkaStreams` similar to the proposed `clientInstanceId()` that will > >> be > >>>>> added to consumer/producer/admin clients. > >>>>>>>>> > >>>>>>>>> Without addressing this, Kafka Streams users won't have a way to > >> get > >>>>> the assigned `instanceId` of the internally created clients, and thus > >> it > >>>>> would be very difficult for them to know which metrics that the > broker > >>>>> receives belong to a Kafka Streams app. It seems they would only find > >>> the > >>>>> `instanceIds` in the log4j output if they enable client logging? > >>>>>>>>> > >>>>>>>>> Of course, because there is multiple clients inside Kafka > Streams, > >>>>> the return type cannot be an single "String", but must be some some > >>> complex > >>>>> data structure -- we could either add a new class, or return a > >>>>> Map<String,String> using a client key that maps to the `instanceId`. > >>>>>>>>> > >>>>>>>>> For example we could use the following key: > >>>>>>>>> > >>>>>>>>> > >>> [Global]StreamThread[-<threadIndex>][-restore][consumer|producer] > >>>>>>>>> > >>>>>>>>> (Of course, only the valid combination.) > >>>>>>>>> > >>>>>>>>> Or maybe even better, we might want to return a `Future` because > >>>>> collection all the `instanceId` might be a blocking all on each > >> client? > >>> I > >>>>> have already a few idea how it could be implemented but I don't think > >> it > >>>>> must be discussed on the KIP, as it's an implementation detail. > >>>>>>>>> > >>>>>>>>> Thoughts? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> -Matthias > >>>>>>>>> > >>>>>>>>> On 10/6/23 4:21 AM, Andrew Schofield wrote: > >>>>>>>>>> Hi Matthias, > >>>>>>>>>> Thanks for your comments. I agree that a follow-up KIP for Kafka > >>>>> Streams makes sense. This KIP currently has made a bit > >>>>>>>>>> of an effort to embrace KS, but it’s not enough by a long way. > >>>>>>>>>> I have removed `application.id <http://application.id/>`. This > >>>>> should be done properly in the follow-up KIP. I don’t believe there’s > >> a > >>>>> downside to > >>>>>>>>>> removing it from this KIP. > >>>>>>>>>> I have reworded the statement about temporarily. In practice, > the > >>>>> implementation of this KIP that’s going on while the voting > >>>>>>>>>> progresses happens to use delta temporality, but that’s an > >>>>> implementation detail. Supporting clients must support both > >>>>>>>>>> temporalities. > >>>>>>>>>> I thought about exposing the client instance ID as a metric, but > >>>>> non-numeric metrics are not usual practice and tools > >>>>>>>>>> do not universally support them. I don’t think the KIP is > >> improved > >>>>> by adding one now. > >>>>>>>>>> I have also added constants for the various Config classes for > >>>>> ENABLE_METRICS_PUSH_CONFIG, including to > >>>>>>>>>> StreamsConfig. It’s best to be explicit about this. > >>>>>>>>>> Thanks, > >>>>>>>>>> Andrew > >>>>>>>>>>> On 2 Oct 2023, at 23:47, Matthias J. Sax <mj...@apache.org> > >>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Hi, > >>>>>>>>>>> > >>>>>>>>>>> I did not pay attention to this KIP in the past; seems it was > >>>>> on-hold for a while. > >>>>>>>>>>> > >>>>>>>>>>> Overall it sounds very useful, and I think we should extend > this > >>>>> with a follow up KIP for Kafka Streams. What is unclear to me at this > >>> point > >>>>> is the statement: > >>>>>>>>>>> > >>>>>>>>>>>> Kafka Streams applications have an application.id configured > >> and > >>>>> this identifier should be included as the application_id metrics > >> label. > >>>>>>>>>>> > >>>>>>>>>>> The `application.id` is currently only used as the (main) > >>>>> consumer's `group.id` (and is part of an auto-generated `client.id` > >> if > >>>>> the user does not set one). > >>>>>>>>>>> > >>>>>>>>>>> This comment related to: > >>>>>>>>>>> > >>>>>>>>>>>> The following labels should be added by the client as > >> appropriate > >>>>> before metrics are pushed. > >>>>>>>>>>> > >>>>>>>>>>> Given that Kafka Streams uses the consumer/producer/admin > client > >>> as > >>>>> "black boxes", a client does at this point not know that it's part of > >> a > >>>>> Kafka Streams application, and thus, it won't be able to attach any > >> such > >>>>> label to the metrics it sends. (Also producer and admin don't even > >> know > >>> the > >>>>> value of `application.id` -- only the (main) consumer, indirectly > >> via ` > >>>>> group.id`, but also restore and global consumer don't know it, > >> because > >>>>> they don't have `group.id` set). > >>>>>>>>>>> > >>>>>>>>>>> While I am totally in favor of the proposal, I am wondering how > >> we > >>>>> intent to implement it in clean way? Or would we do ok to have some > >>>>> internal client APIs that KS can use to "register" itself with the > >>> client? > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> While clients must support both temporalities, the broker will > >>>>> initially only send > >>> GetTelemetrySubscriptionsResponse.DeltaTemporality=True > >>>>>>>>>>> > >>>>>>>>>>> Not sure if I can follow. How make the decision about DELTA or > >>>>> CUMULATIVE metrics? Should the broker side plugin not decide what > >>> metrics > >>>>> it what to receive in which form? So what does "initially" mean -- > the > >>>>> broker won't ship with a default plugin implementation? > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> The following method is added to the Producer, Consumer, and > >>> Admin > >>>>> client interfaces: > >>>>>>>>>>> > >>>>>>>>>>> Should we add anything to Kafka Streams to expose the > underlying > >>>>> clients' assigned client-instance-ids programmatically? I am also > >>> wondering > >>>>> if clients should report their assigned client-instance-ids as > metrics > >>>>> itself (for this case, Kafka Streams won't need to do anything, > >> because > >>> we > >>>>> already expose all client metrics). > >>>>>>>>>>> > >>>>>>>>>>> If we add anything programmatic, we need to make it simple, > >> given > >>>>> that Kafka Streams has many clients per `StreamThread` and may have > >>>>> multiple threads. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> enable.metrics.push > >>>>>>>>>>> It might be worth to add this to `StreamsConfig`, too? It set > >> via > >>>>> StreamsConfig, we would forward it to all clients automatically. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> -Matthias > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 9/29/23 5:45 PM, David Jacot wrote: > >>>>>>>>>>>> Hi Andrew, > >>>>>>>>>>>> Thanks for driving this one. I haven't read all the KIP yet > >> but I > >>>>> already > >>>>>>>>>>>> have an initial question. In the Threading section, it is > >> written > >>>>>>>>>>>> "KafkaConsumer: the "background" thread (based on the consumer > >>>>> threading > >>>>>>>>>>>> refactor which is underway)". If I understand this correctly, > >> it > >>>>> means > >>>>>>>>>>>> that KIP-714 won't work if the "old consumer" is used. Am I > >>>>> correct? > >>>>>>>>>>>> Cheers, > >>>>>>>>>>>> David > >>>>>>>>>>>> On Fri, Sep 22, 2023 at 12:18 PM Andrew Schofield < > >>>>>>>>>>>> andrew_schofield_j...@outlook.com> wrote: > >>>>>>>>>>>>> Hi Philip, > >>>>>>>>>>>>> No, I do not think it should actively search for a broker > that > >>>>> supports > >>>>>>>>>>>>> the new > >>>>>>>>>>>>> RPCs. In general, either all of the brokers or none of the > >>>>> brokers will > >>>>>>>>>>>>> support it. > >>>>>>>>>>>>> In the window, where the cluster is being upgraded or client > >>>>> telemetry is > >>>>>>>>>>>>> being > >>>>>>>>>>>>> enabled, there might be a mixed situation. I wouldn’t put too > >>>>> much effort > >>>>>>>>>>>>> into > >>>>>>>>>>>>> this mixed scenario. As the client finds brokers which > support > >>>>> the new > >>>>>>>>>>>>> RPCs, > >>>>>>>>>>>>> it can begin to follow the KIP-714 mechanism. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>> Andrew > >>>>>>>>>>>>> > >>>>>>>>>>>>>> On 22 Sep 2023, at 20:01, Philip Nee <philip...@gmail.com> > >>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi Andrew - > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Question on top of your answers: Do you think the client > >> should > >>>>> actively > >>>>>>>>>>>>>> search for a broker that supports this RPC? As previously > >>>>> mentioned, the > >>>>>>>>>>>>>> broker uses the leastLoadedNode to find its first connection > >>> (am > >>>>>>>>>>>>>> I correct?), and what if that broker doesn't support the > >> metric > >>>>> push? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> P > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Fri, Sep 22, 2023 at 10:20 AM Andrew Schofield < > >>>>>>>>>>>>>> andrew_schofield_j...@outlook.com> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi Kirk, > >>>>>>>>>>>>>>> Thanks for your question. You are correct that the presence > >> or > >>>>> absence > >>>>>>>>>>>>> of > >>>>>>>>>>>>>>> the new RPCs in the > >>>>>>>>>>>>>>> ApiVersionsResponse tells the client whether to request the > >>>>> telemetry > >>>>>>>>>>>>>>> subscriptions and push > >>>>>>>>>>>>>>> metrics. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> This is of course tricky in practice. It would be > >> conceivable, > >>>>> as a > >>>>>>>>>>>>>>> cluster is upgraded to AK 3.7 > >>>>>>>>>>>>>>> or as a client metrics receiver plugin is deployed across > >> the > >>>>> cluster, > >>>>>>>>>>>>>>> that a client connects to some > >>>>>>>>>>>>>>> brokers that support the new RPCs and some that do not. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Here’s my suggestion: > >>>>>>>>>>>>>>> * If a client is not connected to any brokers that support > >> in > >>>>> the new > >>>>>>>>>>>>>>> RPCs, it cannot push metrics. > >>>>>>>>>>>>>>> * If a client is only connected to brokers that support the > >>> new > >>>>> RPCs, it > >>>>>>>>>>>>>>> will use the new RPCs in > >>>>>>>>>>>>>>> accordance with the KIP. > >>>>>>>>>>>>>>> * If a client is connected to some brokers that support the > >>> new > >>>>> RPCs and > >>>>>>>>>>>>>>> some that do not, it will > >>>>>>>>>>>>>>> use the new RPCs with the supporting subset of brokers in > >>>>> accordance > >>>>>>>>>>>>> with > >>>>>>>>>>>>>>> the KIP. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Comments? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>> Andrew > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On 22 Sep 2023, at 16:01, Kirk True <k...@kirktrue.pro> > >>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Hi Andrew/Jun, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I want to make sure I understand question/comment #119… In > >>> the > >>>>> case > >>>>>>>>>>>>>>> where a cluster without a metrics client receiver is later > >>>>> reconfigured > >>>>>>>>>>>>> and > >>>>>>>>>>>>>>> restarted to include a metrics client receiver, do we want > >> the > >>>>> client to > >>>>>>>>>>>>>>> thereafter begin pushing metrics to the cluster? From > >> Andrew’s > >>>>> response > >>>>>>>>>>>>> to > >>>>>>>>>>>>>>> question #119, it sounds like we’re using the > >> presence/absence > >>>>> of the > >>>>>>>>>>>>>>> relevant RPCs in ApiVersionsResponse as the > >>>>> to-push-or-not-to-push > >>>>>>>>>>>>>>> indicator. Do I have that correct? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>> Kirk > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Sep 21, 2023, at 7:42 AM, Andrew Schofield < > >>>>>>>>>>>>>>> andrew_schofield_j...@outlook.com> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Hi Jun, > >>>>>>>>>>>>>>>>> Thanks for your comments. I’ve updated the KIP to clarify > >>>>> where > >>>>>>>>>>>>>>> necessary. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 110. Yes, agree. The motivation section mentions this. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 111. The replacement of ‘-‘ with ‘.’ for metric names and > >>> the > >>>>>>>>>>>>>>> replacement of > >>>>>>>>>>>>>>>>> ‘-‘ with ‘_’ for attribute keys is following the OTLP > >>>>> guidelines. I > >>>>>>>>>>>>>>> think it’s a bit > >>>>>>>>>>>>>>>>> of a debatable point. OTLP makes a distinction between a > >>>>> namespace > >>>>>>>>>>>>> and a > >>>>>>>>>>>>>>>>> multi-word component. If it was “client.id” then > “client” > >>>>> would be a > >>>>>>>>>>>>>>> namespace with > >>>>>>>>>>>>>>>>> an attribute key “id”. But “client_id” is just a key. So, > >> it > >>>>> was > >>>>>>>>>>>>>>> intentional, but debatable. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 112. Thanks. The link target moved. Fixed. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 113. Thanks. Fixed. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 114.1. If a standard metric makes sense for a client, it > >>>>> should use > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>> exact same > >>>>>>>>>>>>>>>>> name. If a standard metric doesn’t make sense for a > >> client, > >>>>> then it > >>>>>>>>>>>>> can > >>>>>>>>>>>>>>> omit that metric. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> For a required metric, the situation is stronger. All > >>> clients > >>>>> must > >>>>>>>>>>>>>>> implement these > >>>>>>>>>>>>>>>>> metrics with these names in order to implement the KIP. > >> But > >>>>> the > >>>>>>>>>>>>>>> required metrics > >>>>>>>>>>>>>>>>> are essentially the number of connections and the request > >>>>> latency, > >>>>>>>>>>>>>>> which do not > >>>>>>>>>>>>>>>>> reference the underlying implementation of the client > >> (which > >>>>>>>>>>>>>>> producer.record.queue.time.max > >>>>>>>>>>>>>>>>> of course does). > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I suppose someone might build a producer-only client that > >>>>> didn’t have > >>>>>>>>>>>>>>> consumer metrics. > >>>>>>>>>>>>>>>>> In this case, the consumer metrics would conceptually > have > >>>>> the value 0 > >>>>>>>>>>>>>>> and would not > >>>>>>>>>>>>>>>>> need to be sent to the broker. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 114.2. If a client does not implement some metrics, they > >>> will > >>>>> not be > >>>>>>>>>>>>>>> available for > >>>>>>>>>>>>>>>>> analysis and troubleshooting. It just makes the ability > to > >>>>> combine > >>>>>>>>>>>>>>> metrics from lots > >>>>>>>>>>>>>>>>> different clients less complete. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 115. I think it was probably a mistake to be so specific > >>> about > >>>>>>>>>>>>>>> threading in this KIP. > >>>>>>>>>>>>>>>>> When the consumer threading refactor is complete, of > >> course, > >>>>> it would > >>>>>>>>>>>>>>> do the appropriate > >>>>>>>>>>>>>>>>> equivalent. I’ve added a clarification and massively > >>>>> simplified this > >>>>>>>>>>>>>>> section. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 116. I removed “client.terminating”. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 117. Yes. Horrid. Fixed. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 118. The Terminating flag just indicates that this is the > >>>>> final > >>>>>>>>>>>>>>> PushTelemetryRequest > >>>>>>>>>>>>>>>>> from this client. Any subsequent request will be > >> rejected. I > >>>>> think > >>>>>>>>>>>>> this > >>>>>>>>>>>>>>> flag should remain. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 119. Good catch. This was actually contradicting another > >>> part > >>>>> of the > >>>>>>>>>>>>>>> KIP. The current behaviour > >>>>>>>>>>>>>>>>> is indeed preserved. If the broker doesn’t have a client > >>>>> metrics > >>>>>>>>>>>>>>> receiver plugin, the new RPCs > >>>>>>>>>>>>>>>>> in this KIP are “turned off” and not reported in > >>>>> ApiVersionsResponse. > >>>>>>>>>>>>>>> The client will not > >>>>>>>>>>>>>>>>> attempt to push metrics. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 120. The error handling table lists the error codes for > >>>>>>>>>>>>>>> PushTelemetryResponse. I’ve added one > >>>>>>>>>>>>>>>>> but it looked good to me. GetTelemetrySubscriptions > >> doesn’t > >>>>> have any > >>>>>>>>>>>>>>> error codes, since the > >>>>>>>>>>>>>>>>> situation in which the client telemetry is not supported > >> is > >>>>> handled by > >>>>>>>>>>>>>>> the RPCs not being offered > >>>>>>>>>>>>>>>>> by the broker. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 121. Again, I think it’s probably a mistake to be > specific > >>>>> about > >>>>>>>>>>>>>>> threading. Removed. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 122. Good catch. For DescribeConfigs, the ACL operation > >>>>> should be > >>>>>>>>>>>>>>>>> “DESCRIBE_CONFIGS”. For AlterConfigs, the ACL operation > >>>>> should be > >>>>>>>>>>>>>>>>> “ALTER” (not “WRITE” as it said). The checks are made on > >> the > >>>>> CLUSTER > >>>>>>>>>>>>>>>>> resource. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Thanks for the detailed review. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>> Andrew > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 110. Another potential motivation is the multiple > clients > >>>>> support. > >>>>>>>>>>>>>>> Some of > >>>>>>>>>>>>>>>>>> the places may not have good monitoring support for > >>> non-java > >>>>> clients. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 111. OpenTelemetry Naming: We replace '-' with '.' for > >>>>> metric name > >>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>> replace '-' with '_' for attributes. Why is the > >>>>> inconsistency? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 112. OTLP specification: Page is not found from the > link. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 113. "Defining standard and required metrics makes the > >>>>> monitoring and > >>>>>>>>>>>>>>>>>> troubleshooting of clients from various client types ": > >>>>> Incomplete > >>>>>>>>>>>>>>> sentence. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 114. standard/required metrics > >>>>>>>>>>>>>>>>>> 114.1 Do other clients need to implement those metrics > >> with > >>>>> the exact > >>>>>>>>>>>>>>> same > >>>>>>>>>>>>>>>>>> names? > >>>>>>>>>>>>>>>>>> 114.2 What happens if some of those metrics are missing > >>> from > >>>>> a > >>>>>>>>>>>>> client? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 115. "KafkaConsumer: both the "heart beat" and > >> application > >>>>> threads": > >>>>>>>>>>>>> We > >>>>>>>>>>>>>>>>>> have an ongoing effort to refactor the consumer > threading > >>>>> model ( > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>> > >>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/Consumer+threading+refactor+design > >>>>>>>>>>>>>>> ). > >>>>>>>>>>>>>>>>>> Once this is done, PRC requests will only be made from > >> the > >>>>> background > >>>>>>>>>>>>>>>>>> thread. Should this KIP follow the new model only? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 116. 'The metrics should contain the reason for the > >> client > >>>>>>>>>>>>> termination > >>>>>>>>>>>>>>> by > >>>>>>>>>>>>>>>>>> including the client.terminating metric with the label > >>>>> “reason” ...'. > >>>>>>>>>>>>>>> Hmm, > >>>>>>>>>>>>>>>>>> are we introducing a new metric client.terminating? If > >> so, > >>>>> that needs > >>>>>>>>>>>>>>> to be > >>>>>>>>>>>>>>>>>> explicitly listed. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 117. "As the metrics plugin may need to add additional > >>>>> metrics on top > >>>>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>> this the generic metrics receiver in the broker will not > >>> add > >>>>> these > >>>>>>>>>>>>>>> labels > >>>>>>>>>>>>>>>>>> but rely on the plugins to do so," The sentence doesn't > >>> read > >>>>> well. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 118. "it is possible for the client to send at most one > >>>>> accepted > >>>>>>>>>>>>>>>>>> out-of-profile per connection before the rate-limiter > >> kicks > >>>>> in": If > >>>>>>>>>>>>> we > >>>>>>>>>>>>>>> do > >>>>>>>>>>>>>>>>>> this, do we still need the Terminating flag in > >>>>>>>>>>>>> PushTelemetryRequestV0? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 119. "If there is no client metrics receiver plugin > >>>>> configured on the > >>>>>>>>>>>>>>>>>> broker, it will respond to > >> GetTelemetrySubscriptionsRequest > >>>>> with > >>>>>>>>>>>>>>>>>> RequestedMetrics set to Null and a -1 SubscriptionId. > The > >>>>> client > >>>>>>>>>>>>> should > >>>>>>>>>>>>>>>>>> send a new GetTelemetrySubscriptionsRequest after the > >>>>> PushIntervalMs > >>>>>>>>>>>>>>> has > >>>>>>>>>>>>>>>>>> expired. This allows the metrics receiver to be enabled > >> or > >>>>> disabled > >>>>>>>>>>>>>>> without > >>>>>>>>>>>>>>>>>> having to restart the broker or reset the client > >>> connection." > >>>>>>>>>>>>>>>>>> "no client metrics receiver plugin configured" is > defined > >>> by > >>>>> no > >>>>>>>>>>>>> metric > >>>>>>>>>>>>>>>>>> reporter implementing the ClientTelemetry interface, > >> right? > >>>>> In that > >>>>>>>>>>>>>>> case, > >>>>>>>>>>>>>>>>>> it would be useful to avoid the clients sending > >>>>>>>>>>>>>>>>>> GetTelemetrySubscriptionsRequest periodically to > preserve > >>>>> the current > >>>>>>>>>>>>>>>>>> behavior. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 120. GetTelemetrySubscriptionsResponseV0 and > >>>>> PushTelemetryRequestV0: > >>>>>>>>>>>>>>> Could > >>>>>>>>>>>>>>>>>> we list error codes for each? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 121. "ClientTelemetryReceiver.ClientTelemetryReceiver > >> This > >>>>> method may > >>>>>>>>>>>>>>> be > >>>>>>>>>>>>>>>>>> called from the request handling thread": Where else can > >>>>> this method > >>>>>>>>>>>>> be > >>>>>>>>>>>>>>>>>> called? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 122. DescribeConfigs/AlterConfigs already exist. Are we > >>>>> changing the > >>>>>>>>>>>>>>> ACL? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Jun > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Mon, Jul 31, 2023 at 4:33 AM Andrew Schofield < > >>>>>>>>>>>>>>>>>> andrew_schofield_j...@outlook.com> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Hi Milind, > >>>>>>>>>>>>>>>>>>> Thanks for your question. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On reflection, I agree that INVALID_RECORD is most > >> likely > >>>>> to be > >>>>>>>>>>>>>>> caused by a > >>>>>>>>>>>>>>>>>>> problem in the serialization in the client. I have > >> changed > >>>>> the > >>>>>>>>>>>>> client > >>>>>>>>>>>>>>>>>>> action in this case > >>>>>>>>>>>>>>>>>>> to “Log an error and stop pushing metrics”. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I have updated the KIP text accordingly. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>>>> Andrew > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On 31 Jul 2023, at 12:09, Milind Luthra > >>>>>>>>>>>>>>> <milut...@confluent.io.INVALID> > >>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Hi Andrew, > >>>>>>>>>>>>>>>>>>>> Thanks for the clarifications. > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> About 2b: > >>>>>>>>>>>>>>>>>>>> In case a client has a bug while serializing, it might > >> be > >>>>> difficult > >>>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>> client to recover from that without code changes. In > >>> that, > >>>>> it might > >>>>>>>>>>>>>>> be > >>>>>>>>>>>>>>>>>>> good > >>>>>>>>>>>>>>>>>>>> to just log the INVALID_RECORD as an error, and treat > >> the > >>>>> error as > >>>>>>>>>>>>>>> fatal > >>>>>>>>>>>>>>>>>>>> for the client (only fatal in terms of sending the > >>>>> metrics, the > >>>>>>>>>>>>>>> client > >>>>>>>>>>>>>>>>>>> can > >>>>>>>>>>>>>>>>>>>> keep functioning otherwise). What do you think? > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Thanks > >>>>>>>>>>>>>>>>>>>> Milind > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> On Mon, Jul 24, 2023 at 8:18 PM Andrew Schofield < > >>>>>>>>>>>>>>>>>>>> andrew_schofield_j...@outlook.com> wrote: > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Hi Milind, > >>>>>>>>>>>>>>>>>>>>> Thanks for your questions about the KIP. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> 1) I did some archaeology and looked at historical > >>>>> versions of the > >>>>>>>>>>>>>>> KIP. > >>>>>>>>>>>>>>>>>>> I > >>>>>>>>>>>>>>>>>>>>> think this is > >>>>>>>>>>>>>>>>>>>>> just a mistake. 5 minutes is the default metric push > >>>>> interval. 30 > >>>>>>>>>>>>>>>>>>> minutes > >>>>>>>>>>>>>>>>>>>>> is a mystery > >>>>>>>>>>>>>>>>>>>>> to me. I’ve updated the KIP. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> 2) I think there are two situations in which > >>>>> INVALID_RECORD might > >>>>>>>>>>>>>>> occur. > >>>>>>>>>>>>>>>>>>>>> a) The client might perhaps be using a content-type > >> that > >>>>> the > >>>>>>>>>>>>> broker > >>>>>>>>>>>>>>> does > >>>>>>>>>>>>>>>>>>>>> not support. > >>>>>>>>>>>>>>>>>>>>> The KIP mentions content-type as a future extension, > >> but > >>>>> there’s > >>>>>>>>>>>>>>> only > >>>>>>>>>>>>>>>>>>> one > >>>>>>>>>>>>>>>>>>>>> supported > >>>>>>>>>>>>>>>>>>>>> to start with. Until we have multiple content-types, > >>> this > >>>>> seems > >>>>>>>>>>>>> out > >>>>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>>>>> scope. I think a > >>>>>>>>>>>>>>>>>>>>> future KIP would add another error code for this. > >>>>>>>>>>>>>>>>>>>>> b) The client might perhaps have a bug which means > the > >>>>> metrics > >>>>>>>>>>>>>>> payload > >>>>>>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>>>>> malformed. > >>>>>>>>>>>>>>>>>>>>> Logging a warning and attempting the next metrics > push > >>> on > >>>>> the push > >>>>>>>>>>>>>>>>>>>>> interval seems > >>>>>>>>>>>>>>>>>>>>> appropriate. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> UNKNOWN_SUBSCRIPTION_ID would indeed be handled by > >>> making > >>>>> an > >>>>>>>>>>>>>>> immediate > >>>>>>>>>>>>>>>>>>>>> GetTelemetrySubscriptionsRequest. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> UNSUPPORTED_COMPRESSION_TYPE seems like either a > >> client > >>>>> bug or > >>>>>>>>>>>>>>> perhaps > >>>>>>>>>>>>>>>>>>>>> a situation in which a broker sends a compression > type > >>> in > >>>>> a > >>>>>>>>>>>>>>>>>>>>> GetTelemetrySubscriptionsResponse > >>>>>>>>>>>>>>>>>>>>> which is subsequently not supported when its used > >> with a > >>>>>>>>>>>>>>>>>>>>> PushTelemetryRequest. > >>>>>>>>>>>>>>>>>>>>> We do want the client to have the opportunity to get > >> an > >>>>> up-to-date > >>>>>>>>>>>>>>> list > >>>>>>>>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>>>>> supported > >>>>>>>>>>>>>>>>>>>>> compression types. I think an immediate > >>>>>>>>>>>>>>> GetTelemetrySubscriptionsRequest > >>>>>>>>>>>>>>>>>>>>> is appropriate. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> 3) If a client attempts a subsequent handshake with a > >>> Null > >>>>>>>>>>>>>>>>>>>>> ClientInstanceId, the > >>>>>>>>>>>>>>>>>>>>> receiving broker may not already know the client's > >>>>> existing > >>>>>>>>>>>>>>>>>>>>> ClientInstanceId. If the > >>>>>>>>>>>>>>>>>>>>> receiving broker knows the existing ClientInstanceId, > >> it > >>>>> simply > >>>>>>>>>>>>>>> responds > >>>>>>>>>>>>>>>>>>>>> the existing > >>>>>>>>>>>>>>>>>>>>> value back to the client. If it does not know the > >>> existing > >>>>>>>>>>>>>>>>>>>>> ClientInstanceId, it will create > >>>>>>>>>>>>>>>>>>>>> a new client instance ID and respond with that. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> I will update the KIP with these clarifications. > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>>>>>> Andrew > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> On 17 Jul 2023, at 14:21, Milind Luthra > >>>>>>>>>>>>>>> <milut...@confluent.io.INVALID > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Hi Andrew, thanks for this KIP. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> I had a few questions regarding the "Error handling" > >>>>> section. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> 1. It mentions that "The 5 and 30 minute retries are > >> to > >>>>>>>>>>>>> eventually > >>>>>>>>>>>>>>>>>>>>> trigger > >>>>>>>>>>>>>>>>>>>>>> a retry and avoid having to restart clients if the > >>>>> cluster > >>>>>>>>>>>>> metrics > >>>>>>>>>>>>>>>>>>>>>> configuration is disabled temporarily, e.g., by > >>> operator > >>>>> error, > >>>>>>>>>>>>>>> rolling > >>>>>>>>>>>>>>>>>>>>>> upgrades, etc." > >>>>>>>>>>>>>>>>>>>>>> But this 30 min interval isn't mentioned anywhere > >> else. > >>>>> What is > >>>>>>>>>>>>> it > >>>>>>>>>>>>>>>>>>>>>> referring to? > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> 2. For the actual errors: > >>>>>>>>>>>>>>>>>>>>>> INVALID_RECORD : The action required is to "Log a > >>>>> warning to the > >>>>>>>>>>>>>>>>>>>>>> application and schedule the next > >>>>>>>>>>>>> GetTelemetrySubscriptionsRequest > >>>>>>>>>>>>>>> to 5 > >>>>>>>>>>>>>>>>>>>>>> minutes". Why is this 5 minutes, and not something > >> like > >>>>>>>>>>>>>>> PushIntervalMs? > >>>>>>>>>>>>>>>>>>>>> And > >>>>>>>>>>>>>>>>>>>>>> also, why are we scheduling a > >>>>> GetTelemetrySubscriptionsRequest in > >>>>>>>>>>>>>>> this > >>>>>>>>>>>>>>>>>>>>>> case, if the serialization is broken? > >>>>>>>>>>>>>>>>>>>>>> UNKNOWN_SUBSCRIPTION_ID , > >> UNSUPPORTED_COMPRESSION_TYPE > >>> : > >>>>> just to > >>>>>>>>>>>>>>>>>>> confirm, > >>>>>>>>>>>>>>>>>>>>>> the GetTelemetrySubscriptionsRequest needs to be > >>>>> scheduled > >>>>>>>>>>>>>>> immediately > >>>>>>>>>>>>>>>>>>>>>> after the PushTelemetry response, correct? > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> 3. For "Subsequent GetTelemetrySubscriptionsRequests > >>>>> must include > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>> ClientInstanceId returned in the first response, > >>>>> regardless of > >>>>>>>>>>>>>>> broker": > >>>>>>>>>>>>>>>>>>>>>> Will a broker error be returned in case some > >>>>> implementation of > >>>>>>>>>>>>>>> this KIP > >>>>>>>>>>>>>>>>>>>>>> violates this accidentally and sends a request with > >>>>>>>>>>>>>>> ClientInstanceId = > >>>>>>>>>>>>>>>>>>>>> Null > >>>>>>>>>>>>>>>>>>>>>> even when it's been obtained already? Or will a new > >>>>>>>>>>>>>>> ClientInstanceId be > >>>>>>>>>>>>>>>>>>>>>> returned without an error? > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Thanks! > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 13, 2023 at 8:38 PM Andrew Schofield < > >>>>>>>>>>>>>>>>>>>>>> andrew_schofield_j...@outlook.com> wrote: > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>>>>>>> I would like to start a new discussion thread on > >>>>> KIP-714: Client > >>>>>>>>>>>>>>>>>>> metrics > >>>>>>>>>>>>>>>>>>>>>>> and observability. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>> > >>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> I have edited the proposal significantly to reduce > >> the > >>>>> scope. > >>>>>>>>>>>>> The > >>>>>>>>>>>>>>>>>>>>> overall > >>>>>>>>>>>>>>>>>>>>>>> mechanism for client metric subscriptions is > >>> unchanged, > >>>>> but the > >>>>>>>>>>>>>>>>>>>>>>> KIP is now based on the existing client metrics, > >>> rather > >>>>> than > >>>>>>>>>>>>>>>>>>> introducing > >>>>>>>>>>>>>>>>>>>>>>> new metrics. The purpose remains helping cluster > >>>>> operators > >>>>>>>>>>>>>>>>>>>>>>> investigate performance problems experienced by > >>> clients > >>>>> without > >>>>>>>>>>>>>>>>>>>>> requiring > >>>>>>>>>>>>>>>>>>>>>>> changes to the client application code or > >>> configuration. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>>>>>>>> Andrew > >>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > >