Hi, Kirk, Sarat, A few more comments.
40. GetTelemetrySubscriptionsResponseV0 : RequestedMetrics Array[string] uses "Array[0] empty string" to represent all metrics subscribed. We had a similar issue with the topics field in MetadataRequest and used the following convention. In version 1 and higher, an empty array indicates "request metadata for no topics," and a null array is used to indicate "request metadata for all topics." Should we use the same convention in GetTelemetrySubscriptionsResponseV0? 41. We include CompressionType in PushTelemetryRequestV0, but not in ClientTelemetryPayload. How would the implementer know the compression type for the telemetry payload? 42. For blocking the metrics for certain clients in the following example, could you describe the corresponding config value used through the kafka-config command? kafka-client-metrics.sh --bootstrap-server $BROKERS \ --add \ --name 'Disabe_b69cc35a' \ # A descriptive name makes it easier to clean up old subscriptions. --match client_instance_id=b69cc35a-7a54-4790-aa69-cc2bd4ee4538 \ # Match this specific client instance --block Thanks, Jun On Thu, Mar 10, 2022 at 11:57 AM Jun Rao <j...@confluent.io> wrote: > Hi, Kirk, Sarat, > > Thanks for the reply. > > 28. On the broker, we typically use Yammer metrics. Only for metrics that > depend on Kafka metric features (e.g., quota), we use the Kafka metric. > Yammer metrics have 4 types: gauge, meter, histogram and timer. meter > calculates a rate, but also exposes an accumulated value. > > 29. The Histogram class in org.apache.kafka.common.metrics.stats was never > used in the client metrics. The implementation of Histogram only provides a > fixed number of values in the domain and may not capture the quantiles very > accurately. So, we punted on using it. > > Thanks, > > Jun > > > > On Thu, Mar 10, 2022 at 10:59 AM Sarat Kakarla > <skaka...@confluent.io.invalid> wrote: > >> Jun, >> >> >> 28. For the broker metrics, could you spell out the full metric name >> >> including groups, tags, etc? We typically don't add the broker_id >> label for >> >> broker metrics. Also, brokers use Yammer metrics, which doesn't >> have type >> >> Sum. >> >> Sure, I will update the KIP-714 with the above information, will remove >> the broker-id label from the metrics. >> >> Regarding the type is CumulativeSum the right type to use in the place of >> Sum? >> >> Thanks >> Sarat >> >> >> On 3/8/22, 5:48 PM, "Jun Rao" <j...@confluent.io.INVALID> wrote: >> >> Hi, Magnus, Sarat and Xavier, >> >> Thanks for the reply. A few more comments below. >> >> 20. It seems that we are piggybacking the plugin on the >> existing MetricsReporter. So, this seems fine. >> >> 21. That could work. Are we requiring any additional jar dependency >> on the >> client? Or, are you suggesting that we check the runtime dependency >> to pick >> the compression codec? >> >> 28. For the broker metrics, could you spell out the full metric name >> including groups, tags, etc? We typically don't add the broker_id >> label for >> broker metrics. Also, brokers use Yammer metrics, which doesn't have >> type >> Sum. >> >> 29. There are several client metrics listed as histogram. However, >> the java >> client currently doesn't support histogram type. >> >> 30. Could you show an example of the metric payload in >> PushTelemetryRequest >> to help understand how we organize metrics at different levels (per >> instance, per topic, per partition, per broker, etc)? >> >> 31. Could you add a bit more detail on which client thread sends the >> PushTelemetryRequest? >> >> Thanks, >> >> Jun >> >> On Mon, Mar 7, 2022 at 11:48 AM Magnus Edenhill <mag...@edenhill.se> >> wrote: >> >> > Hi Jun, >> > >> > thanks for your initiated questions, see my answers below. >> > There's been a number of clarifications to the KIP. >> > >> > >> > >> > Den tors 27 jan. 2022 kl 20:08 skrev Jun Rao >> <j...@confluent.io.invalid>: >> > >> > > Hi, Magnus, >> > > >> > > Thanks for updating the KIP. The overall approach makes sense to >> me. A >> > few >> > > more detailed comments below. >> > > >> > > 20. ClientTelemetry: Should it be extending configurable and >> closable? >> > > >> > >> > I'll pass this question to Sarat and/or Xavier. >> > >> > >> > >> > > 21. Compression of the metrics on the client: what's the default? >> > > >> > >> > How about we specify a prioritized list: zstd, lz4, snappy, gzip? >> > But ultimately it is up to what the client supports. >> > >> > >> > 23. A client instance is considered a metric resource and the >> > > resource-level (thus client instance level) labels could include: >> > > client_software_name=confluent-kafka-python >> > > client_software_version=v2.1.3 >> > > client_instance_id=B64CD139-3975-440A-91D4 >> > > transactional_id=someTxnApp >> > > Are those labels added in PushTelemetryRequest? If so, are they >> per >> > metric >> > > or per request? >> > > >> > >> > >> > client_software* and client_instance_id are not added by the >> client, but >> > available to >> > the broker-side metrics plugin for adding as it see fits, remove >> them from >> > the KIP. >> > >> > As for transactional_id, group_id, etc, which I believe will be >> useful in >> > troubleshooting, >> > are included only once (per push) as resource-level attributes (the >> client >> > instance is a singular resource). >> > >> > >> > > >> > > 24. "the broker will only send >> > > GetTelemetrySubscriptionsResponse.DeltaTemporality=True" : >> > > 24.1 If it's always true, does it need to be part of the protocol? >> > > >> > >> > We're anticipating that it will take a lot longer to upgrade the >> majority >> > of clients than the >> > broker/plugin side, which is why we want the client to support both >> > temporalities out-of-the-box >> > so that cumulative reporting can be turned on seamlessly in the >> future. >> > >> > >> > >> > > 24.2 Does delta only apply to Counter type? >> > > >> > >> > >> > And Histograms. More details in Xavier's OTLP link. >> > >> > >> > >> > > 24.3 In the delta representation, the first request needs to send >> the >> > full >> > > value, how does the broker plugin know whether a value is full or >> delta? >> > > >> > >> > The client may (should) send the start time for each metric sample, >> > indicating when >> > the metric began to be collected. >> > We've discussed whether this should be the client instance start >> time or >> > the time when a matching >> > metric subscription for that metric is received. >> > For completeness we recommend using the former, the client instance >> start >> > time. >> > >> > >> > >> > > 25. quota: >> > > 25.1 Since we are fitting PushTelemetryRequest into the existing >> request >> > > quota, it would be useful to document the impact, i.e. client >> metric >> > > throttling causes the data from the same client to be delayed. >> > > 25.2 Is PushTelemetryRequest subject to the write bandwidth quota >> like >> > the >> > > producer? >> > > >> > >> > >> > Yes, it should be, as to protect the cluster from rogue clients. >> > But, in practice the size of metrics will be quite low (e.g., >> 1-10kb per >> > 60s interval), so I don't think this will pose a problem. >> > The KIP has been updated with more details on quota/throttling >> behaviour, >> > see the >> > "Throttling and rate-limiting" section. >> > >> > >> > 25.3 THROTTLING_QUOTA_EXCEEDED: Currently, we don't send this error >> when >> > > the request/bandwidth quota is exceeded since those requests are >> not >> > > rejected. We only set this error when the request is rejected >> (e.g., >> > topic >> > > creation). It would be useful to clarify when this error is used. >> > > >> > >> > Right, I was trying to reuse an existing error-code. We can >> introduce >> > a new one for the case where a client pushes metrics at a higher >> frequency >> > than the >> > than the configured push interval (e.g., out-of-profile sends). >> > This causes the broker to drop those metrics and send this error >> code back >> > to the client. There will be no connection throttling / >> channel-muting in >> > this >> > case (unless the standard quotas are exceeded). >> > >> > >> > > 27. kafka-client-metrics.sh: Could we add an example on how to >> disable a >> > > bad client? >> > > >> > >> > There's now a --block option to kafka-client-metrics.sh which >> overrides all >> > subscriptions >> > for the matched client(s). This allows silencing metrics for one or >> more >> > clients without having >> > to remove existing subscriptions. From the client's perspective it >> will >> > look like it no longer has >> > any subscriptions. >> > >> > # Block metrics collection for a specific client instance >> > $ kafka-client-metrics.sh --bootstrap-server $BROKERS \ >> > --add \ >> > --name 'Disabe_b69cc35a' \ # A descriptive name makes it easier >> to >> > clean up old subscriptions. >> > --match client_instance_id=b69cc35a-7a54-4790-aa69-cc2bd4ee4538 >> \ # >> > Match this specific client instance >> > --block >> > >> > >> > >> > >> > > 28. New broker side metrics: Could we spell out the details of the >> > metrics >> > > (e.g., group, tags, etc)? >> > > >> > >> > KIP has been updated accordingly (thanks Sarat). >> > >> > >> > >> > > >> > > 29. Client instance-level metrics: client.io.wait.time is a gauge >> not a >> > > histogram. >> > > >> > >> > I believe a population/distribution should preferably be >> represented as a >> > histogram, space permitting, >> > and only secondarily as a Gauge average. >> > While we might not want to maintain a bunch of histograms for each >> > partition, since that could be >> > quite space consuming, this client.io.wait.time is a single metric >> per >> > client instance and can >> > thus afford a Histogram representation. >> > >> > >> > >> > Thanks, >> > Magnus >> > >> > >> > >> > > Thanks, >> > > >> > > Jun >> > > >> > > On Wed, Jan 26, 2022 at 6:32 AM Magnus Edenhill < >> mag...@edenhill.se> >> > > wrote: >> > > >> > > > Hi all, >> > > > >> > > > I've updated the KIP with responses to the latest comments: >> Java client >> > > > dependencies (Thanks Kirk!), alternate designs (separate >> cluster, >> > > separate >> > > > producer, etc), etc. >> > > > >> > > > I will revive the vote thread. >> > > > >> > > > Thanks, >> > > > Magnus >> > > > >> > > > >> > > > Den mån 13 dec. 2021 kl 22:32 skrev Ryanne Dolan < >> > ryannedo...@gmail.com >> > > >: >> > > > >> > > > > I think we should be very careful about introducing new >> runtime >> > > > > dependencies into the clients. Historically this has been >> rare and >> > > > > essentially necessary (e.g. compression libs). >> > > > > >> > > > > Ryanne >> > > > > >> > > > > On Mon, Dec 13, 2021, 1:06 PM Kirk True < >> k...@mustardgrain.com> >> > wrote: >> > > > > >> > > > > > Hi Jun, >> > > > > > >> > > > > > On Thu, Dec 9, 2021, at 2:28 PM, Jun Rao wrote: >> > > > > > > 13. Using OpenTelemetry. Does that require runtime >> dependency >> > > > > > > on OpenTelemetry library? How good is the compatibility >> story >> > > > > > > of OpenTelemetry? This is important since an application >> could >> > have >> > > > > other >> > > > > > > OpenTelemetry dependencies than the Kafka client. >> > > > > > >> > > > > > The current design is that the OpenTelemetry JARs would >> ship with >> > the >> > > > > > client. Perhaps we can design the client such that the JARs >> aren't >> > > even >> > > > > > loaded if the user has opted out. The user could even >> exclude the >> > > JARs >> > > > > from >> > > > > > their dependencies if they so wished. >> > > > > > >> > > > > > I can't speak to the compatibility of the libraries. Is it >> possible >> > > > that >> > > > > > we include a shaded version? >> > > > > > >> > > > > > Thanks, >> > > > > > Kirk >> > > > > > >> > > > > > > >> > > > > > > 14. The proposal listed idempotence=true. This is more of >> a >> > > > > configuration >> > > > > > > than a metric. Are we including that as a metric? What >> other >> > > > > > configurations >> > > > > > > are we including? Should we separate the configurations >> from the >> > > > > metrics? >> > > > > > > >> > > > > > > Thanks, >> > > > > > > >> > > > > > > Jun >> > > > > > > >> > > > > > > On Mon, Nov 29, 2021 at 7:34 AM Magnus Edenhill < >> > > mag...@edenhill.se> >> > > > > > wrote: >> > > > > > > >> > > > > > > > Hey Bob, >> > > > > > > > >> > > > > > > > That's a good point. >> > > > > > > > >> > > > > > > > Request type labels were considered but since they're >> already >> > > > tracked >> > > > > > by >> > > > > > > > broker-side metrics >> > > > > > > > they were left out as to avoid metric duplication, >> however >> > those >> > > > > > metrics >> > > > > > > > are not per connection, >> > > > > > > > so they won't be that useful in practice for >> troubleshooting >> > > > specific >> > > > > > > > client instances. >> > > > > > > > >> > > > > > > > I'll add the request_type label to the relevant metrics. >> > > > > > > > >> > > > > > > > Thanks, >> > > > > > > > Magnus >> > > > > > > > >> > > > > > > > >> > > > > > > > Den tis 23 nov. 2021 kl 19:20 skrev Bob Barrett >> > > > > > > > <bob.barr...@confluent.io.invalid>: >> > > > > > > > >> > > > > > > > > Hi Magnus, >> > > > > > > > > >> > > > > > > > > Thanks for the thorough KIP, this seems very useful. >> > > > > > > > > >> > > > > > > > > Would it make sense to include the request type as a >> label >> > for >> > > > the >> > > > > > > > > `client.request.success`, `client.request.errors` and >> > > > > > > > `client.request.rtt` >> > > > > > > > > metrics? I think it would be very useful to see which >> > specific >> > > > > > requests >> > > > > > > > are >> > > > > > > > > succeeding and failing for a client. One specific >> case I can >> > > > think >> > > > > of >> > > > > > > > where >> > > > > > > > > this could be useful is producer batch timeouts. If a >> Java >> > > > > > application >> > > > > > > > does >> > > > > > > > > not enable producer client logs (unfortunately, in my >> > > experience >> > > > > this >> > > > > > > > > happens more often than it should), the application >> logs will >> > > > only >> > > > > > > > contain >> > > > > > > > > the expiration error message, but no information >> about what >> > is >> > > > > > causing >> > > > > > > > the >> > > > > > > > > timeout. The requests might all be succeeding but >> taking too >> > > long >> > > > > to >> > > > > > > > > process batches, or metadata requests might be >> failing, or >> > some >> > > > or >> > > > > > all >> > > > > > > > > produce requests might be failing (if the bootstrap >> servers >> > are >> > > > > > reachable >> > > > > > > > > from the client but one or more other brokers are >> not, for >> > > > > example). >> > > > > > If >> > > > > > > > the >> > > > > > > > > cluster operator is able to identify the specific >> requests >> > that >> > > > are >> > > > > > slow >> > > > > > > > or >> > > > > > > > > failing for a client, they will be better able to >> diagnose >> > the >> > > > > issue >> > > > > > > > > causing batch timeouts. >> > > > > > > > > >> > > > > > > > > One drawback I can think of is that this will >> increase the >> > > > > > cardinality of >> > > > > > > > > the request metrics. But any given client is only >> going to >> > use >> > > a >> > > > > > small >> > > > > > > > > subset of the request types, and since we already have >> > > partition >> > > > > > labels >> > > > > > > > for >> > > > > > > > > the topic-level metrics, I think request labels will >> still >> > make >> > > > up >> > > > > a >> > > > > > > > > relatively small percentage of the set of metrics. >> > > > > > > > > >> > > > > > > > > Thanks, >> > > > > > > > > Bob >> > > > > > > > > >> > > > > > > > > On Mon, Nov 22, 2021 at 2:08 AM Viktor Somogyi-Vass < >> > > > > > > > > viktorsomo...@gmail.com> >> > > > > > > > > wrote: >> > > > > > > > > >> > > > > > > > > > Hi Magnus, >> > > > > > > > > > >> > > > > > > > > > I think this is a very useful addition. We also >> have a >> > > similar >> > > > > (but >> > > > > > > > much >> > > > > > > > > > more simplistic) implementation of this. Maybe I >> missed it >> > in >> > > > the >> > > > > > KIP >> > > > > > > > but >> > > > > > > > > > what about adding metrics about the subscription >> cache >> > > itself? >> > > > > > That I >> > > > > > > > > think >> > > > > > > > > > would improve its usability and debuggability as >> we'd be >> > able >> > > > to >> > > > > > see >> > > > > > > > its >> > > > > > > > > > performance, hit/miss rates, eviction counts and >> others. >> > > > > > > > > > >> > > > > > > > > > Best, >> > > > > > > > > > Viktor >> > > > > > > > > > >> > > > > > > > > > On Thu, Nov 18, 2021 at 5:12 PM Magnus Edenhill < >> > > > > > mag...@edenhill.se> >> > > > > > > > > > wrote: >> > > > > > > > > > >> > > > > > > > > > > Hi Mickael, >> > > > > > > > > > > >> > > > > > > > > > > see inline. >> > > > > > > > > > > >> > > > > > > > > > > Den ons 10 nov. 2021 kl 15:21 skrev Mickael >> Maison < >> > > > > > > > > > > mickael.mai...@gmail.com >> > > > > > > > > > > >: >> > > > > > > > > > > >> > > > > > > > > > > > Hi Magnus, >> > > > > > > > > > > > >> > > > > > > > > > > > I see you've addressed some of the points I >> raised >> > above >> > > > but >> > > > > > some >> > > > > > > > (4, >> > > > > > > > > > > > 5) have not been addressed yet. >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > Re 4) How will the user/app know metrics are >> being sent. >> > > > > > > > > > > >> > > > > > > > > > > One possibility is to add a JMX metric (thus for >> user >> > > > > > consumption) >> > > > > > > > for >> > > > > > > > > > the >> > > > > > > > > > > number of metric pushes the >> > > > > > > > > > > client has performed, or perhaps the number of >> metrics >> > > > > > subscriptions >> > > > > > > > > > > currently being collected. >> > > > > > > > > > > Would that be sufficient? >> > > > > > > > > > > >> > > > > > > > > > > Re 5) Metric sizes and rates >> > > > > > > > > > > >> > > > > > > > > > > A worst case scenario for a producer that is >> producing to >> > > 50 >> > > > > > unique >> > > > > > > > > > topics >> > > > > > > > > > > and emitting all standard metrics yields >> > > > > > > > > > > a serialized size of around 100KB prior to >> compression, >> > > which >> > > > > > > > > compresses >> > > > > > > > > > > down to about 20-30% of that depending >> > > > > > > > > > > on compression type and topic name uniqueness. >> > > > > > > > > > > The numbers for a consumer would be similar. >> > > > > > > > > > > >> > > > > > > > > > > In practice the number of unique topics would be >> far >> > less, >> > > > and >> > > > > > the >> > > > > > > > > > > subscription set would typically be for a subset >> of >> > > metrics. >> > > > > > > > > > > So we're probably closer to 1kb, or less, >> compressed size >> > > per >> > > > > > client >> > > > > > > > > per >> > > > > > > > > > > push interval. >> > > > > > > > > > > >> > > > > > > > > > > As both the subscription set and push intervals >> are >> > > > controlled >> > > > > > by the >> > > > > > > > > > > cluster operator it shouldn't be too hard >> > > > > > > > > > > to strike a good balance between metrics overhead >> and >> > > > > > granularity. >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > I'm really uneasy with this being enabled by >> default on >> > > the >> > > > > > client >> > > > > > > > > > > > side. When collecting data, I think the best >> practice >> > is >> > > to >> > > > > > ensure >> > > > > > > > > > > > users are explicitly enabling it. >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > Requiring metrics to be explicitly enabled on >> clients >> > > > severely >> > > > > > > > cripples >> > > > > > > > > > its >> > > > > > > > > > > usability and value. >> > > > > > > > > > > >> > > > > > > > > > > One of the problems that this KIP aims to solve >> is for >> > > useful >> > > > > > metrics >> > > > > > > > > to >> > > > > > > > > > be >> > > > > > > > > > > available on demand >> > > > > > > > > > > regardless of the technical expertise of the >> user. As >> > > Ryanne >> > > > > > points, >> > > > > > > > > out >> > > > > > > > > > a >> > > > > > > > > > > savvy user/organization >> > > > > > > > > > > will typically have metrics collection and >> monitoring in >> > > > place >> > > > > > > > already, >> > > > > > > > > > and >> > > > > > > > > > > the benefits of this KIP >> > > > > > > > > > > are then more of a common set and format metrics >> across >> > > > client >> > > > > > > > > > > implementations and languages. >> > > > > > > > > > > But that is not the typical Kafka user in my >> experience, >> > > > > they're >> > > > > > not >> > > > > > > > > > Kafka >> > > > > > > > > > > experts and they don't have the >> > > > > > > > > > > knowledge of how to best instrument their clients. >> > > > > > > > > > > Having metrics enabled by default for this user >> base >> > allows >> > > > the >> > > > > > Kafka >> > > > > > > > > > > operators to proactively and reactively >> > > > > > > > > > > monitor and troubleshoot client issues, without >> the need >> > > for >> > > > > the >> > > > > > less >> > > > > > > > > > savvy >> > > > > > > > > > > user to do anything. >> > > > > > > > > > > It is often too late to tell a user to enable >> metrics >> > when >> > > > the >> > > > > > > > problem >> > > > > > > > > > has >> > > > > > > > > > > already occurred. >> > > > > > > > > > > >> > > > > > > > > > > Now, to be clear, even though metrics are enabled >> by >> > > default >> > > > on >> > > > > > > > clients >> > > > > > > > > > it >> > > > > > > > > > > is not enabled by default >> > > > > > > > > > > on the brokers; the Kafka operator needs to build >> and set >> > > up >> > > > a >> > > > > > > > metrics >> > > > > > > > > > > plugin and add metrics subscriptions >> > > > > > > > > > > before anything is sent from the client. >> > > > > > > > > > > It is opt-out on the clients and opt-in on the >> broker. >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > You mentioned brokers already have >> > > > > > > > > > > > some(most?) of the information contained in >> metrics, if >> > > so >> > > > > > then why >> > > > > > > > > > > > are we collecting it again? Surely there must >> be some >> > new >> > > > > > > > information >> > > > > > > > > > > > in the client metrics. >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > From the user's perspective the Kafka >> infrastructure >> > > extends >> > > > > from >> > > > > > > > > > > producer.send() to >> > > > > > > > > > > messages being returned from consumer.poll(), a >> giant >> > black >> > > > box >> > > > > > where >> > > > > > > > > > > there's a lot going on between those >> > > > > > > > > > > two points. The brokers currently only see what >> happens >> > > once >> > > > > > those >> > > > > > > > > > requests >> > > > > > > > > > > and messages hits the broker, >> > > > > > > > > > > but as Kafka clients are complex pieces of >> machinery >> > > there's >> > > > a >> > > > > > myriad >> > > > > > > > > of >> > > > > > > > > > > queues, timers, and state >> > > > > > > > > > > that's critical to the operation and >> infrastructure >> > that's >> > > > not >> > > > > > > > > currently >> > > > > > > > > > > visible to the operator. >> > > > > > > > > > > Relying on the user to accurately and timely >> provide this >> > > > > missing >> > > > > > > > > > > information is not generally feasible. >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > Most of the standard metrics listed in the KIP >> are data >> > > > points >> > > > > > that >> > > > > > > > the >> > > > > > > > > > > broker does not have. >> > > > > > > > > > > Only a small number of metrics are duplicates >> (like the >> > > > request >> > > > > > > > counts >> > > > > > > > > > and >> > > > > > > > > > > sizes), but they are included >> > > > > > > > > > > to ease correlation when inspecting these client >> metrics. >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > Moreover this is a brand new feature so it's >> even >> > harder >> > > to >> > > > > > justify >> > > > > > > > > > > > enabling it and forcing onto all our users. If >> disabled >> > > by >> > > > > > default, >> > > > > > > > > > > > it's relatively easy to enable in a new release >> if we >> > > > decide >> > > > > > to, >> > > > > > > > but >> > > > > > > > > > > > once enabled by default it's much harder to >> disable. >> > Also >> > > > > this >> > > > > > > > > feature >> > > > > > > > > > > > will apply to all future metrics we will add. >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > I think maturity of a feature implementation >> should be >> > the >> > > > > > deciding >> > > > > > > > > > factor, >> > > > > > > > > > > rather than >> > > > > > > > > > > the design of it (which this KIP is). I.e., if the >> > > > > > implementation is >> > > > > > > > > not >> > > > > > > > > > > deemed mature enough >> > > > > > > > > > > for release X.Y it will be disabled. >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > Overall I think it's an interesting feature but >> I'd >> > > prefer >> > > > to >> > > > > > be >> > > > > > > > > > > > slightly defensive and see how it works in >> practice >> > > before >> > > > > > enabling >> > > > > > > > > it >> > > > > > > > > > > > everywhere. >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > Right, and I agree on being defensive, but since >> this >> > > feature >> > > > > > still >> > > > > > > > > > > requires manual >> > > > > > > > > > > enabling on the brokers before actually being >> used, I >> > think >> > > > > that >> > > > > > > > gives >> > > > > > > > > > > enough control >> > > > > > > > > > > to opt-in or out of this feature as needed. >> > > > > > > > > > > >> > > > > > > > > > > Thanks for your comments! >> > > > > > > > > > > >> > > > > > > > > > > Regards, >> > > > > > > > > > > Magnus >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > Thanks, >> > > > > > > > > > > > Mickael >> > > > > > > > > > > > >> > > > > > > > > > > > On Mon, Nov 8, 2021 at 7:55 AM Magnus Edenhill < >> > > > > > mag...@edenhill.se >> > > > > > > > > >> > > > > > > > > > > wrote: >> > > > > > > > > > > > > >> > > > > > > > > > > > > Thanks David for pointing this out, >> > > > > > > > > > > > > I've updated the KIP to include client_id as a >> > matching >> > > > > > selector. >> > > > > > > > > > > > > >> > > > > > > > > > > > > Regards, >> > > > > > > > > > > > > Magnus >> > > > > > > > > > > > > >> > > > > > > > > > > > > Den tors 4 nov. 2021 kl 18:01 skrev David Mao >> > > > > > > > > > > <d...@confluent.io.invalid >> > > > > > > > > > > > >: >> > > > > > > > > > > > > >> > > > > > > > > > > > > > Hey Magnus, >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > I noticed that the KIP outlines the initial >> > selectors >> > > > > > supported >> > > > > > > > > as: >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > - client_instance_id - >> CLIENT_INSTANCE_ID UUID >> > > > string >> > > > > > > > > > > > representation. >> > > > > > > > > > > > > > - client_software_name - client software >> > > > > implementation >> > > > > > > > name. >> > > > > > > > > > > > > > - client_software_version - client >> software >> > > > > > implementation >> > > > > > > > > > > version. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > In the given reactive monitoring workflow, >> we >> > mention >> > > > > that >> > > > > > the >> > > > > > > > > > > > application >> > > > > > > > > > > > > > user does not know their client's client >> instance >> > ID, >> > > > but >> > > > > > it's >> > > > > > > > > > > outlined >> > > > > > > > > > > > > > that the operator can add a metrics >> subscription >> > > > > selecting >> > > > > > for >> > > > > > > > > > > > clientId. I >> > > > > > > > > > > > > > don't see clientId as one of the supported >> > selectors. >> > > > > > > > > > > > > > I can see how this would have made sense in >> a >> > > previous >> > > > > > > > iteration >> > > > > > > > > > > given >> > > > > > > > > > > > that >> > > > > > > > > > > > > > the previous client instance ID proposal >> was to >> > > > construct >> > > > > > the >> > > > > > > > > > client >> > > > > > > > > > > > > > instance ID using clientId as a prefix. Now >> that >> > the >> > > > > client >> > > > > > > > > > instance >> > > > > > > > > > > > ID is >> > > > > > > > > > > > > > a UUID, would we want to add clientId as a >> > supported >> > > > > > selector? >> > > > > > > > > > > > > > Let me know what you think. >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > David >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > On Tue, Oct 19, 2021 at 12:33 PM Magnus >> Edenhill < >> > > > > > > > > > mag...@edenhill.se >> > > > > > > > > > > > >> > > > > > > > > > > > > > wrote: >> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Hi Mickael! >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Den tis 19 okt. 2021 kl 19:30 skrev >> Mickael >> > Maison >> > > < >> > > > > > > > > > > > > > > mickael.mai...@gmail.com >> > > > > > > > > > > > > > > >: >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Hi Magnus, >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Thanks for the proposal. >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > 1. Looking at the protocol section, >> isn't >> > > > > > > > "ClientInstanceId" >> > > > > > > > > > > > expected >> > > > > > > > > > > > > > > > to be a field in >> > > > GetTelemetrySubscriptionsResponseV0? >> > > > > > > > > > Otherwise, >> > > > > > > > > > > > how >> > > > > > > > > > > > > > > > does a client retrieve this value? >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Good catch, it got removed by mistake in >> one of >> > the >> > > > > > edits. >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > 2. In the client API section, you >> mention a new >> > > > > method >> > > > > > > > > > > > > > > > "clientInstanceId()". Can you clarify >> which >> > > > > interfaces >> > > > > > are >> > > > > > > > > > > > affected? >> > > > > > > > > > > > > > > > Is it only Consumer and Producer? >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > And Admin. Will update the KIP. >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > 3. I'm a bit concerned this is enabled >> by >> > > default. >> > > > > > Even if >> > > > > > > > > the >> > > > > > > > > > > data >> > > > > > > > > > > > > > > > collected is supposed to be not >> sensitive, I >> > > think >> > > > > > this can >> > > > > > > > > be >> > > > > > > > > > > > > > > > problematic in some environments. Also >> users >> > > don't >> > > > > > seem to >> > > > > > > > > have >> > > > > > > > > > > the >> > > > > > > > > > > > > > > > choice to only expose some metrics. >> Knowing how >> > > > much >> > > > > > data >> > > > > > > > > > transit >> > > > > > > > > > > > > > > > through some applications can be >> considered >> > > > critical. >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > The broker already knows how much data >> transits >> > > > through >> > > > > > the >> > > > > > > > > > client >> > > > > > > > > > > > > > though, >> > > > > > > > > > > > > > > right? >> > > > > > > > > > > > > > > Care has been taken not to expose >> information in >> > > the >> > > > > > standard >> > > > > > > > > > > metrics >> > > > > > > > > > > > > > that >> > > > > > > > > > > > > > > might >> > > > > > > > > > > > > > > reveal sensitive information. >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Do you have an example of how the proposed >> > metrics >> > > > > could >> > > > > > leak >> > > > > > > > > > > > sensitive >> > > > > > > > > > > > > > > information? >> > > > > > > > > > > > > > > As for limiting the what metrics to >> export; I >> > guess >> > > > > that >> > > > > > > > could >> > > > > > > > > > make >> > > > > > > > > > > > sense >> > > > > > > > > > > > > > > in some >> > > > > > > > > > > > > > > very sensitive use-cases, but those users >> might >> > > > disable >> > > > > > > > metrics >> > > > > > > > > > > > > > altogether >> > > > > > > > > > > > > > > for now. >> > > > > > > > > > > > > > > Could these concerns be addressed by a >> later KIP? >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > 4. As a user, how do you know if your >> > application >> > > > is >> > > > > > > > actively >> > > > > > > > > > > > sending >> > > > > > > > > > > > > > > > metrics? Are there new metrics exposing >> what's >> > > > going >> > > > > > on, >> > > > > > > > like >> > > > > > > > > > how >> > > > > > > > > > > > much >> > > > > > > > > > > > > > > > data is being sent? >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > That's a good question. >> > > > > > > > > > > > > > > Since the proposed metrics interface is >> not aimed >> > > at, >> > > > > or >> > > > > > > > > directly >> > > > > > > > > > > > > > available >> > > > > > > > > > > > > > > to, the application >> > > > > > > > > > > > > > > I guess there's little point of adding it >> here, >> > but >> > > > > > instead >> > > > > > > > > > adding >> > > > > > > > > > > > > > > something to the >> > > > > > > > > > > > > > > existing JMX metrics? >> > > > > > > > > > > > > > > Do you have any suggestions? >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > 5. If all metrics are enabled on a >> regular >> > > Consumer >> > > > > or >> > > > > > > > > > Producer, >> > > > > > > > > > > do >> > > > > > > > > > > > > > > > you have an idea how much throughput >> this would >> > > > use? >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > It depends on the number of >> partition/topics/etc >> > > the >> > > > > > client >> > > > > > > > is >> > > > > > > > > > > > producing >> > > > > > > > > > > > > > > to/consuming from. >> > > > > > > > > > > > > > > I'll add some sizes to the KIP for some >> typical >> > > > > > use-cases. >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > Thanks, >> > > > > > > > > > > > > > > Magnus >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > Thanks >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > On Tue, Oct 19, 2021 at 5:06 PM Magnus >> > Edenhill < >> > > > > > > > > > > > mag...@edenhill.se> >> > > > > > > > > > > > > > > > wrote: >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Den tis 19 okt. 2021 kl 13:22 skrev >> Tom >> > > Bentley < >> > > > > > > > > > > > tbent...@redhat.com >> > > > > > > > > > > > > > >: >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Hi Magnus, >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > I reviewed the KIP since you called >> the >> > vote >> > > > > > (sorry for >> > > > > > > > > not >> > > > > > > > > > > > > > reviewing >> > > > > > > > > > > > > > > > when >> > > > > > > > > > > > > > > > > > you announced your intention to >> call the >> > > > vote). I >> > > > > > have >> > > > > > > > a >> > > > > > > > > > few >> > > > > > > > > > > > > > > questions >> > > > > > > > > > > > > > > > on >> > > > > > > > > > > > > > > > > > some of the details. >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > 1. There's no Javadoc on >> > > > > > ClientTelemetryPayload.data(), >> > > > > > > > > so >> > > > > > > > > > I >> > > > > > > > > > > > don't >> > > > > > > > > > > > > > > know >> > > > > > > > > > > > > > > > > > whether the payload is exposed >> through this >> > > > > method >> > > > > > as >> > > > > > > > > > > > compressed or >> > > > > > > > > > > > > > > > not. >> > > > > > > > > > > > > > > > > > Later on you say "Decompression of >> the >> > > payloads >> > > > > > will be >> > > > > > > > > > > > handled by >> > > > > > > > > > > > > > > the >> > > > > > > > > > > > > > > > > > broker metrics plugin, the broker >> should >> > > > expose a >> > > > > > > > > suitable >> > > > > > > > > > > > > > > > decompression >> > > > > > > > > > > > > > > > > > API to the metrics plugin for this >> > purpose.", >> > > > > which >> > > > > > > > > > suggests >> > > > > > > > > > > > it's >> > > > > > > > > > > > > > the >> > > > > > > > > > > > > > > > > > compressed data in the buffer, but >> then we >> > > > don't >> > > > > > know >> > > > > > > > > which >> > > > > > > > > > > > codec >> > > > > > > > > > > > > > was >> > > > > > > > > > > > > > > > used, >> > > > > > > > > > > > > > > > > > nor the API via which the plugin >> should >> > > > > decompress >> > > > > > it >> > > > > > > > if >> > > > > > > > > > > > required >> > > > > > > > > > > > > > for >> > > > > > > > > > > > > > > > > > forwarding to the ultimate metrics >> store. >> > > > Should >> > > > > > the >> > > > > > > > > > > > > > > > ClientTelemetryPayload >> > > > > > > > > > > > > > > > > > expose a method to get the >> compression and >> > a >> > > > > > > > > decompressor? >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Good point, updated. >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > 2. The client-side API is expressed >> as >> > > > > > StringOrError >> > > > > > > > > > > > > > > > > > ClientInstance::ClientInstanceId(int >> > > > > timeout_ms). I >> > > > > > > > > > > understand >> > > > > > > > > > > > that >> > > > > > > > > > > > > > > > you're >> > > > > > > > > > > > > > > > > > thinking about the librdkafka >> > implementation, >> > > > but >> > > > > > it >> > > > > > > > > would >> > > > > > > > > > be >> > > > > > > > > > > > good >> > > > > > > > > > > > > > to >> > > > > > > > > > > > > > > > show >> > > > > > > > > > > > > > > > > > the API as it would appear on the >> Apache >> > > Kafka >> > > > > > clients. >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > This was meant as pseudo-code, but I >> changed >> > it >> > > > to >> > > > > > Java. >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > 3. "PushTelemetryRequest|Response - >> > protocol >> > > > > > request >> > > > > > > > used >> > > > > > > > > > by >> > > > > > > > > > > > the >> > > > > > > > > > > > > > > > client to >> > > > > > > > > > > > > > > > > > send metrics to any broker it is >> connected >> > > to." >> > > > > To >> > > > > > be >> > > > > > > > > > clear, >> > > > > > > > > > > > this >> > > > > > > > > > > > > > > means >> > > > > > > > > > > > > > > > > > that the client can choose any of >> the >> > > connected >> > > > > > brokers >> > > > > > > > > and >> > > > > > > > > > > > push to >> > > > > > > > > > > > > > > > just >> > > > > > > > > > > > > > > > > > one of them? What should a >> supporting >> > client >> > > do >> > > > > if >> > > > > > it >> > > > > > > > > gets >> > > > > > > > > > an >> > > > > > > > > > > > error >> > > > > > > > > > > > > > > > when >> > > > > > > > > > > > > > > > > > pushing metrics to a broker, retry >> sending >> > to >> > > > the >> > > > > > same >> > > > > > > > > > broker >> > > > > > > > > > > > or >> > > > > > > > > > > > > > try >> > > > > > > > > > > > > > > > > > pushing to another broker, or drop >> the >> > > metrics? >> > > > > > Should >> > > > > > > > > > > > supporting >> > > > > > > > > > > > > > > > clients >> > > > > > > > > > > > > > > > > > send successive requests to a single >> > broker, >> > > or >> > > > > > round >> > > > > > > > > > robin, >> > > > > > > > > > > > or is >> > > > > > > > > > > > > > > > that up >> > > > > > > > > > > > > > > > > > to the client author? I'm guessing >> the >> > > > behaviour >> > > > > > should >> > > > > > > > > be >> > > > > > > > > > > > sticky >> > > > > > > > > > > > > > to >> > > > > > > > > > > > > > > > > > support the rate limiting features, >> but I >> > > think >> > > > > it >> > > > > > > > would >> > > > > > > > > be >> > > > > > > > > > > > good >> > > > > > > > > > > > > > for >> > > > > > > > > > > > > > > > client >> > > > > > > > > > > > > > > > > > authors if this section were >> explicit on >> > the >> > > > > > > > recommended >> > > > > > > > > > > > behaviour. >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > You are right, I've updated the KIP >> to make >> > > this >> > > > > > clearer. >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > 4. "Mapping the client instance id >> to an >> > > actual >> > > > > > > > > application >> > > > > > > > > > > > > > instance >> > > > > > > > > > > > > > > > > > running on a (virtual) machine can >> be done >> > by >> > > > > > > > inspecting >> > > > > > > > > > the >> > > > > > > > > > > > > > metrics >> > > > > > > > > > > > > > > > > > resource labels, such as the client >> source >> > > > > address >> > > > > > and >> > > > > > > > > > source >> > > > > > > > > > > > port, >> > > > > > > > > > > > > > > or >> > > > > > > > > > > > > > > > > > security principal, all of which >> are added >> > by >> > > > the >> > > > > > > > > receiving >> > > > > > > > > > > > broker. >> > > > > > > > > > > > > > > > This >> > > > > > > > > > > > > > > > > > will allow the operator together >> with the >> > > user >> > > > to >> > > > > > > > > identify >> > > > > > > > > > > the >> > > > > > > > > > > > > > actual >> > > > > > > > > > > > > > > > > > application instance." Is this >> really >> > always >> > > > > true? >> > > > > > The >> > > > > > > > > > source >> > > > > > > > > > > > IP >> > > > > > > > > > > > > > and >> > > > > > > > > > > > > > > > port >> > > > > > > > > > > > > > > > > > might be a loadbalancer/proxy in >> some >> > setups. >> > > > The >> > > > > > > > > > principal, >> > > > > > > > > > > as >> > > > > > > > > > > > > > > already >> > > > > > > > > > > > > > > > > > mentioned in the KIP, might be >> shared >> > between >> > > > > > multiple >> > > > > > > > > > > > > > applications. >> > > > > > > > > > > > > > > > So at >> > > > > > > > > > > > > > > > > > worst the organization running the >> clients >> > > > might >> > > > > > have >> > > > > > > > to >> > > > > > > > > > > > consult >> > > > > > > > > > > > > > the >> > > > > > > > > > > > > > > > logs >> > > > > > > > > > > > > > > > > > of a set of client applications, >> right? >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Yes, that's correct. There's no >> guaranteed >> > > > mapping >> > > > > > from >> > > > > > > > > > > > > > > > client_instance_id >> > > > > > > > > > > > > > > > > to >> > > > > > > > > > > > > > > > > an actual instance, that's why the KIP >> > > recommends >> > > > > > client >> > > > > > > > > > > > > > > implementations >> > > > > > > > > > > > > > > > to >> > > > > > > > > > > > > > > > > log the client instance id >> > > > > > > > > > > > > > > > > upon retrieval, and also provide an >> API for >> > the >> > > > > > > > application >> > > > > > > > > > to >> > > > > > > > > > > > > > retrieve >> > > > > > > > > > > > > > > > the >> > > > > > > > > > > > > > > > > instance id programmatically >> > > > > > > > > > > > > > > > > if it has a better way of exposing it. >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > 5. "Tests indicate that a compression >> ratio >> > up >> > > to >> > > > > > 10x is >> > > > > > > > > > > > possible for >> > > > > > > > > > > > > > > the >> > > > > > > > > > > > > > > > > > standard metrics." Client authors >> might >> > > > > appreciate >> > > > > > your >> > > > > > > > > > > > mentioning >> > > > > > > > > > > > > > > > which >> > > > > > > > > > > > > > > > > > compression codec got these results. >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Good point. Updated. >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > 6. "Should the client send a push >> request >> > > prior >> > > > > to >> > > > > > > > expiry >> > > > > > > > > > of >> > > > > > > > > > > > the >> > > > > > > > > > > > > > > > previously >> > > > > > > > > > > > > > > > > > calculated PushIntervalMs the >> broker will >> > > > discard >> > > > > > the >> > > > > > > > > > metrics >> > > > > > > > > > > > and >> > > > > > > > > > > > > > > > return a >> > > > > > > > > > > > > > > > > > PushTelemetryResponse with the >> ErrorCode >> > set >> > > to >> > > > > > > > > > RateLimited." >> > > > > > > > > > > > Is >> > > > > > > > > > > > > > this >> > > > > > > > > > > > > > > > > > RATE_LIMITED a new error code? It's >> not >> > > > mentioned >> > > > > > in >> > > > > > > > the >> > > > > > > > > > "New >> > > > > > > > > > > > Error >> > > > > > > > > > > > > > > > Codes" >> > > > > > > > > > > > > > > > > > section. >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > That's a leftover, it should be using >> the >> > > > standard >> > > > > > > > > > ThrottleTime >> > > > > > > > > > > > > > > > mechanism. >> > > > > > > > > > > > > > > > > Fixed. >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > 7. In the section "Standard client >> resource >> > > > > labels" >> > > > > > > > > > > > application_id >> > > > > > > > > > > > > > is >> > > > > > > > > > > > > > > > > > described as Kafka Streams only, >> but the >> > > > section >> > > > > of >> > > > > > > > > "Client >> > > > > > > > > > > > > > > > Identification" >> > > > > > > > > > > > > > > > > > talks about "application instance >> id as an >> > > > > optional >> > > > > > > > > future >> > > > > > > > > > > > > > > nice-to-have >> > > > > > > > > > > > > > > > > > that may be included as a metrics >> label if >> > it >> > > > has >> > > > > > been >> > > > > > > > > set >> > > > > > > > > > by >> > > > > > > > > > > > the >> > > > > > > > > > > > > > > > user", so >> > > > > > > > > > > > > > > > > > I'm confused whether non-Kafka >> Streams >> > > clients >> > > > > > should >> > > > > > > > set >> > > > > > > > > > an >> > > > > > > > > > > > > > > > application_id >> > > > > > > > > > > > > > > > > > or not. >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > I'll clarify this in the KIP, but >> basically >> > we >> > > > > would >> > > > > > need >> > > > > > > > > to >> > > > > > > > > > > add >> > > > > > > > > > > > an ` >> > > > > > > > > > > > > > > > > application.id` config >> > > > > > > > > > > > > > > > > property for non-streams clients for >> this >> > > > purpose, >> > > > > > and >> > > > > > > > > that's >> > > > > > > > > > > > outside >> > > > > > > > > > > > > > > the >> > > > > > > > > > > > > > > > > scope of this KIP since we want to >> make it >> > > > > > zero-conf:ish >> > > > > > > > on >> > > > > > > > > > the >> > > > > > > > > > > > > > client >> > > > > > > > > > > > > > > > side. >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Kind regards, >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > Tom >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > Thanks for the review, >> > > > > > > > > > > > > > > > > Magnus >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > On Thu, Oct 7, 2021 at 5:26 PM >> Magnus >> > > Edenhill >> > > > < >> > > > > > > > > > > > mag...@edenhill.se >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > wrote: >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Hi all, >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > I've updated the KIP following >> our recent >> > > > > > discussions >> > > > > > > > > on >> > > > > > > > > > > the >> > > > > > > > > > > > > > > mailing >> > > > > > > > > > > > > > > > > > list: >> > > > > > > > > > > > > > > > > > > - split the protocol in two, one >> for >> > > getting >> > > > > the >> > > > > > > > > metrics >> > > > > > > > > > > > > > > > subscriptions, >> > > > > > > > > > > > > > > > > > > and one for pushing the metrics. >> > > > > > > > > > > > > > > > > > > - simplifications: initially >> only one >> > > > > supported >> > > > > > > > > metrics >> > > > > > > > > > > > format, >> > > > > > > > > > > > > > no >> > > > > > > > > > > > > > > > > > > client.id in the instance id, >> etc. >> > > > > > > > > > > > > > > > > > > - made CLIENT_METRICS >> subscription >> > > > > configuration >> > > > > > > > > entries >> > > > > > > > > > > > more >> > > > > > > > > > > > > > > > structured >> > > > > > > > > > > > > > > > > > > and allowing better client >> matching >> > > > > selectors >> > > > > > (not >> > > > > > > > > > only >> > > > > > > > > > > > on the >> > > > > > > > > > > > > > > > > > instance >> > > > > > > > > > > > > > > > > > > id, but also the other >> > > > > > > > > > > > > > > > > > > client resource labels, such as >> > > > > > > > > client_software_name, >> > > > > > > > > > > > etc.). >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Unless there are further comments >> I'll >> > call >> > > > the >> > > > > > vote >> > > > > > > > > in a >> > > > > > > > > > > > day or >> > > > > > > > > > > > > > > two. >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Regards, >> > > > > > > > > > > > > > > > > > > Magnus >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > Den mån 4 okt. 2021 kl 20:57 >> skrev Magnus >> > > > > > Edenhill < >> > > > > > > > > > > > > > > > mag...@edenhill.se>: >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > Hi Gwen, >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > I'm finishing up the KIP based >> on the >> > > last >> > > > > > couple >> > > > > > > > of >> > > > > > > > > > > > discussion >> > > > > > > > > > > > > > > > points >> > > > > > > > > > > > > > > > > > in >> > > > > > > > > > > > > > > > > > > > this thread >> > > > > > > > > > > > > > > > > > > > and will call the Vote later >> this week. >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > Best, >> > > > > > > > > > > > > > > > > > > > Magnus >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > Den lör 2 okt. 2021 kl 02:01 >> skrev Gwen >> > > > > Shapira >> > > > > > > > > > > > > > > > > > > <g...@confluent.io.invalid >> > > > > > > > > > > > > > > > > > > > >: >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> Hey, >> > > > > > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > > > > > > > >> I noticed that there was no >> discussion >> > > for >> > > > > the >> > > > > > > > last >> > > > > > > > > 10 >> > > > > > > > > > > > days, >> > > > > > > > > > > > > > > but I >> > > > > > > > > > > > > > > > > > > >> couldn't >> > > > > > > > > > > > > > > > > > > >> find the vote thread. Is there >> one >> > that >> > > > I'm >> > > > > > > > missing? >> > > > > > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > > > > > > > >> Gwen >> > > > > > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > > > > > > > >> On Wed, Sep 22, 2021 at 4:58 >> AM Magnus >> > > > > > Edenhill < >> > > > > > > > > > > > > > > > mag...@edenhill.se> >> > > > > > > > > > > > > > > > > > > >> wrote: >> > > > > > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > > > > > > > >> > Den tis 21 sep. 2021 kl >> 06:58 skrev >> > > > Colin >> > > > > > > > McCabe < >> > > > > > > > > > > > > > > > > > cmcc...@apache.org >> > > > > > > > > > > > > > > > > > > >: >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> > > On Mon, Sep 20, 2021, at >> 17:35, >> > Feng >> > > > Min >> > > > > > > > wrote: >> > > > > > > > > > > > > > > > > > > >> > > > Thanks Magnus & Colin >> for the >> > > > > > discussion. >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > Based on KIP-714's >> stateless >> > > design, >> > > > > > Client >> > > > > > > > > can >> > > > > > > > > > > > pretty >> > > > > > > > > > > > > > > much >> > > > > > > > > > > > > > > > use >> > > > > > > > > > > > > > > > > > > any >> > > > > > > > > > > > > > > > > > > >> > > > connection to any broker >> to send >> > > > > > metrics. We >> > > > > > > > > are >> > > > > > > > > > > not >> > > > > > > > > > > > > > > > associating >> > > > > > > > > > > > > > > > > > > >> > > connection >> > > > > > > > > > > > > > > > > > > >> > > > with client metric >> state. Is my >> > > > > > > > understanding >> > > > > > > > > > > > correct? >> > > > > > > > > > > > > > If >> > > > > > > > > > > > > > > > yes, >> > > > > > > > > > > > > > > > > > > how >> > > > > > > > > > > > > > > > > > > >> > about >> > > > > > > > > > > > > > > > > > > >> > > > the following two >> scenarios >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > > 1) One Client (Client-ID) >> > > registers >> > > > > two >> > > > > > > > > > different >> > > > > > > > > > > > client >> > > > > > > > > > > > > > > > > > instance >> > > > > > > > > > > > > > > > > > > id >> > > > > > > > > > > > > > > > > > > >> > via >> > > > > > > > > > > > > > > > > > > >> > > > separate registration. >> Is it >> > > > > permitted? >> > > > > > If >> > > > > > > > OK, >> > > > > > > > > > how >> > > > > > > > > > > > to >> > > > > > > > > > > > > > > > > > distinguish >> > > > > > > > > > > > > > > > > > > >> them >> > > > > > > > > > > > > > > > > > > >> > > from >> > > > > > > > > > > > > > > > > > > >> > > > the case 2 below. >> > > > > > > > > > > > > > > > > > > >> > > > >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > > Hi Feng, >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > > My understanding, which >> Magnus can >> > > > > > clarify I >> > > > > > > > > > guess, >> > > > > > > > > > > is >> > > > > > > > > > > > > > that >> > > > > > > > > > > > > > > > you >> > > > > > > > > > > > > > > > > > > could >> > > > > > > > > > > > > > > > > > > >> > have >> > > > > > > > > > > > > > > > > > > >> > > something like two Producer >> > > instances >> > > > > > running >> > > > > > > > > with >> > > > > > > > > > > the >> > > > > > > > > > > > > > same >> > > > > > > > > > > > > > > > > > > client.id >> > > > > > > > > > > > > > > > > > > >> > > (perhaps because they're >> using the >> > > > same >> > > > > > config >> > > > > > > > > > file, >> > > > > > > > > > > > for >> > > > > > > > > > > > > > > > example). >> > > > > > > > > > > > > > > > > > > >> They >> > > > > > > > > > > > > > > > > > > >> > > could even be in the same >> process. >> > > But >> > > > > > they >> > > > > > > > > would >> > > > > > > > > > > get >> > > > > > > > > > > > > > > separate >> > > > > > > > > > > > > > > > > > > UUIDs. >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > > I believe Magnus used the >> term >> > > client >> > > > to >> > > > > > mean >> > > > > > > > > > > > "Producer or >> > > > > > > > > > > > > > > > > > > Consumer". >> > > > > > > > > > > > > > > > > > > >> So >> > > > > > > > > > > > > > > > > > > >> > > if you have both a >> Producer and a >> > > > > > Consumer in >> > > > > > > > > your >> > > > > > > > > > > > > > > > application I >> > > > > > > > > > > > > > > > > > > would >> > > > > > > > > > > > > > > > > > > >> > > expect you'd get separate >> UUIDs >> > for >> > > > > both. >> > > > > > > > Again >> > > > > > > > > > > > Magnus can >> > > > > > > > > > > > > > > > chime >> > > > > > > > > > > > > > > > > > in >> > > > > > > > > > > > > > > > > > > >> > here, I >> > > > > > > > > > > > > > > > > > > >> > > guess. >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> > That's correct. >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > > > 2) How about the client >> > > restarting? >> > > > > > What's >> > > > > > > > the >> > > > > > > > > > > > > > > expectation? >> > > > > > > > > > > > > > > > > > Should >> > > > > > > > > > > > > > > > > > > >> the >> > > > > > > > > > > > > > > > > > > >> > > > server expect the client >> to >> > carry >> > > a >> > > > > > > > persisted >> > > > > > > > > > > client >> > > > > > > > > > > > > > > > instance id >> > > > > > > > > > > > > > > > > > > or >> > > > > > > > > > > > > > > > > > > >> > > should >> > > > > > > > > > > > > > > > > > > >> > > > the client be treated as >> a new >> > > > > instance? >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > > The KIP doesn't describe >> any >> > > mechanism >> > > > > for >> > > > > > > > > > > > persistence, >> > > > > > > > > > > > > > so I >> > > > > > > > > > > > > > > > would >> > > > > > > > > > > > > > > > > > > >> assume >> > > > > > > > > > > > > > > > > > > >> > > that when you restart the >> client >> > you >> > > > get >> > > > > > a new >> > > > > > > > > > > UUID. I >> > > > > > > > > > > > > > agree >> > > > > > > > > > > > > > > > that >> > > > > > > > > > > > > > > > > > it >> > > > > > > > > > > > > > > > > > > >> > would >> > > > > > > > > > > > > > > > > > > >> > > be good to spell this out. >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > > >> > > > > > > > > > > > > > > > > > > >> > Right, it will not be >> persisted >> > since >> > > a >> > > > > > client >> > > > > > > > > > > instance >> > > > > > > > > > > > > > can't >> > > > > > > > > > > > > > > be >> > > > > > > > > > > > > > > > > > > >> restarted. >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> > Will update the KIP to make >> this >> > > > clearer. >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> > /Magnus >> > > > > > > > > > > > > > > > > > > >> > >> > > > > > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > > > > > > > >> -- >> > > > > > > > > > > > > > > > > > > >> Gwen Shapira >> > > > > > > > > > > > > > > > > > > >> Engineering Manager | Confluent >> > > > > > > > > > > > > > > > > > > >> 650.450.2760 | @gwenshap >> > > > > > > > > > > > > > > > > > > >> Follow us: Twitter | blog >> > > > > > > > > > > > > > > > > > > >> >> > > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> >