[ 
https://issues.apache.org/jira/browse/KAFKA-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940149#comment-13940149
 ] 

Jay Kreps commented on KAFKA-1251:
----------------------------------

I posted a draft patch that adds a variety of metrics. I haven't changed the 
histogram instrumentation, so for now it is just avg, max, rate, etc. We can 
add that fairly easily.

Several things to discuss:
1. The list of metrics
2. The naming
3. Which metrics should be captured at the broker or topic level
4. Performance
5. JMX reporting

Okay the list of metrics is below, check it out. We can discuss the names and 
doc strings for various metrics, perhaps they can be improved (if it isn't 
clear what a metric does from the doc string then it can definitely be 
improved!). Our goal should be that the doc strings fully document the metrics 
so that we don't have to keep separate HTML docs up-to-date.
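
As a rough illustration of the "doc strings are the documentation" idea, here 
is a minimal sketch of a registry that refuses a metric without a description 
and can print the list from the same data. The class and method names 
(MetricDocs, register, render) are hypothetical and are not taken from the 
patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical registry: every metric must carry a doc string. */
public class MetricDocs {
    // Insertion order is kept so the rendered list matches registration order.
    private final Map<String, String> docs = new LinkedHashMap<>();

    /** Registers a metric name with its doc string; rejects blanks and duplicates. */
    public void register(String name, String description) {
        if (description == null || description.trim().isEmpty())
            throw new IllegalArgumentException("Metric " + name + " needs a doc string");
        if (docs.putIfAbsent(name, description) != null)
            throw new IllegalArgumentException("Duplicate metric name: " + name);
    }

    /** Renders one "name: description" line per metric. */
    public String render() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : docs.entrySet())
            out.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        return out.toString();
    }
}
```

Generating the published list from render() (or something like it) would keep 
the docs from drifting away from the code.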

Currently I give each metric an un-namespaced name such as message-send-rate. 
In JMX I prefix everything with "kafka.producer." [+ clientId + "."] for 
uniqueness. This means all the metrics below show up as attributes under the 
same mbean (kafka.producer.<client-id>). I think this is a lot more 
straightforward to look at in jconsole and other tools.
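
To make the "one mbean, many attributes" layout concrete, here is a minimal 
sketch using the standard javax.management DynamicMBean interface. It is not 
the code from the patch. The class name ProducerMetricsBean and its helpers 
are hypothetical, and the ObjectName form "kafka.producer:type=<client-id>" is 
an assumption, chosen because JMX requires a domain:key=value name.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.ObjectName;

/** Hypothetical read-only mbean exposing every metric as one attribute. */
public class ProducerMetricsBean implements DynamicMBean {
    private final Map<String, Double> values = new LinkedHashMap<>();
    private final Map<String, String> docs = new LinkedHashMap<>();

    /** Adds or updates a metric; the doc string becomes the attribute description. */
    public synchronized void put(String name, String doc, double value) {
        docs.put(name, doc);
        values.put(name, value);
    }

    @Override
    public synchronized Object getAttribute(String name) throws AttributeNotFoundException {
        Double value = values.get(name);
        if (value == null) throw new AttributeNotFoundException(name);
        return value;
    }

    @Override
    public synchronized AttributeList getAttributes(String[] names) {
        AttributeList list = new AttributeList();
        for (String name : names)
            if (values.containsKey(name)) list.add(new Attribute(name, values.get(name)));
        return list;
    }

    @Override
    public void setAttribute(Attribute attribute) {
        throw new UnsupportedOperationException("metrics are read-only");
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList(); // nothing is writable
    }

    @Override
    public Object invoke(String action, Object[] params, String[] signature) {
        throw new UnsupportedOperationException("no operations");
    }

    @Override
    public synchronized MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo[] attrs = new MBeanAttributeInfo[docs.size()];
        int i = 0;
        for (Map.Entry<String, String> e : docs.entrySet())
            attrs[i++] = new MBeanAttributeInfo(
                e.getKey(), "java.lang.Double", e.getValue(), true, false, false);
        return new MBeanInfo(getClass().getName(), "Producer metrics", attrs, null, null, null);
    }

    /** Registers the bean under the assumed name kafka.producer:type=clientId. */
    public static ObjectName register(String clientId, ProducerMetricsBean bean) {
        try {
            ObjectName name = new ObjectName("kafka.producer", "type", clientId);
            ManagementFactory.getPlatformMBeanServer().registerMBean(bean, name);
            return name;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads one attribute back through the platform MBean server. */
    public static double read(ObjectName name, String attribute) {
        try {
            return (Double) ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a bean registered this way, a tool such as jconsole would show a single 
node per client with message-send-rate and the other metrics as its attributes.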

Performance--there is a really significant performance impact from metrics 
(perhaps surprisingly). As a result I removed all the metrics from 
KafkaProducer.send() and moved them into the background thread so that they are 
all per-batch or per-request rather than per-message. At first I thought this 
was something bad on my part, so I did a performance comparison against the 
yammer metrics package. It is pretty similar. Basically, if you do 500k 
calls/sec the overhead adds up significantly. So if you are wondering why 
things like maxMessageSize are calculated in a weird way, that is why. Even 
after that fix, metrics performance is still a big deal, so I may see if I can 
optimize a bit more in the metrics package.
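
The per-batch approach can be sketched roughly as follows. This is an 
illustration and not the code in the patch. BatchSendStats, Batch, append and 
recordBatch are hypothetical names, and the way the maximum message size is 
carried on the batch is an assumption about what "calculated in a weird way" 
refers to. The send path only touches a plain int field. The background thread 
makes a single metrics call per batch.

```java
/** Hypothetical per-batch metric recording; not thread-safe by design. */
public class BatchSendStats {

    /** A batch tracks its own cheap counters while messages are appended. */
    public static class Batch {
        int messageCount;
        int largestMessageBytes;

        // Called on the send path: plain field updates, no metrics call.
        public void append(int messageBytes) {
            messageCount++;
            if (messageBytes > largestMessageBytes) largestMessageBytes = messageBytes;
        }
    }

    private long messages;
    private long batches;
    private int maxMessageSize;

    // Called once per batch from the single background sender thread.
    public void recordBatch(Batch batch) {
        messages += batch.messageCount;
        batches++;
        maxMessageSize = Math.max(maxMessageSize, batch.largestMessageBytes);
    }

    /** Messages per second over a window of the given length in nanoseconds. */
    public double messageSendRate(long elapsedNs) {
        return messages / (elapsedNs / 1e9);
    }

    /** Largest message seen in any recorded batch. */
    public int messageSizeMax() {
        return maxMessageSize;
    }

    /** Number of batches recorded so far. */
    public long batchCount() {
        return batches;
    }
}
```

At 500k messages/sec with a few hundred messages per batch, this would turn 
hundreds of thousands of metric updates per second into a few thousand.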

My thought was to break out only a few metrics per-topic or per-broker. I 
haven't done that yet, so let's discuss what we want.

Per-topic:
message-send-rate, message-error-rate, message-retry-rate, bytes-per-second

Per-broker:
message-send-rate, message-error-rate, message-retry-rate, 
bytes-sent-per-second, bytes-received-per-second, requests-sent-per-second, 
requests-received-per-second, request-latency

Here is the current list of metrics:
"message-error-rate", "The average number of errors per second returned to the 
client."
"message-retry-rate", "The average number of retries per second."
"message-send-rate", "The average number of messages sent per second."
"waiting-threads", "The number of user threads blocked waiting for buffer 
memory to enqueue their records"
"buffer-total-bytes", "The maximum amount of buffer memory the client can use 
(whether or not it is currently used)."
"buffer-available-bytes", "The total amount of buffer memory that is not being 
used (either unallocated or in the free list)."
"ready-partitions", "The number of topic-partitions with buffered data that is 
ready to be sent."
"batch-size-avg", "The average number of bytes per partition sent in requests."
"request-latency-avg", "The average request latency in ms"
"request-latency-max", "The maximum request latency in ms"
"messages-per-request-avg", "The average number of messages per request"
"message-size-max", "The maximum message size"
"requests-in-flight", "The current number of in-flight requests awaiting a 
response."
"metadata-age", "The age in seconds of the current producer metadata being 
used."
"network-ops-per-second", "The average number of network operations (reads or 
writes) on all connections per second."
"bytes-sent-per-second", "The average number of outgoing bytes sent per second 
to all servers."
"requests-sent-per-second", "The average number of requests sent per second."
"request-size-avg", "The average size of all requests in the window."
"request-size-max", "The maximum size of any request sent in the window."
"bytes-received-per-second", "Bytes/second read off all sockets"
"responses-received-per-second", "Responses received per second."
"connections-created-per-second", "New connections established per second in 
the window."
"connections-closed-per-second", "Connections closed per second in the window."
"select-calls-per-second", "Number of times the I/O layer checked for new I/O 
to perform per second."
"select-time-avg-ns", "The average length of time per select call in 
nanoseconds."
"select-percentage", "The fraction of time the I/O thread spent waiting."
"io-time-avg-ns", "The average length of time for I/O per select call in 
nanoseconds."
"io-percentage", "The fraction of time spent doing I/O"
"connection-count", "The current number of active connections."

> Add metrics to the producer
> ---------------------------
>
>                 Key: KAFKA-1251
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1251
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: producer 
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>         Attachments: KAFKA-1251.patch
>
>
> Currently there are no metrics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
