[ https://issues.apache.org/jira/browse/KAFKA-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940149#comment-13940149 ]
Jay Kreps commented on KAFKA-1251:
----------------------------------

I posted a draft patch. This patch adds a variety of metrics. I haven't changed the histogram instrumentation, so for now it is just avg, max, rate, etc. We can add that fairly easily.

Several things to discuss:
1. The list of metrics
2. The naming
3. Which metrics should be captured at the broker or topic level
4. Performance
5. JMX reporting

Okay, the list of metrics is below; check it out. We can discuss the names and doc strings for the various metrics, since perhaps they can be improved (if it isn't clear what a metric does from its doc string, then the doc string can definitely be improved!). Our goal should be that the doc strings fully document the metrics so that we don't have to keep separate HTML docs up to date.

Currently I give each metric an un-namespaced name such as message-send-rate. In JMX I prefix everything with "kafka.producer." [+ clientId + "."] for uniqueness. This means all the metrics below show up as attributes under the same mbean (kafka.producer.<client-id>). I think this is a lot more straightforward to look at in jconsole and other tools.

Performance: metrics have a really significant performance impact (perhaps surprisingly). As a result I removed all the metrics from KafkaProducer.send() and moved them into the background thread, so they are all recorded per batch or per request rather than per message. At first I thought this was a mistake on my part, so I did a performance comparison against the yammer metrics package. It is pretty similar. Basically, if you do 500k calls/sec, the overhead adds up significantly. So if you are wondering why things like maxMessageSize are calculated in a weird way, that is why. Even after that fix, metrics performance is still a big deal, so I may see if I can optimize the metrics package a bit more.

My thought was to only break out a few metrics per topic or per broker. I haven't done that yet, so let's discuss what we want.
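To make the performance point concrete, here is a minimal Java sketch of moving metric recording out of the per-message path. The Batch and SenderMetrics classes are hypothetical and do not come from the patch: Batch only updates plain int fields when a record is appended, and SenderMetrics folds each finished batch into producer-level numbers once, from the background thread. It illustrates the idea, not the patch's actual code.

```java
import java.util.List;

public class PerBatchMetricsSketch {

    /** Hypothetical batch: the send() path only touches plain fields, no metric objects. */
    static final class Batch {
        final String topic;
        int recordCount = 0;
        int sizeInBytes = 0;
        int maxRecordSize = 0; // tracked cheaply on append, reported once per batch

        Batch(String topic) {
            this.topic = topic;
        }

        // Assumes appends to one batch are already serialized (e.g. under a lock).
        void append(byte[] record) {
            recordCount++;
            sizeInBytes += record.length;
            maxRecordSize = Math.max(maxRecordSize, record.length);
        }
    }

    /** Hypothetical producer-level metrics, updated only from the background I/O thread. */
    static final class SenderMetrics {
        long messagesSent = 0;
        long batches = 0;
        long batchBytes = 0;
        int messageSizeMax = 0;

        // Called once per drained batch, so its cost is amortized over every record in it.
        void recordBatch(Batch b) {
            messagesSent += b.recordCount;
            batches++;
            batchBytes += b.sizeInBytes;
            messageSizeMax = Math.max(messageSizeMax, b.maxRecordSize);
        }

        /** Average bytes per batch, in the spirit of batch-size-avg. */
        double batchSizeAvg() {
            return batches == 0 ? 0.0 : (double) batchBytes / batches;
        }
    }

    public static void main(String[] args) {
        Batch a = new Batch("t1");
        a.append(new byte[10]);
        a.append(new byte[30]);
        Batch b = new Batch("t2");
        b.append(new byte[20]);

        SenderMetrics m = new SenderMetrics();
        for (Batch batch : List.of(a, b)) {
            m.recordBatch(batch);
        }
        // prints "3 30 30.0"
        System.out.println(m.messagesSent + " " + m.messageSizeMax + " " + m.batchSizeAvg());
    }
}
```

With this split, a metrics update costs one call per batch rather than one per send() call. That is also why a per-message value such as the maximum message size ends up tracked on the batch and reported indirectly.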
Per-topic: message-send-rate, message-error-rate, message-retry-rate, bytes-per-second
Per-broker: message-send-rate, message-error-rate, message-retry-rate, bytes-sent-per-second, bytes-received-per-second, requests-sent-per-second, requests-received-per-second, request-latency

Here is the current list of metrics:
"message-error-rate", "The average number of errors per second returned to the client."
"message-retry-rate", "The average per-second number of retries"
"message-send-rate", "The average number of messages sent per second."
"waiting-threads", "The number of user threads blocked waiting for buffer memory to enqueue their records"
"buffer-total-bytes", "The maximum amount of buffer memory the client can use (whether or not it is currently used)."
"buffer-available-bytes", "The total amount of buffer memory that is not being used (either unallocated or in the free list)."
"ready-partitions", "The number of topic-partitions with buffered data that is ready to be sent."
"batch-size-avg", "The average number of bytes per partition sent in requests."
"request-latency-avg", "The average request latency in ms"
"request-latency-max", "The maximum request latency in ms"
"messages-per-request-avg", "The average number of messages per request"
"message-size-max", "The maximum message size"
"requests-in-flight", "The current number of in-flight requests awaiting a response."
"metadata-age", "The age in seconds of the current producer metadata being used."
"network-ops-per-second", "The average number of network operations (reads or writes) on all connections per second."
"bytes-sent-per-second", "The average number of outgoing bytes sent per second to all servers."
"requests-sent-per-second", "The average number of requests sent per second."
"request-size-avg", "The average size of all requests in the window."
"request-size-max", "The maximum size of any request sent in the window."
"bytes-received-per-second", "Bytes/second read off all sockets"
"responses-received-per-second", "Responses received per second."
"connections-created-per-second", "New connections established per second in the window."
"connections-closed-per-second", "Connections closed per second in the window."
"select-calls-per-second", "Number of times the I/O layer checked for new I/O to perform per second"
"select-time-avg-ns", "The average length of time per select call in nanoseconds."
"select-percentage", "The fraction of time the I/O thread spent waiting."
"io-time-avg-ns", "The average length of time for I/O per select call in nanoseconds."
"io-percentage", "The fraction of time spent doing I/O"
"connection-count", "The current number of active connections."

> Add metrics to the producer
> ---------------------------
>
> Key: KAFKA-1251
> URL: https://issues.apache.org/jira/browse/KAFKA-1251
> Project: Kafka
> Issue Type: Sub-task
> Components: producer
> Reporter: Jay Kreps
> Assignee: Jay Kreps
> Attachments: KAFKA-1251.patch
>
>
> Currently there are no metrics.
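Most of the metrics in the comment above are avg, max, or rate stats over a time window (request-latency-avg, request-latency-max, message-send-rate, and so on). The following is a minimal Java sketch of such a windowed stat, assuming a fixed window and caller-supplied timestamps. WindowedStatSketch and its methods are hypothetical names for illustration, not the API of the metrics package in the patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical windowed stat: keeps samples from the last windowMs and derives rate, avg, max. */
public class WindowedStatSketch {

    /** One recorded value and the time it was recorded. */
    static final class Sample {
        final long timeMs;
        final double value;

        Sample(long timeMs, double value) {
            this.timeMs = timeMs;
            this.value = value;
        }
    }

    private final long windowMs;
    private final Deque<Sample> samples = new ArrayDeque<>();

    public WindowedStatSketch(long windowMs) {
        if (windowMs <= 0) throw new IllegalArgumentException("windowMs must be positive");
        this.windowMs = windowMs;
    }

    /** Record a value such as a message count, a byte count, or a latency in ms. */
    public void record(double value, long nowMs) {
        samples.addLast(new Sample(nowMs, value));
        expire(nowMs);
    }

    /** Drop samples outside the window (nowMs - windowMs, nowMs]. */
    private void expire(long nowMs) {
        while (!samples.isEmpty() && samples.peekFirst().timeMs <= nowMs - windowMs) {
            samples.pollFirst();
        }
    }

    /** Sum of recorded values per second of window, like message-send-rate. */
    public double rate(long nowMs) {
        expire(nowMs);
        double total = 0.0;
        for (Sample s : samples) total += s.value;
        return total / (windowMs / 1000.0);
    }

    /** Mean of recorded values, like request-latency-avg; NaN if the window is empty. */
    public double avg(long nowMs) {
        expire(nowMs);
        if (samples.isEmpty()) return Double.NaN;
        double total = 0.0;
        for (Sample s : samples) total += s.value;
        return total / samples.size();
    }

    /** Largest recorded value, like request-latency-max; NaN if the window is empty. */
    public double max(long nowMs) {
        expire(nowMs);
        double max = Double.NaN;
        for (Sample s : samples) {
            if (Double.isNaN(max) || s.value > max) max = s.value;
        }
        return max;
    }

    public static void main(String[] args) {
        WindowedStatSketch stat = new WindowedStatSketch(1000);
        stat.record(3, 0);
        stat.record(5, 500);
        // prints "8.0 4.0 5.0": both samples are inside the window at t=900
        System.out.println(stat.rate(900) + " " + stat.avg(900) + " " + stat.max(900));
        // prints "5.0 5.0 5.0": the sample from t=0 has expired by t=1200
        System.out.println(stat.rate(1200) + " " + stat.avg(1200) + " " + stat.max(1200));
    }
}
```

This sketch keeps every sample, so memory grows with the record rate, and rate() divides by the full window even before a full window has elapsed. A production metrics library would more typically aggregate into a small, fixed number of sub-window samples. That fits the comment's concern about per-call overhead.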