Updated the doc at http://kafka.apache.org/documentation.html#monitoring
Hopefully that answers your questions.

Thanks,

Jun

On Tue, Sep 3, 2013 at 11:16 PM, Vadim Keylis <vkeylis2...@gmail.com> wrote:

> Good evening. I have read through the monitoring section and tried to map
> each item to its corresponding JMX attribute. I would appreciate it if you
> could answer a few questions below.
>
> Thanks so much in advance,
> Vadim
>
> What is this JMX
> "kafka.controller":type="KafkaController",name="ActiveControllerCount" for?
>
> The rate of data in and out of the cluster and the number of messages
> written
> Which JMX attributes should I monitor? Since I should alert on this, what
> are acceptable changes? What are not?
>
> The log flush rate and the time taken to flush the log
> "kafka.log":type="LogFlushStats",name="LogFlushRateAndTimeMs"
> Which attribute should I be watching, and how much deviation is acceptable
> before I should alert?
>
> The number of partitions that have replicas that are down or have
> fallen behind and are underreplicated.
> Is this the JMX
> "kafka.cluster":type="Partition",name="buypets-0-UnderReplicated" that will
> show replicas that are down?
>
> Unclean leader elections. This shouldn't happen.
> "kafka.controller":type="ControllerStats",name="UncleanLeaderElectionsPerSec"
> I assume that should always be 0, and if it's not 0 we have a problem.
>
> Number of partitions each node is the leader for.
> Which JMX attribute(s) monitor this?
>
> Leader elections: we track each time this happens and how long it took:
> "kafka.controller":type="ControllerStats",name="LeaderElectionRateAndTimeMs"
>
> Any changes to the ISR
> Which JMX attribute should I monitor for this? Should I alert on this?
> What are reasonable changes? Which are not?
>
> The number of produce requests waiting on replication to report back
> Which JMX attribute should I monitor for this? Should I alert on this?
> What are reasonable changes? Which are not?
>
> The number of fetch requests waiting on data to arrive
> Which JMX attribute should I monitor for this? Should I alert on this?
> What are reasonable changes? Which are not?