Jun. Greatly appreciated.
On Wed, Sep 4, 2013 at 10:12 PM, Jun Rao <jun...@gmail.com> wrote:

> Updated the doc at http://kafka.apache.org/documentation.html#monitoring
>
> Hopefully that answers your questions.
>
> Thanks,
>
> Jun
>
> On Tue, Sep 3, 2013 at 11:16 PM, Vadim Keylis <vkeylis2...@gmail.com> wrote:
>
> > Good evening. I have read through the monitoring section and tried to
> > map each item to its corresponding JMX attribute. I would appreciate it
> > if you could answer a few questions below.
> >
> > Thanks so much in advance,
> > Vadim
> >
> > What is this JMX attribute for?
> > "kafka.controller":type="KafkaController",name="ActiveControllerCount"
> >
> > The rate of data in and out of the cluster and the number of messages
> > written
> > Which JMX attributes should I monitor? Since I should alert on this,
> > what are acceptable changes? What are not?
> >
> > The log flush rate and the time taken to flush the log
> > "kafka.log":type="LogFlushStats",name="LogFlushRateAndTimeMs"
> > Which attribute should I be watching, and what deviation is acceptable
> > before I should alert?
> >
> > The number of partitions that have replicas that are down or have
> > fallen behind and are underreplicated.
> > Is this the JMX attribute that will show replicas that are down?
> > "kafka.cluster":type="Partition",name="buypets-0-UnderReplicated"
> >
> > Unclean leader elections. This shouldn't happen.
> > "kafka.controller":type="ControllerStats",name="UncleanLeaderElectionsPerSec"
> > I assume that should always be 0, and if it's not 0 we have a problem.
> >
> > Number of partitions each node is the leader for.
> > Which JMX attribute(s) monitor this?
> >
> > Leader elections: we track each time this happens and how long it took:
> > "kafka.controller":type="ControllerStats",name="LeaderElectionRateAndTimeMs"
> >
> > Any changes to the ISR
> > Which JMX attribute should I monitor for this? Should I alert on this?
> > What are reasonable changes? Which are not?
> >
> > The number of produce requests waiting on replication to report back
> > Which JMX attribute should I monitor for this? Should I alert on this?
> > What are reasonable changes? Which are not?
> >
> > The number of fetch requests waiting on data to arrive
> > Which JMX attribute should I monitor for this? Should I alert on this?
> > What are reasonable changes? Which are not?