Hello Tao,

For your case, you could also monitor the following JMX metric (see http://kafka.apache.org/documentation.html#monitoring):
kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec

When a broker cannot properly respond to requests, this value will be much
smaller than on the other brokers.

Guozhang

On Tue, Mar 1, 2016 at 7:39 PM, tao xiao <xiaotao...@gmail.com> wrote:

> Thanks Elias for sharing
>
> On Mon, 29 Feb 2016 at 22:23 Elias Abacioglu <
> elias.abacio...@deltaprojects.com> wrote:
>
> > Crap, forgot to remove my signature.. I guess my e-mail will now get
> > spammed forever :(
> >
> > On Mon, Feb 29, 2016 at 3:14 PM, Elias Abacioglu <
> > elias.abacio...@deltaprojects.com> wrote:
> >
> > > We've set up jmxtrans and use it to check these two values:
> > > UncleanLeaderElectionsPerSec
> > > UnderReplicatedPartitions
> > >
> > > Here is our shinken/nagios configuration:
> > >
> > > define command {
> > >     command_name check_kafka_underreplicated
> > >     command_line $USER1$/check_jmx -U service:jmx:rmi:///jndi/rmi://$HOSTADDRESS$:9999/jmxrmi -O "kafka.server":type="ReplicaManager",name="UnderReplicatedPartitions" -A Value -w $ARG1$ -c $ARG2$
> > > }
> > >
> > > define command {
> > >     command_name check_kafka_uncleanleader
> > >     command_line $USER1$/check_jmx -U service:jmx:rmi:///jndi/rmi://$HOSTADDRESS$:9999/jmxrmi -O "kafka.controller":type="ControllerStats",name="UncleanLeaderElectionsPerSec" -A Count -w $ARG1$ -c $ARG2$
> > > }
> > >
> > > define service {
> > >     hostgroup_name KafkaBroker
> > >     use generic-service
> > >     service_description Kafka Unclean Leader Elections per sec
> > >     check_command check_kafka_uncleanleader!1!10
> > >     check_interval 15
> > >     retry_interval 5
> > > }
> > > define service {
> > >     hostgroup_name KafkaBroker
> > >     use generic-service
> > >     service_description Kafka Under Replicated Partitions
> > >     check_command check_kafka_underreplicated!1!10
> > >     check_interval 15
> > >     retry_interval 5
> > > }
> > >
> > > On Mon, Feb 29, 2016 at 12:41 PM, tao xiao <xiaotao...@gmail.com> wrote:
> > >
> > >> Thanks Jens. What I want to achieve is to check that every broker within a
> > >> cluster functions properly. The approach you suggest can establish the
> > >> liveness of the cluster, but it doesn't necessarily mean every broker in
> > >> the cluster is alive. To achieve that, I could either create one topic with
> > >> as many partitions as there are brokers and min.insync.replicas set to the
> > >> number of brokers, or create one topic per broker, and then send a ping
> > >> message to each broker. But this approach is definitely not scalable as we
> > >> expand the cluster. Therefore I am looking for a better way to achieve this.
> > >>
> > >> On Mon, 29 Feb 2016 at 16:54 Jens Rantil <jens.ran...@tink.se> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I assume you first want to ask yourself what liveness you would like to
> > >> > check for. I guess the most realistic check is to put a "ping" message on
> > >> > the broker and make sure that you can consume it.
> > >> >
> > >> > Cheers,
> > >> > Jens
> > >> >
> > >> > On Fri, Feb 26, 2016 at 12:38 PM, tao xiao <xiaotao...@gmail.com> wrote:
> > >> >
> > >> > > Hi team,
> > >> > >
> > >> > > What is the best way to verify that a specific Kafka node functions
> > >> > > properly? Telnetting to the port is one approach, but I don't think it
> > >> > > tells me whether or not the broker can still receive/send traffic. I am
> > >> > > thinking of asking the broker for metadata using consumer.partitionsFor.
> > >> > > If it can return PartitionInfo, it is considered live. Is this a good
> > >> > > approach?
> > >> >
> > >> > --
> > >> > Jens Rantil
> > >> > Backend engineer
> > >> > Tink AB

--
-- Guozhang
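
The BytesOutPerSec comparison suggested at the top of the thread could be sketched roughly as below. This is only an illustration: it assumes the per-broker rates have already been collected elsewhere (for example by jmxtrans), and the function name `find_lagging_brokers`, the broker ids, the sample values, and the 20% threshold are all hypothetical. Brokers that lead fewer partitions will naturally serve less traffic, so the threshold would need tuning per cluster.

```python
from statistics import median


def find_lagging_brokers(bytes_out_per_sec, ratio=0.2):
    """Return the ids of brokers whose BytesOutPerSec is far below their peers.

    bytes_out_per_sec: dict mapping a broker id to its current BytesOutPerSec.
    ratio: hypothetical threshold; a broker is flagged when its rate is below
           ratio * median rate of the cluster.
    """
    # With fewer than two brokers there is nothing to compare against.
    if len(bytes_out_per_sec) < 2:
        return []

    baseline = median(bytes_out_per_sec.values())
    # An idle cluster gives no usable baseline, so flag nothing.
    if baseline <= 0:
        return []

    return sorted(
        broker
        for broker, rate in bytes_out_per_sec.items()
        if rate < ratio * baseline
    )


# Made-up sample values, in bytes per second.
samples = {
    "broker-1": 5.2e6,
    "broker-2": 4.8e6,
    "broker-3": 1.1e4,
    "broker-4": 5.0e6,
}
print(find_lagging_brokers(samples))
```

The median is used as the baseline rather than the mean so that a single stalled broker does not drag the reference value down with it.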