Hello Tao,

For your case, you could also monitor the following JMX metric (see
http://kafka.apache.org/documentation.html#monitoring):

kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec

When a broker cannot respond to requests properly, its BytesOutPerSec value
will be much lower than on the other brokers.
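
A minimal sketch of that comparison in Python, assuming you already collect
one BytesOutPerSec rate per broker (for example the OneMinuteRate attribute of
the MBean above, fetched with whatever JMX tool you use). The function name,
the broker ids and the 10% threshold are all made up for illustration and
would need tuning for a real cluster:

```python
from statistics import median


def find_lagging_brokers(bytes_out_per_sec, ratio=0.1):
    """Return the ids of brokers whose rate is below ratio * cluster median.

    bytes_out_per_sec maps a broker id to its BytesOutPerSec rate.
    ratio is an assumed threshold, not a value recommended by Kafka.
    """
    # With fewer than two brokers there is nothing to compare against.
    if len(bytes_out_per_sec) < 2:
        return []
    baseline = median(bytes_out_per_sec.values())
    # An idle cluster (median of zero) gives no usable baseline.
    if baseline <= 0:
        return []
    return sorted(
        broker for broker, rate in bytes_out_per_sec.items()
        if rate < ratio * baseline
    )


# Hypothetical sample values, in bytes per second.
rates = {"broker-1": 5200000.0, "broker-2": 4900000.0, "broker-3": 12000.0}
print(find_lagging_brokers(rates))
```

Note that a broker which leads only a few partitions will also show a low
rate, so a check like this is probably best treated as a hint to look closer
rather than as proof that the broker is down.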

Guozhang



On Tue, Mar 1, 2016 at 7:39 PM, tao xiao <xiaotao...@gmail.com> wrote:

> Thanks Elias for sharing
>
> On Mon, 29 Feb 2016 at 22:23 Elias Abacioglu <
> elias.abacio...@deltaprojects.com> wrote:
>
> > Crap, forgot to remove my signature.. I guess my e-mail will now get
> > spammed forever :(
> >
> >
> >
> >
> >
> > On Mon, Feb 29, 2016 at 3:14 PM, Elias Abacioglu <
> > elias.abacio...@deltaprojects.com> wrote:
> >
> > > We've set up jmxtrans and use it to check these two values.
> > > UncleanLeaderElectionsPerSec
> > > UnderReplicatedPartitions
> > >
> > > Here is our shinken/nagios configuration:
> > >
> > > define command {
> > >   command_name check_kafka_underreplicated
> > >   command_line $USER1$/check_jmx -U service:jmx:rmi:///jndi/rmi://$HOSTADDRESS$:9999/jmxrmi -O "kafka.server":type="ReplicaManager",name="UnderReplicatedPartitions" -A Value -w $ARG1$ -c $ARG2$
> > > }
> > >
> > > define command {
> > >   command_name check_kafka_uncleanleader
> > >   command_line $USER1$/check_jmx -U service:jmx:rmi:///jndi/rmi://$HOSTADDRESS$:9999/jmxrmi -O "kafka.controller":type="ControllerStats",name="UncleanLeaderElectionsPerSec" -A Count -w $ARG1$ -c $ARG2$
> > > }
> > >
> > > define service {
> > >   hostgroup_name KafkaBroker
> > >   use generic-service
> > >   service_description Kafka Unclean Leader Elections per sec
> > >   check_command check_kafka_uncleanleader!1!10
> > >   check_interval 15
> > >   retry_interval 5
> > > }
> > > define service {
> > >   hostgroup_name KafkaBroker
> > >   use generic-service
> > >   service_description Kafka Under Replicated Partitions
> > >   check_command check_kafka_underreplicated!1!10
> > >   check_interval 15
> > >   retry_interval 5
> > > }
> > >
> > > On Mon, Feb 29, 2016 at 12:41 PM, tao xiao <xiaotao...@gmail.com> wrote:
> > >
> > >> Thanks Jens. What I want to achieve is to check that every broker within a
> > >> cluster functions properly. The approach you suggest can identify the
> > >> liveness of a cluster, but it doesn't necessarily mean every broker in the
> > >> cluster is alive. To achieve that, I could either create a topic with as
> > >> many partitions as there are brokers and min.insync.replicas set to the
> > >> number of brokers, or create one topic per broker and then send a ping
> > >> message to each broker. But this approach is definitely not scalable as we
> > >> expand the cluster. Therefore I am looking for a better way to achieve this.
> > >>
> > >> On Mon, 29 Feb 2016 at 16:54 Jens Rantil <jens.ran...@tink.se> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I assume you first want to ask yourself what kind of liveness you would
> > >> > like to check for. I guess the most realistic check is to put a "ping"
> > >> > message on the broker and make sure that you can consume it.
> > >> >
> > >> > Cheers,
> > >> > Jens
> > >> >
> > >> > On Fri, Feb 26, 2016 at 12:38 PM, tao xiao <xiaotao...@gmail.com> wrote:
> > >> >
> > >> > > Hi team,
> > >> > >
> > >> > > What is the best way to verify that a specific Kafka node functions
> > >> > > properly? Telnetting to the port is one approach, but I don't think it
> > >> > > tells me whether or not the broker can still receive/send traffic. I am
> > >> > > thinking of asking the broker for metadata using consumer.partitionsFor.
> > >> > > If it returns PartitionInfo, the broker is considered live. Is this a
> > >> > > good approach?
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Jens Rantil
> > >> > Backend engineer
> > >> > Tink AB
> > >> >
> > >> > Email: jens.ran...@tink.se
> > >> > Phone: +46 708 84 18 32
> > >> > Web: www.tink.se
> > >> >
> > >>
> > >
> > >
> >
>



-- 
-- Guozhang
