We've set up jmxtrans and use it to check these two values:
UncleanLeaderElectionsPerSec
UnderReplicatedPartitions

Here is our Shinken/Nagios configuration:

define command {
  command_name check_kafka_underreplicated
  command_line $USER1$/check_jmx -U service:jmx:rmi:///jndi/rmi://$HOSTADDRESS$:9999/jmxrmi -O "kafka.server":type="ReplicaManager",name="UnderReplicatedPartitions" -A Value -w $ARG1$ -c $ARG2$
}

define command {
  command_name check_kafka_uncleanleader
  command_line $USER1$/check_jmx -U service:jmx:rmi:///jndi/rmi://$HOSTADDRESS$:9999/jmxrmi -O "kafka.controller":type="ControllerStats",name="UncleanLeaderElectionsPerSec" -A Count -w $ARG1$ -c $ARG2$
}

define service {
  hostgroup_name KafkaBroker
  use generic-service
  service_description Kafka Unclean Leader Elections per sec
  check_command check_kafka_uncleanleader!1!10
  check_interval 15
  retry_interval 5
}
define service {
  hostgroup_name KafkaBroker
  use generic-service
  service_description Kafka Under Replicated Partitions
  check_command check_kafka_underreplicated!1!10
  check_interval 15
  retry_interval 5
}
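
To make the !1!10 thresholds concrete, here is a rough Python sketch of how a check might turn a metric value into a Nagios state. Note that classify is a hypothetical helper for illustration, not part of check_jmx, and it assumes a value at or above a threshold trips it; the plugin's exact comparison may differ, so check its documentation. The exit codes (0 OK, 1 WARNING, 2 CRITICAL) are the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(value, warn, crit):
    """Map a metric value to a Nagios exit code.

    Assumption: a value at or above a threshold trips it.
    check_jmx's own comparison rule may differ.
    """
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


if __name__ == "__main__":
    # Thresholds from the service definitions: warn=1, crit=10.
    for value in (0, 1, 3, 10):
        print(value, classify(value, warn=1, crit=10))
```

With these thresholds, a single under-replicated partition would already raise a warning, which is probably what you want for that metric.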





Elias Abacioglu
Infrastructure Specialist at Delta Projects AB



On Mon, Feb 29, 2016 at 12:41 PM, tao xiao <xiaotao...@gmail.com> wrote:

> Thanks Jens. What I want to achieve is to check that every broker within a
> cluster functions properly. The way you suggest can identify the liveness
> of a cluster, but it doesn't necessarily mean every broker in the cluster is
> alive. To achieve that, I can either create a topic with the number of
> partitions equal to the number of brokers and min.insync.replicas=number of
> brokers, or create one topic per broker and then send a ping message to each
> broker. But this approach is definitely not scalable as we expand the cluster.
> Therefore I am looking for another way to achieve this.
>
> On Mon, 29 Feb 2016 at 16:54 Jens Rantil <jens.ran...@tink.se> wrote:
>
> > Hi,
> >
> > I assume you first want to ask yourself what liveness you would like to
> > check for. I guess the most realistic check is to put a "ping" message on
> > the broker and make sure that you can consume it.
> >
> > Cheers,
> > Jens
> >
> > On Fri, Feb 26, 2016 at 12:38 PM, tao xiao <xiaotao...@gmail.com> wrote:
> >
> > > Hi team,
> > >
> > > What is the best way to verify that a specific Kafka node functions
> > > properly? Telnetting to the port is one approach, but I don't think it
> > > tells me whether or not the broker can still receive/send traffic. I am
> > > thinking of asking the broker for metadata using consumer.partitionsFor.
> > > If it can return PartitionInfo, the broker is considered live. Is this a
> > > good approach?
> > >
> >
>
