[ https://issues.apache.org/jira/browse/KAFKA-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950609#comment-15950609 ]
Aegeaner commented on KAFKA-4975:
---------------------------------

You can enable unclean.leader.election.enable in the config. Unclean leader election: a follower goes down while the leader keeps appending messages. The follower comes back up, and before it has completely caught up with the leader's log, all replicas in the ISR go down. The follower is then uncleanly elected as the new leader and starts appending messages from clients. When the old leader comes back up, it becomes a follower and may discover that the current leader's end offset is behind its own end offset. If unclean leader election is disabled, the follower is not allowed to truncate its log to match the new leader, which is what the FATAL messages quoted below report.

> Kafka process is running, but not listening to 9092 port
> --------------------------------------------------------
>
>                 Key: KAFKA-4975
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4975
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>    Affects Versions: 0.10.1.1
>         Environment: A cluster of 15 Kafka brokers connected to a cluster of 3 Zookeeper servers, all in the same data center.
> uname -a: Linux dc3-kafka-02 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> Kafka broker hardware specs:
> H/W path        Device      Class      Description
> ================================================
>                             system     SR ((^_^))
> /0                          bus        SR
> /0/0                        memory     128KiB BIOS
> /0/4                        processor  Intel(R) Atom(TM) CPU C2750 @ 2.40GHz
> /0/4/5                      memory     448KiB L1 cache
> /0/4/6                      memory     4MiB L2 cache
> /0/15                       memory     16GiB System Memory
> /0/15/0                     memory     8GiB DIMM DDR3 Synchronous 1600 MHz (0.6 ns)
> /0/15/1                     memory     DIMM DDR3 Synchronous [empty]
> /0/15/2                     memory     8GiB DIMM DDR3 Synchronous 1600 MHz (0.6 ns)
> /0/15/3                     memory     DIMM DDR3 Synchronous [empty]
> /0/100                      bridge     Atom processor C2000 SoC Transaction Router
> /0/100/f                    generic    Atom processor C2000 RCEC
> /0/100/13                   generic    Atom processor C2000 SMBus 2.0
> /0/100/14       enp0s20f0   network    Ethernet Connection I354 2.5 GbE Backplane
> /0/100/14.1     enp0s20f1   network    Ethernet Connection I354 2.5 GbE Backplane
> /0/100/16                   bus        Atom processor C2000 USB Enhanced Host Controller
> /0/100/16/1     usb1        bus        EHCI Host Controller
> /0/100/16/1/1               bus        USB hub
> /0/100/18                   storage    Atom processor C2000 AHCI SATA3 Controller
> /0/100/1f                   bridge     Atom processor C2000 PCU
> /0/100/1f.3                 bus        Atom processor C2000 PCU SMBus
> /0/101                      bridge     Atom processor C2000 RAS
> /0/1            scsi0       storage
> /0/1/0.0.0      /dev/sda    disk       256GB SAMSUNG MZ7LN256
> /0/1/0.0.0/1    /dev/sda1   volume     190MiB EXT4 volume
> /0/1/0.0.0/2    /dev/sda2   volume     237GiB EXT4 volume
> /0/1/0.0.0/3    /dev/sda3   volume     976MiB Linux swap volume
> /1                          power      CRB Battery 0
> /2                          power      OEM Define 5
>            Reporter: Rafael Telles
>            Priority: Critical
>
> I have two clusters of Kafka brokers. One of them (15 brokers + 3 Zookeeper servers) became sick: many under-replicated partitions and many NotEnoughReplicasExceptions. I logged in to some of the brokers that the others couldn't connect to and found that they were all running their Kafka process but were not listening on the default TCP port (9092) as expected:
> root@dc3-kafka-02:/home/kafka/kafka_2.11-0.10.1.1# ps aux | grep kafka
> root 14055 21.6 33.6 23001236 5513176 ? Sl Mar23 1866:20
> /usr/lib/jvm/java-8-oracle/bin/java -Xms2G -Xmx6G -server -XX:+UseG1GC
> -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
> -XX:+DisableExplicitGC -Djava.awt.headless=true
> -Xloggc:/home/kafka/kafka_2.11-0.10.1.1/bin/../logs/kafkaServer-gc.log
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.port=17264
> -Dkafka.logs.dir=/home/kafka/kafka_2.11-0.10.1.1/bin/../logs
> -Dlog4j.configuration=file:/home/kafka/kafka_2.11-0.10.1.1/bin/../config/log4j.properties
> -cp
> :/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/aopalliance-repackaged-2.4.0-b34.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/argparse4j-0.5.0.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/connect-api-0.10.1.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/connect-file-0.10.1.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/connect-json-0.10.1.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/connect-runtime-0.10.1.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/guava-18.0.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/hk2-api-2.4.0-b34.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/hk2-locator-2.4.0-b34.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/hk2-utils-2.4.0-b34.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jackson-annotations-2.6.0.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jackson-core-2.6.3.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jackson-databind-2.6.3.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jackson-jaxrs-base-2.6.3.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jackson-jaxrs-json-provider-2.6.3.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jackson-module-jaxb-annotations-2.6.3.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/javassist-3.18.2-GA.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/javax.annotation-api-1.2.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/javax.inject-1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/javax.inject-2.4.0-b34.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/javax.servlet-api-3.1.0.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/javax.ws.rs-api-2.0.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jersey-client-2.22.2.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jersey-common-2.22.2.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jersey-container-servlet-2.22.2.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jersey-container-servlet-core-2.22.2.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jersey-guava-2.22.2.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jersey-media-jaxb-2.22.2.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jersey-server-2.22.2.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jetty-continuation-9.2.15.v20160210.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jetty-http-9.2.15.v20160210.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jetty-io-9.2.15.v20160210.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jetty-security-9.2.15.v20160210.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jetty-server-9.2.15.v20160210.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jetty-servlet-9.2.15.v20160210.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jetty-servlets-9.2.15.v20160210.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jetty-util-9.2.15.v20160210.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/jopt-simple-4.9.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/kafka_2.11-0.10.1.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/kafka_2.11-0.10.1.1-sources.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/kafka_2.11-0.10.1.1-test-sources.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/kafka-clients-0.10.1.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/kafka-log4j-appender-0.10.1.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/kafka-streams-0.10.1.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/kafka-streams-examples-0.10.1.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/kafka-tools-0.10.1.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/log4j-1.2.17.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/lz4-1.3.0.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/metrics-core-2.2.0.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/osgi-resource-locator-1.0.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/raven-7.8.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/raven-log4j-7.8.1.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/reflections-0.9.10.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/rocksdbjni-4.9.0.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/scala-library-2.11.8.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/scala-parser-combinators_2.11-1.0.4.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/slf4j-api-1.7.21.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/slf4j-log4j12-1.7.21.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/snappy-java-1.1.2.6.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/validation-api-1.1.0.Final.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/zkclient-0.9.jar:/home/kafka/kafka_2.11-0.10.1.1/bin/../libs/zookeeper-3.4.8.jar
> kafka.Kafka /home/kafka/kafka_2.11-0.10.1.1/config/server.properties
> root 28615 0.0 0.0 14180 1024 pts/0 S+ 13:35 0:00 grep --color=auto kafka
> root@dc3-kafka-02:/home/kafka/kafka_2.11-0.10.1.1# netstat -tulpn | grep 9092
> ...returns empty
>
> If I restart Kafka on these brokers, they start listening on 9092 again.
>
> Update: I found this in the logs (I restarted the broker; it started listening on 9092, then stopped):
> [2017-03-29 15:11:38,181] INFO Awaiting socket connections on xxx:9092. (kafka.network.Acceptor)
> [2017-03-29 15:11:38,195] INFO [Socket Server on Broker 15], Started 1 acceptor threads (kafka.network.SocketServer)
> [2017-03-29 15:15:15,254] INFO [Socket Server on Broker 15], Shutting down (kafka.network.SocketServer)
> [2017-03-29 15:15:15,357] INFO [Socket Server on Broker 15], Shutdown completed (kafka.network.SocketServer)
>
> And there are these FATAL errors too:
> [2017-03-29 15:13:30,114] FATAL [ReplicaFetcherThread-0-7], Exiting because log truncation is not allowed for partition __consumer_offsets-27, Current leader 7's latest offset 0 is less than replica 15's latest offset 1734972 (kafka.server.ReplicaFetcherThread)
> [2017-03-29 15:13:30,114] FATAL [ReplicaFetcherThread-0-7], Exiting because log truncation is not allowed for partition __consumer_offsets-27, Current leader 7's latest offset 0 is less than replica 15's latest offset 1734972 (kafka.server.ReplicaFetcherThread)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
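The unclean-election sequence in Aegeaner's comment can be traced with a toy model. This is a minimal sketch under stated assumptions, not Kafka's replication code. There is a single partition with two replicas, and each replica is reduced to its log end offset (LEO). The names `Replica` and `simulate_unclean_election` are hypothetical, and so are the write counts. Broker ids 15 and 7 are borrowed from the log lines only for readability.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Replica:
    """Hypothetical toy replica: a broker id, a log end offset and a liveness flag."""
    broker_id: int
    log_end_offset: int = 0
    online: bool = True


def simulate_unclean_election(writes_in_sync: int,
                              writes_while_follower_down: int,
                              writes_after_election: int) -> Tuple[Replica, Replica]:
    """Replay the comment's scenario for one partition; returns (old_leader, new_leader)."""
    leader = Replica(broker_id=15)
    follower = Replica(broker_id=7)

    # Both replicas start in sync.
    leader.log_end_offset += writes_in_sync
    follower.log_end_offset = leader.log_end_offset

    # 1. The follower goes down while the leader keeps appending.
    follower.online = False
    leader.log_end_offset += writes_while_follower_down

    # 2. The follower comes back but, before it catches up, every ISR
    #    member (here only the leader) goes down.
    follower.online = True
    leader.online = False

    # 3. The lagging follower is elected anyway (unclean election) and
    #    appends new client messages on top of its shorter log.
    new_leader = follower
    new_leader.log_end_offset += writes_after_election

    # 4. The old leader comes back and rejoins as a follower.
    old_leader = leader
    old_leader.online = True
    return old_leader, new_leader


old, new = simulate_unclean_election(100, 50, 10)
print(f"new leader {new.broker_id}: LEO={new.log_end_offset}")
print(f"old leader {old.broker_id}: LEO={old.log_end_offset}")
print("old leader is ahead of new leader:", old.log_end_offset > new.log_end_offset)
```

With these numbers the returning old leader (LEO 150) is ahead of the new leader (LEO 110). That is the "may discover" case in the comment. It happens whenever the new leader receives fewer writes after the election than it missed during the outage.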
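The FATAL lines show how a follower reacts once it sees the leader's end offset behind its own. What follows is a simplified model inferred from the log message and the comment. It is not ReplicaFetcherThread's actual code. `reconcile_with_leader`, `FetchDecision` and `TruncationNotAllowed` are hypothetical names. The real broker logs FATAL and exits at the point where this sketch raises an exception. The model makes the trade-off behind the suggestion to enable unclean.leader.election.enable concrete: the broker keeps running only by discarding the part of its log the new leader does not have.

```python
from dataclasses import dataclass


class TruncationNotAllowed(Exception):
    """Raised where the real broker would log FATAL and exit (hypothetical)."""


@dataclass
class FetchDecision:
    action: str        # "truncate" or "continue"
    fetch_offset: int  # offset the follower resumes fetching from


def reconcile_with_leader(partition: str, leader_id: int, leader_end_offset: int,
                          replica_id: int, replica_end_offset: int,
                          unclean_election_enabled: bool) -> FetchDecision:
    """Simplified model of a follower comparing its log end with the leader's."""
    if leader_end_offset < replica_end_offset:
        if not unclean_election_enabled:
            # Truncating would drop messages only this replica has, so refuse.
            raise TruncationNotAllowed(
                f"log truncation is not allowed for partition {partition}, "
                f"Current leader {leader_id}'s latest offset {leader_end_offset} "
                f"is less than replica {replica_id}'s latest offset {replica_end_offset}")
        # Unclean election accepted: cut the log back to the leader's end offset.
        return FetchDecision("truncate", leader_end_offset)
    # In this simplified model, a follower that is not ahead keeps fetching from its own end.
    return FetchDecision("continue", replica_end_offset)


if __name__ == "__main__":
    # Values taken from the FATAL log lines for __consumer_offsets-27.
    args = dict(partition="__consumer_offsets-27", leader_id=7, leader_end_offset=0,
                replica_id=15, replica_end_offset=1734972)
    try:
        reconcile_with_leader(**args, unclean_election_enabled=False)
    except TruncationNotAllowed as err:
        print("FATAL:", err)
    print(reconcile_with_leader(**args, unclean_election_enabled=True))
```

With the logged values, the disabled case reproduces the broker's message. The enabled case returns `FetchDecision(action='truncate', fetch_offset=0)`, so replica 15 would cut its log for that partition from offset 1734972 back to 0.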
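The reporter confirmed the symptom by running `netstat -tulpn | grep 9092` on the broker itself. Other hosts, such as a monitoring job, can run a similar check by attempting a TCP connection to the broker port. This is a minimal sketch using Python's standard `socket` module. `is_listening` is a hypothetical helper, and the 2-second timeout is an arbitrary choice. A successful connection only shows that something accepts connections on the port. It does not show that the broker is healthy or in sync.

```python
import socket


def is_listening(host: str, port: int = 9092, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused (nothing bound to the port), timed out, or unreachable.
        return False
```

For a broker in the state described in the issue, where the process is alive but nothing is bound to 9092, a reachable host should make `is_listening(host)` return False.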