Thanks!! Does the upgrade help?
On 29 December 2016 at 21:38, Tony Liu <jiangtao....@zuora.com> wrote:

> hi,
>
> you are hitting this issue:
> https://issues.apache.org/jira/browse/KAFKA-4477
>
> On Wed, Dec 28, 2016 at 3:43 PM, Alessandro De Maria <alessandro.dema...@gmail.com> wrote:
>
> > Hello,
> >
> > I would like to get some help/advice on some issues I am having with my
> > kafka cluster.
> >
> > I am running kafka (kafka_2.11-0.10.1.0) on a 5 broker cluster (ubuntu
> > 16.04)
> >
> > configuration is here: http://pastebin.com/cPch8Kd7
> >
> > today one of the 5 brokers (id: 1) appeared to disconnect from the others:
> >
> > The log shows this around that time
> > [2016-12-28 16:18:30,575] INFO Partition [aki_reload5yl_5,11] on broker 1: Shrinking ISR for partition [aki_reload5yl_5,11] from 2,3,1 to 1 (kafka.cluster.Partition)
> > [2016-12-28 16:18:30,579] INFO Partition [ale_reload5yl_1,0] on broker 1: Shrinking ISR for partition [ale_reload5yl_1,0] from 5,1,2 to 1 (kafka.cluster.Partition)
> > [2016-12-28 16:18:30,580] INFO Partition [hl7_staging,17] on broker 1: Shrinking ISR for partition [hl7_staging,17] from 4,1,5 to 1 (kafka.cluster.Partition)
> > [2016-12-28 16:18:30,581] INFO Partition [hes_reload_5,37] on broker 1: Shrinking ISR for partition [hes_reload_5,37] from 1,2,5 to 1 (kafka.cluster.Partition)
> > [2016-12-28 16:18:30,582] INFO Partition [aki_live,38] on broker 1: Shrinking ISR for partition [aki_live,38] from 5,2,1 to 1 (kafka.cluster.Partition)
> > [2016-12-28 16:18:30,582] INFO Partition [hl7_live,51] on broker 1: Shrinking ISR for partition [hl7_live,51] from 1,3,4 to 1 (kafka.cluster.Partition)
> >
> > (other hosts had)
> > java.io.IOException: Connection to 1 was disconnected before the response was read
> >     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
> >     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
> >     at scala.Option.foreach(Option.scala:257)
> >     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
> >     at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
> >     at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
> >     at kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scala:143)
> >     at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
> >     at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
> >     at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
> >     at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> >     at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
> >     at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
> >     at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> >
> > While this was happening, the ConsumerOffsetChecker was reporting only a
> > few of the 128 partitions configured for some of the topics, and consumers
> > started crashing.
> >
> > I then used KafkaManager to reassign partitions from broker 1 to other
> > brokers.
> >
> > I could then see on the kafka1 log the following errors
> > [2016-12-28 17:23:51,816] ERROR [ReplicaFetcherThread-0-4], Error for partition [aki_live,86] to broker 4:org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request (kafka.server.ReplicaFetcherThread)
> > [2016-12-28 17:23:51,817] ERROR [ReplicaFetcherThread-0-4], Error for partition [aki_live,21] to broker 4:org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request (kafka.server.ReplicaFetcherThread)
> > [2016-12-28 17:23:51,817] ERROR [ReplicaFetcherThread-0-4], Error for partition [aki_live,126] to broker 4:org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request (kafka.server.ReplicaFetcherThread)
> > [2016-12-28 17:23:51,818] ERROR [ReplicaFetcherThread-0-4], Error for partition [aki_live,6] to broker 4:org.apache.kafka.common.errors.UnknownServerException: The server experienced an unexpected error when processing the request (kafka.server.ReplicaFetcherThread)
> >
> > I thought I would restart broker1, but as soon as I did, most of my topics
> > ended up with some empty partitions, and their consumer offsets were wiped
> > out completely.
> >
> > I understand that because of unclean.leader.election.enable = true an
> > unclean leader would be elected, but why were the partitions wiped out if
> > there were at least 3 replicas for each?
> >
> > What do you think caused the disconnection in the first place, and how can
> > I recover from situations like this in the future?
> >
> > Regards
> > Alessandro
> >
> > --
> > Alessandro De Maria
> > alessandro.dema...@gmail.com

--
Alessandro De Maria
alessandro.dema...@gmail.com
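
For anyone triaging a similar incident: the "Shrinking ISR" lines quoted in this thread follow a regular format, so they can be summarised with a short script. The following is only a minimal sketch, assuming the log format is exactly as pasted above; the names ISR_SHRINK, parse_isr_shrinks and dropped_counts are made up for illustration and are not part of Kafka or any Kafka tooling.

```python
import re
from collections import defaultdict

# Assumption: lines look exactly like the "Shrinking ISR" entries quoted in
# this thread. Other Kafka versions may word the message differently.
ISR_SHRINK = re.compile(
    r"\[(?P<ts>[^\]]+)\] INFO Partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] "
    r"on broker (?P<broker>\d+): Shrinking ISR for partition \[[^\]]+\] "
    r"from (?P<old>[\d,]+) to (?P<new>[\d,]+)"
)


def parse_isr_shrinks(lines):
    """Yield one dict per 'Shrinking ISR' line; all other lines are skipped."""
    for line in lines:
        m = ISR_SHRINK.search(line)
        if m is None:
            continue
        old = [int(b) for b in m.group("old").split(",")]
        new = [int(b) for b in m.group("new").split(",")]
        yield {
            "timestamp": m.group("ts"),
            "topic": m.group("topic"),
            "partition": int(m.group("partition")),
            "broker": int(m.group("broker")),        # broker that logged the line
            "removed": sorted(set(old) - set(new)),  # replicas dropped from the ISR
            "remaining": new,
        }


def dropped_counts(events):
    """Count, per broker id, how many partitions it was dropped from."""
    counts = defaultdict(int)
    for event in events:
        for broker_id in event["removed"]:
            counts[broker_id] += 1
    return dict(counts)


# Two of the log lines from the original message, used as sample input.
SAMPLE = [
    "[2016-12-28 16:18:30,575] INFO Partition [aki_reload5yl_5,11] on broker 1: "
    "Shrinking ISR for partition [aki_reload5yl_5,11] from 2,3,1 to 1 "
    "(kafka.cluster.Partition)",
    "[2016-12-28 16:18:30,579] INFO Partition [ale_reload5yl_1,0] on broker 1: "
    "Shrinking ISR for partition [ale_reload5yl_1,0] from 5,1,2 to 1 "
    "(kafka.cluster.Partition)",
]

events = list(parse_isr_shrinks(SAMPLE))
print(dropped_counts(events))
```

Run against a full server.log, a summary like this shows whether one broker dropped every follower at the same moment (as broker 1 appears to have done here) or whether the shrinks were spread across leaders.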