Sorry if this sounds lame, but can you ping the broker host or telnet to its port?
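
If telnet isn't installed on the box, a few lines of Python do roughly the same check. This is only a minimal sketch: the helper name can_connect is made up for illustration, and it just tests whether a TCP connection to the given host and port can be opened, nothing Kafka-specific.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # the `with` block closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and name-resolution failures.
        return False
```

For example, can_connect("broker1.example.com", 9092) run from a consumer machine, where the host name is a placeholder and 9092 is only the default Kafka port (an assumption, your listener may differ). A False result is consistent with the Connection Refused errors the consumers report, but it does not say why the broker stopped listening.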

On Wed, Apr 13, 2016 at 9:55 AM, Chris Neal <cwn...@gmail.com> wrote:

> Hi all.
>
> I'm running a two-node cluster that has been rock solid for almost a year
> and a half.  We experienced an outage of one of the two brokers this
> morning, and from the logs, I'm not sure what happened or how to prevent
> it.
>
> The Kafka version is 0.8.1.1 with Scala 2.10.  The Java version is OpenJDK
> 1.8.0_65.
>
> Everything was running fine, then:
>
> [2016-04-13 11:01:28,306] WARN Reconnect due to socket error: Received -1
> when reading from channel, socket has likely been closed.
> (kafka.consumer.SimpleConsumer)
> [2016-04-13 11:01:28,306] WARN Reconnect due to socket error: Received -1
> when reading from channel, socket has likely been closed.
> (kafka.consumer.SimpleConsumer)
> [2016-04-13 11:01:28,306] WARN Reconnect due to socket error: Received -1
> when reading from channel, socket has likely been closed.
> (kafka.consumer.SimpleConsumer)
> [2016-04-13 11:01:28,306] WARN Reconnect due to socket error: Received -1
> when reading from channel, socket has likely been closed.
> (kafka.consumer.SimpleConsumer)
> [2016-04-13 11:01:28,334] WARN Reconnect due to socket error: Received -1
> when reading from channel, socket has likely been closed.
> (kafka.consumer.SimpleConsumer)
> [2016-04-13 11:01:28,334] WARN Reconnect due to socket error: Received -1
> when reading from channel, socket has likely been closed.
> (kafka.consumer.SimpleConsumer)
> [2016-04-13 11:01:28,334] WARN Reconnect due to socket error: Received -1
> when reading from channel, socket has likely been closed.
> (kafka.consumer.SimpleConsumer)
> [2016-04-13 11:01:28,334] WARN Reconnect due to socket error: Received -1
> when reading from channel, socket has likely been closed.
> (kafka.consumer.SimpleConsumer)
>
> [2016-04-13 11:01:28,352] ERROR [ReplicaFetcherThread-1-0], Error in fetch
> Name: FetchRequest; Version: 0; CorrelationId: 9644043; ClientId:
> ReplicaFetcherThread-1-0; ReplicaId: 1; MaxWait: 500 ms; MinBytes: 1 bytes;
> RequestInfo:* [snip of every topic and partition on the broker listed
> here]*
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.Net.connect0(Native Method)
>         at sun.nio.ch.Net.connect(Net.java:454)
>         at sun.nio.ch.Net.connect(Net.java:446)
>         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
>         at kafka.network.BlockingChannel.connect(BlockingChannel.scala:57)
>         at kafka.consumer.SimpleConsumer.connect(SimpleConsumer.scala:44)
>         at kafka.consumer.SimpleConsumer.reconnect(SimpleConsumer.scala:57)
>         at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:79)
>         at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:109)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:108)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
>         at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
>         at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>         at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:107)
>         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
>         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
>
> The logs then repeat that ERROR and exception 5406 times between
> 2016-04-13 11:01:28,352 and 2016-04-13 11:01:31,994.
>
> Then I get this message twice:
> [2016-04-13 11:01:31,997] INFO [ReplicaFetcherManager on broker 1] Removed
> fetcher for partitions [snip list of all my topics and partitions listed]
>
> Then this:
> [2016-04-13 11:01:32,061] INFO [ReplicaFetcherThread-1-0], Shutting down
> (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,061] INFO [ReplicaFetcherThread-1-0], Shutting down
> (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,113] INFO New leader is 1
> (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
> [2016-04-13 11:01:32,113] INFO New leader is 1
> (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
> [2016-04-13 11:01:32,192] INFO [ReplicaFetcherThread-1-0], Shutdown
> completed (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,192] INFO [ReplicaFetcherThread-1-0], Shutdown
> completed (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,192] INFO [ReplicaFetcherThread-1-0], Stopped
>  (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,192] INFO [ReplicaFetcherThread-1-0], Stopped
>  (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,192] INFO [ReplicaFetcherThread-0-0], Shutting down
> (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,192] INFO [ReplicaFetcherThread-0-0], Shutting down
> (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,194] INFO [ReplicaFetcherThread-0-0], Stopped
>  (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,194] INFO [ReplicaFetcherThread-0-0], Stopped
>  (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,194] INFO [ReplicaFetcherThread-0-0], Shutdown
> completed (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,194] INFO [ReplicaFetcherThread-0-0], Shutdown
> completed (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,194] INFO [ReplicaFetcherThread-3-0], Shutting down
> (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,194] INFO [ReplicaFetcherThread-3-0], Shutting down
> (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,392] INFO [ReplicaFetcherThread-3-0], Stopped
>  (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,392] INFO [ReplicaFetcherThread-3-0], Stopped
>  (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,392] INFO [ReplicaFetcherThread-3-0], Shutdown
> completed (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,392] INFO [ReplicaFetcherThread-3-0], Shutdown
> completed (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,392] INFO [ReplicaFetcherThread-2-0], Shutting down
> (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,392] INFO [ReplicaFetcherThread-2-0], Shutting down
> (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,395] INFO [ReplicaFetcherThread-2-0], Stopped
>  (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,395] INFO [ReplicaFetcherThread-2-0], Stopped
>  (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,395] INFO [ReplicaFetcherThread-2-0], Shutdown
> completed (kafka.server.ReplicaFetcherThread)
> [2016-04-13 11:01:32,395] INFO [ReplicaFetcherThread-2-0], Shutdown
> completed (kafka.server.ReplicaFetcherThread)
>
>
> At this point, there are no more errors in the log file, but all the
> consumers are still trying to consume from this broker, and are getting
> Connection Refused exceptions.  It wasn't until I cycled the broker that
> things got back to normal.
>
> Can anyone tell me what happened?  Or why consumers didn't recognize that
> there was a problem with this broker and start consuming from the other
> one?
>
> Can I provide any more details? :)
>
> Thank you so much for your time!
>



-- 
Radha Krishna, Proddaturi
253-234-5657
