Thanks Andrew,
I'm not seeing the event queue exception, but I'm running my cluster on a set
of virtual machines that share the same physical hardware (I know, exactly
what I'm not supposed to do), and I'm getting some slow-fsync ZooKeeper warnings
in my logs. I imagine that my broker writes are slow as well. It could be that
I'm getting write contention because of the shared metal/disk.

I know Kafka and ZooKeeper are susceptible to latency spikes...
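
To make the "event queue is full" condition from Andrew's quoted message concrete, here is a minimal sketch of a bounded queue that nothing drains, standing in for a producer whose broker writes have stalled. It is plain Java and does not use Kafka's classes or reproduce its implementation; the names `QueueFullSketch` and `countRejected`, and the capacity of 200, are hypothetical and chosen only for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueFullSketch {
    // Offers `events` messages to a bounded queue that is never drained
    // (an assumption standing in for a stalled broker) and returns how
    // many messages were rejected because the queue was already full.
    static int countRejected(int capacity, int events) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < events; i++) {
            // offer() returns false immediately when no space is left,
            // rather than blocking the caller.
            if (!queue.offer("event-" + i)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 250 events into a queue of 200 with no consumer: 50 are rejected.
        System.out.println(countRejected(200, 250) + " events rejected");
    }
}
```

The point of the sketch is only that a non-blocking enqueue has to drop or reject events once the consumer side falls behind for long enough; a larger queue delays that point but does not remove it.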

Karl

On Apr 23, 2013, at 7:16 PM, Andrew Neilson <arsneil...@gmail.com>
 wrote:

> Hey Karl, I have a very similar setup (3 kafka 0.7.2 brokers, 3 ZK 3.4.3
> nodes) that I'm running right now and am getting the same error on the
> producers. Haven't resolved it yet:
> 
> ERROR ProducerSendThread--1585663279
> kafka.producer.async.ProducerSendThread - Error in handling batch of 200
> events
> java.io.IOException: Connection reset by peer
> ...
> 
> For me these errors appear to be coupled with other errors like this:
> 
> kafka.producer.async.AsyncProducer - Event queue is full of unsent
> messages, could not send event:
> 
> As I understand it, this happens when you are producing faster than the
> brokers can persist the messages. It's possible these are two different
> issues...
> 
> Anyway I've been doing a lot of work on this this afternoon so I may have
> more information later. Someone else probably knows more though.
> 
> 
> 
> On Tue, Apr 23, 2013 at 4:57 PM, Karl Kirch <kki...@wdtinc.com> wrote:
> 
>> Hmmm… that didn't seem to help.
>> Has anyone else seen this sort of error?
>> 
>> Karl
>> 
>> 
>> On Apr 23, 2013, at 5:58 PM, Karl Kirch <kki...@wdtinc.com>
>> wrote:
>> 
>>> I'm going to try bumping up the "numRetries" key in my producer config.
>>> Is this a good option in this case?
>>> I am using the ZooKeeper connect option, so I'm aware that I may get
>>> stuck retrying against a failed node, but if it's just a temporary network
>>> glitch I'll at least get a bit more of a chance to recover.
>>> 
>>> Thanks,
>>> Karl
>>> 
>>> On Apr 23, 2013, at 5:35 PM, Karl Kirch <kki...@wdtinc.com>
>>> wrote:
>>> 
>>>> I am occasionally getting some batch send errors from the stock async
>>>> producer. This is on a cluster of 3 Kafka (0.7.2) and 3 ZooKeeper nodes.
>>>> Is there any way to check what happens when those batch errors occur?
>>>> Or to bump up the retry count? (It looks like it only did a single retry.)
>>>> 
>>>> I need the speed of the async producer, but it needs to be reliable. I
>>>> see a handful of these errors a day, but in a weather alerting system it
>>>> only takes missing one alert to matter, let alone 25 or 100/1000.
>>>> 
>>>> Here's a stack trace of one of the errors that I'm seeing.
>>>> 
>>>> 22:23:39.405 [ProducerSendThread-1824508747] WARN k.p.a.DefaultEventHandler - Error sending messages, 0 attempts remaining
>>>> java.io.IOException: Connection reset by peer
>>>>     at sun.nio.ch.FileDispatcher.writev0(Native Method) ~[na:1.6.0_24]
>>>>     at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:33) ~[na:1.6.0_24]
>>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:125) ~[na:1.6.0_24]
>>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:367) ~[na:1.6.0_24]
>>>>     at java.nio.channels.SocketChannel.write(SocketChannel.java:360) ~[na:1.6.0_24]
>>>>     at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:49) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.network.Send$class.writeCompletely(Transmission.scala:73) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:25) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:95) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.SyncProducer.send(SyncProducer.scala:94) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.SyncProducer.multiSend(SyncProducer.scala:135) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.DefaultEventHandler.send(DefaultEventHandler.scala:58) [apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:44) [apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:116) [apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:95) [apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:71) [apns-consumer-1.0.jar:na]
>>>>     at scala.collection.immutable.Stream.foreach(Stream.scala:260) [apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:70) [apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:41) [apns-consumer-1.0.jar:na]
>>>> 22:23:39.406 [ProducerSendThread-1824508747] ERROR k.p.a.ProducerSendThread - Error in handling batch of 27 events
>>>> java.io.IOException: Connection reset by peer
>>>>     at sun.nio.ch.FileDispatcher.writev0(Native Method) ~[na:1.6.0_24]
>>>>     at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:33) ~[na:1.6.0_24]
>>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:125) ~[na:1.6.0_24]
>>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:367) ~[na:1.6.0_24]
>>>>     at java.nio.channels.SocketChannel.write(SocketChannel.java:360) ~[na:1.6.0_24]
>>>>     at kafka.network.BoundedByteBufferSend.writeTo(BoundedByteBufferSend.scala:49) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.network.Send$class.writeCompletely(Transmission.scala:73) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.network.BoundedByteBufferSend.writeCompletely(BoundedByteBufferSend.scala:25) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:95) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.SyncProducer.send(SyncProducer.scala:94) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.SyncProducer.multiSend(SyncProducer.scala:135) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.DefaultEventHandler.send(DefaultEventHandler.scala:58) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:44) ~[apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:116) [apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:95) [apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:71) [apns-consumer-1.0.jar:na]
>>>>     at scala.collection.immutable.Stream.foreach(Stream.scala:260) [apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:70) [apns-consumer-1.0.jar:na]
>>>>     at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:41) [apns-consumer-1.0.jar:na]
>>>> 
>>>> 
>>> 
>> 
>> 
