Yes, this lines up with the behavior I'm seeing. I'll wait for the patch to
be released and then retest. Thanks!
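
For reference when retesting, here is a minimal sketch of the producer settings from step 3 of my test procedure (quoted below), built as plain java.util.Properties. The class and method names are only illustrative, and the broker list and serializer settings that a real producer would also need are deliberately left out.

```java
import java.util.Properties;

// Hypothetical helper: collects the four producer settings from the test
// procedure. It does not create a producer or talk to any broker.
public class ProducerTestConfig {

    static Properties build() {
        Properties props = new Properties();
        // Send synchronously so each send blocks until it completes or fails.
        props.setProperty("producer.type", "sync");
        // Wait for the partition leader to acknowledge the write.
        props.setProperty("request.required.acks", "1");
        // Produce request timeout in milliseconds (1 second in the test).
        props.setProperty("request.timeout.ms", "1000");
        // Do not retry failed sends, so a failure surfaces immediately.
        props.setProperty("message.send.max.retries", "0");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be checked against the test setup.
        build().list(System.out);
    }
}
```

As discussed in the quoted replies, request.timeout.ms here only bounds the produce request itself, so these settings alone do not bound the time spent reconnecting to an unreachable broker.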

On Tue, Nov 4, 2014 at 2:36 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Actually I think this issue has just been resolved:
>
> https://issues.apache.org/jira/browse/KAFKA-1733
>
> Guozhang
>
> On Tue, Nov 4, 2014 at 11:22 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hello Solon,
> >
> > request.timeout.ms only controls the produce request timeout. When the
> > producer's first produce request times out, it will try to refresh its
> > metadata by sending a metadata request. But when this non-produce request
> > hits the broker whose connectivity has been disabled (i.e. the producer is
> > trying to re-connect to that broker), it will not respect the 1 sec
> > timeout.
> >
> > I think this is indeed an issue: basically, when we get a request timeout
> > from the broker, we should avoid trying to re-connect to it when
> > refreshing metadata. Could you file a JIRA for this?
> >
> > Guozhang
> >
> >
> > On Tue, Nov 4, 2014 at 10:43 AM, Solon Gordon <so...@knewton.com> wrote:
> >
> >> Hi all,
> >>
> >> I've been investigating how Kafka 0.8.1.1 responds to the scenario where
> >> one broker loses connectivity (due to something like a hardware issue or
> >> network partition). It looks like the brokers themselves adjust within a
> >> few seconds to reassign leaders and shrink ISRs. However, I see producer
> >> threads block for multiple minutes before timing out, regardless of what
> >> producer settings I use. Why would this be?
> >>
> >> Here is my test procedure:
> >> 1. Start up three brokers.
> >> 2. Create a topic with 3 partitions and replication factor 3.
> >> 3. Start up a producer with producer.type=sync, request.required.acks=1,
> >> request.timeout.ms=1000, message.send.max.retries=0. (With this
> >> configuration I'd expect all requests to complete or error within a
> >> second.)
> >> 4. Make the producer send one message per second.
> >> 5. Disable connectivity for one broker via iptables.
> >>
> >> The result is that I see the producer block for almost two minutes before
> >> timing out, way more than the one second timeout I configured. Often I see
> >> that the first request to the bad broker times out after a second as
> >> expected, but a subsequent request takes minutes to time out. I've included
> >> example producer logs below.
> >>
> >> Any idea why this would happen or if there is some config option I'm
> >> missing to prevent it? We would like to be able to recover from this
> >> scenario in seconds, not minutes.
> >>
> >> Thanks,
> >> Solon
> >>
> >>
> >> First request times out after a second:
> >> 17:48:48.602 [Producer timer] DEBUG k.producer.async.DefaultEventHandler - Producer sending messages with correlation id 30 for topics [latency-measurer,0] to broker XXX on YYY:9092
> >> 17:48:49.604 [Producer timer] INFO  kafka.producer.SyncProducer - Disconnecting from YYY:9092
> >> 17:48:49.617 [Producer timer] WARN  k.producer.async.DefaultEventHandler - Failed to send producer request with correlation id 30 to broker XXX with data for partitions [latency-measurer,0]
> >> java.net.SocketTimeoutException: null
> >>         at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:229) ~[na:1.7.0_55]
> >>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.7.0_55]
> >>         at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) ~[na:1.7.0_55]
> >>         at kafka.utils.Utils$.read(Unknown Source)
> >>         at kafka.network.BoundedByteBufferReceive.readFrom(Unknown Source)
> >>         at kafka.network.Receive$class.readCompletely(Unknown Source)
> >>         at kafka.network.BoundedByteBufferReceive.readCompletely(Unknown Source)
> >>         at kafka.network.BlockingChannel.receive(Unknown Source)
> >>         at kafka.producer.SyncProducer.liftedTree1$1(Unknown Source)
> >>         at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(Unknown Source)
> >>         ...
> >>
> >> The next takes over two minutes:
> >> 17:48:50.602 [Producer timer] DEBUG k.producer.async.DefaultEventHandler - Producer sending messages with correlation id 35 for topics [latency-measurer,0] to broker XXX on YYY:9092
> >> 17:50:57.793 [Producer timer] ERROR kafka.producer.SyncProducer - Producer connection to YYY:9092 unsuccessful
> >> java.net.ConnectException: Connection timed out
> >>         at sun.nio.ch.Net.connect0(Native Method) ~[na:1.7.0_55]
> >>         at sun.nio.ch.Net.connect(Net.java:465) ~[na:1.7.0_55]
> >>         at sun.nio.ch.Net.connect(Net.java:457) ~[na:1.7.0_55]
> >>         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:670) ~[na:1.7.0_55]
> >>         at kafka.network.BlockingChannel.connect(Unknown Source)
> >>         at kafka.producer.SyncProducer.connect(Unknown Source)
> >>         at kafka.producer.SyncProducer.getOrMakeConnection(Unknown Source)
> >>         ...
> >>
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> -- Guozhang
>
