> I checked the information in Zookeeper and found out that 2 of the brokers > are missing. The VMs with these brokers are not quite ... healthy (I cannot > find another definition for this situation). I checked the information about > replicas distribution and there are 3 replicas for each partition, so that > part is ok. Started tests again and get some of producers stuck again. May be > there is something wrong with my cluster of VMs.
Can you clarify what you mean by not healthy? Also, when you say three replicas - are those replicas in the ISR? You can either use list-topics (./bin/kafka-topics.sh --describe) or read the topic/partition/state path from zookeeper directly. > > To reproduce the situation with original performance test tools: > - start a Zookeeper node on 1 VM > - start 2 Kafka brokers on 2 VMs > - create a topic with multiple partitions and replication factor 2 > - run producer performance script on 4 VMs in a sync mode with 2 acks to send > 1M messages > > > -----Original Message----- > From: Joel Koshy [mailto:jjkosh...@gmail.com] > Sent: Wednesday, February 12, 2014 3:40 PM > To: users@kafka.apache.org > Subject: Re: Sync producers stuck waiting for 2 acks > > The request time out config is request.timeout.ms - defaults to 10 seconds - > so the request should expire by then and return a response. > Can you run the list-topics command on the topics you are sending to and make > sure there are at least two replicas in ISR while you are running your > producer test? > > You mentioned you were able to reproduce this with the original performance > tools - can you provide exact steps to reproduce if the above information > does not help resolve this? > > Joel > > On Wed, Feb 12, 2014 at 11:22:39PM +0000, Michael Popov wrote: > > I am running a test deployment of Kafka 0.8. When I configure sync > > producers to expect 2 acks for each "write" request, some of the producers > > get stuck. It looks like broker's response is not delivered back. > > This happened with original Kafka performance tools and with a test tool > > built using a custom C# client library. So I assume the issue is not on the > > client side. > > I checked the sources. Even if a replica broker does not catch up with a > > leader, a producer request should expire on time out. I don't see > > configuration parameter to set this timeout. The closest configuration > > setting I can see is "producer.purgatory.purge.interval.requests" but it is > > in the number of requests, not time units. > > > > I would appreciate any advice where to look for the problem and how to > > solve it. > > > > Thank you, > > Michael Popov >