Hi Jun,

I will file a Jira bug for this and attach the YourKit profile snapshot
and a screenshot.  Do you want me to take a thread dump every second?
I ask because the threads are blocked on the synchronized block, as you
mentioned, and the YourKit profile snapshot will already contain a thread dump.
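
If per-second dumps turn out to be useful, they could also be taken from
inside the test JVM. Below is a minimal sketch, assuming it runs in the same
process as the producer test; the class name BlockedThreadReport is
hypothetical and is not part of Kafka or of the attached test code. It uses
the standard java.lang.management.ThreadMXBean API to list the threads in the
BLOCKED state together with the thread that owns the monitor they are waiting
on, which is the information needed to find the holder of the dq lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports threads blocked on a monitor and who holds it.
public class BlockedThreadReport {

    /** Returns one line per thread that is currently BLOCKED on a monitor. */
    public static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<String>();
        // Only request monitor/synchronizer details if this JVM supports them.
        ThreadInfo[] dump = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : dump) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add(String.format("%s blocked on %s held by %s (id=%d)",
                        info.getThreadName(), info.getLockName(),
                        info.getLockOwnerName(), info.getLockOwnerId()));
            }
        }
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        // Number of one-second samples to take; defaults to 5.
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        for (int i = 0; i < samples; i++) {
            System.out.println("--- sample " + i + " ---");
            for (String line : blockedThreads()) {
                System.out.println(line);
            }
            Thread.sleep(1000L);
        }
    }
}
```

Calling blockedThreads() once a second from a background thread in the test
would show whether the owner of the lock changes between samples.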

Thanks,

Bhavesh

On Tue, Oct 14, 2014 at 4:06 PM, Jun Rao <jun...@gmail.com> wrote:

> Bhavesh,
>
> It seems that all those threads are blocked waiting for the lock on the
> dq for that partition. There has to be another thread holding the dq lock
> at that point. Could you create a jira and attach the full thread dump
> there? Also, could you attach the yourkit result that shows the breakdown
> of the time?
>
> Thanks,
>
> Jun
>
> On Tue, Oct 14, 2014 at 10:41 AM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
>
> > Hi Jay,
> >
> > Yes, it is reproducible quite easily.  The problem is the synchronized
> > block in RecordAccumulator, and you can reproduce it easily.  I attached
> > the Java code in my original email.  Application threads enqueuing
> > messages into a single partition cause thread contention, and an
> > application thread may be blocked on this for more than 2 minutes, as
> > shown in the original email.  Let me know if you need more information.
> >
> > Last commit I tested with:
> >
> > commit 68b9f7716df1d994a9d43bec6bc42c90e66f1e99
> > Author: Anton Karamanov <atara...@gmail.com>
> > Date:   Tue Oct 7 18:22:31 2014 -0700
> >
> >     kafka-1644; Inherit FetchResponse from RequestOrResponse; patched by
> > Anton Karamanov; reviewed by Jun Rao
> >
> > Thanks,
> >
> > Bhavesh
> >
> > On Tue, Oct 14, 2014 at 10:16 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >
> > > Hey Bhavesh,
> > >
> > > This sounds like a problem. Just to confirm, this is after the fix for
> > > KAFKA-1673?
> > >
> > > https://issues.apache.org/jira/browse/KAFKA-1673
> > >
> > > It sounds like you have a reproducible test case?
> > >
> > > -Jay
> > >
> > >
> > > On Mon, Oct 13, 2014 at 10:54 AM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
> > >
> > > > Hi Kafka Dev Team,
> > > >
> > > > When I run the test to send messages to a single partition for 3 minutes
> > > > or so, I encounter a deadlock (please see the attached screenshot) and
> > > > thread contention in the YourKit profiling.
> > > >
> > > > Use Case:
> > > >
> > > > 1)  Aggregating messages into the same partition for metric counting.
> > > > 2)  Replicating the old producer's behavior of sticking to a partition
> > > > for 3 minutes.
> > > >
> > > >
> > > > Here is the output:
> > > >
> > > > Frozen threads found (potential deadlock)
> > > >
> > > > It seems that the following threads have not changed their stack for more
> > > > than 10 seconds.
> > > > These threads are possibly (but not necessarily!) in a deadlock or hung.
> > > >
> > > > pool-1-thread-128 <--- Frozen for at least 2m
> > > >
> > > > org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> > > > org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
> > > > org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
> > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
> > > > java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
> > > > java.lang.Thread.run() Thread.java:744
> > > >
> > > >
> > > >
> > > > pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> > > >
> > > > org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> > > > org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
> > > > org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
> > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
> > > > java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
> > > > java.lang.Thread.run() Thread.java:744
> > > >
> > > >
> > > >
> > > > pool-1-thread-55 <--- Frozen for at least 2m
> > > >
> > > > org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> > > > org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
> > > > org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
> > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
> > > > java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
> > > > java.lang.Thread.run() Thread.java:744
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>
