Hi Jun,

I have entered all details into
https://issues.apache.org/jira/browse/KAFKA-1710.

Let me know if you need more details.
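
For anyone reading this thread without the attachments, here is a minimal sketch of the contention pattern discussed below. It is not the test code attached to the original email and it does not use the Kafka client. The class name, the `dq` field, the 1024-entry drain threshold, and the thread and message counts are all assumptions made for illustration. It only shows many application threads appending to one queue behind a single lock, which is the situation that arises when every send targets the same partition.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in, not Kafka code: many threads append to one deque
// guarded by a single lock, as when every send() targets the same partition.
final class SinglePartitionContention {

    private final Deque<byte[]> dq = new ArrayDeque<>(); // one "partition" queue
    private final AtomicLong maxWaitNanos = new AtomicLong();
    private long appended; // guarded by the dq lock

    void append(byte[] record) {
        long start = System.nanoTime();
        synchronized (dq) { // every producer thread queues up here
            // Record the longest time any thread waited to acquire the lock.
            maxWaitNanos.accumulateAndGet(System.nanoTime() - start, Math::max);
            dq.addLast(record);
            appended++;
            if (dq.size() > 1024) {
                dq.clear(); // crude stand-in for a sender draining the queue
            }
        }
    }

    // Runs the workload and returns {total appends, longest lock wait in ns}.
    static long[] run(int threads, int appendsPerThread) {
        SinglePartitionContention acc = new SinglePartitionContention();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                byte[] payload = new byte[64];
                for (int i = 0; i < appendsPerThread; i++) {
                    acc.append(payload);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (acc.dq) {
            return new long[] {acc.appended, acc.maxWaitNanos.get()};
        }
    }

    public static void main(String[] args) {
        long[] result = run(200, 10_000); // thread and message counts are arbitrary
        System.out.printf("appends=%d, longest wait=%d us%n",
                result[0], result[1] / 1_000);
    }
}
```

The longest wait it prints depends on the machine and the JVM, so treat it as a way to watch threads queue on one lock under a profiler, not as a reproduction of the multi-minute stalls reported in KAFKA-1710.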

Thanks,

Bhavesh

On Wed, Oct 15, 2014 at 8:35 PM, Jun Rao <jun...@gmail.com> wrote:

> Probably 3 thread dumps will be enough.
>
> Thanks,
>
> Jun
>
> On Wed, Oct 15, 2014 at 11:26 AM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
>
> > Hi Jun,
> >
> > I will file a Jira bug for this and attach the YourKit profile snapshot
> > and a screenshot. Do you want me to take a thread dump every second?
> > I ask because the threads are blocked on the synchronized block, as you
> > mentioned, and the YourKit profile snapshot will contain a thread dump.
> >
> > Thanks,
> >
> > Bhavesh
> >
> > On Tue, Oct 14, 2014 at 4:06 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> > > Bhavesh,
> > >
> > > It seems that all those threads are blocked waiting for the lock on the
> > > dq for that partition. There has to be another thread holding the dq lock
> > > at that point. Could you create a jira and attach the full thread dump
> > > there? Also, could you attach the YourKit result that shows the breakdown
> > > of the time?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Oct 14, 2014 at 10:41 AM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
> > >
> > > > Hi Jay,
> > > >
> > > > Yes, it is quite easy to reproduce. The problem is the synchronized
> > > > block in RecordAccumulator. I attached the Java code to my original
> > > > email. Application threads enqueuing messages into a single partition
> > > > cause thread contention, and an application thread may be blocked on
> > > > this for more than 2 minutes, as shown in the original email. Let me
> > > > know if you need more information.
> > > >
> > > > Last Commit I tested with:
> > > >
> > > > commit 68b9f7716df1d994a9d43bec6bc42c90e66f1e99
> > > > Author: Anton Karamanov <atara...@gmail.com>
> > > > Date:   Tue Oct 7 18:22:31 2014 -0700
> > > >
> > > >     kafka-1644; Inherit FetchResponse from RequestOrResponse; patched by
> > > >     Anton Karamanov; reviewed by Jun Rao
> > > >
> > > > Thanks,
> > > >
> > > > Bhavesh
> > > >
> > > > On Tue, Oct 14, 2014 at 10:16 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > >
> > > > > Hey Bhavesh,
> > > > >
> > > > > This sounds like a problem. Just to confirm, this is after the fix for
> > > > > KAFKA-1673?
> > > > >
> > > > > https://issues.apache.org/jira/browse/KAFKA-1673
> > > > >
> > > > > It sounds like you have a reproducible test case?
> > > > >
> > > > > -Jay
> > > > >
> > > > >
> > > > > On Mon, Oct 13, 2014 at 10:54 AM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
> > > > >
> > > > > > Hi Kafka Dev Team,
> > > > > >
> > > > > > When I run a test that sends messages to a single partition for 3
> > > > > > minutes or so, I encounter a deadlock (please see the attached
> > > > > > screenshot) and thread contention in YourKit profiling.
> > > > > >
> > > > > > Use Case:
> > > > > >
> > > > > > 1)  Aggregating messages into the same partition for metric counting.
> > > > > > 2)  Replicating the old producer's behavior of sticking to a partition
> > > > > > for 3 minutes.
> > > > > >
> > > > > >
> > > > > > Here is output:
> > > > > >
> > > > > > Frozen threads found (potential deadlock)
> > > > > >
> > > > > > It seems that the following threads have not changed their stack for
> > > > > > more than 10 seconds.
> > > > > > These threads are possibly (but not necessarily!) in a deadlock or hung.
> > > > > >
> > > > > > pool-1-thread-128 <--- Frozen for at least 2m
> > > > > > org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> > > > > > org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
> > > > > > org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
> > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
> > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
> > > > > > java.lang.Thread.run() Thread.java:744
> > > > > >
> > > > > >
> > > > > >
> > > > > > pool-1-thread-159 <--- Frozen for at least 2m 1 sec
> > > > > > org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> > > > > > org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
> > > > > > org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
> > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
> > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
> > > > > > java.lang.Thread.run() Thread.java:744
> > > > > >
> > > > > >
> > > > > >
> > > > > > pool-1-thread-55 <--- Frozen for at least 2m
> > > > > > org.apache.kafka.clients.producer.internals.RecordAccumulator.append(TopicPartition, byte[], byte[], CompressionType, Callback) RecordAccumulator.java:139
> > > > > > org.apache.kafka.clients.producer.KafkaProducer.send(ProducerRecord, Callback) KafkaProducer.java:237
> > > > > > org.kafka.test.TestNetworkDownProducer$MyProducer.run() TestNetworkDownProducer.java:84
> > > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1145
> > > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:615
> > > > > > java.lang.Thread.run() Thread.java:744
