[ 
https://issues.apache.org/jira/browse/KAFKA-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

radai rosenblatt updated KAFKA-4228:
------------------------------------
    Description: 
a KafkaProducer's Sender thread may die:
{noformat}
2016/09/28 00:28:01.065 ERROR [KafkaThread] [kafka-producer-network-thread | mm_ei-lca1_uniform] [kafka-mirror-maker] [] Uncaught exception in kafka-producer-network-thread | mm_ei-lca1_uni
java.lang.OutOfMemoryError: Java heap space
       at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[?:1.8.0_40]
       at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[?:1.8.0_40]
       at org.apache.kafka.common.requests.RequestSend.serialize(RequestSend.java:35) ~[kafka-clients-0.9.0.666.jar:?]
       at org.apache.kafka.common.requests.RequestSend.<init>(RequestSend.java:29) ~[kafka-clients-0.9.0.666.jar:?]
       at org.apache.kafka.clients.producer.internals.Sender.produceRequest(Sender.java:355) ~[kafka-clients-0.9.0.666.jar:?]
       at org.apache.kafka.clients.producer.internals.Sender.createProduceRequests(Sender.java:337) ~[kafka-clients-0.9.0.666.jar:?]
       at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211) ~[kafka-clients-0.9.0.666.jar:?]
       at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134) ~[kafka-clients-0.9.0.666.jar:?]
       at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
{noformat}

which leaves the producer in a bad state. in this state, a call to flush(), for 
example, will hang indefinitely: the sender thread is no longer around to send 
the pending batches, but they have not been aborted either.

even worse, this can happen in MirrorMaker just before a rebalance, at which 
point MM will just block indefinitely during a rebalance (in 
beforeReleasingPartitions()).

a rebalance participant hung in such a way will cause the rebalance to fail for 
the rest of the participants, at which point 
ZKRebalancerListener.watcherExecutorThread() dies with an exception (cannot 
rebalance after X attempts) but the consumer that ran the thread remains 
alive. the end result is a bunch of zombie mirror makers and orphaned topic 
partitions.

a dead sender thread should result in closing the producer.
a consumer failing to rebalance should shut down.
any issue with the producer or consumer should cause mirror-maker death.
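
to illustrate the first point, here is a minimal sketch of the intended behavior, not a patch against the client. FailFastProducerSketch and everything in it are hypothetical names and do not exist in kafka-clients. the assumption is that an uncaught-exception handler on the sender thread aborts the pending batches, so flush() fails instead of waiting forever:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Toy stand-in for a producer. Hypothetical; this is NOT the KafkaProducer API. */
public class FailFastProducerSketch {
    // Batches that were accepted but not yet sent; guarded by "this".
    private final List<CompletableFuture<Void>> pending = new ArrayList<>();
    // Set once the sender thread has died; guarded by "this".
    private Throwable fatalError;
    private final Thread sender;

    public FailFastProducerSketch(Runnable senderLoop) {
        sender = new Thread(senderLoop, "sketch-sender");
        sender.setDaemon(true);
        // If the sender loop dies with any Throwable, abort whatever is pending.
        sender.setUncaughtExceptionHandler((thread, error) -> abortPending(error));
    }

    public void start() {
        sender.start();
    }

    /** Queues one batch; fails it immediately if the sender is already dead. */
    public synchronized CompletableFuture<Void> send() {
        CompletableFuture<Void> batch = new CompletableFuture<>();
        if (fatalError != null) {
            batch.completeExceptionally(fatalError);
        } else {
            pending.add(batch);
        }
        return batch;
    }

    private synchronized void abortPending(Throwable cause) {
        fatalError = cause;
        for (CompletableFuture<Void> batch : pending) {
            batch.completeExceptionally(cause);
        }
        pending.clear();
    }

    /** Waits for pending batches, but throws instead of hanging on a dead sender. */
    public void flush(long timeoutMs) throws Exception {
        CompletableFuture<?>[] snapshot;
        synchronized (this) {
            if (fatalError != null) {
                throw new IllegalStateException("sender thread died", fatalError);
            }
            snapshot = pending.toArray(new CompletableFuture<?>[0]);
        }
        CompletableFuture.allOf(snapshot).get(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        // The sender loop here only simulates the failure from the stack trace.
        FailFastProducerSketch producer = new FailFastProducerSketch(() -> {
            throw new OutOfMemoryError("simulated heap exhaustion");
        });
        producer.send();
        producer.start();
        try {
            producer.flush(5_000);
            System.out.println("flush completed");
        } catch (Exception e) {
            System.out.println("flush failed fast: " + e);
        }
    }
}
```

the sketch never completes a batch successfully because it has no real network path. it only shows that once the handler records the failure, both already-queued and later batches are failed and flush() returns with an error. an application such as MirrorMaker could treat that error as a signal to shut down.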


  was:
a KafkaProducer's Sender thread may die:
{noformat}
2016/09/28 00:28:01.065 ERROR [KafkaThread] [kafka-producer-network-thread | mm_ei-lca1_uniform] [kafka-mirror-maker] [] Uncaught exception in kafka-producer-network-thread | mm_ei-lca1_uni
java.lang.OutOfMemoryError: Java heap space
       at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) ~[?:1.8.0_40]
       at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[?:1.8.0_40]
       at org.apache.kafka.common.requests.RequestSend.serialize(RequestSend.java:35) ~[kafka-clients-0.9.0.666.jar:?]
       at org.apache.kafka.common.requests.RequestSend.<init>(RequestSend.java:29) ~[kafka-clients-0.9.0.666.jar:?]
       at org.apache.kafka.clients.producer.internals.Sender.produceRequest(Sender.java:355) ~[kafka-clients-0.9.0.666.jar:?]
       at org.apache.kafka.clients.producer.internals.Sender.createProduceRequests(Sender.java:337) ~[kafka-clients-0.9.0.666.jar:?]
       at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211) ~[kafka-clients-0.9.0.666.jar:?]
       at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134) ~[kafka-clients-0.9.0.666.jar:?]
       at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
{noformat}

which leaves the producer in a bad state. in this state, a call to flush(), for 
example, will hang indefinitely: the sender thread is no longer around to send 
the pending batches, but they have not been aborted either.

even worse, this can happen in MirrorMaker just before a rebalance, at which 
point MM will just block indefinitely during a rebalance and the end result is 
unowned topic partitions.

a dead sender thread should result in closing the producer, and a closed 
producer should result in MirrorMaker death.



> Sender thread death leaves KafkaProducer in a bad state
> -------------------------------------------------------
>
>                 Key: KAFKA-4228
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4228
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 0.10.0.1
>            Reporter: radai rosenblatt



