[ 
https://issues.apache.org/jira/browse/KAFKA-4557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839608#comment-15839608
 ] 

Rajini Sivaram commented on KAFKA-4557:
---------------------------------------

[~ijuma] Yes, if a message is sent to the same partition from a callback when 
an earlier send fails due to expiry, then we are iterating over the deque for 
expiry while holding the deque lock, but the callback adds to the deque from 
the same thread (hence holds the lock). So the iterator can throw 
{{ConcurrentModificationException}}. I couldn't find any other scenarios where 
this exception could occur since the deque is correctly synchronized 
everywhere. 

> ConcurrentModificationException in KafkaProducer event loop
> -----------------------------------------------------------
>
>                 Key: KAFKA-4557
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4557
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 0.10.1.0
>            Reporter: Sergey Alaev
>            Assignee: Rajini Sivaram
>            Priority: Critical
>              Labels: reliability
>             Fix For: 0.10.2.0
>
>
> Under heavy load, Kafka producer can stop publishing events. Logs below.
> [2016-12-19T15:01:28.779Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [NetworkClient] [] [<none>] [] [DEBUG]: Disconnecting from node 2 due to 
> request timeout.
> [2016-12-19T15:01:28.793Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [KafkaProducerClient] [] [<none>] [1B2M2Y8Asg] [WARN]: Error sending message 
> to Kafka
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> [2016-12-19T15:01:28.838Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [KafkaProducerClient] [] [<none>] [1B2M2Y8Asg] [WARN]: Error sending message 
> to Kafka
>   org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received. (#2 from 2016-12-19T15:01:28.793Z)
> --------------------------------
> [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [KafkaProducerClient] [] [<none>] [1B2M2Y8Asg] [WARN]: Error sending message 
> to Kafka
>   org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
> events-deadletter-0 due to 30032 ms has passed since batch creation plus 
> linger time (#285 from 2016-12-19
> T15:01:28.793Z)
> [2016-12-19T15:01:28.956Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [SgsService] [] [<none>] [1B2M2Y8Asg] [WARN]: Error writing signal to Kafka 
> deadletter queue
>   org.apache.kafka.common.errors.TimeoutException: Expiring 46 record(s) for 
> events-deadletter-0 due to 30032 ms has passed since batch creation plus 
> linger time (#286 from 2016-12-19
> T15:01:28.793Z)
> [2016-12-19T15:01:28.960Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [Sender] [] [<none>] [1B2M2Y8Asg] [ERROR]: Uncaught error in kafka producer 
> I/O thread:
> java.util.ConcurrentModificationException: null
>         at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:643) 
> ~[na:1.8.0_45]
>         at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:242)
>  ~[kafka-clients-0.10.1.0.jar:na]
>         at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212) 
> ~[kafka-clients-0.10.1.0.jar:na]
>         at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) 
> ~[kafka-clients-0.10.1.0.jar:na]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> [2016-12-19T15:01:28.981Z] [sgs] [kafka-producer-network-thread | producer-3] 
> [NetworkClient] [] [<none>] [1B2M2Y8Asg] [WARN]: Error while fetching 
> metadata with correlation id 28711 : {events-deadletter=LEADER_NOT_AVAILABLE}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to