[ 
https://issues.apache.org/jira/browse/KAFKA-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram updated KAFKA-6415:
----------------------------------
    Description: 
When a log entry is appended to a Kafka topic using KafkaLog4jAppender, the 
producer.send operation may block waiting for metadata. This can result in 
deadlocks in a couple of scenarios if a log entry from the producer network 
thread is also at a log level that results in the entry being appended to a 
Kafka topic.
1. The producer's network thread attempts to send data to a Kafka topic. This 
is unsafe because producer.send may block waiting for metadata; the blocked 
network thread then never processes the metadata request/response, so it 
deadlocks.
2. KafkaLog4jAppender#append is invoked while holding the lock of the logger. 
So the thread waiting for metadata in the initial send will be holding the 
logger lock. If the producer network thread has a log entry that needs to be 
appended, it will attempt to acquire the logger lock and deadlock.
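
Scenario 2 can be reproduced in miniature without Kafka or log4j. The sketch 
below is only an illustration under stated assumptions: LoggerLockDeadlockSketch, 
metadataArrives and the two latches are hypothetical names, a ReentrantLock 
stands in for the logger lock, and a CountDownLatch stands in for the metadata 
response. Timeouts replace the real indefinite waits so that the example 
terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of scenario 2; not the KafkaLog4jAppender code.
class LoggerLockDeadlockSketch {

    // Returns true if the appending thread sees "metadata" before the timeout.
    static boolean metadataArrives(boolean networkThreadLogs, long timeoutMs) {
        ReentrantLock loggerLock = new ReentrantLock();      // stands in for the logger lock
        CountDownLatch lockHeld = new CountDownLatch(1);     // appender has taken the lock
        CountDownLatch metadata = new CountDownLatch(1);     // stands in for the metadata response

        Thread network = new Thread(() -> {
            try {
                lockHeld.await();
                if (networkThreadLogs) {
                    // Logging needs the logger lock, which the appending thread
                    // holds. A plain lock() here would block forever; tryLock
                    // with a timeout keeps the sketch finite.
                    if (!loggerLock.tryLock(timeoutMs * 2, TimeUnit.MILLISECONDS)) {
                        return;
                    }
                    loggerLock.unlock();
                }
                metadata.countDown(); // "process the metadata response"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "network-thread-sketch");
        network.setDaemon(true);
        network.start();

        // Appending thread: append() runs while holding the logger lock and
        // blocks inside send() waiting for metadata.
        loggerLock.lock();
        try {
            lockHeld.countDown();
            return metadata.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            loggerLock.unlock();
        }
    }
}
```

With networkThreadLogs set to true the metadata wait times out, which 
corresponds to the deadlock described above; with false it completes.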

This was probably the case right from the beginning when KafkaLog4jAppender was 
introduced, but did not cause any issues so far since there were only debug log 
entries in that path which were not logged to a Kafka topic by any of the 
tests. A recent info level log entry introduced by the commit 
https://github.com/apache/kafka/commit/a3aea3cf4dbedb293f2d7859e0298bebc8e2185f 
is causing system test failures in log4j_appender_test.py due to the deadlock.

The asynchronous append case can be fixed by moving all send operations to a 
separate thread. But KafkaLog4jAppender also has a syncSend option which blocks 
append while holding the logger lock until the send completes. Not sure how 
this can be fixed if we want to support log appends from the producer network 
thread.
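
A minimal sketch of the "separate thread" idea for the asynchronous case, 
assuming a bounded queue between append and a dedicated sender thread. This is 
not the actual fix: QueueingAppenderSketch and the blockingSend callback are 
hypothetical names, and blockingSend stands in for the call to producer.send. 
The point is only that append never blocks, so neither the logger lock nor the 
producer network thread is held up waiting for metadata. It does not address 
the syncSend case.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch; not the KafkaLog4jAppender implementation.
class QueueingAppenderSketch implements AutoCloseable {
    private final BlockingQueue<String> pending;
    private final Thread sender;
    private volatile boolean running = true;

    // blockingSend stands in for producer.send, which may block on metadata.
    QueueingAppenderSketch(Consumer<String> blockingSend, int capacity) {
        this.pending = new LinkedBlockingQueue<>(capacity);
        this.sender = new Thread(() -> {
            try {
                // Keep draining until close() was called and the queue is empty.
                while (running || !pending.isEmpty()) {
                    String entry = pending.poll(50, TimeUnit.MILLISECONDS);
                    if (entry != null) {
                        blockingSend.accept(entry); // only this thread ever blocks
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "log-sender-sketch");
        this.sender.setDaemon(true);
        this.sender.start();
    }

    // Called with the logger lock held, possibly from the producer network
    // thread. Never blocks; returns false if the queue is full and the entry
    // is dropped.
    boolean append(String entry) {
        return pending.offer(entry);
    }

    // Stops the sender after it has drained entries that were already queued.
    @Override
    public void close() {
        running = false;
        try {
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Dropping entries when the queue is full is one possible design choice; 
blocking in append instead would reintroduce the problem described above.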

  was:
If a log entry in producer network thread in the metadata update path is 
appended to a Kafka topic using KafkaLog4jAppender, a new send is initiated 
from the network thread which cannot complete since the metadata wait triggered 
by the new send from the network thread waits for metadata from the network 
thread, resulting in a deadlock.

This was probably the case right from the beginning when KafkaLog4jAppender was 
introduced, but did not cause any issues so far since there were only debug log 
entries in that path which were not logged to a Kafka topic by any of the 
tests. A recent info level log entry introduced by the commit 
https://github.com/apache/kafka/commit/a3aea3cf4dbedb293f2d7859e0298bebc8e2185f 
is causing system test failures in log4j_appender_test.py due to the deadlock.


> KafkaLog4jAppender deadlocks when logging from producer network thread
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-6415
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6415
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>            Reporter: Rajini Sivaram
>            Assignee: Rajini Sivaram
>             Fix For: 1.1.0
>
>
> When a log entry is appended to a Kafka topic using KafkaLog4jAppender, the 
> producer.send operation may block waiting for metadata. This can result in 
> deadlocks in a couple of scenarios if a log entry from the producer network 
> thread is also at a log level that results in the entry being appended to a 
> Kafka topic.
> 1. The producer's network thread attempts to send data to a Kafka topic. This 
> is unsafe because producer.send may block waiting for metadata; the blocked 
> network thread then never processes the metadata request/response, so it 
> deadlocks.
> 2. KafkaLog4jAppender#append is invoked while holding the lock of the logger. 
> So the thread waiting for metadata in the initial send will be holding the 
> logger lock. If the producer network thread has a log entry that needs to be 
> appended, it will attempt to acquire the logger lock and deadlock.
> This was probably the case right from the beginning when KafkaLog4jAppender 
> was introduced, but did not cause any issues so far since there were only 
> debug log entries in that path which were not logged to a Kafka topic by any 
> of the tests. A recent info level log entry introduced by the commit 
> https://github.com/apache/kafka/commit/a3aea3cf4dbedb293f2d7859e0298bebc8e2185f
>  is causing system test failures in log4j_appender_test.py due to the 
> deadlock.
> The asynchronous append case can be fixed by moving all send operations to a 
> separate thread. But KafkaLog4jAppender also has a syncSend option which 
> blocks append while holding the logger lock until the send completes. Not 
> sure how this can be fixed if we want to support log appends from the 
> producer network thread.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
