[ https://issues.apache.org/jira/browse/KAFKA-4901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956565#comment-15956565 ]
ASF GitHub Bot commented on KAFKA-4901: --------------------------------------- GitHub user ijuma opened a pull request: https://github.com/apache/kafka/pull/2810 KAFKA-4901: Make ProduceRequest thread-safe A more conservative version of the change for the 0.10.2 branch. Trunk commit: 1659ca1773596b. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ijuma/kafka kafka-4901-produce-request-thread-safety-0-10-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/2810.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2810 ---- commit 91b0aa305e368b84e506d2ab8bc92d1a3a031d7c Author: Ismael Juma <ism...@juma.me.uk> Date: 2017-03-16T16:13:58Z KAFKA-4901: Make ProduceRequest thread-safe A more conservative version of the change for the 0.10.2 branch. ---- > Make ProduceRequest thread-safe > ------------------------------- > > Key: KAFKA-4901 > URL: https://issues.apache.org/jira/browse/KAFKA-4901 > Project: Kafka > Issue Type: Bug > Reporter: Ismael Juma > Assignee: Ismael Juma > Fix For: 0.11.0.0, 0.10.2.1 > > > If request logging is enabled, ProduceRequest can be accessed > and mutated concurrently from a network thread (which calls > toString) and the request handler thread (which calls > clearPartitionRecords()). > That can lead to a ConcurrentModificationException when iterating > the partitionRecords map. > The underlying thread-safety issue has existed since the server > started using the Java implementation of ProduceRequest in 0.10.0. > However, we were incorrectly not clearing the underlying struct until > 0.10.2, so toString itself was thread-safe until that change. In 0.10.2, > toString is no longer thread-safe and we could potentially see a > NullPointerException given the right set of interleavings between > toString and clearPartitionRecords although we haven't seen that > happen yet. > In trunk, we changed the requests to have a toStruct method > instead of creating a struct in the constructor and toString was > no longer printing the contents of the Struct. This accidentally > fixed the race condition, but it meant that request logging was less > useful. > A couple of days ago, AbstractRequest.toString was changed to > print the contents of the request by calling toStruct().toString() > and reintroduced the race condition. The impact is more visible > because we iterate over a HashMap, which proactively > checks for concurrent modification (unlike arrays). > We will need a separate PR for 0.10.2. -- This message was sent by Atlassian JIRA (v6.3.15#6346)