radai rosenblatt created KAFKA-6345:
---------------------------------------
Summary: NetworkClient.inFlightRequestCount() is not thread safe, causing ConcurrentModificationExceptions when sensors are read
Key: KAFKA-6345
URL: https://issues.apache.org/jira/browse/KAFKA-6345
Project: Kafka
Issue Type: Bug
Components: clients
Affects Versions: 1.0.0
Reporter: radai rosenblatt

Example stack trace (code is ~0.10.2.*):
{code}
java.util.ConcurrentModificationException: java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429)
	at java.util.HashMap$ValueIterator.next(HashMap.java:1458)
	at org.apache.kafka.clients.InFlightRequests.inFlightRequestCount(InFlightRequests.java:109)
	at org.apache.kafka.clients.NetworkClient.inFlightRequestCount(NetworkClient.java:382)
	at org.apache.kafka.clients.producer.internals.Sender$SenderMetrics$1.measure(Sender.java:480)
	at org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:61)
	at org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:52)
	at org.apache.kafka.common.metrics.JmxReporter$KafkaMbean.getAttribute(JmxReporter.java:183)
	at org.apache.kafka.common.metrics.JmxReporter$KafkaMbean.getAttributes(JmxReporter.java:193)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttributes(DefaultMBeanServerInterceptor.java:709)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttributes(JmxMBeanServer.java:705)
{code}

Looking at the latest trunk, the code is still vulnerable:
# NetworkClient.inFlightRequestCount() eventually iterates over InFlightRequests.requests.values(), which is backed by a (non-thread-safe) HashMap.
# This method is called from the "requests-in-flight" sensor's measure() method (registered in the SenderMetrics constructor, around line 765 of Sender.java), which is driven by whichever thread reads the JMX values.
# The same HashMap is also updated by the client I/O thread calling NetworkClient.doSend(), which calls into InFlightRequests.add().

I guess the only upside is that this exception will always be thrown on the thread reading the JMX values and never on the actual client I/O thread.
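One possible way to remove the race, sketched here under stated assumptions: keep the per-node map as a plain HashMap touched only by the I/O thread, but mirror the total in an AtomicInteger that the I/O thread updates on every add and completion, so a gauge's measure() reads one atomic value instead of iterating HashMap.values(). The classes InFlightRequestsSketch and InFlightCountDemo and their methods are hypothetical, simplified stand-ins loosely modeled on InFlightRequests; this is not the actual Kafka patch, and the real class carries more state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: one "I/O thread" mutates while the main thread polls the gauge.
final class InFlightCountDemo {
    static int run(int ops) {
        InFlightRequestsSketch inFlight = new InFlightRequestsSketch();
        Thread io = new Thread(() -> {
            for (int i = 0; i < ops; i++) {
                String node = "node-" + (i % 50);
                inFlight.add(node, i);
                if (i % 2 == 1) {
                    inFlight.completeNext(node); // complete every other request
                }
            }
        });
        io.start();
        while (io.isAlive()) {
            inFlight.count(); // simulates a JMX reader; never iterates the HashMap
        }
        try {
            io.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for I/O thread", e);
        }
        return inFlight.count();
    }

    public static void main(String[] args) {
        System.out.println(run(10_000));
    }
}

// Hypothetical, simplified stand-in for InFlightRequests (not the real Kafka class).
final class InFlightRequestsSketch {
    // Per-node queues of in-flight requests; mutated only by the client I/O thread.
    private final Map<String, Deque<Object>> requests = new HashMap<>();
    // Running total, safe to read from any thread (e.g. a JMX metrics reader).
    private final AtomicInteger inFlightRequestCount = new AtomicInteger(0);

    // I/O thread only: record a newly sent request (cf. NetworkClient.doSend()).
    void add(String node, Object request) {
        requests.computeIfAbsent(node, k -> new ArrayDeque<>()).addFirst(request);
        inFlightRequestCount.incrementAndGet();
    }

    // I/O thread only: remove and return the oldest in-flight request for a node.
    Object completeNext(String node) {
        Deque<Object> queue = requests.get(node);
        if (queue == null || queue.isEmpty()) {
            throw new IllegalStateException("No in-flight requests for node " + node);
        }
        Object oldest = queue.pollLast();
        inFlightRequestCount.decrementAndGet();
        return oldest;
    }

    // Any thread: reads the counter instead of iterating requests.values().
    int count() {
        return inFlightRequestCount.get();
    }

    // The pattern from the report: iterates the HashMap, so it is only safe on the
    // I/O thread; from another thread it can throw ConcurrentModificationException.
    int countByIteration() {
        int total = 0;
        for (Deque<Object> queue : requests.values()) {
            total += queue.size();
        }
        return total;
    }
}
```

Running main should print 5000 (10,000 adds, 5,000 completions). The design choice here is to avoid locking on the I/O thread's hot path: the map stays unsynchronized, and a reader on another thread may see a count that lags the map by one in-flight change, which is usually acceptable for a metrics gauge. Making the map a ConcurrentHashMap alone would not be enough, since the per-node deques would still be mutated concurrently with the reader.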