[ https://issues.apache.org/jira/browse/KAFKA-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439247#comment-15439247 ]
jaikiran pai commented on KAFKA-4090:
-------------------------------------

Mailing list discussion is here: https://www.mail-archive.com/dev@kafka.apache.org/msg55658.html

JVM runs into OOM if (Java) client uses a SSL port without setting the security protocol
----------------------------------------------------------------------------------------

                Key: KAFKA-4090
                URL: https://issues.apache.org/jira/browse/KAFKA-4090
            Project: Kafka
         Issue Type: Bug
         Components: clients
   Affects Versions: 0.9.0.1, 0.10.0.1
           Reporter: jaikiran pai

Quoting from the mail thread that was sent to the Kafka mailing list:
{quote}
We have been using Kafka 0.9.0.1 (server and Java client libraries). So far we had been using it with plaintext transport, but recently we have been considering an upgrade to SSL. It mostly works, except that a mis-configured producer (and even consumer) causes a hard-to-relate OutOfMemory exception, putting the JVM in which the client runs into a bad state. We can consistently reproduce that OOM very easily. We decided to check whether this is something that is fixed in 0.10.0.1, so we upgraded one of our test systems to that version (both server and client libraries), but we still see the same issue. Here's how it can be easily reproduced:

1. Enable the SSL listener on the broker via server.properties, as per the Kafka documentation:
{code}
listeners=PLAINTEXT://:9092,SSL://:9093
ssl.keystore.location=<location-of-keystore>
ssl.keystore.password=pass
ssl.key.password=pass
ssl.truststore.location=<location-of-truststore>
ssl.truststore.password=pass
{code}

2. Start ZooKeeper and the Kafka server.

3. Create an "oom-test" topic (which will be used for these tests):
{code}
kafka-topics.sh --zookeeper localhost:2181 --create --topic oom-test --partitions 1 --replication-factor 1
{code}
4. Create a simple producer which sends a single message to the topic via the Java (new producer) APIs:
{code}
public class OOMTest {

    public static void main(final String[] args) throws Exception {
        final Properties kafkaProducerConfigs = new Properties();
        // NOTE: Intentionally use a SSL port without specifying security.protocol as SSL
        kafkaProducerConfigs.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093");
        kafkaProducerConfigs.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        kafkaProducerConfigs.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProducerConfigs)) {
            System.out.println("Created Kafka producer");
            final String topicName = "oom-test";
            final String message = "Hello OOM!";
            // send a message to the topic
            final Future<RecordMetadata> recordMetadataFuture = producer.send(new ProducerRecord<>(topicName, message));
            final RecordMetadata sentRecordMetadata = recordMetadataFuture.get();
            System.out.println("Sent message '" + message + "' to topic '" + topicName + "'");
        }
        System.out.println("Tests complete");
    }
}
{code}
Notice that the server URL is using an SSL endpoint, localhost:9093, but isn't specifying any of the other necessary SSL configs like security.protocol.

5. For the sake of easily reproducing this issue, run this class with a max heap size of 256MB (-Xmx256M).
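For comparison, the misconfiguration above is avoided by telling the client that the endpoint speaks SSL. A minimal sketch of the additional client properties, written with plain string keys so it runs without the Kafka client jar on the classpath; the truststore path and password are placeholders, not values from this report:

```java
import java.util.Properties;

public class SslClientConfigSketch {
    public static void main(String[] args) {
        Properties configs = new Properties();
        configs.setProperty("bootstrap.servers", "localhost:9093");
        // The setting the reproducer omits: without it the client assumes PLAINTEXT.
        configs.setProperty("security.protocol", "SSL");
        // Placeholder truststore settings; real deployments use their own values.
        configs.setProperty("ssl.truststore.location", "/path/to/truststore.jks");
        configs.setProperty("ssl.truststore.password", "pass");
        System.out.println(configs.getProperty("security.protocol"));
    }
}
```

The same two-to-four properties apply to the consumer and to the command-line tools; only security.protocol is strictly required to make the client stop treating the SSL port as plaintext.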
Running this code throws up the following OutOfMemoryError in one of the Sender threads:
{code}
18:33:25,770 ERROR [KafkaThread] - Uncaught exception in kafka-producer-network-thread | producer-1:
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
    at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
    at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
    at java.lang.Thread.run(Thread.java:745)
{code}
Note that I set the heap size to 256MB to reproduce this easily, but the issue isn't specific to that size; we have been able to reproduce it at even 516MB and higher too.

This even happens with the consumer, and in fact can be reproduced out of the box with the kafka-consumer-groups.sh script.
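The allocate() call at the top of this trace is the telling part: NetworkReceive reads the first four bytes off the socket as a big-endian size prefix, but on a mis-configured connection those bytes are the start of the broker's TLS response, not a Kafka size field. A minimal sketch of the arithmetic, under the assumption that the response begins with a TLS 1.2 record header (content type 0x15, protocol version 0x0303):

```java
import java.nio.ByteBuffer;

public class MisreadSizeSketch {
    public static void main(String[] args) {
        // Hypothetical first four bytes of a TLS 1.2 record sent by the SSL
        // listener: content type 0x15, protocol version 0x0303, and the high
        // byte of the record length.
        byte[] tlsHeader = {0x15, 0x03, 0x03, 0x00};
        // The plaintext client interprets them as a big-endian int message
        // size and then tries to allocate a buffer that large, which blows
        // through a small heap.
        int misreadSize = ByteBuffer.wrap(tlsHeader).getInt();
        System.out.println(misreadSize + " bytes, ~" + (misreadSize >> 20) + " MB");
    }
}
```

Read as a size, 0x15030300 comes out to 352518912, the same figure the debugging notes below observe in readFromReadableChannel, which is consistent with the TLS-record interpretation.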
All you have to do is run the tool as follows:
{code}
./kafka-consumer-groups.sh --list --bootstrap-server localhost:9093 --new-consumer
{code}
{code}
Error while executing consumer group command Java heap space
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
    at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
    at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154)
    at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135)
    at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:323)
    at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
    at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
    at kafka.admin.AdminClient.kafka$admin$AdminClient$$send(AdminClient.scala:49)
    at kafka.admin.AdminClient$$anonfun$sendAnyNode$1.apply(AdminClient.scala:61)
    at kafka.admin.AdminClient$$anonfun$sendAnyNode$1.apply(AdminClient.scala:58)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at kafka.admin.AdminClient.sendAnyNode(AdminClient.scala:58)
    at kafka.admin.AdminClient.findAllBrokers(AdminClient.scala:87)
    at kafka.admin.AdminClient.listAllGroups(AdminClient.scala:96)
    at kafka.admin.AdminClient.listAllGroupsFlattened(AdminClient.scala:117)
    at kafka.admin.AdminClient.listAllConsumerGroupsFlattened(AdminClient.scala:121)
    at kafka.admin.ConsumerGroupCommand$KafkaConsumerGroupService.list(ConsumerGroupCommand.scala:311)
    at kafka.admin.ConsumerGroupCommand$.main(ConsumerGroupCommand.scala:63)
    at kafka.admin.ConsumerGroupCommand.main(ConsumerGroupCommand.scala)
{code}
Notice that here again I'm using the new consumer and the SSL port without any additional SSL configs.

Once this OOM occurs, the producer is useless since its (background) sender thread is dead. Not just that: since we run these producers/consumers from within our application, this OOM trips the JVM, and our whole JVM goes into an unstable state.

Debugging shows that the NetworkReceive class, in its readFromReadableChannel method, receives a value of something like 352518912 and then goes ahead and allocates a ByteBuffer of that size. 352518912 is roughly 300-odd MB and obviously causes allocation issues. I suspect the value being passed over the channel is incorrect.

Of course this exception is triggered by a user config error, but given that it ends up in an (almost unclear) OOM and leaves the JVM in a bad state, is there a way the Kafka Java library can handle this better?
{quote}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)