[ https://issues.apache.org/jira/browse/KAFKA-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526883#comment-13526883 ]
Joel Koshy commented on KAFKA-664:
----------------------------------

Okay, I'm slightly confused. Even on expiration the request is marked as satisfied. So even if it is not removed from the watcher's list during expiration, it will be removed on the next call to collectSatisfiedRequests, which in this case will be when the next produce request arrives for that partition. That means this should only be due to low-volume partitions that are no longer growing, i.e., the replica fetcher would keep issuing fetch requests that keep expiring but never get removed from the list of pending requests in watchersFor(the-low-volume-partition).

> Kafka server threads die due to OOME during long running test
> -------------------------------------------------------------
>
> Key: KAFKA-664
> URL: https://issues.apache.org/jira/browse/KAFKA-664
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Neha Narkhede
> Assignee: Jay Kreps
> Priority: Blocker
> Labels: bugs
> Fix For: 0.8
>
> Attachments: thread-dump.log
>
>
> I set up a Kafka cluster with 5 brokers (JVM memory 512M) and set up a long
> running producer process that sends data to 100s of partitions continuously
> for ~15 hours.
> After ~4 hours of operation, few server threads (acceptor and
> processor) exited due to OOME -
> [2012-12-07 08:24:44,355] ERROR OOME with size 1700161893
> (kafka.network.BoundedByteBufferReceive)
> java.lang.OutOfMemoryError: Java heap space
> [2012-12-07 08:24:44,356] ERROR Uncaught exception in thread
> 'kafka-acceptor': (kafka.utils.Utils$)
> java.lang.OutOfMemoryError: Java heap space
> [2012-12-07 08:24:44,356] ERROR Uncaught exception in thread
> 'kafka-processor-9092-1': (kafka.utils.Utils$)
> java.lang.OutOfMemoryError: Java heap space
> [2012-12-07 08:24:46,344] INFO Unable to reconnect to ZooKeeper service,
> session 0x13afd0753870103 has expired, closing socket connection
> (org.apache.zookeeper.ClientCnxn)
> [2012-12-07 08:24:46,344] INFO zookeeper state changed (Expired)
> (org.I0Itec.zkclient.ZkClient)
> [2012-12-07 08:24:46,344] INFO Initiating client connection,
> connectString=eat1-app309.corp:12913,eat1-app310.corp:12913,eat1-app311.corp:12913,eat1-app312.corp:12913,eat1-app313.corp:12913
> sessionTimeout=15000 watcher=org.I0Itec.zkclient.ZkClient@19202d69
> (org.apache.zookeeper.ZooKeeper)
> [2012-12-07 08:24:55,702] ERROR OOME with size 2001040997
> (kafka.network.BoundedByteBufferReceive)
> java.lang.OutOfMemoryError: Java heap space
> [2012-12-07 08:25:01,192] ERROR Uncaught exception in thread
> 'kafka-request-handler-0': (kafka.utils.Utils$)
> java.lang.OutOfMemoryError: Java heap space
> [2012-12-07 08:25:08,739] INFO Opening socket connection to server
> eat1-app311.corp/172.20.72.75:12913 (org.apache.zookeeper.ClientCnxn)
> [2012-12-07 08:25:14,221] INFO Socket connection established to
> eat1-app311.corp/172.20.72.75:12913, initiating session
> (org.apache.zookeeper.ClientCnxn)
> [2012-12-07 08:25:17,943] INFO Client session timed out, have not heard from
> server in 3722ms for sessionid 0x0, closing socket connection and attempting
> reconnect (org.apache.zookeeper.ClientCnxn)
> [2012-12-07 08:25:19,805] ERROR error in loggedRunnable
> (kafka.utils.Utils$)
> java.lang.OutOfMemoryError: Java heap space
> [2012-12-07 08:25:23,528] ERROR OOME with size 1853095936
> (kafka.network.BoundedByteBufferReceive)
> java.lang.OutOfMemoryError: Java heap space
> It seems like it runs out of memory while trying to read the producer
> request, but it's unclear so far.
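
To illustrate the expiration behaviour described in Joel's comment above, here is a minimal Java sketch. It is not the Kafka purgatory code: DelayedRequest, Watchers, expire and pendingCount are hypothetical names, and only collectSatisfiedRequests is borrowed from the comment. It assumes that expiration merely flags a request as satisfied, and that satisfied entries are purged only when a produce request for the partition triggers collectSatisfiedRequests.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the suspected leak; not the actual Kafka implementation.
class PurgatorySketch {

    /** A delayed request (e.g. a replica fetch) parked until satisfied or expired. */
    static final class DelayedRequest {
        final AtomicBoolean satisfied = new AtomicBoolean(false);
    }

    /** The per-partition list of pending requests (assumed shape). */
    static final class Watchers {
        private final List<DelayedRequest> requests = new ArrayList<>();

        void add(DelayedRequest request) {
            requests.add(request);
        }

        /** Expiration flags the request as satisfied but leaves it in the list. */
        void expire(DelayedRequest request) {
            request.satisfied.set(true);
        }

        /**
         * Assumed to run only when a produce request arrives for the partition.
         * Removes every request already flagged as satisfied.
         */
        int collectSatisfiedRequests() {
            int removed = 0;
            Iterator<DelayedRequest> it = requests.iterator();
            while (it.hasNext()) {
                if (it.next().satisfied.get()) {
                    it.remove();
                    removed++;
                }
            }
            return removed;
        }

        int pendingCount() {
            return requests.size();
        }
    }

    public static void main(String[] args) {
        Watchers lowVolumePartition = new Watchers();

        // The replica fetcher keeps issuing fetches that expire; no produce arrives.
        for (int i = 0; i < 1000; i++) {
            DelayedRequest fetch = new DelayedRequest();
            lowVolumePartition.add(fetch);
            lowVolumePartition.expire(fetch);
        }
        System.out.println("pending after expirations: " + lowVolumePartition.pendingCount());

        // A single produce request finally triggers the purge.
        lowVolumePartition.collectSatisfiedRequests();
        System.out.println("pending after a produce: " + lowVolumePartition.pendingCount());
    }
}
```

Under these assumptions, main reports 1000 pending entries after the expirations and 0 after the simulated produce. A partition that never receives new data would never reach the second step, so its list would keep growing.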