Hi,

We were recently trying to replace a broker instance and kept getting an OutOfMemoryError when the new node came up. The failure happened during the log replication phase. We were able to work around it by copying all of the logs over to the new node before starting it.
Details:
- The heap size on both the old and new node was 8GB.
- There was about 50GB of log data to transfer.
- There were 1548 partitions across 11 topics.
- We recently increased num.replica.fetchers to solve the problem described here: https://issues.apache.org/jira/browse/KAFKA-1196. However, replacing a broker worked fine when we first changed that value.

[2014-12-04 12:10:22,746] ERROR OOME with size 1867671283 (kafka.network.BoundedByteBufferReceive)
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
    at kafka.network.BoundedByteBufferReceive.byteBufferAllocate(BoundedByteBufferReceive.scala:80)
    at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:63)
    at kafka.network.Receive$class.readCompletely(Transmission.scala:56)
    at kafka.network.BoundedByteBufferReceive.readCompletely(BoundedByteBufferReceive.scala:29)
    at kafka.network.BlockingChannel.receive(BlockingChannel.scala:100)
    at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:73)
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:71)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SimpleConsumer.scala:109)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1$$anonfun$apply$mcV$sp$1.apply(SimpleConsumer.scala:109)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply$mcV$sp(SimpleConsumer.scala:108)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
    at kafka.consumer.SimpleConsumer$$anonfun$fetch$1.apply(SimpleConsumer.scala:108)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
    at kafka.consumer.SimpleConsumer.fetch(SimpleConsumer.scala:107)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:96)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:88)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)

Thank you
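PS: For context, here is the rough back-of-the-envelope estimate we used to reason about the ~1.8GB allocation in the OOME. Each replica fetcher thread requests up to replica.fetch.max.bytes per partition it owns, and the stack trace shows the whole fetch response being read into a single heap buffer by BoundedByteBufferReceive, so the worst-case response is roughly partitions-per-fetcher times replica.fetch.max.bytes. The num.replica.fetchers and replica.fetch.max.bytes values in the sketch below are only illustrative placeholders, not what is actually configured on these brokers:

    // Rough sketch of the worst-case size of one replica fetch response.
    // The fetcher count and max-bytes values below are hypothetical examples,
    // not our actual broker configuration.
    object FetchResponseEstimate extends App {
      val totalPartitions      = 1548                 // from the details above
      val numReplicaFetchers   = 4                    // hypothetical num.replica.fetchers
      val replicaFetchMaxBytes = 2L * 1024 * 1024     // hypothetical replica.fetch.max.bytes (default is 1 MiB)

      // Partitions are spread across the fetcher threads on a broker.
      val partitionsPerFetcher = math.ceil(totalPartitions.toDouble / numReplicaFetchers).toInt

      // One fetch response can carry up to replica.fetch.max.bytes for every
      // partition in the request, and it is buffered in a single allocation.
      val worstCaseResponse = partitionsPerFetcher.toLong * replicaFetchMaxBytes

      println(s"partitions per fetcher: $partitionsPerFetcher")
      println(f"worst-case single fetch response: $worstCaseResponse%,d bytes")
    }

If that estimate is in the right ballpark, a few fetcher threads each buffering a response of that size could plausibly exhaust an 8GB heap while the new broker is catching up on all partitions at once, but we have not verified that this is what happened.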