We are writing messages at a rate of about 9,000 records/sec into our Kafka cluster. At times the producer performance degrades considerably and never recovers. When this happens we see the error "unable to allocate buffer within timeout", and the "waiting-threads" metric is very high while the producer is degraded. Any inputs would be appreciated.
The producer parameters are:

batch.size=1000000
linger.ms=30000
acks=-1
metadata.fetch.timeout.ms=1000
compression.type=none
max.request.size=10000000

Although the buffer is fully available, the errors are "org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory within the configured max blocking time". Below is the JMX screenshot URL comparing the producer running degraded vs running OK: http://i.stack.imgur.com/UIKXa.png The batch size is 1,000,000, and the issue is the same when the batch size is dropped to 500,000. I have also posted this question on Stack Overflow: http://stackoverflow.com/questions/36961677/kafka-producer-0-9-0-performance-large-number-of-waiting-threads/36964792#36964792 Thanks much
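For completeness, here is a minimal sketch of how these settings are applied in our producer, assuming a plain KafkaProducer with byte-array serializers; the bootstrap servers and topic name below are placeholders, and the commented-out buffer.memory line is only a guess at where the allocation pressure might come from, not something we have changed yet:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class ProducerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker list; substitute the real cluster endpoints.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer", ByteArraySerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());

        // Settings as described above.
        props.put("batch.size", 1000000);
        props.put("linger.ms", 30000);
        props.put("acks", "-1");
        props.put("metadata.fetch.timeout.ms", 1000);
        props.put("compression.type", "none");
        props.put("max.request.size", 10000000);

        // Not set in our config: buffer.memory defaults to 32 MB, so with a
        // 1 MB batch.size only ~32 batches fit in the pool before send()
        // blocks waiting for buffer space.
        // props.put("buffer.memory", 67108864L);

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            // Placeholder topic and payload, just to show the send path.
            producer.send(new ProducerRecord<>("test-topic", "payload".getBytes()));
        }
    }
}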