I am using the Kafka Java API async producer (the one that wraps the Scala client). It is dropping quite a bit of data because the async queue fills up, and the process ends up burning a lot of CPU as well.
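For context, the producer is wired up roughly like the sketch below. The broker list, topic, and serializer are placeholders, but queue.buffering.max.messages and queue.enqueue.timeout.ms are the old producer's knobs for the async queue, and queue.enqueue.timeout.ms=0 is what makes it drop (rather than block) when the queue is full:

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class AsyncProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder broker list; replace with the real host:port pairs.
            props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("producer.type", "async");
            // Size of the in-memory queue that ProducerSendThread drains.
            props.put("queue.buffering.max.messages", "10000");
            // 0 = drop events immediately when the queue is full;
            // -1 = block the caller until space frees up.
            props.put("queue.enqueue.timeout.ms", "0");
            // How many messages the send thread batches per broker request.
            props.put("batch.num.messages", "200");

            Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("my-topic", "key", "value"));
            producer.close();
        }
    }

Setting queue.enqueue.timeout.ms to -1 would make sends block instead of drop, but that only moves the backpressure into my application threads rather than fixing the throughput of the send thread.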
I am posting to a topic with 1024 partitions (across 3 brokers) - maybe the high partition count is part of the problem. Profiling with YourKit showed 37% of my CPU being spent in kafka.producer.async.ProducerSendThread.run(), so it seems the producer is not able to keep up with my application and starts dropping. When I expand this call tree in YourKit, I see that 23% of total CPU (not 23% of the 37%) is being spent on logging, in a path like this:

    kafka.producer.BrokerPartitionInfo$$anonfun$getBrokerPartitionInfo$2.apply(PartitionMetadata)
    -> kafka.producer.BrokerPartitionInfo.debug(Function0)
    -> kafka.utils.Logging$class.debug(Logging, Function0)
    -> org.apache.log4j.Category.isDebugEnabled()
    -> ... (a bunch of other frames that finally bottom out in)
    -> LoggerContext.java:252 ch.qos.logback.classic.spi.TurboFilterList.getTurboFilterChainDecision(Marker, Logger, Level, String, Object[], Throwable)

I am not sure what is going on here. When I look at my process log, none of these debug messages are actually written (probably because of the log level), so the time seems to be spent just deciding whether to log - and with 1024 partitions, that debug check presumably runs once per partition every time the producer walks the partition metadata. I also don't see anything very suspicious in the broker logs, and the brokers are at 60-70% CPU.

I am planning to try the new Java beta producer client, but I am afraid something deeper is going on here that might not be solved by switching to the newer client.
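Before switching, one thing I plan to try is forcing the kafka loggers above DEBUG and checking whether any logback TurboFilters are registered, since the trace ends in TurboFilterList.getTurboFilterChainDecision(): logback consults the turbo filter list on every isDebugEnabled() call, even when nothing is written. A minimal sketch (the "kafka" logger name is my assumption about where the producer's loggers live):

    import ch.qos.logback.classic.Level;
    import ch.qos.logback.classic.Logger;
    import ch.qos.logback.classic.LoggerContext;
    import org.slf4j.LoggerFactory;

    public class QuietKafkaLogging {
        public static void main(String[] args) {
            // Works when logback-classic is the bound SLF4J implementation.
            LoggerContext context = (LoggerContext) LoggerFactory.getILoggerFactory();

            // Raise everything under the "kafka" package above DEBUG so the
            // producer's per-message debug() calls become cheap no-ops.
            Logger kafkaLogger = context.getLogger("kafka");
            kafkaLogger.setLevel(Level.WARN);

            // isDebugEnabled() walks the turbo filter list before the level
            // check, so any registered TurboFilter runs on the hot path.
            System.out.println("Turbo filters: " + context.getTurboFilterList());
            // If a filter is registered but not actually needed, removing it
            // takes that work off the per-message path entirely:
            // context.resetTurboFilterList();
        }
    }

If that removes the 23% logging overhead, whatever CPU is left in ProducerSendThread.run() should be the real baseline for judging whether the producer can keep up.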