I'm using 0.9.0.1 clients against 0.9.0.1 brokers. In a single Java service, we have 4 producers and 5 consumers, all KafkaProducer and KafkaConsumer instances (the new consumer).
Since the 0.9 upgrade, this service OOMs after being up for a few minutes. Heap dumps show >80MB of objects related to topic and partition metadata (hundreds of thousands of org.apache.kafka.common.Node objects). Digging through the heap, I see references to all sorts of topics in the cluster that this service neither produces to nor consumes from. We are not using any pattern matching on topics.

Unusual things about this service and cluster:

* Smallish heap (128MB was fine with the 0.8 consumer, but even after bumping to 256MB we still OOM on 0.9)
* A large number of dev/test topics on this cluster (several hundred), and thousands of partitions as a result; the service is only concerned with ~15 of these topics

It seems pretty likely that our issue is that metadata for all partitions in the cluster is being cached locally. Is there a way to keep that from happening? Or are there other known memory-leak issues that could be causing us to OOM?
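For reference, our consumer configuration is unremarkable; each consumer subscribes to an explicit list of topic names via subscribe(List&lt;String&gt;), not a regex pattern. A representative config (broker addresses, group id, and deserializers below are placeholders, not our real values):

```properties
# consumer.properties (illustrative placeholders)
bootstrap.servers=kafka01:9092,kafka02:9092
group.id=our-service
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
enable.auto.commit=true
```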