Hello,

We are running Apache Kafka v2.7.1 on a total of 54 brokers distributed evenly across 3 racks. All machines are identical (Amazon EC2 c6g.4xlarge) with 32 GB of RAM, of which we dedicate 12 GB to the JVM heap.

This cluster hosts a few thousand topics (each replicated to all 3 racks), for a total of ~3,600 partitions per broker. Except for a handful of log-compacted topics, the retention time is <= 4 days.

Around 3 weeks ago we enabled the idempotent producer configuration for our application.
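For context, the producer-side change was essentially just turning on idempotence. A minimal sketch of what our producers now set (the bootstrap servers and serializers below are placeholders, not our actual settings):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class IdempotentProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // The change in question: each producer now gets a producer ID and the
            // brokers start tracking per-partition producer state for deduplication.
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // ... send records as before ...
            }
        }
    }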
Initially this didn't result in any obvious change in cluster stability, but recently we were alerted that one of the brokers had crashed with an OutOfMemoryError. Taking a closer look, we found that heap usage (as well as G1 old-generation usage) had been growing slowly but steadily on every broker over the course of ~10 days: prior to the change it was in the range of 3-10 GB, and it had reached 8-12 GB around the time we detected the problem.

As we hadn't changed anything else in the meantime, we decided to revert the idempotence setting. This didn't result in any immediate change in broker heap usage, but now, after a few days of running with the "new" old configuration, we are starting to see the opposite trend, with heap usage going back to the expected levels.

By examining the Kafka source code [1] we learned that producer snapshots (which keep the known producer IDs and other information supplied by idempotent producers) are stored in dedicated .snapshot files next to the log segments. On one of the brokers we checked, these snapshot files amounted to 1.7 GB on disk in total.

Finally, here are the questions that we have:

1. Is such a dramatic increase in heap usage expected given the number of partitions per broker?
2. Is there a way to estimate the extra heap requirements before enabling the idempotent producer config?
3. Are there any best practices / community experience that we might be overlooking here?

Thank you.

[1]: https://github.com/apache/kafka/blob/2.7.1/core/src/main/scala/kafka/log/Log.scala#L2503

Kind regards,
--
Alex
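P.S. For anyone wanting to check the same thing on their own brokers, here is a small sketch of one way to sum the producer snapshot files; the default path below is hypothetical, so substitute the directory (or directories) from your broker's log.dirs:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    public class SnapshotFootprint {
        public static void main(String[] args) throws IOException {
            // Broker log directory (from log.dirs); pass it as the first argument.
            Path logDir = Paths.get(args.length > 0 ? args[0] : "/var/kafka-logs");
            long totalBytes;
            try (Stream<Path> files = Files.walk(logDir)) {
                totalBytes = files
                        .filter(p -> p.toString().endsWith(".snapshot"))
                        .mapToLong(p -> p.toFile().length())
                        .sum();
            }
            System.out.printf("Producer snapshot files total: %.2f GB%n", totalBytes / 1e9);
        }
    }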