[ https://issues.apache.org/jira/browse/KAFKA-4778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879220#comment-15879220 ]
Dave Thomas commented on KAFKA-4778:
------------------------------------

[~guozhang] We set the following topic configuration values:
{noformat}
kafka.default.replication.factor=3
kafka.default.partition.count=32
cleanup.policy=compact
retention.bytes=-1
retention.ms=2592000000 // 30 days for internal streams; 720 days for ingested raw streams
{noformat}
Is that what you're looking for?

> OOM on kafka-streams instances with high numbers of unreaped Record classes
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-4778
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4778
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.1.1
>         Environment: AWS m3.large, Ubuntu 16.04.1 LTS, RocksDB on local SSD.
>                      Kafka cluster: 3 ZooKeeper nodes, 5 brokers.
>                      Stream processors are run with:
>                      -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails
>                      Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
>                      Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
>                      Stream processors written in Scala 2.11.8
>            Reporter: Dave Thomas
>         Attachments: oom-killer.txt
>
> We have a stream processing app with ~8 source/sink stages operating at roughly 500k messages ingested per day (~4M messages/day across the 8 stages).
>
> We get an OOM kill about once every ~18 hours. Note that it is Linux's OOM killer terminating the process, not the JVM shutting itself down.
>
> It may be worth noting that the stream processor uses ~50 MB while running normally for hours on end, until the problem surfaces; then, within ~20-30 seconds, memory suddenly grows from under 100 MB to over 1 GB and does not shrink until the process is killed.
>
> We are using supervisor to restart the instances. Sometimes the process dies again for the same reason as soon as stream processing resumes, and this cycle can continue for minutes or hours. This extended window has enabled us to capture a heap dump using jmap.
>
> jhat's histogram feature reveals the following top objects in memory:
>
> Class                                                                Instance Count   Total Size
> class [B                                                                    4070487    867857833
> class [Ljava.lang.Object;                                                   2066986    268036184
> class [C                                                                     539519     92010932
> class [S                                                                    1003290     80263028
> class [I                                                                     508208     77821516
> class java.nio.HeapByteBuffer                                               1506943     58770777
> class org.apache.kafka.common.record.Record                                 1506783     36162792
> class org.apache.kafka.clients.consumer.ConsumerRecord                       528652     35948336
> class org.apache.kafka.common.record.MemoryRecords$RecordsIterator           501742     32613230
> class org.apache.kafka.common.record.LogEntry                               2009373     32149968
> class org.xerial.snappy.SnappyInputStream                                    501600     20565600
> class java.io.DataInputStream                                                501742     20069680
> class java.io.EOFException                                                   501606     20064240
> class java.util.ArrayDeque                                                   501941      8031056
> class java.lang.Long                                                         516463      4131704
>
> Note that high on the list are org.apache.kafka.common.record.Record, org.apache.kafka.clients.consumer.ConsumerRecord, org.apache.kafka.common.record.MemoryRecords$RecordsIterator, and org.apache.kafka.common.record.LogEntry, each with roughly 500k-2M instances.
>
> There is nothing distinctive in the stream processing logs (log levels are still at their defaults).
>
> Could it be references (weak, phantom, etc.) that are keeping these instances from being garbage-collected?
>
> Edit: to request the full heap dump (created using `jmap -dump:format=b,file=`), contact me directly at opensou...@peoplemerge.com. It is 2 GB.
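
For anyone wanting to reproduce the heap inspection described in the quoted report, here is a minimal sketch using the stock JDK 8 tools named in the environment above; the PID, dump path, and jhat heap size are placeholders, not values from the ticket.
{noformat}
# Capture a full heap dump of the running stream processor (<pid> is a placeholder)
jmap -dump:format=b,file=/tmp/streams.hprof <pid>

# Browse the dump with jhat; give it enough heap to load a ~2 GB dump,
# then open http://localhost:7000/ and follow the heap histogram link
jhat -J-Xmx4g /tmp/streams.hprof
{noformat}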
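
On the configuration values quoted at the top of this comment: the last three keys are ordinary Kafka topic configs, while the first two look like application-level defaults rather than Kafka's own property names (the broker-side equivalents would be default.replication.factor and num.partitions in server.properties). A minimal sketch of applying the same values with the stock 0.10.x CLI tools, assuming a hypothetical topic name and a local ZooKeeper:
{noformat}
# Topic name and ZooKeeper address are hypothetical, for illustration only
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --topic example-internal-stream --partitions 32 --replication-factor 3

bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name example-internal-stream \
  --add-config cleanup.policy=compact,retention.bytes=-1,retention.ms=2592000000
{noformat}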