[ https://issues.apache.org/jira/browse/KAFKA-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971806#comment-14971806 ]
Todd Palino commented on KAFKA-2235:
------------------------------------

I don't think we can. I have already increased it from 512MB to 1GB, and we still hit the same problems. That only provides a 2x increase in the size of the map, and I would need almost a 10x increase to solve the problem.

> LogCleaner offset map overflow
> ------------------------------
>
>                 Key: KAFKA-2235
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2235
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, log
>    Affects Versions: 0.8.1, 0.8.2.0
>            Reporter: Ivan Simoneko
>            Assignee: Ivan Simoneko
>             Fix For: 0.9.0.0
>
>         Attachments: KAFKA-2235_v1.patch, KAFKA-2235_v2.patch
>
>
> We've seen log cleaning generate an error for a topic with lots of small
> messages. It seems that the cleanup map can overflow if a log segment
> contains more unique keys than there are empty slots in offsetMap. The check
> of baseOffset and map utilization before processing a segment does not seem
> to be enough, because it doesn't take into account segment size (the number
> of unique messages in the segment).
> I suggest estimating the upper bound of keys in a segment as the number of
> messages in the segment and comparing it with the number of available slots
> in the map (keeping in mind the desired load factor). This should work in
> cases where an empty map is capable of holding all the keys for a single
> segment. If even a single segment is not able to fit into an empty map, the
> cleanup process will still fail. Perhaps there should be a limit on the log
> segment entry count?
> Here is the stack trace for this error:
> 2015-05-19 16:52:48,758 ERROR [kafka-log-cleaner-thread-0] kafka.log.LogCleaner - [kafka-log-cleaner-thread-0], Error due to
> java.lang.IllegalArgumentException: requirement failed: Attempt to add a new entry to a full offset map.
>        at scala.Predef$.require(Predef.scala:233)
>        at kafka.log.SkimpyOffsetMap.put(OffsetMap.scala:79)
>        at kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:543)
>        at kafka.log.Cleaner$$anonfun$kafka$log$Cleaner$$buildOffsetMapForSegment$1.apply(LogCleaner.scala:538)
>        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>        at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:32)
>        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>        at kafka.message.MessageSet.foreach(MessageSet.scala:67)
>        at kafka.log.Cleaner.kafka$log$Cleaner$$buildOffsetMapForSegment(LogCleaner.scala:538)
>        at kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:515)
>        at kafka.log.Cleaner$$anonfun$buildOffsetMap$3.apply(LogCleaner.scala:512)
>        at scala.collection.immutable.Stream.foreach(Stream.scala:547)
>        at kafka.log.Cleaner.buildOffsetMap(LogCleaner.scala:512)
>        at kafka.log.Cleaner.clean(LogCleaner.scala:307)
>        at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:221)
>        at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:199)
>        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
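
For illustration only, the check proposed in the issue description (treat a segment's message count as an upper bound on its unique keys and compare it with the slots still usable under the load factor) might be sketched roughly as below. This is a Python sketch, not Kafka's Scala code and not the attached patch. The function names, the 24-byte entry size (assumed to be a 16-byte key hash plus an 8-byte offset), and the 0.9 default load factor are all assumptions made for the example.

```python
# Sketch only: names and constants are illustrative assumptions,
# not Kafka's actual LogCleaner code or the attached patch.

BYTES_PER_ENTRY = 24  # assumed: 16-byte key hash + 8-byte offset per slot


def map_slots(buffer_bytes: int) -> int:
    """Number of slots an offset map backed by buffer_bytes could hold."""
    return buffer_bytes // BYTES_PER_ENTRY


def segment_fits(map_size: int, slots: int, segment_messages: int,
                 load_factor: float = 0.9) -> bool:
    """Return True if the segment is guaranteed not to overflow the map.

    The segment's message count is used as an upper bound on the number
    of unique keys it can add; the map may only be filled up to
    slots * load_factor entries.
    """
    usable = int(slots * load_factor)
    return map_size + segment_messages <= usable


if __name__ == "__main__":
    half_gb = map_slots(512 * 1024 ** 2)
    one_gb = map_slots(1024 ** 3)
    # Doubling the buffer only doubles the slot count, which matches the
    # "2x increase" mentioned in the comment above.
    print(half_gb, one_gb, one_gb / half_gb)
```

Under these assumptions, a segment is only added to the map when `segment_fits` returns True. As the description notes, a segment with more messages than an empty map can hold would still fail the check, so a bound of this kind does not help in the case described in the comment, where roughly 10x more slots are needed.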