Jeff,

The discussion thread from a while back on KIP-58 covers "log.cleaner.min.cleanable.ratio".

KIP-58 page:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable

Discussion thread (linked off that page):
http://mail-archives.apache.org/mod_mbox/kafka-dev/201605.mbox/%3CCAAWiU2VzPXdK1fW3FacfDsVQc-1sphNMjEqtkRSHZEYaN1Wr-w%40mail.gmail.com%3E

The summary is that "log.cleaner.min.cleanable.ratio" seems to have been designed to limit how much disk I/O is spent on compaction. Your JIRA indicates you benchmarked CPU and memory, but did you look at disk I/O?

-James

> On Jul 7, 2017, at 1:24 PM, Jeff Chao <jc...@heroku.com> wrote:
>
> Hi,
>
> I filed a jira a few weeks ago around some log compaction ratio behavior we
> were seeing. Now that the 0.11 vote is done and the release is out, I wanted to
> follow up on it. Jira is here:
> https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-5452.
>
> Details are in the jira, but long story short, after much testing, we were
> seeing that aggressive log compaction ratios were performing just as well
> as more conservative ratios. Fundamentally I would expect there to be some
> sort of hit, but seeing that the data shows there wasn't, we wanted to
> raise this to the rest of the community and see if anyone else has observed
> similar behavior. The motivation behind this is to see if we might consider
> changing the default from 0.5. This could help in preventing confusion
> around duplicate keys in low-volume log-compacted topic use cases.
>
> Thanks,
>
> Jeff Chao
> Heroku Kafka
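
For reference, here is a rough sketch of why the ratio relates to disk I/O. It assumes the ratio is dirty bytes over total (clean + dirty) bytes, that a log becomes eligible once the ratio exceeds the threshold, and that a cleaning pass scans roughly the whole log. The function names are hypothetical and the cost model is a simplification, not the broker's actual code.

```python
def cleanable_ratio(clean_bytes: int, dirty_bytes: int) -> float:
    """Fraction of the log that is dirty (not yet compacted)."""
    total = clean_bytes + dirty_bytes
    return dirty_bytes / total if total else 0.0


def is_cleanable(clean_bytes: int, dirty_bytes: int,
                 min_cleanable_ratio: float = 0.5) -> bool:
    """Assumed eligibility check: clean only once the ratio exceeds the threshold."""
    return cleanable_ratio(clean_bytes, dirty_bytes) > min_cleanable_ratio


def bytes_scanned_per_dirty_byte(ratio: float) -> float:
    """Simplified cost model: one pass scans the whole log to clean its dirty part."""
    return 1.0 / ratio


# A log with 400 bytes clean and 600 bytes dirty is eligible at the 0.5 default.
print(is_cleanable(400, 600))
# Under this model, a lower threshold means more bytes scanned per dirty byte cleaned.
print(bytes_scanned_per_dirty_byte(0.5), bytes_scanned_per_dirty_byte(0.25))
```

Under these assumptions, halving the threshold roughly doubles the bytes scanned per dirty byte cleaned, which is the kind of cost a CPU and memory benchmark would not show.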