Re: Consumer Offsets Compaction

2015-12-16 Thread Grant Henke
I am considering changing these defaults in KAFKA-2988:
- log.cleaner.enable=true (was false)
- log.cleaner.dedupe.buffer.size=128 MiB (was 500 MiB)
- log.cleaner.delete.retention.ms=7 days (was 1 day)
Thoughts on those values? Should I add logic to make sure we scale down the buffer size instead of causing …
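For concreteness, a minimal sketch of the KAFKA-2988 proposal expressed as explicit broker properties (the property names are from the thread; setting them programmatically via java.util.Properties is just one way to illustrate the raw values):

```scala
import java.util.Properties

// Sketch: the defaults proposed in KAFKA-2988, written out explicitly.
val props = new Properties()
props.put("log.cleaner.enable", "true")                                            // was "false"
props.put("log.cleaner.dedupe.buffer.size", (128L * 1024 * 1024).toString)         // 128 MiB, was 500 MiB
props.put("log.cleaner.delete.retention.ms", (7L * 24 * 60 * 60 * 1000).toString)  // 7 days, was 1 day
```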

Re: Consumer Offsets Compaction

2015-12-15 Thread Gwen Shapira
I'm thinking that anyone who actually uses compaction has a non-standard configuration (at the very least, they had to enable the cleaner, and probably a few other configurations too... compaction is a bit fiddly from what I've seen). So, I'm in favor of a minimal default buffer just for offsets and Copycat …

Re: Consumer Offsets Compaction

2015-12-15 Thread Grant Henke
Following up based on some digging. There are some upper and lower bounds on the buffer size. log.cleaner.dedupe.buffer.size has a:
- Minimum of 1 MiB per cleaner thread (https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/KafkaConfig.scala#L950)
- Maximum of …
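A sketch of how those bounds might play out per thread. The 1 MiB-per-thread minimum is from the KafkaConfig link above; the ~2 GiB per-thread ceiling is an assumption, based on the dedupe map being backed by a ByteBuffer whose capacity is an Int:

```scala
// Sketch: effective dedupe buffer per cleaner thread, clamped to assumed bounds.
def effectiveBufferPerThread(dedupeBufferSize: Long, numThreads: Int): Long = {
  val perThread    = dedupeBufferSize / numThreads
  val minPerThread = 1L * 1024 * 1024      // 1 MiB minimum per thread (from KafkaConfig)
  val maxPerThread = Int.MaxValue.toLong   // ~2 GiB ceiling: ByteBuffer capacity is an Int (assumption)
  math.max(minPerThread, math.min(perThread, maxPerThread))
}

// e.g. the proposed 128 MiB shared by a single cleaner thread:
// effectiveBufferPerThread(128L * 1024 * 1024, 1) == 134217728
```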

Re: Consumer Offsets Compaction

2015-12-15 Thread Jay Kreps
The buffer determines the maximum number of unique keys in the new writes that can be processed in one cleaning. Each key requires 24 bytes of space iirc, so 500 MB = ~21,845,333 unique keys (this is actually adjusted for some load factor and divided by the number of cleaner threads). If it is too …
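Jay's arithmetic as a sketch. The 24-bytes-per-key figure is from his note; the 0.9 load factor and the per-thread split are assumptions about how the adjustment is applied:

```scala
// Sketch: how many unique keys one cleaning pass can dedupe.
def maxUniqueKeys(bufferBytes: Long, numThreads: Int = 1, loadFactor: Double = 0.9): Long = {
  val perThread = bufferBytes / numThreads   // buffer is split across cleaner threads
  (perThread * loadFactor / 24).toLong       // 24 bytes of map space per unique key
}

// Raw capacity Jay quotes, before the load-factor adjustment:
//   500L * 1024 * 1024 / 24 == 21845333 (~21.8M unique keys)
// With the proposed 128 MiB default and an assumed 0.9 load factor:
//   maxUniqueKeys(128L * 1024 * 1024) == 5033164 (~5M unique keys)
```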

Re: Consumer Offsets Compaction

2015-12-15 Thread Grant Henke
Thanks for the background context, Jay. Do we have any context on what size is small (but still effective for small deployments) for the compaction buffer? And what is large? What factors help you choose a correct (or at least safe) size? Currently the default "log.cleaner.dedupe.buffer.size" is 500 MiB.

Re: Consumer Offsets Compaction

2015-12-14 Thread Jay Kreps
The reason for disabling it by default was (1) general paranoia about log compaction when we released it, and (2) to avoid allocating the compaction buffer. The first concern is now definitely obsolete, but the second concern is maybe valid. Basically that compaction buffer is a preallocated chunk of memory …
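To illustrate the second concern: the cost of enabling the cleaner by default is an up-front allocation paid at broker startup, whether or not any topic is ever compacted. A sketch (the real allocation lives inside the cleaner, not here):

```scala
import java.nio.ByteBuffer

// Sketch: a preallocated dedupe buffer, reserved at startup even on
// brokers that never compact anything.
val dedupeBufferSize = 128 * 1024 * 1024   // proposed 128 MiB default
val dedupeBuffer = ByteBuffer.allocate(dedupeBufferSize)
```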

Re: Consumer Offsets Compaction

2015-12-14 Thread Grant Henke
Thanks for the responses and confirmation. I have created https://issues.apache.org/jira/browse/KAFKA-2988 to track the work/changes. We can continue the discussion here on what changes to make. …

Re: Consumer Offsets Compaction

2015-12-14 Thread Jason Gustafson
That's a good point. It doesn't look like there's any special handling for the offsets topic, so enabling the cleaner by default makes sense to me. If compaction is not enabled, it would grow without bound, so I wonder if we should even deprecate that setting. Are there any use cases where it needs …

Re: Consumer Offsets Compaction

2015-12-14 Thread Gwen Shapira
This makes sense to me. Copycat also works better if topics are compacted. Just to clarify: log.cleaner.enable=true just makes the compaction thread run, but doesn't force compaction on any specific topic. You still need to set cleanup.policy=compact, and we should not change defaults here. …
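For example, with the 0.9-era admin API, compaction is requested per topic at creation time. A sketch, assuming the AdminUtils/ZkUtils API of that release; the ZooKeeper address and topic name are placeholders:

```scala
import java.util.Properties
import kafka.admin.AdminUtils
import kafka.utils.ZkUtils

// zkUrl, session timeout, connection timeout, ZK security disabled (placeholders)
val zkUtils = ZkUtils("localhost:2181", 30000, 30000, false)

// Per-topic override: log.cleaner.enable only runs the thread,
// this is what actually marks the topic for compaction.
val topicConfig = new Properties()
topicConfig.put("cleanup.policy", "compact")

AdminUtils.createTopic(zkUtils, "my-compacted-topic", 1, 1, topicConfig)
zkUtils.close()
```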

Consumer Offsets Compaction

2015-12-14 Thread Grant Henke
Since 0.9.0 the internal "__consumer_offsets" topic has been used more heavily. Because this is a compacted topic, does "log.cleaner.enable" need to be "true" in order for it to be compacted? Or is there special handling for internal topics? If log.cleaner.enable=true is required, should we make it …