Thanks a bunch for that info, Jon! It is pure gold and helps a lot. At LinkedIn, do you ever let Samza or Kafka auto-create topics? Or do you always create them by hand before deploying code that uses them?
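
To be concrete about what I mean by creating a topic by hand: something like the sketch below, which builds a kafka-topics.sh --create command with compaction set up front instead of altering the topic afterwards. The partition count and replication factor are placeholder values made up for illustration, and the ZooKeeper address and topic name are the same stand-ins as in my earlier mail. It only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: pre-create a compacted changelog topic before deploying the job.
# All four values below are placeholders, not recommended settings.
ZK="whatever:2181/kafka"
TOPIC="the-changelog-topic"
PARTITIONS=4      # assumption for illustration only
REPLICATION=2     # assumption for illustration only

# Build the command in pieces so each group of options is easy to review.
CMD="bin/kafka-topics.sh --zookeeper $ZK --create --topic $TOPIC"
CMD="$CMD --partitions $PARTITIONS --replication-factor $REPLICATION"
CMD="$CMD --config cleanup.policy=compact --config segment.bytes=1000000"

# Dry run: print the command instead of executing it.
echo "$CMD"
```

Running the printed command from the Kafka install directory would create the topic; the --config values are the same ones from the --alter command quoted further down.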

I understand why setting the topic config segment.bytes to smaller than 1GB is beneficial in a lot of cases, to allow log compaction to run more often on smaller data sizes. Is setting segment.ms to less than 7 days used for the same reason? i.e. to roll off the head segment file so it can be compacted sooner?

Are there any considerations we need to make to the number of partitions of a changelog topic? Does that need to be the same number as anything else?

Thanks,
Zach

On Mon, Jan 26, 2015 at 6:02 PM, Jon Bringhurst <jbringhu...@linkedin.com.invalid> wrote:

> Hey Zach,
>
> That's correct. You probably want to look into the following topic-level
> configs:
>
> cleanup.policy
> min.cleanable.dirty.ratio
> segment.ms
> segment.bytes (we usually use the default value)
>
> Also, here's some broker configs of interest that you might want to tweak
> (along with the settings we usually use... YMMV):
>
> log.cleaner.enable
> log.cleaner.io.buffer.load.factor (0.9)
> log.cleaner.io.buffer.size (524288)
> log.cleaner.backoff.ms (30000)
> log.cleaner.dedupe.buffer.size (524288000)
> log.cleaner.io.max.bytes.per.second (1000000000000.0)
> log.cleaner.delete.retention.ms (86400000)
> log.cleaner.min.cleanable.ratio (0.5)
> log.cleaner.threads (1)
>
> On a side note, sometimes it's nice to set min.cleanable.dirty.ratio to
> 0.01, then view the files on disk to make sure things are working.
>
> -Jon
>
> On Jan 26, 2015, at 2:42 PM, Zach Cox <zcox...@gmail.com> wrote:
>
> > Hi - in Samza 0.8.0 it seems that the Kafka topic created for a key-value
> > store changelog does not have compaction enabled, as described in this jira:
> >
> > https://issues.apache.org/jira/browse/SAMZA-226
> >
> > If Samza creates this changelog topic, am I correct that we then later need
> > to run something like this to enable compaction (and smaller segment size)?
> >
> > bin/kafka-topics.sh --zookeeper whatever:2181/kafka --topic
> > "the-changelog-topic" --alter --config cleanup.policy=compact --config
> > segment.bytes=1000000
> >
> > Thanks,
> > Zach