[ https://issues.apache.org/jira/browse/KAFKA-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153407#comment-14153407 ]
Jay Kreps commented on KAFKA-1499:
----------------------------------

Here is the problem with compaction. Currently the log may contain a mixture of records in different compression codecs, interleaved. Compaction means going through, decompressing, and recopying active records to a new compressed segment. However, maintaining the original compression becomes quite complex and inefficient because we have to find the arbitrary boundaries in the log where one compressed message set ends and another begins. Even if we deal with the complexity and try to maintain the compression, over time this will result in each message being compressed individually. Since we haven't yet been able to implement this, the combination of compression and compaction doesn't work.

The proposed fix was to move to a model where compression is set at the topic level and applied on the broker (as in this ticket). This would let compaction always just recompress using the default compression type for the topic (i.e. the global default or the topic override for that topic).

I think the default compression type should be none (i.e. the producer may compress requests but the data won't be stored compressed).

I agree that this is a change in behavior and that users using compression will have to set compression types when they upgrade. I also think the change may confuse some people, as the compression they set on the producer will no longer be carried through to the log/consumer. However, I think leaving the on/off switch doesn't resolve this confusion; it just makes it worse, because it adds a whole other mode in which compression by the producer is retained.

Thoughts?
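To make the proposed model concrete, here is a rough sketch (not the actual patch, and not Kafka's code) of how compaction could recompress once the codec is resolved per topic. Every name in it (`GLOBAL_DEFAULT_CODEC`, `TOPIC_OVERRIDES`, `effective_codec`, `compact`) is hypothetical, message sets are modelled as JSON lists of `[key, value]` pairs, and only "none" and "gzip" are handled, purely for illustration.

```python
import gzip
import json

# Hypothetical configuration: a global default plus per-topic overrides.
GLOBAL_DEFAULT_CODEC = "none"
TOPIC_OVERRIDES = {"clicks": "gzip"}


def effective_codec(topic):
    """Return the topic override if one exists, else the global default."""
    return TOPIC_OVERRIDES.get(topic, GLOBAL_DEFAULT_CODEC)


def encode_set(codec, records):
    """Serialize a list of [key, value] pairs as one message set."""
    raw = json.dumps(records).encode("utf-8")
    if codec == "gzip":
        return gzip.compress(raw)
    if codec == "none":
        return raw
    raise ValueError("unsupported codec: " + codec)


def decode_set(codec, payload):
    """Inverse of encode_set."""
    if codec == "gzip":
        payload = gzip.decompress(payload)
    elif codec != "none":
        raise ValueError("unsupported codec: " + codec)
    return json.loads(payload.decode("utf-8"))


def compact(topic, segment):
    """Compact a segment given as a list of (codec, payload) message sets.

    The input may interleave codecs. The output is a single message set that
    keeps only the latest value per key and is written in the topic's
    effective codec, so the original set boundaries never need preserving.
    """
    latest = {}
    for codec, payload in segment:
        for key, value in decode_set(codec, payload):
            latest.pop(key, None)  # re-insert so order follows the last write
            latest[key] = value
    codec = effective_codec(topic)
    return codec, encode_set(codec, [[k, v] for k, v in latest.items()])
```

The point of the sketch is that `compact` never looks at which codec a surviving record originally arrived in; it only needs to decode each incoming set and then write with whatever `effective_codec` returns for the topic.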
> Broker-side compression configuration
> -------------------------------------
>
>                 Key: KAFKA-1499
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1499
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Joel Koshy
>            Assignee: Manikumar Reddy
>              Labels: newbie++
>             Fix For: 0.8.2
>
>         Attachments: KAFKA-1499.patch, KAFKA-1499.patch,
> KAFKA-1499_2014-08-15_14:20:27.patch, KAFKA-1499_2014-08-21_21:44:27.patch,
> KAFKA-1499_2014-09-21_15:57:23.patch, KAFKA-1499_2014-09-23_14:45:38.patch,
> KAFKA-1499_2014-09-24_14:20:33.patch, KAFKA-1499_2014-09-24_14:24:54.patch,
> KAFKA-1499_2014-09-25_11:05:57.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> A given topic can have messages in mixed compression codecs. i.e., it can
> also have a mix of uncompressed/compressed messages.
> It will be useful to support a broker-side configuration to recompress
> messages to a specific compression codec. i.e., all messages (for all
> topics) on the broker will be compressed to this codec. We could have
> per-topic overrides as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)