[
https://issues.apache.org/jira/browse/KAFKA-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153407#comment-14153407
]
Jay Kreps commented on KAFKA-1499:
----------------------------------
Here is the problem with compaction. Currently the log may contain records
compressed with different codecs, interleaved. Compaction means going
through, decompressing, and recopying active records to a new compressed
segment. However, maintaining the original compression becomes quite complex
and inefficient because we have to find the arbitrary boundaries in the log
where one compressed message set ends and another begins. Even if we deal with
the complexity and try to maintain the compression, each compaction pass
removes records from the existing message sets, so over time this will result
in having each message compressed individually.
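To illustrate the last point, here is a rough sketch (not Kafka's actual log
format or cleaner code) of what happens if compaction keeps each surviving
record inside its original compressed wrapper. The MessageSet type, the
function name, and the sample offsets/keys are all made up for illustration.

```python
from collections import namedtuple

# Hypothetical model: a compressed message set is a codec name plus the
# (offset, key) records it wraps. This is not Kafka's on-disk format.
MessageSet = namedtuple("MessageSet", ["codec", "records"])


def compact_preserving_wrappers(log):
    """Keep the latest record per key, leaving survivors in their original wrapper."""
    latest = {}
    for mset in log:
        for offset, key in mset.records:
            latest[key] = offset  # later offsets overwrite earlier ones

    compacted = []
    for mset in log:
        survivors = [(o, k) for o, k in mset.records if latest[k] == o]
        if survivors:  # drop wrappers that no longer hold any active record
            compacted.append(MessageSet(mset.codec, survivors))
    return compacted


log = [
    MessageSet("gzip", [(0, "a"), (1, "b"), (2, "c")]),
    MessageSet("snappy", [(3, "a"), (4, "b"), (5, "d")]),
    MessageSet("gzip", [(6, "a"), (7, "d"), (8, "e")]),
]

compacted = compact_preserving_wrappers(log)
sizes = [len(m.records) for m in compacted]
codecs = [m.codec for m in compacted]
```

In this toy log the two older wrappers each end up holding a single record
after one pass, which is the "each message compressed individually" outcome
described above.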
Since we haven't been able to implement this so far, the combination of
compression and compaction doesn't work. The proposed fix was to move to a
model where compression is set at the topic level and applied on the broker (as
in this ticket). This would let compaction always just recompress using the
compression type in effect for the topic (i.e. the topic override if one is
set, otherwise the global default).
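As a rough sketch of that model (assumptions only, not the patch attached to
this ticket): the broker resolves one codec per topic, and compaction rewrites
all surviving records under that codec regardless of how they arrived. The
function names, the dict-based config, and the topic names are hypothetical.

```python
def effective_codec(topic, topic_overrides, global_default):
    """Topic override wins if present, otherwise fall back to the global default."""
    return topic_overrides.get(topic, global_default)


def compact_with_topic_codec(records, codec):
    """Keep the latest value per key and emit a single set under one codec.

    `records` is an iterable of (offset, key, value, original_codec) tuples;
    the original codec is ignored on purpose, since the topic codec decides.
    """
    latest = {}
    for offset, key, value, _original_codec in records:
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    survivors = sorted((o, k, v) for k, (o, v) in latest.items())
    return {"codec": codec, "records": survivors}


# Hypothetical configuration: one global default and one per-topic override.
overrides = {"user-profiles": "gzip"}

records = [
    (0, "a", "v1", "gzip"),
    (1, "b", "v1", "none"),
    (2, "a", "v2", "snappy"),
]

codec = effective_codec("user-profiles", overrides, global_default="none")
segment = compact_with_topic_codec(records, codec)
other = effective_codec("clickstream", overrides, global_default="none")
```

Because the output codec depends only on the topic, the cleaner never has to
locate or preserve the original message set boundaries.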
I think the default compression type should be none (i.e. producer may compress
requests but the data won't be stored compressed).
I agree that this is a change in behavior and that users using compression will
have to set compression types when they upgrade. I also think the change may
confuse some people, as the compression they set on the producer will no longer
be carried through to the log/consumer. However, I don't think leaving the
on/off switch resolves this confusion; it just makes it worse because it adds
a whole other mode where compression by the producer is retained.
Thoughts?
> Broker-side compression configuration
> -------------------------------------
>
> Key: KAFKA-1499
> URL: https://issues.apache.org/jira/browse/KAFKA-1499
> Project: Kafka
> Issue Type: New Feature
> Reporter: Joel Koshy
> Assignee: Manikumar Reddy
> Labels: newbie++
> Fix For: 0.8.2
>
> Attachments: KAFKA-1499.patch, KAFKA-1499.patch,
> KAFKA-1499_2014-08-15_14:20:27.patch, KAFKA-1499_2014-08-21_21:44:27.patch,
> KAFKA-1499_2014-09-21_15:57:23.patch, KAFKA-1499_2014-09-23_14:45:38.patch,
> KAFKA-1499_2014-09-24_14:20:33.patch, KAFKA-1499_2014-09-24_14:24:54.patch,
> KAFKA-1499_2014-09-25_11:05:57.patch
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> A given topic can have messages in mixed compression codecs, i.e., it can
> also have a mix of uncompressed/compressed messages.
> It would be useful to support a broker-side configuration to recompress
> messages to a specific compression codec, i.e., all messages (for all
> topics) on the broker would be compressed to this codec. We could have
> per-topic overrides as well.