[ 
https://issues.apache.org/jira/browse/KAFKA-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153407#comment-14153407
 ] 

Jay Kreps commented on KAFKA-1499:
----------------------------------

Here is the problem with compaction. Currently the log may contain interleaved 
records compressed with different codecs. Compaction means going through, 
decompressing, and recopying active records to a new compressed segment. 
However, preserving the original compression becomes quite complex and 
inefficient, because we have to find the arbitrary boundaries in the log where 
one compressed message set ends and another begins. Even if we deal with the 
complexity and try to preserve the compression, over time this would leave 
each message compressed individually.
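
To make the recopy step concrete, here is a minimal sketch of compaction that 
ignores the original codecs and rewrites the survivors with a single target 
codec. It is only an illustration: the `pack`, `unpack` and `compact` helpers, 
the JSON record encoding and the codec table are all hypothetical and bear no 
relation to Kafka's actual log format or cleaner code.

```python
import gzip
import json
import zlib

# Hypothetical codec table: name -> (compress, decompress).
CODECS = {
    "none": (lambda b: b, lambda b: b),
    "gzip": (gzip.compress, gzip.decompress),
    "zlib": (zlib.compress, zlib.decompress),
}


def pack(codec, records):
    """Encode a list of (offset, key, value) records as one message set."""
    compress, _ = CODECS[codec]
    return codec, compress(json.dumps(records).encode("utf-8"))


def unpack(message_set):
    """Decode one message set back into (offset, key, value) tuples."""
    codec, payload = message_set
    _, decompress = CODECS[codec]
    return [tuple(r) for r in json.loads(decompress(payload).decode("utf-8"))]


def compact(segment, target_codec):
    """Keep the latest record per key and rewrite the result with one codec."""
    latest = {}
    for message_set in segment:  # message sets are assumed to be in offset order
        for offset, key, value in unpack(message_set):
            latest[key] = (offset, key, value)  # later offsets win
    survivors = sorted(latest.values())  # restore offset order
    return [pack(target_codec, survivors)]


# A segment with three message sets in three different codecs.
segment = [
    pack("gzip", [(0, "a", "1"), (1, "b", "2")]),
    pack("none", [(2, "a", "3")]),
    pack("zlib", [(3, "c", "4"), (4, "b", "5")]),
]
compacted = compact(segment, "gzip")
```

Because the output codec is fixed up front, the sketch never has to track 
where one original message set ended and the next began.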

Because we haven't been able to implement this, compression and compaction 
currently don't work together. The proposed fix is to move to a model where 
compression is set at the topic level and applied on the broker (as in this 
ticket). Compaction could then always just recompress using the compression 
type configured for the topic (i.e. the global default or the topic-level 
override).
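
As a rough sketch of that lookup, the function below returns the topic 
override when one exists and the global default otherwise. The function name, 
the dictionary of overrides and the topic names are made up for illustration; 
they are not the configuration keys from the patch.

```python
def effective_compression(topic, topic_overrides, global_default):
    """Return the codec to use for a topic: its override, else the default."""
    return topic_overrides.get(topic, global_default)


# Hypothetical settings: one topic overrides the broker-wide default.
overrides = {"clickstream": "gzip"}

clickstream_codec = effective_compression("clickstream", overrides, "none")
audit_codec = effective_compression("audit", overrides, "none")
```

Compaction would call something like this once per topic and use the result 
for every segment it rewrites.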

I think the default compression type should be none (i.e. producer may compress 
requests but the data won't be stored compressed).
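
A small sketch of what that default would mean on the append path, assuming 
the broker undoes the producer's compression and then applies only the 
topic's codec. `store_on_broker` is a hypothetical helper, and only "gzip" 
and "none" are handled to keep the example short.

```python
import gzip


def store_on_broker(payload, producer_codec, topic_codec):
    """Return (codec, bytes) as they would be written to the log."""
    for codec in (producer_codec, topic_codec):
        if codec not in ("gzip", "none"):
            raise ValueError("unsupported codec in this sketch: " + codec)
    # Undo whatever compression the producer applied to the request.
    raw = gzip.decompress(payload) if producer_codec == "gzip" else payload
    # Apply only the topic's codec; "none" stores the plain bytes.
    if topic_codec == "gzip":
        return "gzip", gzip.compress(raw)
    return "none", raw


# The producer compresses the request, but the topic type is "none".
request = gzip.compress(b"hello kafka")
stored = store_on_broker(request, "gzip", "none")
```

With the topic type left at "none", the producer still saves bandwidth on the 
request, but the log holds uncompressed bytes.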

I agree that this is a change in behavior and that users using compression will 
have to set compression types when they upgrade. I also think the change may 
confuse some people, as the compression they set on the producer will no longer 
be carried through to the log/consumer. However, leaving the on/off switch in 
place doesn't resolve this confusion; I think it just makes it worse, because 
it adds a whole other mode where compression by the producer is retained.

Thoughts?

> Broker-side compression configuration
> -------------------------------------
>
>                 Key: KAFKA-1499
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1499
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Joel Koshy
>            Assignee: Manikumar Reddy
>              Labels: newbie++
>             Fix For: 0.8.2
>
>         Attachments: KAFKA-1499.patch, KAFKA-1499.patch, 
> KAFKA-1499_2014-08-15_14:20:27.patch, KAFKA-1499_2014-08-21_21:44:27.patch, 
> KAFKA-1499_2014-09-21_15:57:23.patch, KAFKA-1499_2014-09-23_14:45:38.patch, 
> KAFKA-1499_2014-09-24_14:20:33.patch, KAFKA-1499_2014-09-24_14:24:54.patch, 
> KAFKA-1499_2014-09-25_11:05:57.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> A given topic can have messages in mixed compression codecs. i.e., it can
> also have a mix of uncompressed/compressed messages.
> It will be useful to support a broker-side configuration to recompress
> messages to a specific compression codec. i.e., all messages (for all
> topics) on the broker will be compressed to this codec. We could have
> per-topic overrides as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
