[ https://issues.apache.org/jira/browse/KAFKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jay Kreps updated KAFKA-595:
----------------------------
    Comment: was deleted

(was: I think saying it is unnecessary is perhaps overstating it. It depends on what you are trying to optimize. Compression trades client CPU for network bandwidth. For our own use case I don't know whether it is worth it; that depends on the CPU usage of compression, the compression ratio, and the relative availability of network bandwidth. The CPU usage isn't necessarily fixed--a cheaper compression algorithm than GZIP, plus a little work on the compression code to avoid recopies and deep iteration, could significantly reduce the CPU cost on the broker.

I would instead rephrase this as a feature request: "Decouple producer compression from broker compression." Since we are going to recompress anyway, this is super easy to implement. Right now we have a kind of odd heuristic which says "if there is at least one compressed message in a given message set, recompress the entire message set using the last compression codec that appears in the message set." I would recommend we instead add a log.compression.codec property (plus an override map) that controls the compression on the broker. This could be set the same as the producer's codec or not. I don't think we necessarily need to support the current behavior of retaining whatever codec the producer uses--that behavior is actually kind of bad, since it means consumers must support EVERY codec ANY producer happens to send. The broker would always apply the configured compression codec to incoming messages regardless of the source compression format.)

> Decouple producer side compression from server-side compression.
> ----------------------------------------------------------------
>
>                 Key: KAFKA-595
>                 URL: https://issues.apache.org/jira/browse/KAFKA-595
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>              Labels: feature
>
> In 0.7 Kafka always appended messages to the log using whatever compression
> codec the client used. In 0.8, after the KAFKA-506 patch, the master always
> recompresses the message before appending to the log in order to assign ids.
> Currently the server uses a funky heuristic to choose a compression codec
> based on the codecs the producer used. This doesn't actually make that much
> sense. It would be better for the server to have its own compression setting
> (a global default plus a per-topic override) that specifies the compression
> codec, and have the server always recompress with this codec regardless of
> the original codec. Compression currently happens in
> kafka.log.Log.assignOffsets (which should perhaps be renamed if it takes on
> compression as an official responsibility instead of a side effect).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
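For illustration, the current heuristic and the proposed config-driven replacement described above can be sketched as follows. This is a minimal sketch, not Kafka's actual code: the class and method names (CodecSelection, heuristicCodec, configuredCodec) are hypothetical, and the real broker works on message sets rather than plain lists.

```java
import java.util.List;
import java.util.Map;

public class CodecSelection {

    public enum Codec { NONE, GZIP, SNAPPY }

    // Current 0.8 behavior (the "funky heuristic"): if any message in the
    // set is compressed, recompress the whole set with the LAST compressed
    // codec that appears in the set.
    public static Codec heuristicCodec(List<Codec> messageCodecs) {
        Codec result = Codec.NONE;
        for (Codec c : messageCodecs) {
            if (c != Codec.NONE) {
                result = c; // keep overwriting, so the last one wins
            }
        }
        return result;
    }

    // Proposed behavior: ignore producer codecs entirely and always use the
    // broker-configured codec, with an optional per-topic override.
    public static Codec configuredCodec(Codec defaultCodec,
                                        Map<String, Codec> topicOverrides,
                                        String topic) {
        return topicOverrides.getOrDefault(topic, defaultCodec);
    }
}
```

A corresponding server.properties entry might be something like `log.compression.codec=gzip`, with the override map supplying per-topic codecs; the property name follows the comment's proposal and is illustrative, not a shipped configuration key.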