[ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032832#comment-14032832
 ] 

James Oliver edited comment on KAFKA-1493 at 6/16/14 9:23 PM:
--------------------------------------------------------------

Snappy's block compression format (default block size 32 KB) is this:
snappy codec header: 8-byte magic header, version [4-byte integer], min 
compatible version [4-byte integer]
compressed block 1: compressed data size [4-byte integer], compressed data
compressed block 2
...
Notable limitation: no checksum
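
To make the layout above concrete, here is a minimal sketch of a writer and reader for a stream framed that way. Several things are assumptions and not taken from the Snappy codec itself: the magic bytes and version numbers are placeholders, the integers are assumed to be big-endian, and zlib stands in for the real compressor so the example runs with only the standard library (zlib adds its own checksum inside each block, which the format described above does not have).

```python
import io
import struct
import zlib

# Placeholder values for illustration only; the real codec defines its own.
MAGIC = b"\x00EXAMPLE"            # 8-byte magic header (hypothetical)
VERSION = 1                       # hypothetical
MIN_COMPATIBLE_VERSION = 1        # hypothetical
BLOCK_SIZE = 32 * 1024            # default block size mentioned above


def write_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Frame data as: header, then (4-byte size, compressed bytes) per block."""
    out = io.BytesIO()
    out.write(MAGIC)
    # Big-endian 4-byte integers are an assumption of this sketch.
    out.write(struct.pack(">ii", VERSION, MIN_COMPATIBLE_VERSION))
    for start in range(0, len(data), block_size):
        compressed = zlib.compress(data[start:start + block_size])
        out.write(struct.pack(">i", len(compressed)))
        out.write(compressed)
    return out.getvalue()


def read_blocks(framed: bytes) -> bytes:
    """Parse the header, then decompress blocks until the stream ends."""
    src = io.BytesIO(framed)
    if src.read(8) != MAGIC:
        raise ValueError("bad magic header")
    _version, min_compatible = struct.unpack(">ii", src.read(8))
    if min_compatible > VERSION:
        raise ValueError("stream requires a newer reader")
    chunks = []
    while True:
        size_bytes = src.read(4)
        if not size_bytes:
            break                 # clean end of stream
        if len(size_bytes) != 4:
            raise ValueError("truncated block size")
        (size,) = struct.unpack(">i", size_bytes)
        block = src.read(size)
        if len(block) != size:
            raise ValueError("truncated block")
        chunks.append(zlib.decompress(block))
    return b"".join(chunks)
```

Note that the reader only needs one block in memory at a time, which is the main property this framing buys.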

If I understand the proposed format correctly, this is what you're suggesting:
uncompressed data size [n-byte varint], compressed data
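
A rough sketch of that single-block framing, as I read it, is below. The varint encoding is assumed to be the common unsigned base-128 scheme (7 data bits per byte, high bit set on all but the last byte); the ticket does not pin this down. zlib again stands in for the actual compressor, and frame/unframe are hypothetical helper names.

```python
import zlib
from typing import Tuple


def encode_varint(n: int) -> bytes:
    """Unsigned base-128 varint (assumed encoding, least significant group first)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)          # last byte, high bit clear
            return bytes(out)


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    result = 0
    shift = 0
    for i, byte in enumerate(buf):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, i + 1
        shift += 7
    raise ValueError("truncated varint")


def frame(message: bytes) -> bytes:
    """uncompressed data size [n-byte varint], compressed data."""
    return encode_varint(len(message)) + zlib.compress(message)


def unframe(framed: bytes) -> bytes:
    size, offset = decode_varint(framed)
    message = zlib.decompress(framed[offset:])
    if len(message) != size:
        raise ValueError("uncompressed size does not match prefix")
    return message
```

Knowing the uncompressed size up front lets the reader allocate the output buffer once, at the cost of holding the whole message in memory.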

I would expect that compressing an entire message as a single block would 
give a better compression ratio than compressing smaller chunks. We would 
need to allocate enough heap to hold both the uncompressed and the compressed 
message in memory at the same time. If that is acceptable, it would 
drastically simplify things.
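
The trade-off can be checked with a quick experiment along these lines. This is only a sketch: zlib stands in for the real compressor, the function names are hypothetical, and the actual numbers will depend on the codec and on the data.

```python
import zlib


def compressed_size_single(message: bytes) -> int:
    """Size when the whole message is compressed as one block."""
    return len(zlib.compress(message))


def compressed_size_chunked(message: bytes, chunk_size: int) -> int:
    """Total size when each chunk is compressed independently."""
    return sum(
        len(zlib.compress(message[i:i + chunk_size]))
        for i in range(0, len(message), chunk_size)
    )


def single_block_buffer_bytes(message: bytes) -> int:
    """Bytes held at once in the single-block case: input plus compressed output."""
    return len(message) + compressed_size_single(message)


if __name__ == "__main__":
    # Synthetic, repetitive payload chosen for illustration.
    sample = b"".join(b"record-%06d,status=ok\n" % i for i in range(5000))
    print("single block :", compressed_size_single(sample))
    print("1 KB chunks  :", compressed_size_chunked(sample, 1024))
    print("buffer bytes :", single_block_buffer_bytes(sample))
```

Independent chunks cannot reference data in earlier chunks and each one pays its own framing overhead, which is why the single block is expected to come out smaller on redundant data.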


was (Author: joliver):
Snappy's block (default size 32kb) compression format is this:
snappy codec header: 8-byte magic header, version [4-byte integer], min 
compatible version [4-byte integer]
compressed block 1: compressed data size [4-byte integer], compressed data
compressed block 2
...
Notable limitations: no checksum

If I understand the proposed format correctly, this is what you're suggesting:
uncompressed data size [n-byte varint], compressed data

While I would expect compressing an entire message as a single block would 
provide a better compression ratio than compressing smaller chunks, doing so 
for larger messages is going to cause serious performance problems.

> Use a well-documented LZ4 compression format and remove redundant LZ4HC option
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-1493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1493
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: James Oliver
>             Fix For: 0.8.2
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
