[
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032832#comment-14032832
]
James Oliver edited comment on KAFKA-1493 at 6/16/14 9:23 PM:
--------------------------------------------------------------
Snappy's block compression format (default block size 32 KB) is as follows:
  snappy codec header: 8-byte magic header, version [4-byte integer],
    min compatible version [4-byte integer]
  compressed block 1: compressed data size [4-byte integer], compressed data
  compressed block 2
  ...
Notable limitation: no checksum.
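Here is a minimal Java sketch that writes and parses this framing, to make it concrete. It assumes the magic bytes (0x82, then "SNAPPY", then a zero byte) and the big-endian integers used by snappy-java's `SnappyCodec`/`SnappyOutputStream`. The class and method names (`SnappyFraming`, `frame`, `parse`) are hypothetical. Block payloads are treated as opaque bytes and are not actually Snappy-compressed.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the snappy-java stream framing.
// Assumptions: magic bytes match snappy-java's SnappyCodec, integers are
// big-endian, and block payloads are opaque (nothing is actually compressed).
final class SnappyFraming {
    // 0x82 'S' 'N' 'A' 'P' 'P' 'Y' 0x00: the 8-byte magic header (assumed)
    static final byte[] MAGIC = {(byte) 0x82, 'S', 'N', 'A', 'P', 'P', 'Y', 0};

    /** Writes the 16-byte codec header, then each block as [4-byte size][data]. */
    static byte[] frame(int version, int minCompatibleVersion, List<byte[]> blocks) {
        int total = MAGIC.length + 4 + 4;
        for (byte[] block : blocks) {
            total += 4 + block.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total); // big-endian by default
        buf.put(MAGIC).putInt(version).putInt(minCompatibleVersion);
        for (byte[] block : blocks) {
            buf.putInt(block.length).put(block);
        }
        return buf.array();
    }

    /** Checks the magic header and returns the block payloads in order. */
    static List<byte[]> parse(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream);
        byte[] magic = new byte[MAGIC.length];
        buf.get(magic); // BufferUnderflowException if the header is truncated
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IllegalArgumentException("bad magic header");
        }
        buf.getInt(); // version (not checked in this sketch)
        buf.getInt(); // min compatible version (not checked in this sketch)
        List<byte[]> blocks = new ArrayList<>();
        while (buf.hasRemaining()) {
            int size = buf.getInt(); // compressed data size
            if (size < 0 || size > buf.remaining()) {
                throw new IllegalArgumentException("truncated or invalid block");
            }
            byte[] block = new byte[size];
            buf.get(block);
            blocks.add(block);
        }
        return blocks;
    }
}
```

The format has no checksum, so `parse` can only detect structural damage: a bad header or a truncated block. Corrupted bytes inside a block pass through unnoticed.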
If I understand the proposed format correctly, this is what you're suggesting:
  uncompressed data size [n-byte varint], compressed data
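The proposed layout is a length prefix followed by a single compressed payload. The sketch below assumes "varint" means the unsigned base-128 encoding used by protobuf and by raw Snappy's length preamble: 7 data bits per byte, low-order group first, high bit set on every byte except the last. The class `VarintPrefix` and its methods are hypothetical, and the compressed bytes are again opaque.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of [uncompressed size as varint][compressed data].
// Assumption: unsigned base-128 varint (protobuf / raw Snappy style).
final class VarintPrefix {
    static void writeVarint(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // continuation bit: more bytes follow
            value >>>= 7;
        }
        out.write(value); // last byte, continuation bit clear
    }

    static byte[] encode(int uncompressedSize, byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, uncompressedSize);
        out.write(compressed, 0, compressed.length);
        return out.toByteArray();
    }

    /** Returns {uncompressedSize, offset where the compressed data starts}. */
    static int[] decodeHeader(byte[] message) {
        int result = 0;
        // An int needs at most 5 varint bytes (shift 0, 7, 14, 21, 28).
        for (int i = 0, shift = 0; i < message.length && shift < 35; i++, shift += 7) {
            int b = message[i] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return new int[] {result, i + 1};
            }
        }
        throw new IllegalArgumentException("truncated or oversized varint");
    }
}
```

For example, a size of 300 encodes as the two bytes `0xAC 0x02`. Small messages pay only one or two bytes of overhead, compared with the 16-byte header plus 4 bytes per block in the framing above.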
I would expect compressing an entire message as a single block to give a
better compression ratio than compressing it in smaller chunks. The cost is
that we would need to allocate enough heap to hold both the uncompressed and
the compressed message in memory at the same time. If that is acceptable, it
would drastically simplify things.
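A rough way to size that heap cost is to compare worst-case buffer space for the single-block layout against streaming in fixed-size blocks. This sketch assumes Snappy's worst-case compressed size of 32 + n + n/6 bytes, taken from the Snappy library's `MaxCompressedLength`. Real output is usually smaller. The estimate also ignores the message itself if the caller already holds it in memory. `HeapEstimate` and its methods are hypothetical.

```java
// Hypothetical peak-buffer estimate for single-block vs. chunked compression.
// Assumption: worst-case compressed size is Snappy's bound, 32 + n + n/6.
final class HeapEstimate {
    static long maxCompressedLength(long n) {
        return 32 + n + n / 6;
    }

    /** Whole message as one block: uncompressed and compressed buffers live at once. */
    static long singleBlockPeak(long messageSize) {
        return messageSize + maxCompressedLength(messageSize);
    }

    /** Fixed-size blocks: only one block's pair of buffers is live at a time. */
    static long chunkedPeak(long messageSize, long blockSize) {
        long block = Math.min(messageSize, blockSize);
        return block + maxCompressedLength(block);
    }
}
```

For a 6,000,000-byte message this gives 13,000,032 bytes of worst-case buffer space for the single-block layout. Streaming in 32 KB blocks needs 71,029 bytes per block. That is the trade-off being weighed against the simpler format.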
was (Author: joliver):
Snappy's block compression format (default block size 32 KB) is as follows:
  snappy codec header: 8-byte magic header, version [4-byte integer],
    min compatible version [4-byte integer]
  compressed block 1: compressed data size [4-byte integer], compressed data
  compressed block 2
  ...
Notable limitation: no checksum.
If I understand the proposed format correctly, this is what you're suggesting:
  uncompressed data size [n-byte varint], compressed data
While I would expect compressing an entire message as a single block to
provide a better compression ratio than compressing it in smaller chunks, doing
so for larger messages is likely to cause serious performance problems.
> Use a well-documented LZ4 compression format and remove redundant LZ4HC option
> ------------------------------------------------------------------------------
>
> Key: KAFKA-1493
> URL: https://issues.apache.org/jira/browse/KAFKA-1493
> Project: Kafka
> Issue Type: Improvement
> Reporter: James Oliver
> Fix For: 0.8.2
>
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)