[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032832#comment-14032832 ]
James Oliver edited comment on KAFKA-1493 at 6/16/14 9:23 PM:
--------------------------------------------------------------

Snappy's block (default size 32 KB) compression format is this:

snappy codec header: 8-byte magic header, version [4-byte integer], min compatible version [4-byte integer]
compressed block 1: compressed data size [4-byte integer], compressed data
compressed block 2
...

Notable limitations: no checksum

If I understand the proposed format correctly, this is what you're suggesting:

uncompressed data size [n-byte varint], compressed data

I would expect compressing an entire message as a single block would provide a better compression ratio than compressing smaller chunks. We would need to allocate enough heap to fit both the uncompressed and compressed message in memory. If this is ok then it would drastically simplify things.


was (Author: joliver):
Snappy's block (default size 32 KB) compression format is this:

snappy codec header: 8-byte magic header, version [4-byte integer], min compatible version [4-byte integer]
compressed block 1: compressed data size [4-byte integer], compressed data
compressed block 2
...

Notable limitations: no checksum

If I understand the proposed format correctly, this is what you're suggesting:

uncompressed data size [n-byte varint], compressed data

While I would expect compressing an entire message as a single block would provide a better compression ratio than compressing smaller chunks, doing so for larger messages is going to cause serious performance problems.


> Use a well-documented LZ4 compression format and remove redundant LZ4HC option
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-1493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1493
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: James Oliver
>             Fix For: 0.8.2
>
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)
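
For illustration only, the two layouts compared in the comment above could be sketched roughly as follows. This is not Kafka or Snappy code. The byte order (big-endian), the magic bytes, the version numbers, the LEB128-style varint, and every function name here are assumptions made for the sketch, and zlib stands in for Snappy/LZ4 only because it ships with Python.

```python
import struct
import zlib

# Assumptions (not taken from the comment): big-endian integers, illustrative
# magic bytes and version numbers, zlib as a stand-in for Snappy/LZ4.
MAGIC = b"\x82SNAPPY\x00"          # 8-byte magic header
VERSION = 1
MIN_COMPATIBLE_VERSION = 1
BLOCK_SIZE = 32 * 1024             # 32 KB default block size


def encode_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Block layout: codec header, then [4-byte compressed size][data] per block."""
    out = bytearray(MAGIC)
    out += struct.pack(">ii", VERSION, MIN_COMPATIBLE_VERSION)
    for start in range(0, len(data), block_size):
        block = zlib.compress(data[start:start + block_size])
        out += struct.pack(">i", len(block))
        out += block
    return bytes(out)


def decode_blocks(buf: bytes) -> bytes:
    """Read the header, then decompress block by block."""
    if buf[:8] != MAGIC:
        raise ValueError("bad magic header")
    _version, min_compatible = struct.unpack_from(">ii", buf, 8)
    if min_compatible > VERSION:
        raise ValueError("incompatible stream version")
    pos, out = 16, bytearray()
    while pos < len(buf):
        (size,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        out += zlib.decompress(buf[pos:pos + size])
        pos += size
    return bytes(out)


def encode_varint(n: int) -> bytes:
    """Unsigned varint, 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    """Return (value, position of the first byte after the varint)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_single(data: bytes) -> bytes:
    """Single-block layout: [uncompressed size varint][compressed data]."""
    return encode_varint(len(data)) + zlib.compress(data)


def decode_single(buf: bytes) -> bytes:
    """Both the compressed input and the full output are held in memory here."""
    size, pos = decode_varint(buf)
    data = zlib.decompress(buf[pos:])
    if len(data) != size:
        raise ValueError("uncompressed size mismatch")
    return data
```

In this sketch the single-block decoder needs the whole compressed message and the whole uncompressed result in memory at once, which is the heap trade-off raised in the comment, while the block decoder only ever decompresses one block-sized chunk at a time.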