[
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033274#comment-14033274
]
James Oliver commented on KAFKA-1493:
-------------------------------------
I agree that storing the uncompressed length as a varint makes logical sense
for allocating the required heap space, if and only if the entire uncompressed
message is destined for the heap. Otherwise, this strategy introduces
unnecessary heap requirements. I also agree that the checksum doesn't buy us
much... IMO LZ4 is mature enough not to worry about corruption, and as you
mentioned we already checksum the compressed message to verify accurate
transmission.
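For reference, a minimal sketch of the kind of varint length prefix being discussed (protobuf-style LEB128: 7 payload bits per byte, high bit set on every byte except the last). The class and method names here are hypothetical illustration, not Kafka code:

```java
import java.io.ByteArrayOutputStream;

public class VarintSketch {
    // Encode a non-negative int as a varint: low 7 bits per output byte,
    // continuation bit (0x80) set on all but the final byte.
    static byte[] encodeVarint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Decode a well-formed varint back into an int.
    static int decodeVarint(byte[] in) {
        int value = 0, shift = 0;
        for (byte b : in) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        // A 64 KB uncompressed length takes 3 varint bytes instead of a fixed 4.
        byte[] enc = encodeVarint(65536);
        System.out.println(enc.length);        // 3
        System.out.println(decodeVarint(enc)); // 65536
    }
}
```

The space savings are small; the real question above is whether a consumer can use the value at all when it streams the decompressed data rather than materializing it on the heap.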
Looks like the LZ4 Java path doesn't pass that default blockSize to the
underlying stream; that should be changed if we go with the LZ4Block
streams. That said, the ultra-small block size is robbing
performance... we should consider bumping it up to something in the 32-64 KB
range to improve our compression ratio and reduce per-block overhead.
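To illustrate why small blocks hurt the ratio: each block is compressed independently, so every block boundary resets the codec's match window and adds fixed framing overhead. The JDK has no LZ4, so this sketch uses java.util.zip.Deflater as a stand-in codec; the effect is the same in kind, if not in magnitude:

```java
import java.util.zip.Deflater;

public class BlockSizeSketch {
    // Compress data in independent blocks of the given size and return the
    // total compressed byte count. A fresh Deflater per block mimics the
    // per-block state reset of a block-oriented stream codec.
    static int compressedSize(byte[] data, int blockSize) {
        int total = 0;
        byte[] buf = new byte[blockSize * 2 + 64]; // generous output buffer
        for (int off = 0; off < data.length; off += blockSize) {
            int len = Math.min(blockSize, data.length - off);
            Deflater d = new Deflater();
            d.setInput(data, off, len);
            d.finish();
            while (!d.finished()) {
                total += d.deflate(buf);
            }
            d.end();
        }
        return total;
    }

    public static void main(String[] args) {
        // Repetitive payload, as a batch of similar messages tends to be.
        byte[] data = "key=value,timestamp=1234567890;".repeat(4096).getBytes();
        int small = compressedSize(data, 1024);      // ~1 KB blocks
        int large = compressedSize(data, 64 * 1024); // 64 KB blocks
        // Larger blocks compress the same payload into fewer total bytes.
        System.out.println(small > large);
    }
}
```

The trade-off is decompression-side buffering: a 64 KB block size means the consumer holds up to 64 KB of decompressed data per block, which seems acceptable next to the gains.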
We could just compress the entire message as [[email protected]] mentioned
and document the heap requirements, but it doesn't look like any of the other
compression codecs do that, and I'm hesitant to change the way LZ4 would
work... partially implementing
https://docs.google.com/document/d/1gZbUoLw5hRzJ5Q71oPRN6TO4cRMTZur60qip-TE7BhQ/edit?pli=1
might still be our best option.
> Use a well-documented LZ4 compression format and remove redundant LZ4HC option
> ------------------------------------------------------------------------------
>
> Key: KAFKA-1493
> URL: https://issues.apache.org/jira/browse/KAFKA-1493
> Project: Kafka
> Issue Type: Improvement
> Reporter: James Oliver
> Fix For: 0.8.2
>
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)