[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033274#comment-14033274 ]

James Oliver commented on KAFKA-1493:
-------------------------------------

I agree that storing the uncompressed length as a varint makes logical sense 
for allocating the required heap space, but only if the entire uncompressed 
message is destined for the heap. Otherwise, this strategy introduces 
unnecessary heap requirements. I also agree that the checksum doesn't buy us 
much... IMO LZ4 is mature enough not to worry about corruption, and as you 
mentioned, we already checksum the compressed message to verify accurate 
transmission.
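To illustrate the length-prefix idea (a rough sketch in Python, function names my own): with a varint prefix, a consumer can read the uncompressed length up front and pre-allocate a buffer, which only pays off when the whole message lands on the heap anyway.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a little-endian base-128 varint."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # high bit set: more bytes follow
        else:
            out.append(b)         # high bit clear: final byte
            return bytes(out)

def decode_varint(data: bytes) -> tuple[int, int]:
    """Decode a varint; return (value, bytes consumed)."""
    result = shift = 0
    for i, b in enumerate(data):
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, i + 1
        shift += 7
    raise ValueError("truncated varint")
```

Small lengths cost one byte, and the prefix grows only seven bits at a time, so the framing overhead stays negligible either way.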

Looks like the LZ4 Java path doesn't pass that default blockSize to the 
underlying stream, which should be changed (if we go with the LZ4Block 
streams). That said, the ultra-small block size is robbing performance... we 
should consider bumping it up to something in the 32-64 KB range to improve 
the compression ratio and reduce per-block overhead.
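To make the block-size point concrete, here's a rough sketch (Python, with zlib as a stand-in codec since the argument is codec-agnostic, and an assumed 8-byte per-block header): compressing independently in fewer, larger blocks pays less fixed header cost and gives the compressor a bigger window to find repetition.

```python
import zlib

def blocked_compressed_size(data: bytes, block_size: int,
                            header_bytes: int = 8) -> int:
    """Compress data in independent blocks; count payload plus per-block header."""
    total = 0
    for i in range(0, len(data), block_size):
        total += header_bytes + len(zlib.compress(data[i:i + block_size]))
    return total

payload = b"some repetitive message payload " * 4096  # ~128 KB of redundant data
small = blocked_compressed_size(payload, 1 << 10)     # 1 KB blocks
large = blocked_compressed_size(payload, 1 << 16)     # 64 KB blocks
# 64 KB blocks should yield a noticeably smaller total than 1 KB blocks
```

The exact numbers depend on the codec and the data, but the direction of the effect is the same for LZ4: tiny blocks fragment the match window and multiply the header overhead.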

We could just compress the entire message as [~alb...@stonethree.com] mentioned 
and document the heap requirements, but it doesn't look like any of the other 
compression codecs do so and I'm hesitant to change the way LZ4 would work... 
partially implementing 
https://docs.google.com/document/d/1gZbUoLw5hRzJ5Q71oPRN6TO4cRMTZur60qip-TE7BhQ/edit?pli=1
 might still be our best option.

> Use a well-documented LZ4 compression format and remove redundant LZ4HC option
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-1493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1493
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: James Oliver
>             Fix For: 0.8.2
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)