[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033274#comment-14033274 ]
James Oliver commented on KAFKA-1493:
-------------------------------------

I agree that storing the uncompressed length as a varint makes logical sense for allocating the required heap space IFF the entire uncompressed message is destined for the heap. Otherwise, this strategy introduces unnecessary heap requirements.

I also agree that the checksum doesn't buy us much... IMO LZ4 is mature enough not to worry about distortion, and as you mentioned we already checksum the compressed message to verify accurate transmission.

It looks like the LZ4 Java path doesn't pass that default blockSize to the underlying stream, which should be changed (if we go with the LZ4Block streams). That being said, the ultra-small block size is robbing performance... we should consider bumping it up to something in the 32-64 KB range to improve our compression ratio and reduce per-block overhead.

We could just compress the entire message as [~alb...@stonethree.com] mentioned and document the heap requirements, but it doesn't look like any of the other compression codecs do so, and I'm hesitant to change the way LZ4 would work... partially implementing https://docs.google.com/document/d/1gZbUoLw5hRzJ5Q71oPRN6TO4cRMTZur60qip-TE7BhQ/edit?pli=1 might still be our best option.

> Use a well-documented LZ4 compression format and remove redundant LZ4HC option
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-1493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1493
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: James Oliver
>             Fix For: 0.8.2
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)
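For readers unfamiliar with the varint length-prefix idea discussed in the comment, here is a minimal Python sketch of protobuf-style unsigned varint encoding (7 payload bits per byte, high bit marking continuation). This is purely illustrative and not Kafka's or lz4-java's actual code; the function names are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style unsigned varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # final byte
            return bytes(out)

def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a varint from buf at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear: last byte
            return result, pos
        shift += 7

# A 64 KiB uncompressed length costs 3 bytes; lengths < 128 cost only 1.
print(len(encode_varint(65536)))            # -> 3
print(decode_varint(encode_varint(65536)))  # -> (65536, 3)
```

The small fixed cost for short messages is why the prefix only pays off when the whole uncompressed payload is heap-allocated up front, as the comment notes.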