[ 
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033211#comment-14033211
 ] 

Stephan Lachowsky commented on KAFKA-1493:
------------------------------------------

Given the way the decoder works, I think storing the uncompressed size is the
appropriate thing to do; the compressed length can be inferred. This lets the
reader of the stream allocate the minimum required memory for a single-shot
decode.
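
For illustration, here is roughly what a single-shot decode could look like
against the lz4-java block API if the frame carries the uncompressed length up
front. This is just a sketch with an assumed frame layout, not the Kafka code:

    import java.nio.ByteBuffer;
    import net.jpountz.lz4.LZ4Factory;
    import net.jpountz.lz4.LZ4FastDecompressor;

    class SingleShotDecode {
        // Assumed frame layout: [int uncompressedLength][raw LZ4 block],
        // read from a heap-backed ByteBuffer.
        static byte[] decode(ByteBuffer frame) {
            int uncompressedLength = frame.getInt();
            // minimum required allocation, known before touching the block
            byte[] dest = new byte[uncompressedLength];
            LZ4FastDecompressor d =
                LZ4Factory.fastestInstance().fastDecompressor();
            // decompress() only needs the uncompressed length; the number of
            // compressed bytes consumed comes back as the return value, so
            // the compressed length never has to be stored in the frame.
            int compressedBytesRead = d.decompress(
                frame.array(), frame.arrayOffset() + frame.position(),
                dest, 0, uncompressedLength);
            frame.position(frame.position() + compressedBytesRead);
            return dest;
        }
    }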

I've been looking at how the default block size is passed down to the various
compression backends; the Java and Scala code paths appear to do different
things.

The current Java code passes the block size into the decoder from the
Compressor constructor (Compressor.java:59 and 214). MemoryRecords appears to
be the only user of the Java code, and it uses the constructor that doesn't
explicitly pass a block size, so it falls back to the (tiny) default of 1024.
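
Roughly the pattern I mean, as a hypothetical mock-up (made-up names, not the
actual Kafka classes):

    import java.io.ByteArrayOutputStream;
    import java.io.OutputStream;
    import net.jpountz.lz4.LZ4BlockOutputStream;

    class CompressorSketch {
        static final int DEFAULT_BLOCK_SIZE = 1024;  // the "tiny" fallback
        final OutputStream stream;

        // explicit block size: the caller controls the buffer of the
        // wrapped compression stream
        CompressorSketch(ByteArrayOutputStream buffer, int blockSize) {
            this.stream = new LZ4BlockOutputStream(buffer, blockSize);
        }

        // no block size: the shape of the constructor MemoryRecords appears
        // to use, so compressed writes funnel through 1 KB blocks
        CompressorSketch(ByteArrayOutputStream buffer) {
            this(buffer, DEFAULT_BLOCK_SIZE);
        }
    }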

The Scala code path in CompressionFactory.scala appears to use just the
default constructors for the existing stream wrappers, which means the
compressors will use their own internal default block sizes. It also looks
like the Scala code has all the messages on heap already.
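
To make that concrete (in Java for consistency with the sketch above; the
Scala wrappers do the equivalent), the difference is just which constructor
gets called on the underlying stream:

    import java.io.OutputStream;
    import net.jpountz.lz4.LZ4BlockOutputStream;

    class DefaultVsExplicit {
        // default constructor: the library picks its own internal block size
        static OutputStream likeTheScalaPath(OutputStream out) {
            return new LZ4BlockOutputStream(out);
        }

        // explicit block size: what passing a configured value down to the
        // backend would look like instead
        static OutputStream withExplicitBlockSize(OutputStream out,
                                                  int blockSize) {
            return new LZ4BlockOutputStream(out, blockSize);
        }
    }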

> Use a well-documented LZ4 compression format and remove redundant LZ4HC option
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-1493
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1493
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: James Oliver
>             Fix For: 0.8.2
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
