[ https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033211#comment-14033211 ]
Stephan Lachowsky commented on KAFKA-1493:
------------------------------------------

Given the way the decoder works, I think storing the uncompressed size is the appropriate thing to do; the compressed length can be inferred. This lets the reader of the stream allocate the minimum required memory for a single-shot decode (see the sketch after the issue summary below).

I've been looking at how the default block size is passed down to the various compression backends, and the Java and Scala code paths appear to do different things.

The current Java code passes the block size into the decoder from the Compressor constructor (Compressor.java:59 and 214). MemoryRecords appears to be the only user of the Java code, and it uses the constructor that doesn't explicitly pass a block size, so it falls back to the (tiny) default of 1024 bytes.

The Scala code path in CompressionFactory.scala appears to use just the default constructors for the existing stream wrappers, which means the compressors will use their own internal default block sizes. It also looks like the Scala code has all the messages on heap already.

> Use a well-documented LZ4 compression format and remove redundant LZ4HC option
> ------------------------------------------------------------------------------
>
>          Key: KAFKA-1493
>          URL: https://issues.apache.org/jira/browse/KAFKA-1493
>      Project: Kafka
>   Issue Type: Improvement
>     Reporter: James Oliver
>     Fix For: 0.8.2
>
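To make the size-prefix idea concrete, here is a minimal sketch, assuming the lz4-java library (net.jpountz.lz4) as the compression backend; the class and method names (Lz4SizePrefixSketch, frame, deframe) and the 4-byte length prefix are purely illustrative, not proposed Kafka framing. The writer records the uncompressed length ahead of the compressed bytes, and the reader allocates exactly that much and decodes in one shot, inferring the compressed length from the rest of the frame.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;

// Illustrative sketch only; not Kafka's actual LZ4 framing.
public class Lz4SizePrefixSketch {

    // Frame a single block as [uncompressedLength:int][compressedBytes].
    static byte[] frame(byte[] uncompressed) throws IOException {
        LZ4Compressor compressor = LZ4Factory.fastestInstance().fastCompressor();
        byte[] compressed = compressor.compress(uncompressed);

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(uncompressed.length); // store the uncompressed size
        out.write(compressed);             // compressed length is implied by the frame size
        out.flush();
        return buf.toByteArray();
    }

    // Decode in one shot: the stored size tells the reader exactly how much to allocate.
    static byte[] deframe(byte[] frame) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        int uncompressedLength = in.readInt();

        byte[] restored = new byte[uncompressedLength]; // minimum required allocation
        LZ4FastDecompressor decompressor = LZ4Factory.fastestInstance().fastDecompressor();
        // decompress() stops once uncompressedLength bytes have been produced, so no
        // resizing and no knowledge of the compressed length are needed; the source
        // offset of 4 simply skips the length prefix written above.
        decompressor.decompress(frame, 4, restored, 0, uncompressedLength);
        return restored;
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "some message payload".getBytes(StandardCharsets.UTF_8);
        byte[] roundTripped = deframe(frame(original));
        System.out.println(new String(roundTripped, StandardCharsets.UTF_8));
    }
}
{code}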