[
https://issues.apache.org/jira/browse/KAFKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033211#comment-14033211
]
Stephan Lachowsky commented on KAFKA-1493:
------------------------------------------
Given how the decoder works, I think storing the uncompressed size would be
the appropriate thing to do; the compressed length can be inferred. This lets
the reader of the stream allocate the minimum memory required for a
single-shot decode.
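To make the framing argument concrete, here is a minimal sketch (the class and field layout are illustrative, not the actual Kafka or LZ4 format): a 4-byte uncompressed-size field precedes the compressed payload, so the reader can allocate exactly one destination buffer, while the compressed length is inferred from what remains in the frame.

```java
import java.nio.ByteBuffer;

// Hypothetical framing sketch: store the uncompressed size explicitly and
// let the compressed length be inferred from the remaining bytes.
public class BlockHeader {
    // Writer side: prefix the payload with the uncompressed size.
    static ByteBuffer frame(byte[] compressed, int uncompressedSize) {
        ByteBuffer buf = ByteBuffer.allocate(4 + compressed.length);
        buf.putInt(uncompressedSize); // stored explicitly in the header
        buf.put(compressed);          // compressed length = bytes remaining
        buf.flip();
        return buf;
    }

    // Reader side: allocate the minimum memory for a single-shot decode.
    static byte[] allocateForDecode(ByteBuffer framed) {
        int uncompressedSize = framed.getInt();
        return new byte[uncompressedSize]; // exact-size destination buffer
    }
}
```

With this layout the reader never has to guess or grow a buffer mid-decode; one read of the header yields the exact allocation.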
I've been looking at how the default blocksize is passed down to the various
compression backends; the Java and Scala code paths appear to do different
things.
The current Java code passes the blocksize into the decoder from the
Compressor constructor (Compressor.java:59 and 214). MemoryRecords appears to
be the only user of the Java code, and it uses the constructor that doesn't
explicitly pass a blocksize, falling back to the (tiny) default of 1024.
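The fallback behaviour described above can be sketched roughly as follows (class and field names are illustrative, not the actual Kafka classes): the no-argument constructor silently chains to the small default, so any caller that doesn't pass a blocksize gets 1024.

```java
// Illustrative sketch of the constructor-chaining fallback; not the real
// Compressor class, just the shape of the behaviour described above.
public class SketchCompressor {
    static final int DEFAULT_BLOCK_SIZE = 1024; // the "tiny" default

    final int blockSize;

    // The convenience constructor a caller like MemoryRecords might use:
    // no explicit blocksize, so it falls back to the default.
    SketchCompressor() {
        this(DEFAULT_BLOCK_SIZE);
    }

    // The explicit path, where the blocksize is passed down to the backend.
    SketchCompressor(int blockSize) {
        this.blockSize = blockSize;
    }
}
```

The fix implied here is simply to route callers through the explicit constructor with a sensibly sized block.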
The Scala code path in CompressionFactory.scala appears to use just the
default constructors for the existing stream wrappers, which means the
compressors will use their own internal default blocksizes. It looks like the
Scala code already has all the messages on heap.
> Use a well-documented LZ4 compression format and remove redundant LZ4HC option
> ------------------------------------------------------------------------------
>
> Key: KAFKA-1493
> URL: https://issues.apache.org/jira/browse/KAFKA-1493
> Project: Kafka
> Issue Type: Improvement
> Reporter: James Oliver
> Fix For: 0.8.2
>
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)