[
https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017229#comment-13017229
]
Krishna Kumar commented on HIVE-2065:
-------------------------------------
The minor version is needed so that we can still read 6.0 files correctly. To
recap, 6.0 files store an incorrect record length, which the reader
recalculates while reading, whereas files from 6.1 onwards store the correct
record length on disk.
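
The reader-side fix-up described above can be sketched roughly as follows. This
is an illustration only, not the code in the patch: the class and method names
are hypothetical, and it assumes that minor version 0 files store a record
length computed from the uncompressed key length (as in the writer code quoted
below), so the on-disk length is recovered by swapping in the compressed key
length.

```java
// Hypothetical helper, not part of RCFile: shows the version-dependent fix-up.
final class RecordLengthFixup {
    private RecordLengthFixup() {}

    /**
     * Returns the number of bytes the key plus value occupy on disk.
     *
     * Assumption: in minor version 0 files with compression enabled, the
     * stored record length was computed as uncompressedKeyLength + valueLength,
     * although the key was written compressed.
     */
    static int onDiskRecordLength(int minorVersion,
                                  boolean compressed,
                                  int storedRecordLength,
                                  int uncompressedKeyLength,
                                  int compressedKeyLength) {
        if (minorVersion == 0 && compressed) {
            // Replace the overstated key portion with its real on-disk size.
            return storedRecordLength - uncompressedKeyLength + compressedKeyLength;
        }
        // Later minor versions (and uncompressed files) are taken at face value.
        return storedRecordLength;
    }
}
```

For example, a 6.0 record stored as length 150 with a 100-byte key that
compressed to 40 bytes would be treated as 90 bytes on disk, while the same
call with minor version 1 returns the stored value unchanged.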
[PS. I had suggested bumping up the sequence file version to 7 in a comment
above, but I think a minor version is a better idea. The layout itself is still
'kinda sorta' version-6-compatible. For all we know, there may be a sequence
file version 7, and then sequence file version 7 and rc file version 7 would be
divergent.]
[PPS. For completeness of documentation, here are the reasons why the layout,
even after the current patch, still falls short of complete version-6
compatibility: [a] The KeyBuffer, named as the key class, cannot read or write
itself from/to the disk stream, because reading/writing the 4-byte key
contents length field and the compression/decompression are done by the
reader/writer rather than by the KeyBuffer class; and [b] The ValueBuffer, the
value class, must be compressed as a single unit to be compatible with the
sequence file reader/writer, but it is actually compressed as several units.]
> RCFile issues
> -------------
>
> Key: HIVE-2065
> URL: https://issues.apache.org/jira/browse/HIVE-2065
> Project: Hive
> Issue Type: Bug
> Reporter: Krishna Kumar
> Assignee: Krishna Kumar
> Priority: Minor
> Attachments: HIVE.2065.patch.0.txt, HIVE.2065.patch.1.txt,
> Slide1.png, proposal.png
>
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per
> yongqiang he, the class is not meant to be thread-safe (and it is not). Might
> as well get rid of the confusing and performance-impacting lock acquisitions.
> 2. Record Length overstated for compressed files. IIUC, the key compression
> happens after we have written the record length.
> {code}
>     int keyLength = key.getSize();
>     if (keyLength < 0) {
>       throw new IOException("negative length keys not allowed: " + key);
>     }
>     out.writeInt(keyLength + valueLength); // total record length
>     out.writeInt(keyLength); // key portion length
>     if (!isCompressed()) {
>       out.writeInt(keyLength);
>       key.write(out); // key
>     } else {
>       keyCompressionBuffer.reset();
>       keyDeflateFilter.resetState();
>       key.write(keyDeflateOut);
>       keyDeflateOut.flush();
>       keyDeflateFilter.finish();
>       int compressedKeyLen = keyCompressionBuffer.getLength();
>       out.writeInt(compressedKeyLen);
>       out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
>     }
> {code}
> 3. For sequence file compatibility, the field immediately after the record
> length should be the compressed key length, not the uncompressed key length.
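
A minimal sketch of the ordering that points 2 and 3 ask for: compress the key
into memory first, so that both the record length and the key length written
to the stream reflect what is actually stored. This is not the attached patch.
The class and method names are hypothetical, the JDK's DeflaterOutputStream
stands in for whatever codec the file is configured with, and writing the
uncompressed key length as a third field is an assumption made only so that a
reader could size its buffer.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

// Hypothetical illustration, not RCFile's writer.
final class KeyWriteSketch {
    private KeyWriteSketch() {}

    /** Compresses the key bytes in memory so their on-disk size is known up front. */
    static byte[] deflate(byte[] rawKey) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflateOut = new DeflaterOutputStream(buffer)) {
            deflateOut.write(rawKey);
        } // closing the stream finishes the deflater and flushes into buffer
        return buffer.toByteArray();
    }

    /** Writes the length fields and the key, using on-disk sizes throughout. */
    static void writeKey(DataOutputStream out, byte[] rawKey,
                         int valueLength, boolean compressed) throws IOException {
        byte[] onDiskKey = compressed ? deflate(rawKey) : rawKey;
        out.writeInt(onDiskKey.length + valueLength); // total record length as stored
        out.writeInt(onDiskKey.length);               // key length as stored (compressed if enabled)
        out.writeInt(rawKey.length);                  // uncompressed key length (assumed extra field)
        out.write(onDiskKey);                         // key bytes
    }
}
```

With this ordering the first integer always equals the second integer plus the
value length, whether or not compression is enabled.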