[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008741#comment-13008741 ]
Krishna Kumar commented on HIVE-2065:
-------------------------------------
So should I go ahead and fix #2 and #3 as well? Note that these are
incompatible changes, so the version number will need to be bumped.

My proposal:

Fix the issues in the new format
- Bump the version number to 7.
- Compute and store the record length as compressed key length + compressed
  value length, where compressed key length = 4 + compressed key contents
  length.
- Store the compressed key length as the next 4-byte field.
- The key contains the 4-byte uncompressed key contents length followed by the
  compressed key contents.

Provide backward compatibility
- While reading version 6, interpret the fields as now, but recalculate the
  record length from the next two fields:
  record length = record length - uncompressed key length + compressed key length
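
To make the arithmetic above concrete, here is a minimal sketch of the two length calculations and the proposed field order. It is not the RCFile code: the class name, method names, and the byte-array stand-in for the compressed key contents are all hypothetical, and the version 6 correction simply applies the formula as stated in the proposal.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of the proposed layout; none of these names exist in RCFile.
class RecordLengthSketch {
    // Size of the uncompressed-key-contents-length field stored at the start of the key.
    static final int INT_BYTES = 4;

    // Proposed version 7: compressed key length = 4 + compressed key contents length.
    static int v7CompressedKeyLength(int compressedKeyContentsLen) {
        return INT_BYTES + compressedKeyContentsLen;
    }

    // Proposed version 7: record length = compressed key length + compressed value length.
    static int v7RecordLength(int compressedKeyContentsLen, int compressedValueLen) {
        return v7CompressedKeyLength(compressedKeyContentsLen) + compressedValueLen;
    }

    // Reading version 6: the stored record length counted the uncompressed key,
    // so swap that term for the compressed key length read from the next two fields.
    static int v6CorrectedRecordLength(int storedRecordLen, int uncompressedKeyLen,
                                       int compressedKeyLen) {
        return storedRecordLen - uncompressedKeyLen + compressedKeyLen;
    }

    // Writes the proposed field order: record length, compressed key length,
    // then the key (uncompressed key contents length + compressed key contents).
    static byte[] writeV7RecordHeader(int uncompressedKeyLen, byte[] compressedKeyContents,
                                      int compressedValueLen) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(v7RecordLength(compressedKeyContents.length, compressedValueLen));
            out.writeInt(v7CompressedKeyLength(compressedKeyContents.length));
            out.writeInt(uncompressedKeyLen);
            out.write(compressedKeyContents);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Example numbers are made up purely for illustration.
        System.out.println(v7RecordLength(10, 20));
        System.out.println(v6CorrectedRecordLength(150, 100, 40));
        System.out.println(writeV7RecordHeader(100, new byte[10], 20).length);
    }
}
```

With this layout a reader can skip a whole record using only the first field, since the record length now matches the bytes that actually follow it.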
> RCFile issues
> -------------
>
> Key: HIVE-2065
> URL: https://issues.apache.org/jira/browse/HIVE-2065
> Project: Hive
> Issue Type: Bug
> Reporter: Krishna Kumar
> Assignee: Krishna Kumar
> Priority: Minor
> Attachments: Slide1.png
>
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per
> yongqiang he, the class is not meant to be thread-safe (and it is not). Might
> as well get rid of the confusing and performance-impacting lock acquisitions.
> 2. Record Length overstated for compressed files. IIUC, the key compression
> happens after we have written the record length.
> {code}
> int keyLength = key.getSize();
> if (keyLength < 0) {
>   throw new IOException("negative length keys not allowed: " + key);
> }
> out.writeInt(keyLength + valueLength); // total record length
> out.writeInt(keyLength); // key portion length
> if (!isCompressed()) {
>   out.writeInt(keyLength);
>   key.write(out); // key
> } else {
>   keyCompressionBuffer.reset();
>   keyDeflateFilter.resetState();
>   key.write(keyDeflateOut);
>   keyDeflateOut.flush();
>   keyDeflateFilter.finish();
>   int compressedKeyLen = keyCompressionBuffer.getLength();
>   out.writeInt(compressedKeyLen);
>   out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
> }
> {code}
> 3. For sequence file compatibility, the field immediately after the record
> length should be the compressed key length, not the uncompressed key length.