[ https://issues.apache.org/jira/browse/HIVE-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009981#comment-13009981 ]
Krishna Kumar commented on HIVE-2065:
-------------------------------------

Hmm. #3 is taking me a bit further than I originally thought. I assume being able to read an RCFile as a SequenceFile is required, while being able to write an RCFile via the SequenceFile interface is desirable.

Having made changes so that the record length is set correctly, the following changes are also required, IIUC, to make sure that the RCFile is handled correctly as a sequence file:

- the second field should be the key length (4 + compressed/plain key contents)
- the key class (KeyBuffer) must be made responsible for reading/writing the next field (the plain key contents length) as well as compression/decompression of the key contents
- the value class (ValueBuffer) related changes will be trickier. Since the value is not compressed as a unit, we cannot use the record-compressed format. We need to mark the records as plain records, and move the codec to a metadata entry. Then the ValueBuffer class will work correctly with the SequenceFile implementation.

Thoughts? Worth it?

> RCFile issues
> -------------
>
>                 Key: HIVE-2065
>                 URL: https://issues.apache.org/jira/browse/HIVE-2065
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krishna Kumar
>            Assignee: Krishna Kumar
>            Priority: Minor
>         Attachments: Slide1.png, proposal.png
>
>
> Some potential issues with RCFile
> 1. Remove unwanted synchronized modifiers on the methods of RCFile. As per
> yongqiang he, the class is not meant to be thread-safe (and it is not). Might
> as well get rid of the confusing and performance-impacting lock acquisitions.
> 2. Record Length overstated for compressed files. IIUC, the key compression
> happens after we have written the record length.
> {code}
>       int keyLength = key.getSize();
>       if (keyLength < 0) {
>         throw new IOException("negative length keys not allowed: " + key);
>       }
>       out.writeInt(keyLength + valueLength); // total record length
>       out.writeInt(keyLength); // key portion length
>       if (!isCompressed()) {
>         out.writeInt(keyLength);
>         key.write(out); // key
>       } else {
>         keyCompressionBuffer.reset();
>         keyDeflateFilter.resetState();
>         key.write(keyDeflateOut);
>         keyDeflateOut.flush();
>         keyDeflateFilter.finish();
>         int compressedKeyLen = keyCompressionBuffer.getLength();
>         out.writeInt(compressedKeyLen);
>         out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
>       }
> {code}
> 3. For sequence file compatibility, the compressed key length should be the
> next field to record length, not the uncompressed key length.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
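
For illustration only, here is one possible reading of the field order proposed in the comment: record length, then key length (4 + stored key contents), then the plain key contents length, then the key contents. This is a minimal sketch, not the RCFile.Writer code. The class RecordHeaderSketch and its methods compress and writeRecordHeader are hypothetical names, and java.util.zip deflate stands in for whatever codec the file actually uses. The point it tries to show is that compressing the key before writing any length keeps the record length from being overstated.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch; names and layout are assumptions, not Hive's implementation.
class RecordHeaderSketch {

    // Deflates the serialized key; stands in for the keyDeflateOut path in the issue.
    static byte[] compress(byte[] plainKey) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflate = new DeflaterOutputStream(buf)) {
            deflate.write(plainKey);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Writes: record length, key length, plain key contents length, key contents.
    static byte[] writeRecordHeader(byte[] plainKey, int valueLength, boolean compressed) {
        // Compress first so that every length written below is the stored size.
        byte[] storedKey = compressed ? compress(plainKey) : plainKey;
        int keyLength = 4 + storedKey.length; // 4 bytes for the plain-length field

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(keyLength + valueLength); // record length
            out.writeInt(keyLength);               // key length
            out.writeInt(plainKey.length);         // plain key contents length
            out.write(storedKey);                  // key contents
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plainKey = new byte[1000]; // highly compressible dummy key
        int valueLength = 50;             // arbitrary value size for the demo
        byte[] header = writeRecordHeader(plainKey, valueLength, true);
        int storedKeyLength = header.length - 12; // minus the three int fields
        System.out.println("record length from plain key size:  " + (plainKey.length + valueLength));
        System.out.println("record length from stored key size: " + (4 + storedKeyLength + valueLength));
    }
}
```

The first number printed corresponds to what issue item 2 describes (uncompressed key size plus value size); the second is what a reader skipping by record length would need under the layout assumed here.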