RCFile issues
-------------

                 Key: HIVE-2065
                 URL: https://issues.apache.org/jira/browse/HIVE-2065
             Project: Hive
          Issue Type: Bug
            Reporter: Krishna Kumar
            Priority: Minor


Some potential issues with RCFile

1. Remove unwanted synchronized modifiers on the methods of RCFile. As per 
Yongqiang He, the class is not meant to be thread-safe (and it is not), so we 
might as well get rid of the confusing and performance-impacting lock 
acquisitions.

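2. Record length is overstated for compressed files. IIUC, the key compression 
happens after we have written the record length, so the record length is 
computed from the uncompressed key length rather than from the 
compressedKeyLen bytes that are actually written.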

{code}
      int keyLength = key.getSize();
      if (keyLength < 0) {
        throw new IOException("negative length keys not allowed: " + key);
      }

      out.writeInt(keyLength + valueLength); // total record length
      out.writeInt(keyLength); // key portion length
      if (!isCompressed()) {
        out.writeInt(keyLength);
        key.write(out); // key
      } else {
        keyCompressionBuffer.reset();
        keyDeflateFilter.resetState();
        key.write(keyDeflateOut);
        keyDeflateOut.flush();
        keyDeflateFilter.finish();
        int compressedKeyLen = keyCompressionBuffer.getLength();
        out.writeInt(compressedKeyLen);
        out.write(keyCompressionBuffer.getData(), 0, compressedKeyLen);
      }
{code}

3. For sequence file compatibility, the field immediately after the record 
length should be the compressed key length, not the uncompressed key length.
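
A minimal sketch of what points 2 and 3 might look like together, not the 
actual RCFile patch. It assumes the key is already serialized to a byte array 
and uses java.util.zip in place of RCFile's own compression buffers. The class 
RecordHeaderSketch, the method writeKey, and the position of the uncompressed 
length field are hypothetical; the only point is that compression runs before 
any length is written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

// Hypothetical stand-in for the key-writing part of an RCFile-style writer.
class RecordHeaderSketch {

  /** Deflates the serialized key bytes and returns the compressed form. */
  static byte[] compress(byte[] raw) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DeflaterOutputStream deflate = new DeflaterOutputStream(buf)) {
      deflate.write(raw);
    } // closing the stream finishes the deflater and flushes into buf
    return buf.toByteArray();
  }

  /**
   * Writes the record header and key. The key is compressed before any
   * length is written, so the lengths describe the bytes actually stored.
   */
  static void writeKey(DataOutputStream out, byte[] rawKey, int valueLength,
      boolean compressed) throws IOException {
    byte[] storedKey = compressed ? compress(rawKey) : rawKey;
    out.writeInt(storedKey.length + valueLength); // total record length
    out.writeInt(storedKey.length); // key length as stored (point 3)
    out.writeInt(rawKey.length);    // uncompressed key length (assumed position)
    out.write(storedKey);           // key bytes
  }

  public static void main(String[] args) throws IOException {
    byte[] rawKey = new byte[1024]; // all zeros, so it compresses well
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    writeKey(new DataOutputStream(sink), rawKey, 100, true);

    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(sink.toByteArray()));
    System.out.println("record length     = " + in.readInt());
    System.out.println("stored key length = " + in.readInt());
    System.out.println("raw key length    = " + in.readInt());
    System.out.println("key bytes written = " + (sink.size() - 12));
  }
}
```

With this ordering, the second int always equals the number of key bytes that 
follow the header, whether or not compression is enabled.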
