Kevin Wilfong created HIVE-4005:
-----------------------------------

             Summary: Column truncation
                 Key: HIVE-4005
                 URL: https://issues.apache.org/jira/browse/HIVE-4005
             Project: Hive
          Issue Type: New Feature
          Components: CLI
    Affects Versions: 0.11.0
            Reporter: Kevin Wilfong
            Assignee: Kevin Wilfong


Column truncation allows users to remove data for columns that are no longer 
useful.

This is done by removing the data for the column and setting the length of the 
column data and related lengths to 0 in the RC file header.

RC file was fixed to recognize columns with a length of zero as empty. Such 
columns are treated as if they do not exist in the data: a null is returned for 
every value of that column in every row. This is the same thing that happens 
when more columns are selected than exist in the file.
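
The read-side behavior can be illustrated with a toy model. This is only a 
sketch, not the RC file implementation: RowGroup, truncate_column and 
read_column are hypothetical names, the header is reduced to one byte length 
per column, and values are assumed to be fixed-width so that no per-value 
lengths are needed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowGroup:
    """Toy stand-in for one row group (NOT the real RC file layout)."""
    num_rows: int
    column_lengths: List[int]  # header: byte length of each column's data
    column_data: List[bytes]   # body: one opaque buffer per column


def truncate_column(group: RowGroup, col: int) -> None:
    """Drop the column's data and record a length of 0 in the header."""
    group.column_data[col] = b""
    group.column_lengths[col] = 0


def read_column(group: RowGroup, col: int, width: int = 1) -> List[Optional[bytes]]:
    """Return one value per row; None for a missing or zero-length column."""
    missing = col >= len(group.column_lengths)
    if missing or group.column_lengths[col] == 0:
        # Same result whether the column was truncated or never existed.
        return [None] * group.num_rows
    data = group.column_data[col]
    return [data[i * width:(i + 1) * width] for i in range(group.num_rows)]


group = RowGroup(num_rows=3, column_lengths=[3, 3], column_data=[b"abc", b"xyz"])
truncate_column(group, 1)
truncated = read_column(group, 1)   # truncated column
nonexistent = read_column(group, 5) # column index beyond the file
untouched = read_column(group, 0)   # column that was left alone
```

In this model a truncated column and a column that was never written are 
indistinguishable to the reader, which matches the behavior described above.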

A new command was added to the CLI:
TRUNCATE TABLE ... PARTITION ... COLUMNS ...

This launches a map-only job where each mapper rewrites a single file, omitting 
the unnecessary column data and writing the adjusted headers. It does not 
uncompress/deserialize the data, so it is much faster than rewriting the data 
with NULLs.
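
A rough sketch of why the rewrite is cheap, under simplifying assumptions: each 
file is modeled as a list of independently compressed column buffers, and 
rewrite_file and run_map_only are hypothetical helpers, not Hive classes. The 
point it tries to show is that kept columns are copied as opaque bytes and are 
never decompressed.

```python
import zlib
from typing import Dict, List, Set, Tuple

# (header lengths, column buffers) for one rewritten file in this toy model
Rewritten = Tuple[List[int], List[bytes]]


def rewrite_file(columns: List[bytes], drop: Set[int]) -> Rewritten:
    """Copy kept column buffers byte-for-byte; empty the dropped ones."""
    new_columns = [b"" if i in drop else buf for i, buf in enumerate(columns)]
    # Adjusted header: a dropped column now has a recorded length of 0.
    return [len(buf) for buf in new_columns], new_columns


def run_map_only(files: Dict[str, List[bytes]], drop: Set[int]) -> Dict[str, Rewritten]:
    """One 'mapper' per input file and no reduce step."""
    return {name: rewrite_file(cols, drop) for name, cols in files.items()}


files = {
    "part-00000": [zlib.compress(b"a" * 100), zlib.compress(b"b" * 100)],
    "part-00001": [zlib.compress(b"c" * 100), zlib.compress(b"d" * 100)],
}
result = run_map_only(files, drop={1})
lengths, columns = result["part-00000"]
```

Rewriting the column with NULLs instead would require decompressing and 
deserializing every row, then serializing and compressing it again.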
