[ 
https://issues.apache.org/jira/browse/HIVE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160673#comment-13160673
 ] 

Carl Steinbach commented on HIVE-2600:
--------------------------------------

My understanding from looking at the code is that RCFile currently stores all 
columns as strings regardless of the underlying column type, e.g. a FLOAT 
column with values 1.0, 2.2, and 3.33 will be serialized in the column value 
buffer as "1.02.23.33". Assuming this is correct, I'm wondering if there's 
really much advantage to specifying the compression codec at the column level 
since all columns are fundamentally strings?
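
To make the example concrete, here is a minimal Python sketch of text versus binary serialization of that FLOAT column. It is not RCFile's code: the function names are hypothetical, and keeping the value lengths in a separate list is an assumption of the sketch, made so that the concatenated text stays decodable.

```python
import struct

def text_column_buffer(values):
    """Concatenate the text form of each value and record each value's length.

    Hypothetical helper for illustration only; not part of RCFile.
    """
    encoded = [str(v).encode("utf-8") for v in values]
    return b"".join(encoded), [len(e) for e in encoded]

def binary_column_buffer(values):
    """Pack the same values as 4-byte big-endian IEEE 754 floats, for comparison."""
    return struct.pack(">%df" % len(values), *values)

column = [1.0, 2.2, 3.33]
buf, lengths = text_column_buffer(column)
print(buf)                                # b'1.02.23.33'
print(lengths)                            # [3, 3, 4]
print(len(binary_column_buffer(column)))  # 12
```

In the text form, a codec sees only a run of digit and '.' characters, whatever the declared column type is. A type-specific codec would have to parse the strings back into numbers before it could exploit the type.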
                
> Enable/Add type-specific compression for rcfile
> -----------------------------------------------
>
>                 Key: HIVE-2600
>                 URL: https://issues.apache.org/jira/browse/HIVE-2600
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: Krishna Kumar
>            Assignee: Krishna Kumar
>            Priority: Minor
>         Attachments: HIVE-2600.v0.patch, HIVE-2600.v1.patch
>
>
> Enable schema-aware compression codecs which can perform type-specific 
> compression on a per-column basis. I see this as having three parts:
> 1. Add interfaces for the rcfile to communicate column information to the 
> codec
> 2. Add an "uber compressor" which can perform column-specific compression on 
> a per-block basis. Initially, this can be config driven, but we can go for a 
> dynamic implementation later.
> 3. A bunch of type-specific compressors
> This jira is for the first part of the effort.


        
