[ 
https://issues.apache.org/jira/browse/HIVE-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1071:
---------------------------------

    Component/s: Serializers/Deserializers

> Making RCFile "concatenatable" to reduce the number of files of the output
> --------------------------------------------------------------------------
>
>                 Key: HIVE-1071
>                 URL: https://issues.apache.org/jira/browse/HIVE-1071
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>            Reporter: Zheng Shao
>
> Hive automatically determines the number of reducers most of the time.
> Sometimes, we create a lot of small files.
> Hive has an option to "merge" those small files through a map-reduce job.
> Dhruba has an idea that could fix this even faster:
> if we can make RCFile concatenatable, then we can simply tell the namenode to 
> "merge" these files.
> Pros: This approach does not do any I/O so it's faster.
> Cons: We have to zero-fill the files to make sure they can be concatenated 
> (all blocks except the last have to be full HDFS blocks).
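
To make the zero-fill cost concrete, here is a minimal sketch of the size arithmetic only. It does not touch HDFS or the RCFile format. The 64 MB block size and the function names are assumptions for illustration, not anything defined in this issue.

```python
# Assumed HDFS block size in bytes; the real value depends on cluster configuration.
BLOCK_SIZE = 64 * 1024 * 1024


def padding_needed(file_size, block_size=BLOCK_SIZE):
    """Bytes of zero-fill needed to round file_size up to a whole number of blocks."""
    return (-file_size) % block_size


def plan_concat(file_sizes, block_size=BLOCK_SIZE):
    """Per-file padding so that every file except the last ends on a block boundary.

    The last file may end with a partial block, so it needs no padding.
    """
    if not file_sizes:
        return []
    pads = [padding_needed(size, block_size) for size in file_sizes[:-1]]
    pads.append(0)
    return pads


if __name__ == "__main__":
    # Hypothetical reducer output sizes in bytes.
    sizes = [10 * 1024 * 1024, 64 * 1024 * 1024, 3 * 1024 * 1024]
    for size, pad in zip(sizes, plan_concat(sizes)):
        print(size, pad)
```

Under these assumptions a 10 MB file followed by other files would need 54 MB of zero-fill, which is the trade-off listed under "Cons" above.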

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
