[ 
https://issues.apache.org/jira/browse/HIVE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-3502:
--------------------------------

    Assignee: Sambavi Muthukrishnan
    
> design efficient bucketing techniques
> -------------------------------------
>
>                 Key: HIVE-3502
>                 URL: https://issues.apache.org/jira/browse/HIVE-3502
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Sambavi Muthukrishnan
>
> Currently, the bucketing techniques are fairly expensive: the bucketing keys
> have to be the same as the reduction keys, and the process of bucketization
> requires a full-blown map-reduce job.
> It should be possible to perform a map-side bucketization. The high-level
> idea is to shard the data based on the number of buckets, and create a
> sub-directory for each bucket. Then, the data from all the mappers (in the
> same sub-directory) can be merged.
> So, instead of having 1 file per bucket, it would lead to 1 directory per
> bucket.
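
For illustration only, here is a minimal sketch of the map-side idea described above, written in plain Python rather than against Hive's code. Everything in it is an assumption made for the example: the names bucket_of, map_side_bucketize and merge_bucket, the bucket_NNNNN/mapper_NNNNN directory layout, and the CRC32 hash (Hive uses its own hash function). Each simulated mapper writes its rows straight into a per-bucket sub-directory, and a later step merges the files found there, so no reduce phase is needed.

```python
import tempfile
import zlib
from pathlib import Path


def bucket_of(key: str, num_buckets: int) -> int:
    """Map a key to a bucket number (hypothetical hash; not Hive's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def map_side_bucketize(rows, mapper_id: int, table_dir: Path, num_buckets: int) -> None:
    """Simulate one mapper: write each (key, value) row into its bucket's sub-directory."""
    handles = {}
    try:
        for key, value in rows:
            bucket = bucket_of(key, num_buckets)
            if bucket not in handles:
                # One sub-directory per bucket, one file per mapper inside it.
                bucket_dir = table_dir / f"bucket_{bucket:05d}"
                bucket_dir.mkdir(parents=True, exist_ok=True)
                path = bucket_dir / f"mapper_{mapper_id:05d}"
                handles[bucket] = open(path, "w", encoding="utf-8")
            handles[bucket].write(f"{key}\t{value}\n")
    finally:
        for handle in handles.values():
            handle.close()


def merge_bucket(table_dir: Path, bucket: int) -> list:
    """Merge the files that all mappers wrote into one bucket's sub-directory."""
    bucket_dir = table_dir / f"bucket_{bucket:05d}"
    lines = []
    if bucket_dir.is_dir():
        for part in sorted(bucket_dir.iterdir()):
            lines.extend(part.read_text(encoding="utf-8").splitlines())
    return lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp)
        # Two independent "mappers", each seeing a different slice of the input.
        map_side_bucketize([("a", "1"), ("b", "2")], 0, table, num_buckets=4)
        map_side_bucketize([("a", "3"), ("c", "4")], 1, table, num_buckets=4)
        for b in range(4):
            print(b, merge_bucket(table, b))
```

Note that in this sketch the bucketing key does not have to match any reduction key, which is the restriction the description wants to lift; whether the merge is done eagerly or left to readers of the sub-directory is a design choice the issue leaves open.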

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
