[ https://issues.apache.org/jira/browse/HIVE-3502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain reassigned HIVE-3502:
--------------------------------

    Assignee: Sambavi Muthukrishnan

> design efficient bucketing techniques
> -------------------------------------
>
>                 Key: HIVE-3502
>                 URL: https://issues.apache.org/jira/browse/HIVE-3502
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Sambavi Muthukrishnan
>
> Currently, the bucketing techniques are fairly expensive: the bucketing keys
> have to be the same as the reduction keys, and the process of bucketization
> requires a full-blown map-reduce job.
> It should be possible to perform a map-side bucketization. The high-level
> idea is to shard the data based on the number of buckets and create a
> sub-directory for each bucket. Then the data from all the mappers (in the
> same sub-directory) can be merged.
> So, instead of having 1 file per bucket, it would lead to 1 directory per
> bucket.
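
The map-side idea described in the issue can be illustrated with a small sketch. This is not Hive code, only a toy model under stated assumptions: CRC32 stands in for whatever hash Hive applies to the bucketing key, and the function names (`bucket_for`, `write_mapper_output`, `read_bucket`) and the `bucket_NNNNN/mapper_NNNNN` path layout are hypothetical. Each simulated mapper writes its rows directly into one sub-directory per bucket, so no reduce phase is involved, and reading a bucket amounts to scanning its directory.

```python
import os
import tempfile
import zlib


def bucket_for(key: str, num_buckets: int) -> int:
    # Deterministic stand-in hash (CRC32); Hive's own hash function is
    # not reproduced here.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def write_mapper_output(table_dir, mapper_id, rows, key_index, num_buckets):
    """Shard one mapper's rows into per-bucket sub-directories (no reducer)."""
    handles = {}
    try:
        for row in rows:
            bucket = bucket_for(row[key_index], num_buckets)
            if bucket not in handles:
                # Hypothetical layout: <table>/bucket_00003/mapper_00001
                bucket_dir = os.path.join(table_dir, f"bucket_{bucket:05d}")
                os.makedirs(bucket_dir, exist_ok=True)
                path = os.path.join(bucket_dir, f"mapper_{mapper_id:05d}")
                handles[bucket] = open(path, "w", encoding="utf-8")
            handles[bucket].write("\t".join(row) + "\n")
    finally:
        for handle in handles.values():
            handle.close()


def read_bucket(table_dir, bucket):
    """Return all rows of one bucket by scanning every mapper file in its directory."""
    bucket_dir = os.path.join(table_dir, f"bucket_{bucket:05d}")
    if not os.path.isdir(bucket_dir):
        return []
    rows = []
    for name in sorted(os.listdir(bucket_dir)):
        with open(os.path.join(bucket_dir, name), encoding="utf-8") as f:
            rows.extend(line.rstrip("\n").split("\t") for line in f)
    return rows


if __name__ == "__main__":
    table_dir = tempfile.mkdtemp()
    # Two simulated mappers, each bucketing on column 0 into 4 buckets.
    write_mapper_output(table_dir, 0, [["a", "1"], ["b", "2"]], 0, 4)
    write_mapper_output(table_dir, 1, [["a", "3"], ["c", "4"]], 0, 4)
    print(read_bucket(table_dir, bucket_for("a", 4)))
```

In this sketch a bucket directory holds up to one file per mapper, which is the "1 directory per bucket" layout from the description. Whether those files are later physically merged or simply read together is left open here, as it is in the issue text.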