Igor Kabiljo created HIVE-3541:
----------------------------------
Summary: Allow keeping the bucket order while streaming bucketed table
Key: HIVE-3541
URL: https://issues.apache.org/jira/browse/HIVE-3541
Project: Hive
Issue Type: Improvement
Reporter: Igor Kabiljo
Priority: Minor
If we have a bucketed table, for example table_a with columns col_key and
col_value (bucketed on col_key), and we need to create a new derived bucketed
table (for example, by SELECT col_key, col_value*2 FROM table_a), it would be
fastest to do it in a single streaming map-only job.
By specifying:
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
we can make sure that each input bucket is read by exactly one mapper, and
that each mapper writes exactly one output file. With:
SET hive.merge.mapfiles = false;
SET hive.merge.mapredfiles = false;
SET hive.enforce.bucketing = false;
we can make sure those files are inserted as-is into the output table.
But with these settings the bucket order is not preserved, so the resulting
table is not bucketed correctly.
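
For reference, the whole scenario can be put together as one script. This is
only a sketch: the target table name table_b, the INT column types, and the
bucket count of 32 are assumptions made for illustration and are not part of
the original setup. The SET statements and the SELECT are the ones quoted
above.

```sql
-- Hypothetical target table; name, column types and bucket count are
-- assumptions. The bucket count is assumed to match that of table_a.
CREATE TABLE table_b (col_key INT, col_value INT)
CLUSTERED BY (col_key) INTO 32 BUCKETS;

-- One mapper per input bucket file.
SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Do not merge mapper outputs and do not add a reduce stage for bucketing,
-- so each mapper's single output file lands in the table unchanged.
SET hive.merge.mapfiles = false;
SET hive.merge.mapredfiles = false;
SET hive.enforce.bucketing = false;

-- Map-only job: col_key is passed through untouched, so each row still
-- belongs to the same bucket it was read from.
INSERT OVERWRITE TABLE table_b
SELECT col_key, col_value * 2 FROM table_a;
```

The job itself runs as intended, but nothing in this script ties an output
file to the bucket its mapper read, which is the ordering problem described
above.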