Eugene Koifman created HIVE-17923:
-------------------------------------
Summary: 'cluster by' should not be needed for a bucketed table
Key: HIVE-17923
URL: https://issues.apache.org/jira/browse/HIVE-17923
Project: Hive
Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Priority: Blocker
Given the following table:
{noformat}
CREATE TABLE over10k_orc_bucketed(t tinyint,
si smallint,
i int,
b bigint,
f float,
d double,
bo boolean,
s string,
ts timestamp,
`dec` decimal(4,2),
bin binary) CLUSTERED BY(si) INTO 4 BUCKETS STORED AS ORC;
{noformat}
{noformat}
insert into over10k_orc_bucketed select * from over10k
{noformat}
produces only 1 data file (bucket 0). Based on the input data, it should produce 4.
{noformat}
insert into over10k_orc_bucketed select * from over10k cluster by si
{noformat}
does the right thing.
The full script is in acid_vectorization_original.q.
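One possible way to check how many data files an insert actually wrote is to group the rows by Hive's INPUT__FILE__NAME virtual column. This is only a sketch and is not taken from acid_vectorization_original.q; the row_count alias is made up for illustration, and the query assumes it is run against a freshly created table right after a single insert, so that files from earlier inserts do not inflate the count.
{noformat}
-- Sketch (assumption, not part of the test script):
-- returns one row per data file backing the table, with the number of rows in each file.
select INPUT__FILE__NAME, count(*) as row_count
from over10k_orc_bucketed
group by INPUT__FILE__NAME;
{noformat}
With the behavior described above, the plain insert would presumably yield a single row here, and the cluster by variant one row per file that was written.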