[ https://issues.apache.org/jira/browse/HIVE-19113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888446#comment-16888446 ]
Gopal V commented on HIVE-19113:
--------------------------------

Interesting side-effect

{code}
- Statistics: Num rows: 4200 Data size: 1253037 Basic stats: COMPLETE Column stats: PARTIAL
+ Statistics: Num rows: 4200 Data size: 1247197 Basic stats: COMPLETE Column stats: PARTIAL
{code}

The ORC files got smaller after this change.

> Bucketing: Make CLUSTERED BY do CLUSTER BY if no explicit sorting is specified
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-19113
>                 URL: https://issues.apache.org/jira/browse/HIVE-19113
>             Project: Hive
>          Issue Type: Improvement
>          Components: Logical Optimizer
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>         Attachments: HIVE-19113.patch
>
>
> The user's expectation of
> "create external table bucketed (key int) clustered by (key) into 4 buckets
> stored as orc;"
> is that the table will cluster the key into 4 buckets, while the file layout
> does not do any actual clustering of rows.
> In the absence of a "SORTED BY", this can automatically do a "SORTED BY
> (key)" to cluster the keys within the file as expected.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
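
For illustration, the behavior proposed in the issue description can be written out explicitly in HiveQL. This is a sketch rather than anything taken from the attached patch: the table name bucketed_sorted is hypothetical, and it assumes the implicit ordering would match an explicit SORTED BY on the clustering column.

{code}
-- DDL from the issue description: rows are assigned to 4 buckets by key,
-- with no ordering declared inside each bucket file.
CREATE EXTERNAL TABLE bucketed (key INT)
CLUSTERED BY (key) INTO 4 BUCKETS
STORED AS ORC;

-- Hypothetical explicit equivalent of the proposed default:
-- same bucketing, plus a declared sort on the clustering column.
CREATE EXTERNAL TABLE bucketed_sorted (key INT)
CLUSTERED BY (key) SORTED BY (key) INTO 4 BUCKETS
STORED AS ORC;
{code}

With the second definition, equal keys sit next to each other within each bucket file. That may be why the Data size in the plan diff dropped, since sorted columns tend to encode more compactly in ORC, although the comment itself does not state the cause.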