Re: Hive, Tez, clustering, buckets, and Presto

2018-04-02 Thread Richard A. Bross
I'm really confused and could use help understanding. The Hive documentation here https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables says: "Bucketed tables are fantastic in that they allow much more efficient sampling than do non-bucketed tables, and they may l

Re: Hive, Tez, clustering, buckets, and Presto

2018-04-02 Thread Richard A. Bross
Gopal,

Thanks for taking the time to try and help. A few things in relation to your response:

* Yes, the 'epoch' column is an hourly timestamp. Clustering by a column with high cardinality would make little sense.
* I'm interested in your statement that CLUSTERED BY does not CLUSTER BY. My