I remember that text files are used in those scripts. With Hive 0.12, I think ORC should be used instead.

Also, I think those sub-queries should be merged into a single query. With a single query, if a reduce join is converted to a map join, that map join can be merged into its child job. But if the join is evaluated by a separate query, Hive has to run it as a standalone map-only job, because it does not know that this map-only job only generates intermediate results. For queries 17 and 18, once each is written as a single query, the Correlation Optimizer should be able to optimize them (set hive.optimize.correlation=true).
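As a rough, untested sketch, a single-query version of query 17 might look like the script below. It assumes the existing text-format TPC-H tables are named `lineitem_text` and `part_text` (hypothetical names) and converts them to ORC first. The per-part average quantity, which would otherwise be materialized in a separate table, is inlined as a subquery so the whole query is planned at once.

```sql
-- Hypothetical names: lineitem_text and part_text are the existing text-format tables.
-- Convert them to ORC once (ORC is available since Hive 0.11).
CREATE TABLE lineitem STORED AS ORC AS SELECT * FROM lineitem_text;
CREATE TABLE part STORED AS ORC AS SELECT * FROM part_text;

-- Allow reduce joins to be converted to map joins.
SET hive.auto.convert.join=true;
-- Enable the Correlation Optimizer (new in Hive 0.12).
SET hive.optimize.correlation=true;

-- TPC-H Q17 as one query: the per-part average quantity is an inline
-- subquery instead of a separately materialized table.
SELECT sum(a.l_extendedprice) / 7.0 AS avg_yearly
FROM (
  SELECT l1.l_quantity, l1.l_extendedprice, t.t_avg_quantity
  FROM (
    -- 20% of the average quantity for each part
    SELECT l_partkey AS t_partkey, 0.2 * avg(l_quantity) AS t_avg_quantity
    FROM lineitem
    GROUP BY l_partkey
  ) t
  JOIN (
    -- line items for the selected brand and container
    SELECT l.l_partkey AS l_partkey, l.l_quantity AS l_quantity,
           l.l_extendedprice AS l_extendedprice
    FROM part p JOIN lineitem l ON p.p_partkey = l.l_partkey
    WHERE p.p_brand = 'Brand#23' AND p.p_container = 'MED BOX'
  ) l1 ON l1.l_partkey = t.t_partkey
) a
WHERE a.l_quantity < a.t_avg_quantity;
```

Since the GROUP BY and the join above share the key `l_partkey`, this is the kind of plan the Correlation Optimizer targets. Running `EXPLAIN` on the final SELECT with `hive.optimize.correlation` set to true and then false should show whether it actually reduces the number of MapReduce jobs.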
Thanks,
Yin

On Fri, Nov 22, 2013 at 1:31 PM, Avrilia Floratou <avrilia.flora...@gmail.com> wrote:

> Hello,
>
> I'd like to run a few TPC-H queries on Hive 0.12. I've found the TPC-H
> scripts here:
>
> https://issues.apache.org/jira/browse/HIVE-600
>
> but noticed that these scripts were generated a long time ago. Since Hive
> could not support the full SQL-92 specification, some queries were split into
> smaller sub-queries whose results have been materialized. Is there any
> change in HiveQL (in Hive 0.12) that would affect the way the TPC-H queries
> are written?
>
> Thanks,
> Avrilia