Hello, We are on hive 1.2.x.
We are generating queries of the form (example) CREATE TABLE T ..... AS WITH T1 AS (SELECT * FROM S1.A), T2 AS (SELECT * FROM S2.B WHERE x > 1000), T3 AS (SELECT * FROM T1 JOIN T2 ON T1.x=T2.y AND T1.z>T2.z) SELECT * from T3 We are generating much larger queries than above and we also use UDFs which are added on the fly and registered as temporary functions. However, since the above syntax does not allow specifying partitions, we will need to create the table before hand and then run the same query again to insert. Also we need to extract the types of the output of the final query. Lot of tools also are doing the same. To extract the types for the create table, we just added LIMIT 0 assuming it would skip running MR/Tez jobs and return fast. When on MR its launching all the jobs and finally throws away the records, which is taking 10+ minutes for the query/cluster combination. When on Tez its much much faster but still high. We tried putting LIMIT 0 and/or WHERE 1=2 always false in the query to make it skip faster, but its not skipping launching mr jobs. We came across the ticket https://issues.apache.org/jira/browse/HIVE-7203, which says the optimisation applies only to the outer most query even when specified in the inner queries. Is there any workaround which works when execution engine is either mr or tez. Help / guidance would be very helpful. Please let me know if I have to share any other information Regards, Abhilash