Stamatis Zampetakis created HIVE-29146:
------------------------------------------

             Summary: Query with WITH clause fails during split generation when CTE materialization is enabled
                 Key: HIVE-29146
                 URL: https://issues.apache.org/jira/browse/HIVE-29146
             Project: Hive
          Issue Type: Bug
            Reporter: Stamatis Zampetakis
         Attachments: repro.q

Queries with a WITH clause over transactional tables fail at runtime during split generation when CTE materialization is enabled. The problem can be reproduced by running TPC-DS query 11 with the following properties set.

{code:sql}
set hive.optimize.cte.materialize.threshold=2;
set hive.optimize.cte.materialize.full.aggregate.only=false;
set hive.optimize.shared.work=true;

with year_total as (
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(ss_ext_list_price-ss_ext_discount_amt) year_total
       ,'s' sale_type
 from customer
     ,store_sales
     ,date_dim
 where c_customer_sk = ss_customer_sk
   and ss_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
 union all
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(ws_ext_list_price-ws_ext_discount_amt) year_total
       ,'w' sale_type
 from customer
     ,web_sales
     ,date_dim
 where c_customer_sk = ws_bill_customer_sk
   and ws_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
 )
 select t_s_secyear.customer_id
       ,t_s_secyear.customer_first_name
       ,t_s_secyear.customer_last_name
       ,t_s_secyear.customer_birth_country
 from year_total t_s_firstyear
     ,year_total t_s_secyear
     ,year_total t_w_firstyear
     ,year_total t_w_secyear
 where t_s_secyear.customer_id = t_s_firstyear.customer_id
   and t_s_firstyear.customer_id = t_w_secyear.customer_id
   and t_s_firstyear.customer_id = t_w_firstyear.customer_id
   and t_s_firstyear.sale_type = 's'
   and t_w_firstyear.sale_type = 'w'
   and t_s_secyear.sale_type = 's'
   and t_w_secyear.sale_type = 'w'
   and t_s_firstyear.dyear = 1999
   and t_s_secyear.dyear = 1999+1
   and t_w_firstyear.dyear = 1999
   and t_w_secyear.dyear = 1999+1
   and t_s_firstyear.year_total > 0
   and t_w_firstyear.year_total > 0
   and case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else 0.0 end
     > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else 0.0 end
 order by t_s_secyear.customer_id
         ,t_s_secyear.customer_first_name
         ,t_s_secyear.customer_last_name
         ,t_s_secyear.customer_birth_country
 limit 100;
{code}

A sample error is shown below. The same error occurs several times, each time for a different table.

{noformat}
2025-08-14T05:10:46,213 ERROR [Dispatcher thread {Central}] impl.VertexImpl: Vertex Input: date_dim initializer failed, vertex=vertex_1755173441322_0001_1_02 [Map 6]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.io.IOException: Acid table: default.date_dim is missing from the ValidWriteIdList config: null
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:330) ~[tez-dag-0.10.5.jar:0.10.5]
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1228) ~[guava-22.0.jar:?]
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399) ~[guava-22.0.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:911) ~[guava-22.0.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:822) ~[guava-22.0.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:686) ~[guava-22.0.jar:?]
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:113) ~[guava-22.0.jar:?]
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58) ~[guava-22.0.jar:?]
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75) ~[guava-22.0.jar:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
    at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.io.IOException: Acid table: default.date_dim is missing from the ValidWriteIdList config: null
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:536) ~[hive-exec-4.2.0-SNAPSHOT.jar:4.2.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:880) ~[hive-exec-4.2.0-SNAPSHOT.jar:4.2.0-SNAPSHOT]
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:363) ~[hive-exec-4.2.0-SNAPSHOT.jar:4.2.0-SNAPSHOT]
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:280) ~[tez-dag-0.10.5.jar:0.10.5]
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:272) ~[tez-dag-0.10.5.jar:0.10.5]
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:714) ~[?:?]
    at java.base/javax.security.auth.Subject.doAs(Subject.java:525) ~[?:?]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953) ~[hadoop-common-3.4.1.jar:?]
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:272) ~[tez-dag-0.10.5.jar:0.10.5]
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:256) ~[tez-dag-0.10.5.jar:0.10.5]
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111) ~[guava-22.0.jar:?]
    ... 5 more
{noformat}

The problem is reproducible on master (commit 243bc97290f12c97a11b840f2723ec50458b198c) using [^repro.q]:

{code:java}
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=repro.q
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)