Stamatis Zampetakis created HIVE-29146:
------------------------------------------

             Summary: Query with WITH clause fails during split generation when 
CTE materaliazation is enabled
                 Key: HIVE-29146
                 URL: https://issues.apache.org/jira/browse/HIVE-29146
             Project: Hive
          Issue Type: Bug
            Reporter: Stamatis Zampetakis
         Attachments: repro.q

Queries with WITH clause over transactional tables fail at runtime during split 
generation when the CTE materialization is enabled.

The problem can be reproduced by running TPC-DS query 11 and setting the 
following properties.
{code:sql}
set hive.optimize.cte.materialize.threshold=2;
set hive.optimize.cte.materialize.full.aggregate.only=false;
set hive.optimize.shared.work=true;

with year_total as (
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(ss_ext_list_price-ss_ext_discount_amt) year_total
       ,'s' sale_type
 from customer
     ,store_sales
     ,date_dim
 where c_customer_sk = ss_customer_sk
   and ss_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
 union all
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(ws_ext_list_price-ws_ext_discount_amt) year_total
       ,'w' sale_type
 from customer
     ,web_sales
     ,date_dim
 where c_customer_sk = ws_bill_customer_sk
   and ws_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
         )
  select
                  t_s_secyear.customer_id
                 ,t_s_secyear.customer_first_name
                 ,t_s_secyear.customer_last_name
                 ,t_s_secyear.customer_birth_country
 from year_total t_s_firstyear
     ,year_total t_s_secyear
     ,year_total t_w_firstyear
     ,year_total t_w_secyear
 where t_s_secyear.customer_id = t_s_firstyear.customer_id
         and t_s_firstyear.customer_id = t_w_secyear.customer_id
         and t_s_firstyear.customer_id = t_w_firstyear.customer_id
         and t_s_firstyear.sale_type = 's'
         and t_w_firstyear.sale_type = 'w'
         and t_s_secyear.sale_type = 's'
         and t_w_secyear.sale_type = 'w'
         and t_s_firstyear.dyear = 1999
         and t_s_secyear.dyear = 1999+1
         and t_w_firstyear.dyear = 1999
         and t_w_secyear.dyear = 1999+1
         and t_s_firstyear.year_total > 0
         and t_w_firstyear.year_total > 0
         and case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total 
/ t_w_firstyear.year_total else 0.0 end
             > case when t_s_firstyear.year_total > 0 then 
t_s_secyear.year_total / t_s_firstyear.year_total else 0.0 end
 order by t_s_secyear.customer_id
         ,t_s_secyear.customer_first_name
         ,t_s_secyear.customer_last_name
         ,t_s_secyear.customer_birth_country
limit 100;
{code}
 
Sample error is shown below. There are various occurrences of the same error 
each with a different table.
{noformat}
2025-08-14T05:10:46,213 ERROR [Dispatcher thread {Central}] impl.VertexImpl: 
Vertex Input: date_dim initializer failed, 
vertex=vertex_1755173441322_0001_1_02 [Map 6]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.io.IOException: Acid 
table: default.date_dim is missing from the ValidWriteIdList config: null
        at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:330)
 ~[tez-dag-0.10.5.jar:0.10.5]
        at 
com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1228)
 ~[guava-22.0.jar:?]
        at 
com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:399)
 ~[guava-22.0.jar:?]
        at 
com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:911)
 ~[guava-22.0.jar:?]
        at 
com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:822)
 ~[guava-22.0.jar:?]
        at 
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:686)
 ~[guava-22.0.jar:?]
        at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:113)
 ~[guava-22.0.jar:?]
        at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
 ~[guava-22.0.jar:?]
        at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
 ~[guava-22.0.jar:?]
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
 ~[?:?]
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
 ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.io.IOException: Acid table: default.date_dim is missing from 
the ValidWriteIdList config: null
        at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:536)
 ~[hive-exec-4.2.0-SNAPSHOT.jar:4.2.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:880)
 ~[hive-exec-4.2.0-SNAPSHOT.jar:4.2.0-SNAPSHOT]
        at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:363)
 ~[hive-exec-4.2.0-SNAPSHOT.jar:4.2.0-SNAPSHOT]
        at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:280)
 ~[tez-dag-0.10.5.jar:0.10.5]
        at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:272)
 ~[tez-dag-0.10.5.jar:0.10.5]
        at 
java.base/java.security.AccessController.doPrivileged(AccessController.java:714)
 ~[?:?]
        at java.base/javax.security.auth.Subject.doAs(Subject.java:525) ~[?:?]
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
 ~[hadoop-common-3.4.1.jar:?]
        at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:272)
 ~[tez-dag-0.10.5.jar:0.10.5]
        at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:256)
 ~[tez-dag-0.10.5.jar:0.10.5]
        at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
 ~[guava-22.0.jar:?]
        ... 5 more

{noformat}

The problem is reproducible in master (commit 
243bc97290f12c97a11b840f2723ec50458b198c) using  [^repro.q] 

{code:java}
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=repro.q 
{code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to