A-little-bit-of-data commented on issue #10102: URL: https://github.com/apache/incubator-gluten/issues/10102#issuecomment-3082669822
Both runs use the same Spark 3.5.2 image on k8s, with the configurations below.

##gluten
___admin___.spark.app.name=admin
___admin___.spark.executor.instances=1
___admin___.spark.driver.cores=1
___admin___.spark.executor.cores=1
___admin___.spark.kubernetes.driver.limit.cores=1
___admin___.spark.kubernetes.executor.limit.cores=1
___admin___.spark.driver.memory=1g
___admin___.spark.executor.memory=1g
___admin___.spark.memory.offHeap.enabled=true
___admin___.spark.memory.offHeap.size=7g
___admin___.spark.dynamicAllocation.enabled=true
___admin___.spark.dynamicAllocation.shuffleTracking.enabled=true
___admin___.spark.dynamicAllocation.schedulerBacklogTimeout=1s
___admin___.spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=30s
___admin___.spark.dynamicAllocation.minExecutors=1
___admin___.spark.dynamicAllocation.maxExecutors=5
___admin___.spark.dynamicAllocation.initialExecutors=2

##No gluten
___admin___.spark.app.name=admin
___admin___.spark.executor.instances=1
___admin___.spark.driver.cores=1
___admin___.spark.executor.cores=1
___admin___.spark.kubernetes.driver.limit.cores=1
___admin___.spark.kubernetes.executor.limit.cores=1
___admin___.spark.driver.memory=1g
___admin___.spark.executor.memory=8g
___admin___.spark.dynamicAllocation.enabled=true
___admin___.spark.dynamicAllocation.shuffleTracking.enabled=true
___admin___.spark.dynamicAllocation.schedulerBacklogTimeout=1s
___admin___.spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=30s
___admin___.spark.dynamicAllocation.minExecutors=1
___admin___.spark.dynamicAllocation.maxExecutors=5
___admin___.spark.dynamicAllocation.initialExecutors=2

Executed SQL
```
with year_total as (
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total
       ,'s' sale_type
 from sf100.customer
     ,sf100.store_sales
     ,sf100.date_dim
 where c_customer_sk = ss_customer_sk
   and ss_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
 union all
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum((((cs_ext_list_price-cs_ext_wholesale_cost-cs_ext_discount_amt)+cs_ext_sales_price)/2) ) year_total
       ,'c' sale_type
 from sf100.customer
     ,sf100.catalog_sales
     ,sf100.date_dim
 where c_customer_sk = cs_bill_customer_sk
   and cs_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
 union all
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       ,sum((((ws_ext_list_price-ws_ext_wholesale_cost-ws_ext_discount_amt)+ws_ext_sales_price)/2) ) year_total
       ,'w' sale_type
 from sf100.customer
     ,sf100.web_sales
     ,sf100.date_dim
 where c_customer_sk = ws_bill_customer_sk
   and ws_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
)
select t_s_secyear.customer_id
      ,t_s_secyear.customer_first_name
      ,t_s_secyear.customer_last_name
      ,t_s_secyear.customer_birth_country
from year_total t_s_firstyear
    ,year_total t_s_secyear
    ,year_total t_c_firstyear
    ,year_total t_c_secyear
    ,year_total t_w_firstyear
    ,year_total t_w_secyear
where t_s_secyear.customer_id = t_s_firstyear.customer_id
  and t_s_firstyear.customer_id = t_c_secyear.customer_id
  and t_s_firstyear.customer_id = t_c_firstyear.customer_id
  and t_s_firstyear.customer_id = t_w_firstyear.customer_id
  and t_s_firstyear.customer_id = t_w_secyear.customer_id
  and t_s_firstyear.sale_type = 's'
  and t_c_firstyear.sale_type = 'c'
  and t_w_firstyear.sale_type = 'w'
  and t_s_secyear.sale_type = 's'
  and t_c_secyear.sale_type = 'c'
  and t_w_secyear.sale_type = 'w'
  and t_s_firstyear.dyear = 1999
  and t_s_secyear.dyear = 1999+1
  and t_c_firstyear.dyear = 1999
  and t_c_secyear.dyear = 1999+1
  and t_w_firstyear.dyear = 1999
  and t_w_secyear.dyear = 1999+1
  and t_s_firstyear.year_total > 0
  and t_c_firstyear.year_total > 0
  and t_w_firstyear.year_total > 0
  and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
    > case when t_s_firstyear.year_total > 0 then t_s_secyear.year_total / t_s_firstyear.year_total else null end
  and case when t_c_firstyear.year_total > 0 then t_c_secyear.year_total / t_c_firstyear.year_total else null end
    > case when t_w_firstyear.year_total > 0 then t_w_secyear.year_total / t_w_firstyear.year_total else null end
order by t_s_secyear.customer_id
        ,t_s_secyear.customer_first_name
        ,t_s_secyear.customer_last_name
        ,t_s_secyear.customer_birth_country
;
```

##gluten
<img width="1019" height="570" alt="Image" src="https://github.com/user-attachments/assets/0b90edd3-34e9-4d5f-b694-4fe6e24b07f8" />

##No gluten
<img width="1766" height="1013" alt="Image" src="https://github.com/user-attachments/assets/ac96bdad-1226-4e93-ad34-2c5c6a46c73d" />

The results do not show as much improvement as I expected, and this is on the sf100 database. With a smaller data volume, the comparison is still reversed. Admittedly, I only ran one of the TPC-DS queries, not the whole set. Is there something wrong with how I am using Gluten? Any advice would be appreciated.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
