Hi All, I am running performance issue with below query. Its took 2-3 hours to complete in hive.
Try tried to partition and bucketing changes on this tables, but without luck. Please help me in optimizing this query. what schema level changes can be done ? other parameters recommendations ? *Below are complete details :* *Hive Table DDL :* CREATE TABLE `tuning_dd_key`( > m_d_key smallint, > sb_gu_key bigint, > t_ev_st_dt date, > a_z_key int, > c_dt date, > e_p_dt date, > sq_nbr int); *Total data size : * > 250 GB *Long running query :* > SELECT > sb_gu_key, m_d_key, t_ev_st_dt, > LAG( t_ev_st_dt ) OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY > t_ev_st_dt ) AS LAG_START_DT, > a_z_key, > c_dt, > e_p_dt, > sq_nbr, > CASE WHEN LAG( t_ev_st_dt ) OVER ( PARTITION BY m_d_key , sb_gu_key > ORDER BY t_ev_st_dt ) IS NULL OR a_z_key <> LAG( a_z_key , 1 , -999 ) > OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY t_ev_st_dt ) THEN 'S' > ELSE NULL END AS ST_FLAG > FROM `PRDDB`.tuning_dd_key ; *More info :* number of distinct value in column m_d_key : 29 > number of distinct value in column sb_gu_key : 15434343 Regards Sanjiv Singh Mob : +091 9990-447-339