Thanks, I wanted to rule out skewedness over m_d_key,sb_gu_key Dudu
From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Thursday, June 23, 2016 11:55 PM To: user@hive.apache.org; Markovitz, Dudu <dmarkov...@paypal.com>; sanjiv singh (ME) <sanjiv.is...@gmail.com> Subject: Re: Optimize Hive Query Hi Dudu, find below query response. Query : select m_d_key,sb_gu_key ,count (*) as cnt from tuning_dd_key group by m_d_key,sb_gu_key order by cnt desc limit 100; Output : 16 9042668 1361 16 8063808 1361 16 8569864 1361 16 8909889 1361 16 9864785 1361 16 8269717 1361 16 10180282 1361 16 8913062 1361 16 8418183 1361 16 8003791 1361 16 10201084 1361 16 8470942 1361 16 9234223 1361 16 8330286 1361 16 12966192 1361 16 9008767 1361 16 8902598 1361 16 9878885 1361 16 8741214 1361 16 8732856 1361 16 9692696 1361 16 8072042 1361 16 8802681 1361 16 14087558 1361 16 9027186 1361 16 9587342 1361 16 9699202 1361 16 8542344 1361 16 9680544 1361 16 8903570 1361 16 9542542 1361 4 3576041 1361 16 9126774 1361 16 9957826 1361 16 8345331 1361 16 9756883 1361 16 9399702 1361 18 9403442 1361 16 9746288 1361 16 9435202 1361 16 9069894 1361 16 9920826 1361 16 8765877 1361 16 8813448 1361 18 9635460 1361 16 8463714 1361 16 8166965 1361 16 9597903 1361 16 9432100 1361 16 8847857 1361 16 13953068 1361 16 8744451 1361 16 8089463 1361 16 9674902 1361 16 8418200 1361 16 8028509 1361 16 9243086 1361 16 8892184 1361 16 8801594 1361 16 9849079 1361 16 8556753 1361 16 8979232 1361 16 8081946 1361 16 8724046 1361 16 9984434 1361 16 8651659 1361 16 9116866 1361 1 17870072 1361 16 8860630 1361 16 9888398 1361 16 9463782 1361 16 9602127 1361 16 9353325 1361 16 7991816 1361 16 9920420 1361 16 8497624 1361 16 8987980 1361 16 8234751 1361 16 8389490 1361 18 9975575 1361 16 8026536 1361 16 8790618 1361 16 9846791 1361 16 8363833 1361 16 9025525 1361 16 9241297 1361 16 8712487 1361 16 8692003 1361 16 9316523 1361 16 8124338 1361 16 9941027 1361 16 9547973 1361 16 8007742 1361 16 8418425 1361 16 8944940 1361 16 8890232 1361 16 9248984 1361 16 9784461 1361 16 9009374 1361 16 8395861 1361 Regards Sanjiv Singh Mob : +091 9990-447-339 On Thu, Jun 23, 2016 at 4:01 AM, Markovitz, Dudu <dmarkov...@paypal.com<mailto:dmarkov...@paypal.com>> wrote: Could you also add the results of the following query? Thanks Dudu select m_d_key ,sb_gu_key ,count (*) as cnt from tuning_dd_key group by m_d_key ,sb_gu_key order by cnt desc limit 100 ; -----Original Message----- From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com<mailto:go...@hortonworks.com>] On Behalf Of Gopal Vijayaraghavan Sent: Thursday, June 23, 2016 9:45 AM To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: Re: Optimize Hive Query > Long running query : Are you running this on MapReduce or Tez? Please post the output of explain - if you are seeing > 1 shuffle edge in your query while having only one window for OVER(), that might be the reason. OVER ( PARTITION BY m_d_key , sb_gu_key ORDER BY t_ev_st_dt) The multiple PTF operators should have been collapsed by the reduce sink-deduplication. Cheers, Gopal