Thanks, I wanted to rule out skewedness over m_d_key,sb_gu_key

Dudu

From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com]
Sent: Thursday, June 23, 2016 11:55 PM
To: user@hive.apache.org; Markovitz, Dudu <dmarkov...@paypal.com>; sanjiv singh 
(ME) <sanjiv.is...@gmail.com>
Subject: Re: Optimize Hive Query

Hi Dudu,

find below query response.

Query :
select      m_d_key,sb_gu_key   ,count (*)   as cnt
from        tuning_dd_key
group by    m_d_key,sb_gu_key
order by    cnt desc
limit       100;

Output :

16    9042668          1361
16    8063808          1361
16    8569864          1361
16    8909889          1361
16    9864785          1361
16    8269717          1361
16    10180282        1361
16    8913062          1361
16    8418183          1361
16    8003791          1361
16    10201084        1361
16    8470942          1361
16    9234223          1361
16    8330286          1361
16    12966192        1361
16    9008767          1361
16    8902598          1361
16    9878885          1361
16    8741214          1361
16    8732856          1361
16    9692696          1361
16    8072042          1361
16    8802681          1361
16    14087558        1361
16    9027186          1361
16    9587342          1361
16    9699202          1361
16    8542344          1361
16    9680544          1361
16    8903570          1361
16    9542542          1361
4      3576041          1361
16    9126774          1361
16    9957826          1361
16    8345331          1361
16    9756883          1361
16    9399702          1361
18    9403442          1361
16    9746288          1361
16    9435202          1361
16    9069894          1361
16    9920826          1361
16    8765877          1361
16    8813448          1361
18    9635460          1361
16    8463714          1361
16    8166965          1361
16    9597903          1361
16    9432100          1361
16    8847857          1361
16    13953068        1361
16    8744451          1361
16    8089463          1361
16    9674902          1361
16    8418200          1361
16    8028509          1361
16    9243086          1361
16    8892184          1361
16    8801594          1361
16    9849079          1361
16    8556753          1361
16    8979232          1361
16    8081946          1361
16    8724046          1361
16    9984434          1361
16    8651659          1361
16    9116866          1361
1      17870072        1361
16    8860630          1361
16    9888398          1361
16    9463782          1361
16    9602127          1361
16    9353325          1361
16    7991816          1361
16    9920420          1361
16    8497624          1361
16    8987980          1361
16    8234751          1361
16    8389490          1361
18    9975575          1361
16    8026536          1361
16    8790618          1361
16    9846791          1361
16    8363833          1361
16    9025525          1361
16    9241297          1361
16    8712487          1361
16    8692003          1361
16    9316523          1361
16    8124338          1361
16    9941027          1361
16    9547973          1361
16    8007742          1361
16    8418425          1361
16    8944940          1361
16    8890232          1361
16    9248984          1361
16    9784461          1361
16    9009374          1361
16    8395861          1361


Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Thu, Jun 23, 2016 at 4:01 AM, Markovitz, Dudu 
<dmarkov...@paypal.com<mailto:dmarkov...@paypal.com>> wrote:

Could you also add the results of the following query?



Thanks



Dudu





select      m_d_key

           ,sb_gu_key

           ,count (*)   as cnt



from        tuning_dd_key



group by    m_d_key

           ,sb_gu_key



order by    cnt desc



limit       100

;



-----Original Message-----
From: Gopal Vijayaraghavan 
[mailto:go...@hortonworks.com<mailto:go...@hortonworks.com>] On Behalf Of Gopal 
Vijayaraghavan
Sent: Thursday, June 23, 2016 9:45 AM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Optimize Hive Query





> Long running query :



Are you running this on MapReduce or Tez?



Please post the output of explain - if you are seeing > 1 shuffle edge in your 
query while having only one window for OVER(), that might be the reason.



OVER ( PARTITION BY  m_d_key , sb_gu_key  ORDER BY  t_ev_st_dt)



The multiple PTF operators should have been collapsed by the reduce 
sink-deduplication.



Cheers,

Gopal





Reply via email to