Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-07-18 Thread @Sanjiv Singh
om (select (cast (floor(r*100) as bigint) + 1) + 100L * (row_number () over (partition by (cast (floor(r*100) as bigint) + 1) order by null) - 1) as ETL_ROW_ID from (select *,rand() as r from INTER_ETL) as t )
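
The expression in the snippet above appears to produce a unique, though not necessarily consecutive, ETL_ROW_ID by placing each row in one of 100 random buckets and numbering the rows inside each bucket independently. A minimal Python simulation of that arithmetic follows. It assumes the bucket is `floor(r*100) + 1` and that the per-bucket row number starts at 1. The function name `assign_ids` is hypothetical and does not come from the thread.

```python
import math
import random
from collections import defaultdict

NUM_BUCKETS = 100  # matches the factor 100 in the quoted expression


def assign_ids(num_rows, seed=0):
    """Simulate: bucket + 100 * (row_number() over (partition by bucket) - 1)."""
    rng = random.Random(seed)
    next_row_number = defaultdict(lambda: 1)  # per-bucket counter, starts at 1
    ids = []
    for _ in range(num_rows):
        r = rng.random()                          # stands in for rand(), in [0, 1)
        bucket = math.floor(r * NUM_BUCKETS) + 1  # bucket number in 1..100
        row_number = next_row_number[bucket]
        next_row_number[bucket] += 1
        ids.append(bucket + NUM_BUCKETS * (row_number - 1))
    return ids


ids = assign_ids(10_000)
```

Each ID decomposes back into exactly one (bucket, row number) pair, so no two rows collide. Because the buckets rarely hold the same number of rows, the resulting IDs generally contain gaps.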

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-07-01 Thread @Sanjiv Singh
from INTER_ETL as t join group_rows_accumulated as a on a.group_id = abs(hash(MyColumn))%10000 ; *F
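
The join against `group_rows_accumulated` in the snippet above suggests a way to keep IDs consecutive while still splitting the numbering across hash groups: each group gets an offset equal to the number of rows in all earlier groups, and rows are numbered within their group on top of that offset. The snippet is truncated, so the following Python sketch is an assumption about the idea rather than the query from the thread. `group_id` and `consecutive_ids` are hypothetical names, and `zlib.crc32` stands in for Hive's `hash()`.

```python
import zlib
from collections import Counter
from itertools import accumulate

NUM_GROUPS = 10000  # matches the modulus in the quoted join condition


def group_id(value):
    """Stand-in for abs(hash(MyColumn)) % 10000 (crc32 is an assumption)."""
    return zlib.crc32(value.encode("utf-8")) % NUM_GROUPS


def consecutive_ids(values):
    groups = [group_id(v) for v in values]
    counts = Counter(groups)
    ordered = sorted(counts)
    # Exclusive prefix sum: rows accumulated in all groups before this one.
    running = [0] + list(accumulate(counts[g] for g in ordered))
    offsets = dict(zip(ordered, running))
    seen = Counter()  # per-group row number
    ids = []
    for g in groups:
        seen[g] += 1
        ids.append(offsets[g] + seen[g])
    return ids


values = [f"key-{i}" for i in range(5000)]
ids = consecutive_ids(values)
```

In this sketch the IDs cover 1 through the row count with no gaps, and only the small per-group count table has to be processed in a single ordered pass.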

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-07-01 Thread Markovitz, Dudu
udu [mailto:dmarkov...@paypal.com] Sent: Thursday, June 30, 2016 12:43 PM To: user@hive.apache.org; sanjiv.is...@gmail.com Subject: RE: Query Performance Issue : Group By and Distinct and load on reducer 1. This works. I’ve recalled that the CAST is needed since FLOOR defaults to FLOAT. sel

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-30 Thread Markovitz, Dudu
1 39567412227 40529759537 From: Markovitz, Dudu [mailto:dmarkov...@paypal.com] Sent: Wednesday, June 29, 2016 11:37 PM To: sanjiv.is...@gmail.com Cc: user@hive.apache.org Subject: RE: Query Performance Issue : Group By and Distinct and load on reducer 1. This is

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-29 Thread Markovitz, Dudu
a1.group_id group by a1.group_id ; From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Wednesday, June 29, 2016 10:55 PM To: Markovitz, Dudu Cc: user@hive.apache.org Subject: Re: Query Performance Issue : Group By and Distinct and load on reducer Hi Dudu, I tried the same on same ta

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-29 Thread @Sanjiv Singh
MyCol1,MyCol2 rows between unbounded preceding and 1 preceding) - count(*) as accum_rows from INTER_ETL group by abs(hash(MyCol1,MyCol2))%1

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread @Sanjiv Singh
as a on a.group_id = abs(hash(t.MyCol1,t.MyCol2))%1 ; *From:* @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] *Sent:* Tuesday, June 28, 2016 11:52

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread Markovitz, Dudu
: Tuesday, June 28, 2016 11:52 PM To: Markovitz, Dudu Cc: user@hive.apache.org Subject: Re: Query Performance Issue : Group By and Distinct and load on reducer ETL_ROW_ID is to be consecutive number. I need to check if having unique number would not break any logic. Considering unique number for

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread @Sanjiv Singh
vitz, Dudu *Cc:* user@hive.apache.org *Subject:* Re: Query Performance Issue : Group By and Distinct and load on reducer Hi Dudu, You are correct ...ROW_NUMBER() is main culprit. ROW_NUMBER() OVER Not Fast Enough With Large Resu

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread Markovitz, Dudu
e: The row_number operation seems to be skewed. Dudu From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Tuesday, June 28, 2016 8:54 PM To: user@hive.apache.org Subject: Query Performance Issue : Group By and Distinct and l

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread @Sanjiv Singh
wed. Dudu *From:* @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] *Sent:* Tuesday, June 28, 2016 8:54 PM *To:* user@hive.apache.org *Subject:* Query Performance Issue : Group By and Distinct and load on reducer Hi All,

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread Markovitz, Dudu
The row_number operation seems to be skewed. Dudu From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Tuesday, June 28, 2016 8:54 PM To: user@hive.apache.org Subject: Query Performance Issue : Group By and Distinct and load on reducer Hi All, I am having performance issue with data skew

Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread @Sanjiv Singh
Hi All, I am having a performance issue with data skew of the DISTINCT statement in Hive. See the query below with the DISTINCT operator. *Original Query:* SELECT DISTINCT SD.