subject:"Re\: Query Performance Issue \: Group By and Distinct and load on reducer"

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-07-18 Thread @Sanjiv Singh

om (select (cast(floor(r*100) as bigint) + 1) + 100L * (row_number() over (partition by (cast(floor(r*100) as bigint) + 1) order by null) - 1) as ETL_ROW_ID from (select *, rand() as r from INTER_ETL) as t)
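
This expression gives each row a random bucket from 1 to 100 and numbers rows only within their bucket, so no single reducer has to number the whole table. The ID is bucket + 100 * (row number in bucket - 1). Because ID - 1 splits uniquely into (row number - 1, bucket - 1) under division by 100, the IDs are unique, but they have gaps wherever buckets differ in size. The Python sketch that follows simulates this in memory. `bucketed_row_ids` and `NUM_BUCKETS` are hypothetical names, and input order stands in for Hive's unspecified `order by null`.

```python
import random
from collections import defaultdict

NUM_BUCKETS = 100  # the snippet uses floor(r*100) + 1, i.e. buckets 1..100

def bucketed_row_ids(rows, seed=None):
    """Hypothetical helper: return (etl_row_id, row) pairs using
    id = bucket + NUM_BUCKETS * (row_number_within_bucket - 1)."""
    rng = random.Random(seed)
    counters = defaultdict(int)  # running row_number() per bucket
    out = []
    for row in rows:
        # like cast(floor(rand()*100) as bigint) + 1
        bucket = int(rng.random() * NUM_BUCKETS) + 1
        counters[bucket] += 1
        rn = counters[bucket]
        out.append((bucket + NUM_BUCKETS * (rn - 1), row))
    return out

ids = [i for i, _ in bucketed_row_ids(range(1000), seed=7)]
print(len(ids) == len(set(ids)))  # True: no duplicate IDs
print(max(ids))                   # at least 1000; larger when buckets are uneven
```

The trade-off is that the IDs are unique but not consecutive, which matters if downstream logic assumes ETL_ROW_ID has no gaps.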

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-07-01 Thread @Sanjiv Singh

from INTER_ETL as t join group_rows_accumulated as a on a.group_id = abs(hash(MyColumn))%10000;
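
This join appears to be the second pass of a scheme for consecutive IDs. Rows are hashed into groups. A small table (here `group_rows_accumulated`) holds, for each group, the number of rows in all lower-numbered groups. Each row's ID is then that offset plus its row number within its own group. That reading is an assumption based on the table name and the neighbouring messages. Below is a minimal in-memory Python sketch assuming that layout. `zlib.crc32` stands in for Hive's `hash()` only so results are reproducible, and `group_id` and `consecutive_row_ids` are hypothetical helpers.

```python
import zlib
from collections import Counter, defaultdict

NUM_GROUPS = 10000  # mirrors abs(hash(MyColumn)) % 10000 in the snippet

def group_id(value):
    # Assumption: crc32 replaces Hive's hash() so results are reproducible.
    return zlib.crc32(str(value).encode("utf-8")) % NUM_GROUPS

def consecutive_row_ids(values):
    """Hypothetical helper: return (etl_row_id, value) pairs numbered 1..N."""
    # Pass 1: rows per group, then each group's offset = rows in lower groups.
    counts = Counter(group_id(v) for v in values)
    offsets, running = {}, 0
    for gid in sorted(counts):
        offsets[gid] = running
        running += counts[gid]
    # Pass 2: join each row to its group's offset and add its row number
    # within the group.
    seen = defaultdict(int)
    out = []
    for v in values:
        gid = group_id(v)
        seen[gid] += 1
        out.append((offsets[gid] + seen[gid], v))
    return out

rows = [f"key{i % 37}" for i in range(500)]
ids = sorted(i for i, _ in consecutive_row_ids(rows))
print(ids == list(range(1, 501)))  # True: consecutive, no gaps
```

Only the counting pass and the numbering pass touch every row, and both are keyed by group_id. The ordered running total runs over at most 10,000 group totals.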

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-07-01 Thread Markovitz, Dudu

udu [mailto:dmarkov...@paypal.com] Sent: Thursday, June 30, 2016 12:43 PM To: user@hive.apache.org; sanjiv.is...@gmail.com Subject: RE: Query Performance Issue : Group By and Distinct and load on reducer 1. This works. I’ve recalled that the CAST is needed since FLOOR defaults to FLOAT. sel

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-30 Thread Markovitz, Dudu

1 39567412227 40529759537 From: Markovitz, Dudu [mailto:dmarkov...@paypal.com] Sent: Wednesday, June 29, 2016 11:37 PM To: sanjiv.is...@gmail.com Cc: user@hive.apache.org Subject: RE: Query Performance Issue : Group By and Distinct and load on reducer 1. This is

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-29 Thread Markovitz, Dudu

a1.group_id group by a1.group_id; From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Wednesday, June 29, 2016 10:55 PM To: Markovitz, Dudu Cc: user@hive.apache.org Subject: Re: Query Performance Issue : Group By and Distinct and load on reducer Hi Dudu, I tried the same on same ta

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-29 Thread @Sanjiv Singh

MyCol1,MyCol2 rows between unbounded preceding and 1 preceding) - count(*) as accum_rows from INTER_ETL group by abs(hash(MyCol1,MyCol2))%1
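
This fragment computes `accum_rows`, apparently the number of rows in all groups that precede the current one. It shows both a window frame ending at `1 preceding` and a subtraction of `count(*)`. Those are two standard ways to get an exclusive running total. One is an inclusive running sum minus the current group's own count. The other is a sum over a frame that stops one row early. They agree except on the first group, where the second frame is empty. SQL's SUM over no rows is NULL, which this sketch models as None. The Python function names are hypothetical.

```python
from itertools import accumulate

def accum_rows_minus_current(counts):
    """Inclusive running total minus the current group's own count."""
    return [total - c for total, c in zip(accumulate(counts), counts)]

def accum_rows_frame(counts):
    """Sum over a frame from the first group to the previous one.

    The frame is empty for the first group; SQL's SUM returns NULL there,
    modelled as None.
    """
    return [sum(counts[:i]) if i > 0 else None for i in range(len(counts))]

counts = [5, 3, 0, 7]  # hypothetical rows per group, ordered by group_id
print(accum_rows_minus_current(counts))  # [0, 5, 8, 8]
print(accum_rows_frame(counts))          # [None, 5, 8, 8]
```

The subtraction form avoids the NULL for the first group, so it can be added directly to a row number without a COALESCE.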

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread @Sanjiv Singh

as a on a.group_id = abs(hash(t.MyCol1,t.MyCol2))%1; From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Tuesday, June 28, 2016 11:52

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread Markovitz, Dudu

: Tuesday, June 28, 2016 11:52 PM To: Markovitz, Dudu Cc: user@hive.apache.org Subject: Re: Query Performance Issue : Group By and Distinct and load on reducer ETL_ROW_ID is to be consecutive number. I need to check if having unique number would not break any logic. Considering unique number for

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread @Sanjiv Singh

vitz, Dudu Cc: user@hive.apache.org Subject: Re: Query Performance Issue : Group By and Distinct and load on reducer Hi Dudu, You are correct ...ROW_NUMBER() is main culprit. ROW_NUMBER() OVER Not Fast Enough With Large Resu

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread Markovitz, Dudu

I’m guessing ETL_ROW_ID should be unique but not necessarily contain only consecutive numbers? From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Tuesday, June 28, 2016 10:57 PM To: Markovitz, Dudu Cc: user@hive.apache.org Subject: Re: Query Performance Issue : Group By and Distinct and

Re: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread @Sanjiv Singh

Hi Dudu, You are correct ...ROW_NUMBER() is main culprit. ROW_NUMBER() OVER Not Fast Enough With Large Result Set, any good solution? Regards Sanjiv Singh Mob : +091 9990-447-339 On Tue, Jun 28, 2016 at 3:42 PM, Markovitz, Dudu wrote: The row_number operation seems to be skewed.

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread Markovitz, Dudu

The row_number operation seems to be skewed. Dudu From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Tuesday, June 28, 2016 8:54 PM To: user@hive.apache.org Subject: Query Performance Issue : Group By and Distinct and load on reducer Hi All, I am having performance issue with data skew o
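
A likely reason for the skew Dudu points out is that a `ROW_NUMBER()` with no `PARTITION BY` numbers the entire result set as one partition, so every row is routed to the same reducer. The toy Python model below illustrates hash routing. It uses a hypothetical `reducer_loads` helper and Python's own `hash()` rather than Hive's partitioner, and compares one constant key with 100 bucket keys. The keys are assigned round-robin here instead of with `rand()` so that the result is repeatable.

```python
from collections import Counter

def reducer_loads(partition_keys, num_reducers):
    """Count rows per reducer when each row is routed by its partition key.

    Toy model (hypothetical helper): Python's hash() stands in for Hive's
    partitioner, and all rows sharing a key go to the same reducer.
    """
    return Counter(hash(key) % num_reducers for key in partition_keys)

n = 10_000
# Global ROW_NUMBER(): a single partition, modelled as one constant key;
# one reducer receives all 10,000 rows.
print(reducer_loads([0] * n, 10))
# Bucketed ROW_NUMBER(): keys 1..100; every reducer gets an equal share.
print(reducer_loads([i % 100 + 1 for i in range(n)], 10))
```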

12 matches