om (select (cast (floor(r*100) as bigint)+ 1) + 100L
> * (row_number () over (partition by (cast (floor(r*100) as bigint) + 1)
> order by null) - 1) as ETL_ROW_ID
>
> from (select *, rand() as r from INTER_ETL) as t
>
> )
>
> from INTER_ETL as t
>
> join group_rows_accumulated as a
>
> on a.group_id = abs(hash(MyColumn))%10000
>
> ;
>
> *F
udu [mailto:dmarkov...@paypal.com]
Sent: Thursday, June 30, 2016 12:43 PM
To: user@hive.apache.org; sanjiv.is...@gmail.com
Subject: RE: Query Performance Issue : Group By and Distinct and load on reducer
1.
This works.
I recalled that the CAST is needed since FLOOR defaults to FLOAT.
sel
1
39567412227
40529759537
From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]
Sent: Wednesday, June 29, 2016 11:37 PM
To: sanjiv.is...@gmail.com
Cc: user@hive.apache.org
Subject: RE: Query Performance Issue : Group By and Distinct and load on reducer
1.
This is
a1.group_id
group by a1.group_id
;
From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com]
Sent: Wednesday, June 29, 2016 10:55 PM
To: Markovitz, Dudu
Cc: user@hive.apache.org
Subject: Re: Query Performance Issue : Group By and Distinct and load on reducer
Hi Dudu,
I tried the same on same ta
>> MyCol1,MyCol2 rows between unbounded preceding and 1 preceding) -
>> count(*) as accum_rows
>>
>> from INTER_ETL
>>
>> group by abs(hash(MyCol1,MyCol2))%1
>
> as a
>
> on a.group_id = abs(hash(t.MyCol1,t.MyCol2))%1
>
> ;
>
> *From:* @Sanjiv Singh [mailto:sanjiv.is...@gmail.com]
> *Sent:* Tuesday, June 28, 2016 11:52
: Tuesday, June 28, 2016 11:52 PM
To: Markovitz, Dudu
Cc: user@hive.apache.org
Subject: Re: Query Performance Issue : Group By and Distinct and load on reducer
ETL_ROW_ID is meant to be a consecutive number. I need to check whether using a
merely unique number would break any logic.
Considering unique number for
vitz, Dudu
> *Cc:* user@hive.apache.org
> *Subject:* Re: Query Performance Issue : Group By and Distinct and load
> on reducer
>
> Hi Dudu,
>
> You are correct: ROW_NUMBER() is the main culprit.
>
> ROW_NUMBER() OVER Not Fast Enough With Large Resu
I’m guessing ETL_ROW_ID should be unique but not necessarily contain only
consecutive numbers?
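
To illustrate the distinction between unique and consecutive IDs, here is a rough Python simulation (not HiveQL) of the formula visible in the quoted query fragment: a random group number in 1..100 plus 100 times the zero-based row number within that group. The function name `assign_unique_ids` and the parameters `n_groups` and `seed` are made up for this sketch, and it assumes the fragment is read correctly.

```python
import random
from collections import defaultdict


def assign_unique_ids(rows, n_groups=100, seed=None):
    """Return one ID per input row; IDs are unique but may have gaps."""
    rng = random.Random(seed)
    next_rank = defaultdict(int)  # plays the role of row_number() - 1 per group
    ids = []
    for _ in rows:
        # Mirrors cast(floor(r*100) as bigint) + 1: a group in 1..n_groups.
        group = int(rng.random() * n_groups) + 1
        # Mirrors group + 100L * (row_number() over (partition by group) - 1).
        ids.append(group + n_groups * next_rank[group])
        next_rank[group] += 1
    return ids


ids = assign_unique_ids(range(1000), seed=0)
```

Each ID decomposes back into exactly one (group, rank) pair, so no two rows can collide, but the groups fill unevenly and the resulting range has holes. The point of the random group is that the window function is partitioned across many groups instead of one.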
From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com]
Sent: Tuesday, June 28, 2016 10:57 PM
To: Markovitz, Dudu
Cc: user@hive.apache.org
Subject: Re: Query Performance Issue : Group By and Distinct and
Hi Dudu,
You are correct: ROW_NUMBER() is the main culprit.
ROW_NUMBER() OVER Not Fast Enough With Large Result Set, any good solution?
Regards
Sanjiv Singh
Mob : +091 9990-447-339
On Tue, Jun 28, 2016 at 3:42 PM, Markovitz, Dudu
wrote:
> The row_number operation seems to be skewed.
>
The row_number operation seems to be skewed.
Dudu
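
For readers following the thread, a hedged Python sketch (not HiveQL) of one way to keep IDs consecutive while spreading the row numbering over hash buckets: count rows per bucket, turn the counts into running offsets, then number rows inside each bucket and add the bucket's offset. This is only a reading of the `group_rows_accumulated` / `accum_rows` fragments quoted elsewhere in the thread; the final expression that combines them is not visible there, so that step is an assumption. `bucket_of`, `assign_consecutive_ids`, and `n_buckets` are hypothetical names, and `zlib.crc32` merely stands in for Hive's `hash()`.

```python
import zlib
from collections import Counter, defaultdict


def bucket_of(key, n_buckets):
    # Stand-in for abs(hash(MyColumn)) % n_buckets; crc32 is deterministic.
    return zlib.crc32(str(key).encode("utf-8")) % n_buckets


def assign_consecutive_ids(keys, n_buckets=8):
    """Return IDs 1..len(keys), numbered bucket by bucket."""
    buckets = [bucket_of(k, n_buckets) for k in keys]
    counts = Counter(buckets)

    # Exclusive running total: number of rows in all lower-numbered buckets.
    offsets, running = {}, 0
    for b in sorted(counts):
        offsets[b] = running
        running += counts[b]

    seen = defaultdict(int)
    ids = []
    for b in buckets:
        seen[b] += 1  # row number within the bucket, starting at 1
        ids.append(offsets[b] + seen[b])
    return ids


keys = [f"k{i}" for i in range(500)]
ids = assign_consecutive_ids(keys)
```

Each bucket owns a contiguous block of IDs, so the per-bucket numbering can run independently and the blocks still join up without gaps. If one key value dominates the data, its bucket stays large and the skew moves there.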
From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com]
Sent: Tuesday, June 28, 2016 8:54 PM
To: user@hive.apache.org
Subject: Query Performance Issue : Group By and Distinct and load on reducer
Hi All,
I am having performance issue with data skew o