Re: Tez reducer parallelism ..

Gautam Tue, 15 Mar 2016 22:29:31 -0700

> The windowing is not simultaneous unless they are all over the same window
> - the following query has 3 different windows applied over the same rows
> sequentially.


Ok. Just wanted to confirm. Maybe I could restructure my query to get more
parallelism ..

> They are all over the same rows so they're done in sequence, so the final
> row-set contains all values, causing multiple shuffles of the same rows.

So you'r saying, since these windows  are part of a single SELECT
projection they need to be serial?



> Does your query only have row_numbers or does it have other columns in
them?

yes.  Something like ...

{code}

 SELECT


              row_number() OVER( PARTITION BY app, user, type ORDER BY
ts ) as a_number,

              row_number() OVER( PARTITION BY day, app, user, type ORDER
BY ts ) as type_rank,

              row_number() OVER( PARTITION BY day, app, user   ORDER BY
ts ) as  dau_rank,

              day,

              user,

              app,

              type,

              ts

FROM messages

{code}



On Tue, Mar 15, 2016 at 6:41 PM, Gopal Vijayaraghavan <gop...@apache.org>
wrote:

> > A lot of our queries do the following style of simultaneous windowing ..
>
> The windowing is not simultaneous unless they are all over the same window
> - the following query has 3 different windows applied over the same rows
> sequentially.
>
> > SELECT
> >    row_number() OVER( PARTITION BY app, user,
> > type ORDER BY ts )as a_number,
> >    row_number() OVER( PARTITION BY day, app, user,
> > type ORDER BY ts )as type_rank,
>
> > Since each OVER / PARTITION-By clause is independent they can the put
> >into parallelized Reducer phases.
>
> They are all over the same rows so they're done in sequence, so the final
> row-set contains all values, causing multiple shuffles of the same rows.
>
> > Is this something that Tez tries to do at all or an optimization that I
> >can use to my benefit ?
>
> Does your query only have row_numbers or does it have other columns in
> them?
>
> Cheers,
> Gopal
>
>
>


-- 
"If you really want something in this life, you have to work for it. Now,
quiet! They're about to announce the lottery numbers..."

Re: Tez reducer parallelism ..

Reply via email to