> The windowing is not simultaneous unless they are all over the same window > - the following query has 3 different windows applied over the same rows > sequentially.
Ok. Just wanted to confirm. Maybe I could restructure my query to get more parallelism .. > They are all over the same rows so they're done in sequence, so the final > row-set contains all values, causing multiple shuffles of the same rows. So you'r saying, since these windows are part of a single SELECT projection they need to be serial? > Does your query only have row_numbers or does it have other columns in them? yes. Something like ... {code} SELECT row_number() OVER( PARTITION BY app, user, type ORDER BY ts ) as a_number, row_number() OVER( PARTITION BY day, app, user, type ORDER BY ts ) as type_rank, row_number() OVER( PARTITION BY day, app, user ORDER BY ts ) as dau_rank, day, user, app, type, ts FROM messages {code} On Tue, Mar 15, 2016 at 6:41 PM, Gopal Vijayaraghavan <gop...@apache.org> wrote: > > A lot of our queries do the following style of simultaneous windowing .. > > The windowing is not simultaneous unless they are all over the same window > - the following query has 3 different windows applied over the same rows > sequentially. > > > SELECT > > row_number() OVER( PARTITION BY app, user, > > type ORDER BY ts )as a_number, > > row_number() OVER( PARTITION BY day, app, user, > > type ORDER BY ts )as type_rank, > > > Since each OVER / PARTITION-By clause is independent they can the put > >into parallelized Reducer phases. > > They are all over the same rows so they're done in sequence, so the final > row-set contains all values, causing multiple shuffles of the same rows. > > > Is this something that Tez tries to do at all or an optimization that I > >can use to my benefit ? > > Does your query only have row_numbers or does it have other columns in > them? > > Cheers, > Gopal > > > -- "If you really want something in this life, you have to work for it. Now, quiet! They're about to announce the lottery numbers..."