> A lot of our queries do the following style of simultaneous windowing ..
The windowing is not simultaneous unless they are all over the same window. The following query has 3 different windows applied over the same rows sequentially.

> SELECT
> row_number() OVER( PARTITION BY app, user,
> type ORDER BY ts )as a_number,
> row_number() OVER( PARTITION BY day, app, user,
> type ORDER BY ts )as type_rank,

> Since each OVER / PARTITION-By clause is independent they can be put
> into parallelized Reducer phases.

They are all over the same rows, so they're done in sequence so that the final row-set contains all values, which causes multiple shuffles of the same rows.

> Is this something that Tez tries to do at all or an optimization that I
> can use to my benefit ?

Does your query only have row_numbers, or does it have other columns in it?

Cheers,
Gopal
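
The sequential behaviour described above can be sketched in plain Python. This is only a rough illustration, not how Hive or Tez implement windowing: the helper `add_row_number` and the sample rows are made up for the example, and a full re-sort stands in for each shuffle. Each window with a different PARTITION BY needs its own pass over the same rows, and each pass carries forward the columns added by the previous one.

```python
from itertools import groupby
from operator import itemgetter


def add_row_number(rows, partition_by, order_by, out_col):
    """One windowing pass (hypothetical helper): re-sort, then number rows per partition."""
    part_key = itemgetter(*partition_by)
    # Sorting by partition keys and then the order key stands in for shuffle + sort.
    ordered = sorted(rows, key=lambda r: (part_key(r), r[order_by]))
    out = []
    for _, group in groupby(ordered, key=part_key):
        for n, row in enumerate(group, start=1):
            # Keep every existing column and append the new window value.
            out.append({**row, out_col: n})
    return out


# Made-up sample rows using the column names from the quoted query.
rows = [
    {"day": 1, "app": "a", "user": "u", "type": "t", "ts": 10},
    {"day": 2, "app": "a", "user": "u", "type": "t", "ts": 20},
    {"day": 2, "app": "a", "user": "u", "type": "t", "ts": 30},
]

# Two different windows means two passes over the same rows, one after the other.
pass1 = add_row_number(rows, ("app", "user", "type"), "ts", "a_number")
pass2 = add_row_number(pass1, ("day", "app", "user", "type"), "ts", "type_rank")

for r in pass2:
    print(r["ts"], r["a_number"], r["type_rank"])
```

The second pass cannot start from the original rows alone, because its output has to include `a_number` as well; that dependency is why the same rows end up being moved more than once.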