In general,
distribute by X ensures each of N reducers gets non-overlapping ranges of X,
but doesn't sort the output of each reducer. You end up with N or more unsorted
files with non-overlapping ranges. So this is more of a horizontal
distribution of data.
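
To make that concrete, here is a rough Python sketch (not Hive itself) of the behaviour described above. It assumes rows are routed to reducers by hashing the key modulo N, so "non-overlapping" here means that no key value shows up in more than one reducer's output. The function name `distribute_by`, the toy hash, and the sample rows are all made up for illustration.

```python
from collections import defaultdict


def distribute_by(rows, key, num_reducers):
    """Route each row to a reducer based on its key; do not sort anything."""
    buckets = defaultdict(list)
    for row in rows:
        # Hypothetical partitioner: a deterministic toy hash (sum of the
        # character codes) modulo the number of reducers. Hive's real
        # partitioner is different; this only illustrates the idea.
        reducer = sum(map(ord, str(row[key]))) % num_reducers
        buckets[reducer].append(row)
    return dict(buckets)


# Made-up sample rows using the column names from the question.
rows = [
    {"p_mfgr": "Manufacturer#1", "p_name": "plum"},
    {"p_mfgr": "Manufacturer#2", "p_name": "navy"},
    {"p_mfgr": "Manufacturer#1", "p_name": "almond"},
    {"p_mfgr": "Manufacturer#2", "p_name": "azure"},
]

for reducer, bucket in sorted(distribute_by(rows, "p_mfgr", 2).items()):
    # Each reducer holds all rows for its keys, in arrival order (unsorted).
    print(reducer, [(r["p_mfgr"], r["p_name"]) for r in bucket])
```

All rows for a given p_mfgr end up in the same bucket, but within a bucket the p_name values stay in arrival order; sorting them would be a separate step (sort by).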

In my view,
partition by is based more on values, so it's a vertical distribution of data.

I may be wrong in my understanding of this.




On Fri, Jul 11, 2014 at 1:38 PM, Eric Chu <e...@rocketfuel.com> wrote:

> Does anyone know what
>
> *rank() over(distribute by p_mfgr sort by p_name)*
>
> does exactly and how it's different from
>
> *rank() over(partition by p_mfgr order by p_name)*?
>
> Thanks,
>
> Eric
>
>


-- 
Nitin Pawar
