Thank you...

On Mon, Mar 9, 2015 at 2:23 AM, r7raul1...@163.com <r7raul1...@163.com> wrote:
> read this article
> http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/
>
> then read
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy
>
> ------------------------------
> r7raul1...@163.com
>
> *From:* max scalf <oracle.bl...@gmail.com>
> *Date:* 2015-03-08 07:02
> *To:* HDP mailing list <u...@hadoop.apache.org>; Hive Mailing List
> <user@hive.apache.org>
> *Subject:* sorting in hive -- general
>
> Hello all,
>
> I am new to Hadoop and Hive in general, and I am reading "Hadoop: The
> Definitive Guide" by Tom White. On page 504, in the Hive chapter, Tom
> says the following with regard to sorting:
>
> *Sorting and Aggregating*
> *Sorting data in Hive can be achieved by using a standard ORDER BY clause.
> ORDER BY performs a parallel total sort of the input (like that described
> in “Total Sort” on page 261). When a globally sorted result is not
> required—and in many cases it isn’t—you can use Hive’s nonstandard
> extension, SORT BY, instead. SORT BY produces a sorted file per reducer.*
>
> My question is: what exactly does he mean by a "globally sorted result"?
> If the SORT BY operation produces a sorted file per reducer, does that
> mean that at the end of the sort the outputs of all the reducers are put
> back together to give the correct results?
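
For anyone finding this thread later, here is a rough sketch of the difference being asked about. It is plain Python, not Hive. The function names `sort_by` and `order_by` and the round-robin spreading of rows over reducers are made up for illustration; how rows actually reach each reducer is up to Hive unless you add DISTRIBUTE BY. The point it tries to show is that with SORT BY each reducer's file is sorted on its own, and simply putting those files one after another does not in general give a globally sorted result.

```python
from typing import List


def sort_by(rows: List[int], num_reducers: int) -> List[List[int]]:
    """Mimic SORT BY: each reducer sorts only the rows it received.

    Assumption for illustration: rows are dealt out round-robin.
    Hive's real distribution of rows to reducers is different.
    """
    buckets: List[List[int]] = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        buckets[i % num_reducers].append(row)
    # One sorted "file" per reducer, with no ordering between files.
    return [sorted(bucket) for bucket in buckets]


def order_by(rows: List[int]) -> List[int]:
    """Mimic ORDER BY: a single total order over all rows."""
    return sorted(rows)


rows = [7, 2, 9, 4, 1, 8, 3, 6, 5]

per_reducer = sort_by(rows, num_reducers=3)
# Reading the reducer files back to back, as a client would see them.
concatenated = [row for reducer_file in per_reducer for row in reducer_file]

print("SORT BY, per reducer: ", per_reducer)
print("SORT BY, concatenated:", concatenated)
print("ORDER BY:             ", order_by(rows))
```

So a "globally sorted result" is the ORDER BY case: one ordering across the whole output. With SORT BY alone, nothing merges the reducer outputs afterwards; if you need the total order, use ORDER BY.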