Thanks for your quick reply. Rank is a column that holds integer data. I am
writing to a DynamoDB database, though. I am not sure why only a single
reducer is used. I will check the query with the EXPLAIN command again and
report my findings. I will check your implementation too.
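
For reference, the EXPLAIN check could look like the sketch below. It reuses
the table and column names from the original query and assumes nothing else.
The output lists the stage dependencies and the operators in each stage, which
should help show where the LIMIT is applied.

```sql
-- Sketch only: prefix the original statement with EXPLAIN to print the
-- query plan instead of running the job.
EXPLAIN
INSERT INTO TABLE ddb_table
SELECT * FROM data_dump
SORT BY rank DESC
LIMIT 1000000;
```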

------------------------------
Binesh Gummadi




On Sun, Sep 2, 2012 at 4:01 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

>
> Sort by does not have the single-reducer restriction. Not sure which rank
> you are using, but any of them should allow you to sort and rank if the
> query is written correctly. Our implementation on my
> github.com/edwardcapriolo allows this.
>
> On Sunday, September 2, 2012, Binesh Gummadi <binesh.gumm...@gmail.com>
> wrote:
> > I am trying to insert data into a table after selecting and sorting by a
> > column. What I really want is to order by a column and select the top
> > million rows. I am using Hive on Amazon EMR to process the data.
> > Here is my query:
> > INSERT INTO TABLE ddb_table SELECT * FROM data_dump sort by rank desc
> > LIMIT 1000000;
> > It creates two jobs. The first job runs rather quickly, but the second
> > job's reducer runs forever because it is running with a single reducer.
> > Here is my question on Stack Overflow (
> > http://stackoverflow.com/questions/12233343/why-is-sort-by-always-using-single-reducer
> > ).
> > According to the docs, the "order by" clause has a limitation of 1
> > reducer. Does sort by have the same limitation? Are there any other ways
> > of solving the above requirement?
> > ________________________________
> > Binesh Gummadi
> >
> >
>
