1. Lower the number of mappers.
2. Chen Song's suggestion also works.
3. Use the shell command cat to concatenate your small files into a bigger one.
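
A minimal sketch of option 3, assuming the small files are plain text sitting on a local filesystem (binary formats with per-file headers cannot be merged this way). The directory and the part-* file names are made up for illustration:

```shell
# Hypothetical working directory holding a few small part files.
workdir=$(mktemp -d)
printf 'a\n' > "$workdir/part-00000"
printf 'b\n' > "$workdir/part-00001"
printf 'c\n' > "$workdir/part-00002"

# Concatenate the small files, in sorted glob order, into one bigger file.
cat "$workdir"/part-* > "$workdir/merged.txt"

# Show how many lines ended up in the merged file.
wc -l < "$workdir/merged.txt"
```

The merged file would then replace the small ones as the table's data; how you move it into place depends on where the table is stored.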
2012/9/27 Chen Song
You can force a reduce phase by adding a DISTRIBUTE BY or ORDER BY clause
after your SELECT query.
On Thu, Sep 27, 2012 at 2:03 PM, 王锋 wrote:
> But it's a map-only job.
>
>
> At 2012-09-27 05:39:39,"Chen Song" wrote:
As far as I know, the number of files emitted is determined by the
number of mappers for a map-only job and by the number of reducers for a
MapReduce job.
So it depends entirely on how your query translates into an MR job.
You can force a single output file for a MapReduce job by setting the property
*mapred.reduce.tasks=1*
Chen
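
The two suggestions in this thread (forcing a reduce phase with DISTRIBUTE BY, and setting mapred.reduce.tasks=1) could be combined roughly as follows. This is only a sketch: the table names source_table and target_table, the column id, and the file name merge_output.hql are placeholders, not names from the original question.

```shell
# Write the Hive query to a file; all table and column names are hypothetical.
query_file=merge_output.hql
cat > "$query_file" <<'EOF'
-- Use a single reducer so the job emits a single output file.
SET mapred.reduce.tasks=1;

-- DISTRIBUTE BY forces a reduce phase on what would otherwise be a map-only job.
INSERT OVERWRITE TABLE target_table
SELECT * FROM source_table
DISTRIBUTE BY id;
EOF

# Run it on a machine where the Hive CLI is installed:
# hive -f "$query_file"
```

Note that a single reducer processes all the data, so this is only reasonable when the output is small enough for one task to handle.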