On Tue, May 3, 2011 at 1:48 AM, elton sky <eltonsky9...@gmail.com> wrote:

> Pls correct me if I am wrong. One of the important assumptions of hadoop
> map reduce is: map's output should be smaller than input.


No, that isn't a valid assumption. MapReduce workloads can be roughly
divided into three categories, depending on how the map input compares
with the data the maps emit into the shuffle:
1. scans (map input > shuffle data)
2. sorts (map input = shuffle data = output data)
3. index builds (map input < shuffle data)
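A toy sketch of the three categories follows. It is a local Python simulation, not Hadoop code. The mapper functions, the sample log lines, and the use of string length as a stand-in for serialized bytes are all illustrative assumptions, and real sizes would also include serialization overhead. A grep-style scan drops most records. A sort re-keys every record without adding or removing anything. An inverted-index build repeats the document id once per word, so it emits more than it reads.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical model: a record is (doc_id, line); a mapper yields (key, value) pairs.
Record = Tuple[str, str]
Mapper = Callable[[str, str], Iterator[Tuple[str, str]]]


def scan_mapper(doc_id: str, line: str) -> Iterator[Tuple[str, str]]:
    """Scan/filter: emit only matching records, so most input is dropped."""
    if "ERROR" in line:
        yield (doc_id, line)


def sort_mapper(doc_id: str, line: str) -> Iterator[Tuple[str, str]]:
    """Sort: re-key every record by its content; nothing is added or dropped."""
    yield (line, doc_id)


def index_mapper(doc_id: str, line: str) -> Iterator[Tuple[str, str]]:
    """Index build: emit one (word, doc_id) posting per word, repeating doc_id."""
    for word in line.split():
        yield (word, doc_id)


def pair_bytes(pairs: Iterable[Tuple[str, str]]) -> int:
    # Assumption: size approximated as characters in key + value.
    return sum(len(k) + len(v) for k, v in pairs)


def input_bytes(records: List[Record]) -> int:
    return pair_bytes(records)


def shuffle_bytes(mapper: Mapper, records: List[Record]) -> int:
    # Everything a mapper emits would be sent through the shuffle.
    return pair_bytes(kv for doc_id, line in records for kv in mapper(doc_id, line))


# Hypothetical sample input.
records: List[Record] = [
    ("doc1", "INFO service started"),
    ("doc2", "ERROR disk full on node7"),
    ("doc3", "INFO heartbeat ok"),
]

if __name__ == "__main__":
    for name, mapper in [("scan", scan_mapper), ("sort", sort_mapper), ("index", index_mapper)]:
        print(f"{name}: input={input_bytes(records)} shuffle={shuffle_bytes(mapper, records)}")
    # scan: input=73 shuffle=28
    # sort: input=73 shuffle=73
    # index: input=73 shuffle=97
```

On this input the index build's shuffle is larger than its input because each extra word in a line costs one more copy of the doc id. Larger documents with more words per line widen the gap.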

Scans are the most common category, but far from the only one.

-- Owen
