An iterator does not imply data has to be memory resident. Think merge sort output as an iterator (disk backed).
Tom is actually planning to work on something similar with me on this hopefully this or next month. Regards, Mridul On Sun, Apr 20, 2014 at 11:46 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote: > Hey all, > > After a shuffle / groupByKey, Hadoop MapReduce allows the values for a key > to not all fit in memory. The current ShuffleFetcher.fetch API, which > doesn't distinguish between keys and values, only returning an Iterator[P], > seems incompatible with this. > > Any thoughts on how we could achieve parity here? > > -Sandy