Hi Kannan,

In both MapReduce and Spark, the amount of shuffle data a task produces can exceed the task's memory without risk of OOM.
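To make that concrete, here is a minimal, self-contained sketch of the spill-on-threshold pattern that lets shuffle output grow past the in-memory limit; the class, threshold, and serialization here are illustrative, not Spark's actual ExternalSorter code:

    // Sketch only: the in-memory buffer is bounded, but total output on disk is not.
    import java.io.{File, FileOutputStream, ObjectOutputStream}
    import scala.collection.mutable.ArrayBuffer

    class SpillingBuffer[T](maxInMemory: Int) {
      private val buffer = new ArrayBuffer[T]()
      private val spillFiles = new ArrayBuffer[File]()

      def insert(record: T): Unit = {
        buffer += record
        if (buffer.size >= maxInMemory) spill()   // keep memory bounded
      }

      private def spill(): Unit = {
        val file = File.createTempFile("spill", ".bin")
        val out = new ObjectOutputStream(new FileOutputStream(file))
        try buffer.foreach(out.writeObject) finally out.close()
        spillFiles += file                        // disk holds the overflow
        buffer.clear()
      }

      def numSpills: Int = spillFiles.size
    }

    object SpillDemo extends App {
      val sorter = new SpillingBuffer[String](maxInMemory = 1000)
      (1 to 10000).foreach(i => sorter.insert(s"record-$i"))
      println(s"spilled ${sorter.numSpills} times; never more than 1000 records in memory")
    }

Because each spill flushes the buffer to a file, the total data a task writes is bounded by disk, not by the task's memory.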
-Sandy

On Tue, Apr 14, 2015 at 6:47 AM, Imran Rashid <iras...@cloudera.com> wrote:

> That limit doesn't have anything to do with the amount of available
> memory. It's just a tuning parameter: one version is more efficient for
> smaller files, the other is better for bigger files. I suppose the comment
> is a little better in FileSegmentManagedBuffer:
>
> https://github.com/apache/spark/blob/master/network/common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java#L62
>
> On Tue, Apr 14, 2015 at 12:01 AM, Kannan Rajah <kra...@maprtech.com> wrote:
>
> > DiskStore.getBytes uses memory-mapped files if the length is more than a
> > configured limit. This code path is used during the map-side shuffle in
> > ExternalSorter. I want to know if it is possible for the length to exceed
> > the limit in the case of shuffle. The reason I ask is that in Hadoop,
> > each map task is supposed to produce only data that can fit within the
> > task's configured max memory; otherwise it will result in an OOM. Is the
> > behavior the same in Spark, or can the size of data generated by a map
> > task exceed what fits in memory?
> >
> > if (length < minMemoryMapBytes) {
> >   val buf = ByteBuffer.allocate(length.toInt)
> >   ....
> > } else {
> >   Some(channel.map(MapMode.READ_ONLY, offset, length))
> > }
> >
> > --
> > Kannan
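For reference, a self-contained sketch of the check quoted above: below the threshold the segment is copied into a heap ByteBuffer, above it the file is memory-mapped so the bytes are paged in lazily by the OS and never have to fit on the JVM heap at once. The helper name and default value are illustrative; minMemoryMapBytes stands in for the configured limit (spark.storage.memoryMapThreshold in Spark), and this is not the real DiskStore implementation.

    import java.io.RandomAccessFile
    import java.nio.ByteBuffer
    import java.nio.channels.FileChannel.MapMode

    def readSegment(path: String, offset: Long, length: Long,
                    minMemoryMapBytes: Long = 2L * 1024 * 1024): ByteBuffer = {
      val channel = new RandomAccessFile(path, "r").getChannel
      try {
        if (length < minMemoryMapBytes) {
          // Small segment: copy into a heap buffer; cheaper than an mmap syscall.
          val buf = ByteBuffer.allocate(length.toInt)
          channel.position(offset)
          while (buf.hasRemaining) {
            if (channel.read(buf) == -1) throw new java.io.EOFException()
          }
          buf.flip()
          buf
        } else {
          // Large segment: memory-map it; the mapping stays valid after the
          // channel is closed and is paged in on demand.
          channel.map(MapMode.READ_ONLY, offset, length)
        }
      } finally {
        channel.close()
      }
    }

So the branch is purely a performance trade-off between the fixed cost of mapping and the cost of copying, which is why the threshold is a tuning knob rather than a memory-safety limit.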