Yes, and at least there is an override: setting spark.sql.test.master to
local[8]. In fact local[16] worked on my 8-core box.

I'm happy to use this as a workaround, but the hard-coded 32 will cause the
build/tests to fail on a clean checkout if you only have 8 cores.
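
For anyone else hitting this, here is a minimal sketch of why the override
works, based on the TestHive snippet Marcelo quotes below. How the property
reaches the forked test JVM depends on your build setup, so treat the
setProperty call as illustrative rather than the blessed mechanism:

    // Sketch: TestHive only falls back to local[32] when the property is
    // unset, so setting it before the SparkContext is created changes the
    // core count the tests see. "local[8]" matches the 8 cores on my box.
    System.setProperty("spark.sql.test.master", "local[8]")
    val master = System.getProperty("spark.sql.test.master", "local[32]")  // "local[8]"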

On 15 September 2015 at 17:40, Marcelo Vanzin <van...@cloudera.com> wrote:

> That test explicitly sets the number of executor cores to 32.
>
> object TestHive
>   extends TestHiveContext(
>     new SparkContext(
>       System.getProperty("spark.sql.test.master", "local[32]"),
>
>
> On Mon, Sep 14, 2015 at 11:22 PM, Reynold Xin <r...@databricks.com> wrote:
> > Yea I think this is where the heuristics is failing -- it uses 8 cores to
> > approximate the number of active tasks, but the tests somehow is using 32
> > (maybe because it explicitly sets it to that, or you set it yourself? I'm
> > not sure which one)
> >
> > On Mon, Sep 14, 2015 at 11:06 PM, Pete Robbins <robbin...@gmail.com> wrote:
> >>
> >> Reynold, thanks for replying.
> >>
> >> getPageSize parameters: maxMemory=515396075, numCores=0
> >> Calculated values: cores=8, default=4194304
> >>
> >> So am I getting a large page size as I only have 8 cores?
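
For what it's worth, plugging those numbers into my reading of the getPageSize
heuristic reproduces the 4194304 default; the safety factor and power-of-two
rounding below are my approximation of that code, not a copy of it:

    // Approximate reconstruction of the default page size calculation
    // (safetyFactor and the 1MB/64MB bounds are assumptions on my part).
    def approxDefaultPageSize(maxMemory: Long, numCores: Int): Long = {
      val minPageSize = 1L * 1024 * 1024   // 1MB
      val maxPageSize = 64L * minPageSize  // 64MB
      val cores = if (numCores > 0) numCores else Runtime.getRuntime.availableProcessors()
      val safetyFactor = 16
      val raw = maxMemory / cores / safetyFactor
      val bit = java.lang.Long.highestOneBit(raw)
      val size = if (bit == raw) raw else bit << 1  // round up to the next power of two
      math.min(maxPageSize, math.max(minPageSize, size))
    }
    // approxDefaultPageSize(515396075L, 0) == 4194304 on an 8-core box,
    // i.e. the 4MB page size from the failing allocation.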
> >>
> >> On 15 September 2015 at 00:40, Reynold Xin <r...@databricks.com> wrote:
> >>>
> >>> Pete - can you do me a favor?
> >>>
> >>>
> >>>
> >>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174
> >>>
> >>> Print the parameters that are passed into the getPageSize function, and
> >>> check their values.
> >>>
> >>> On Mon, Sep 14, 2015 at 4:32 PM, Reynold Xin <r...@databricks.com> wrote:
> >>>>
> >>>> Is this on latest master / branch-1.5?
> >>>>
> >>>> out of the box we reserve only 16% (0.2 * 0.8) of the memory for
> >>>> execution (e.g. aggregate, join) / shuffle sorting. With a 3GB heap,
> >>>> that's 480MB. So each task gets 480MB / 32 = 15MB, and each operator
> >>>> reserves at least one page for execution. If your page size is 4MB, it
> >>>> only takes 3 operators to use up its memory.
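
Spelling out that arithmetic with the numbers from my run (this is just the
calculation, not Spark code):

    // Worked numbers, matching the maxMemory my instrumentation printed above.
    val heap         = 3L * 1024 * 1024 * 1024       // -Xmx3G from the top-level pom
    val execMemory   = (heap * 0.2 * 0.8).toLong     // 515396075 bytes (~491MB)
    val perTask      = execMemory / 32               // ~15MB (Reynold's 480MB / 32)
    val pagesPerTask = perTask / (4L * 1024 * 1024)  // only 3 whole 4MB pages per task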
> >>>>
> >>>> The thing is, the page size is dynamically determined -- and in your
> >>>> case it should be smaller than 4MB.
> >>>>
> >>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174
> >>>>
> >>>> Maybe there is a place in the maven tests where we explicitly set the
> >>>> page size (spark.buffer.pageSize) to 4MB? If yes, we need to find it
> >>>> and just remove it.
> >>>>
> >>>>
> >>>> On Mon, Sep 14, 2015 at 4:16 AM, Pete Robbins <robbin...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> I keep hitting errors running the tests on 1.5 such as
> >>>>>
> >>>>>
> >>>>> - join31 *** FAILED ***
> >>>>>   Failed to execute query using catalyst:
> >>>>>   Error: Job aborted due to stage failure: Task 9 in stage 3653.0
> >>>>> failed 1 times, most recent failure: Lost task 9.0 in stage 3653.0
> >>>>> (TID 123363, localhost): java.io.IOException: Unable to acquire
> >>>>> 4194304 bytes of memory
> >>>>>       at
> >>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
> >>>>>
> >>>>>
> >>>>> This is using the command
> >>>>> build/mvn -Pyarn -Phadoop-2.2 -Phive -Phive-thriftserver  test
> >>>>>
> >>>>>
> >>>>> I don't see these errors in any of the amplab jenkins builds. Do
> >>>>> those builds have any configuration/environment that I may be
> >>>>> missing? My build is running with whatever defaults are in the top
> >>>>> level pom.xml, e.g. -Xmx3G.
> >>>>>
> >>>>> I can make these tests pass by setting
> >>>>> spark.shuffle.memoryFraction=0.6 in the HiveCompatibilitySuite rather
> >>>>> than the default 0.2 value.
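
For reference, the shape of my local workaround, mirroring the TestHive
construction quoted above. Exactly where this lands in the suite is a detail
of my checkout, so take it as a sketch; the key point is that the fraction has
to be on the SparkConf before the SparkContext exists:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch of the workaround: bump the shuffle memory fraction for the
    // test SparkContext (default is 0.2).
    val conf = new SparkConf().set("spark.shuffle.memoryFraction", "0.6")
    val sc = new SparkContext(
      System.getProperty("spark.sql.test.master", "local[32]"), "TestSQLContext", conf)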
> >>>>>
> >>>>> Trying to analyze what is going on with the test, it appears to be
> >>>>> related to the number of active tasks, which seems to rise to 32; the
> >>>>> ShuffleMemoryManager then allows less memory per task even though
> >>>>> most of those tasks do not have any memory allocated to them.
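
For context on why the active-task count matters so much: my understanding of
the 1.5 ShuffleMemoryManager is that each of N active tasks is guaranteed only
1/(2N) of the pool and capped at 1/N, so the per-task budget shrinks as soon
as tasks register, even if they hold no memory yet. A rough model, not the
actual class:

    // Rough model of the per-task floor/cap as a function of active tasks.
    def perTaskBounds(maxMemory: Long, activeTasks: Int): (Long, Long) =
      (maxMemory / (2L * activeTasks), maxMemory / activeTasks)

    // With the ~491MB pool above:
    //   perTaskBounds(515396075L, 8)  -> (~31MB floor, ~61MB cap)
    //   perTaskBounds(515396075L, 32) -> (~8MB floor,  ~15MB cap)
    // At a 4MB page size, 32 active tasks leave room for only a few pages each.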
> >>>>>
> >>>>> Has anyone seen issues like this before?
> >>>>
> >>>>
> >>>
> >>
> >
>
>
>
> --
> Marcelo
>
