Maybe we can change the heuristic in the memory calculation to use SparkContext.defaultParallelism when running in local mode.
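Roughly something like the sketch below. To be clear, this is only an illustration of the idea, not a patch: the signature, the 1MB/64MB page bounds, the safetyFactor of 16, and the power-of-2 rounding are assumptions chosen so that the sketch reproduces the values Pete printed (maxMemory=515396075, cores=8 gives default=4194304); the real getPageSize in ShuffleMemoryManager may differ.

import org.apache.spark.SparkContext

object PageSizeSketch {

  // Helper for the sketch: round a positive value up to the next power of two.
  private def nextPowerOf2(n: Long): Long =
    java.lang.Long.highestOneBit(math.max(1L, n) * 2 - 1)

  // Idea: when no explicit core count is passed in, prefer the scheduler's
  // parallelism (32 under local[32]) over the machine's physical core count,
  // so the page size shrinks as the number of concurrent tasks grows.
  def getPageSize(sc: SparkContext, maxMemory: Long, numCores: Int): Long = {
    val minPageSize = 1L * 1024 * 1024   // 1MB
    val maxPageSize = 64L * minPageSize  // 64MB
    val cores =
      if (numCores > 0) numCores
      else if (sc.isLocal) sc.defaultParallelism       // proposed: 32 under local[32]
      else Runtime.getRuntime.availableProcessors()    // current fallback: 8 on Pete's box
    val safetyFactor = 16                              // assumed, see note above
    val size = nextPowerOf2(maxMemory / cores / safetyFactor)
    math.min(maxPageSize, math.max(minPageSize, size))
  }
}

With the numbers from this thread, that would give a 1MB page under local[32] instead of 4MB, leaving each of the 32 tasks room for several more operators before it runs out of its share of execution memory.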
On Tue, Sep 15, 2015 at 10:28 AM, Pete Robbins <robbin...@gmail.com> wrote:
> Yes, and at least there is an override by setting spark.sql.test.master to
> local[8]; in fact local[16] worked on my 8-core box.
>
> I'm happy to use this as a workaround, but the hard-coded 32 will fail
> running build/tests on a clean checkout if you only have 8 cores.
>
> On 15 September 2015 at 17:40, Marcelo Vanzin <van...@cloudera.com> wrote:
>
>> That test explicitly sets the number of executor cores to 32.
>>
>>   object TestHive
>>     extends TestHiveContext(
>>       new SparkContext(
>>         System.getProperty("spark.sql.test.master", "local[32]"),
>>
>> On Mon, Sep 14, 2015 at 11:22 PM, Reynold Xin <r...@databricks.com> wrote:
>>> Yea, I think this is where the heuristic is failing -- it uses 8 cores to
>>> approximate the number of active tasks, but the tests are somehow using 32
>>> (maybe because it is explicitly set to that, or you set it yourself? I'm
>>> not sure which).
>>>
>>> On Mon, Sep 14, 2015 at 11:06 PM, Pete Robbins <robbin...@gmail.com> wrote:
>>>>
>>>> Reynold, thanks for replying.
>>>>
>>>> getPageSize parameters: maxMemory=515396075, numCores=0
>>>> Calculated values: cores=8, default=4194304
>>>>
>>>> So am I getting a large page size because I only have 8 cores?
>>>>
>>>> On 15 September 2015 at 00:40, Reynold Xin <r...@databricks.com> wrote:
>>>>>
>>>>> Pete - can you do me a favor?
>>>>>
>>>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174
>>>>>
>>>>> Print the parameters that are passed into the getPageSize function and
>>>>> check their values.
>>>>>
>>>>> On Mon, Sep 14, 2015 at 4:32 PM, Reynold Xin <r...@databricks.com> wrote:
>>>>>>
>>>>>> Is this on latest master / branch-1.5?
>>>>>>
>>>>>> Out of the box we reserve only 16% (0.2 * 0.8) of the memory for
>>>>>> execution (e.g. aggregate, join) / shuffle sorting. With a 3GB heap, that's
>>>>>> 480MB. So each task gets 480MB / 32 = 15MB, and each operator reserves at
>>>>>> least one page for execution. If your page size is 4MB, it only takes 3
>>>>>> operators to use up a task's memory.
>>>>>>
>>>>>> The thing is, page size is dynamically determined -- and in your case it
>>>>>> should be smaller than 4MB.
>>>>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174
>>>>>>
>>>>>> Maybe there is a place in the maven tests where we explicitly set the
>>>>>> page size (spark.buffer.pageSize) to 4MB? If so, we need to find it and
>>>>>> just remove it.
>>>>>>
>>>>>> On Mon, Sep 14, 2015 at 4:16 AM, Pete Robbins <robbin...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I keep hitting errors running the tests on 1.5, such as:
>>>>>>>
>>>>>>> - join31 *** FAILED ***
>>>>>>>   Failed to execute query using catalyst:
>>>>>>>   Error: Job aborted due to stage failure: Task 9 in stage 3653.0
>>>>>>>   failed 1 times, most recent failure: Lost task 9.0 in stage 3653.0 (TID
>>>>>>>   123363, localhost): java.io.IOException: Unable to acquire 4194304 bytes of
>>>>>>>   memory
>>>>>>>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
>>>>>>>
>>>>>>> This is using the command
>>>>>>>   build/mvn -Pyarn -Phadoop-2.2 -Phive -Phive-thriftserver test
>>>>>>>
>>>>>>> I don't see these errors in any of the AMPLab Jenkins builds. Do those
>>>>>>> builds have any configuration/environment that I may be missing? My build
>>>>>>> is running with whatever defaults are in the top-level pom.xml, e.g. -Xmx3G.
>>>>>>>
>>>>>>> I can make these tests pass by setting spark.shuffle.memoryFraction=0.6
>>>>>>> in the HiveCompatibilitySuite rather than the default 0.2 value.
>>>>>>>
>>>>>>> Trying to analyze what is going on with the test, it seems related to the
>>>>>>> number of active tasks, which rises to 32, so the ShuffleMemoryManager
>>>>>>> allows less memory per task even though most of those tasks do not have
>>>>>>> any memory allocated to them.
>>>>>>>
>>>>>>> Has anyone seen issues like this before?
>>
>> --
>> Marcelo
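For the archive, here is the arithmetic from the thread in runnable form. It is only a recap of the numbers discussed above, not Spark code: the 0.2 * 0.8 factors are the shuffle memory fraction and safety fraction Reynold mentioned, and the 4MB page is the default=4194304 value Pete printed.

object Join31FailureArithmetic {
  def main(args: Array[String]): Unit = {
    val heap = 3L * 1024 * 1024 * 1024           // -Xmx3G from the top-level pom.xml
    val execMemory = (heap * 0.2 * 0.8).toLong   // ~491MB; matches maxMemory=515396075 above
    val pageSize = 4L * 1024 * 1024              // 4MB page, as computed for 8 physical cores

    val perTaskAt8 = execMemory / 8              // ~60MB: plenty of 4MB pages per task
    val perTaskAt32 = execMemory / 32            // ~15MB once 32 tasks are active
    println(s"pages per task at 32 active tasks: ${perTaskAt32 / pageSize}")
    // => 3, so a task running a few page-hungry operators fails with
    // "Unable to acquire 4194304 bytes of memory".
  }
}

Which is also why both workarounds in the thread help: forcing spark.sql.test.master to local[8]/local[16] keeps the active task count closer to the 8 cores the page size was computed for, and raising spark.shuffle.memoryFraction to 0.6 simply gives each of the 32 tasks more pages to work with.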