Maybe we can change the page-size heuristic in the memory calculation to use
SparkContext.defaultParallelism when running in local mode.
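
To make the suggestion concrete, here is a minimal sketch of a page-size
heuristic that takes the core count as a parameter. It is not the
ShuffleMemoryManager source: the object name PageSizeSketch, the safety factor
of 16, and the 1MB/64MB bounds are assumptions, picked because they reproduce
the default=4194304 reported below for maxMemory=515396075 and cores=8.

```scala
// Sketch only: reproduces the numbers reported in this thread.
// SafetyFactor and the min/max bounds are assumptions, not confirmed values.
object PageSizeSketch {
  val MinPageSize: Long = 1L * 1024 * 1024   // assumed lower bound (1MB)
  val MaxPageSize: Long = 64L * MinPageSize  // assumed upper bound (64MB)
  val SafetyFactor: Long = 16L               // assumed headroom divisor

  /** Smallest power of two that is >= n (for n >= 1). */
  def nextPowerOf2(n: Long): Long = {
    val highest = java.lang.Long.highestOneBit(n)
    if (highest == n) n else highest << 1
  }

  /** Default page size for a given execution-memory budget and core count. */
  def pageSize(maxMemory: Long, cores: Int): Long = {
    val size = nextPowerOf2(maxMemory / cores / SafetyFactor)
    math.min(MaxPageSize, math.max(MinPageSize, size))
  }

  def main(args: Array[String]): Unit = {
    val maxMemory = 515396075L          // value printed from getPageSize below
    println(pageSize(maxMemory, 8))     // 8 physical cores  -> 4194304 (4MB)
    println(pageSize(maxMemory, 32))    // 32 task slots     -> 1048576 (1MB)
  }
}
```

Under these assumptions, passing 32 (what defaultParallelism would normally
report for local[32]) instead of the 8 available processors yields a 1MB page
rather than a 4MB one.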


On Tue, Sep 15, 2015 at 10:28 AM, Pete Robbins <robbin...@gmail.com> wrote:

> Yes, and at least there is an override: setting spark.sql.test.master to
> local[8]. In fact, local[16] worked on my 8-core box.
>
> I'm happy to use this as a workaround, but the hard-coded 32 will cause
> build/test runs to fail on a clean checkout if you only have 8 cores.
>
> On 15 September 2015 at 17:40, Marcelo Vanzin <van...@cloudera.com> wrote:
>
>> That test explicitly sets the number of executor cores to 32.
>>
>> object TestHive
>>   extends TestHiveContext(
>>     new SparkContext(
>>       System.getProperty("spark.sql.test.master", "local[32]"),
>>
>>
>> On Mon, Sep 14, 2015 at 11:22 PM, Reynold Xin <r...@databricks.com>
>> wrote:
>> > Yea I think this is where the heuristic is failing -- it uses 8 cores to
>> > approximate the number of active tasks, but the tests are somehow using 32
>> > (maybe because it explicitly sets it to that, or you set it yourself? I'm
>> > not sure which one)
>> >
>> > On Mon, Sep 14, 2015 at 11:06 PM, Pete Robbins <robbin...@gmail.com>
>> > wrote:
>> >>
>> >> Reynold, thanks for replying.
>> >>
>> >> getPageSize parameters: maxMemory=515396075, numCores=0
>> >> Calculated values: cores=8, default=4194304
>> >>
>> >> So am I getting a large page size as I only have 8 cores?
>> >>
>> >> On 15 September 2015 at 00:40, Reynold Xin <r...@databricks.com>
>> >> wrote:
>> >>>
>> >>> Pete - can you do me a favor?
>> >>>
>> >>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174
>> >>>
>> >>> Print the parameters that are passed into the getPageSize function, and
>> >>> check their values.
>> >>>
>> >>> On Mon, Sep 14, 2015 at 4:32 PM, Reynold Xin <r...@databricks.com>
>> >>> wrote:
>> >>>>
>> >>>> Is this on latest master / branch-1.5?
>> >>>>
>> >>>> Out of the box we reserve only 16% (0.2 * 0.8) of the memory for
>> >>>> execution (e.g. aggregate, join) / shuffle sorting. With a 3GB heap,
>> >>>> that's 480MB. So each task gets 480MB / 32 = 15MB, and each operator
>> >>>> reserves at least one page for execution. If your page size is 4MB, it
>> >>>> only takes 3 operators to use up a task's memory.
>> >>>>
>> >>>> The thing is, the page size is dynamically determined -- and in your
>> >>>> case it should be smaller than 4MB.
>> >>>>
>> >>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/ShuffleMemoryManager.scala#L174
>> >>>>
>> >>>> Maybe there is a place in the maven tests where we explicitly set the
>> >>>> page size (spark.buffer.pageSize) to 4MB? If yes, we need to find it and
>> >>>> just remove it.
>> >>>>
>> >>>>
>> >>>> On Mon, Sep 14, 2015 at 4:16 AM, Pete Robbins <robbin...@gmail.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> I keep hitting errors running the tests on 1.5 such as
>> >>>>>
>> >>>>>
>> >>>>> - join31 *** FAILED ***
>> >>>>>   Failed to execute query using catalyst:
>> >>>>>   Error: Job aborted due to stage failure: Task 9 in stage 3653.0
>> >>>>>   failed 1 times, most recent failure: Lost task 9.0 in stage 3653.0
>> >>>>>   (TID 123363, localhost): java.io.IOException: Unable to acquire
>> >>>>>   4194304 bytes of memory
>> >>>>>       at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
>> >>>>>
>> >>>>>
>> >>>>> This is using the command
>> >>>>> build/mvn -Pyarn -Phadoop-2.2 -Phive -Phive-thriftserver  test
>> >>>>>
>> >>>>>
>> >>>>> I don't see these errors in any of the amplab jenkins builds. Do those
>> >>>>> builds have any configuration/environment that I may be missing? My
>> >>>>> build is running with whatever defaults are in the top level pom.xml,
>> >>>>> e.g. -Xmx3G.
>> >>>>>
>> >>>>> I can make these tests pass by setting spark.shuffle.memoryFraction=0.6
>> >>>>> in the HiveCompatibilitySuite rather than the default 0.2 value.
>> >>>>>
>> >>>>> Trying to analyze what is going on with the test, it appears to be
>> >>>>> related to the number of active tasks, which seems to rise to 32, so the
>> >>>>> ShuffleMemoryManager allows less memory per task even though most of
>> >>>>> those tasks do not have any memory allocated to them.
>> >>>>>
>> >>>>> Has anyone seen issues like this before?
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Marcelo
>>
>
>
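
For reference, the per-task budget arithmetic from the quoted messages can be
checked with a short sketch. The object and method names are hypothetical, and
the equal split of the pool across active tasks is the same simplification
used in the estimate above, not a description of how ShuffleMemoryManager
actually grants memory.

```scala
// Sketch only: back-of-the-envelope numbers from this thread.
// TaskBudgetSketch and its methods are illustrative names, not Spark APIs.
object TaskBudgetSketch {
  /** Execution/shuffle pool: heap * memoryFraction * safetyFraction. */
  def executionMemory(heapBytes: Long, memoryFraction: Double, safetyFraction: Double): Long =
    (heapBytes * memoryFraction * safetyFraction).toLong

  /** Whole pages that fit in one task's equal share of the pool. */
  def pagesPerTask(poolBytes: Long, activeTasks: Int, pageBytes: Long): Long =
    poolBytes / activeTasks / pageBytes

  def main(args: Array[String]): Unit = {
    val heap = 3L * 1024 * 1024 * 1024           // -Xmx3G from the top level pom.xml
    val mb = 1024L * 1024

    val pool = executionMemory(heap, 0.2, 0.8)   // default 0.2 * 0.8 = 16%
    println(pool)                                // 515396075, the reported maxMemory
    println(pagesPerTask(pool, 32, 4 * mb))      // 3 pages of 4MB per task
    println(pagesPerTask(pool, 32, 1 * mb))      // 15 pages of 1MB per task

    val bigger = executionMemory(heap, 0.6, 0.8) // the memoryFraction=0.6 workaround
    println(pagesPerTask(bigger, 32, 4 * mb))    // 11 pages of 4MB per task
  }
}
```

With the defaults and 32 active tasks, a 4MB page leaves room for only three
pages per task, which is consistent with the "Unable to acquire 4194304 bytes"
failure; either a smaller page or a larger memory fraction raises that count.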
