I keep hitting errors running the tests on 1.5, such as:

- join31 *** FAILED ***
  Failed to execute query using catalyst:
  Error: Job aborted due to stage failure: Task 9 in stage 3653.0 failed 1
times, most recent failure: Lost task 9.0 in stage 3653.0 (TID 123363,
localhost): java.io.IOException: Unable to acquire 4194304 bytes of memory
      at
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)


This is using the command
build/mvn -Pyarn -Phadoop-2.2 -Phive -Phive-thriftserver  test


I don't see these errors in any of the AMPLab Jenkins builds. Do those
builds have any configuration or environment settings that I may be
missing? My build runs with whatever defaults are in the top-level
pom.xml, e.g. -Xmx3G.

I can make these tests pass by setting spark.shuffle.memoryFraction=0.6
in HiveCompatibilitySuite instead of the default value of 0.2.
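For reference, a minimal sketch of that override, assuming the suite's SparkConf can be adjusted before its SparkContext starts (the property name and values are from the description above; exactly where to apply it depends on the test harness):

```scala
// Hedged sketch, not the actual patch: raise the shuffle memory pool
// for the test run. Default in 1.5 is 0.2.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.shuffle.memoryFraction", "0.6")
```
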

Analyzing the test, the failure appears related to the number of active
tasks, which rises to 32; the ShuffleMemoryManager then allows less
memory per task, even though most of those tasks have no memory
allocated to them.
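To make that concrete, here is a rough back-of-the-envelope calculation, assuming (as the ShuffleMemoryManager class comment in 1.5 suggests) that each of N active tasks is guaranteed between 1/(2N) and 1/N of the shuffle pool, and that the pool is heap * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction (defaults 0.2 and 0.8). The numbers below are illustrative, not measured from the failing run:

```scala
// Hypothetical helper: per-task shuffle memory bounds under the
// 1/(2N)..1/N sharing policy described above.
object ShuffleMemoryEstimate {
  def perTaskBounds(heapBytes: Long,
                    memoryFraction: Double,
                    safetyFraction: Double,
                    activeTasks: Int): (Long, Long) = {
    // Total shuffle pool available to all tasks on the executor.
    val pool = (heapBytes * memoryFraction * safetyFraction).toLong
    // (guaranteed minimum, maximum grantable) per task.
    (pool / (2L * activeTasks), pool / activeTasks)
  }

  def main(args: Array[String]): Unit = {
    val heap = 3L * 1024 * 1024 * 1024 // -Xmx3G from the pom
    val (min32, max32) = perTaskBounds(heap, 0.2, 0.8, 32)
    println(s"32 tasks: min=${min32 / (1024 * 1024)}MB " +
            s"max=${max32 / (1024 * 1024)}MB")
  }
}
```

With 32 active tasks the cap per task is only around 15MB, so a sort holding a few 4MB pages can quickly be refused its next page, which would match the "Unable to acquire 4194304 bytes" failure above.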

Has anyone seen issues like this before?
