I keep hitting errors when running the tests on 1.5, such as:

    - join31 *** FAILED ***
      Failed to execute query using catalyst:
      Error: Job aborted due to stage failure: Task 9 in stage 3653.0 failed 1 times, most recent failure: Lost task 9.0 in stage 3653.0 (TID 123363, localhost): java.io.IOException: Unable to acquire 4194304 bytes of memory
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)

This is using the command:

    build/mvn -Pyarn -Phadoop-2.2 -Phive -Phive-thriftserver test

I don't see these errors in any of the AMPLab Jenkins builds. Do those builds have any configuration or environment settings that I may be missing? My build runs with whatever defaults are in the top-level pom.xml, e.g. -Xmx3G.

I can make these tests pass by setting spark.shuffle.memoryFraction=0.6 in HiveCompatibilitySuite rather than the default value of 0.2.

Trying to analyze what is going on, the failure seems related to the number of active tasks, which rises to 32, so the ShuffleMemoryManager allows less memory per task even though most of those tasks do not have any memory allocated to them. Has anyone seen issues like this before?
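To illustrate why the active-task count matters, here is a rough sketch (in Python, just to show the arithmetic) of the fair-share logic I understand Spark 1.5's ShuffleMemoryManager to use: each task is guaranteed at least 1/(2N) of the shuffle pool and capped at 1/N, where N is the number of active tasks, and the pool itself is heap * spark.shuffle.memoryFraction * spark.shuffle.safetyFraction. The exact numbers and helper names below are my own, not from the Spark source.

```python
# Simplified model of Spark 1.5 ShuffleMemoryManager fair-share limits.
# Assumes the default settings: spark.shuffle.memoryFraction=0.2 and
# spark.shuffle.safetyFraction=0.8. Helper names are hypothetical.

def shuffle_pool_bytes(heap_bytes, memory_fraction=0.2, safety_fraction=0.8):
    """Size of the shuffle memory pool shared by all tasks in an executor."""
    return int(heap_bytes * memory_fraction * safety_fraction)

def per_task_bounds(pool_bytes, num_active_tasks):
    """Each task is guaranteed at least 1/(2N) of the pool, capped at 1/N."""
    n = num_active_tasks
    return pool_bytes // (2 * n), pool_bytes // n

heap = 3 * 1024 ** 3                 # -Xmx3G from the top-level pom.xml
pool = shuffle_pool_bytes(heap)      # ~492 MB shuffle pool
lo, hi = per_task_bounds(pool, 32)   # 32 active tasks, as observed

print(f"pool={pool}, min per task={lo}, cap per task={hi}")
# With 32 active tasks the 1/N cap is only ~15 MB, so a task that already
# holds a few pages can fail to acquire another 4 MB (4194304-byte) page,
# even though most of the 32 tasks hold no shuffle memory at all.
```

Raising spark.shuffle.memoryFraction to 0.6 triples the pool in this model, which would explain why the tests pass with that override.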
- join31 *** FAILED *** Failed to execute query using catalyst: Error: Job aborted due to stage failure: Task 9 in stage 3653.0 failed 1 times, most recent failure: Lost task 9.0 in stage 3653.0 (TID 123363, localhost): java.io.IOException: Unable to acquire 4194304 bytes of memory at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368) This is using the command build/mvn -Pyarn -Phadoop-2.2 -Phive -Phive-thriftserver test I don't see these errors in any of the amplab jenkins builds. Do those builds have any configuration/environment that I may be missing? My build is running with whatever defaults are in the top level pom.xml, eg -Xmx3G. I can make these tests pass by setting spark.shuffle.memoryFraction=0.6 in the HiveCompatibilitySuite rather than the default 0.2 value. Trying to analyze what is going on with the test it is related to the number of active tasks, which seems to rise to 32, and so the ShuffleMemoryManager allows less memory per task even though most of those tasks do not have any memory allocated to them. Has anyone seen issues like this before?