Hello,
I was wondering whether anyone has managed to run the Spark unit tests
without recompiling everything each time, or knows how to. Due to some
limitations of our test systems and Spark's behavior on big-endian /
EBCDIC-encoded systems, we are forced to run the unit tests in several
phases. Currently our pipeline is:
- Build Spark: 45-60 min
- Initial Unit Testing: 2 hrs
- Missing Unit Testing: 3 hrs
- Failed Unit Testing: 1 hr
Building Spark obviously requires compiling everything, but each
subsequent stage then recompiles everything again. Each stage does the
following:
Build Spark
  Build Spark without unit tests and create a package via
  make-distribution:
    mvn -e -Dhive -Dhive-thriftserver -Dhadoop-2.10 -DskipTests clean package
Initial Unit Testing
  Run the unit tests:
    mvn -e -fn -Dhive -Dhive-thriftserver -Dhadoop-2.10 test
Missing Unit Testing
  Compare the tests that were executed (determined from the
  surefire-reports) against the available tests. Tests are missed when a
  unit test causes a JVM error, possibly due to an OOM; the remaining unit
  tests of that project are then skipped. The missed tests are run
  together against their project (e.g. all missing ScalaTest suites under
  core are run in one invocation). For example:
    mvn -pl external/flume -am -e -fn -Phive -Phive-thriftserver -Phadoop-2.10 -DwildcardSuites=none -Dtest=org.apache.spark.streaming.flume.JavaFlumePollingStreamSuite,org.apache.spark.streaming.flume.JavaFlumeStreamSuite test
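A minimal bash sketch of this kind of comparison, for illustration only. The function names (available_suites, executed_suites, missing_suites) are hypothetical, and the sketch assumes that each suite lives in its own file named *Suite.scala or *Suite.java whose path mirrors its package under src/test/{scala,java}, and that every suite that actually ran left a target/surefire-reports/TEST-<fully qualified name>.xml file behind.

```bash
#!/usr/bin/env bash
# Sketch only: find test suites in a module that left no report behind.
# Assumptions (hypothetical, not taken from the Spark build):
#   - suites are named *Suite.scala / *Suite.java, one suite per file,
#     and the file path mirrors the package under src/test/{scala,java}
#   - every suite that ran wrote target/surefire-reports/TEST-<fqcn>.xml

# Fully qualified names of the suites present in the module's test sources.
available_suites() {
  local module_dir="$1"
  [ -d "$module_dir/src/test" ] || return 0
  find "$module_dir/src/test" -type f \( -name '*Suite.scala' -o -name '*Suite.java' \) \
    | sed -E 's#.*/src/test/(scala|java)/##; s#\.(scala|java)$##; s#/#.#g' \
    | LC_ALL=C sort -u
}

# Fully qualified names of the suites that produced a report.
executed_suites() {
  local reports="$1/target/surefire-reports"
  [ -d "$reports" ] || return 0
  find "$reports" -type f -name 'TEST-*.xml' \
    | sed -E 's#.*/TEST-##; s#\.xml$##' \
    | LC_ALL=C sort -u
}

# Suites that exist but never reported: lines found only in the first list.
missing_suites() {
  LC_ALL=C comm -23 <(available_suites "$1") <(executed_suites "$1")
}
```

Running `missing_suites core | paste -sd, -` would then give a comma-separated list to hand to -DwildcardSuites (Scala suites) or -Dtest (Java suites); the sketch does not separate the two kinds.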
Failed Unit Testing
  Look at the test results (determined from the surefire-reports) and
  re-run each failing test by itself; this clears up a large number of
  failures that come from flaky tests. For example:
    mvn -pl core -am -e -fn -Phive -Phive-thriftserver -Phadoop-2.10 -DwildcardSuites=org.apache.spark.util.UtilsSuite -Dtest=none test
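A minimal bash sketch of how the failing suites could be collected and turned into one-suite-at-a-time commands, for illustration only. The function names (failed_suites, rerun_command) and the PROFILES variable are hypothetical; the sketch assumes each report's opening <testsuite ...> tag sits on a single line with failures= and errors= attributes, and that it is run from the Spark root so module paths match -pl. It only prints the commands, mirroring the two mvn invocations above (Java suites through -Dtest, Scala suites through -DwildcardSuites).

```bash
#!/usr/bin/env bash
# Sketch only: list suites whose report records a failure or error, and
# print the mvn command that re-runs each one by itself.
# Assumptions (hypothetical): reports are <module>/target/surefire-reports/
# TEST-<fqcn>.xml, the <testsuite ...> tag is on one line, and the script
# is run from the Spark source root.

PROFILES="${PROFILES:--Phive -Phive-thriftserver -Phadoop-2.10}"

# Suites whose <testsuite> tag has a non-zero failures= or errors= count.
failed_suites() {
  local reports="$1/target/surefire-reports"
  [ -d "$reports" ] || return 0
  grep -l -E '<testsuite[^>]*(failures|errors)="[1-9]' "$reports"/TEST-*.xml 2>/dev/null \
    | sed -E 's#.*/TEST-##; s#\.xml$##' \
    | LC_ALL=C sort -u
}

# Print (do not run) the command that re-runs one suite on its own.
# A suite with a matching .java source is treated as a Java suite.
rerun_command() {
  local module="$1" suite="$2"
  if [ -f "$module/src/test/java/${suite//.//}.java" ]; then
    echo "mvn -pl $module -am -e -fn $PROFILES -DwildcardSuites=none -Dtest=$suite test"
  else
    echo "mvn -pl $module -am -e -fn $PROFILES -DwildcardSuites=$suite -Dtest=none test"
  fi
}
```

Printing rather than executing keeps the commands easy to review; something like `for s in $(failed_suites core); do rerun_command core "$s"; done | sh` would run them one after another.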
With every mvn run above, everything is recompiled even though the code
hasn't changed. I'd like to compile the tests once during the Build Spark
stage and then simply run them in the unit-testing stages, which would
speed up our pipeline drastically. Any suggestions are appreciated.
Additional information:
- Java options are set to: -Dfile.encoding=UTF8 -Xmx4g -Xss1024k
  -Dconsole.encoding=IBM-1047 -XX:MaxPermSize=512m
  -XX:ReservedCodeCacheSize=512m
- We have updated Maven over the years (3.3.9, 3.5.4, 3.6.3, and
  currently 3.8.1), but are unaware of any new features that would help.
- We attempted multi-core compilation years ago, to no avail (but are
  willing to try again if it is suggested).
- Zinc was also attempted years ago, but we weren't able to port it to
  our system at the time.
Thanks again!
Sincerely,
Nicholas T. Marion
AI and Analytics Development Lead | IzODA CPO
Mobile: 1 845 649 3592
IBM