Hello,

I was wondering if anyone has successfully managed, or might know how, to run
Spark unit tests without recompiling every time. Due to some limitations of
our test systems and the behavior on big-endian / EBCDIC-encoded systems, we
are forced to run unit tests in a few phases. Currently we have the following
pipeline:

Build Spark - 45-60 min
Initial Unit Testing - 2 hrs
Missing Unit Tests - 3 hrs
Failed Unit Tests - 1 hr

Obviously building Spark requires compiling everything, but then each
subsequent stage has the problem of recompiling everything again. The
following occurs in each stage:

Build Spark
   Build Spark without running unit tests and create a package via
   make-distribution:
      mvn -e -Phive -Phive-thriftserver -Phadoop-2.10 -DskipTests clean package
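   (For reference, the same build through the distribution script would look
   roughly like the following; this is only a sketch, it assumes the script
   lives at dev/make-distribution.sh in our Spark tree, and the --name value
   is just a placeholder.)
      ./dev/make-distribution.sh --name hadoop-2.10 --tgz \
          -Phive -Phive-thriftserver -Phadoop-2.10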

Initial Unit Testing
   Run the unit tests:
      mvn -e -fn -Phive -Phive-thriftserver -Phadoop-2.10 test

Missing Unit Testing
   Compare the executed tests (determined from surefire-reports) against the
   available tests. Tests get missed when a unit test causes a JVM error
   (possibly an OOM), after which the remaining unit tests of that project
   are skipped. The missed tests are then run collectively against the
   specific project, e.g. any missing ScalaTest suites under core. For
   example:
      mvn -pl external/flume -am -e -fn -Phive -Phive-thriftserver
        -Phadoop-2.10 -DwildcardSuites=none
        -Dtest=org.apache.spark.streaming.flume.JavaFlumePollingStreamSuite,org.apache.spark.streaming.flume.JavaFlumeStreamSuite
        test
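   For illustration, the comparison is roughly equivalent to something like
   the following (the paths, the *Suite naming convention, and the use of
   core are simplified stand-ins for our actual tooling):
      # Suites that produced a surefire report.
      executed=$(ls core/target/surefire-reports/TEST-*.xml 2>/dev/null \
                 | sed -e 's#.*/TEST-##' -e 's#\.xml$##' | sort)
      # Suites present in the test sources.
      available=$(find core/src/test -name '*Suite.scala' -o -name '*Suite.java' \
                  | sed -e 's#.*/scala/##' -e 's#.*/java/##' -e 's#\.scala$##' \
                        -e 's#\.java$##' -e 's#/#.#g' | sort)
      # Anything without a report was missed.
      comm -13 <(echo "$executed") <(echo "$available")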

Failed Unit Testing
   Look at the test results (determined from surefire-reports) and re-run any
   failing tests by themselves; this resolves a large number of failures from
   flaky tests. For example:
      mvn -pl core -am -e -fn -Phive -Phive-thriftserver -Phadoop-2.10
        -DwildcardSuites=org.apache.spark.util.UtilsSuite -Dtest=none test
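   For illustration, gathering the failed ScalaTest suites looks roughly like
   this (again simplified, and JUnit classes would go to -Dtest instead of
   -DwildcardSuites):
      # Suites whose surefire XML reports failures or errors.
      failed=$(grep -lE 'failures="[1-9]|errors="[1-9]' \
                    core/target/surefire-reports/TEST-*.xml \
               | sed -e 's#.*/TEST-##' -e 's#\.xml$##' | paste -sd, -)
      mvn -pl core -am -e -fn -Phive -Phive-thriftserver -Phadoop-2.10 \
          -DwildcardSuites="$failed" -Dtest=none test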


With every run of mvn above, everything is recompiled even though the code
hasn't changed. I'd like to compile the tests once during the Build Spark
stage and simply run them in the later testing stages. This would speed up
our pipeline drastically. Any suggestions are appreciated.
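Concretely, what I am hoping the test stages could look like is something
along these lines, i.e. running the already-compiled tests without triggering
the compile phases again (an untested sketch on my part; it assumes that
invoking the surefire and scalatest plugin goals directly would pick up the
configuration in the POMs):

   mvn -e -fn -Phive -Phive-thriftserver -Phadoop-2.10 surefire:test scalatest:test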

Additional information:

   Java options are set to: -Dfile.encoding=UTF8 -Xmx4g -Xss1024k
   -Dconsole.encoding=IBM-1047 -XX:MaxPermSize=512m
   -XX:ReservedCodeCacheSize=512m
   We have updated Maven over the years (3.3.9, 3.5.4, 3.6.3, and currently
   3.8.1), but are unaware of any new features that would help.
   We attempted multi-core compilation years ago, to no avail (but we are
   willing to try again if it is suggested).
   Zinc was also attempted years ago, but we weren't able to port it to our
   system at the time.

Thanks again!

Sincerely,

Nicholas T. Marion
AI and Analytics Development Lead | IzODA CPO
Mobile: 1 845 649 3592
E-mail: nmar...@us.ibm.com
IBM


