Hello folks,

I recently migrated my application to Spark 2.0, and everything worked well except for one function that uses "toDS()" and the ML libraries.
This stage used to complete in about 15 minutes on 1.6.2, and now takes almost two hours. The UI shows very strange behavior: completed stages are still being worked on, and there is concurrent work on tons of stages, including ones from downstream jobs:

https://dl.dropboxusercontent.com/u/231152/spark.png

Does anyone know what might be going on? The only source change I made was changing "toDF" to "toDS()" before handing my RDDs to the ML libraries.

Thanks,
-miles