Bizarre behavior using Datasets/ML on Spark 2.0

Miles Crawford Wed, 21 Sep 2016 10:24:07 -0700

Hello folks. I recently migrated my application to Spark 2.0, and
everything worked well, except for one function that uses "toDS" and the ML
libraries.


This stage used to complete in 15 minutes or so on 1.6.2, and now takes
almost two hours.

The UI shows very strange behavior - completed stages still being worked
on, concurrent work on tons of stages, including ones from downstream jobs:
https://dl.dropboxusercontent.com/u/231152/spark.png

Does anyone know what might be going on? The only source change I made was
changing "toDF()" to "toDS()" before handing my RDDs to the ML libraries.

Thanks,
-miles
