Are you sure the new ulimit has taken effect?
How many cores are you using? How many reducers?
"In general, if a node in your cluster has C assigned cores and you run
a job with X reducers, then Spark will open C*X files in parallel and
start writing. Shuffle consolidation will decrease the total number of
files created, but the number of file handles open at any time doesn't
change, so it won't help the ulimit problem."
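As a sketch, shuffle consolidation could be enabled through the job's SparkConf (note: the `spark.shuffle.consolidateFiles` flag only exists in older 1.x releases and was later removed, so check the version you're running):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: enable shuffle file consolidation on an older (1.x) Spark release.
// App name and master here are illustrative placeholders.
val conf = new SparkConf()
  .setAppName("shuffle-consolidation-example")
  .setMaster("local[2]")
  .set("spark.shuffle.consolidateFiles", "true")

val sc = new SparkContext(conf)
```

Remember that this reduces the total files written, not the handles open concurrently, so you may still need to raise the ulimit.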
Disabling parallelExecution has worked for me.
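Assuming an sbt build, that would be a one-line setting in `build.sbt`:

```scala
// build.sbt — run test suites sequentially so they don't
// compete for a SparkContext
Test / parallelExecution := false
```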
Other alternatives I’ve tried that also work include:
1. Using a lock – this will let tests execute in parallel except for those
using a SparkContext. If you have a large number of tests that could execute
in parallel, this can shave off some time.
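A minimal sketch of the lock approach, assuming ScalaTest-style suites (the object and suite names are illustrative): every suite that needs a SparkContext synchronizes on one shared lock object, so only those tests serialize while the rest run in parallel.

```scala
import org.apache.spark.SparkContext
import org.scalatest.funsuite.AnyFunSuite

// Shared lock: all Spark-dependent tests synchronize on this object.
object SparkTestLock

class MySparkSuite extends AnyFunSuite {
  test("word count over a small RDD") {
    SparkTestLock.synchronized {
      val sc = new SparkContext("local[2]", "spark-lock-test")
      try {
        val counts = sc.parallelize(Seq("a", "b", "a")).countByValue()
        assert(counts("a") == 2)
      } finally {
        sc.stop() // always release the context so the next suite can start one
      }
    }
  }
}
```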
For example,
val originalRDD: RDD[SomeCaseClass] = ...
// Option 1: objects are copied, with prop1 set in the process
val transformedRDD = originalRDD.map(item => item.copy(prop1 = calculation()))
// Option 2: objects are re-used and mutated in place (requires prop1 to be a var)
val transformedRDD = originalRDD.map { item => item.prop1 = calculation(); item }