RE: Shuffle files

2014-10-07 Thread Lisonbee, Todd
Are you sure the new ulimit has taken effect? How many cores are you using? How many reducers? "In general if a node in your cluster has C assigned cores and you run a job with X reducers then Spark will open C*X files in parallel and start writing. Shuffle consolidat
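
To make the quoted rule of thumb concrete, here is a minimal sketch of the arithmetic. The object name, function name, and sample numbers are hypothetical and not from the thread. The result is only an estimate to compare against the per-process open-file limit reported by `ulimit -n`.

```scala
// Hypothetical helper illustrating the rule of thumb quoted above:
// a node with C assigned cores running a job with X reducers
// opens roughly C * X shuffle files in parallel.
object ShuffleFileEstimate {
  def openShuffleFiles(cores: Int, reducers: Int): Long =
    cores.toLong * reducers.toLong

  def main(args: Array[String]): Unit = {
    val cores = 8        // assumed value for illustration
    val reducers = 1000  // assumed value for illustration
    val needed = openShuffleFiles(cores, reducers)
    println(s"Estimated concurrently open shuffle files per node: $needed")
  }
}
```

If the estimate is above the limit in effect for the Spark process, that is consistent with the questions asked in the reply.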

RE: Unit test failure: Address already in use

2014-06-18 Thread Lisonbee, Todd
Disabling parallelExecution has worked for me. Other alternatives I’ve tried that also work include: 1. Using a lock – this will let tests execute in parallel except for those using a SparkContext. If you have a large number of tests that could execute in parallel, this can shave off some tim
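
The lock alternative might look roughly like the sketch below. It assumes all tests run in one JVM, and the names `SparkContextLock`, `withLock`, and `LockDemo` are hypothetical. The worker threads stand in for tests that would each create a SparkContext, so the sketch runs without Spark on the classpath.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical JVM-wide lock shared by every test that needs a SparkContext.
object SparkContextLock {
  private val lock = new Object

  // Runs `body` while holding the lock, so only one caller is inside at a time.
  def withLock[T](body: => T): T = lock.synchronized(body)
}

object LockDemo {
  // Starts `threads` workers that all enter the guarded section and
  // returns the highest number of workers observed inside it at once.
  def maxConcurrent(threads: Int): Int = {
    val active = new AtomicInteger(0)
    val peak = new AtomicInteger(0)
    val workers = (1 to threads).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = SparkContextLock.withLock {
          val now = active.incrementAndGet()
          if (now > peak.get()) peak.set(now) // safe: the lock is held here
          Thread.sleep(5)                     // stand-in for test work
          active.decrementAndGet()
          ()
        }
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    peak.get()
  }

  def main(args: Array[String]): Unit =
    println(s"Peak concurrency inside the lock: ${maxConcurrent(4)}")
}
```

Tests that do not touch the guarded section never take the lock, so they can still run in parallel with each other.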

is it okay to reuse objects across RDD's?

2014-04-26 Thread Lisonbee, Todd
For example,

val originalRDD: RDD[SomeCaseClass] = ...

// Option 1: objects are copied, setting prop1 in the process
val transformedRDD = originalRDD.map( item => item.copy(prop1 = calculation()) )

// Option 2: objects are re-used and modified
val transformedRDD = originalRDD.map( item => item.pro
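
The difference between copying and modifying in place can be seen on an ordinary Scala collection. This is only a sketch under stated assumptions: `Item`, `CopyVsMutate`, and the body of `calculation()` are hypothetical stand-ins, and a plain `Seq` is used instead of an RDD, so it says nothing about how Spark caches or recomputes partitions.

```scala
// Hypothetical stand-in for SomeCaseClass; prop1 is a var so in-place mutation compiles.
case class Item(id: Int, var prop1: Int)

object CopyVsMutate {
  def calculation(): Int = 42 // placeholder value for illustration

  // Option 1 style: each element is copied, the input objects stay untouched.
  def copied(original: Seq[Item]): Seq[Item] =
    original.map(item => item.copy(prop1 = calculation()))

  // Option 2 style: each element is modified and returned, so the input
  // collection and the result share the same (now changed) objects.
  def mutated(original: Seq[Item]): Seq[Item] =
    original.map { item => item.prop1 = calculation(); item }

  def main(args: Array[String]): Unit = {
    val original = Seq(Item(1, 0), Item(2, 0))
    val c = copied(original)
    println(s"after copy:   original=$original result=$c")
    val m = mutated(original)
    println(s"after mutate: original=$original result=$m")
  }
}
```

In the in-place variant, anything still holding a reference to the original collection sees the modified values, which is the side effect the question is asking about.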