Hi,

I am seeing performance degradation in the SparkPi example on a single-node setup (using local[K]).
Using 1, 2, 4, and 8 cores, these are the execution times in seconds for the same number of iterations:

- Random: 4.0, 7.0, 12.96, 17.96

If I change the code to use ThreadLocalRandom (https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala#L35), it scales properly:

- ThreadLocalRandom: 2.2, 1.4, 1.07, 1.00

I see a similar issue with the Kryo serializer in another app: the push function shows up at the top of the profile data, but disappears completely if I use ThreadLocalRandom (https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/util/ObjectMap.java#L259).

The JDK documentation (https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadLocalRandom.html) says:

> When applicable, use of ThreadLocalRandom rather than shared Random objects
> in concurrent programs will typically encounter much less overhead and
> contention. Use of ThreadLocalRandom is particularly appropriate when
> multiple tasks (for example, each a ForkJoinTask) use random numbers in
> parallel in thread pools.

I am using Spark 1.5 and Java 1.8.0_91. Is there any reason to prefer Random over ThreadLocalRandom?

Thanks,
Prasun
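For anyone wanting to reproduce the contention outside Spark, here is a minimal, self-contained Java sketch (not the actual SparkPi code, and the class/method names are made up for illustration). It estimates pi the same Monte Carlo way, once through a single shared java.util.Random (whose seed is updated with a CAS loop that parallel threads fight over) and once through ThreadLocalRandom.current() (one generator per thread, no shared state):

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class PiEstimate {
    // One Random shared by all worker threads: every nextDouble() call
    // does a compare-and-swap on the same internal seed, so threads
    // serialize on it and throughput drops as cores are added.
    static final Random SHARED = new Random();

    static double estimateShared(int n) {
        long inside = IntStream.range(0, n).parallel().filter(i -> {
            double x = SHARED.nextDouble() * 2 - 1;
            double y = SHARED.nextDouble() * 2 - 1;
            return x * x + y * y <= 1;
        }).count();
        return 4.0 * inside / n;
    }

    static double estimateLocal(int n) {
        long inside = IntStream.range(0, n).parallel().filter(i -> {
            // ThreadLocalRandom.current() returns the calling thread's own
            // generator, so there is no cross-thread contention at all.
            ThreadLocalRandom r = ThreadLocalRandom.current();
            double x = r.nextDouble() * 2 - 1;
            double y = r.nextDouble() * 2 - 1;
            return x * x + y * y <= 1;
        }).count();
        return 4.0 * inside / n;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long t0 = System.nanoTime();
        double piShared = estimateShared(n);
        long t1 = System.nanoTime();
        double piLocal = estimateLocal(n);
        long t2 = System.nanoTime();
        System.out.printf("shared Random:      pi=%.4f  %d ms%n",
                piShared, (t1 - t0) / 1_000_000);
        System.out.printf("ThreadLocalRandom:  pi=%.4f  %d ms%n",
                piLocal, (t2 - t1) / 1_000_000);
    }
}
```

Both variants converge to the same estimate; on a multi-core box the shared-Random run should be noticeably slower, mirroring the scaling numbers above.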