Hi,

I am seeing performance degradation in the Spark Pi example on a single-node
setup (using local[K]).

With 1, 2, 4, and 8 cores, the execution times in seconds for the same
number of iterations are:
Random: 4.0, 7.0, 12.96, 17.96

If I change the code to use ThreadLocalRandom
(https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala#L35),
it scales properly:
ThreadLocalRandom: 2.2, 1.4, 1.07, 1.00
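
For what it's worth, the same inverse scaling is reproducible outside Spark
with a small standalone Java harness (my own sketch, not the Spark code --
class and constant names are made up):

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

public class PiBench {
    static final int SLICES = 8;            // parallel tasks, like local[8]
    static final int SAMPLES = 1_000_000;   // darts per slice

    // One Random shared by all threads: every nextDouble() goes through
    // a CAS retry loop on a single AtomicLong seed.
    static final Random SHARED = new Random();

    static long count(boolean threadLocal) {
        LongAdder hits = new LongAdder();
        IntStream.range(0, SLICES).parallel().forEach(s -> {
            for (int i = 0; i < SAMPLES; i++) {
                double x, y;
                if (threadLocal) {
                    // Per-thread generator: no shared state, no contention.
                    x = ThreadLocalRandom.current().nextDouble() * 2 - 1;
                    y = ThreadLocalRandom.current().nextDouble() * 2 - 1;
                } else {
                    x = SHARED.nextDouble() * 2 - 1;
                    y = SHARED.nextDouble() * 2 - 1;
                }
                if (x * x + y * y <= 1) hits.increment();
            }
        });
        return hits.sum();
    }

    public static void main(String[] args) {
        for (boolean tl : new boolean[]{false, true}) {
            long t0 = System.nanoTime();
            long hits = count(tl);
            double pi = 4.0 * hits / ((long) SLICES * SAMPLES);
            System.out.printf("%s: pi ~ %.3f in %d ms%n",
                    tl ? "ThreadLocalRandom" : "shared Random",
                    pi, (System.nanoTime() - t0) / 1_000_000);
        }
    }
}
```

On my machine the shared-Random pass is several times slower than the
ThreadLocalRandom pass at 8 threads; absolute numbers will of course vary.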

I see a similar issue with the Kryo serializer in another application: the
push function shows up at the top of the profile, but disappears completely
if I switch it to ThreadLocalRandom:

https://github.com/EsotericSoftware/kryo/blob/master/src/com/esotericsoftware/kryo/util/ObjectMap.java#L259

The JDK documentation
(https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadLocalRandom.html)
says:

> When applicable, use of ThreadLocalRandom rather than shared Random objects
> in concurrent programs will typically encounter much less overhead and
> contention. Use of ThreadLocalRandom is particularly appropriate when
> multiple tasks (for example, each a ForkJoinTask) use random numbers in
> parallel in thread pools.
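
The negative scaling matches how java.util.Random generates bits: all
threads spin on a compare-and-set loop over one shared AtomicLong seed, so
more cores mean more failed CASes and retries. Roughly paraphrased (my own
SeedDemo class, not the actual JDK source):

```java
import java.util.concurrent.atomic.AtomicLong;

// Paraphrase of java.util.Random's next(int): a CAS retry loop on a single
// shared 48-bit seed. Under contention, losing threads redo the multiply
// and retry, so adding cores adds wasted work instead of throughput.
final class SeedDemo {
    private static final long MULT = 0x5DEECE66DL;
    private static final long ADD  = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    private final AtomicLong seed;

    // Takes the already-scrambled internal seed value.
    SeedDemo(long scrambledSeed) {
        this.seed = new AtomicLong(scrambledSeed);
    }

    int next(int bits) {
        long oldSeed, nextSeed;
        do {
            oldSeed = seed.get();
            nextSeed = (oldSeed * MULT + ADD) & MASK;
        } while (!seed.compareAndSet(oldSeed, nextSeed)); // retries under contention
        return (int) (nextSeed >>> (48 - bits));
    }
}
```

ThreadLocalRandom keeps the seed in a per-thread field instead, so the
update is a plain write with no CAS loop at all.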

I am using Spark 1.5 and Java 1.8.0_91.

Is there any reason to prefer Random over ThreadLocalRandom?

Thanks
Prasun

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
