Hi Christopher

>>which you would invoke as TaskNonce.getSingleton().doThisOnce() from
within the map closure.

Say I have a cluster with 24 workers (one core per worker, i.e.
SPARK_WORKER_CORES=1, so one task thread per worker). My application would
then have 24 executors, each with its own JVM.
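
(For reference, I just mean the standalone-mode setting in conf/spark-env.sh:

    # conf/spark-env.sh on each worker: one core per worker process
    export SPARK_WORKER_CORES=1
)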

The RDDs I process have millions of rows and many partitions. I could call
rdd.mapPartitions and it would still work, since the singleton guarantees the
code is executed only once in each JVM, but I was wondering whether there is
a more efficient way of doing this, such as using a generated RDD with one
partition per executor.
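
Roughly what I have in mind, assuming a TaskNonce along the lines you
described (the guard and names below are my own sketch, not your actual
class):

    import java.util.concurrent.atomic.AtomicBoolean
    import org.apache.spark.rdd.RDD

    // Sketch of a once-per-JVM guard: a Scala object is loaded once per
    // executor JVM, so this flag is shared by all tasks in that JVM.
    object TaskNonce {
      private val done = new AtomicBoolean(false)
      def doThisOnce(body: => Unit): Unit =
        if (done.compareAndSet(false, true)) body
    }

    def process(rdd: RDD[String]): RDD[String] =
      rdd.mapPartitions { iter =>
        TaskNonce.doThisOnce {
          // one-time per-executor initialisation goes here
        }
        iter.map(_.toUpperCase) // the actual per-row work
      }

And the generated-RDD alternative I was thinking of, which only runs once
per executor if the scheduler spreads one partition to each of the 24
executors (which, as I understand it, Spark does not strictly guarantee):

    import org.apache.spark.SparkContext

    // One element and one partition per executor, so each task body runs
    // once per JVM provided the partitions land on distinct executors.
    def runOncePerExecutor(sc: SparkContext, numExecutors: Int = 24): Unit =
      sc.parallelize(1 to numExecutors, numExecutors)
        .foreachPartition { _ =>
          // one-time per-executor work goes here
        }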

Also, when I want to return some stats from each executor, rdd.mapPartitions
returns one result per partition rather than one per executor.
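
The workaround I can think of is to tag each partition's stats with a JVM
identity and fold them down to one figure per executor on the driver,
something like this (the pid@host string from the runtime MXBean is just my
stand-in for an executor id):

    import java.lang.management.ManagementFactory
    import org.apache.spark.rdd.RDD

    // Every partition emits (jvmId, rowsSeen); the driver then merges the
    // counts that came from the same executor JVM into a single entry.
    def statsPerExecutor(rdd: RDD[String]): Map[String, Long] = {
      val perPartition = rdd.mapPartitions { iter =>
        val jvmId = ManagementFactory.getRuntimeMXBean.getName // "pid@host"
        Iterator((jvmId, iter.size.toLong))
      }.collect()
      perPartition.groupBy(_._1).mapValues(_.map(_._2).sum).toMap
    }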

Deenar




