Hi Christopher

>> which you would invoke as TaskNonce.getSingleton().doThisOnce() from within the map closure
Say I have a cluster with 24 workers, each running a single thread (SPARK_WORKER_CORES=1). My application would then have 24 executors, each in its own JVM. The RDDs I process have millions of rows and many partitions. I could do rdd.mapPartitions and it would still work, since the singleton guard means the code only executes once in each JVM, but I was wondering whether there is a more efficient way of doing this by using a generated RDD with one partition per executor. Also, when I want to return some stats from each executor, rdd.mapPartitions on the real RDD would return multiple results, one per partition rather than one per executor.

Deenar
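P.S. A rough sketch of what I have in mind, in case it helps. The TaskNonce object below is just a minimal stand-in for the class from Christopher's earlier post, and the partition count of 24 matches my cluster; as far as I know Spark does not strictly guarantee one task per executor, but with one core per executor the 24 tasks should spread out one per JVM:

import org.apache.spark.{SparkConf, SparkContext}

// Minimal stand-in for the TaskNonce described earlier in this thread:
// a per-JVM singleton that runs its initialisation at most once.
object TaskNonce {
  @volatile private var done = false
  def getSingleton(): TaskNonce.type = this
  def doThisOnce(): Unit = synchronized {
    if (!done) {
      done = true
      // one-time per-JVM initialisation goes here
    }
  }
}

object OncePerExecutor {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("once-per-executor"))

    // Assumption: 24 single-core executors, as in the cluster described above.
    val numExecutors = 24

    // A tiny generated RDD with exactly one partition per executor, instead of
    // running over the millions-of-rows RDD with its many partitions.
    val probe = sc.parallelize(0 until numExecutors, numExecutors)

    val stats = probe.mapPartitions { _ =>
      TaskNonce.getSingleton().doThisOnce()
      // Return exactly one stat per partition, i.e. (ideally) per executor.
      val host = java.net.InetAddress.getLocalHost.getHostName
      Iterator((host, Runtime.getRuntime.freeMemory()))
    }.collect()

    stats.foreach { case (host, free) => println(s"$host: $free bytes free") }
  }
}

This also answers the stats question for me: because the probe RDD has one partition per executor, collect() brings back one result per executor instead of one per data partition.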