Deenar, yes, you may indeed be overthinking how Spark executes maps,
filters, etc. I'll focus on the high-order bits so it's clear.

Let's assume you're doing this in Java. Then you'd pass some *MyMapper*
instance to *JavaRDD#map(myMapper)*.
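
On the driver that's just a couple of lines; here's a minimal sketch, where
*sc* is your JavaSparkContext and the RDD *lines*, its String element type,
and the HDFS path are all illustrative stand-ins:

    JavaRDD<String> lines = sc.textFile("hdfs://...");  // any RDD you already have
    JavaRDD<String> result = lines.map(new MyMapper());  // lazy: builds lineage only
    result.count();  // any action; this is what actually runs MyMapper on the workers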

So you'd have a class *MyMapper extends Function<InType, OutType>*. The
*call()* method of that class is effectively the function that will be
executed by the workers on your RDD's rows.

Within that *MyMapper#call()*, you can access static members and methods of
*MyMapper* itself. You could implement your *runOnce()* there: static state
lives once per JVM, and each executor is one JVM, which is exactly the
once-per-executor behavior you're after.
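
For concreteness, a minimal sketch of that shape. String stands in for your
InType/OutType, and the *AtomicBoolean* guard is one way, not the only way,
to implement *runOnce()*; note that in later Spark releases *Function*
became an interface, so there it would be *implements* rather than *extends*:

    import java.util.concurrent.atomic.AtomicBoolean;
    import org.apache.spark.api.java.function.Function;

    public class MyMapper extends Function<String, String> {

        // Static state exists once per JVM, i.e., once per executor
        // process, and is shared by every task that executor runs.
        private static final AtomicBoolean done = new AtomicBoolean(false);

        // Executes its body at most once per executor, no matter how
        // many rows or tasks end up calling it.
        private static void runOnce() {
            if (done.compareAndSet(false, true)) {
                // one-time per-executor initialization goes here
            }
        }

        @Override
        public String call(String row) throws Exception {
            runOnce();          // cheap no-op after the first row
            return row.trim();  // placeholder for your real per-row logic
        }
    }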



--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen



On Thu, Mar 27, 2014 at 4:20 PM, deenar.toraskar <deenar.toras...@db.com> wrote:

> Christopher
>
> Sorry, I might be missing the obvious, but how do I get my function
> called on all Executors used by the app? I don't want to use RDDs unless
> necessary.
>
> Once I start my shell or app, how do I get
> TaskNonce.getSingleton().doThisOnce() executed on each executor?
>
> @dmpour
> >> rdd.mapPartitions and it would still work as code would only be
> executed once in each VM, but was wondering if there is a more efficient
> way of doing this by using a generated RDD with one partition per executor.
> This remark was misleading; what I meant was that, in conjunction with the
> TaskNonce pattern, my function would be called only once per executor as
> long as the RDD had at least one partition on each executor (see the
> sketch below).
>
> Deenar
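
For reference, the pattern Deenar describes might look like this on the
driver. This is only a sketch: *sc* is the JavaSparkContext, *TaskNonce* is
the per-JVM guard class discussed earlier in the thread, and *numExecutors*
and the partition sizing are assumptions you'd adapt to your cluster:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.spark.api.java.function.Function;

    // Seed a throwaway RDD with comfortably more partitions than the
    // cluster has executor cores, so that in practice every executor
    // receives at least one task. (Spark does not strictly guarantee
    // the placement, which is the caveat raised above.)
    int numTasks = 4 * numExecutors;  // numExecutors: hypothetical sizing
    List<Integer> seed = new ArrayList<Integer>();
    for (int i = 0; i < numTasks; i++) seed.add(i);

    sc.parallelize(seed, numTasks).map(new Function<Integer, Integer>() {
        @Override
        public Integer call(Integer i) {
            // No-op after the first invocation within each executor JVM.
            TaskNonce.getSingleton().doThisOnce();
            return i;
        }
    }).count();  // count() is just an action to force the tasks to run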
