Re: Running a task once on each executor

2014-08-11 Thread RodrigoB
Hi Christopher, I also need a single function call at the node level. Your suggestion makes sense as a solution to the requirement, but it still feels like a workaround: this check will get called on every row... Also, having static members and methods created specially on a multi-t

Re: Running a task once on each executor

2014-03-28 Thread dmpour23
Is it possible to do this: JavaRDD parttionedRdds = input.map(new Split()).sortByKey().partitionBy(new HashPartitioner(k)).values(); parttionedRdds.saveAsTextFile(args[2]); //Then run my SingletonFunction (My app depends on the saved Files) parttionedRdds.map(new SingletonFunc()); The partti

Re: Running a task once on each executor

2014-03-27 Thread Christopher Nguyen
Deenar, yes, you may indeed be overthinking it a bit, about how Spark executes maps/filters etc. I'll focus on the high-order bits so it's clear. Let's assume you're doing this in Java. Then you'd pass some *MyMapper* instance to *JavaRDD#map(myMapper)*. So you'd have a class *MyMapper extends Fu
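A minimal sketch of the shape described above, with a once-per-JVM guard placed inside the mapper. To keep it runnable without Spark, it uses `java.util.function.Function` as a stand-in for Spark's `org.apache.spark.api.java.function.Function` (whose method is `call` rather than `apply`); `RowMapper`, `initOnce` and the string-length "work" are hypothetical. It uses the initialization-on-demand holder idiom rather than an explicit flag: after the first row, the per-row cost is a plain static field read, which speaks to the "called on every row" concern raised elsewhere in the thread.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;

// Stand-in for a Spark mapper: java.util.function.Function replaces Spark's
// Function interface here so the sketch runs without Spark on the classpath.
class RowMapper implements Function<String, Integer> {
    // Counts how often the one-time setup actually ran (for demonstration only).
    static final AtomicInteger initCount = new AtomicInteger();

    // Initialization-on-demand holder: the JVM runs Holder's static initializer
    // once per class loader, on first access, and makes other threads wait for it.
    private static final class Holder {
        static final boolean READY = initOnce();
    }

    // Hypothetical one-time setup (e.g. loading a native library).
    private static boolean initOnce() {
        initCount.incrementAndGet();
        return true;
    }

    @Override
    public Integer apply(String row) {
        // The first row triggers Holder's initialization; later rows just read a field.
        if (!Holder.READY) {
            throw new IllegalStateException("one-time setup failed");
        }
        return row.length(); // hypothetical per-row work
    }

    public static void main(String[] args) {
        List<Integer> lengths = List.of("a", "bb", "ccc").stream()
                .map(new RowMapper())
                .collect(Collectors.toList());
        System.out.println(lengths + ", setup ran " + initCount.get() + " time(s)");
    }
}
```

Because `READY` is initialized from a method call, it is not a compile-time constant, so reading it really does trigger `Holder`'s initialization. A literal `= true` would be inlined by the compiler and would never run the setup.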

Re: Running a task once on each executor

2014-03-27 Thread deenar.toraskar
Christopher, sorry, I might be missing the obvious, but how do I get my function called on all executors used by the app? I don't want to use RDDs unless necessary. Once I start my shell or app, how do I get TaskNonce.getSingleton().doThisOnce() executed on each executor? @dmpour >>rdd.mapPartitio

Re: Running a task once on each executor

2014-03-27 Thread Christopher Nguyen
Deenar, dmpour is correct in that there's a many-to-many mapping between executors and partitions (an executor can be assigned multiple partitions, and a given partition can in principle move to a different executor). I'm not sure why you seem to require this problem statement to be solved with RDDs.

Re: Running a task once on each executor

2014-03-27 Thread dmpour23
How exactly is rdd.mapPartitions executed once in each VM? I am running mapPartitions and the call function does not seem to execute the code. JavaPairRDD twos = input.map(new Split()).sortByKey().partitionBy(new HashPartitioner(k)); twos.values().saveAsTextFile(args[2]); JavaRDD ls = twos.va
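The replies are truncated in this index, so this is not necessarily what the thread concluded, but one likely cause of "the call function does not seem to execute" is laziness: in Spark, `map` and `mapPartitions` are transformations, and nothing runs until an action such as `count()`, `collect()` or `foreachPartition()` is invoked. The same applies to the unused result of `parttionedRdds.map(new SingletonFunc())` in the 2014-03-28 post. The sketch below is an analogy only and does not use Spark: Java stream intermediate operations are likewise lazy until a terminal operation runs. `LazinessDemo` and its methods are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only (no Spark): intermediate stream operations are lazy, much like
// RDD transformations such as map and mapPartitions before an action is called.
class LazinessDemo {
    static final AtomicInteger calls = new AtomicInteger();

    // Builds a pipeline but never consumes it, so the mapping lambda never runs.
    static int callsWithoutTerminalOp() {
        calls.set(0);
        Stream<Integer> doubled = Stream.of(1, 2, 3).map(x -> {
            calls.incrementAndGet();
            return x * 2;
        });
        return calls.get();
    }

    // collect() is a terminal operation (comparable to an RDD action),
    // so the mapping lambda runs once per element.
    static int callsWithTerminalOp() {
        calls.set(0);
        List<Integer> doubled = Stream.of(1, 2, 3).map(x -> {
            calls.incrementAndGet();
            return x * 2;
        }).collect(Collectors.toList());
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("without terminal op: " + callsWithoutTerminalOp());
        System.out.println("with terminal op:    " + callsWithTerminalOp());
    }
}
```

`collect()` is used rather than `count()` on purpose: since Java 9, `Stream.count()` may skip executing the pipeline when the size is known from the source. On an RDD, by contrast, `count()` does evaluate every partition.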

Re: Running a task once on each executor

2014-03-27 Thread deenar.toraskar
Hi Christopher >>which you would invoke as TaskNonce.getSingleton().doThisOnce() from within the map closure. Say I have a cluster with 24 workers (one thread per worker, set via SPARK_WORKER_CORES). My application would have 24 executors, each with its own VM. The RDDs I process have millions of rows and

Re: Running a task once on each executor

2014-03-25 Thread Christopher Nguyen
Deenar, the singleton pattern I'm suggesting would look something like this: public class TaskNonce { private transient boolean mIsAlreadyDone; private static transient TaskNonce mSingleton = new TaskNonce(); private transient Object mSyncObject = new Object(); public TaskNonce getSing
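The snippet above is cut off in the archive. A complete, minimal version of the same pattern might look like the following. It keeps the post's field names, declares `getSingleton()` as `static` because it is called as `TaskNonce.getSingleton()` elsewhere in the thread, and guards the flag with `synchronized` on `mSyncObject`. The body of `doThisOnce()` and the `mRunCount` counter are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Once-per-JVM guard: one static instance per class loader, i.e. per executor JVM.
class TaskNonce {
    private static final TaskNonce mSingleton = new TaskNonce();

    private final Object mSyncObject = new Object();
    private boolean mIsAlreadyDone; // guarded by mSyncObject

    // Hypothetical counter so the effect can be observed.
    final AtomicInteger mRunCount = new AtomicInteger();

    private TaskNonce() {}

    public static TaskNonce getSingleton() {
        return mSingleton;
    }

    public void doThisOnce() {
        synchronized (mSyncObject) {
            if (mIsAlreadyDone) {
                return;
            }
            // Hypothetical one-time work goes here; concurrent callers block until it finishes.
            mRunCount.incrementAndGet();
            mIsAlreadyDone = true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Thread t = new Thread(() -> TaskNonce.getSingleton().doThisOnce());
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("ran " + getSingleton().mRunCount.get() + " time(s)");
    }
}
```

The `transient` modifiers in the original snippet are dropped here: static fields are never serialized anyway, and the singleton itself is created in each JVM when the class loads rather than shipped with the closure.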

Re: Running a task once on each executor

2014-03-25 Thread deenar.toraskar
Christopher, it is once per JVM. TaskNonce would meet my needs. I guess if I wanted it once per thread, a ThreadLocal would do the same. But how do I invoke TaskNonce? What is the best way to generate an RDD that has one element per executor? Deenar
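The ThreadLocal variant mentioned above can be sketched as follows: each thread reads its own flag, so the guarded work runs once per thread rather than once per JVM. `PerThreadNonce`, `doThisOncePerThread` and the `runs` counter are hypothetical names. One caveat, stated as an assumption about executor internals: if an executor reuses its worker threads across tasks, this per-thread state would also carry over between tasks that land on the same thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Once-per-thread variant: each thread gets its own flag via ThreadLocal.
class PerThreadNonce {
    // Counts total executions across all threads (for demonstration).
    static final AtomicInteger runs = new AtomicInteger();

    // withInitial supplies Boolean.FALSE the first time each thread reads the flag.
    private static final ThreadLocal<Boolean> done = ThreadLocal.withInitial(() -> Boolean.FALSE);

    static void doThisOncePerThread() {
        if (!done.get()) {
            // Hypothetical per-thread setup goes here.
            runs.incrementAndGet();
            done.set(Boolean.TRUE);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Each thread calls twice, but the setup runs only on its first call.
        Runnable task = () -> {
            doThisOncePerThread();
            doThisOncePerThread();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("runs = " + runs.get()); // prints "runs = 2": once in each thread
    }
}
```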

Re: Running a task once on each executor

2014-03-25 Thread Christopher Nguyen
Deenar, when you say "just once", have you defined "across multiple " (e.g., across multiple threads in the same JVM on the same machine)? In principle you can have multiple executors on the same machine. In any case, assuming it's the same JVM, have you considered using a singleton that maintains