Hi Christopher,
I also need a single function call at the node level.
Your suggestion makes sense as a solution to the requirement, but it still
feels like a workaround: this check will get called on every row... Also,
having static members and methods created specially on a multi-t
Is it possible to do this:
JavaRDD partitionedRdds = input.map(new Split())
    .sortByKey().partitionBy(new HashPartitioner(k)).values();
partitionedRdds.saveAsTextFile(args[2]);
// Then run my SingletonFunction (my app depends on the saved files)
partitionedRdds.map(new SingletonFunc());
The partti
Deenar, yes, you may be overthinking how Spark executes maps, filters, etc.
I'll focus on the high-order bits so it's clear.
Let's assume you're doing this in Java. Then you'd pass some *MyMapper*
instance to *JavaRDD#map(myMapper)*.
So you'd have a class *MyMapper extends Fu
Christopher
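To make the high-order point concrete: the mapper instance passed to map is serialized on the driver and deserialized on the executors, so instance fields live in each shipped copy while static fields exist once per JVM. The sketch below models this with plain Java serialization rather than Spark itself. The class name MyMapper follows the thread, but its fields, the ship() helper and the use of java.util.function.Function are assumptions for illustration (Spark's Java API would use org.apache.spark.api.java.function.Function and its call() method instead).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a Spark mapper (in Spark's Java API this would
// implement org.apache.spark.api.java.function.Function and override call()).
public class MyMapper implements Function<String, Integer>, Serializable {
    private static final long serialVersionUID = 1L;

    // Static fields are not serialized: one copy per JVM, shared by every copy in it.
    static final AtomicInteger staticCalls = new AtomicInteger();

    // Instance fields travel with each serialized copy of the function.
    int instanceCalls = 0;

    @Override
    public Integer apply(String row) {
        staticCalls.incrementAndGet();
        instanceCalls++;
        return row.length();
    }

    // Models shipping the function to a task: serialize, then deserialize a fresh copy.
    static MyMapper ship(MyMapper m) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(m);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (MyMapper) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        MyMapper driverCopy = new MyMapper();
        MyMapper task1 = ship(driverCopy);
        MyMapper task2 = ship(driverCopy);
        task1.apply("a");
        task1.apply("bb");
        task2.apply("ccc");
        // Instance counters are per copy; the static counter is shared within this JVM.
        System.out.println(task1.instanceCalls + " " + task2.instanceCalls + " " + staticCalls.get());
        // prints "2 1 3"
    }
}
```

This is why a static (per-JVM) flag, rather than an instance field of the mapper, is the natural home for a once-per-executor guard.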
Sorry, I might be missing the obvious, but how do I get my function called on
all executors used by the app? I don't want to use RDDs unless necessary.
Once I start my shell or app, how do I get
TaskNonce.getSingleton().doThisOnce() executed on each executor?
@dmpour
>>rdd.mapPartitio
Deenar, dmpour is correct in that there's a many-to-many mapping between
executors and partitions (an executor can be assigned multiple partitions,
and a given partition can in principle move to a different executor).
I'm not sure why you seem to require this problem statement to be solved
with RDDs.
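The many-to-many point can be illustrated with a toy model (plain Java, not Spark's scheduler). If partitions land on executors in a way the application does not control, a hook inside mapPartitions fires once per partition, while a per-JVM guard such as TaskNonce fires at most once per executor that actually receives work; an executor that gets no partition is never reached. PartitionModel, simulate() and the random assignment are assumptions for illustration only; the real scheduler decides placement from locality and free slots, not a random draw.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Toy model, not Spark: partitions are assigned to executors by a random draw,
// standing in for scheduling decisions the application does not control.
public class PartitionModel {

    // Returns {per-partition hook calls, executors whose guarded init ran}.
    static int[] simulate(int partitions, int executors, long seed) {
        Random rnd = new Random(seed);
        // One entry per executor JVM whose static once-flag has been set.
        Set<Integer> initialisedExecutors = new HashSet<>();
        int perPartitionCalls = 0;
        for (int p = 0; p < partitions; p++) {
            int executor = rnd.nextInt(executors);
            perPartitionCalls++;                // code in mapPartitions runs once per partition
            initialisedExecutors.add(executor); // a TaskNonce-style guard runs once per JVM
        }
        return new int[] {perPartitionCalls, initialisedExecutors.size()};
    }

    public static void main(String[] args) {
        for (int partitions : new int[] {24, 48}) {
            int[] r = simulate(partitions, 24, 42L);
            System.out.println(partitions + " partitions: hook calls = " + r[0]
                + ", executors initialised = " + r[1] + " of 24");
        }
    }
}
```

Under this model the hook count always equals the partition count, whereas the number of initialised executors can fall short of 24 even when there are 24 partitions.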
How exactly would rdd.mapPartitions be executed once in each VM?
I am running mapPartitions and the call function does not seem to execute
the code.
JavaPairRDD twos = input.map(new Split())
    .sortByKey().partitionBy(new HashPartitioner(k));
twos.values().saveAsTextFile(args[2]);
JavaRDD ls = twos.va
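One likely reason the code inside mapPartitions never seems to run, assuming no action follows it in the cut-off snippet, is that Spark transformations such as map and mapPartitions are lazy: they only describe a new RDD, and the function executes when an action such as count(), collect() or foreachPartition(...) is invoked. Because Spark is not available in a self-contained snippet, the sketch below uses java.util.stream, which has the same lazy-intermediate / eager-terminal split, purely as an analogy; LazyDemo and countCalls() are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only (java.util.stream, not Spark): intermediate operations are lazy,
// much like RDD transformations; a terminal operation plays the role of an action.
public class LazyDemo {

    // Returns {calls after building the pipeline, calls after the terminal op, result size}.
    static int[] countCalls() {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> pipeline = Stream.of(1, 2, 3).map(x -> {
            calls.incrementAndGet(); // side effect that only happens once data is pulled
            return x * 2;
        });
        int beforeAction = calls.get(); // nothing has run yet, like rdd.mapPartitions(f) alone
        List<Integer> result = pipeline.collect(Collectors.toList()); // like calling an action
        int afterAction = calls.get();
        return new int[] {beforeAction, afterAction, result.size()};
    }

    public static void main(String[] args) {
        int[] c = countCalls();
        System.out.println("before action: " + c[0] + ", after action: " + c[1]);
        // prints "before action: 0, after action: 3"
    }
}
```

In Spark terms, following the mapPartitions call with an action (for example .count()) would force it to run, though the partition-to-executor mapping still decides which JVMs it runs in.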
Hi Christopher
>>which you would invoke as TaskNonce.getSingleton().doThisOnce() from
within the map closure.
Say I have a cluster with 24 workers (one thread per worker, set via
SPARK_WORKER_CORES). My application would have 24 executors, each with its
own VM.
The RDDs I process have millions of rows and
Deenar, the singleton pattern I'm suggesting would look something like this:
public class TaskNonce {
private transient boolean mIsAlreadyDone;
private static transient TaskNonce mSingleton = new TaskNonce();
private transient Object mSyncObject = new Object();
public TaskNonce getSing
Christopher
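A complete, compilable version of the singleton pattern sketched above might look like the following. It is an illustration under assumptions, not necessarily what Christopher intended: getSingleton() is made static so that TaskNonce.getSingleton().doThisOnce() compiles, the guard uses double-checked locking on a volatile flag, and doOnceWork() and the runs counter are hypothetical placeholders for the real once-per-JVM work.

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class TaskNonce {
    // One instance per JVM (per executor process); static fields are never serialized.
    private static final TaskNonce mSingleton = new TaskNonce();

    // volatile so the cheap per-row check sees the write made under the lock.
    private volatile boolean mIsAlreadyDone = false;
    private final Object mSyncObject = new Object();

    // Hypothetical counter, only here to show that the work runs once.
    static final AtomicInteger runs = new AtomicInteger();

    private TaskNonce() {}

    public static TaskNonce getSingleton() {
        return mSingleton;
    }

    public void doThisOnce() {
        if (mIsAlreadyDone) {
            return; // fast path taken on every row after the first
        }
        synchronized (mSyncObject) {
            if (!mIsAlreadyDone) { // re-check: another thread may have won the race
                doOnceWork();
                mIsAlreadyDone = true;
            }
        }
    }

    // Hypothetical placeholder for the real one-time, per-JVM initialisation.
    private void doOnceWork() {
        runs.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        // Eight threads stand in for concurrent tasks inside one executor JVM.
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int row = 0; row < 1000; row++) {
                    TaskNonce.getSingleton().doThisOnce();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(runs.get()); // prints 1
    }
}
```

After the first call, the per-row cost is a single volatile read, which addresses the concern that the check runs on every row.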
It is once per JVM. TaskNonce would meet my needs. I guess if I want it once
per thread, then a ThreadLocal would do the same.
But how do I invoke TaskNonce? What is the best way to generate an RDD that
ensures there is one element per executor?
Deenar
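For the once-per-thread variant mentioned above, a ThreadLocal flag avoids the lock entirely, because each thread only ever sees its own value. A minimal sketch follows; PerThreadNonce, doThisOncePerThread() and the runs counter are hypothetical names. One hedged caveat: Spark executors typically run tasks on a pool of reused threads, so "once per thread" is not the same as "once per task".

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class PerThreadNonce {
    // Hypothetical counter of how many times the per-thread work ran.
    static final AtomicInteger runs = new AtomicInteger();

    // Each thread gets its own flag, lazily initialised to FALSE on first access.
    private static final ThreadLocal<Boolean> DONE = ThreadLocal.withInitial(() -> Boolean.FALSE);

    private PerThreadNonce() {}

    public static void doThisOncePerThread() {
        if (!DONE.get()) {
            runs.incrementAndGet(); // hypothetical per-thread setup work
            DONE.set(Boolean.TRUE); // no lock needed: only this thread reads this flag
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int row = 0; row < 100; row++) {
                    doThisOncePerThread();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(runs.get()); // prints 4
    }
}
```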
Deenar, when you say "just once", have you defined "across multiple "
(e.g., across multiple threads in the same JVM on the same machine)? In
principle you can have multiple executors on the same machine.
In any case, assuming it's the same JVM, have you considered using a
singleton that maintains