Re: Accumulators not available in Task.taskMetrics

2017-10-05 Thread Tarun Kumar
Any response on this one? Thanks in advance! On Thu, Oct 5, 2017 at 1:44 PM, Tarun Kumar wrote: > Hi, I registered an accumulator in the driver via sparkContext.register(myCustomAccumulator, "accumulator-name"), but this accumulator is not available in the task.metrics.accumulators() list. Acc…
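For reference, a minimal sketch of the registration step the poster describes, in the Spark 2.x AccumulatorV2 API (the accumulator name is the poster's; the RDD is illustrative):

    import org.apache.spark.util.LongAccumulator

    // Create the accumulator on the driver and register it under a name.
    val myCustomAccumulator = new LongAccumulator
    sc.register(myCustomAccumulator, "accumulator-name")

    // Task-side updates are merged back on the driver.
    sc.parallelize(1 to 100).foreach(_ => myCustomAccumulator.add(1))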

Re: Accumulators and Datasets

2017-01-18 Thread Sean Owen
Accumulators aren't related directly to RDDs or Datasets. They're a separate construct. You can imagine updating accumulators in any distributed operation that you see documented for RDDs or Datasets. On Wed, Jan 18, 2017 at 2:16 PM Hanna Mäki wrote: > Hi, The documentation (http://spark.a…
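A minimal sketch of that point, assuming a Spark 2.x session named spark: the same accumulator can be updated from a Dataset action and an RDD action alike.

    // One accumulator, updated from both a Dataset and an RDD operation.
    val rows = spark.sparkContext.longAccumulator("rows-seen")
    spark.range(100).foreach(_ => rows.add(1))                          // Dataset
    spark.sparkContext.parallelize(1 to 100).foreach(_ => rows.add(1))  // RDD
    println(rows.value)  // 200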

Re: Accumulators displayed in SparkUI in 1.4.1?

2016-05-25 Thread Jacek Laskowski
On 25 May 2016 6:00 p.m., "Daniel Barclay" wrote: > Was the feature of displaying accumulators in the Spark UI implemented in Spark 1.4.1, or was that added later? Dunno, but only *named* accumulators are displayed in Spark’s webUI (under the Stages tab for a given stage). Jacek

Re: Accumulators internals and reliability

2015-10-26 Thread Adrian Tanase
I can reply from a user’s perspective – I defer on semantic guarantees to someone with more experience. I’ve successfully implemented the following using a custom Accumulable class: * Created a MapAccumulator with dynamic keys (they are driven by the data coming in), as opposed to creating…
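A hedged sketch of such a map accumulator with data-driven keys, in the Spark 1.x Accumulable API this thread is about (all names are illustrative; words stands for an RDD[String]):

    import org.apache.spark.AccumulableParam

    // Merges (key, count) pairs into a map; the keys come from the data itself.
    object MapAccumParam extends AccumulableParam[Map[String, Long], (String, Long)] {
      def zero(initial: Map[String, Long]) = Map.empty[String, Long]
      def addAccumulator(acc: Map[String, Long], kv: (String, Long)) =
        acc + (kv._1 -> (acc.getOrElse(kv._1, 0L) + kv._2))
      def addInPlace(m1: Map[String, Long], m2: Map[String, Long]) =
        m2.foldLeft(m1) { case (m, (k, v)) => m + (k -> (m.getOrElse(k, 0L) + v)) }
    }

    val mapAcc = sc.accumulable(Map.empty[String, Long])(MapAccumParam)
    words.foreach(word => mapAcc += (word -> 1L))  // keys driven by the data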

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-23 Thread Guillaume Pitel
Hi, So I've done this "Node-centered accumulator", I've written a small piece about it: http://blog.guillaume-pitel.fr/2015/06/spark-trick-shot-node-centered-aggregator/ Hope it can help someone. Guillaume 2015-06-18 15:17 GMT+02:00 Guillaume Pitel:

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Eugen Cepoi
BTW I suggest this instead of using thread locals, as I am not sure in which situations Spark will or won't reuse them. For example, if an error happens inside a thread, will Spark then create a new one, or is the error caught inside the thread, preventing it from stopping? So in short, does Spark guarantee th…

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Eugen Cepoi
2015-06-18 15:17 GMT+02:00 Guillaume Pitel: > I was thinking exactly the same. I'm going to try it. It doesn't really matter if I lose an executor, since its sketch will be lost, but then re-executed somewhere else. > I mean that between the action that will update the sketches and the acti…

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Guillaume Pitel
I was thinking exactly the same. I'm going to try it. It doesn't really matter if I lose an executor, since its sketch will be lost, but then re-executed somewhere else. And anyway, it's an approximate data structure, and what matters are ratios, not exact values. I mostly need to take care o…

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Eugen Cepoi
Yeah, that's the problem. There is probably some "perfect" number of partitions that provides the best balance between partition size, memory, and merge overhead. Though it's not an ideal solution :( There could be another way, but very hacky... for example, if you store one sketch in a singleton per j…
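A hedged sketch of that hack, reading the truncated line as "a singleton per JVM": a Scala object is initialized once per executor JVM, so every task on that executor updates one shared sketch (the Sketch class is a stand-in; rdd stands for an RDD[String]):

    // Stand-in for the real sketch; synchronized because tasks update it concurrently.
    class Sketch extends Serializable {
      private val counts = scala.collection.mutable.Map.empty[String, Long]
      def add(x: String): Unit = synchronized { counts(x) = counts.getOrElse(x, 0L) + 1 }
    }

    object NodeSketch {
      // A Scala object lives once per JVM, i.e. once per executor;
      // all tasks running in that executor share this instance.
      lazy val instance = new Sketch
    }

    rdd.foreach(x => NodeSketch.instance.add(x))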

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Guillaume Pitel
Hi, Thank you for this confirmation. Coalescing is what we do now. It creates, however, very big partitions. Guillaume > Hey, I am not 100% sure, but from my understanding accumulators are per partition (so per task, as it's the same) and are sent back to the driver with the task result and merg…

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Eugen Cepoi
Hey, I am not 100% sure, but from my understanding accumulators are per partition (so per task, as it's the same) and are sent back to the driver with the task result and merged. When a task needs to be run n times (multiple RDDs depend on this one, some partition loss later in the chain, etc.) then th…
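A minimal sketch of the re-execution caveat described here: accumulator updates made in a transformation are applied again whenever the stage is recomputed (Spark 1.x API; the counts are illustrative):

    val acc = sc.accumulator(0L, "updates")
    val data = sc.parallelize(1 to 100).map { x => acc += 1; x }  // update in a transformation

    data.count()  // acc.value == 100
    data.count()  // data is not cached, so the map re-runs: acc.value == 200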

Re: Accumulators in Spark Streaming on UI

2015-05-26 Thread Justin Pihony
You need to make sure to name the accumulator. On Tue, May 26, 2015 at 2:23 PM, Snehal Nagmote wrote: > Hello all, I have an accumulator in a Spark Streaming application which counts the number of events received from Kafka. From the documentation, it seems the Spark UI has support to display it…
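In the Spark 1.x API of this thread, the name is just the second argument; only named accumulators show up in the web UI (the name and the stream variable below are illustrative):

    // Named accumulator: visible under the corresponding stages in the UI.
    val events = sc.accumulator(0L, "kafka-events")
    stream.foreachRDD(rdd => rdd.foreach(_ => events += 1))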

Re: Accumulators

2015-01-15 Thread Imran Rashid
Your understanding is basically correct. Each task creates its own local accumulator, and just those results get merged together. However, there are some performance limitations to be aware of. First, you need enough memory on the executors to build up whatever those intermediate results are.

Re: Accumulators

2015-01-14 Thread Corey Nolet
Just noticed an error in my wording. Should be "I'm assuming it's not immediately aggregating on the driver each time I call += on the Accumulator." On Wed, Jan 14, 2015 at 9:19 PM, Corey Nolet wrote: > What are the limitations of using Accumulators to get a union of a bunch of small set…
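A hedged sketch of the set-union pattern being asked about, using the Spark 1.x accumulableCollection helper (the data is illustrative): each task grows its own local set, and the per-task sets are unioned on the driver as tasks finish.

    import scala.collection.mutable

    val seen = sc.accumulableCollection(mutable.HashSet.empty[String])
    sc.parallelize(Seq("a", "b", "a", "c")).foreach(x => seen += x)
    // On the driver: seen.value == mutable.HashSet("a", "b", "c")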

Re: Accumulators : Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkContext

2014-10-27 Thread Akhil Das
It works fine on my *Spark 1.1.0*. Thanks, Best Regards. On Mon, Oct 27, 2014 at 12:22 AM, octavian.ganea wrote: > Hi Akhil, Please see this related message: http://apache-spark-user-list.1001560.n3.nabble.com/Bug-in-Accumulators-td17263.html I am curious if this works for you also.

Re: Accumulators : Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkContext

2014-10-26 Thread octavian.ganea
Hi Akhil, Please see this related message: http://apache-spark-user-list.1001560.n3.nabble.com/Bug-in-Accumulators-td17263.html I am curious if this works for you also.

Re: Accumulators : Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkContext

2014-10-26 Thread Akhil Das
Just tried the code below and it works for me; not sure why sparkContext is being sent inside the mapPartitions function in your case. Can you try a simple map() instead of mapPartitions?

    val ac = sc.accumulator(0)
    val or = sc.parallelize(1 to 1)
    val ps = or.map(x => (x, x + 2)).map(x => ac +=…
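Since the original snippet is truncated, here is a hedged sketch of the usual cause of this exception: referencing a field from inside a closure captures the enclosing object, and if that object holds the SparkContext, the task fails to serialize (class and method names are hypothetical):

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    class Job(val sc: SparkContext) extends Serializable {
      val ac = sc.accumulator(0)

      // Referencing the field `ac` captures `this`, which drags in `sc`:
      // Task not serializable: java.io.NotSerializableException: ... SparkContext
      def bad(rdd: RDD[Int]) = rdd.map { x => ac += 1; x + 2 }

      // Copy to a local val first, so only the accumulator is captured.
      def good(rdd: RDD[Int]) = {
        val localAc = ac
        rdd.map { x => localAc += 1; x + 2 }
      }
    }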