Accumulators and other important metrics for your job

2021-05-27 Thread Hamish Whittal
ould do: df.count() but this seems clunky (and expensive) for something that should be easy to keep track of. I then thought accumulators might be the solution, but it seems that I would have to do a second pass through the data at least to "addInPlace" to the lines total. I might as wel
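A minimal sketch of the single-pass idea raised in this thread, assuming a DataFrame of text lines and illustrative names (spark, input.txt, output): a LongAccumulator is bumped inside the main transformation, so the total falls out of the same action instead of a separate df.count().

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("count-in-passing").getOrCreate()
val linesSeen = spark.sparkContext.longAccumulator("linesSeen")

val df = spark.read.text("input.txt")              // hypothetical input path
val lengths = df.rdd.map { row =>
  linesSeen.add(1)                                 // updated on the executors during the action
  row.getString(0).length                          // stand-in for the real per-line work
}
lengths.saveAsTextFile("output")                   // the action that drives the count
println(s"lines processed: ${linesSeen.value}")    // read on the driver afterwards
```

The usual caveat from the threads below applies: transformation-side updates can be double-counted if a stage is re-executed, so the value is only approximate unless the update happens inside an action.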

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Something Something
-z On Fri, 29 May 2020 11:16:12 -0700 Something Something wrote: Did you try this on the Cluster? Note: This works just fine under

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
the Cluster? Note: This works just fine under 'Local' mode. On Thu, May 28, 2020 at 9:12 PM ZHANG Wei wrote:

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Something Something
reams.addListener(new StreamingQueryListener { override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = { println(event.progress.id + " is on progress")

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Something Something
event.progress.id + " is on progress") println(s"My accu is ${myAcc.value} on query progress") } ... })

Re: Using Spark Accumulators with Structured Streaming

2020-06-08 Thread Srinivas V
key: $key => state: ${state}") ... } val wordCounts = words .groupByKey(v => ...)

Re: Using Spark Accumulators with Structured Streaming

2020-06-07 Thread Something Something
..) .mapGroupsWithState(timeoutConf = GroupStateTimeout.ProcessingTimeTimeout)(func = mappingFunc) val query = wordCounts.writeStream .outputMode(OutputMode.Update)

Re: Using Spark Accumulators with Structured Streaming

2020-06-03 Thread ZHANG Wei
ut.ProcessingTimeTimeout)(func = mappingFunc) val query = wordCounts.writeStream .outputMode(OutputMode.Update) ... ``` I'm wondering if any errors can be found in the driver

Re: Using Spark Accumulators with Structured Streaming

2020-06-01 Thread ZHANG Wei
r logs? The micro-batch exceptions won't terminate the streaming job running. For the following code, we have to make sure that `StateUpdateTask` is started: .mapGroupsWith

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
elUpdate call(String productId, Iterator eventsIterator, GroupState state) { } } On Fri, May 29, 2020 at 1:08 PM Srinivas V wrote: Yes, accumulators are updated in the call method of StateUpdateTask. Like when state times out or when the data

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Something Something
Update*> {. *--> I was expecting to see 'accumulator' here in the definition.* @Override public ModelUpdate call(String productId, Iterator eventsIterator, GroupState state) { } } On Fri, May 29, 2020 at 1:08 PM Srinivas V wrote: > > Yes, accumulators are update

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
Yes, accumulators are updated in the call method of StateUpdateTask. Like when state times out or when the data is pushed to next Kafka topic etc. On Fri, May 29, 2020 at 11:55 PM Something Something < mailinglist...@gmail.com> wrote: > Thanks! I will take a look at the link. Just one

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Something Something
Thanks! I will take a look at the link. Just one question, you seem to be passing 'accumulators' in the constructor but where do you use it in the StateUpdateTask class? I am still missing that connection. Sorry, if my question is dumb. I must be missing something. Thanks for your h

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Something Something
gs? The micro-batch exceptions won't terminate the streaming job running. For the following code, we have to make sure that `StateUpdateTask` is started: .mapGroupsWithState( new

Re: Using Spark Accumulators with Structured Streaming

2020-05-29 Thread Srinivas V
-> event.getId(), Encoders.STRING()) .mapGroupsWithState( new StateUpdateTask(Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT), appConfig, accumulators),

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread ZHANG Wei
onfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT), > appConfig, accumulators), > Encoders.bean(ModelStateInfo.class), > Encoders.bean(ModelUpdate.class), > GroupStateTimeout.ProcessingTimeTimeout()); -- Cheers, -

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Something Something
event.getId(), Encoders.STRING()) .mapGroupsWithState( new StateUpdateTask(Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT), appConfig, accumulators), Encoders.bean(ModelStateInfo.class),

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
Giving the code below: //accumulators is a class level variable in driver. sparkSession.streams().addListener(new StreamingQueryListener() { @Override public void onQueryStarted(QueryStartedEvent queryStarted) { logger.info("Query st

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread ZHANG Wei
new > StateUpdateTask(Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT), > appConfig, accumulators), > Encoders.bean(ModelStateInfo.class), > Encoders.bean(ModelUpdate.class), >

Re: Using Spark Accumulators with Structured Streaming

2020-05-28 Thread Srinivas V
ate( new StateUpdateTask(Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT), appConfig, accumulators), Encoders.bean(ModelStateInfo.class), Encoders.bean(ModelUpdate.cl

Re: Using Spark Accumulators with Structured Streaming

2020-05-27 Thread Something Something
'updateAcrossEvents'? We're experiencing this only under 'Stateful Structured Streaming'. In other streaming applications it works as expected. On Wed, May 27, 2020 at 9:01 AM Srinivas V wrote: > Yes, I am talking about Application specific Accumulators. Actually I

Re: Using Spark Accumulators with Structured Streaming

2020-05-27 Thread Srinivas V
Yes, I am talking about Application specific Accumulators. Actually I am getting the values printed in my driver log as well as sent to Grafana. Not sure where and when I saw 0 before. My deploy mode is “client” on a yarn cluster(not local Mac) where I submit from master node. It should work the

Re: Using Spark Accumulators with Structured Streaming

2020-05-26 Thread Something Something
Hmm... how would they go to Grafana if they are not getting computed in your code? I am talking about the Application Specific Accumulators. The other standard counters such as 'event.progress.inputRowsPerSecond' are getting populated correctly! On Mon, May 25, 2020 at 8:39 PM Sriniva

Re: Using Spark Accumulators with Structured Streaming

2020-05-25 Thread Srinivas V
[3] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.LongAccumulator From: Something Something Sent: Saturday, May 16, 2020 0:38 To: spark-user Subject: Re: Using Spark Accumulators with Structured Streami

Re: Using Spark Accumulators with Structured Streaming

2020-05-25 Thread Something Something
__ > From: Something Something > Sent: Saturday, May 16, 2020 0:38 > To: spark-user > Subject: Re: Using Spark Accumulators with Structured Streaming > > Can someone from Spark Development team tell me if this functionality is > supported and tested? I've spent a l

Re: Using Spark Accumulators with Structured Streaming

2020-05-15 Thread ZHANG Wei
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.LongAccumulator From: Something Something Sent: Saturday, May 16, 2020 0:38 To: spark-user Subject: Re: Using Spark Accumulators with Structured Streaming Can someone from Spark Develo

Re: Using Spark Accumulators with Structured Streaming

2020-05-15 Thread Something Something
of one or more accumulators. Here's the definition: class CollectionLongAccumulator[T] extends AccumulatorV2[T, java.util.Map[T, Long]] When the job begins we register an instance of this class: spark.sparkContext.register(myAccumulator, "MyAccumulator") Is this working under Stru

Using Spark Accumulators with Structured Streaming

2020-05-14 Thread Something Something
In my structured streaming job I am updating Spark Accumulators in the updateAcrossEvents method but they are always 0 when I try to print them in my StreamingListener. Here's the code: .mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout())( updateAcrossEvents )
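A hedged sketch of the pattern being debugged across this thread: a named LongAccumulator updated from the stateful processing and read from a StreamingQueryListener on the driver. The names (spark, myAcc) are placeholders, not the original poster's code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

val spark = SparkSession.builder().getOrCreate()
val myAcc = spark.sparkContext.longAccumulator("MyAccumulator")

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(e: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(e: QueryTerminatedEvent): Unit = ()
  override def onQueryProgress(e: QueryProgressEvent): Unit = {
    // Driver-side read; it only reflects updates already merged back from finished tasks.
    println(s"accumulator after batch ${e.progress.batchId}: ${myAcc.value}")
  }
})
```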

Re: Use of Accumulators

2017-11-14 Thread Kedarnath Dixit
Yes! Thanks! ~Kedar Dixit Bigdata Analytics at Persistent Systems Ltd. From: Holden Karau Sent: 14 November 2017 20:04:50 To: Kedarnath Dixit Cc: user@spark.apache.org Subject: Re: Use of Accumulators And where do you want to read the toggle back from? On

Re: Use of Accumulators

2017-11-14 Thread Holden Karau
Apache Spark User List] [mailto:ml+s1001560n29995...@n3.nabble.com] Sent: Tuesday, November 14, 2017 1:16 PM To: Kedarnath Dixit Subject: Re: Use of Accumulators So you want to set an accumulator to 1 after a transformation has fully

RE: Use of Accumulators

2017-11-14 Thread Kedarnath Dixit
: Holden Karau [via Apache Spark User List] [mailto:ml+s1001560n29995...@n3.nabble.com] Sent: Tuesday, November 14, 2017 1:16 PM To: Kedarnath Dixit [mailto:kedarnath_di...@persistent.com] Subject: Re: Use of Accumulators So you want to set an accumulator to 1 after a transformation has

Re: Use of Accumulators

2017-11-13 Thread Holden Karau
So you want to set an accumulator to 1 after a transformation has fully completed? Or what exactly do you want to do? On Mon, Nov 13, 2017 at 9:47 PM vaquar khan wrote: > Confirmed ,you can use Accumulators :) > > Regards, > Vaquar khan > > On Mon, Nov 13, 2017 at 10:58 A

Re: Use of Accumulators

2017-11-13 Thread vaquar khan
Confirmed ,you can use Accumulators :) Regards, Vaquar khan On Mon, Nov 13, 2017 at 10:58 AM, Kedarnath Dixit < kedarnath_di...@persistent.com> wrote: > Hi, > > > We need some way to toggle the flag of a variable in transformation. > > > We are thinking to make use

Use of Accumulators

2017-11-13 Thread Kedarnath Dixit
Hi, We need some way to toggle the flag of a variable in transformation. We are thinking to make use of spark Accumulators for this purpose. Can we use these as below: Variables -> Initial Value Variable1 -> 0 Variable2 -> 0 In one of the transformations if we nee
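A minimal sketch of the toggle idea under the constraint raised in the replies above (accumulators are effectively write-only on executors): the "flag" is just a counter the driver inspects after an action. It assumes an existing SparkContext named sc and the Spark 2.x accumulator API; data and the condition are illustrative.

```scala
val flag = sc.longAccumulator("conditionSeen")

val data = sc.parallelize(1 to 100)
data.foreach { x =>
  if (x % 17 == 0) flag.add(1)    // record that the condition fired at least once
}
if (flag.value > 0) println("flag was toggled during processing")
```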

Dynamic Accumulators in 2.x?

2017-10-11 Thread David Capwell
path single-threaded and pass around the result when the task completes; which sounds like AccumulatorV2. I started rewriting the instrumented logic to be based off accumulators, but having a hard time getting these to show up in the UI/API (using this to see if I am linking things properly). So my

Re: Accumulators not available in Task.taskMetrics

2017-10-05 Thread Tarun Kumar
First way: AccumulatorContext.lookForAccumulatorByName("accumulator-name"). map(accum => { accum.asInstanceOf[MyCustomAccumulator].add(*k, v*)) }) Second way: taskContext.taskMetrics().accumulators(). filter(_.name ==

Accumulators not available in Task.taskMetrics

2017-10-05 Thread Tarun Kumar
d(*k, v*)) }) Second way: taskContext.taskMetrics().accumulators(). filter(_.name == Some("accumulator-name")). map(accum => { accum.asInstanceOf[MyCustomAccumulator].add(*k, v*)) }) Thanks Tarun

Re: [Spark] Accumulators or count()

2017-03-01 Thread Daniel Siegmann
As you noted, Accumulators do not guarantee accurate results except in specific situations. I recommend never using them. This article goes into some detail on the problems with accumulators: http://imranrashid.com/posts/Spark-Accumulators/ On Wed, Mar 1, 2017 at 7:26 AM, Charles O. Bajomo

[Spark] Accumulators or count()

2017-03-01 Thread Charles O. Bajomo
Hello everyone, I wanted to know if there is any benefit to using an accumulator over just executing a count() on the whole RDD. There seem to be a lot of issues with accumulators during a stage failure, and there also seems to be an issue rebuilding them if the application restarts from a checkpoint

Re: Accumulators and Datasets

2017-01-18 Thread Sean Owen
Accumulators aren't related directly to RDDs or Datasets. They're a separate construct. You can imagine updating accumulators in any distributed operation that you see documented for RDDs or Datasets. On Wed, Jan 18, 2017 at 2:16 PM Hanna Mäki wrote: > Hi, > > The do

Accumulators and Datasets

2017-01-18 Thread Hanna Mäki
Hi, The documentation (http://spark.apache.org/docs/latest/programming-guide.html#accumulators) describes how to use accumulators with RDDs, but I'm wondering if and how I can use accumulators with the Dataset API. BR, Hanna -- View this message in context: http://apache-spark-user
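As the reply above notes, accumulators are a separate construct and can be updated inside Dataset operations just as with RDDs. A small hedged sketch with illustrative names and data:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("ds-acc").getOrCreate()
import spark.implicits._

val nullNames = spark.sparkContext.longAccumulator("nullNames")
val people = Seq(("alice", 30), (null: String, 41)).toDS()

val ages = people.map { case (name, age) =>
  if (name == null) nullNames.add(1)   // executor-side update inside a Dataset map
  age
}
ages.count()                           // an action must run before the value is meaningful
println(s"rows with a null name: ${nullNames.value}")
```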

Re: Few questions on reliability of accumulators value.

2016-12-15 Thread Steve Loughran
On 12 Dec 2016, at 19:57, Daniel Siegmann <dsiegm...@securityscorecard.io> wrote: Accumulators are generally unreliable and should not be used. The answer to (2) and (4) is yes. The answer to (3) is both. Here's a more in-depth explanation: http://imranrashid.com/

Re: Few questions on reliability of accumulators value.

2016-12-13 Thread Sudev A C
Thank you for the clarification. On Tue, Dec 13, 2016 at 1:27 AM Daniel Siegmann < dsiegm...@securityscorecard.io> wrote: > Accumulators are generally unreliable and should not be used. The answer > to (2) and (4) is yes. The answer to (3) is both. > > Here's a more in-de

Re: Few questions on reliability of accumulators value.

2016-12-12 Thread Daniel Siegmann
Accumulators are generally unreliable and should not be used. The answer to (2) and (4) is yes. The answer to (3) is both. Here's a more in-depth explanation: http://imranrashid.com/posts/Spark-Accumulators/ On Sun, Dec 11, 2016 at 11:27 AM, Sudev A C wrote: > Please help. >

Re: Few questions on reliability of accumulators value.

2016-12-11 Thread Sudev A C
Please help. Anyone, any thoughts on the previous mail ? Thanks Sudev On Fri, Dec 9, 2016 at 2:28 PM Sudev A C wrote: > Hi, > > Can anyone please help clarity on how accumulators can be used reliably to > measure error/success/analytical metrics ? > > Given below is use c

Few questions on reliability of accumulators value.

2016-12-09 Thread Sudev A C
Hi, Can anyone please help clarity on how accumulators can be used reliably to measure error/success/analytical metrics ? Given below is use case / code snippet that I have. val amtZero = sc.accumulator(0) > val amtLarge = sc.accumulator(0) > val amtNormal = sc.accumulator(0) > val
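A sketch of the same bucketing metrics with the reliability caveat applied: the updates happen inside foreach (an action), where Spark guarantees each task's update is applied only once, rather than inside a transformation that may be re-run. It assumes an existing SparkContext sc and the Spark 2.x accumulator API; the data and thresholds are illustrative.

```scala
val amtZero   = sc.longAccumulator("amtZero")
val amtLarge  = sc.longAccumulator("amtLarge")
val amtNormal = sc.longAccumulator("amtNormal")

val amounts = sc.parallelize(Seq(0L, 5L, 10000L, 42L))   // stand-in for the real data
amounts.foreach { amt =>
  if (amt == 0L) amtZero.add(1)
  else if (amt > 1000L) amtLarge.add(1)
  else amtNormal.add(1)
}
println(s"zero=${amtZero.value} large=${amtLarge.value} normal=${amtNormal.value}")
```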

Fault-tolerant Accumulators in a DStream-only transformations.

2016-11-29 Thread Amit Sela
Hi all, In order to recover Accumulators (functionally) from a Driver failure, it is recommended to use it within a foreachRDD/transform and use the RDD context with a Singleton wrapping the Accumulator as shown in the examples <https://github.com/apache/spark/blob/branch-1.6/examples/src/m

Fault-tolerant Accumulators in stateful operators.

2016-11-22 Thread Amit Sela
Hi all, To recover (functionally) Accumulators from Driver failure in a streaming application, we wrap them in a "getOrCreate" Singleton as shown here <https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCou
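A minimal version of the getOrCreate singleton referenced above, modeled on the RecoverableNetworkWordCount example linked in the post: the accumulator is lazily re-registered after the driver restarts from a checkpoint. The object and accumulator names are illustrative.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.util.LongAccumulator

object DroppedRecordsCounter {
  @volatile private var instance: LongAccumulator = _

  def getInstance(sc: SparkContext): LongAccumulator = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.longAccumulator("DroppedRecordsCounter")
        }
      }
    }
    instance
  }
}
```

Inside foreachRDD the counter would then be obtained with DroppedRecordsCounter.getInstance(rdd.sparkContext) rather than captured in the checkpointed closure.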

Using accumulators in Local mode for testing

2016-07-11 Thread harelglik
Hi, I am writing an app in Spark ( 1.6.1 ) in which I am using an accumulator. My accumulator is simply counting rows: acc += 1. My test processes 4 files each with 4 rows however the value of the accumulator in the end is not 16 and even worse is inconsistent between runs. Are accumulators not

Re: Accumulators displayed in SparkUI in 1.4.1?

2016-05-25 Thread Jacek Laskowski
On 25 May 2016 6:00 p.m., "Daniel Barclay" wrote: > > Was the feature of displaying accumulators in the Spark UI implemented in Spark 1.4.1, or was that added later? Dunno, but only *named* *accumulators* are displayed in Spark’s webUI (under Stages tab for a given stage). Jacek

Accumulators displayed in SparkUI in 1.4.1?

2016-05-25 Thread Daniel Barclay
Was the feature of displaying accumulators in the Spark UI implemented in Spark 1.4.1, or was that added later? Thanks, Daniel

Too Many Accumulators in my Spark Job

2016-03-12 Thread Harshvardhan Chauhan
Hi, My question is about having a lot of counters in spark to keep track of bad/null values in my rdd its descried in detail in below stackoverflow link http://stackoverflow.com/questions/35953400/too-many-accumulators-in-spark-job Posting to the user group to get more traction. Appreciate your

Re: Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Shixiong(Ryan) Zhu
rg.apache.spark.scheduler.TaskSetManager - Starting task 0.0 in > stage 1.0 (TID 1, localhost, ANY, 2026 bytes) > > INFO : org.apache.spark.executor.Executor - Running task 0.0 in stage 1.0 > (TID 1) > > INFO : org.apache.spark.streaming.kafka.KafkaRDD - Computing topic test11, > part

RE: Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Rachana Srivastava
esult sent to driver INFO : org.apache.spark.scheduler.DAGScheduler - ResultStage 1 (foreachRDD at KafkaURLStreaming.java:90) finished in 0.103 s INFO : org.apache.spark.scheduler.DAGScheduler - Job 1 finished: foreachRDD at KafkaURLStreaming.java:90, took 0.151210 s &&&&&&&

Re: Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Jean-Baptiste Onofré
Hi Rachana, don't you have two messages on the kafka broker ? Regards JB On 01/05/2016 05:14 PM, Rachana Srivastava wrote: I have a very simple two lines program. I am getting input from Kafka and save the input in a file and counting the input received. My code looks like this, when I run t

Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Rachana Srivastava
I have a very simple two lines program. I am getting input from Kafka and save the input in a file and counting the input received. My code looks like this, when I run this code I am getting two accumulator count for each input. HashMap kafkaParams = new HashMap(); kafkaParams.put("metadata.

Re: Spark Streaming - print accumulators value every period as logs

2015-12-25 Thread Ali Gouta
e functions (Scala). I use > accumulators (passed to functions constructors) to count stuff. > > In the batch driver, doing so in the right point of the pipeline, I'm able > to retrieve the accumulator value and print it as log4j log. > > In the streaming driver, doing the sa

Spark Streaming - print accumulators value every period as logs

2015-12-24 Thread Roberto Coluccio
Hello, I have a batch and a streaming driver using same functions (Scala). I use accumulators (passed to functions constructors) to count stuff. In the batch driver, doing so in the right point of the pipeline, I'm able to retrieve the accumulator value and print it as log4j log. I
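One hedged way to get the per-batch log line asked about here: read the accumulator inside foreachRDD, which runs on the driver once per micro-batch after the executor-side updates for that batch have been merged. Names (ssc, lines) are placeholders, and the Spark 2.x accumulator API is used for brevity.

```scala
val processed = ssc.sparkContext.longAccumulator("processedRecords")

lines.foreachRDD { rdd =>
  rdd.foreach(_ => processed.add(1))               // executor-side updates for this batch
  println(s"processed so far: ${processed.value}") // driver-side read, once per batch interval
}
```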

Re: Accumulators internals and reliability

2015-10-26 Thread Adrian Tanase
creating many discrete accumulators * The merge operation is add the values on key conflict * I’m adding K->Vs to this accumulator in a variety of places (maps, flatmaps, transforms and updateStateBy key) * In a foreachRdd at the end of the transformations I’m reading the accumulator

Accumulators internals and reliability

2015-10-26 Thread Sela, Amit
It seems like there is not much literature about Spark's Accumulators so I thought I'd ask here: Do Accumulators reside in a Task? Are they being serialized with the task? Sent back on task completion as part of the ResultTask? Are they reliable? If so, when? Can I rely on ac

Re: NullPointException Help while using accumulators

2015-08-03 Thread Anubhav Agarwal
.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:647) at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:647) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.

Re: NullPointException Help while using accumulators

2015-08-03 Thread Ted Yu
Putting your code in a file I find the following on line 17: stepAcc = new StepAccumulator(); However I don't think that was where the NPE was thrown. Another thing I don't understand was that there were two addAccumulator() calls at the top of stack trace while in your code I don'

Re: NullPointException Help while using accumulators

2015-08-03 Thread Anubhav Agarwal
The code was written in 1.4 but I am compiling it and running it with 1.3. import it.unimi.dsi.fastutil.objects.Object2ObjectOpenHashMap; import org.apache.spark.AccumulableParam; import scala.Tuple4; import thomsonreuters.trailblazer.operation.DriverCalc; import thomsonreuters.trailblazer.operati

Re: NullPointException Help while using accumulators

2015-08-03 Thread Ted Yu
Can you show related code in DriverAccumulator.java ? Which Spark release do you use ? Cheers On Mon, Aug 3, 2015 at 3:13 PM, Anubhav Agarwal wrote: > Hi, > I am trying to modify my code to use HDFS and multiple nodes. The code > works fine when I run it locally in a single machine with a sing

NullPointException Help while using accumulators

2015-08-03 Thread Anubhav Agarwal
Hi, I am trying to modify my code to use HDFS and multiple nodes. The code works fine when I run it locally in a single machine with a single worker. I have been trying to modify it and I get the following error. Any hint would be helpful. java.lang.NullPointerException at thomsonreuters.

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Tathagata Das
ultTask.runTask(ResultTask.scala:61) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Shushant Arora
pache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Tathagata Das
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > > 2.For accumulator variable it says : > 15/07/29 19:23:12 ER

broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Shushant Arora
update accumulators for ResultTask(1, 16) java.util.NoSuchElementException: key not found: 2 at scala.collection.MapLike$class.default(MapLike.scala:228) at scala.collection.AbstractMap.default(Map.scala:58) at scala.collection.mutable.HashMap.apply(HashMap.scala:64

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-23 Thread Guillaume Pitel
owever, very big partitions. Guillaume Hey, I am not 100% sure but from my understanding accumulators are per partition (so per task as its the same) and are sent back to the driver with the task result and merged. When a task needs to be run

Re: Using Accumulators in Streaming

2015-06-22 Thread Michal Čizmazia
I stumbled upon zipWithUniqueId/zipWithIndex. Is this what you are looking for? https://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaRDDLike.html#zipWithUniqueId() On 22 June 2015 at 06:16, Michal Čizmazia wrote: > If I am not mistaken, one way to see the accumulat

Re: Using Accumulators in Streaming

2015-06-22 Thread Michal Čizmazia
If I am not mistaken, one way to see the accumulators is that they are just write-only for the workers and their value can be read by the driver. Therefore they cannot be used for ID generation as you wish. On 22 June 2015 at 04:30, anshu shukla wrote: > But i just want to update rdd ,

Re: Using Accumulators in Streaming

2015-06-22 Thread anshu shukla
e 2015 at 21:32, Will Briggs wrote: It sounds like accumulators are not necessary in Spark Streaming - see this post ( http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html) for more details.

Re: Using Accumulators in Streaming

2015-06-21 Thread Michal Čizmazia
StreamingContext.sparkContext() On 21 June 2015 at 21:32, Will Briggs wrote: > It sounds like accumulators are not necessary in Spark Streaming - see > this post ( > http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html) > for more detai

Re: Using Accumulators in Streaming

2015-06-21 Thread Will Briggs
It sounds like accumulators are not necessary in Spark Streaming - see this post ( http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html) for more details. On June 21, 2015, at 7:31 PM, anshu shukla wrote: In spark Streaming ,Since we are already

Using Accumulators in Streaming

2015-06-21 Thread anshu shukla
In Spark Streaming, since we already have a StreamingContext, which does not allow us to create accumulators, we have to get a SparkContext for initializing the accumulator value. But having two Spark contexts will not solve the problem. Please help!! -- Thanks & Regards, Anshu Shukla
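Per the replies above, no second context is needed: the SparkContext already inside the StreamingContext can be used to create the accumulator. A minimal sketch with illustrative names, shown with the newer longAccumulator call (in the 1.x line of this thread it would be sparkContext.accumulator(0L, "events")):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("accumulators-in-streaming")
val ssc  = new StreamingContext(conf, Seconds(10))
val acc  = ssc.sparkContext.longAccumulator("events")   // reuse the existing SparkContext
```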

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Eugen Cepoi
of a hack but should work. Note that if you lose an executor in between, then that doesn't work anymore; probably you could detect it and recompute the sketches, but it would become overly complicated. 2015-06-18 14:27 GMT+02:0

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Eugen Cepoi
. Coalescing is what we do now. It creates, however, very big partitions. Guillaume Hey, I am not 100% sure but from my understanding accumulators are per partition (so per task as its the same) and are sent back to the driver

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Guillaume Pitel
ation. Coalescing is what we do now. It creates, however, very big partitions. Guillaume Hey, I am not 100% sure but from my understanding accumulators are per partition (so per task as its the same) and are sent back to the driver with the task result and merged. When

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Eugen Cepoi
> Hey, > > I am not 100% sure but from my understanding accumulators are per > partition (so per task as its the same) and are sent back to the driver > with the task result and merged. When a task needs to be run n times > (multiple rdds depend on this one, some partition loss lat

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Guillaume Pitel
Hi, Thank you for this confirmation. Coalescing is what we do now. It creates, however, very big partitions. Guillaume Hey, I am not 100% sure but from my understanding accumulators are per partition (so per task as its the same) and are sent back to the driver with the task result and

Re: Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Eugen Cepoi
Hey, I am not 100% sure but from my understanding accumulators are per partition (so per task as its the same) and are sent back to the driver with the task result and merged. When a task needs to be run n times (multiple rdds depend on this one, some partition loss later in the chain etc) then

Accumulators / Accumulables : thread-local, task-local, executor-local ?

2015-06-18 Thread Guillaume Pitel
Hi, I'm trying to figure out the smartest way to implement a global count-min-sketch on accumulators. For now, we are doing that with RDDs. It works well, but with one sketch per partition, merging takes too long. As you probably know, a count-min sketch is a big mutable array of arra

Re: Accumulators in Spark Streaming on UI

2015-05-26 Thread Justin Pihony
You need to make sure to name the accumulator. On Tue, May 26, 2015 at 2:23 PM, Snehal Nagmote wrote: > Hello all, > > I have accumulator in spark streaming application which counts number of > events received from Kafka. > > From the documentation , It seems Spark UI has support to display it
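A small sketch of the fix suggested here, against the Spark 1.x API used in the question below: give the accumulator a name when creating it, since only named accumulators appear in the web UI (Stages tab). In Spark 2.x+ the equivalent would be sc.longAccumulator("events received from Kafka"). Names (ssc, kafkaStream) are illustrative.

```scala
// Spark 1.x style, matching the thread; the second argument is the display name.
val eventCounter = ssc.sparkContext.accumulator(0L, "events received from Kafka")

kafkaStream.foreachRDD { rdd =>
  rdd.foreach(_ => eventCounter += 1L)
}
```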

Accumulators in Spark Streaming on UI

2015-05-26 Thread Snehal Nagmote
Hello all, I have an accumulator in my Spark Streaming application which counts the number of events received from Kafka. From the documentation, it seems Spark UI has support to display it, but I am unable to see it on the UI. I am using Spark 1.3.1. Do I need to call any method (print) or am I missing

Re: Questions about Accumulators

2015-05-03 Thread Dean Wampler
, the map() operation re-executed (map(x => accumulator += x) re-executed), then the final result of the accumulator will be "2", twice the correct result?

Re: Questions about Accumulators

2015-05-03 Thread xiazhuchang

Re: Questions about Accumulators

2015-05-03 Thread Eugen Cepoi
e than once if the task restarted? And then the final result will be many times the real result?

Re: Questions about Accumulators

2015-05-03 Thread Ignacio Blasco
Given the lazy nature of an RDD if you use an accumulator inside a map() and then you call count and saveAsTextfile over that accumulator will be called twice. IMHO, accumulators are a bit nondeterministic you need to be sure when to read them to avoid unexpected re-executions El 3/5/2015 2:09 p
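A small sketch of the re-execution pitfall described here and one common mitigation, assuming an existing SparkContext sc and the 2.x accumulator API: caching the mapped RDD (or moving the update into a foreach action) keeps the two actions from re-running the map and bumping the accumulator twice.

```scala
val rowsSeen = sc.longAccumulator("rowsSeen")
val mapped = sc.parallelize(1 to 10).map { x => rowsSeen.add(1); x }

mapped.cache()                  // without this, count() and saveAsTextFile() each
mapped.count()                  // recompute the map() and the accumulator ends up
mapped.saveAsTextFile("out")    // at 20 instead of 10 (output path is illustrative)
println(rowsSeen.value)
```

Caching is only a best-effort guard (evicted partitions get recomputed); updates made inside actions are the only ones Spark guarantees to apply exactly once.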

Questions about Accumulators

2015-05-03 Thread xiazhuchang
peration), this operation will be executed more than once if the task restarts? And then the final result will be many times the real result?

Re: Negative Accumulators

2015-01-30 Thread francois . garillot
Sanity-check: would it be possible that `threshold_var` be negative ? — FG On Fri, Jan 30, 2015 at 5:06 PM, Peter Thai wrote: > Hello, > I am seeing negative values for accumulators. Here's my implementation in a > standalone app in Spark 1.1.1rc: > implicit object BigIn

Negative Accumulators

2015-01-30 Thread Peter Thai
Hello, I am seeing negative values for accumulators. Here's my implementation in a standalone app in Spark 1.1.1rc: implicit object BigIntAccumulatorParam extends AccumulatorParam[BigInt] { def addInPlace(t1: Int, t2: BigInt) = BigInt(t1) + t2 def addInPlace(t1: BigInt, t2: B
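For reference, a hedged sketch of a corrected parameter object: AccumulatorParam[BigInt] needs both addInPlace arguments to be BigInt (the posted version mixes Int and BigInt) plus a zero. This is the pre-2.0 API used in the thread; AccumulatorV2 replaces it in Spark 2.x.

```scala
import org.apache.spark.AccumulatorParam

implicit object BigIntAccumulatorParam extends AccumulatorParam[BigInt] {
  def zero(initialValue: BigInt): BigInt = BigInt(0)
  def addInPlace(r1: BigInt, r2: BigInt): BigInt = r1 + r2   // both sides BigInt, no Int overload
}

// Usage sketch, assuming an existing SparkContext sc:
// val accu = sc.accumulator(BigInt(0))
```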

Re: Accumulators

2015-01-15 Thread Imran Rashid
an error in my wording. Should be "I'm assuming it's not immediately aggregating on the driver each time I call the += on the Accumulator." On Wed, Jan 14, 2015 at 9:19 PM, Corey Nolet wrote: What are the limitations of using Accumulators to

Accumulators

2015-01-14 Thread Corey Nolet
What are the limitations of using Accumulators to get a union of a bunch of small sets? Let's say I have an RDD[Map{String,Any} and i want to do: rdd.map(accumulator += Set(_.get("entityType").get)) What implication does this have on performance? I'm assuming it's no
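A hedged sketch of the set-union idea using the newer AccumulatorV2 API (the thread itself predates it): each task accumulates into its own copy, and per-task results are merged on the driver only when tasks complete, so nothing is aggregated eagerly on each add. The class and field names are illustrative.

```scala
import scala.collection.mutable
import org.apache.spark.util.AccumulatorV2

class SetAccumulator[T] extends AccumulatorV2[T, Set[T]] {
  private val items = mutable.Set.empty[T]
  def isZero: Boolean = items.isEmpty
  def copy(): SetAccumulator[T] = { val c = new SetAccumulator[T]; c.items ++= items; c }
  def reset(): Unit = items.clear()
  def add(v: T): Unit = items += v
  def merge(other: AccumulatorV2[T, Set[T]]): Unit = items ++= other.value
  def value: Set[T] = items.toSet
}

// Usage sketch, assuming an existing SparkContext sc and an RDD of maps:
// val entityTypes = new SetAccumulator[String]
// sc.register(entityTypes, "entityTypes")
// rdd.foreach(m => entityTypes.add(m("entityType").toString))
```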

Re: Accumulators

2015-01-14 Thread Corey Nolet
Just noticed an error in my wording. Should be " I'm assuming it's not immediately aggregating on the driver each time I call the += on the Accumulator." On Wed, Jan 14, 2015 at 9:19 PM, Corey Nolet wrote: > What are the limitations of using Accumulators to get a unio

Question on Spark UI/accumulators

2015-01-05 Thread Virgil Palanciuc
Hi, The Spark documentation states that "If accumulators are created with a name, they will be displayed in Spark’s UI" http://spark.apache.org/docs/latest/programming-guide.html#accumulators Where exactly are they shown? I may be dense, but I can't find them on the UI from http:/

Spark Accumulators exposed as Metrics to Graphite

2014-12-30 Thread Łukasz Stefaniak
Hi, Does Spark have a built-in possibility of exposing the current value of an Accumulator [1] using Monitoring and Instrumentation [2]? Unfortunately I couldn't find anything in the sources which could be used. Does it mean the only way to expose the current accumulator value is to implement a new Source which would hook

Re: Understanding disk usage with Accumulators

2014-12-16 Thread Ganelin, Ilya
day, December 16, 2014 at 10:23 AM To: "user@spark.apache.org" <user@spark.apache.org> Subject: Understanding disk usage with Accumulators Hi all – I’m running a long running batch-processing job with Spark through Yarn. I am doing the following Batch Process

Understanding disk usage with Accumulators

2014-12-16 Thread Ganelin, Ilya
Hi all – I’m running a long running batch-processing job with Spark through Yarn. I am doing the following Batch Process val resultsArr = sc.accumulableCollection(mutable.ArrayBuffer[ListenableFuture[Result]]()) InMemoryArray.forEach{ 1) Using a thread pool, generate callable jobs that operate

Re: Negative Accumulators

2014-12-02 Thread Peter Thai
ram) accu: org.apache.spark.Accumulator[scala.math.BigInt] = 0 scala> accu += 100 scala> accu.value res1: scala.math.BigInt = 100
