I could do:
df.count()
but this seems clunky (and expensive) for something that should be easy to
keep track of. I then thought accumulators might be the solution, but it
seems that I would have to do a second pass through the data at least to
"addInPlace" to the lines total. I might as wel
-z

>>>>>> > On Fri, 29 May 2020 11:16:12 -0700
>>>>>> > Something Something wrote:
>>>>>> >
>>>>>> > > Did you try this on the Cluster? Note: This works just fine under
>>>>>> > > 'Local' mode.
>>>>>> > >
>>>>>> > > On Thu, May 28, 2020 at 9:12 PM ZHANG Wei wrote:
>>>>>> > >
> > > spark.streams.addListener(new StreamingQueryListener {
> > >   override def onQueryProgress(event:
> > >       StreamingQueryListener.QueryProgressEvent): Unit = {
> > >     println(event.progress.id + " is on progress")
> > >     println(s"My accu is ${myAcc.value} on query progress")
> > >   }
> > >   ...
> > > })
> > >
> > >   println(s"key: $key => state: ${state}")
> > >   ...
> > > }
> > >
> > > val wordCounts = words
> > >   .groupByKey(v => ...)
> > >   .mapGroupsWithState(timeoutConf =
> > >     GroupStateTimeout.ProcessingTimeTimeout)(func = mappingFunc)
> > >
> > > val query = wordCounts.writeStream
> > >   .outputMode(OutputMode.Update)
> > >   ...
> > > ```
> > >
> > I'm wondering if there are any errors that can be found in the driver
> > logs? The micro-batch exceptions won't terminate the running streaming job.
> >
> > For the following code, we have to make sure that `StateUpdateTask` is
> > started:
> > > .mapGroupsWithState(
> public ModelUpdate call(String productId, Iterator
> eventsIterator, GroupState state) {
> }
> }
>
> On Fri, May 29, 2020 at 1:08 PM Srinivas V wrote:
>
>>
>> Yes, accumulators are updated in the call method of StateUpdateTask. Like
>> when state times out or when the data is pushed to next Kafka topic etc.
ModelUpdate> {   --> I was expecting to
see 'accumulator' here in the definition.
@Override
public ModelUpdate call(String productId, Iterator
eventsIterator, GroupState state) {
}
}
Yes, accumulators are updated in the call method of StateUpdateTask. Like
when state times out or when the data is pushed to next Kafka topic etc.
On Fri, May 29, 2020 at 11:55 PM Something Something <
mailinglist...@gmail.com> wrote:
Thanks! I will take a look at the link. Just one question, you seem to be
passing 'accumulators' in the constructor but where do you use it in the
StateUpdateTask class? I am still missing that connection. Sorry if my
question is dumb. I must be missing something. Thanks for your help.
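To make that connection concrete, here is a Scala sketch of the pattern being discussed; the thread's real StateUpdateTask is Java and is not shown in full, so the state type, field names and timeout handling below are assumptions. The point is simply that the accumulator passed through the constructor is updated inside call(), for example when state times out.

```scala
import org.apache.spark.sql.streaming.GroupState
import org.apache.spark.util.LongAccumulator

// Hypothetical stateful update task: the accumulator arrives via the
// constructor and is bumped inside call() when a state times out.
class StateUpdateTask(stateTimeoutMs: Long, timedOutStates: LongAccumulator)
    extends Serializable {

  def call(productId: String,
           events: Iterator[String],
           state: GroupState[Long]): Long = {
    if (state.hasTimedOut) {
      timedOutStates.add(1)                  // <-- this is where the accumulator is used
      state.remove()
      0L
    } else {
      val count = state.getOption.getOrElse(0L) + events.size
      state.update(count)
      state.setTimeoutDuration(stateTimeoutMs)
      count
    }
  }
}
```

On the Scala side such a function would be wired up roughly as .mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout)(task.call), with the accumulator created from spark.sparkContext before the query starts.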
> For the following code, we have to make sure that `StateUpdateTask` is
> started:
> > event.getId(), Encoders.STRING())
> > .mapGroupsWithState(
> >     new StateUpdateTask(
> >         Long.parseLong(appConfig.getSparkStructuredStreamingConfig().STATE_TIMEOUT),
> >         appConfig, accumulators),
> >     Encoders.bean(ModelStateInfo.class),
> >     Encoders.bean(ModelUpdate.class),
> >     GroupStateTimeout.ProcessingTimeTimeout());
--
Cheers,
-z
Giving the code below:
// accumulators is a class level variable in driver.
sparkSession.streams().addListener(new StreamingQueryListener() {
    @Override
    public void onQueryStarted(QueryStartedEvent queryStarted) {
        logger.info("Query st
'updateAcrossEvents'? We're
experiencing this only under 'Stateful Structured Streaming'. In other
streaming applications it works as expected.
On Wed, May 27, 2020 at 9:01 AM Srinivas V wrote:
Yes, I am talking about Application specific Accumulators. Actually I am
getting the values printed in my driver log as well as sent to Grafana. Not
sure where and when I saw 0 before. My deploy mode is “client” on a YARN
cluster (not my local Mac), where I submit from the master node. It should work the
Hmm... how would they go to Grafana if they are not getting computed in
your code? I am talking about the Application Specific Accumulators. The
other standard counters such as 'event.progress.inputRowsPerSecond' are
getting populated correctly!
On Mon, May 25, 2020 at 8:39 PM Srinivas V wrote:
>> [3]
>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.LongAccumulator
>>
> From: Something Something
> Sent: Saturday, May 16, 2020 0:38
> To: spark-user
> Subject: Re: Using Spark Accumulators with Structured Streaming
>
> Can someone from Spark Development team tell me if this functionality is
> supported and tested? I've spent a lot
of one or more
accumulators. Here's the definition:
class CollectionLongAccumulator[T]
extends AccumulatorV2[T, java.util.Map[T, Long]]
When the job begins we register an instance of this class:
spark.sparkContext.register(myAccumulator, "MyAccumulator")
Is this working under Structured Streaming?
In my structured streaming job I am updating Spark Accumulators in the
updateAcrossEvents method but they are always 0 when I try to print them in
my StreamingListener. Here's the code:
.mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout())(
updateAcrossEvents
)
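For reference, an implementation matching that declared signature could look roughly like the following; the thread never shows the original method bodies, so these are an illustrative guess.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}
import org.apache.spark.util.AccumulatorV2

// Counts how many times each value of type T has been seen.
class CollectionLongAccumulator[T] extends AccumulatorV2[T, JMap[T, Long]] {
  private val map = new JHashMap[T, Long]()

  override def isZero: Boolean = map.isEmpty

  override def copy(): CollectionLongAccumulator[T] = {
    val acc = new CollectionLongAccumulator[T]
    acc.map.putAll(map)
    acc
  }

  override def reset(): Unit = map.clear()

  override def add(v: T): Unit =
    map.put(v, map.getOrDefault(v, 0L) + 1L)

  // Merge per-task maps by summing counts key by key.
  override def merge(other: AccumulatorV2[T, JMap[T, Long]]): Unit = {
    val it = other.value.entrySet().iterator()
    while (it.hasNext) {
      val e = it.next()
      map.put(e.getKey, map.getOrDefault(e.getKey, 0L) + e.getValue)
    }
  }

  override def value: JMap[T, Long] = map
}
```

It is then registered exactly as in the snippet above with spark.sparkContext.register(myAccumulator, "MyAccumulator").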
Yes!
Thanks!
~Kedar Dixit
Bigdata Analytics at Persistent Systems Ltd.
From: Holden Karau
Sent: 14 November 2017 20:04:50
To: Kedarnath Dixit
Cc: user@spark.apache.org
Subject: Re: Use of Accumulators
And where do you want to read the toggle back from?
From: Holden Karau [via Apache Spark User List]
[mailto:ml+s1001560n29995...@n3.nabble.com]
Sent: Tuesday, November 14, 2017 1:16 PM
To: Kedarnath Dixit <kedarnath_di...@persistent.com>
Subject: Re: Use of Accumulators
So you want to set an accumulator to 1 after a transformation has fully
completed? Or what exactly do you want to do?
On Mon, Nov 13, 2017 at 9:47 PM vaquar khan wrote:
Confirmed, you can use Accumulators :)
Regards,
Vaquar khan
On Mon, Nov 13, 2017 at 10:58 AM, Kedarnath Dixit <
kedarnath_di...@persistent.com> wrote:
Hi,
We need some way to toggle the flag of a variable in transformation.
We are thinking to make use of spark Accumulators for this purpose.
Can we use these as below:
Variables -> Initial Value
Variable1 -> 0
Variable2 -> 0
In one of the transformations, if we need
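A minimal sketch of the toggle idea with a LongAccumulator (the variable names and the condition are made up); note the value is effectively write-only inside transformations and can only be read back on the driver after an action has run, which is what the follow-up questions in this thread are getting at.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("toggle-sketch").getOrCreate()
val sc = spark.sparkContext

val variable1 = sc.longAccumulator("Variable1")   // initial value 0

val rdd = sc.parallelize(Seq(1, 5, 12, 3))
val processed = rdd.map { x =>
  if (x > 10) variable1.add(1)    // "toggle" the flag when the condition is met
  x * 2
}

processed.count()                                 // an action forces the map to run
val flagSet = variable1.value > 0                 // readable on the driver only
println(s"Variable1 toggled: $flagSet")
```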
path single-threaded and pass around the result when the task
completes; which sounds like AccumulatorV2.
I started rewriting the instrumented logic to be based off accumulators,
but having a hard time getting these to show up in the UI/API (using this
to see if I am linking things properly).
So my
First way:

AccumulatorContext.lookForAccumulatorByName("accumulator-name").
  map(accum => {
    accum.asInstanceOf[MyCustomAccumulator].add(k, v)
  })

Second way:

taskContext.taskMetrics().accumulators().
  filter(_.name == Some("accumulator-name")).
  map(accum => {
    accum.asInstanceOf[MyCustomAccumulator].add(k, v)
  })

Thanks
Tarun
As you noted, Accumulators do not guarantee accurate results except in
specific situations. I recommend never using them.
This article goes into some detail on the problems with accumulators:
http://imranrashid.com/posts/Spark-Accumulators/
On Wed, Mar 1, 2017 at 7:26 AM, Charles O. Bajomo wrote:
Hello everyone,
I wanted to know if there is any benefit to using an accumulator over just
executing a count() on the whole RDD. There seem to be a lot of issues with
accumulators during a stage failure, and there also seems to be an issue rebuilding
them if the application restarts from a checkpoint
Accumulators aren't related directly to RDDs or Datasets. They're a
separate construct. You can imagine updating accumulators in any
distributed operation that you see documented for RDDs or Datasets.
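For example, a sketch of the same accumulator pattern applied to a Dataset (names and data here are illustrative, not from the thread):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("dataset-accumulator").getOrCreate()
import spark.implicits._

val badRecords = spark.sparkContext.longAccumulator("badRecords")

val ds = Seq("1", "2", "oops", "4").toDS()
val parsed = ds.flatMap { s =>
  try Seq(s.toInt)
  catch { case _: NumberFormatException => badRecords.add(1); Seq.empty[Int] }
}

parsed.count()                           // an action must run first
println(s"bad records: ${badRecords.value}")
```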
On Wed, Jan 18, 2017 at 2:16 PM Hanna Mäki wrote:
Hi,
The documentation
(http://spark.apache.org/docs/latest/programming-guide.html#accumulators)
describes how to use accumulators with RDDs, but I'm wondering if and how I
can use accumulators with the Dataset API.
BR,
Hanna
Thank you for the clarification.
On Tue, Dec 13, 2016 at 1:27 AM Daniel Siegmann <
dsiegm...@securityscorecard.io> wrote:
Accumulators are generally unreliable and should not be used. The answer to
(2) and (4) is yes. The answer to (3) is both.
Here's a more in-depth explanation:
http://imranrashid.com/posts/Spark-Accumulators/
On Sun, Dec 11, 2016 at 11:27 AM, Sudev A C wrote:
Please help.
Anyone, any thoughts on the previous mail?
Thanks
Sudev
On Fri, Dec 9, 2016 at 2:28 PM Sudev A C wrote:
Hi,
Can anyone please help clarify how accumulators can be used reliably to
measure error/success/analytical metrics?
Given below is the use case / code snippet that I have.
val amtZero = sc.accumulator(0)
val amtLarge = sc.accumulator(0)
val amtNormal = sc.accumulator(0)
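A hedged guess at how the snippet might continue; the thresholds and classification are assumptions, not the original code. Updating the counters inside foreach, an action, is the case where accumulator updates are applied exactly once per successful job.

```scala
// Continues the accumulators defined just above (sample data is made up).
val amounts = sc.parallelize(Seq(0.0, 5.0, 1e7))
amounts.foreach { amt =>
  if (amt == 0.0) amtZero += 1
  else if (amt > 1e6) amtLarge += 1
  else amtNormal += 1
}
println((amtZero.value, amtLarge.value, amtNormal.value))
```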
Hi all,
In order to recover Accumulators (functionally) from a Driver failure, it
is recommended to use them within a foreachRDD/transform and use the RDD
context with a Singleton wrapping the Accumulator, as shown in the examples
<https://github.com/apache/spark/blob/branch-1.6/examples/src/m
Hi all,
To recover (functionally) Accumulators from Driver failure in a streaming
application, we wrap them in a "getOrCreate" Singleton as shown here
<https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCou
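A minimal sketch of that "getOrCreate" singleton, modeled on the RecoverableNetworkWordCount example; the object and accumulator names here are mine.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.util.LongAccumulator

// Lazily (re)created from the RDD's SparkContext inside foreachRDD, so it
// also comes back after the driver is restarted from a checkpoint.
object DroppedRecordsCounter {
  @volatile private var instance: LongAccumulator = _

  def getInstance(sc: SparkContext): LongAccumulator = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = sc.longAccumulator("DroppedRecordsCounter")
        }
      }
    }
    instance
  }
}
```

Inside foreachRDD one would call DroppedRecordsCounter.getInstance(rdd.sparkContext) instead of capturing the accumulator in the checkpointed closure.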
Hi,
I am writing an app in Spark (1.6.1) in which I am using an accumulator.
My accumulator is simply counting rows: acc += 1.
My test processes 4 files, each with 4 rows; however, the value of the
accumulator in the end is not 16 and, even worse, is inconsistent between
runs.
Are accumulators not
On 25 May 2016 6:00 p.m., "Daniel Barclay"
wrote:
>
> Was the feature of displaying accumulators in the Spark UI implemented in
> Spark 1.4.1, or was that added later?

Dunno, but only *named* accumulators are displayed in Spark's webUI
(under the Stages tab for a given stage).
Jacek
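A tiny sketch of the distinction Jacek points out, using the accumulator API of that era (names are illustrative):

```scala
// Only a named accumulator shows up in the web UI (Stages tab, per stage).
val visible   = sc.accumulator(0L, "records-seen")   // displayed in the UI
val invisible = sc.accumulator(0L)                   // works, but never shown
// In Spark 2.x+ the equivalent is sc.longAccumulator("records-seen").
```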
Was the feature of displaying accumulators in the Spark UI implemented in Spark
1.4.1, or was that added later?
Thanks,
Daniel
Hi,
My question is about having a lot of counters in Spark to keep track of bad/null
values in my RDD; it's described in detail in the StackOverflow link below:
http://stackoverflow.com/questions/35953400/too-many-accumulators-in-spark-job
Posting to the user group to get more traction. Appreciate your
> INFO : org.apache.spark.scheduler.TaskSetManager - Starting task 0.0 in
> stage 1.0 (TID 1, localhost, ANY, 2026 bytes)
>
> INFO : org.apache.spark.executor.Executor - Running task 0.0 in stage 1.0
> (TID 1)
>
> INFO : org.apache.spark.streaming.kafka.KafkaRDD - Computing topic test11,
> part
esult sent to driver
INFO : org.apache.spark.scheduler.DAGScheduler - ResultStage 1 (foreachRDD at
KafkaURLStreaming.java:90) finished in 0.103 s
INFO : org.apache.spark.scheduler.DAGScheduler - Job 1 finished: foreachRDD at
KafkaURLStreaming.java:90, took 0.151210 s
&&&&&&&
Hi Rachana,
don't you have two messages on the Kafka broker?
Regards
JB
On 01/05/2016 05:14 PM, Rachana Srivastava wrote:
I have a very simple two-line program. I am getting input from Kafka, saving
the input to a file, and counting the input received. My code looks like this;
when I run this code I am getting two accumulator counts for each input.
HashMap kafkaParams = new HashMap();
kafkaParams.put("metadata.
Hello,
I have a batch and a streaming driver using the same functions (Scala). I use
accumulators (passed to function constructors) to count stuff.
In the batch driver, doing so at the right point of the pipeline, I'm able
to retrieve the accumulator value and print it as a log4j log.
In the streaming driver, doing the same
creating many discrete accumulators
* The merge operation adds the values on key conflict
* I’m adding K->Vs to this accumulator in a variety of places (maps,
flatmaps, transforms and updateStateByKey)
* In a foreachRdd at the end of the transformations I’m reading the
accumulator
It seems like there is not much literature about Spark's Accumulators so I
thought I'd ask here:
Do Accumulators reside in a Task? Are they being serialized with the task?
Sent back on task completion as part of the ResultTask?
Are they reliable? If so, when? Can I rely on accumulators
.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:647)
at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:647)
at
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.
Putting your code in a file I find the following on line 17:
stepAcc = new StepAccumulator();
However I don't think that was where the NPE was thrown.
Another thing I don't understand was that there were two addAccumulator()
calls at the top of the stack trace, while in your code I don'
The code was written in 1.4 but I am compiling it and running it with 1.3.
import it.unimi.dsi.fastutil.objects.Object2ObjectOpenHashMap;
import org.apache.spark.AccumulableParam;
import scala.Tuple4;
import thomsonreuters.trailblazer.operation.DriverCalc;
import thomsonreuters.trailblazer.operati
Can you show the related code in DriverAccumulator.java?
Which Spark release do you use?
Cheers
On Mon, Aug 3, 2015 at 3:13 PM, Anubhav Agarwal wrote:
Hi,
I am trying to modify my code to use HDFS and multiple nodes. The code
works fine when I run it locally in a single machine with a single worker.
I have been trying to modify it and I get the following error. Any hint
would be helpful.
java.lang.NullPointerException
at
thomsonreuters.
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
> 2. For the accumulator variable it says:
> 15/07/29 19:23:12 ER
update accumulators for
ResultTask(1, 16)
java.util.NoSuchElementException: key not found: 2
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.mutable.HashMap.apply(HashMap.scala:64
I stumbled upon zipWithUniqueId/zipWithIndex. Is this what you are looking
for?
https://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaRDDLike.html#zipWithUniqueId()
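A quick sketch of the two suggested alternatives (sample data is made up; assumes an existing SparkContext sc):

```scala
val lines = sc.parallelize(Seq("a", "b", "c"))

// zipWithUniqueId: a unique Long per element, computed without an extra Spark job.
val withIds = lines.zipWithUniqueId()      // RDD[(String, Long)]

// zipWithIndex: consecutive 0..n-1 indices, but it runs a job to count partition sizes.
val withIndex = lines.zipWithIndex()       // RDD[(String, Long)]

withIds.collect().foreach(println)
```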
On 22 June 2015 at 06:16, Michal Čizmazia wrote:
If I am not mistaken, one way to see the accumulators is that they are just
write-only for the workers and their value can be read by the driver.
Therefore they cannot be used for ID generation as you wish.
On 22 June 2015 at 04:30, anshu shukla wrote:
> But I just want to update the RDD,
StreamingContext.sparkContext()
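A minimal sketch of that one-liner in context, using the 1.x-era API from this thread (the stream source and names are assumptions); there is no second SparkContext, the StreamingContext already wraps one.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("streaming-accumulator-sketch")
val ssc  = new StreamingContext(conf, Seconds(10))

// The accumulator comes from the SparkContext wrapped by the StreamingContext.
val eventCount = ssc.sparkContext.accumulator(0L, "eventCount")

val lines = ssc.socketTextStream("localhost", 9999)   // hypothetical input stream
lines.foreachRDD { rdd =>
  rdd.foreach(_ => eventCount += 1)                    // updated on the executors
  println(s"events so far: ${eventCount.value}")       // read back on the driver
}

ssc.start()
ssc.awaitTermination()
```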
On 21 June 2015 at 21:32, Will Briggs wrote:
It sounds like accumulators are not necessary in Spark Streaming - see this
post (
http://apache-spark-user-list.1001560.n3.nabble.com/Shared-variable-in-Spark-Streaming-td11762.html)
for more details.
On June 21, 2015, at 7:31 PM, anshu shukla wrote:
In Spark Streaming, since we already have a StreamingContext, which
does not allow us to have accumulators, we have to get a SparkContext for
initializing the accumulator value.
But having 2 Spark contexts will not solve the problem.
Please help!!
--
Thanks & Regards,
Anshu Shukla
of a hack but should work.
>>
>> Note that if you lose an executor in between, then that doesn't work
>> anymore; probably you could detect it and recompute the sketches, but it
>> would become overly complicated.
>>
>>
>>
>> 2015-06-18 14:27 GMT+02:0
.
>>
Hi,
Thank you for this confirmation.
Coalescing is what we do now. It creates, however, very big partitions.
Guillaume
Hey,
I am not 100% sure, but from my understanding accumulators are per partition
(so per task, as it's the same) and are sent back to the driver with the task
result and merged. When a task needs to be run n times (multiple RDDs
depend on this one, some partition loss later in the chain, etc.) then
Hi,
I'm trying to figure out the smartest way to implement a global
count-min-sketch on accumulators. For now, we are doing that with RDDs.
It works well, but with one sketch per partition, merging takes too long.
As you probably know, a count-min sketch is a big mutable array of arrays
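A deliberately simplified sketch of the idea under discussion, using the newer AccumulatorV2 API: keep the global count-min sketch in an accumulator so Spark merges the per-task copies on the driver. The depth/width and the hashing scheme are arbitrary illustrative choices, not the poster's implementation.

```scala
import org.apache.spark.util.AccumulatorV2
import scala.util.hashing.MurmurHash3

class CountMinSketchAccumulator(depth: Int = 4, width: Int = 1024)
    extends AccumulatorV2[String, Array[Array[Long]]] {

  private var table: Array[Array[Long]] = Array.ofDim[Long](depth, width)

  override def isZero: Boolean = table.forall(_.forall(_ == 0L))

  override def copy(): CountMinSketchAccumulator = {
    val acc = new CountMinSketchAccumulator(depth, width)
    acc.table = table.map(_.clone())
    acc
  }

  override def reset(): Unit = table = Array.ofDim[Long](depth, width)

  override def add(item: String): Unit = {
    var row = 0
    while (row < depth) {
      val h = MurmurHash3.stringHash(item, row)     // one hash function per row
      table(row)(Math.floorMod(h, width)) += 1L
      row += 1
    }
  }

  // Merging two sketches built with the same hash functions is element-wise addition.
  override def merge(other: AccumulatorV2[String, Array[Array[Long]]]): Unit = {
    val o = other.value
    for (r <- 0 until depth; c <- 0 until width) table(r)(c) += o(r)(c)
  }

  override def value: Array[Array[Long]] = table

  // Point estimate: the minimum over rows upper-bounds the true frequency.
  def estimate(item: String): Long =
    (0 until depth).map { row =>
      val h = MurmurHash3.stringHash(item, row)
      table(row)(Math.floorMod(h, width))
    }.min
}
```

An instance registered via sc.register(cms, "globalCMS") can then be fed with cms.add(...) from the existing transformations instead of building one sketch per partition in RDD code.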
You need to make sure to name the accumulator.
On Tue, May 26, 2015 at 2:23 PM, Snehal Nagmote
wrote:
Hello all,
I have an accumulator in a Spark Streaming application which counts the number of
events received from Kafka.
From the documentation, it seems the Spark UI has support to display it.
But I am unable to see it in the UI. I am using Spark 1.3.1.
Do I need to call any method (print) or am I missing
, the map() operation
> re-executed (map(x => accumulator += x) re-executed), then the final result
> of the accumulator will be "2", twice the correct result?
>
>
>
e than once if the task restarte? And then the final result
> will be many times of the real result?
>
>
>
Given the lazy nature of an RDD, if you use an accumulator inside a map() and
then you call count and saveAsTextFile on that RDD, the accumulator update will
run twice. IMHO, accumulators are a bit nondeterministic; you need to be
sure when to read them to avoid unexpected re-executions.
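A small sketch of the re-execution issue described there and the usual mitigation (paths and names are placeholders; this uses the current longAccumulator API):

```scala
val counted = sc.longAccumulator("records")

val mapped = sc.textFile("input.txt")
  .map { line => counted.add(1); line.length }
  .cache()                        // without cache(), count + saveAsTextFile re-run the map

mapped.count()
mapped.saveAsTextFile("output")   // with the cache, the map (and the adds) run only once
println(counted.value)            // still only approximate if tasks were retried
```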
On 3/5/2015 2:09 p
peration), this operation will be executed more than once if the task is
restarted? And then the final result will be many times the real result?
Sanity-check: would it be possible that `threshold_var` is negative?
—
FG
On Fri, Jan 30, 2015 at 5:06 PM, Peter Thai wrote:
Hello,
I am seeing negative values for accumulators. Here's my implementation in a
standalone app in Spark 1.1.1rc:
implicit object BigIntAccumulatorParam extends AccumulatorParam[BigInt] {
def addInPlace(t1: Int, t2: BigInt) = BigInt(t1) + t2
def addInPlace(t1: BigInt, t2: B
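For comparison, a complete AccumulatorParam[BigInt] in the shape the pre-2.0 API expects, plus a tiny usage sketch; this is illustrative, not the poster's actual code.

```scala
import org.apache.spark.AccumulatorParam

// addInPlace combines two accumulated values; zero supplies the identity.
implicit object BigIntAccumulatorParam extends AccumulatorParam[BigInt] {
  def addInPlace(t1: BigInt, t2: BigInt): BigInt = t1 + t2
  def zero(initialValue: BigInt): BigInt = BigInt(0)
}

val acc = sc.accumulator(BigInt(0), "bigint-sum")   // assumes an existing SparkContext `sc`
sc.parallelize(1 to 1000).foreach(i => acc += BigInt(i))
println(acc.value)                                  // 500500 for this toy input
```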
What are the limitations of using Accumulators to get a union of a bunch of
small sets?
Let's say I have an RDD[Map[String, Any]] and I want to do:
rdd.map(accumulator += Set(_.get("entityType").get))
What implication does this have on performance? I'm assuming it's no
Just noticed an error in my wording.
Should be " I'm assuming it's not immediately aggregating on the driver
each time I call the += on the Accumulator."
On Wed, Jan 14, 2015 at 9:19 PM, Corey Nolet wrote:
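One way to sketch this use case with the built-in CollectionAccumulator (Spark 2.x API; the sample data and names are made up). Values are buffered per task and merged with the task result, so nothing is aggregated on the driver at each individual add, which matches the corrected wording above.

```scala
import org.apache.spark.sql.SparkSession
import scala.collection.JavaConverters._

val spark = SparkSession.builder().appName("set-union-acc").getOrCreate()
val sc = spark.sparkContext

val entityTypes = sc.collectionAccumulator[String]("entityTypes")

val rdd = sc.parallelize(Seq(
  Map("entityType" -> "person", "id" -> 1),
  Map("entityType" -> "place",  "id" -> 2),
  Map("entityType" -> "person", "id" -> 3)
))

rdd.foreach(m => entityTypes.add(m("entityType").toString))   // foreach is an action

val distinctTypes = entityTypes.value.asScala.toSet           // de-duplicate on the driver
println(distinctTypes)                                        // Set(person, place)
```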
Hi,
The Spark documentation states that "If accumulators are created with a
name, they will be displayed in Spark’s UI"
http://spark.apache.org/docs/latest/programming-guide.html#accumulators
Where exactly are they shown? I may be dense, but I can't find them on the
UI from http:/
Hi
Does Spark have a built-in possibility of exposing the current value of an
Accumulator [1] using Monitoring and Instrumentation [2]?
Unfortunately I couldn't find anything in the sources which could be used.
Does it mean the only way to expose the current accumulator value is to implement
a new Source which would hook
day, December 16, 2014 at 10:23 AM
To: "'user@spark.apache.org'"
mailto:user@spark.apache.org>>
Subject: Understanding disk usage with Accumulators
Hi all – I’m running a long-running batch-processing job with Spark through
YARN. I am doing the following:
Batch Process
val resultsArr =
sc.accumulableCollection(mutable.ArrayBuffer[ListenableFuture[Result]]())
InMemoryArray.forEach{
1) Using a thread pool, generate callable jobs that operate
ram)
accu: org.apache.spark.Accumulator[scala.math.BigInt] = 0
scala> accu += 100
scala> accu.value
res1: scala.math.BigInt = 100