Yes, correct.

However, note that when an accumulator operation is *idempotent*, meaning
that repeated application for the same data behaves exactly like one
application, then that accumulator can be safely called in transformation
steps (non-actions), too.

Examples include tracking a maximum or a minimum. Just last week I wrote one
that used a hash map to track the latest timestamp seen for each of a set of
keys.
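
To make the distinction concrete, here is a minimal sketch in plain Scala. It
does not use Spark's accumulator API. The update functions and the object name
are hypothetical stand-ins for an accumulator's update logic, and appending the
data to itself is an assumed way of imitating a task that is re-executed once.

```scala
object IdempotentAccumulatorSketch {
  // Hypothetical update functions standing in for an accumulator's logic.
  // Sum is NOT idempotent: replaying an element changes the result.
  def sumUpdate(acc: Long, x: Long): Long = acc + x

  // Max IS idempotent: replaying an element leaves the result unchanged.
  def maxUpdate(acc: Long, x: Long): Long = math.max(acc, x)

  // Latest timestamp per key: keep the larger timestamp for each key.
  def latestUpdate(acc: Map[String, Long], kv: (String, Long)): Map[String, Long] = {
    val (key, ts) = kv
    acc.updated(key, math.max(acc.getOrElse(key, Long.MinValue), ts))
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(3L, 7L, 5L)
    val replayed = data ++ data // imitates every update being applied twice

    println(data.foldLeft(0L)(sumUpdate))                 // 15
    println(replayed.foldLeft(0L)(sumUpdate))             // 30, double-counted
    println(data.foldLeft(Long.MinValue)(maxUpdate))      // 7
    println(replayed.foldLeft(Long.MinValue)(maxUpdate))  // 7, unaffected

    val events = Seq("a" -> 10L, "b" -> 20L, "a" -> 15L)
    val once  = events.foldLeft(Map.empty[String, Long])(latestUpdate)
    val twice = (events ++ events).foldLeft(Map.empty[String, Long])(latestUpdate)
    println(once == twice) // true: the replay does not change the map
  }
}
```

The sum doubles under replay, while the max and the per-key latest timestamp
come out the same, which is why updates of the latter kind tolerate
re-executed tasks.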

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Sun, May 3, 2015 at 8:07 AM, xiazhuchang <hk8...@163.com> wrote:

> “For accumulator updates performed inside actions only, Spark guarantees
> that each task’s update to the accumulator will only be applied once, i.e.
> restarted tasks will not update the value. In transformations, users should
> be aware of that each task’s update may be applied more than once if tasks
> or job stages are re-executed.”
> Does this mean that the guarantee (each update applied only once) holds
> only in actions? That is to say, should one use accumulators only in
> actions, or else risk errors (updates applied more than once) when they are
> used in transformations?
> e.g. map(x => accumulator += x)
> After execution, the correct value of the accumulator should be "1".
> If an error occurs and the task is restarted, the map() operation is
> re-executed (map(x => accumulator += x) runs again), so the final value of
> the accumulator will be "2", twice the correct result?
