Yes, correct. However, note that when an accumulator operation is *idempotent*, meaning that applying it repeatedly to the same data has exactly the same effect as applying it once, that accumulator can also be updated safely in transformation steps (non-actions).
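To make the distinction concrete, here is a minimal sketch in plain Scala, with no Spark involved, that simulates a retried task by replaying the same updates twice. The names `taskRecords`, `addUpdate`, and `latestUpdate` are made up for illustration, and this is not the Spark accumulator API; it only shows why a summing update double counts on a retry while a keep-the-maximum update does not.

```scala
// Plain-Scala simulation (no Spark): replay one task's updates to mimic a retried task.
object IdempotentAccumulatorSketch {
  // Hypothetical task input: (key, timestamp) pairs.
  val taskRecords: Seq[(String, Long)] = Seq(("a", 10L), ("b", 7L), ("a", 12L))

  // Non-idempotent update: a running sum of the timestamps.
  def addUpdate(acc: Long, record: (String, Long)): Long = acc + record._2

  // Idempotent update: keep only the latest timestamp seen for each key.
  def latestUpdate(acc: Map[String, Long], record: (String, Long)): Map[String, Long] = {
    val (key, ts) = record
    acc.updated(key, math.max(acc.getOrElse(key, Long.MinValue), ts))
  }

  // The same task's updates applied once, and applied twice (as after a re-execution).
  val once: Seq[(String, Long)] = taskRecords
  val retried: Seq[(String, Long)] = taskRecords ++ taskRecords

  val sumOnce: Long = once.foldLeft(0L)(addUpdate)
  val sumRetried: Long = retried.foldLeft(0L)(addUpdate)
  val latestOnce: Map[String, Long] = once.foldLeft(Map.empty[String, Long])(latestUpdate)
  val latestRetried: Map[String, Long] = retried.foldLeft(Map.empty[String, Long])(latestUpdate)

  def main(args: Array[String]): Unit = {
    println(s"sum:    once=$sumOnce, retried=$sumRetried")
    println(s"latest: once=$latestOnce, retried=$latestRetried")
  }
}
```

In this simulation the replay changes the sum (29 becomes 58), while the per-key map of latest timestamps comes out identical either way, which is the property that makes such an update tolerable inside a transformation.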
Max and min tracking are typical examples. Just last week I wrote an accumulator that used a hash map to track the latest timestamps seen for specific keys.

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Sun, May 3, 2015 at 8:07 AM, xiazhuchang <hk8...@163.com> wrote:

> “For accumulator updates performed inside actions only, Spark guarantees
> that each task’s update to the accumulator will only be applied once, i.e.
> restarted tasks will not update the value. In transformations, users should
> be aware of that each task’s update may be applied more than once if tasks
> or job stages are re-executed.”
>
> Does this mean the guarantee (the accumulator is only updated once) holds
> only in actions? That is to say, one should use the accumulator only in
> actions, or else there may be errors (updates applied more than once) if it
> is used in transformations?
>
> e.g. map(x => accumulator += x)
>
> After execution, the correct result of the accumulator should be "1".
> Unfortunately, if some error happens and the task is restarted, the map()
> operation is re-executed (map(x => accumulator += x) runs again), and the
> final result of the accumulator will be "2", twice the correct result?