Hi Burak,

That makes sense. So it boils down to any action that runs after the transformations. Thanks for your answers.

Best,
Wei

2015-06-24 15:06 GMT-07:00 Burak Yavuz <[email protected]>:

> Hi Wei,
>
> During the action, all the transformations before it will occur in order
> leading up to the action. If you have an accumulator in any of these
> transformations, then you won't get exactly-once semantics, because the
> transformation may be restarted elsewhere.
>
> Best,
> Burak
>
> On Wed, Jun 24, 2015 at 2:25 PM, Wei Zhou <[email protected]> wrote:
>
>> Hi Burak,
>>
>> Thanks for your quick reply. I guess what confuses me is that the
>> accumulator won't be updated until an action is used, due to the laziness,
>> so a transformation such as a map won't even update the accumulator. Then
>> how would restarting the transformation end up updating the accumulator
>> more than once?
>>
>> Best,
>> Wei
>>
>> 2015-06-24 13:23 GMT-07:00 Burak Yavuz <[email protected]>:
>>
>>> Hi Wei,
>>>
>>> For example, when a straggler executor gets killed in the middle of a
>>> map operation and its task is restarted at a different instance, the
>>> accumulator will be updated more than once.
>>>
>>> Best,
>>> Burak
>>>
>>> On Wed, Jun 24, 2015 at 1:08 PM, Wei Zhou <[email protected]> wrote:
>>>
>>>> Quoting from the Spark Programming Guide:
>>>>
>>>> "For accumulator updates performed inside *actions only*, Spark
>>>> guarantees that each task’s update to the accumulator will only be applied
>>>> once, i.e. restarted tasks will not update the value. In transformations,
>>>> users should be aware of that each task’s update may be applied more than
>>>> once if tasks or job stages are re-executed."
>>>>
>>>> Can anyone give me a possible scenario in which an accumulator might be
>>>> updated more than once during a transformation? Thanks.
>>>>
>>>> Regards,
>>>> Wei
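
For anyone reading this thread later, here is a minimal sketch of the behaviour discussed above. It is a toy model in plain Python, not Spark code: ToyAccumulator, lazy_map and double_and_count are hypothetical names made up for illustration, and the "re-execution" is simply the lazy map being run a second time. It only aims to show why a side effect inside a transformation can be counted more than once even though nothing happens until an action runs.

```python
class ToyAccumulator:
    """Hypothetical stand-in for a Spark accumulator; not the Spark API."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n


def lazy_map(data, fn):
    """Model a lazy transformation: return a thunk that runs fn only when called."""
    return lambda: [fn(x) for x in data]


records_seen = ToyAccumulator()


def double_and_count(x):
    records_seen.add(1)  # side effect inside the transformation
    return 2 * x


mapped = lazy_map([1, 2, 3, 4], double_and_count)
value_before_action = records_seen.value  # still 0: no action has run yet

first_result = mapped()  # an "action" forces the map to run
value_after_first = records_seen.value  # one update per record

second_result = mapped()  # the same map is executed again
value_after_rerun = records_seen.value  # updates are now double-counted
```

The data is the same on both runs, but the counter doubles because the map function itself ran twice. In real Spark, I believe the same effect can be seen by running two actions on an uncached RDD whose map updates an accumulator, since the map is recomputed for each action.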
