I proposed a fix: https://github.com/apache/spark/pull/2524
Glad to receive feedback.
--
Nan Zhu
On Tuesday, September 23, 2014 at 9:06 PM, Sandy Ryza wrote:
Filed https://issues.apache.org/jira/browse/SPARK-3642 for documenting
these nuances.
-Sandy
On Mon, Sep 22, 2014 at 10:36 AM, Nan Zhu wrote:
I see, thanks for pointing this out
--
Nan Zhu
On Monday, September 22, 2014 at 12:08 PM, Sandy Ryza wrote:
MapReduce counters do not count duplications. In MapReduce, if a task
needs to be re-run, the value of the counter from the second task
overwrites the value from the first task.
-Sandy
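
To make the difference concrete, here is a minimal sketch of the two behaviours described above. It is not Hadoop or Spark code: the class names, the `report` method, and the sample task IDs and values are all hypothetical, chosen only to show a counter where a re-run overwrites the earlier attempt next to an accumulator that adds every reported update.

```python
class OverwritingCounter:
    """Hypothetical counter: keeps one value per task ID.

    A later attempt of the same task replaces the earlier attempt's value,
    which mirrors the MapReduce behaviour described in the message above.
    """

    def __init__(self):
        self._per_task = {}

    def report(self, task_id, value):
        # Overwrite rather than add: a re-run is not counted twice.
        self._per_task[task_id] = value

    def total(self):
        return sum(self._per_task.values())


class NaiveAccumulator:
    """Hypothetical accumulator: adds every update it receives."""

    def __init__(self):
        self._total = 0

    def report(self, task_id, value):
        # The task ID is ignored, so a re-run task contributes twice.
        self._total += value

    def total(self):
        return self._total


# Assumed scenario: task 1 is run twice (for example after a failure),
# task 2 is run once.
attempts = [(1, 10), (2, 5), (1, 10)]

counter = OverwritingCounter()
naive = NaiveAccumulator()
for task_id, value in attempts:
    counter.report(task_id, value)
    naive.report(task_id, value)

print(counter.total())  # 15
print(naive.total())    # 25
```

With these assumed inputs, the overwriting counter reports 15 while the naive accumulator reports 25, because the second attempt of task 1 is added again.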
On Mon, Sep 22, 2014 at 4:55 AM, Nan Zhu wrote:
If you think it is necessary to fix, I would like to resubmit that PR (it seems to
have some conflicts with the current DAGScheduler).
My suggestion is to make it an option on the accumulator, e.g. some algorithms
that use an accumulator to calculate results need a deterministic
accumulator,
Hmm, good point, this seems to have been broken by refactorings of the
scheduler, but it worked in the past. Basically the solution is simple -- in a
result stage, we should not apply the update for each task ID more than once --
the same way we don't call job.listener.taskSucceeded more than on
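
The idea of applying the update for each task ID at most once can be sketched as follows. This is only an illustration under stated assumptions, not the DAGScheduler code: `DedupingAccumulator`, `apply_update`, and the sample updates are hypothetical names and data, and a real scheduler would track completed tasks per stage rather than in the accumulator itself.

```python
class DedupingAccumulator:
    """Hypothetical accumulator that applies each task's update at most once."""

    def __init__(self, initial=0):
        self.value = initial
        # Task IDs whose update has already been merged into `value`.
        self._applied_task_ids = set()

    def apply_update(self, task_id, update):
        """Merge `update` only the first time `task_id` is seen.

        Returns True if the update was applied, False if it was a duplicate.
        """
        if task_id in self._applied_task_ids:
            return False
        self._applied_task_ids.add(task_id)
        self.value += update
        return True


acc = DedupingAccumulator()
# Assumed scenario: task 0 completes twice (e.g. a resubmitted or
# speculative attempt), task 1 completes once.
results = [acc.apply_update(t, u) for t, u in [(0, 3), (1, 4), (0, 3)]]

print(acc.value)  # 7
print(results)    # [True, True, False]
```

The second completion of task 0 is ignored, so the accumulated value stays at 7 regardless of how many times the task is re-run.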
Hi, Matei,
Can you give some hints on how the current implementation guarantees that an
accumulator update is applied only once?
There is a pending PR trying to achieve this
(https://github.com/apache/spark/pull/228/files), but in the current
implementation, I didn’t see that this has been done? (may
Hey Sandy,
On September 20, 2014 at 8:50:54 AM, Sandy Ryza (sandy.r...@cloudera.com) wrote:
Hey All,
A couple questions came up about shared variables recently, and I wanted to
confirm my understanding and update the doc to be a little more clear.
*Broadcast variables*
Now that task data is automatically broadcast, the only occasions where it
makes sense to explicitly broadcast are:
*