Thanks, Mark and Jeff.
On Wed, Mar 16, 2016 at 7:11 AM, Mark Hamstra
wrote:
Looks to me like the one remaining Stage would execute 19788 Tasks if all of
those Tasks succeeded on the first try; but because of retries, 19841 Tasks
were actually executed. Meanwhile, there were 41405 Tasks in the 163
Stages that were skipped.
I think -- but the Spark UI's accounting may n
Okay, so out of 164 stages, 163 are skipped. But how can 41405 tasks be
skipped if the total is only 19788?
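
The counts quoted in this thread can be reconciled with a little arithmetic. The sketch below uses only the figures mentioned above. It assumes, following Mark's reading, that the 41405 skipped tasks belong to the 163 skipped stages and are counted separately from the 19788 tasks of the one stage that ran. The variable names are illustrative only.

```python
# Figures quoted in this thread for the job in question.
tasks_if_no_retries = 19788  # tasks in the one stage that actually ran
tasks_executed = 19841       # tasks actually launched, including retries
skipped_tasks = 41405        # tasks in the stages that were skipped
skipped_stages = 163
total_stages = 164

# Only one stage was executed; the rest were skipped.
executed_stages = total_stages - skipped_stages

# Extra task attempts caused by retries.
retried_tasks = tasks_executed - tasks_if_no_retries

# Assumption: skipped tasks are not part of the 19788 total, so this is
# what the job would have needed had no stage been skipped and no task retried.
tasks_if_nothing_skipped = tasks_if_no_retries + skipped_tasks

print(executed_stages)           # 1
print(retried_tasks)             # 53
print(tasks_if_nothing_skipped)  # 61193
```

Under that assumption the two numbers do not conflict: 19788 is the size of the stage that ran, and 41405 is the work that was avoided.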
On Wed, Mar 16, 2016 at 6:31 AM, Mark Hamstra
wrote:
It's not just if the RDD is explicitly cached, but also if the map outputs
for stages have been materialized into shuffle files and are still
accessible through the map output tracker. Because of that, explicitly
caching RDD actions often gains you little or nothing, since even without a
call to c
If an RDD is cached, it is computed only once, and the stages for
computing it are skipped in the following jobs.
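
The idea described in this thread (a stage is skipped when its output is already available, whether through an explicit cache or through shuffle files that are still tracked) can be illustrated with a toy model. This is a simplified sketch in plain Python, not Spark's scheduler code; `plan_job`, the stage names, and the `materialized` set are all hypothetical.

```python
def plan_job(final_stage, parents, materialized):
    """Toy model: split a job's stages into (to_run, skipped).

    parents      -- dict mapping a stage name to its parent stage names
    materialized -- stages whose output is already available
                    (assumed: cached data or shuffle files from an earlier job)
    """
    # Collect every stage in the job's lineage.
    all_stages, stack = set(), [final_stage]
    while stack:
        stage = stack.pop()
        if stage not in all_stages:
            all_stages.add(stage)
            stack.extend(parents.get(stage, []))

    # Walk back from the final stage, stopping at materialized stages:
    # neither they nor their ancestors need to run again.
    to_run, stack = set(), [final_stage]
    while stack:
        stage = stack.pop()
        if stage in to_run or stage in materialized:
            continue
        to_run.add(stage)
        stack.extend(parents.get(stage, []))

    return to_run, all_stages - to_run


# Hypothetical three-stage lineage: shuffle_a -> shuffle_b -> result.
parents = {"result": ["shuffle_b"], "shuffle_b": ["shuffle_a"], "shuffle_a": []}

# First job: nothing is available yet, so every stage runs.
first_job = plan_job("result", parents, materialized=set())

# Second job: both shuffle outputs are still available, so only the
# final stage runs and the other two are reported as skipped.
second_job = plan_job("result", parents, materialized={"shuffle_a", "shuffle_b"})

print(sorted(first_job[0]), sorted(first_job[1]))
print(sorted(second_job[0]), sorted(second_job[1]))
```

In this model the second job reports two skipped stages even though nothing was explicitly cached, which mirrors the point made above about materialized map outputs.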
On Wed, Mar 16, 2016 at 8:14 AM, Prabhu Joseph
wrote:
> Hi All,
>
>
> Spark UI Completed Jobs section shows below information, what is the
> skipped value shown for Stages a