Re: Spark UI Completed Jobs

2016-03-15 Thread Prabhu Joseph
Thanks Mark and Jeff.

On Wed, Mar 16, 2016 at 7:11 AM, Mark Hamstra wrote:
> Looks to me like the one remaining Stage would execute 19788 Tasks if all of those Tasks succeeded on the first try; but because of retries, 19841 Tasks were actually executed. Meanwhile, there were 41405 Tasks in th

Re: Spark UI Completed Jobs

2016-03-15 Thread Mark Hamstra
Looks to me like the one remaining Stage would execute 19788 Tasks if all of those Tasks succeeded on the first try; but because of retries, 19841 Tasks were actually executed. Meanwhile, there were 41405 Tasks in the 163 Stages that were skipped. I think -- but the Spark UI's accounting may n
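To make Mark's reading of the UI counters concrete, here is the arithmetic using only the figures quoted in this thread. It assumes his interpretation is right (the gap between 19841 and 19788 comes from retried task attempts), which he himself hedges; the variable names are illustrative.

```python
# Figures quoted from the Spark UI in this thread.
stages_total = 164
stages_skipped = 163
tasks_planned_remaining_stage = 19788  # tasks if every task succeeds on the first try
tasks_actually_run = 19841             # per Mark's reading, includes retried attempts
tasks_in_skipped_stages = 41405        # tasks belonging to the 163 skipped stages

stages_run = stages_total - stages_skipped
retried_attempts = tasks_actually_run - tasks_planned_remaining_stage
# The skipped count is not a subset of 19788: it is counted in addition to it.
tasks_if_nothing_skipped = tasks_planned_remaining_stage + tasks_in_skipped_stages

print(f"stages run: {stages_run}")                                  # stages run: 1
print(f"extra attempts from retries: {retried_attempts}")           # extra attempts from retries: 53
print(f"tasks with no reuse at all: {tasks_if_nothing_skipped}")    # tasks with no reuse at all: 61193
```

Read this way, "41405 skipped" is work the job avoided on top of the 19788 tasks it did need, which is why it can exceed that total.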

Re: Spark UI Completed Jobs

2016-03-15 Thread Prabhu Joseph
Okay, so out of 164 stages, 163 are skipped. But how can 41405 tasks be skipped if the total is only 19788?

On Wed, Mar 16, 2016 at 6:31 AM, Mark Hamstra wrote:
> It's not just if the RDD is explicitly cached, but also if the map outputs for stages have been materialized into shuffle files and

Re: Spark UI Completed Jobs

2016-03-15 Thread Mark Hamstra
It's not just if the RDD is explicitly cached, but also if the map outputs for stages have been materialized into shuffle files and are still accessible through the map output tracker. Because of that, explicitly caching RDD actions often gains you little or nothing, since even without a call to c
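The mechanism Mark describes can be illustrated with a toy model in plain Python. This is not Spark's API or its actual DAGScheduler code. `Stage`, `run_job` and the `available_shuffle_outputs` set (a stand-in for the map output tracker) are hypothetical names, and the model ignores retries, cached partitions and lost shuffle files. It shows that a second job reusing the same shuffle sees its map stage skipped without any call to cache().

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stand-in for a Spark stage: an id, a task count, parent stages."""
    stage_id: int
    num_tasks: int
    parents: list = field(default_factory=list)


def run_job(result_stage, available_shuffle_outputs):
    """Toy scheduler: a parent stage whose shuffle output is still registered is
    skipped together with its ancestors; every other stage runs and registers
    its output so later jobs can reuse it."""
    ran, skipped, seen = [], [], set()

    def mark_skipped(stage):
        if stage.stage_id in seen:
            return
        seen.add(stage.stage_id)
        skipped.append(stage)
        for parent in stage.parents:
            mark_skipped(parent)

    def visit(stage, is_result):
        if stage.stage_id in seen:
            return
        if not is_result and stage.stage_id in available_shuffle_outputs:
            mark_skipped(stage)  # shuffle files still available: no recompute
            return
        seen.add(stage.stage_id)
        for parent in stage.parents:
            visit(parent, is_result=False)
        ran.append(stage)
        if not is_result:
            # Map output is now materialized as shuffle files.
            available_shuffle_outputs.add(stage.stage_id)

    visit(result_stage, is_result=True)
    return {
        "stages_run": len(ran),
        "stages_skipped": len(skipped),
        "tasks_run": sum(s.num_tasks for s in ran),
        "tasks_skipped": sum(s.num_tasks for s in skipped),
    }


# One shuffle: a map stage (8 tasks) feeding a result stage (4 tasks).
map_stage = Stage(0, num_tasks=8)
tracker = set()

job1 = run_job(Stage(1, num_tasks=4, parents=[map_stage]), tracker)
# A second action gets a new result stage but shares the same shuffle map stage.
job2 = run_job(Stage(2, num_tasks=4, parents=[map_stage]), tracker)
print(job1)  # {'stages_run': 2, 'stages_skipped': 0, 'tasks_run': 12, 'tasks_skipped': 0}
print(job2)  # {'stages_run': 1, 'stages_skipped': 1, 'tasks_run': 4, 'tasks_skipped': 8}
```

With a longer chain of shuffles the same rule scales to the thread's case: one result stage runs while every upstream stage, and all of its tasks, is reported as skipped.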

Re: Spark UI Completed Jobs

2016-03-15 Thread Jeff Zhang
If an RDD is cached, it is computed only once, and the stages that compute it are skipped in subsequent jobs.

On Wed, Mar 16, 2016 at 8:14 AM, Prabhu Joseph wrote:
> Hi All,
>
> Spark UI Completed Jobs section shows the information below. What is the skipped value shown for Stages a