Note that this problem is probably NOT caused directly by GraphX. GraphX
merely reveals it: as you go further down the iterations, you get
further and further away from a shuffle you can rely on.

On Thu, Jun 25, 2015 at 7:43 PM, Thomas Gerber <[email protected]>
wrote:

> Hello,
>
> We run GraphX ConnectedComponents, and we notice an unaccounted-for time
> gap between jobs that becomes larger and larger.
>
> In the screenshot attached, you will notice that each job only takes
> around 2 1/2 min. At first, the next job/iteration starts immediately after
> the previous one. But as we go through iterations, a gap appears (time
> when job N+1 starts - time when job N finishes) and grows, ultimately
> reaching 6 minutes around the 30th iteration.
>
> I suspect it has to do with DAG computation on the driver, as evidenced by
> the very large (and growing at every iteration) number of pending stages
> that are ultimately skipped.
>
> So:
> 1. Is there anything obvious we can do to make that "gap" between
> iterations shorter?
> 2. Would halving the number of partitions in the input RDD halve the gap
> as well?
>
> I ask because a 3 min gap on average for a job length of 2 1/2 min means
> we are "wasting" roughly 50% of CPU time on the Executors.
>
> Thanks!
> Thomas
>
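
The "wasted" share quoted above can be checked with a few lines of
arithmetic. This is only a rough sketch: it assumes the executors sit
completely idle during each gap and that every job takes the same 2 1/2
minutes. The function name idle_fraction is made up for illustration; the
numbers are the ones given in the message.

```python
def idle_fraction(job_minutes: float, gap_minutes: float) -> float:
    """Share of each iteration's wall-clock time spent waiting in the gap.

    Assumes (hypothetically) that executors do no useful work during the gap.
    """
    return gap_minutes / (job_minutes + gap_minutes)


JOB_MINUTES = 2.5      # job length reported in the message
AVERAGE_GAP = 3.0      # average gap reported in the message
LARGEST_GAP = 6.0      # gap reported around the 30th iteration

print(f"average gap: {idle_fraction(JOB_MINUTES, AVERAGE_GAP):.1%}")  # average gap: 54.5%
print(f"largest gap: {idle_fraction(JOB_MINUTES, LARGEST_GAP):.1%}")  # largest gap: 70.6%
```

Under these assumptions the average figure lands a little above the quoted
50%, and the late iterations would be noticeably worse than the average.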
