The console progress bars are implemented on top of a new stable "status
API" that was added in Spark 1.2.  It's possible to query job progress
using this interface (in older versions of Spark, you could implement a
custom SparkListener and maintain the counts of completed / running /
failed tasks / stages yourself).
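
For example, in PySpark you can get at this through sc.statusTracker()
(a rough sketch, with field names as in the pyspark.status module that
shipped in 1.2):

    # Inspect the progress of any stages that are currently running:
    tracker = sc.statusTracker()
    for stage_id in tracker.getActiveStageIds():
        info = tracker.getStageInfo(stage_id)
        if info is not None:  # the stage may have finished in the meantime
            print("stage %d (%s): %d/%d tasks complete, %d failed"
                  % (info.stageId, info.name, info.numCompletedTasks,
                     info.numTasks, info.numFailedTasks))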

There are actually several subtleties involved in implementing "job-level"
progress bars that behave intuitively; there's a pretty extensive
discussion of the challenges at https://github.com/apache/spark/pull/3009.
Also, check out the pull request for the console progress bars for an
interesting design discussion around how they handle parallel stages:
https://github.com/apache/spark/pull/3029.

I'm not sure about the plumbing that would be necessary to display live
progress updates in the IPython notebook UI, though.  The general pattern
would probably involve a mapping to relate notebook cells to Spark jobs
(you can do this with job groups, I think), plus some periodic timer that
polls the driver for the status of the current job in order to update the
progress bar.
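
Roughly, it might look something like this sketch (run_with_progress is
just a hypothetical helper, and setJobGroup's interaction with Python
threads has some subtleties, so treat this as illustrative rather than
production code):

    import sys
    import threading
    import time

    def run_with_progress(sc, action, group_id="notebook-cell-1"):
        # Tag the work with a job group, kick the action off on a
        # background thread, and poll the status API from the foreground
        # to draw a crude single-line text progress indicator.
        sc.setJobGroup(group_id, "notebook progress demo")
        result = {}

        def target():
            result["value"] = action()

        t = threading.Thread(target=target)
        t.start()
        tracker = sc.statusTracker()
        while t.is_alive():
            for stage_id in tracker.getActiveStageIds():
                info = tracker.getStageInfo(stage_id)
                if info is not None and info.numTasks > 0:
                    sys.stdout.write("\rstage %d: %d/%d tasks complete"
                                     % (stage_id, info.numCompletedTasks,
                                        info.numTasks))
                    sys.stdout.flush()
            time.sleep(0.5)
        t.join()
        sys.stdout.write("\n")
        return result.get("value")

    # e.g.: run_with_progress(sc, lambda: some_rdd.count())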

For Spark 1.3, I'm working on designing a REST interface to access this
type of job / stage / task progress information, as well as expanding the
types of information exposed through the stable status API.

- Josh

On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman <eric.d.fried...@gmail.com>
wrote:

> Spark 1.2.0 is SO much more usable than previous releases -- many thanks
> to the team for this release.
>
> A question about progress of actions.  I can see how things are
> progressing using the Spark UI.  I can also see the nice ASCII art
> animation on the Spark driver console.
>
> Has anyone come up with a way to accomplish something similar in an
> IPython notebook using pyspark?
>
> Thanks
> Eric
>
