The console progress bars are implemented on top of a new stable "status API" that was added in Spark 1.2. It's possible to query job progress using this interface (in older versions of Spark, you had to implement a custom SparkListener and maintain the counts of completed / running / failed tasks and stages yourself).
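For what it's worth, querying that interface from PySpark can look roughly like the sketch below. This assumes the Python wrapper for the status tracker (sc.statusTracker()) is available in your build, which may depend on your Spark version; the tracker methods and info fields used here are from the stable API, but double-check them against the docs for your release.

    # Minimal progress query against the stable status API.
    # Assumes `sc` is an existing SparkContext (e.g. from the pyspark shell).
    tracker = sc.statusTracker()

    for job_id in tracker.getActiveJobsIds():
        job = tracker.getJobInfo(job_id)
        if job is None:
            continue
        for stage_id in job.stageIds:
            stage = tracker.getStageInfo(stage_id)
            if stage is None:
                continue
            print("job %d / stage %d (%s): %d of %d tasks complete, %d running, %d failed"
                  % (job_id, stage_id, stage.name,
                     stage.numCompletedTasks, stage.numTasks,
                     stage.numActiveTasks, stage.numFailedTasks))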
There are actually several subtleties involved in implementing "job-level" progress bars that behave in an intuitive way; there's a pretty extensive discussion of the challenges at https://github.com/apache/spark/pull/3009. Also, check out the pull request for the console progress bars for an interesting design discussion around how they handle parallel stages: https://github.com/apache/spark/pull/3029.

I'm not sure about the plumbing that would be necessary to display live progress updates in the IPython notebook UI, though. The general pattern would probably involve a mapping to relate notebook cells to Spark jobs (you can do this with job groups, I think), plus a periodic timer that polls the driver for the status of the current job in order to update the progress bar (there's a rough sketch of that below the quoted message).

For Spark 1.3, I'm working on designing a REST interface to access this type of job / stage / task progress information, as well as expanding the types of information exposed through the stable status API.

- Josh

On Thu, Dec 25, 2014 at 10:01 AM, Eric Friedman <eric.d.fried...@gmail.com> wrote:

> Spark 1.2.0 is SO much more usable than previous releases -- many thanks
> to the team for this release.
>
> A question about progress of actions. I can see how things are
> progressing using the Spark UI. I can also see the nice ASCII art
> animation on the spark driver console.
>
> Has anyone come up with a way to accomplish something similar in an
> iPython notebook using pyspark?
>
> Thanks
> Eric
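P.S. Here's a very rough, untested sketch of the polling pattern I described above. The run_with_progress helper, the job-group label, and the poll interval are all made up for illustration; it assumes the same PySpark status tracker wrapper as the earlier snippet, and in a real notebook you'd probably swap the stdout writes for an IPython display widget update.

    import sys
    import threading
    import time

    def run_with_progress(sc, job_group, action, poll_interval=1.0):
        # `job_group` is an arbitrary label (hypothetical); `action` is a
        # zero-arg callable that triggers Spark jobs, e.g. lambda: rdd.count().
        result = {}

        def target():
            # Tag jobs submitted from this thread so we can find them by group.
            sc.setJobGroup(job_group, "notebook progress bar")
            result["value"] = action()

        worker = threading.Thread(target=target)
        worker.start()

        tracker = sc.statusTracker()
        while worker.is_alive():
            total = done = 0
            for job_id in tracker.getJobIdsForGroup(job_group):
                job = tracker.getJobInfo(job_id)
                if job is None:
                    continue
                for stage_id in job.stageIds:
                    stage = tracker.getStageInfo(stage_id)
                    if stage is not None:
                        total += stage.numTasks
                        done += stage.numCompletedTasks
            if total > 0:
                # In a notebook, update a display widget here instead.
                sys.stdout.write("\r[%d / %d tasks]" % (done, total))
                sys.stdout.flush()
            time.sleep(poll_interval)

        worker.join()
        sys.stdout.write("\n")
        return result.get("value")

    # Usage from a notebook cell, e.g.:
    # run_with_progress(sc, "cell-1", lambda: sc.parallelize(range(100000), 100).count())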