Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/42#discussion_r10363343
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala ---
@@ -30,16 +32,23 @@ import org.apache.spark.scheduler._
* class, since the UI thread and the DAGScheduler event loop may otherwise
* be reading/updating the internal data structures concurrently.
*/
-private[spark] class JobProgressListener(val sc: SparkContext) extends SparkListener {
+private[ui] class JobProgressListener(sc: SparkContext, live: Boolean)
+ extends StorageStatusSparkListener {
--- End diff --
This seems a little confusing to me. It looks like this extends
StorageStatusSparkListener because it needs access to the list of executors and
their addresses through the storageStatusList member variable. But for
JobProgressListener, storageStatusList isn't kept up to date correctly:
JobProgressListener overrides onTaskEnd, so the RDD info won't be updated
when a task ends.
I'd propose simplifying this by having JobProgressListener extend the plain
SparkListener. You then need another way to get the executors for the
ExecutorTable on the stage status page, but you can derive the executor
IDs and hostnames from the list of tasks. That works because you only care
about executors that are used by at least one of the tasks anyway.
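
As a rough illustration of deriving the executors from the tasks, here is a
minimal, self-contained sketch. TaskSummary and ExecutorsFromTasks are
hypothetical names made up for this example, not classes in the PR or in Spark.
The sketch assumes each recorded task carries the ID and hostname of the
executor it ran on.

```scala
// Hypothetical stand-in for the per-task metadata the listener already records.
// This is NOT Spark's own task class; the field names are assumptions.
final case class TaskSummary(taskId: Long, executorId: String, host: String)

object ExecutorsFromTasks {
  // Collect the distinct executors that ran at least one of the given tasks,
  // mapping each executor ID to the host it was seen on.
  // If an executor appears more than once, the last host seen wins.
  def executorHosts(tasks: Seq[TaskSummary]): Map[String, String] =
    tasks.map(t => t.executorId -> t.host).toMap

  def main(args: Array[String]): Unit = {
    val tasks = Seq(
      TaskSummary(0L, "exec-1", "host-a"),
      TaskSummary(1L, "exec-2", "host-b"),
      TaskSummary(2L, "exec-1", "host-a")
    )
    // Print one line per executor, sorted by executor ID for stable output.
    executorHosts(tasks).toSeq.sorted.foreach { case (id, host) =>
      println(s"$id -> $host")
    }
  }
}
```

With something along these lines, the stage page could build its executor table
from the tasks it already tracks, without depending on storageStatusList.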