This is great news, thanks for the update! I will either wait for the
1.0 release or go and test it ahead of time from git rather than trying
to pull it out of JobLogger or creating my own SparkListener.
On 04/02/2014 06:48 PM, Andrew Or wrote:
Hi Philip,
In the upcoming release of Spark 1.0 there will be a feature that
provides for exactly what you describe: capturing the information
displayed on the UI in JSON. More details will be provided in the
documentation, but for now, anything before 0.9.1 can only go through
JobLogger.scala, which outputs information in a somewhat arbitrary
format and will be deprecated soon. If you find this feature useful,
you can test it out by building the master branch of Spark yourself,
following the instructions in https://github.com/apache/spark/pull/42.
Andrew
On Wed, Apr 2, 2014 at 3:39 PM, Philip Ogren <[email protected]> wrote:
What I'd like is a way to capture the information provided on the
stages page (i.e. cluster:4040/stages via IndexPage). Looking
through the Spark code, it doesn't seem like it is possible to
directly query for specific facts such as how many tasks have
succeeded or how many total tasks there are for a given active
stage. Instead, it looks like all the data for the page is
generated at once using information from the JobProgressListener.
It doesn't seem like I have any way to programmatically access
this information myself. I can't even instantiate my own
JobProgressListener because it is spark package private. I could
implement my own SparkListener and gather up the information myself,
but that feels a bit awkward since classes like Task and TaskInfo are
also spark package private. It does seem possible to gather up
what I need, but it seems like this sort of information should just
be available without implementing a custom SparkListener (or
worse, screen scraping the HTML generated by StageTable!).
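To make the bookkeeping concrete, here is a minimal sketch of what such a listener would have to track per active stage. It is plain Python and deliberately not written against Spark's listener API: the class names (StageProgress, ProgressTracker) and the callback names are hypothetical and only mirror the kind of stage and task events a SparkListener would receive.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StageProgress:
    """Counters for one active stage."""
    total_tasks: int
    succeeded: int = 0
    failed: int = 0

    @property
    def fraction_done(self) -> float:
        # Guard against a stage that reports zero tasks.
        return self.succeeded / self.total_tasks if self.total_tasks else 0.0


class ProgressTracker:
    """Hypothetical stand-in for a custom listener; not Spark's API."""

    def __init__(self) -> None:
        self.active: Dict[int, StageProgress] = {}

    def on_stage_submitted(self, stage_id: int, num_tasks: int) -> None:
        self.active[stage_id] = StageProgress(total_tasks=num_tasks)

    def on_task_end(self, stage_id: int, success: bool) -> None:
        stage = self.active.get(stage_id)
        if stage is None:
            return  # task belongs to a stage we never saw submitted
        if success:
            stage.succeeded += 1
        else:
            stage.failed += 1

    def on_stage_completed(self, stage_id: int) -> None:
        # A finished stage is no longer "active".
        self.active.pop(stage_id, None)
```

With something like this in place, "how many tasks have succeeded for stage N" becomes a lookup such as tracker.active[N].succeeded instead of something that has to be read off the rendered page.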
I was hoping that I would find the answer in MetricsServlet which
is turned on by default. It seems that when I visit
http://cluster:4040/metrics/json/ I should be able to get
everything I want but I don't see the basic stage/task progress
information I would expect. Are there special metrics properties
that I should set to get this info? I think this would be the
best solution - just give it the right URL and parse the resulting
JSON - but I can't seem to figure out how to do this or if it is
possible.
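For reference, the "fetch the URL and parse the JSON" approach would look roughly like the sketch below. It assumes the servlet returns a JSON object with a top-level "gauges" map of the form {name: {"value": ...}}; that layout, the helper names, and the sample payload are assumptions for illustration, and the real output may differ.

```python
import json
from urllib.request import urlopen


def fetch_metrics(url: str, timeout: float = 5.0) -> dict:
    """Download and decode the JSON document served at `url`."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def gauge_values(metrics: dict) -> dict:
    """Flatten the assumed {"gauges": {name: {"value": v}}} layout to {name: v}."""
    gauges = metrics.get("gauges", {})
    return {name: body.get("value") for name, body in gauges.items()}


# Hypothetical payload; real gauge names and structure may differ.
SAMPLE = (
    '{"gauges": {'
    '"app.DAGScheduler.stage.runningStages": {"value": 1}, '
    '"app.DAGScheduler.job.activeJobs": {"value": 2}}}'
)

if __name__ == "__main__":
    for name, value in sorted(gauge_values(json.loads(SAMPLE)).items()):
        print(name, value)
```

Against a live application this would be called as gauge_values(fetch_metrics("http://cluster:4040/metrics/json/")). Whether any of the returned values carry per-stage task counts is exactly the open question.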
Any advice is appreciated.
Thanks,
Philip
On 04/01/2014 09:43 AM, Philip Ogren wrote:
Hi DB,
Just wondering if you ever got an answer to your question
about monitoring progress - either offline or through your own
investigation. Any findings would be appreciated.
Thanks,
Philip
On 01/30/2014 10:32 PM, DB Tsai wrote:
Hi guys,
When we're running a very long job, we would like to show
users the current progress of the map and reduce jobs. After
looking at the API documentation, I couldn't find anything for
this. However, in the Spark UI, I can see the progress of
the tasks. Is there anything I'm missing?
Thanks.
Sincerely,
DB Tsai
Machine Learning Engineer
Alpine Data Labs
--------------------------------------
Web: http://alpinenow.com/