This is great news, thanks for the update! I will either wait for the 1.0 release or test it ahead of time from git, rather than trying to pull the information out of JobLogger or creating my own SparkListener.

On 04/02/2014 06:48 PM, Andrew Or wrote:
Hi Philip,

In the upcoming release of Spark 1.0 there will be a feature that provides for exactly what you describe: capturing the information displayed on the UI in JSON. More details will be provided in the documentation, but for now, anything before 0.9.1 can only go through JobLogger.scala, which outputs information in a somewhat arbitrary format and will be deprecated soon. If you find this feature useful, you can test it out by building the master branch of Spark yourself, following the instructions in https://github.com/apache/spark/pull/42.

Andrew


On Wed, Apr 2, 2014 at 3:39 PM, Philip Ogren wrote:

    What I'd like is a way to capture the information provided on the
    stages page (i.e. cluster:4040/stages via IndexPage).  Looking
    through the Spark code, it doesn't seem like it is possible to
    directly query for specific facts such as how many tasks have
    succeeded or how many total tasks there are for a given active
    stage.  Instead, it looks like all the data for the page is
    generated at once using information from the JobProgressListener.
    It doesn't seem like I have any way to programmatically access
    this information myself.  I can't even instantiate my own
    JobProgressListener because it is spark package private.  I could
    implement my own SparkListener and gather up the information
    myself.  It feels a bit awkward since classes like Task and
    TaskInfo are also spark package private.  It does seem possible to
    gather up what I need, but it seems like this sort of information
    should just be available without implementing a custom
    SparkListener (or worse, screen scraping the HTML generated by
    StageTable!)
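
For illustration only, the bookkeeping that such a custom listener would have to do can be sketched in plain Python. Nothing here is Spark API: the class names and the on_stage_submitted / on_task_end / on_stage_completed callbacks are hypothetical stand-ins for whatever events a real SparkListener receives, and the sketch only shows how succeeded and total task counts per active stage could be tallied.

```python
from dataclasses import dataclass


@dataclass
class StageProgress:
    """Running tally for one stage (hypothetical structure, not a Spark class)."""
    total_tasks: int
    succeeded: int = 0
    failed: int = 0

    @property
    def fraction_done(self) -> float:
        # Guard against a stage that reports zero tasks.
        return self.succeeded / self.total_tasks if self.total_tasks else 0.0


class ProgressTracker:
    """Mimics the counting a custom listener would do; callback names are assumptions."""

    def __init__(self) -> None:
        self.active = {}  # stage id -> StageProgress

    def on_stage_submitted(self, stage_id: int, num_tasks: int) -> None:
        self.active[stage_id] = StageProgress(total_tasks=num_tasks)

    def on_task_end(self, stage_id: int, success: bool) -> None:
        stage = self.active.get(stage_id)
        if stage is None:
            return  # task event for a stage that was never seen as submitted
        if success:
            stage.succeeded += 1
        else:
            stage.failed += 1

    def on_stage_completed(self, stage_id: int) -> None:
        # A finished stage is no longer "active".
        self.active.pop(stage_id, None)


# Simulated event stream: one stage with four tasks, three of which have ended.
tracker = ProgressTracker()
tracker.on_stage_submitted(stage_id=0, num_tasks=4)
tracker.on_task_end(stage_id=0, success=True)
tracker.on_task_end(stage_id=0, success=True)
tracker.on_task_end(stage_id=0, success=False)
```

With a tracker like this, a progress display would read tracker.active[stage_id].succeeded and .total_tasks whenever it needs to refresh, which is the kind of direct query the stages page does not expose.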

    I was hoping that I would find the answer in MetricsServlet which
    is turned on by default.  It seems that when I visit
    http://cluster:4040/metrics/json/ I should be able to get
    everything I want but I don't see the basic stage/task progress
    information I would expect.  Are there special metrics properties
    that I should set to get this info?  I think this would be the
    best solution - just give it the right URL and parse the resulting
    JSON - but I can't seem to figure out how to do this or if it is
    possible.
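
As a rough sketch of the "give it the right URL and parse the resulting JSON" approach, the Python below fetches a metrics document and lists the metric names it contains, which is one way to check whether any stage or task metrics are exposed at all. It assumes the servlet returns a JSON object grouped under top-level keys such as "gauges" and "counters"; that layout, the helper names, and the sample payload are assumptions made for illustration, not output captured from a real cluster.

```python
import json
from urllib.request import urlopen

# Assumed top-level groupings in the metrics JSON; adjust if the real output differs.
METRIC_SECTIONS = ("gauges", "counters", "histograms", "meters", "timers")


def fetch_metrics(url: str, timeout: float = 5.0) -> dict:
    """Download and decode the metrics JSON served at the given URL."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def list_metric_names(metrics: dict) -> list:
    """Collect every metric name found under the assumed sections, sorted."""
    names = []
    for section in METRIC_SECTIONS:
        names.extend(metrics.get(section, {}).keys())
    return sorted(names)


def find_metrics(metrics: dict, keyword: str) -> list:
    """Return the metric names that contain the keyword, ignoring case."""
    return [name for name in list_metric_names(metrics) if keyword.lower() in name.lower()]


# Invented payload used only to exercise the helpers without a running cluster.
sample = json.loads(
    '{"gauges": {"example.stage.runningStages": {"value": 1},'
    ' "example.memory.used_MB": {"value": 512}},'
    ' "counters": {"example.jobs.finished": {"count": 3}}}'
)
stage_metrics = find_metrics(sample, "stage")
```

Against a live driver the call would be fetch_metrics("http://cluster:4040/metrics/json/"), followed by find_metrics(result, "stage") or find_metrics(result, "task") to see whether anything relevant is reported.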

    Any advice is appreciated.

    Thanks,
    Philip



    On 04/01/2014 09:43 AM, Philip Ogren wrote:

        Hi DB,

        Just wondering if you ever got an answer to your question
        about monitoring progress - either offline or through your own
        investigation.  Any findings would be appreciated.

        Thanks,
        Philip

        On 01/30/2014 10:32 PM, DB Tsai wrote:

            Hi guys,

            When we're running a very long job, we would like to show
            users the current progress of the map and reduce jobs. After
            looking at the API documentation, I didn't find anything for
            this. However, in the Spark UI, I can see the progress of
            the tasks. Is there anything I'm missing?

            Thanks.

            Sincerely,

            DB Tsai
            Machine Learning Engineer
            Alpine Data Labs
            --------------------------------------
            Web: http://alpinenow.com/




