Patrick
    Thanks for responding. Yes, many of these are feature requests, not 
related to the private Client. These are things I have been working with 
since last year.
    I have been trying to push PRs for these changes. If the new Launcher 
lib is the way to go, we will try to work with the new APIs.

  Thanks
Chester

Sent from my iPhone

> On May 13, 2015, at 7:22 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> 
> Hey Chester,
> 
> Thanks for sending this. It's very helpful to have this list.
> 
> The reason we made the Client API private was that it was never
> intended to be used by third parties programmatically and we don't
> intend to support it in its current form as a stable API. We thought
> the fact that it was for internal use would be obvious since it
> accepts arguments as a string array of CL args. It was always intended
> for command line use and the stable API was the command line.
> 
> When we migrated the Launcher library we figured we covered most of
> the use cases in the off chance someone was using the Client. It
> appears we regressed one feature which was a clean way to get the app
> ID.
> 
> The items you list here 2-6 all seem like new feature requests rather
> than a regression caused by us making that API private.
> 
> I think the way to move forward is for someone to design a proper
> long-term stable API for the things you mentioned here. That could be
> done, for example, by extending the Launcher library. Marcelo would be
> natural to help with this effort since he was heavily involved in both
> YARN support and the launcher. So I'm curious to hear his opinion on
> how best to move forward.
> 
> I do see how apps that run Spark would benefit from having a control
> plane for querying status, both on YARN and elsewhere.
> 
> - Patrick
> 
>> On Wed, May 13, 2015 at 5:44 AM, Chester At Work <ches...@alpinenow.com> 
>> wrote:
>> Patrick
>>     There are several things we need, some of which were already mentioned 
>> on the mailing list.
>> 
>> I haven't looked at the SparkLauncher code, but here are a few things we 
>> need, from our perspective, for the Spark YARN client:
>> 
>>     1) The Client should not be private (unless an alternative is 
>> provided), so we can call it directly.
>>     2) We need a way to stop a running YARN app programmatically (the PR 
>> is already submitted).
>>     3) Before we start the Spark job, there should be a callback to the 
>> application providing the YARN container capacity (number of cores and max 
>> memory), so the Spark program will not request values beyond those maximums 
>> (PR submitted).
>>     4) The callback could take the form of YARN app listeners, invoked on 
>> YARN status changes (start, in progress, failure, complete, etc.), so the 
>> application can react to these events (also in a PR).
>> 
>>     5) The YARN client passes arguments to the Spark program through the 
>> main program's argument list; we have experienced problems passing very 
>> large arguments due to the length limit. For example, we serialize the 
>> arguments as JSON and encode them, then parse them back on the other side; 
>> for wide-column datasets we run into the limit. Therefore, an alternative 
>> way of passing larger arguments is needed. We are experimenting with 
>> passing the args via an established akka messaging channel.
>> 
>>    6) The Spark YARN client in yarn-cluster mode is essentially a batch 
>> job today, with no communication once it is launched. We need to establish 
>> a communication channel so that logs, errors, status updates, progress 
>> bars, execution stages, etc. can be displayed on the application side. We 
>> added an akka communication channel for this (working on a PR).
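The encoding workaround described in item 5 can be sketched as follows (the JSON payload here is a hand-written stand-in for real serialized arguments):

```scala
import java.util.Base64
import java.nio.charset.StandardCharsets.UTF_8

// A hand-written JSON payload standing in for serialized program arguments.
val json = """{"columns":["a","b","c"],"mode":"train"}"""

// Base64-encode so the payload survives as a single opaque command-line
// token (no quoting issues). Note this does nothing about the overall
// argument length limit -- encoding actually grows the string by ~33%,
// which is why a side channel is needed for very large payloads.
val encoded = Base64.getEncoder.encodeToString(json.getBytes(UTF_8))

// ...and on the cluster side, inside main(args), reverse the steps:
val decoded = new String(Base64.getDecoder.decode(encoded), UTF_8)
```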
>> 
>>       Combined with the other items in this list, we are able to redirect 
>> print and error statements to the application log (outside the hadoop 
>> cluster), and to show a Spark-UI-equivalent progress bar via a Spark 
>> listener. We can show YARN progress via a YARN app listener before Spark 
>> starts, and status can be updated during job execution.
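A minimal sketch of what such a YARN app listener might look like; the trait and event names below are purely illustrative (loosely following the states mentioned in item 4), not an actual Spark or YARN API:

```scala
// Hypothetical callback interface for YARN app state changes.
sealed trait YarnAppEvent
case object AppStarted                extends YarnAppEvent
case object AppInProgress             extends YarnAppEvent
case class  AppFailed(reason: String) extends YarnAppEvent
case object AppCompleted              extends YarnAppEvent

trait YarnAppListener {
  def onEvent(event: YarnAppEvent): Unit
}

// A listener that simply records the events it sees, e.g. for driving a
// progress display on the application side.
class RecordingListener extends YarnAppListener {
  private val seen = scala.collection.mutable.ListBuffer.empty[YarnAppEvent]
  def onEvent(event: YarnAppEvent): Unit = seen += event
  def events: List[YarnAppEvent] = seen.toList
}

val listener = new RecordingListener
Seq(AppStarted, AppInProgress, AppCompleted).foreach(listener.onEvent)
```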
>> 
>>    We are also experimenting with long-running jobs that accept additional 
>> Spark commands and interactions via this channel.
>> 
>> 
>>     Chester
>> 
>> Sent from my iPad
>> 
>>> On May 12, 2015, at 20:54, Patrick Wendell <pwend...@gmail.com> wrote:
>>> 
>>> Hey Kevin and Ron,
>>> 
>>> So is the main shortcoming of the launcher library the inability to
>>> get an app ID back from YARN? Or are there other issues here that
>>> fundamentally regress things for you?
>>> 
>>> It seems like adding a way to get back the appID would be a reasonable
>>> addition to the launcher.
>>> 
>>> - Patrick
>>> 
>>>> On Tue, May 12, 2015 at 12:51 PM, Marcelo Vanzin <van...@cloudera.com> 
>>>> wrote:
>>>> On Tue, May 12, 2015 at 11:34 AM, Kevin Markey <kevin.mar...@oracle.com>
>>>> wrote:
>>>> 
>>>>> I understand that SparkLauncher was supposed to address these issues, but
>>>>> it really doesn't.  Yarn already provides indirection and an arm's length
>>>>> transaction for starting Spark on a cluster. The launcher introduces yet
>>>>> another layer of indirection and dissociates the Yarn Client from the
>>>>> application that launches it.
>>>> 
>>>> Well, not fully. The launcher was supposed to solve "how to launch a Spark
>>>> app programmatically", but in the first version nothing was added to
>>>> actually gather information about the running app. It's also limited in the
>>>> way it works because of Spark's limitations (one context per JVM, etc).
>>>> 
>>>> Still, adding things like this is something that is definitely in the scope
>>>> for the launcher library; information such as app id can be useful for the
>>>> code launching the app, not just in yarn mode. We just have to find a clean
>>>> way to provide that information to the caller.
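One shape such an API could take is a small handle object the launcher returns, which the caller can poll; everything below is a hypothetical sketch with illustrative names, not an actual launcher interface:

```scala
// Hypothetical handle a launcher could return to expose app info to the
// caller; names and states here are illustrative only.
trait AppHandle {
  def appId: Option[String] // None until the cluster manager assigns one
  def state: String
}

// A trivial in-memory stand-in showing how a caller would consume it.
class StubHandle extends AppHandle {
  @volatile private var id: Option[String] = None
  @volatile private var st: String         = "SUBMITTED"
  def appId: Option[String] = id
  def state: String         = st
  // The launcher backend would call this when YARN reports the id.
  def assign(newId: String): Unit = { id = Some(newId); st = "RUNNING" }
}

val handle = new StubHandle
handle.assign("application_1431474752231_0011")
```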
>>>> 
>>>> 
>>>>> I am still reading the newest code, and we are still researching options
>>>>> to move forward.  If there are alternatives, we'd like to know.
>>>> Super hacky, but if you launch Spark as a child process you could parse the
>>>> stderr and get the app ID.
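For reference, a sketch of that stderr-scraping hack; it assumes the "Submitted application application_..." log line that the YARN client currently prints, which is not a stable format (the sample lines below are illustrative):

```scala
// Scan the child process's stderr for the line yarn.Client prints when it
// submits the app. Fragile by design: the log-line format is not a stable
// contract, so treat this as a stopgap only.
val appIdPattern = """application_\d+_\d+""".r

def findAppId(stderrLines: Iterator[String]): Option[String] =
  stderrLines
    .collectFirst { case line if line.contains("Submitted application") =>
      appIdPattern.findFirstIn(line)
    }
    .flatten

// Example against the kind of output a real submission produces:
val sample = Iterator(
  "15/05/13 19:22:01 INFO yarn.Client: Requesting a new application from cluster",
  "15/05/13 19:22:02 INFO yarn.Client: Submitted application application_1431474752231_0011"
)
val appId = findAppId(sample)
```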
>>>> 
>>>> --
>>>> Marcelo
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>> 
