Patrick,

Thanks for responding. Yes, many of our feature requests are not
related to the private Client; these are things we have been working
with since last year, and I have been trying to push PRs for these
changes. If the new Launcher lib is the way to go, we will try to work
with the new APIs.
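For reference, here is roughly what we expect our calling code to look
like against the Launcher library (a sketch only; the jar path and main
class below are placeholders for our app):

    import org.apache.spark.launcher.SparkLauncher

    object LaunchOurJob {
      def main(args: Array[String]): Unit = {
        // launch() currently hands back only the child Process; this is
        // where a clean way to get the YARN app ID would help us.
        val spark = new SparkLauncher()
          .setAppResource("/path/to/our-assembly.jar") // placeholder
          .setMainClass("com.example.OurSparkJob")     // placeholder
          .setMaster("yarn-cluster")
          .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
          .launch()
        val exitCode = spark.waitFor()
        println("spark-submit child exited with code " + exitCode)
      }
    }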
Thanks,
Chester

Sent from my iPhone

> On May 13, 2015, at 7:22 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>
> Hey Chester,
>
> Thanks for sending this. It's very helpful to have this list.
>
> The reason we made the Client API private was that it was never
> intended to be used by third parties programmatically, and we don't
> intend to support it in its current form as a stable API. We thought
> the fact that it was for internal use would be obvious, since it
> accepts arguments as a string array of command-line args. It was
> always intended for command-line use, and the stable API was the
> command line.
>
> When we migrated to the Launcher library, we figured we had covered
> most of the use cases on the off chance someone was using the Client.
> It appears we regressed one feature, which was a clean way to get the
> app ID.
>
> The items you list here as 2-6 all seem like new feature requests
> rather than a regression caused by us making that API private.
>
> I think the way to move forward is for someone to design a proper
> long-term stable API for the things you mention here, perhaps by
> extending the Launcher library. Marcelo would be a natural fit to
> help with this effort, since he was heavily involved in both YARN
> support and the launcher, so I'm curious to hear his opinion on how
> best to move forward.
>
> I do see how apps that run Spark would benefit from having a control
> plane for querying status, both on YARN and elsewhere.
>
> - Patrick
>
>> On Wed, May 13, 2015 at 5:44 AM, Chester At Work <ches...@alpinenow.com>
>> wrote:
>> Patrick,
>> There are several things we need, some of which have already been
>> mentioned on the mailing list.
>>
>> I haven't looked at the SparkLauncher code, but here are a few things
>> we need, from our perspective, from the Spark YARN Client:
>>
>> 1) Client should not be private (unless an alternative is provided),
>> so we can call it directly.
>>
>> 2) We need a way to stop a running YARN app programmatically (the PR
>> is already submitted).
>>
>> 3) Before we start the Spark job, we should get a callback to the
>> application that provides the YARN container capacity (number of
>> cores and max memory), so the Spark program will not set values
>> beyond those maximums (PR submitted).
>>
>> 4) The callback could take the form of YARN app listeners that fire
>> on YARN status changes (start, in progress, failure, complete, etc.),
>> so the application can react to these events (in the PR).
>>
>> 5) The YARN client passes arguments to the Spark program through its
>> main method, and we have run into problems when passing very large
>> arguments due to the length limit. For example, we serialize the
>> argument as JSON, encode it, and parse it back as an argument; for
>> wide-column datasets we hit the limit. An alternative way of passing
>> larger arguments is therefore needed. We are experimenting with
>> passing the args via an established Akka messaging channel.
>>
>> 6) The Spark YARN client in yarn-cluster mode is currently, in
>> essence, a batch job with no communication once it is launched. We
>> need to establish a communication channel so that logs, errors,
>> status updates, progress bars, execution stages, etc. can be
>> displayed on the application side. We added an Akka communication
>> channel for this (working on a PR).
>>
>> Combined with the other items in this list, we are able to redirect
>> print and error statements to the application log (outside of the
>> Hadoop cluster) and to drive a Spark-UI-equivalent progress bar via a
>> Spark listener.
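>> Roughly, the listener side looks like this (a sketch only; it hooks
>> Spark's SparkListener events, and reportToApp stands in for whatever
>> channel we end up using, Akka or otherwise):
>>
>>     import org.apache.spark.scheduler._
>>
>>     // Forward coarse stage-level progress out of the Spark program.
>>     class ProgressForwarder(reportToApp: String => Unit)
>>         extends SparkListener {
>>       override def onStageSubmitted(e: SparkListenerStageSubmitted): Unit =
>>         reportToApp(s"stage ${e.stageInfo.stageId} started: ${e.stageInfo.name}")
>>       override def onStageCompleted(e: SparkListenerStageCompleted): Unit =
>>         reportToApp(s"stage ${e.stageInfo.stageId} finished " +
>>           s"(${e.stageInfo.numTasks} tasks)")
>>     }
>>
>>     // Inside the Spark program, registration would look like:
>>     //   sc.addSparkListener(new ProgressForwarder(msg => sendToApp(msg)))
>>     // where sendToApp is our (hypothetical) channel to the application.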
>> We can show YARN progress via a YARN app listener before Spark has
>> started, and status can be updated during job execution.
>>
>> We are also experimenting with long-running jobs that accept
>> additional Spark commands and interactions via this channel.
>>
>> Chester
>>
>> Sent from my iPad
>>
>>> On May 12, 2015, at 20:54, Patrick Wendell <pwend...@gmail.com> wrote:
>>>
>>> Hey Kevin and Ron,
>>>
>>> So is the main shortcoming of the launcher library the inability to
>>> get an app ID back from YARN? Or are there other issues here that
>>> fundamentally regress things for you?
>>>
>>> It seems like adding a way to get back the app ID would be a
>>> reasonable addition to the launcher.
>>>
>>> - Patrick
>>>
>>>> On Tue, May 12, 2015 at 12:51 PM, Marcelo Vanzin <van...@cloudera.com>
>>>> wrote:
>>>> On Tue, May 12, 2015 at 11:34 AM, Kevin Markey <kevin.mar...@oracle.com>
>>>> wrote:
>>>>
>>>>> I understand that SparkLauncher was supposed to address these
>>>>> issues, but it really doesn't. YARN already provides indirection
>>>>> and an arm's-length transaction for starting Spark on a cluster.
>>>>> The launcher introduces yet another layer of indirection and
>>>>> dissociates the YARN Client from the application that launches it.
>>>>
>>>> Well, not fully. The launcher was supposed to solve "how to launch a
>>>> Spark app programmatically", but in the first version nothing was
>>>> added to actually gather information about the running app. It's
>>>> also limited in the way it works because of Spark's limitations (one
>>>> context per JVM, etc.).
>>>>
>>>> Still, adding things like this is definitely in scope for the
>>>> launcher library; information such as the app ID can be useful to
>>>> the code launching the app, not just in YARN mode. We just have to
>>>> find a clean way to provide that information to the caller.
>>>>
>>>>> I am still reading the newest code, and we are still researching
>>>>> options to move forward. If there are alternatives, we'd like to
>>>>> know.
>>>>
>>>> Super hacky, but if you launch Spark as a child process you could
>>>> parse the stderr and get the app ID.
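>>>> Something along these lines (a completely untested sketch; it
>>>> assumes the YARN client logs an "application_..." ID to stderr, and
>>>> the main class and jar path are placeholders):
>>>>
>>>>     import scala.sys.process._
>>>>
>>>>     // Watch spark-submit's stderr for the first token that looks
>>>>     // like a YARN application ID.
>>>>     val appIdRe = "application_[0-9]+_[0-9]+".r
>>>>     @volatile var appId: Option[String] = None
>>>>
>>>>     val cmd = Seq("spark-submit", "--master", "yarn-cluster",
>>>>       "--class", "com.example.YourSparkJob", "/path/to/your-app.jar")
>>>>     val proc = cmd.run(ProcessLogger(
>>>>       out => println(out),
>>>>       err => if (appId.isEmpty) appId = appIdRe.findFirstIn(err)))
>>>>     val exitCode = proc.exitValue()  // blocks until spark-submit exits
>>>>
>>>> --
>>>> Marcelo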