As we are revamping our config loading logic in SEP-23: Simplify Job Runner, we will introduce backward incompatible changes on the job submission logic, i.e. deprecating the usage of --config-factory and --config-path, which reads a full config upon job submission, instead, we will only provide job submission related config through --config and ConfigLoader will load the complete config on AM instead.
Please take a look and chime in your thoughts. Best, Ke > On Dec 2, 2019, at 4:42 PM, Ke Wu <ke.wu...@gmail.com> wrote: > > Hi Xinyu, > > Please see the response in line: > >> 1. After this change, seems the original config-factory and config-path >> are only used to supply parameters for submitting job. Is that the case? >> Which configs are still needed in the submission? > > Yes, only configs related to job submission is needed. > > job.name, job.factory.class & yarn.package.path are the minimum three configs > needed for the job submission, which may be supplied by --config instead. > >> 2. For backward compatibility, does it still work if the user doesn't >> specify the new ConfigLoader in the command line? The >> PropertiesConfigLoader class seems requiring the path of the config after >> exploding the tgz. > > If the user does not specify config loader in the config, then it will work > in the previous flow, where runner publishes configs in coordinator stream > and job coordinator/application master will pick it up by reading from Kafka. > So this is a backward compatible change. > >> 3. If the final plan is to remove the original config factory/path, how >> do we pass the parameters needed for Yarn submission, e.g. job name, id, >> and tgz path? > > We can either pass them by --config or introduce delicate command line > arguments for it in CommandLine.scala. > > > Let me know if you have any further questions. > > Best, > Ke > >> On Nov 27, 2019, at 11:02 AM, Xinyu Liu <xinyuliu...@gmail.com> wrote: >> >> Thanks a lot for putting out the design for simplifying the job submission >> process. The motivation makes sense to me that most of the planning and >> config generation should be done after submitting to the cluster, instead >> of during the submission, which can happen in a local sandbox without the >> access to the resources needed for planning. It also improves the process >> from the security stand of the view. >> >> A few questions regarding to the interface changes: >> >> 1. After this change, seems the original config-factory and config-path >> are only used to supply parameters for submitting job. Is that the case? >> Which configs are still needed in the submission? >> 2. For backward compatibility, does it still work if the user doesn't >> specify the new ConfigLoader in the command line? The >> PropertiesConfigLoader class seems requiring the path of the config after >> exploding the tgz. >> 3. If the final plan is to remove the original config factory/path, how >> do we pass the parameters needed for Yarn submission, e.g. job name, id, >> and tgz path? >> >> Thanks, >> Xinyu >> >> On Fri, Nov 15, 2019 at 3:00 PM Ke Wu <ke.wu...@gmail.com> wrote: >> >>> We created SEP-23: Simplify Job Runner, which simplifies job runner by >>> moving config retrieval and planning to AM. >>> >>> Please find out the SEP wiki below: >>> >>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner >>> >>> Please take a look and chime in your thoughts. >>> >>> Thanks, >>> Ke >>> >