Re: [DISCUSS] SEP-23: Simplify Job Runner

Ke Wu Fri, 06 Dec 2019 13:52:43 -0800

As we are revamping our config loading logic in SEP-23: Simplify Job Runner, we
will introduce backward-incompatible changes to the job submission logic, i.e.
deprecating --config-factory and --config-path, which read the full config upon
job submission. Instead, we will only provide job-submission-related configs
through --config, and ConfigLoader will load the complete config on the AM.
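To make the proposed split more concrete, here is a minimal Python sketch (not Samza code) of the submission side. It gathers --config key=value pairs, checks for the three minimum configs named in this thread (job.name, job.factory.class, yarn.package.path), and decides whether the AM should run a ConfigLoader or fall back to the coordinator-stream flow. The loader key name, the helper functions, and the one-pair-per-flag parsing are all assumptions for illustration only.

```python
# Illustrative sketch only; this is NOT Samza's CommandLine or runner code.
# Assumptions: each "--config" flag is followed by exactly one key=value pair,
# and LOADER_KEY below is a hypothetical config name, not a real Samza key.

# The three minimum submission configs mentioned in this thread.
REQUIRED_SUBMISSION_KEYS = ("job.name", "job.factory.class", "yarn.package.path")

# Hypothetical key telling the AM which ConfigLoader to run.
LOADER_KEY = "job.config.loader"


def parse_config_args(args):
    """Collect every '--config key=value' pair from argv-style args into a dict."""
    config = {}
    it = iter(args)
    for arg in it:
        if arg != "--config":
            continue  # other flags are ignored in this sketch
        pair = next(it, None)
        if pair is None or "=" not in pair:
            raise ValueError(f"--config expects key=value, got {pair!r}")
        key, value = pair.split("=", 1)  # values may themselves contain '='
        config[key] = value
    return config


def build_submission_config(args):
    """Return the small submission config, failing fast if a minimum key is missing."""
    config = parse_config_args(args)
    missing = [k for k in REQUIRED_SUBMISSION_KEYS if k not in config]
    if missing:
        raise ValueError(f"missing required submission configs: {missing}")
    return config


def am_config_source(submission_config):
    """Decide where the AM gets its complete config."""
    if LOADER_KEY in submission_config:
        # New flow: the AM runs the configured loader to read the full config.
        return "config-loader"
    # Legacy flow: assumes the runner already published the full config
    # (read via --config-factory/--config-path) to the coordinator stream.
    return "coordinator-stream"


if __name__ == "__main__":
    cfg = build_submission_config([
        "--config", "job.name=my-job",
        "--config", "job.factory.class=org.apache.samza.job.yarn.YarnJobFactory",
        "--config", "yarn.package.path=file:///tmp/my-job.tar.gz",  # placeholder path
    ])
    print(am_config_source(cfg))  # prints: coordinator-stream
```

The fallback branch mirrors the point made in this thread that omitting a config loader keeps the previous coordinator-stream behavior; the actual decision logic in Samza may well differ.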


Please take a look and chime in your thoughts.

Best,
Ke

> On Dec 2, 2019, at 4:42 PM, Ke Wu <[email protected]> wrote:
> 
> Hi Xinyu,
> 
> Please see the response in line:
> 
>>  1. After this change, it seems the original config-factory and config-path
>>  are only used to supply parameters for submitting the job. Is that the case?
>>  Which configs are still needed in the submission?
> 
> Yes, only configs related to job submission are needed.
> 
> job.name, job.factory.class and yarn.package.path are the three minimum configs
> needed for job submission, and they may be supplied via --config instead.
> 
>>  2. For backward compatibility, does it still work if the user doesn't
>>  specify the new ConfigLoader on the command line? The
>>  PropertiesConfigLoader class seems to require the path of the config after
>>  exploding the tgz.
> 
> If the user does not specify a config loader in the config, it will work
> as in the previous flow, where the runner publishes configs to the coordinator
> stream and the job coordinator/application master picks them up by reading
> from Kafka. So this is a backward-compatible change.
> 
>>  3. If the final plan is to remove the original config factory/path, how
>>  do we pass the parameters needed for Yarn submission, e.g. job name, id,
>>  and tgz path?
> 
> We can either pass them via --config or introduce dedicated command-line
> arguments for them in CommandLine.scala.
> 
> 
> Let me know if you have any further questions.
> 
> Best,
> Ke
> 
>> On Nov 27, 2019, at 11:02 AM, Xinyu Liu <[email protected]> wrote:
>> 
>> Thanks a lot for putting out the design for simplifying the job submission
>> process. The motivation makes sense to me that most of the planning and
>> config generation should be done after submitting to the cluster, instead
>> of during the submission, which can happen in a local sandbox without
>> access to the resources needed for planning. It also improves the process
>> from a security standpoint.
>> 
>> A few questions regarding the interface changes:
>> 
>>  1. After this change, it seems the original config-factory and config-path
>>  are only used to supply parameters for submitting the job. Is that the case?
>>  Which configs are still needed in the submission?
>>  2. For backward compatibility, does it still work if the user doesn't
>>  specify the new ConfigLoader on the command line? The
>>  PropertiesConfigLoader class seems to require the path of the config after
>>  exploding the tgz.
>>  3. If the final plan is to remove the original config factory/path, how
>>  do we pass the parameters needed for Yarn submission, e.g. job name, id,
>>  and tgz path?
>> 
>> Thanks,
>> Xinyu
>> 
>> On Fri, Nov 15, 2019 at 3:00 PM Ke Wu <[email protected]> wrote:
>> 
>>> We created SEP-23: Simplify Job Runner, which simplifies the job runner by
>>> moving config retrieval and planning to the AM.
>>> 
>>> Please find the SEP wiki below:
>>> 
>>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
>>> 
>>> Please take a look and chime in your thoughts.
>>> 
>>> Thanks,
>>> Ke
>>> 
>
