So, assuming we go forward with this, the followup question is whether or not to move "main_class" and "java_opts" for Java actions into "edp.java.main_class" and "edp.java.java_opts" configs.
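
To make the question concrete, here is a rough sketch of what the move could look like. This is not the actual Savanna code: the helper names (fold_java_fields, split_edp_configs), the sample class name and the sample config values are all made up for illustration, and the "edp." prefix is simply the one suggested in this thread.

```python
EDP_PREFIX = "edp."  # assumed prefix, as suggested in this thread


def fold_java_fields(job_configs):
    """Return a copy of job_configs with the top-level 'main_class' and
    'java_opts' fields moved into 'configs' as edp.java.* keys."""
    java_fields = ("main_class", "java_opts")
    folded = {k: v for k, v in job_configs.items() if k not in java_fields}
    configs = dict(job_configs.get("configs", {}))
    for field in java_fields:
        if field in job_configs:
            configs["edp.java." + field] = job_configs[field]
    folded["configs"] = configs
    return folded


def split_edp_configs(configs):
    """Separate "edp."-prefixed settings (handled by Savanna) from the
    settings that would still be rendered into the Oozie workflow."""
    edp, passthrough = {}, {}
    for key, value in configs.items():
        target = edp if key.startswith(EDP_PREFIX) else passthrough
        target[key] = value
    return edp, passthrough


# Hypothetical Java job in the current format, with the two extra fields.
job_configs = fold_java_fields({
    "configs": {"mapred.reduce.tasks": "2"},
    "args": ["input", "output"],
    "main_class": "org.example.WordCount",  # made-up class name
    "java_opts": "-Xmx512m",
})

edp, passthrough = split_edp_configs(job_configs["configs"])
```

With something along these lines, job_configs keeps only 'configs', 'params' and 'args' at the top level, and the Java-specific values travel inside 'configs' until they are stripped out before workflow generation.
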
I think yes.

Best,

Trevor

On Wed, 2014-01-29 at 09:15 -0500, Trevor McKay wrote:
> On Wed, 2014-01-29 at 14:35 +0400, Alexander Ignatov wrote:
> > Thank you for bringing this up, Trevor.
> >
> > EDP gets more diverse and it's time to change its model.
> > I totally agree with your proposal, but one minor comment.
> > Instead of the "savanna." prefix in job_configs, wouldn't it be better to
> > make it "edp."? I think "savanna." is too broad a word for this.
>
> +1, brilliant. EDP is perfect. I was worried about the scope of
> "savanna." too.
>
> > And one more bureaucratic thing... I see you already started implementing
> > it [1], and it is named and goes as the new EDP workflow [2]. I think a new
> > blueprint should be created for this feature to track all code changes as
> > well as docs updates. By docs I mean the public Savanna docs about EDP,
> > REST api docs and samples.
>
> Absolutely, I can make it a new blueprint. Thanks.
>
> > [1] https://review.openstack.org/#/c/69712
> > [2]
> > https://blueprints.launchpad.net/openstack/?searchtext=edp-oozie-streaming-mapreduce
> >
> > Regards,
> > Alexander Ignatov
> >
> >
> >
> > On 28 Jan 2014, at 20:47, Trevor McKay <tmc...@redhat.com> wrote:
> >
> > > Hello all,
> > >
> > > In our first pass at EDP, the model for job settings was very consistent
> > > across all of our job types. The execution-time settings fit into this
> > > (superset) structure:
> > >
> > > job_configs = {'configs': {}, # config settings for oozie and hadoop
> > >                'params': {},  # substitution values for Pig/Hive
> > >                'args': []}    # script args (Pig and Java actions)
> > >
> > > But we have some things that don't fit (and probably more in the
> > > future):
> > >
> > > 1) Java jobs have 'main_class' and 'java_opts' settings
> > >    Currently these are handled as additional fields added to the
> > >    structure above. These were the first to diverge.
> > >
> > > 2) Streaming MapReduce (anticipated) requires mapper and reducer
> > >    settings (different than the mapred.xxxx.class settings for
> > >    non-streaming MapReduce)
> > >
> > > Problems caused by adding fields
> > > --------------------------------
> > > The job_configs structure above is stored in the database. Each time we
> > > add a field to the structure above at the level of configs, params, and
> > > args, we force a change to the database tables, a migration script and a
> > > change to the JSON validation for the REST api.
> > >
> > > We also cause a change for python-savannaclient and potentially other
> > > clients.
> > >
> > > This kind of change seems bad.
> > >
> > > Proposal: Borrow a page from Oozie and add "savanna." configs
> > > -------------------------------------------------------------
> > > I would like to fit divergent job settings into the structure we already
> > > have. One way to do this is to leverage the 'configs' dictionary. This
> > > dictionary primarily contains settings for hadoop, but there are a
> > > number of "oozie.xxx" settings that are passed to oozie as configs or
> > > set by oozie for the benefit of running apps.
> > >
> > > What if we allow "savanna." settings to be added to configs? If we do
> > > that, any and all special configuration settings for specific job types
> > > or subtypes can be handled with no database changes and no api changes.
> > >
> > > Downside
> > > --------
> > > Currently, all 'configs' are rendered in the generated oozie workflow.
> > > The "savanna." settings would be stripped out and processed by Savanna,
> > > thereby changing that behavior a bit (maybe not a big deal)
> > >
> > > We would also be mixing "savanna." configs with config_hints for jobs,
> > > so users would potentially see "savanna.xxxx" settings mixed with oozie
> > > and hadoop settings. Again, maybe not a big deal, but it might blur the
> > > lines a little bit. Personally, I'm okay with this.
> > >
> > > Slightly different
> > > ------------------
> > > We could also add a "'savanna-configs': {}" element to job_configs to
> > > keep the configuration spaces separate.
> > >
> > > But, now we would have 'savanna-configs' (or another name), 'configs',
> > > 'params', and 'args'. Really? Just how many different types of values
> > > can we come up with? :)
> > >
> > > I lean away from this approach.
> > >
> > > Related: breaking up the superset
> > > ---------------------------------
> > >
> > > It is also the case that not every job type has every value type.
> > >
> > >              Configs   Params   Args
> > > Hive            Y         Y       N
> > > Pig             Y         Y       Y
> > > MapReduce       Y         N       N
> > > Java            Y         N       Y
> > >
> > > So do we make that explicit in the docs and enforce it in the api with
> > > errors?
> > >
> > > Thoughts? I'm sure there are some :)
> > >
> > > Best,
> > >
> > > Trevor
> > >
> > > _______________________________________________
> > > OpenStack-dev mailing list
> > > OpenStack-dev@lists.openstack.org
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev