On Thu, 2014-07-17 at 15:54 +0100, Michael Kerrin wrote:
> On Thursday 26 June 2014 12:20:30 Clint Byrum wrote:
> > > Excerpts from Macdonald-Wallace, Matthew's message of 2014-06-26 04:13:31 -0700:
> > > > Hi all,
> > > >
> > > > I've been working more and more with TripleO recently and whilst it does
> > > > seem to solve a number of problems well, I have found a couple of
> > > > idiosyncrasies that I feel would be easy to address.
> > > >
> > > > My primary concern lies in the fact that os-refresh-config does not run on
> > > > every boot/reboot of a system. Surely a reboot *is* a configuration
> > > > change and therefore we should ensure that the box has come up in the
> > > > expected state with the correct config?
> > > >
> > > > This is easily fixed through the addition of an "@reboot" entry in
> > > > /etc/crontab to run o-r-c or (less easily) by re-designing o-r-c to run
> > > > as a service.
> > > >
> > > > My secondary concern is that through not running os-refresh-config on a
> > > > regular basis by default (i.e. every 15 minutes or something in the same
> > > > style as chef/cfengine/puppet), we leave ourselves exposed to someone
> > > > trying to make a "quick fix" to a production node and taking that node
> > > > offline the next time it reboots because the config was still left as
> > > > broken owing to a lack of updates to HEAT (I'm thinking a "quick change"
> > > > to allow root access via SSH during a major incident that is then left
> > > > unchanged for months because no-one updated HEAT).
> > > >
> > > > There are a number of options to fix this including Modifying
> > > > os-collect-config to auto-run os-refresh-config on a regular basis or
> > > > setting os-refresh-config to be its own service running via upstart or
> > > > similar that triggers every 15 minutes
> > > >
> > > > I'm sure there are other solutions to these problems, however I know from
> > > > experience that claiming this is solved through "education of users" or
> > > > (more severely!) via HR is not a sensible approach to take as by the time
> > > > you realise that your configuration has been changed for the last 24
> > > > hours it's often too late!
> > > So I see two problems highlighted above.
> > >
> > > 1) We don't re-assert ephemeral state set by o-r-c scripts. You're right,
> > > and we've been talking about it for a while. The right thing to do is
> > > have os-collect-config re-run its command on boot. I don't think a cron
> > > job is the right way to go, we should just have a file in /var/run that
> > > is placed there only on a successful run of the command. If that file
> > > does not exist, then we run the command.
> > >
> > > I've just opened this bug in response:
> > >
> > > https://bugs.launchpad.net/os-collect-config/+bug/1334804
> > >
> >
> > I have been looking into bug #1334804 and I have a review up to resolve
> > it. I want to highlight something.
> >
> > Currently on a reboot we start all services via upstart (on debian
> > anyways) and there have been quite a lot of issues around this -
> > missing upstart scripts and timing issues. I don't know the issues on
> > fedora.
> >
> > So with a fix to #1334804, on a reboot upstart will start all the
> > services first (with potentially out-of-date configuration), then
> > o-c-c will start o-r-c and will now configure all services and restart
> > them or start them if upstart isn't configured properly.
> >
> > I would like to turn off all boot scripts for services we configure
> > and leave all this to o-r-c. I think this will simplify things and put
> > us in control of starting services. I believe that it will also narrow
> > the gap between fedora and debian or debian and debian so what works
> > on one should work on the other and make it easier for developers.

I'm not sold on this approach. At the very least I think we want to make
this optional because not all deployments may want to have o-r-c be the
central service starting agent. So I'm opposed to this being our (only!)
default...

The job of o-r-c in this regard is to assert state... which to me means
making sure that a service is configured correctly (config files, set to
start on boot, and initially started). Requiring o-r-c to be the service
starting agent (always) is beyond the scope of the o-r-c tool.

If people want to use it in that mode I think having an *option* to do
this is fine. I don't think it should be required though. Furthermore I
don't think we should get into the habit of writing our elements in such
a manner that things no longer start on boot without o-r-c in the mix.

I do think we can solve these problems. But taking a hardwired
prescriptive approach is not good here...

> >
> > Having the ability to service nova-api stop|start|restart is very
> > handy but this will be a manual thing and I intend to leave that
> > there.
> >
> > What do people think and how best do I push this forward. I feel that
> > this leads into the re-assert-system-state spec but mainly I think
> > this is a bug and doesn't require a spec.
> >
> > I will be at the tripleo mid-cycle meetup next and willing to discuss
> > this with anyone interested in this and put together the necessary
> > bits to make this happen.
> >
> > Michael
> >
> >
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev