On Thu, Jan 11, 2018 at 07:58:11AM -0500, David Shrewsbury wrote:
> This is probably mostly my fault since I did not WIP or -2 my change in
> 532575 to keep it from getting merged without some infra coordination.
>
> Because of that change, it is also required that we change the user
> zuul-executor starts as from root to zuul [1], and that we also open up
> the new default finger port on the executors [2]. Once those are in
> place, we should be ok to restart the executors.
>
> As for ze04, since that one restarted as the 'root' user, and never
> dropped privileges to the 'zuul' user due to 532575, I'm not sure what
> state it is going to be in after applying [1] and [2]. Would it create
> files/directories as root that would now be inaccessible if it were to
> restart with the zuul user? Think logs, work dirs, etc...

For permissions, we should likely confirm that puppet-zuul will properly
set up zuul:zuul ownership on the required folders. After the next puppet
run, we'd then be protected.

> -Dave
>
> [1] https://review.openstack.org/532594
> [2] https://review.openstack.org/532709
>
> On Wed, Jan 10, 2018 at 11:53 PM, Ian Wienand <iwien...@redhat.com> wrote:
>
> > Hi,
> >
> > To avoid you having to pull apart the logs starting ~ [1], we
> > determined that ze04.o.o was externally rebooted at 01:00UTC (there is
> > a rather weird support ticket which you can look at, which is assigned
> > to a rackspace employee but in our queue, saying the host became
> > unresponsive).
> >
> > Unfortunately that left a bunch of jobs orphaned and necessitated a
> > restart of zuul.
> >
> > However, recent changes to not run the executor as root [2] were thus
> > partially rolled out on ze04 as it came up after reboot. As a
> > consequence when the host came back up the executor was running as
> > root with an invalid finger server.
> >
> > The executor on ze04 has been stopped, and the host placed in the
> > emergency file to avoid it coming back. There are now some in-flight
> > patches to complete this transition, which will need to be staged a
> > bit more manually.
> >
> > The other executors have been left as is, based on the KISS theory
> > they shouldn't restart and pick up the code until this has been dealt
> > with.
> >
> > Thanks,
> >
> > -i
> >
> > [1] http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2018-01-11.log.html#t2018-01-11T01:09:20
> > [2] https://review.openstack.org/#/c/532575/
> >
> > _______________________________________________
> > OpenStack-Infra mailing list
> > OpenStack-Infra@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
>
> --
> David Shrewsbury (Shrews)