Public bug reported:

I recently had some trouble with dnsmasq segfaulting in certain situations. No doubt this was a bug in dnsmasq itself. However, it was quite troubling that Neutron never noticed that dnsmasq had stopped working. This is because dnsmasq is spawned as a daemon, even though it is most definitely "owned" by neutron-dhcp-agent. Also, if neutron-dhcp-agent should die, dnsmasq, being a daemon, will continue to run and become "stale", requiring manual intervention to clean up. If dnsmasq instead runs in the foreground, it will stay in neutron-dhcp-agent's process group, so it should also die along with the agent and, if need be, be cleaned up by init.
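To make the daemon problem concrete, here is a minimal Python sketch. It is not Neutron code. A shell stands in for a self-daemonizing dnsmasq: it backgrounds a long-running worker (`sleep 30`, an illustrative stand-in) and exits immediately. The parent's Popen therefore reports a clean exit right away, while the real worker keeps running where the parent can no longer see it.

```python
import os
import signal
import subprocess


def spawn_daemon_like():
    """Launch a process that forks a worker and exits, as a daemon does.

    Returns the launcher's exit status and the PID of the detached worker.
    """
    # The shell backgrounds `sleep 30` (with its output redirected so the pipe
    # closes), prints the worker's PID and exits at once, mimicking a daemon
    # that detaches from whoever started it.
    launcher = subprocess.Popen(
        ["sh", "-c", "sleep 30 >/dev/null 2>&1 & echo $!"],
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = launcher.communicate()
    return launcher.returncode, int(out.strip())


def is_alive(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists, we just may not signal it
    return True


if __name__ == "__main__":
    status, worker_pid = spawn_daemon_like()
    # The parent only ever sees the launcher, which exited successfully...
    print("launcher exit status:", status)
    # ...while the worker it started is still running, unsupervised.
    print("worker still running:", is_alive(worker_pid))
    os.kill(worker_pid, signal.SIGTERM)  # manual cleanup, as the report describes
```

Any supervision the parent does with the launcher's Popen object ends the moment the launcher exits, which matches the blind spot described above.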
I did some analysis but will not be able to dig into the actual implementation. However, my analysis shows that the following approach would work:

* use utils.create_process instead of execute, and remember the returned Popen object
* spawn a greenthread to wait() on the process
* if it dies, restart it and log the exit code
* pass the -k option so dnsmasq stays in the foreground
* kill the process using child signals

Not sure how, or if, SIGCHLD plays a role.

** Affects: neutron
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1257524

Title:
  If neutron spawned dnsmasq dies, neutron-dhcp-agent will be totally unaware

Status in OpenStack Neutron (virtual network service):
  New
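The steps proposed in the report can be sketched roughly as follows. This is a hedged illustration under stated assumptions, not Neutron's implementation. It uses the standard library's subprocess.Popen and a plain threading.Thread in place of utils.create_process and an eventlet greenthread. The `ProcessMonitor` class, its `max_restarts` limit and the stand-in commands are hypothetical. For dnsmasq, the command would include `-k` (`--keep-in-foreground`) so that the Popen object tracks the real server process rather than a launcher that exits immediately.

```python
import logging
import subprocess
import threading

LOG = logging.getLogger(__name__)


class ProcessMonitor:
    """Hypothetical supervisor: run a foreground command, restart it if it dies."""

    def __init__(self, cmd, max_restarts=3):
        self.cmd = cmd                  # e.g. ["dnsmasq", "-k", ...] in the real case
        self.max_restarts = max_restarts
        self.exit_codes = []            # every unexpected exit status observed
        self._proc = None
        self._stopping = threading.Event()
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def start(self):
        self._spawn()
        self._thread.start()

    def _spawn(self):
        # Keep the Popen object: it is the handle we wait() on and kill later.
        with self._lock:
            if self._stopping.is_set():
                return False            # never respawn once stop() has begun
            self._proc = subprocess.Popen(self.cmd)
            return True

    def _watch(self):
        restarts = 0
        while True:
            with self._lock:
                proc = self._proc
            code = proc.wait()          # blocks until the child exits
            if self._stopping.is_set():
                return                  # we stopped it ourselves; not an error
            # A negative code means the child was killed by that signal
            # (e.g. -11 for SIGSEGV on Linux).
            LOG.error("%s exited with code %s", self.cmd[0], code)
            self.exit_codes.append(code)
            if restarts >= self.max_restarts:
                LOG.error("giving up after %d restarts", restarts)
                return
            restarts += 1
            if not self._spawn():
                return

    def stop(self, timeout=5):
        with self._lock:
            self._stopping.set()
            proc = self._proc
        if proc is not None and proc.poll() is None:
            proc.terminate()            # SIGTERM the child we own directly
            try:
                proc.wait(timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
        if self._thread.is_alive():
            self._thread.join(timeout)

    def join(self, timeout=None):
        self._thread.join(timeout)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # A command that "crashes" immediately stands in for a segfaulting dnsmasq.
    mon = ProcessMonitor(["sh", "-c", "exit 3"], max_restarts=2)
    mon.start()
    mon.join(timeout=10)
    print(mon.exit_codes)  # -> [3, 3, 3]
```

Because the watcher blocks on the Popen object it owns, the agent learns of a crash as soon as it happens and can log the exit status before restarting. The `_stopping` flag separates a deliberate stop() from an unexpected death, so an intentional shutdown is not treated as a crash to restart.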