On Sat, May 31, 2014 at 01:25:04AM +0800, Qin Zhao wrote:
> Hi all,
> When I run the Icehouse code, I encounter a strange problem: the
> nova-compute service becomes stuck when I boot instances. I reported this
> bug at https://bugs.launchpad.net/nova/+bug/1313477.
> After thinking about it for several days, I believe I know its root cause.
> This bug should be a deadlock caused by a leaked pipe fd. I drew a diagram
> to illustrate the problem.
> https://docs.google.com/drawings/d/1pItX9urLd6fmjws3BVovXQvRg_qMdTHS-0JhYfSkkVc/pub?w=960&h=720
> However, I have not found a good solution to prevent this deadlock. The
> problem involves the Python runtime, libguestfs, and eventlet, so the
> situation is a little complicated. Could any expert help me look for a
> solution? I would appreciate your help!

Thanks for the useful diagram.  libguestfs itself is very careful to
open all file descriptors with O_CLOEXEC (atomically if the OS
supports that), so I'm fairly confident that the bug is in Python 2,
not in libguestfs.

Also note that g.shutdown() sends SIGKILL (signal 9) to the qemu
subprocess.  Furthermore, you can obtain the qemu PID with g.get_pid()
and send that process any signal you want.
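
For example, something along these lines would do it.  This is an
untested sketch: signal_appliance is a hypothetical helper name, and it
assumes the handle has already been launched.  It only relies on
get_pid() from the libguestfs Python bindings plus os.kill():

```python
import os
import signal


def signal_appliance(g, sig=signal.SIGTERM):
    """Send `sig` to the hypervisor (qemu) process behind a libguestfs handle.

    `g` is assumed to be a launched guestfs.GuestFS handle.  Only its
    get_pid() method is used, so any object providing get_pid() works.
    Returns the PID that was signalled.
    """
    pid = g.get_pid()   # PID of the qemu process started by g.launch()
    os.kill(pid, sig)   # deliver whatever signal the caller chose
    return pid

# Usage (the disk image path is a placeholder):
#   import guestfs
#   g = guestfs.GuestFS(python_return_dict=True)
#   g.add_drive_opts("disk.img", readonly=1)
#   g.launch()
#   signal_appliance(g, signal.SIGTERM)
```

Sending SIGTERM first gives qemu a chance to exit cleanly, which
g.shutdown()'s SIGKILL does not.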

I wonder whether a simpler fix would be to add a tiny C extension to
the Python code that uses pipe2(2) to create the pipe with O_CLOEXEC
set atomically.  Are Python C extensions allowed in OpenStack?
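
For comparison, this is roughly what the atomic version looks like in
Python 3.3 and later, where os.pipe2() wraps pipe2(2) directly, so no C
extension would be needed there (and since Python 3.4, PEP 446 makes
os.pipe() create non-inheritable fds by default).  Python 2, which
Icehouse runs on, has neither, so treat this only as a sketch of the
idea; cloexec_pipe is a hypothetical name:

```python
import fcntl
import os


def cloexec_pipe():
    """Return (read_fd, write_fd) with close-on-exec set.

    Where os.pipe2() exists (Python 3.3+, Linux and some other Unixes),
    the flag is set atomically by pipe2(2), so a concurrent fork()+exec()
    in another thread can never inherit the fds.
    """
    if hasattr(os, "pipe2"):
        return os.pipe2(os.O_CLOEXEC)
    # Fallback is NOT atomic: another thread could fork between pipe()
    # and fcntl(), leaking the fds into the child; pipe2 closes that window.
    r, w = os.pipe()
    for fd in (r, w):
        flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
    return r, w
```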

BTW do feel free to CC libgues...@redhat.com on any libguestfs
problems you have.  You don't need to subscribe to the list.


