Dear stackers, FYI: I eventually reported this problem to libguestfs, and a workaround has been included in the libguestfs code to fix it. Thanks for your support! https://bugzilla.redhat.com/show_bug.cgi?id=1123007
On Sat, Jun 7, 2014 at 3:27 AM, Qin Zhao <chaoc...@gmail.com> wrote:

Yuriy,

And I think that if we use a multiprocessing proxy object, the green thread will not switch while we call libguestfs. Is that correct?

On Fri, Jun 6, 2014 at 2:44 AM, Qin Zhao <chaoc...@gmail.com> wrote:

Hi Yuriy,

I just read the multiprocessing source code, and now I feel it may not solve this problem very easily. For example, let us assume that we will use the proxy object in the Manager's process to call libguestfs. In manager.py, I see that it needs to create a pipe before forking the child process. The write end of this pipe is required by the child process.

http://sourcecodebrowser.com/python-multiprocessing/2.6.2.1/classmultiprocessing_1_1managers_1_1_base_manager.html#a57fe9abe7a3d281286556c4bf3fbf4d5

And in Process._bootstrap(), I think we will need to register a function to be called by _run_after_forkers(), in order to close the fds inherited from the Nova process.

http://sourcecodebrowser.com/python-multiprocessing/2.6.2.1/classmultiprocessing_1_1process_1_1_process.html#ae594800e7bdef288d9bfbf8b79019d2e

But we also must not close the write-end fd created by the Manager in _run_after_forkers(). One feasible way may be to get that fd from the 5th element of the _args attribute of the Process object, and then skip closing it.... I have not investigated whether the Manager needs to use other fds besides this pipe. Personally, I feel such an implementation would be a little tricky and risky, because it depends tightly on the Manager code. If the Manager opens other files, or changes the argument order, our code will fail to run. Am I wrong? Is there any other, safer way?

On Thu, Jun 5, 2014 at 11:40 PM, Yuriy Taraday <yorik....@gmail.com> wrote:

Please take a look at https://docs.python.org/2.7/library/multiprocessing.html#managers - everything is already implemented there. All you need is to start one manager that would serve all your requests to libguestfs. The implementation in the stdlib will provide you with all exceptions and return values with minimal code changes on the Nova side. Create a new Manager, register a libguestfs "endpoint" in it, and call start(). It will spawn a separate process that will speak with the calling process over a very simple RPC. From the looks of it, all you need to do is replace the tpool.Proxy calls in the VFSGuestFS.setup method with calls to this new Manager.
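As a rough sketch of the Manager approach described above -- the GuestFSWrapper class and its launch() method are illustrative placeholders, not actual Nova or libguestfs code -- it could look something like this:

    # Minimal sketch, assuming a hypothetical GuestFSWrapper standing in
    # for the real guestfs calls. Return values and exceptions cross the
    # process boundary automatically via the manager's RPC.
    from multiprocessing.managers import BaseManager

    class GuestFSWrapper(object):
        """Runs inside the manager's server process, so any fds that
        libguestfs (or its qemu child) opens never exist in the Nova
        process and cannot leak into other forked children."""
        def launch(self, disk_path):
            # Real code would do something like:
            #   g = guestfs.GuestFS(); g.add_drive(disk_path); g.launch()
            return 'launched %s' % disk_path

    class GuestFSManager(BaseManager):
        pass

    GuestFSManager.register('guestfs', GuestFSWrapper)

    if __name__ == '__main__':
        manager = GuestFSManager()
        manager.start()            # forks the server process once, up front
        proxy = manager.guestfs()  # proxy object; each call is RPC over a pipe
        print(proxy.launch('/tmp/disk.img'))
        manager.shutdown()

The point of starting the manager once, early, is that the server process is forked before Nova has lots of short-lived fds in flight, so there is little for it to inherit.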
On Thu, Jun 5, 2014 at 7:21 PM, Qin Zhao <chaoc...@gmail.com> wrote:

Hi Yuriy,

Thanks for reading my bug! You are right: Python 3.3 or 3.4 should not have this issue, since they can secure the file descriptors. But before OpenStack moves to Python 3, we still need a solution. Calling libguestfs in a separate process seems to be one way; that way, the Nova code can close those fds itself, without depending on CLOEXEC. However, that would be an expensive solution, since it requires a lot of code change. At the very least we would need to write code to pass return values and exceptions between the two processes, which makes the solution very complex. Do you agree?

On Thu, Jun 5, 2014 at 9:39 PM, Yuriy Taraday <yorik....@gmail.com> wrote:

This behavior of os.pipe() has changed in Python 3.x, so it won't be an issue on newer Python (if only it were accessible to us).

From the looks of it, you can mitigate the problem by running libguestfs requests in a separate process (multiprocessing.managers comes to mind). This way, the only descriptors the child process could theoretically inherit would be long-lived pipes to the main process, although even those won't leak, because they should be marked with CLOEXEC before any libguestfs request is run. The other benefit is that this separate process won't be busy opening and closing tons of fds, so the problem with inheriting them will be avoided.

On Thu, Jun 5, 2014 at 2:17 PM, laserjetyang <laserjety...@gmail.com> wrote:

Will this patch of Python fix your problem? http://bugs.python.org/issue7213

On Wed, Jun 4, 2014 at 10:41 PM, Qin Zhao <chaoc...@gmail.com> wrote:

Hi Zhu Zhu,

Thank you for reading my diagram! I need to clarify that this problem does not occur during data injection. Before creating the ISO, the driver code will extend the disk; libguestfs is invoked in that time frame.

And now I think this problem may occur at any time, if the code uses tpool to invoke libguestfs while an external command is executed in another green thread simultaneously. Please correct me if I am wrong.

One simple solution for this issue would be to call the libguestfs routine in a green thread, rather than in another native thread, but that would hurt performance very much, so I do not think it is an acceptable solution.

On Wed, Jun 4, 2014 at 12:00 PM, Zhu Zhu <bjzzu...@gmail.com> wrote:

Hi Qin Zhao,

Thanks for raising this issue and for the analysis. According to the issue description and the scenario in which it happens (https://docs.google.com/drawings/d/1pItX9urLd6fmjws3BVovXQvRg_qMdTHS-0JhYfSkkVc/pub?w=960&h=720), if that's the case, then with multiple concurrent KVM instance spawns (with both config drive and data injection enabled), the issue is very likely to happen. In the libvirt/driver.py _create_image method, right after the ISO making ("cdb.make_drive"), the driver will attempt "data injection", which calls the libguestfs launch in another thread.

There also look to be a couple of libguestfs hang issues on Launchpad, below. I am not sure whether libguestfs itself could have some mechanism to free/close the fds inherited from the parent process, instead of requiring an explicit teardown call. Maybe open a defect against libguestfs to see what they think?

https://bugs.launchpad.net/nova/+bug/1286256
https://bugs.launchpad.net/nova/+bug/1270304

Zhu Zhu
Best Regards
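For reference, a minimal sketch of the CLOEXEC mechanism being discussed in this thread -- plain fcntl usage, not Nova code. This is essentially the flag that Python 3.4 (PEP 446) applies to pipe fds by default and that http://bugs.python.org/issue7213 debates:

    import fcntl
    import os

    def set_cloexec(fd):
        # Add FD_CLOEXEC without clobbering other descriptor flags.
        flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

    r, w = os.pipe()   # inheritable by default on Python 2.x
    set_cloexec(r)
    set_cloexec(w)
    # Any child that is fork()+exec()ed after this point does not
    # inherit r or w, so it cannot hold the pipe open indefinitely.

The catch raised earlier in the thread is the race window: a child forked by another thread between os.pipe() and the fcntl calls still inherits the fds.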
From: Qin Zhao <chaoc...@gmail.com>
Date: 2014-05-31 01:25
To: OpenStack Development Mailing List (not for usage questions) <openstack-dev@lists.openstack.org>
Subject: [openstack-dev] [Nova] nova-compute deadlock

Hi all,

When I run Icehouse code, I encountered a strange problem: the nova-compute service becomes stuck when I boot instances. I reported this bug in https://bugs.launchpad.net/nova/+bug/1313477.

After thinking about it for several days, I feel I know its root cause. This bug should be a deadlock caused by pipe fd leaking. I drew a diagram to illustrate the problem: https://docs.google.com/drawings/d/1pItX9urLd6fmjws3BVovXQvRg_qMdTHS-0JhYfSkkVc/pub?w=960&h=720

However, I have not found a very good solution to prevent this deadlock. This problem involves the Python runtime, libguestfs, and eventlet, so the situation is a little complicated. Is there any expert who can help me look for a solution? I will appreciate your help!
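To make the failure mode in the diagram concrete, here is a much-simplified, hypothetical repro sketch of how a leaked pipe fd turns into a hang. It relies on Python 2 semantics (inheritable pipe fds, close_fds=False by default) and is not Nova code:

    import os
    import subprocess

    r, w = os.pipe()  # on Python 2, both fds are inheritable

    # Stands in for an unrelated external command (e.g. the qemu child
    # that libguestfs launches) forked while the pipe is open: with
    # close_fds=False (the Python 2 default) it inherits a copy of w.
    child = subprocess.Popen(['sleep', '60'], close_fds=False)

    os.close(w)    # the parent closes its write end...
    os.read(r, 1)  # ...yet this read still blocks: the child's copy of
                   # w keeps the pipe open, so EOF is never delivered
                   # until the child exits.

In Nova the "child" is long-lived and the blocked read happens in a native tpool thread, so the whole nova-compute service appears stuck.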