On 1/13/20 5:08 PM, Ian Jackson wrote:
> If the application calls libxl with ao_how==0 and also makes calls
> like _occurred, libxl will sometimes get stuck.
>
> The bug happens as follows (for example):
>
>   Thread A
>    libxl_do_thing(,ao_how==0)
>    libxl_do_thing starts, sets up some callbacks
>    libxl_do_thing exit path calls AO_INPROGRESS
>    libxl__ao_inprogress goes into event loop
>    eventloop_iteration sleeps on:
>       - do_thing's current fd set
>       - sigchld pipe if applicable
>       - its poller
>
>   Thread B
>    libxl_something_occurred
>    the something is to do with do_thing, above
>    do_thing_next_callback does some more work
>    do_thing_next_callback becomes interested in fd N
>    thread B returns to application
>
> Note that nothing wakes up thread A.  A is not listening on fd N.  So
> do_thing_* will not spot when fd N signals.  do_thing will not make
> further timely progress.  If there is no timeout thread A will never
> wake up.
>
> The problem here occurs because thread A is waiting on an out of date
> osevent set.
>
> There is also the possibility that a thread might block waiting for
> libxl osevents but outside libxl, eg if the application used
> libxl_osevent_beforepoll.  We will deal with that in a moment.
>
> See the big comment in libxl_event.c for a fairly formal correctness
> argument.
>
> This depends on libxl__egc_ao_cleanup_1_baton being called everywhere
> an egc or ao is disposed of.  Firstly egcs: in this patch we rename
> libxl__egc_cleanup, which means we catch all the disposal sites.
> Secondly aos: these are disposed of by (i) AO_CREATE_FAIL
> (ii) ao__inprogress and (iii) an event which completes the ao later.
> (i) and (ii) we handle by adding the call to _baton.  In the case of
> (iii) any such function must be an event-generating function so it has
> an egc too, so it will pass on the baton when the egc is disposed.
>
> Reported-by: George Dunlap <george.dun...@citrix.com>
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>
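
For illustration only, the stale-set hang described above can be shown in
miniature with plain poll(2).  This is a rough sketch, not code from libxl
or from this patch: struct interest, snapshot(), is_ready() and
waiter_sees_new_fd() are hypothetical names, the thread A / thread B
interleaving is simulated sequentially in one thread, and the wakeup pipe
is the generic self-pipe technique rather than the baton mechanism the
patch itself uses.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for the set of fds the library wants watched. */
struct interest {
    int fds[4];
    int nfds;
};

/* Copy the current interest set into a pollfd array; returns the count. */
static int snapshot(const struct interest *in, struct pollfd *out)
{
    for (int i = 0; i < in->nfds; i++) {
        out[i].fd = in->fds[i];
        out[i].events = POLLIN;
        out[i].revents = 0;
    }
    return in->nfds;
}

/* Returns 1 if fd was reported readable in a polled snapshot, else 0. */
static int is_ready(const struct pollfd *p, int n, int fd)
{
    for (int i = 0; i < n; i++)
        if (p[i].fd == fd && (p[i].revents & POLLIN))
            return 1;
    return 0;
}

/*
 * Simulates the interleaving from the description.  If notify is nonzero,
 * "thread B" writes to the wakeup pipe after registering fd N.  Returns 1
 * if "thread A" ends up noticing fd N, 0 if it only times out.
 */
int waiter_sees_new_fd(int notify)
{
    int wake[2], work[2];
    struct interest in = { {0}, 0 };
    struct pollfd pfds[4];
    char byte = 'x';
    int n, rc, seen = 0;

    rc = pipe(wake); assert(rc == 0);
    rc = pipe(work); assert(rc == 0);

    /* Thread A: takes its snapshot before B runs (wakeup pipe only). */
    in.fds[in.nfds++] = wake[0];
    n = snapshot(&in, pfds);

    /* Thread B: becomes interested in fd N (work[0]), which then signals. */
    in.fds[in.nfds++] = work[0];
    rc = (int)write(work[1], &byte, 1); assert(rc == 1);
    if (notify) {
        rc = (int)write(wake[1], &byte, 1); assert(rc == 1);
    }

    /* Thread A: sleeps on its now out-of-date snapshot (100 ms timeout). */
    rc = poll(pfds, n, 100);
    if (rc > 0 && is_ready(pfds, n, wake[0])) {
        /* Woken up: drain the wakeup byte, rebuild the set, poll again. */
        rc = (int)read(wake[0], &byte, 1); assert(rc == 1);
        n = snapshot(&in, pfds);
        rc = poll(pfds, n, 100);
        seen = rc > 0 && is_ready(pfds, n, work[0]);
    }

    close(wake[0]); close(wake[1]);
    close(work[0]); close(work[1]);
    return seen;
}
```

Calling waiter_sees_new_fd(0) models the bug: fd N is readable, but the
waiter never looks at it and only returns because of the timeout.  Calling
waiter_sees_new_fd(1) models a fix in which whoever changes the interest
set also wakes the sleeper so that it re-snapshots.
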

This all looks very plausible.  I don't feel confident I have enough of
a grasp of the situation to say that I would notice anything missing;
but I think it's worth putting in and letting osstest give it some
exercise (via libvirt).

Reviewed-by: George Dunlap <george.dun...@citrix.com>