On 1/13/20 5:08 PM, Ian Jackson wrote:
> If the application calls libxl with ao_how==0 and also makes calls
> like _occurred, libxl will sometimes get stuck.
> 
> The bug happens as follows (for example):
> 
>   Thread A
>        libxl_do_thing(,ao_how==0)
>        libxl_do_thing starts, sets up some callbacks
>        libxl_do_thing exit path calls AO_INPROGRESS
>        libxl__ao_inprogress goes into event loop
>        eventloop_iteration sleeps on:
>           - do_thing's current fd set
>           - sigchld pipe if applicable
>           - its poller
> 
>   Thread B
>        libxl_something_occurred
>        the something is to do with do_thing, above
>        do_thing_next_callback does some more work
>        do_thing_next_callback becomes interested in fd N
>        thread B returns to application
> 
> Note that nothing wakes up thread A.  A is not listening on fd N.  So
> do_thing_* will not spot when fd N signals.  do_thing will not make
> further timely progress.  If there is no timeout thread A will never
> wake up.
> 
> The problem here occurs because thread A is waiting on an out of date
> osevent set.
> 
> There is also the possibility that a thread might block waiting for
> libxl osevents but outside libxl, eg if the application used
> libxl_osevent_beforepoll.  We will deal with that in a moment.
> 
> See the big comment in libxl_event.c for a fairly formal correctness
> argument.
> 
> This depends on libxl__egc_ao_cleanup_1_baton being called everywhere
> an egc or ao is disposed of.  Firstly egcs: in this patch we rename
> libxl__egc_cleanup, which means we catch all the disposal sites.
> Secondly aos: these are disposed of by (i) AO_CREATE_FAIL
> (ii) ao__inprogress and (iii) an event which completes the ao later.
> (i) and (ii) we handle by adding the call to _baton.  In the case of
> (iii) any such function must be an event-generating function so it has
> an egc too, so it will pass on the baton when the egc is disposed.
> 
> Reported-by: George Dunlap <george.dun...@citrix.com>
> Signed-off-by: Ian Jackson <ian.jack...@eu.citrix.com>

This all looks very plausible.  I don't feel confident I have enough of
a grasp of the situation to say that I would notice anything missing;
but I think it's worth putting in and letting osstest give it some
exercise (via libvirt).

Reviewed-by: George Dunlap <george.dun...@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Reply via email to