On 17/09/2018 14:53, Kevin Wolf wrote:
>>> I think I can drop the ref/unref pair, but not the whole patch (whose
>>> main point is reordering dec_in_flight vs. the AIO callback).
>>
>> You're right, though I think I did that on purpose back in the day.
>> IIRC it was related to bdrv_drain, which might never complete if called
>> from an AIO callback.
> 
> Hm... This seems to become a common pattern, it's the same as for the
> job completion callbacks (only improved enough for the bug at hand to
> disappear instead of properly fixed in "blockjob: Lie better in
> child_job_drained_poll()").
> 
> Either you say there is no activity even though there is still a
> callback pending, then bdrv_drain() called from elsewhere will return
> too early and we get a bug. Or you say there is activity, then any
> nested drain inside that callback will deadlock and we get a bug, too.
> 
> So I suppose we need some way to know which activities to ignore during
> drain, depending on who is the caller? :-/

Some alternatives:

1) anything that needs to do invoke I/O for callbacks must use inc/dec
in flight manually.  Simplest but hard to assert though.  At least SCSI
IDE are broken.

2) callbacks cannot indirectly result in bdrv_drain.  Sounds easier
since there are not many AIO callers anymore - plus it also seems easier
to add debugging code for.

3) we provide device callbacks for "am I busy" and invoke them from
bdrv_drain's poll loop.

Just off the top of my head.  Not sure which is best.

Paolo

Reply via email to