I agree that passing in QEMUIOVector to blk_aio_ioctl() as a holder of the void* buffer used in blk_aio_ioctl_entry() is unnecessary. But, as Kevin noted, read and write were using the QEMUIOVector in BlkRwCo.
To avoid changes to the callers of blk_aio_ioctl(), I’ll change blk_aio_prwv() to take a void pointer instead of QEMUIOVector* and use a union to hold the buffer in BlkRwCo. > On Nov 22, 2017, at 11:24 AM, Paolo Bonzini <pbonz...@redhat.com> wrote: > > On 22/11/2017 19:06, Kevin Wolf wrote: >> Am 22.11.2017 um 17:34 hat Paolo Bonzini geschrieben: >>> On 22/11/2017 16:33, Deepa Srinivasan wrote: >>>> Starting qemu with the following arguments causes qemu to segfault: >>>> ... -device lsi,id=lsi0 -drive >>>> file=iscsi:<...>,format=raw,if=none,node-name= >>>> iscsi1 -device scsi-block,bus=lsi0.0,id=<...>,drive=iscsi1 >>>> >>>> This patch fixes blk_aio_ioctl() so it does not pass stack addresses to >>>> blk_aio_ioctl_entry() which may be invoked after blk_aio_ioctl() returns. >>>> More >>>> details about the bug follow. >>>> >>>> blk_aio_ioctl() invokes blk_aio_prwv() with blk_aio_ioctl_entry as the >>>> coroutine parameter. blk_aio_prwv() ultimately calls aio_co_enter(). >>>> >>>> When blk_aio_ioctl() is executed from within a coroutine context (e.g. >>>> iscsi_bh_cb()), aio_co_enter() adds the coroutine (blk_aio_ioctl_entry) to >>>> the current coroutine's wakeup queue. blk_aio_ioctl() then returns. >>>> >>>> When blk_aio_ioctl_entry() executes later, it accesses an invalid pointer: >>>> .... >>>> BlkRwCo *rwco = &acb->rwco; >>>> >>>> rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset, >>>> rwco->qiov->iov[0].iov_base); <--- qiov is >>>> invalid >>>> here >>>> ... >>>> >>>> In the case when blk_aio_ioctl() is called from a non-coroutine context, >>>> blk_aio_ioctl_entry() executes immediately. But if bdrv_co_ioctl() calls >>>> qemu_coroutine_yield(), blk_aio_ioctl() will return. When the coroutine >>>> execution is complete, control returns to blk_aio_ioctl_entry() after the >>>> call >>>> to blk_co_ioctl(). There is no invalid reference after this point, but the >>>> function is still holding on to invalid pointers. >>>> >>>> The fix is to allocate memory for the QEMUIOVector and struct iovec as >>>> part of >>>> the request struct which the IO buffer is part of. The memory for this >>>> struct is >>>> guaranteed to be valid till the AIO is completed. >>>> >>>> Signed-off-by: Deepa Srinivasan <deepa.sriniva...@oracle.com> >>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com> >>>> Reviewed-by: Mark Kanda <mark.ka...@oracle.com> >>>> --- >>>> block/block-backend.c | 13 ++----------- >>>> hw/block/virtio-blk.c | 9 ++++++++- >>>> hw/scsi/scsi-disk.c | 10 +++++++++- >>>> hw/scsi/scsi-generic.c | 9 ++++++++- >>>> include/sysemu/block-backend.h | 2 +- >>>> 5 files changed, 28 insertions(+), 15 deletions(-) >>>> >>>> diff --git a/block/block-backend.c b/block/block-backend.c >>>> index baef8e7..c275827 100644 >>>> --- a/block/block-backend.c >>>> +++ b/block/block-backend.c >>>> @@ -1472,19 +1472,10 @@ static void blk_aio_ioctl_entry(void *opaque) >>>> blk_aio_complete(acb); >>>> } >>>> >>>> -BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void >>>> *buf, >>>> +BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, >>>> QEMUIOVector *qiov, >>>> BlockCompletionFunc *cb, void *opaque) >>> >>> I think this is not the best way to fix the bug, because it adds extra >>> unnecessary code in the callers. >>> >>> Perhaps you can change BlkRwCo's "qiov" field to "void *buf" and the >>> same for blk_aio_prwv's "qiov" argument? >>> >>> Then the QEMUIOVector is not needed at all, and blk_co_ioctl can just >>> use rwco->buf. >> >> But the same struct is used for read and write requests that do use an >> actual QEMUIOVector and not just a linear buffer. > > Then let's call it "void *opaque", or make it a union (but I think > that's overkill). > > The QEMUIOVector pointer is opaque as far as blk_aio_prwv is concerned, > and it is only created by blk_aio_ioctl for blk_aio_ioctl_entry to > extract buf: > > rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset, > rwco->qiov->iov[0].iov_base); > > Exposing the fake QEMUIOVector to the callers of blk_aio_ioctl is much > uglier than using a void* for what is effectively a multi-type pointer. > > Paolo