On Fri, Jul 26, 2019 at 04:18:46PM -0400, John Snow wrote: > Paolo, Stefan and Kevin: can I loop you in here? I'm quite uncertain > about this and I'd like to clear this up quickly if it's possible: > > On 7/25/19 8:58 PM, John Snow wrote: > > > > > > On 7/7/19 10:55 PM, shaju.abra...@nutanix.com wrote: > >> From: Shaju Abraham <shaju.abra...@nutanix.com> > >> > >> During the IDE DMA transfer for a ISCSI target,when libiscsi encounters > >> a SENSE KEY error, it sets the task->sense to the value "COMMAND ABORTED". > >> The function iscsi_translate_sense() later translaters this error to > >> -ECANCELED > >> and this value is passed to the callback function. In the case of IDE DMA > >> read > >> or write, the callback function returns immediately if the value of the ret > >> argument is -ECANCELED. > >> Later when ide_cancel_dma_sync() function is invoked the assertion > >> "s->bus->dma->aiocb == ((void *)0)" fails and the qemu process gets > >> terminated. > >> Fix the issue by making the value of s->bus->dma->aiocb = NULL when > >> -ECANCELED is passed to the callback. > >> > >> Signed-off-by: Shaju Abraham <shaju.abra...@nutanix.com> > >> --- > >> hw/ide/core.c | 1 + > >> 1 file changed, 1 insertion(+) > >> > >> diff --git a/hw/ide/core.c b/hw/ide/core.c > >> index 6afadf8..78ea357 100644 > >> --- a/hw/ide/core.c > >> +++ b/hw/ide/core.c > >> @@ -841,6 +841,7 @@ static void ide_dma_cb(void *opaque, int ret) > >> bool stay_active = false; > >> > >> if (ret == -ECANCELED) { > >> + s->bus->dma->aiocb = NULL; > >> return; > >> } > >> > >> > > > > The part that makes me nervous here is that I can't remember why we do > > NO cleanup whatsoever for the ECANCELED case. > > > > commit 0d910cfeaf2076b116b4517166d5deb0fea76394 > > Author: Fam Zheng <f...@redhat.com> > > Date: Thu Sep 11 13:41:07 2014 +0800 > > > > ide/ahci: Check for -ECANCELED in aio callbacks > > > > > > ... This looks like we never expected the aio callbacks to ever get > > called with ECANCELED, so we treat this as a QEMU-internal signal. > > > > It looks like we expect these callbacks to do NOTHING in this case; but > > I'm not sure where the IDE state machine does its cleanup otherwise. > > (The DMA might have been canceled, but the DMA and IDE state machines > > still need to exit their loop.) > > > > If you take a look at this patch from 2014 though, there are many other > > spots where we have littered ECANCELED checks that might also cause > > problems if we're receiving error codes we thought we couldn't get normally. > > > > I am worried this patch papers over something worse. > > > I'm not clear why Fam's patch adds a do-nothing return to the ide_dma_cb > if it's invoked with ECANCELED: shouldn't it be the case that the IDE > state machine needs to know that a transfer it was relying on to service > an ATA command was canceled and treat it like an error? > > Why was it ever correct to ignore these? Is it because we only ever > canceled DMA during reset/shutdown/etc? > > It appears as if iscsi requests can actually genuinely return an > ECANCELED errno, so there are likely several places in the IDE code that > need to accommodate this from happening. > > The easiest fix LOOKS like just deleting the special-casing of ECANCELED > altogether and letting the error pathways handle things as normal. > > Am I mistaken?
I think your instincts are right that there are deeper issues. The first step would be test cases, then you can be sure various scenarios have been handled correctly. I noticed that ide_sector_read_cb(), ide_sector_write_cb(), and ide_flush_cb() all differ in whether they reset s->pio_aiocb and s->status before returning early due to -ECANCELED. That must be a bug. I didn't look at the ide_dma_cb() code path. Stefan
signature.asc
Description: PGP signature