> On 06.10.2015 at 19:07, John Snow <js...@redhat.com> wrote:
>
>> On 10/06/2015 05:20 AM, Peter Lieven wrote:
>>> On 06.10.2015 at 10:57, Kevin Wolf wrote:
>>> On 05.10.2015 at 23:15, John Snow wrote:
>>>>
>>>> On 09/21/2015 08:25 AM, Peter Lieven wrote:
>>>>> PIO read requests on the ATAPI interface used to be sync blk requests.
>>>>> This has two significant drawbacks. First, the main loop hangs until an
>>>>> I/O request is completed, and second, if the I/O request does not
>>>>> complete (e.g. due to unresponsive storage), QEMU hangs completely.
>>>>>
>>>>> Signed-off-by: Peter Lieven <p...@kamp.de>
>>>>> ---
>>>>>  hw/ide/atapi.c | 69 ++++++++++++++++++++++++++++++++++++----------------------
>>>>>  1 file changed, 43 insertions(+), 26 deletions(-)
>>>>>
>>>>> diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
>>>>> index 747f466..9257e1c 100644
>>>>> --- a/hw/ide/atapi.c
>>>>> +++ b/hw/ide/atapi.c
>>>>> @@ -105,31 +105,51 @@ static void cd_data_to_raw(uint8_t *buf, int lba)
>>>>>      memset(buf, 0, 288);
>>>>>  }
>>>>>
>>>>> -static int cd_read_sector(IDEState *s, int lba, uint8_t *buf, int sector_size)
>>>>> +static void cd_read_sector_cb(void *opaque, int ret)
>>>>>  {
>>>>> -    int ret;
>>>>> +    IDEState *s = opaque;
>>>>>
>>>>> -    switch(sector_size) {
>>>>> -    case 2048:
>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf, 4);
>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>> -        break;
>>>>> -    case 2352:
>>>>> -        block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>> -                         4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>> -        ret = blk_read(s->blk, (int64_t)lba << 2, buf + 16, 4);
>>>>> -        block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>> -        if (ret < 0)
>>>>> -            return ret;
>>>>> -        cd_data_to_raw(buf, lba);
>>>>> -        break;
>>>>> -    default:
>>>>> -        ret = -EIO;
>>>>> -        break;
>>>>> +    block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>> +
>>>>> +    if (ret < 0) {
>>>>> +        ide_atapi_io_error(s, ret);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    if (s->cd_sector_size == 2352) {
>>>>> +        cd_data_to_raw(s->io_buffer, s->lba);
>>>>>      }
>>>>> -    return ret;
>>>>> +
>>>>> +    s->lba++;
>>>>> +    s->io_buffer_index = 0;
>>>>> +    s->status &= ~BUSY_STAT;
>>>>> +
>>>>> +    ide_atapi_cmd_reply_end(s);
>>>>> +}
>>>>> +
>>>>> +static int cd_read_sector(IDEState *s, int lba, void *buf, int sector_size)
>>>>> +{
>>>>> +    if (sector_size != 2048 && sector_size != 2352) {
>>>>> +        return -EINVAL;
>>>>> +    }
>>>>> +
>>>>> +    s->iov.iov_base = buf;
>>>>> +    if (sector_size == 2352) {
>>>>> +        buf += 4;
>>>>> +    }
>>>
>>> This doesn't look quite right, buf is never read after this.
>>>
>>> Also, why += 4 when it was originally buf + 16?
>>
>> You are right. I mixed that up.
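For reference on the buf + 16 question: cd_data_to_raw() assembles the
standard 2352-byte raw mode-1 sector, so the 2048-byte payload has to land
16 bytes in. A minimal sketch of that layout (illustrative only, not code
from the patch):

    #include <stdint.h>

    /* Sketch of a 2352-byte raw mode-1 CD sector as assembled by
     * cd_data_to_raw(). The payload sits at offset 16, which is why the
     * read destination in the 2352 case must be buf + 16, not buf + 4. */
    struct raw_mode1_sector {
        uint8_t sync[12];   /* 00 ff ff ff ff ff ff ff ff ff ff 00 */
        uint8_t msf[3];     /* minute/second/frame address of the LBA */
        uint8_t mode;       /* 1 = mode-1 data sector */
        uint8_t data[2048]; /* payload filled by blk_read()/blk_aio_readv() */
        uint8_t ecc[288];   /* EDC/ECC area, zero-filled by cd_data_to_raw() */
    };                      /* 12 + 3 + 1 + 2048 + 288 = 2352 bytes */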
>>
>>>
>>>>> +
>>>>> +    s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
>>>>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>>>>> +
>>>>> +    if (blk_aio_readv(s->blk, (int64_t)lba << 2, &s->qiov, 4,
>>>>> +                      cd_read_sector_cb, s) == NULL) {
>>>>> +        return -EIO;
>>>>> +    }
>>>>> +
>>>>> +    block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>> +                     4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
>>>>> +    s->status |= BUSY_STAT;
>>>>> +    return 0;
>>>>>  }
>>>>
>>>> We discussed this off-list a bit, but for upstream synchronization:
>>>>
>>>> Unfortunately, I believe making cd_read_sector here non-blocking makes
>>>> ide_atapi_cmd_reply_end non-blocking, and as a result makes calls to
>>>> s->end_transfer_func() non-blocking, which functions like ide_data_readw
>>>> are not prepared to cope with.
>>>
>>> I don't think that's a problem as long as BSY is set while the
>>> asynchronous command is running and DRQ is cleared. The latter will
>>> protect ide_data_readw(). ide_sector_read() does essentially the same
>>> thing.
>>
>> I was thinking the same. Without the BSY flag it's not working at all.
>>
>>> Or maybe I'm just missing what you're trying to say.
>>>
>>>> My suggestion is to buffer an entire DRQ block of data at once
>>>> (byte_count_limit) to avoid the problem.
>>>
>>> No matter whether there is a problem or not, buffering more data at once
>>> (and therefore doing fewer requests) is better for performance anyway.
>>
>> It's possible to do only one read in the backend and read the whole
>> request into the IO buffer. I'll send a follow-up.
>
> Be cautious: we only have 128K (+4 bytes) to play with in the io_buffer
> and the READ10 cdb can request up to 128MiB! For performance, it might
> be nice to always buffer something like:
>
>     MIN(128K, nb_sectors * sector_size)
Isn't nb_sectors limited to CD_MAX_SECTORS (32)? That would cap a single
read at 32 * 2352 = 75,264 bytes, which already fits in io_buffer.

Peter

> and then, as the guest drains the DRQ block of size byte_count_limit,
> which can be at most 0xFFFE (we can fit at least two of these per
> io_buffer refill), we can just shift the data_ptr and data_end pointers
> to utilize io_buffer like a ring buffer.
>
> Because the guest can at most fetch 0xfffe bytes at a time, it will tend
> to leave at least 4 bytes left over from a 64-block read. Luckily, we've
> got 4 extra bytes in s->io_buffer, so with a ring buffer we can always
> rebuffer *at least* two full DRQ blocks of data at a time.
>
> The routine would basically look like this:
>
> - No DRQ blocks buffered, so read up to 64 blocks, or however many are
>   left for our transfer
> - If we have at least one full DRQ block buffered, start the transfer
>   and send an interrupt
> - If we ran out of DRQ blocks, go back to the top and buffer them.
>
> This would eliminate the need for code stanza #3 in
> ide_atapi_cmd_reply_end, which re-starts a transfer without signaling to
> the guest. We'd only have:
>
>     ide_atapi_cmd_reply_end(...) {
>         if (packet_transfer_size == 0) { end(...); return; }
>         if (blocks_buffered < 1) { async_buffer_blocks(...); return; }
>         ide_transfer_start(...);
>         ide_set_irq(s->bus);
>     }
>
> which is a good deal simpler than what we have now, though I need to
> look into the formatting of raw CD data a little more to make sure my
> numbers make sense... it may not be quite so easy to buffer multiple DRQ
> blocks in some cases, but so it goes -- we should always be able to
> buffer at least one.
>
>> Do you maybe have a pointer to the test tool that John mentioned?
>>
>> Peter
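To make John's pseudocode above concrete, here is a rough sketch of the
simplified reply path as it might look inside hw/ide/atapi.c. The field
blocks_buffered, the size drq_block_size, and the helper
ide_atapi_buffer_blocks() are hypothetical names invented for
illustration; ide_transfer_start(), ide_set_irq() and ide_atapi_cmd_ok()
are the existing hw/ide helpers:

    /* Sketch only: blocks_buffered, drq_block_size and
     * ide_atapi_buffer_blocks() do not exist in QEMU; they stand in for
     * the ring-buffer state and refill routine John describes. */
    void ide_atapi_cmd_reply_end(IDEState *s)
    {
        /* 1. Transfer fully drained by the guest: finish the command. */
        if (s->packet_transfer_size == 0) {
            ide_atapi_cmd_ok(s);
            ide_set_irq(s->bus);
            return;
        }

        /* 2. Ring buffer empty: kick off an asynchronous refill of up to
         *    MIN(128K, bytes remaining); the refill's completion callback
         *    re-enters ide_atapi_cmd_reply_end(). */
        if (s->blocks_buffered < 1) {
            ide_atapi_buffer_blocks(s);
            return;
        }

        /* 3. At least one full DRQ block (at most 0xfffe bytes) is
         *    buffered: hand it to the guest and raise the interrupt. */
        ide_transfer_start(s, s->data_ptr, s->drq_block_size,
                           ide_atapi_cmd_reply_end);
        ide_set_irq(s->bus);
    }

Each refill would then advance data_ptr/data_end around the 128K + 4 byte
io_buffer as described above, so the guest still sees a plain sequence of
DRQ blocks with one interrupt per block.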