On 23.09.2014 at 11:32, Peter Lieven wrote:
> On 23.09.2014 10:59, Kevin Wolf wrote:
> >On 23.09.2014 at 08:15, Peter Lieven wrote:
> >>On 22.09.2014 21:06, Paolo Bonzini wrote:
> >>>On 22/09/2014 11:43, Peter Lieven wrote:
> >>>>This series does not aim to change the default behaviour. The default
> >>>>for max_transfer_length is 0 (no limit). max_transfer_length is a limit
> >>>>that MUST be satisfied, otherwise the request will fail. Patch 2 aims
> >>>>to catch this failure earlier in the stack.
> >>>Understood. But the right fix is to keep backend limits from leaking
> >>>into the guest ABI, not to catch the limits earlier. So the right fix
> >>>would be to implement request splitting.
> >>Since you proposed adding traces for this, would you leave those in?
> >>And since iSCSI is the only user of this at the moment, would you
> >>implement this check in the iSCSI block layer?
> >>
> >>As for the split logic, do you think it is enough to build new qiovs
> >>out of the too-big iov without copying the contents? This would work
> >>as long as no single iov inside the qiov is bigger than
> >>max_transfer_length. Otherwise I would need to allocate temporary
> >>buffers and copy data around.
> >You can split single iovs, too. There are functions that make this very
> >easy: they copy a sub-qiov with a byte-granularity offset and length
> >(qemu_iovec_concat and friends). qcow2 uses the same to split requests
> >at (fragmented) cluster boundaries.
>
> Might it be as easy as this?
This is completely untested, right? :-)

But ignoring bugs, the principle looks right.

> static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
>     BdrvRequestFlags flags)
> {
>     if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
>         return -EINVAL;
>     }
>
>     if (bs->bl.max_transfer_length &&
>         nb_sectors > bs->bl.max_transfer_length) {
>         int ret = 0;
>         QEMUIOVector *qiov2 = NULL;

Make it "QEMUIOVector qiov2;" on the stack.

>         size_t soffset = 0;
>
>         trace_bdrv_co_do_readv_toobig(bs, sector_num, nb_sectors,
>                                       bs->bl.max_transfer_length);
>
>         qemu_iovec_init(qiov2, qiov->niov);

And &qiov2 here, then this doesn't crash with a NULL dereference.

>         while (nb_sectors > bs->bl.max_transfer_length && !ret) {
>             qemu_iovec_reset(qiov2);
>             qemu_iovec_concat(qiov2, qiov, soffset,
>                               bs->bl.max_transfer_length << BDRV_SECTOR_BITS);
>             ret = bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>                                     bs->bl.max_transfer_length <<
>                                     BDRV_SECTOR_BITS,
>                                     qiov2, flags);
>             soffset += bs->bl.max_transfer_length << BDRV_SECTOR_BITS;
>             sector_num += bs->bl.max_transfer_length;
>             nb_sectors -= bs->bl.max_transfer_length;
>         }
>         qemu_iovec_destroy(qiov2);
>         if (ret) {
>             return ret;
>         }

The error check needs to be immediately after the assignment of ret,
otherwise the next loop iteration can overwrite an error with a success
(and if it didn't, it would still do useless I/O because the request as
a whole would fail anyway).

>     }
>
>     return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
>                              nb_sectors << BDRV_SECTOR_BITS, qiov, flags);

qiov doesn't work here for the splitting case. You need the remaining
part, not the whole original qiov.

Kevin
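The three review points above (keep the sub-vector on the stack, check ret right after each chunk, and give the final request only the remaining part of the vector) can be illustrated with a small standalone C sketch. It is not QEMU code: `struct sg_list`, `sg_concat()`, `split_readv()`, `struct mock_disk` and `mock_read()` are hypothetical stand-ins for QEMUIOVector, qemu_iovec_concat(), the block layer read path and a backend with a transfer limit, and the sketch assumes POSIX `struct iovec` from <sys/uio.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* POSIX struct iovec */

#define SECTOR_BITS 9   /* 512-byte sectors, as with BDRV_SECTOR_BITS */
#define MAX_SUB_IOV 16  /* hypothetical fixed capacity for this sketch */

/* Hypothetical stand-in for QEMUIOVector: a bounded array of iovecs. */
struct sg_list {
    struct iovec iov[MAX_SUB_IOV];
    int niov;
};

/* Append bytes [offset, offset + len) of src to dst without copying any
 * payload; a single source iovec may be cut at both ends.  Similar in
 * spirit to qemu_iovec_concat(), but not its real implementation. */
static int sg_concat(struct sg_list *dst, const struct iovec *src,
                     int src_niov, size_t offset, size_t len)
{
    for (int i = 0; i < src_niov && len > 0; i++) {
        if (offset >= src[i].iov_len) {    /* range starts past this entry */
            offset -= src[i].iov_len;
            continue;
        }
        size_t n = src[i].iov_len - offset;
        if (n > len) n = len;
        if (dst->niov == MAX_SUB_IOV) return -EINVAL;
        dst->iov[dst->niov].iov_base = (char *)src[i].iov_base + offset;
        dst->iov[dst->niov++].iov_len = n;
        len -= n;
        offset = 0;
    }
    return len == 0 ? 0 : -EINVAL;         /* source shorter than range */
}

/* Hypothetical backend read callback. */
typedef int (*read_fn)(void *opaque, long sector_num, int nb_sectors,
                       const struct sg_list *sg);

/* Read nb_sectors into iov in chunks of at most max_sectors (0 = no limit). */
static int split_readv(read_fn do_read, void *opaque, unsigned max_sectors,
                       long sector_num, int nb_sectors,
                       const struct iovec *iov, int niov)
{
    size_t soffset = 0;

    if (nb_sectors < 0) {
        return -EINVAL;
    }
    while (nb_sectors > 0) {
        int chunk = nb_sectors;
        if (max_sectors && (unsigned)chunk > max_sectors) {
            chunk = (int)max_sectors;
        }
        /* Sub-vector on the stack, rebuilt for every chunk (the last one
         * too), so each request only covers its own part of iov. */
        struct sg_list sub = { .niov = 0 };
        int ret = sg_concat(&sub, iov, niov, soffset,
                            (size_t)chunk << SECTOR_BITS);
        if (ret == 0) {
            ret = do_read(opaque, sector_num, chunk, &sub);
        }
        if (ret < 0) {
            return ret;  /* checked at once: no overwrite, no wasted I/O */
        }
        soffset += (size_t)chunk << SECTOR_BITS;
        sector_num += chunk;
        nb_sectors -= chunk;
    }
    return 0;
}

/* Hypothetical in-memory backend that, like the iSCSI case, rejects
 * requests above its own limit; fail_call injects an error (0 = never). */
struct mock_disk {
    const unsigned char *data;
    unsigned max_sectors;
    int fail_call;
    int calls;
};

static int mock_read(void *opaque, long sector_num, int nb_sectors,
                     const struct sg_list *sg)
{
    struct mock_disk *d = opaque;
    d->calls++;
    if (d->calls == d->fail_call ||
        (d->max_sectors && (unsigned)nb_sectors > d->max_sectors)) {
        return -EIO;
    }
    const unsigned char *src = d->data + ((size_t)sector_num << SECTOR_BITS);
    for (int i = 0; i < sg->niov; i++) {
        memcpy(sg->iov[i].iov_base, src, sg->iov[i].iov_len);
        src += sg->iov[i].iov_len;
    }
    return 0;
}
```

Rebuilding the sub-vector for every chunk, including the last one, avoids a separate tail path, so the final request can never be handed the whole original vector. With a 10-sector read, a limit of 3 sectors and iovec entries of 700, 1000 and 3420 bytes, the chunks straddle entry boundaries and single entries get cut, which is exactly the case where only concatenating whole iovs would not be enough.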