On Wed, Apr 23, 2025 at 05:47:44PM +0800, lma wrote:
> On 2025-04-18 23:34, Stefan Hajnoczi wrote:
> > On Thu, Apr 17, 2025 at 07:27:26PM +0800, lma wrote:
> > > Hi all,
> > >
> > > In the case of SCSI passthrough, if the hardware does not provide a
> > > Block Limits VPD page response, QEMU emulates it.
> > >
> > > Several variables are involved in this process:
> > > * bl.max_transfer
> > > * bl.max_iov, which is associated with IOV_MAX
> > > * bl.max_hw_iov, which is associated with the max_segments sysfs
> > >   setting of the relevant block device on the host
> > > * bl.max_hw_transfer, which is associated with the BLKSECTGET ioctl,
> > >   i.e. with the current max_sectors_kb sysfs setting of the relevant
> > >   block device on the host
> > >
> > > QEMU takes the smallest value and, after the relevant calculation,
> > > returns it as the "Maximum transfer length", see:
> > >
> > > static uint64_t calculate_max_transfer(SCSIDevice *s)
> > > {
> > >     uint64_t max_transfer = blk_get_max_hw_transfer(s->conf.blk);
> > >     uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);
> > >
> > >     assert(max_transfer);
> > >     max_transfer = MIN_NON_ZERO(max_transfer,
> > >                                 max_iov * qemu_real_host_page_size());
> > >
> > >     return max_transfer / s->blocksize;
> > > }
> > >
> > >
> > > However, due to the IOV_MAX limitation, no matter how capable the host
> > > SCSI hardware is, the "Maximum transfer length" that QEMU emulates in
> > > the Block Limits VPD page is capped at 8192 sectors with a 4 KB page
> > > size and a 512-byte logical block size.
> > > For example:
> > > host:~ # sg_vpd -p bl /dev/sda
> > > Block limits VPD page (SBC)
> > > ......
> > > Maximum transfer length: 0 blocks [not reported]
> > > ......
> > >
> > >
> > > host:~ # cat /sys/class/block/sda/queue/max_sectors_kb
> > > 16384
> > >
> > > host:~ # cat /sys/class/block/sda/queue/max_hw_sectors_kb
> > > 32767
> > >
> > > host:~ # cat /sys/class/block/sda/queue/max_segments
> > > 4096
> > >
> > >
> > > Expected:
> > > guest:~ # sg_vpd -p bl /dev/sda
> > > Block limits VPD page (SBC)
> > > ......
> > > Maximum transfer length: 0x8000
> > > ......
> > >
> > > guest:~ # cat /sys/class/block/sda/queue/max_sectors_kb
> > > 16384
> > >
> > > guest:~ # cat /sys/class/block/sda/queue/max_hw_sectors_kb
> > > 32767
> > >
> > >
> > > Actual:
> > > guest:~ # sg_vpd -p bl /dev/sda
> > > Block limits VPD page (SBC)
> > > ......
> > > Maximum transfer length: 0x2000
> > > ......
> > >
> > > guest:~ # cat /sys/class/block/sda/queue/max_sectors_kb
> > > 4096
> > >
> > > guest:~ # cat /sys/class/block/sda/queue/max_hw_sectors_kb
> > > 32767
> > >
> > >
> > > The current design does not seem able to make full use of the SCSI
> > > hardware's performance. I have two questions:
> > > 1. Would it be reasonable to drop the IOV_MAX limitation and directly
> > >    use the return value of BLKSECTGET as the maximum transfer length
> > >    when QEMU emulates the Block Limits VPD page? If we did so, the
> > >    maximum transfer length in the guest would be consistent with the
> > >    capabilities of the host hardware.
> > >
> > > 2. Besides, assume I set max_sectors_kb in the guest to a value (e.g.
> > >    8192 KB) that doesn't exceed the capabilities of the host hardware
> > >    (e.g. 16384 KB) but exceeds the limit caused by IOV_MAX (e.g.
> > >    4096 KB). Are there any risks in readv()/writev() in raw-posix?
> >
> > Not a definitive answer, but just something to encourage discussion:
> >
> > In theory IOV_MAX should not be factored into the Block Limits VPD page
> > Maximum Transfer Length field because there is already an HBA limit on
> > the maximum number of segments.
> > For example, virtio-scsi has a seg_max Configuration Space field that
> > guest drivers honor independently of Maximum Transfer Length.
> >
> > However, I can imagine why IOV_MAX needs to be factored in:
> >
> > 1. The maximum number of segments might be hardcoded in guest drivers
> >    for some SCSI HBAs and QEMU has no way of exposing IOV_MAX to the
> >    guest in that case.
> >
> > 2. Guest physical RAM addresses translate to host virtual memory. That
> >    means 1 segment as seen by the guest might actually require multiple
> >    physical DMA segments on the host. A conservative calculation that
> >    assumes the worst-case 1 iovec per 4 KB memory page prevents the
> >    host maximum segments limit (note this is not the Maximum Transfer
> >    Length limit!) from being exceeded.
> >
> > So there seem to be at least two problems here. If you relax the
> > calculation there will be corner cases that break because the guest can
> > send too many segments.
> >
> > Stefan
>
> The maximum allowed value of
> /sys/class/block/<GUEST_DEV>/queue/max_sectors_kb in the guest OS is
> limited by the smaller of the following two values in the guest OS:
> the "maximum transfer length of the Block Limits VPD page"
> and
> "/sys/class/block/<GUEST_DEV>/queue/max_hw_sectors_kb".
>
>
> The "seg_max Configuration Space field" in hw/scsi/virtio-scsi.c:
> static const Property virtio_scsi_properties[] = {
>     ...
>     DEFINE_PROP_UINT32("max_sectors", VirtIOSCSI, parent_obj.conf.max_sectors,
>                        0xFFFF),
>     ...
> };
>
> This field determines the value of max_hw_sectors_kb in sysfs in the
> guest OS. For example, with a 512-byte logical block size, 0xFFFF
> sectors means max_hw_sectors_kb = 0xFFFF / 2 = 32767. I believe many
> users keep this default value when using virtio-scsi rather than
> customizing it.
>
> But with the current design, because of IOV_MAX, the upper limit of
> /sys/class/block/<GUEST_DEV>/queue/max_sectors_kb is 4096 in the SCSI
> passthrough scenario with a 4 KB page size and a 512-byte logical
> block size. Therefore, the gap between the upper limit of
> max_sectors_kb and max_hw_sectors_kb is very large.
>
> I think this design logic is a bit strange.

Unless you can think of a different, correct way to report block limits
for scsi-generic devices, I think we're stuck with the sub-optimal
conservative value.

By the way, scsi-disk.c's scsi-block and scsi-hd devices are less
restrictive because the host is able to split requests. Splitting is not
possible for SCSI passthrough requests since they could be vendor-specific
requests and the host does not have enough information to split them.

Can you use -device scsi-block instead of -device scsi-generic? That would
solve this problem.

Stefan