On Wed, Apr 23, 2025 at 05:47:44PM +0800, lma wrote:
> On 2025-04-18 23:34, Stefan Hajnoczi wrote:
> > On Thu, Apr 17, 2025 at 07:27:26PM +0800, lma wrote:
> > > Hi all,
> > > 
> > > With SCSI passthrough, if the hardware does not provide a Block
> > > Limits VPD page, QEMU emulates it.
> > > 
> > > Several variables are involved in this process:
> > > * bl.max_transfer
> > > * bl.max_iov, which is associated with IOV_MAX.
> > > * bl.max_hw_iov, which is associated with the max_segments sysfs
> > >   setting of the relevant block device on the host.
> > > * bl.max_hw_transfer, which is associated with the BLKSECTGET ioctl,
> > >   in other words with the current max_sectors_kb sysfs setting of the
> > >   relevant block device on the host.
> > > 
> > > QEMU then takes the smallest of these values and, after the relevant
> > > calculation, returns it as the "Maximum transfer length". See:
> > > static uint64_t calculate_max_transfer(SCSIDevice *s)
> > > {
> > >     uint64_t max_transfer = blk_get_max_hw_transfer(s->conf.blk);
> > >     uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);
> > > 
> > >     assert(max_transfer);
> > >     max_transfer = MIN_NON_ZERO(max_transfer,
> > >                                 max_iov * qemu_real_host_page_size());
> > > 
> > >     return max_transfer / s->blocksize;
> > > }
> > > 
> > > 
> > > However, because of the IOV_MAX limit, no matter how capable the host
> > > SCSI hardware is, the "Maximum transfer length" that QEMU emulates in
> > > the Block Limits VPD page is capped at 8192 sectors with a 4 KB page
> > > size and a 512-byte logical block size.
> > > For example:
> > > host:~ # sg_vpd -p bl /dev/sda
> > > Block limits VPD page (SBC)
> > >   ......
> > >   Maximum transfer length: 0 blocks [not reported]
> > >   ......
> > > 
> > > 
> > > host:~ # cat /sys/class/block/sda/queue/max_sectors_kb
> > > 16384
> > > 
> > > host:~ # cat /sys/class/block/sda/queue/max_hw_sectors_kb
> > > 32767
> > > 
> > > host:~ # cat /sys/class/block/sda/queue/max_segments
> > > 4096
> > > 
> > > 
> > > Expected:
> > > guest:~ # sg_vpd -p bl /dev/sda
> > > Block limits VPD page (SBC)
> > >   ......
> > >   Maximum transfer length: 0x8000
> > >   ......
> > > 
> > > guest:~ # cat /sys/class/block/sda/queue/max_sectors_kb
> > > 16384
> > > 
> > > guest:~ # cat /sys/class/block/sda/queue/max_hw_sectors_kb
> > > 32767
> > > 
> > > 
> > > Actual:
> > > guest:~ # sg_vpd -p bl /dev/sda
> > > Block limits VPD page (SBC)
> > >   ......
> > >   Maximum transfer length: 0x2000
> > >   ......
> > > 
> > > guest:~ # cat /sys/class/block/sda/queue/max_sectors_kb
> > > 4096
> > > 
> > > guest:~ # cat /sys/class/block/sda/queue/max_hw_sectors_kb
> > > 32767
> > > 
> > > 
> > > It seems the current design cannot fully utilize the performance of
> > > the SCSI hardware. I have two questions:
> > > 1. Is it reasonable to drop the IOV_MAX limitation and directly use the
> > >    return value of BLKSECTGET as the maximum transfer length when QEMU
> > >    emulates the Block Limits VPD page?
> > >    If we do so, the maximum transfer length in the guest will be
> > >    consistent with the capabilities of the host hardware.
> > > 
> > > 2. Suppose I set max_sectors_kb in the guest to a value (e.g. 8192 KB)
> > >    that does not exceed the capabilities of the host hardware (e.g.
> > >    16384 KB) but exceeds the limit (e.g. 4096 KB) caused by IOV_MAX.
> > >    Are there any risks in readv()/writev() of raw-posix?
> > 
> > Not a definitive answer, but just something to encourage discussion:
> > 
> > In theory IOV_MAX should not be factored into the Block Limits VPD page
> > Maximum Transfer Length field because there is already a HBA limit on
> > the maximum number of segments. For example, virtio-scsi has a seg_max
> > Configuration Space field that guest drivers honor independently of
> > Maximum Transfer Length.
> > 
> > However, I can imagine why IOV_MAX needs to be factored in:
> > 
> > 1. The maximum number of segments might be hardcoded in guest drivers
> >    for some SCSI HBAs and QEMU has no way of exposing IOV_MAX to the
> >    guest in that case.
> > 
> > 2. Guest physical RAM addresses translate to host virtual memory. That
> >    means 1 segment as seen by the guest might actually require multiple
> >    physical DMA segments on the host. A conservative calculation that
> >    assumes the worst-case 1 iovec per 4 KB memory page prevents the
> >    host maximum segments limit (note this is not the Maximum Transfer
> >    Length limit!) from being exceeded.
> > 
> > So there seem to be at least two problems here. If you relax the
> > calculation there will be corner cases that break because the guest can
> > send too many segments.
> > 
> > Stefan
> 
> The maximum allowed value for
> /sys/class/block/<GUEST_DEV>/queue/max_sectors_kb in the guest OS is
> the smaller of the following two values in the guest OS:
> the "maximum transfer length of block limits VPD page"
> and
> the "/sys/class/block/<GUEST_DEV>/queue/max_hw_sectors_kb".
> 
> 
> The "seg_max Configuration Space field" in hw/scsi/virtio-scsi.c:
> static const Property virtio_scsi_properties[] = {
>     ...
>     DEFINE_PROP_UINT32("max_sectors", VirtIOSCSI,
>                        parent_obj.conf.max_sectors, 0xFFFF),
>     ...
> };
> 
> This field determines the value of max_hw_sectors_kb in sysfs in the
> guest OS. For example, with a 512-byte logical block size, 0xFFFF sectors
> means max_hw_sectors_kb = 0xFFFF/2 = 32767. I believe many users keep
> this default value when using virtio-scsi rather than customizing it.
> 
> But with the current design, because of IOV_MAX, the upper limit of
> /sys/class/block/<GUEST_DEV>/queue/max_sectors_kb is 4096 in the SCSI
> passthrough scenario with a 4 KB page size and a 512-byte logical
> block size. Therefore, the gap between the upper limit of max_sectors_kb
> and max_hw_sectors_kb is very large.
> 
> I think this design logic is a bit strange.

Unless you can think of a different correct way to report block limits
for scsi-generic devices, I think we're stuck with the sub-optimal
conservative value.
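
For reference, the arithmetic behind that conservative value can be
sketched as below. This is only an illustration, not the QEMU code:
sketch_max_transfer_blocks() and min_non_zero() are hypothetical
stand-alone stand-ins for calculate_max_transfer() and MIN_NON_ZERO(),
and the inputs are passed explicitly instead of being read from the
BlockBackend. It assumes the iovec limit ends up at IOV_MAX (1024 on
Linux) and uses the host values quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two values, where 0 means "no limit" and is ignored. */
static uint64_t min_non_zero(uint64_t a, uint64_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/*
 * Hypothetical stand-alone version of the calculation quoted above.
 * max_hw_transfer_bytes: host transfer limit in bytes (BLKSECTGET)
 * max_iov:               iovec limit (assumed to be IOV_MAX = 1024)
 * page_size:             host page size in bytes
 * block_size:            logical block size in bytes
 * Returns the Maximum Transfer Length in logical blocks.
 */
static uint64_t sketch_max_transfer_blocks(uint64_t max_hw_transfer_bytes,
                                           uint32_t max_iov,
                                           uint64_t page_size,
                                           uint32_t block_size)
{
    uint64_t max_transfer;

    assert(max_hw_transfer_bytes);
    /* Worst case: every iovec covers only one page. */
    max_transfer = min_non_zero(max_hw_transfer_bytes,
                                (uint64_t)max_iov * page_size);
    return max_transfer / block_size;
}
```

With max_sectors_kb = 16384 on the host, a 4 KB page size and 512-byte
blocks, sketch_max_transfer_blocks(16384 * 1024, 1024, 4096, 512) gives
0x2000 blocks (4096 KB), which matches the "Actual" output above.
Passing 0 as max_iov, that is, ignoring the iovec limit, gives 0x8000
blocks, the "Expected" value.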

By the way, scsi-disk.c's scsi-block and scsi-hd devices are less
restrictive because the host is able to split requests. Splitting is not
possible for SCSI passthrough requests since they could be
vendor-specific requests and the host does not have enough information
to split them.

Can you use -device scsi-block instead of -device scsi-generic? That
would solve this problem.

Stefan
