On 2025-04-18 23:34, Stefan Hajnoczi wrote:
On Thu, Apr 17, 2025 at 07:27:26PM +0800, lma wrote:
Hi all,
In the SCSI passthrough case, if the hardware does not return the Block
Limits VPD page, QEMU emulates it.
Several variables are involved in this process:
* bl.max_transfer
* bl.max_iov, which is associated with IOV_MAX.
* bl.max_hw_iov, which is associated with the max_segments sysfs setting
  of the relevant block device on the host.
* bl.max_hw_transfer, which is associated with the BLKSECTGET ioctl, in
  other words with the current max_sectors_kb sysfs setting of the
  relevant block device on the host.
QEMU then takes the smallest of these values and, after the relevant
calculation, returns it as the "Maximum transfer length". See:
static uint64_t calculate_max_transfer(SCSIDevice *s)
{
    uint64_t max_transfer = blk_get_max_hw_transfer(s->conf.blk);
    uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);

    assert(max_transfer);
    max_transfer = MIN_NON_ZERO(max_transfer,
                                max_iov * qemu_real_host_page_size());

    return max_transfer / s->blocksize;
}
However, because of the IOV_MAX limit, no matter how capable the host
SCSI hardware is, the "Maximum transfer length" that QEMU emulates in the
Block Limits VPD page is capped at 8192 sectors with a 4 KB page size and
a 512-byte logical block size.
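
The arithmetic behind that cap can be sketched as a standalone helper. This is only an illustration, not QEMU code: capped_max_transfer_blocks() and min_non_zero() are hypothetical names, and the values used below (IOV_MAX of 1024, 4 KB pages, 512-byte blocks) are assumptions taken from the scenario described here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MIN_NON_ZERO(): 0 means "no limit reported". */
static uint64_t min_non_zero(uint64_t a, uint64_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/* Same shape as calculate_max_transfer(), with every input passed in
 * explicitly so the effect of each limit is visible. */
static uint64_t capped_max_transfer_blocks(uint64_t max_hw_transfer_bytes,
                                           uint32_t max_iov,
                                           uint64_t page_size,
                                           uint32_t block_size)
{
    assert(max_hw_transfer_bytes != 0);
    assert(block_size != 0);

    /* Worst case: one iovec per host page. */
    uint64_t iov_limit_bytes = (uint64_t)max_iov * page_size;
    uint64_t limit = min_non_zero(max_hw_transfer_bytes, iov_limit_bytes);

    return limit / block_size;
}
```

With a 16384 KB hardware limit, calling it with max_iov = 1024 yields 8192 blocks (0x2000), while max_iov = 4096 (the host's max_segments in the example below) would yield 32768 blocks (0x8000).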
For example:
host:~ # sg_vpd -p bl /dev/sda
Block limits VPD page (SBC)
......
Maximum transfer length: 0 blocks [not reported]
......
host:~ # cat /sys/class/block/sda/queue/max_sectors_kb
16384
host:~ # cat /sys/class/block/sda/queue/max_hw_sectors_kb
32767
host:~ # cat /sys/class/block/sda/queue/max_segments
4096
Expected:
guest:~ # sg_vpd -p bl /dev/sda
Block limits VPD page (SBC)
......
Maximum transfer length: 0x8000
......
guest:~ # cat /sys/class/block/sda/queue/max_sectors_kb
16384
guest:~ # cat /sys/class/block/sda/queue/max_hw_sectors_kb
32767
Actual:
guest:~ # sg_vpd -p bl /dev/sda
Block limits VPD page (SBC)
......
Maximum transfer length: 0x2000
......
guest:~ # cat /sys/class/block/sda/queue/max_sectors_kb
4096
guest:~ # cat /sys/class/block/sda/queue/max_hw_sectors_kb
32767
It seems the current design cannot fully utilize the performance of the
SCSI hardware. I have two questions:
1. Would it be reasonable to drop the IOV_MAX limitation and directly use
the return value of BLKSECTGET as the maximum transfer length when QEMU
emulates the Block Limits VPD page? If we did so, the maximum transfer
length in the guest would be consistent with the capabilities of the
host hardware.
2. Suppose I set max_sectors_kb in the guest to a value (e.g. 8192 KB)
that does not exceed the capabilities of the host hardware (e.g. 16384
KB) but exceeds the limit imposed by IOV_MAX (e.g. 4096 KB). Are there
any risks in readv()/writev() in raw-posix?
Not a definitive answer, but just something to encourage discussion:
In theory IOV_MAX should not be factored into the Block Limits VPD page
Maximum Transfer Length field because there is already a HBA limit on
the maximum number of segments. For example, virtio-scsi has a seg_max
Configuration Space field that guest drivers honor independently of
Maximum Transfer Length.
However, I can imagine why IOV_MAX needs to be factored in:
1. The maximum number of segments might be hardcoded in guest drivers
for some SCSI HBAs and QEMU has no way of exposing IOV_MAX to the
guest in that case.
2. Guest physical RAM addresses translate to host virtual memory. That
means 1 segment as seen by the guest might actually require multiple
physical DMA segments on the host. A conservative calculation that
assumes the worst-case 1 iovec per 4 KB memory page prevents the
host maximum segments limit (note this is not the Maximum Transfer
Length limit!) from being exceeded.
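
The worst-case reasoning in point 2 can be sketched as follows. This is a simplified model, not QEMU code: worst_case_iovecs() and fits_in_iov_limit() are hypothetical helpers, and the model assumes a page-aligned buffer in which every host page ends up as its own iovec.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed worst case: each host page of the transfer is a separate iovec,
 * so the count is the transfer size rounded up to whole pages. */
static uint64_t worst_case_iovecs(uint64_t transfer_bytes, uint64_t page_size)
{
    assert(page_size != 0);
    return (transfer_bytes + page_size - 1) / page_size;
}

/* Returns 1 if the worst case still fits within iov_max, 0 otherwise. */
static int fits_in_iov_limit(uint64_t transfer_bytes, uint64_t page_size,
                             uint32_t iov_max)
{
    return worst_case_iovecs(transfer_bytes, page_size) <= iov_max;
}
```

Under these assumptions, with 4 KB pages and an iov_max of 1024, a 4096 KB request needs at most 1024 iovecs and fits, while an 8192 KB request could need 2048 and does not.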
So there seem to be at least two problems here. If you relax the
calculation there will be corner cases that break because the guest can
send too many segments.
Stefan
In the guest OS, the maximum allowed value for
/sys/class/block/<GUEST_DEV>/queue/max_sectors_kb is the smaller of the
following two values, both as seen by the guest:
the "maximum transfer length of block limits VPD page"
and
the "/sys/class/block/<GUEST_DEV>/queue/max_hw_sectors_kb".
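
As a rough model of that rule (not the guest kernel's actual code), the upper bound could be computed like this. guest_max_sectors_kb_limit() is a hypothetical name, and treating a VPD value of 0 as "not reported" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound for max_sectors_kb in the guest, in KB: the smaller of the
 * VPD maximum transfer length (converted from blocks to KB) and
 * max_hw_sectors_kb. */
static uint64_t guest_max_sectors_kb_limit(uint64_t vpd_max_xfer_blocks,
                                           uint32_t block_size,
                                           uint64_t max_hw_sectors_kb)
{
    assert(block_size != 0);

    /* Assumption: 0 means the device did not report a limit. */
    if (vpd_max_xfer_blocks == 0) {
        return max_hw_sectors_kb;
    }

    uint64_t vpd_kb = vpd_max_xfer_blocks * block_size / 1024;
    return vpd_kb < max_hw_sectors_kb ? vpd_kb : max_hw_sectors_kb;
}
```

With 512-byte blocks and max_hw_sectors_kb = 32767, a VPD value of 0x2000 gives 4096 and a VPD value of 0x8000 gives 16384, which matches the "Actual" and "Expected" guest outputs in my first mail.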
The "seg_max Configuration Space field" in hw/scsi/virtio-scsi.c:
static const Property virtio_scsi_properties[] = {
    ...
    DEFINE_PROP_UINT32("max_sectors", VirtIOSCSI,
                       parent_obj.conf.max_sectors,
                       0xFFFF),
    ...
};
This field determines the value of max_hw_sectors_kb in sysfs in the
guest OS. For example, with a 512-byte logical block size, 0xFFFF sectors
means max_hw_sectors_kb = 0xFFFF / 2 = 32767. I believe many users keep
this default value when using virtio-scsi rather than customizing it.
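
The conversion is plain integer arithmetic. The sketch below uses a hypothetical helper, sectors_to_kb(), and takes the block size as an explicit parameter:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a sector count to KB, rounding down as integer division does. */
static uint64_t sectors_to_kb(uint64_t sectors, uint32_t block_size)
{
    assert(block_size != 0);
    return sectors * block_size / 1024;
}
```

For the default of 0xFFFF sectors and 512-byte blocks this gives 32767, the value shown for max_hw_sectors_kb in the guest.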
But with the current design, because of IOV_MAX, the upper limit of
/sys/class/block/<GUEST_DEV>/queue/max_sectors_kb is 4096 in the SCSI
passthrough scenario with a 4 KB page size and a 512-byte logical block
size. As a result, the gap between the upper limit of max_sectors_kb and
max_hw_sectors_kb is very large.
I think this design logic is a bit strange.
Anyway, thanks for the detailed answer,
Lin