On Sun, 2020-09-06 at 23:26 +0800, Tom Yan wrote: > I don't disagree with your proposal, but the thing is, do we even need > another field/limit for case 1? For example, do we *really* need to > clamp sizes[2] (NBD_INFO_BLOCK_SIZE) in nbd/server.c and > max_io_sectors (SCSIBlockLimits) in hw/scsi/scsi-disk.c to to any kind > of "dynamic" limit?
It depends on if we have other block drivers that have IO segment/max transfer sizes, that we need to enforce, other than the SG_IO case. Knowing that the commit that added these limits to file-posix, which I was fixing is relatively new, I guess there are other cases for these limits. I'll check this tomorrow. > > Either way I can add another field (`max_pwrite`, maybe?) to > BlockLimits, as an infrastructure/hint, but what should be switched to > it, and what value should each block driver set it to, is probably > beyond what I can take. Implementation wise, I think I can do this, but I'll wait few days for the feedback on my proposal. Thanks for finding this bug! Best regards, Maxim Levitsky > > IMHO my patches should be merged first (they are really just fixing a > regression and some bugs). If anyone finds it necessary and is capable > and willing to fix the insufficiency of BlockLimits, they can do it > afterwards with another patch series as an improvement. I can add a > patch to remove the clamping in nbd/server.c as a perhaps-temporary > measure to address the original issue, if desired. > > On Sun, 6 Sep 2020 at 20:53, Maxim Levitsky <mlevi...@redhat.com> wrote: > > On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote: > > > Maybe you want to add some condition for this: > > > https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659 > > > Or not clamp it at all. > > > > > > On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.t...@gmail.com> wrote: > > > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears > > > > to have assumed that the only "SCSI Passthrough" is `-device > > > > scsi-generic`, while the fact is there's also `-device scsi-block` > > > > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting > > > > max_sectors is necessary to it (more precisely, hw_max_sectors might > > > > what matters, but BLKSECTGET reports max_sectors, so). > > > > > > > > I'm unsure about how qemu-nbd works, but the commit clearly wasn't the > > > > right approach to fix the original issue it addresses. (It should for > > > > example ignore the max_transfer if it will never matter in to it, or > > > > overrides it in certain cases; when I glimpsed over > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how > > > > it could be file-posix problem when it is reporting the right thing, > > > > regardless of whether "removing" the code helps.) > > > > > > > > I don't think we want to "mark" `-device scsi-block` as sg either. It > > > > will probably bring even more unexpected problems, because they are > > > > literally different sets of things behind the scene / in the kernel. > > > > Yes, I did overlook the fact that scsi-block is kind of hybrid passthough > > device, > > doing SG_IO for some things and regular IO for others. > > > > I don't think that my approach was wrong way to fix the problem, but as you > > found > > out, abusing 'bs->sg' hack (which I would be very happy to remove > > completely) > > wasn't a good idea. > > I actualy was aware of scsi-block and that it does SG_IO but it > > was forgotten some where on the way. > > > > So in summary what the problem is and what I think is the right solution: > > > > > > Each qemu block driver exposes block limits and assumes that they are the > > same > > for two IO interfaces the block driver can expose: > > > > 1. Regular qemu blk_pread/pwrite alike functions > > 2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on > > host block devices/sg char devices, and by iscsi > > > > The problem is that these two interfaces can have different block limits. > > > > I don't know about iscsi, but for files, doing regular IO is always > > unlimited, > > since it passes through the kernel block layer and segemented there on > > demand which is faster that doing it in userspace, while SG_IO is passed as > > is > > to the underlying SCSI device and lacks this segmentation. > > > > Regardless of how NBD uses these limits, I think that these limits should > > be correctly > > exposed by the block drivers, and therefore I propose is that each qemu > > block driver > > would expose a pair of block limits. > > One for the regular block IO, and other for SG_IO. > > > > Then block driver clients (like scsi devices that you mention, nbd, etc) > > can choose which limit to use, depending on which IO api they use. > > The scsi-generic, and scsi-block can use the SG_IO limits, > > while the rest can use the normal (unlimited for file I/O) limits, > > including the NBD. > > > > Best regards, > > Maxim Levitsky > > > > > > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.t...@gmail.com> wrote: > > > > > sg devices have different major/minor than their corresponding > > > > > block devices. Using sysfs to get max segments never really worked > > > > > for them. > > > > > > > > > > Fortunately the sg driver provides an ioctl to get sg_tablesize, > > > > > which is apparently equivalent to max segments. > > > > > > > > > > Signed-off-by: Tom Yan <tom.t...@gmail.com> > > > > > --- > > > > > block/file-posix.c | 17 ++++++++++++++++- > > > > > 1 file changed, 16 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > > > > index 411a23cf99..9e37594145 100644 > > > > > --- a/block/file-posix.c > > > > > +++ b/block/file-posix.c > > > > > @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd) > > > > > #endif > > > > > } > > > > > > > > > > +static int sg_get_max_segments(int fd) > > > > > +{ > > > > > +#ifdef SG_GET_SG_TABLESIZE > > > > > + long max_segments = 0; > > > > > + > > > > > + if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) { > > > > > + return max_segments; > > > > > + } else { > > > > > + return -errno; > > > > > + } > > > > > +#else > > > > > + return -ENOSYS; > > > > > +#endif > > > > > +} > > > > > + > > > > > static int get_max_transfer_length(int fd) > > > > > { > > > > > #if defined(BLKSECTGET) && defined(BLKSSZGET) > > > > > @@ -1268,7 +1283,7 @@ static void > > > > > hdev_refresh_limits(BlockDriverState *bs, Error **errp) > > > > > bs->bl.max_transfer = pow2floor(ret); > > > > > } > > > > > > > > > > - ret = get_max_segments(s->fd); > > > > > + ret = bs->sg ? sg_get_max_segments(s->fd) : > > > > > get_max_segments(s->fd); > > > > > if (ret > 0) { > > > > > bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, > > > > > ret * > > > > > qemu_real_host_page_size); > > > > > -- > > > > > 2.28.0 > > > > >