On 10/12/2014 15:35, Ming Lei wrote:
>>> It is _not_ never happen at all, and easy to be triggered when using
>>> mkfs.
>>
>> mkfs is not something to optimize for, it's just something that should
>> work.  (Also, some hardware may time out if you do write same with too
>> high a block count).
>
> I don't think it is related with the hardware time out issue since your
> patch still splits the block count into 2G - 1, and both are same wrt.
> block count.

If the guest sends a 1TB WRITE SAME, it's more likely to time out.

>> Both Linux and Windows will always use UNMAP on QEMU, except for the
>> small time period where Linux used WRITE SAME and this bug was
>> discovered.  And all versions of Linux that used WRITE SAME honored the
>> max_ws_blocks field.
>
> Not sure how you get the conclusion.

Because the WRITE SAME patch was submitted ~1 month ago.  Windows uses
UNMAP because Microsoft says so.

> Secondly SBC-3 draft doesn't describe the priority explicitly among
> UNMAP, WRITE SAME 10, and WRITE SAME 16, so it is driver's
> freedom to take anyone in theory.

Sure, but WRITE SAME with UNMAP doesn't make sense if you do not have
LBPRZ, which QEMU does not set.  In fact the only sensible things to do
are:

- use WRITE SAME if LBPRZ

- use UNMAP if !LBPRZ

So any sensible guest will use UNMAP.

> Finally blkdev_issue_zeroout() can send WRITE SAME(10/16) directly
> and it can be from user space, fs, and block drivers.

That is WRITE SAME without UNMAP, it is not used by mkfs, and Linux has
always honored max_write_same_blocks for it (defaulting to a 65535 block
limit for older devices that did not report a limit).

So what *concrete* case would be fixed by adding extra little-used code
in QEMU to do the split?

Paolo

> Thanks,
> Ming Lei
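
For illustration only, the guest-side policy discussed in this thread could
be sketched roughly as below: pick WRITE SAME with UNMAP only when the
device reports LBPRZ, otherwise use UNMAP, and clamp each WRITE SAME to the
reported limit with a 65535-block fallback.  All identifiers are
hypothetical and are not taken from the Linux or QEMU sources; treating a
reported limit of 0 as "not reported" is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; not Linux or QEMU identifiers. */
enum discard_cmd { DISCARD_UNMAP, DISCARD_WRITE_SAME_UNMAP };

/* Fallback when the device reports no limit (65535 blocks, as mentioned above). */
#define FALLBACK_MAX_WS_BLOCKS 65535u

/* WRITE SAME with the UNMAP bit is only sensible if LBPRZ is set. */
static enum discard_cmd choose_discard_cmd(bool lbprz)
{
    return lbprz ? DISCARD_WRITE_SAME_UNMAP : DISCARD_UNMAP;
}

/* Block count for the next WRITE SAME; max_ws_blocks == 0 means "not reported". */
static uint32_t next_ws_chunk(uint64_t remaining, uint32_t max_ws_blocks)
{
    uint32_t limit = max_ws_blocks ? max_ws_blocks : FALLBACK_MAX_WS_BLOCKS;

    assert(remaining > 0);
    return remaining < limit ? (uint32_t)remaining : limit;
}

/* Number of WRITE SAME commands a limit-honoring guest would issue. */
static uint64_t ws_command_count(uint64_t nb_blocks, uint32_t max_ws_blocks)
{
    uint64_t count = 0;

    while (nb_blocks > 0) {
        nb_blocks -= next_ws_chunk(nb_blocks, max_ws_blocks);
        count++;
    }
    return count;
}
```

With such a guest the device never sees a request above the limit, which is
why splitting on the QEMU side would only matter for a guest that ignores
the reported limit.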