On 10/16/19 1:32 PM, Gerd Hoffmann wrote:
> On Tue, Oct 01, 2019 at 10:25:00AM +0300, Denis Plotnikov wrote:
>> Some Linux kernels have a performance flaw in virtio block device access.
>> On some frequent disk access patterns, e.g. a 1 MB read, the kernel produces
>> more block requests than needed. This happens because the virtio seg_max
>> parameter is set to 126 (virtqueue_size - 2), which limits the maximum block
>> request to 516096 bytes (126 segments * 4096-byte pages).
>>
>> Setting seg_max > 126 fixes the issue; however, not all Linux kernels
>> allow that. Old ones enforce the restriction virtqueue_size >= seg_max.
>> This restriction is hardcoded, and the kernel crashes if it is violated.
>> The restriction is relaxed in recent kernels. Windows kernels don't
>> have such a restriction.
>>
>> To increase seg_max without breaking the restriction, one can increase the
>> virtqueue size to 256 and seg_max to 254. To do that, SeaBIOS support for a
>> 256-entry virtqueue is needed.
> --verbose please.  Do you talk about guest kernels?  Or host kernels?
We are talking about guest kernels. Host ones are not a problem :) We can
easily query them. The problem in the guest is fixed with

        commit 44ed8089e991a60d614abe0ee4b9057a28b364e4
        Author: Richard W.M. Jones <address@hidden>
        Date:   Thu Aug 10 17:56:51 2017 +0100
   
            scsi: virtio: Reduce BUG if total_sg > virtqueue size to WARN.
   
            If using indirect descriptors, you can make the total_sg as
            large as you want.  If not, BUG is too serious because the
            function later returns -ENOSPC

Without that patch the guest crashes if a request contains more
segments than the virtqueue length, even when indirect
descriptors are used.

> Why does changing seabios fix the kernel bug?
Changing SeaBIOS does not fix the guest kernel bug :) but allows
us to work around it.

Linux guests submit IO requests no longer than PAGE_SIZE * the max_seg
value reported by the SCSI controller. Thus a typical sequential read of
1 MB results in the following IO pattern from the guest:
      8,16   1    15754     2.766095122  2071  D   R 2095104 + 1008 [dd]
      8,16   1    15755     2.766108785  2071  D   R 2096112 + 1008 [dd]
      8,16   1    15756     2.766113486  2071  D   R 2097120 + 32 [dd]
      8,16   1    15757     2.767668961     0  C   R 2095104 + 1008 [0]
      8,16   1    15758     2.768534315     0  C   R 2096112 + 1008 [0]
      8,16   1    15759     2.768539782     0  C   R 2097120 + 32 [0]
The IO was generated by
      dd if=/dev/sda of=/dev/null bs=1024 iflag=direct

So, in order to fix this and observe the normal pattern
      8,16   1     9921     2.662721340  2063  D   R 2095104 + 1024 [dd]
      8,16   1     9922     2.662737585  2063  D   R 2096128 + 1024 [dd]
      8,16   1     9923     2.665188167     0  C   R 2095104 + 1024 [0]
      8,16   1     9924     2.665198777     0  C   R 2096128 + 1024 [0]
we need max_segments to be no less than 128.

However, this can be achieved only when the virtqueue size is > 128, due
to the guest bug described above. We gave up trying to invent any
usable guest-detection code; support for VIRTIO 1.0 is not an indicator.
The bug is triggered in many popular guest distros, e.g. RHEL 7.

We solved the puzzle by setting the virtqueue size to 256 locally in
our machine types. And here we hit the SeaBIOS assert! It is triggered
once the queue size is set above 128.

>   Is this just a
> performance issue (first paragraph sounds like it is, and that would not
> be that much of a problem IMO given that virtio-blk isn't tweaked for
> performance anyway)?  Or something more serious?

The patch sent by Denis does not change anything inside SeaBIOS, but
it allows SeaBIOS to accept a longer queue, as specified by QEMU.
That is all.

Hope this is verbose enough :)

Den
_______________________________________________
SeaBIOS mailing list -- [email protected]
To unsubscribe send an email to [email protected]
