Hi,

we are facing sporadic latency issues in our guests due to the
synchronous nature of bdrv_check_byte_request on raw images:

#0  0x00007ff8962e2070 in lseek64 () from /lib64/libpthread.so.0
#1  0x00000000004d7f97 in raw_getlength (bs=0xcab010) at block/raw-posix.c:704
#2  0x00000000004b9d22 in bdrv_getlength (bs=0xcab010) at block.c:816
#3  0x00000000004b95f0 in bdrv_check_byte_request (bs=0xcab010, offset=233472, size=4096) at block.c:618
#4  0x00000000004b9664 in bdrv_check_request (bs=0xcab010, sector_num=456, nb_sectors=8) at block.c:632
#5  0x00000000004bb56a in bdrv_aio_readv (bs=0xcab010, sector_num=456, qiov=0xcf5d48, nb_sectors=8, cb=0x4eaddc <scsi_read_complete>, opaque=0xcf5cc0) at block.c:1554

The host tends to hang in lseek on some kernel mutex. Instead of
addressing this with the big hammer (a more preemptible kernel), I
wondered why we need that many raw_getlength requests, with all their
file open/ioctl/close/whatever syscalls, and why they have to sit in
the asynchronous I/O path.

Looking at bdrv_check_byte_request, I find

    if (bs->growable)
        return 0;

i.e. out-of-bounds requests on growable devices are always left to the
respective I/O handler. Only fixed-size devices (like the classic raw
disk image file...) go through

    len = bdrv_getlength(bs);

    if (offset < 0)
        return -EIO;

    if ((offset > len) || (len - offset < size))
        return -EIO;

for _each_ request!? Why? Looks like there is room for improvement, and
not only w.r.t. latency, doesn't it?
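
To make the idea concrete, here is a rough sketch of one possible
direction: query the length once, keep it, and let the per-request
check compare against the cached value. This is not a patch against
block.c. SketchBDS and all sketch_* functions are hypothetical names
made up for illustration, and the sketch assumes that every path which
can change the image size (truncate, resize, media change) is able to
invalidate the cache, which would have to be verified, in particular
for host devices.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the few BlockDriverState fields used here. */
typedef struct SketchBDS {
    int fd;             /* backing file descriptor */
    int growable;       /* non-zero: no bounds check, as in the excerpt */
    int64_t cached_len; /* image length in bytes, negative if unknown */
} SketchBDS;

/* Slow path: one lseek() syscall per call, like frame #0/#1 above. */
static int64_t sketch_query_length(SketchBDS *bs)
{
    off_t len = lseek(bs->fd, 0, SEEK_END);
    return len < 0 ? -errno : (int64_t)len;
}

/* Only hit the kernel when no valid length is cached. A failed query
 * leaves a negative value behind, so the next call simply retries. */
static int64_t sketch_getlength_cached(SketchBDS *bs)
{
    assert(bs != NULL);
    if (bs->cached_len < 0) {
        bs->cached_len = sketch_query_length(bs);
    }
    return bs->cached_len;
}

/* To be called from whatever changes the image size (assumption). */
static void sketch_invalidate_length(SketchBDS *bs)
{
    bs->cached_len = -1;
}

/* Same checks as the quoted code, but without a syscall per request. */
static int sketch_check_byte_request(SketchBDS *bs, int64_t offset,
                                     size_t size)
{
    int64_t len;

    if (bs->growable)
        return 0;

    if (offset < 0)
        return -EIO;

    len = sketch_getlength_cached(bs);
    if (len < 0)
        return -EIO;

    if ((offset > len) || ((uint64_t)(len - offset) < size))
        return -EIO;

    return 0;
}
```

With something along these lines, the first request on a fixed-size
image pays for one lseek and all later requests are pure arithmetic
until someone calls sketch_invalidate_length.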

Jan


PS: This should also explain at least some of the latencies I once measured
with prio-boosted VCPU threads on PREEMPT-RT kernels.

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

