On 01/28/2017 07:23 PM, Max Reitz wrote: > On 25.01.2017 20:44, Denis V. Lunev wrote: >> On 01/25/2017 08:59 PM, Max Reitz wrote: >>> [CC-ing John] >>> >>> On 25.01.2017 17:42, Denis V. Lunev wrote: >>>> Technically there is a problem when the guest DVD is created by libvirt >>>> with AIO mode 'native' on Linux. Current QEMU is unable to start the >>>> domain configured as follows: >>>> <disk type='file' device='cdrom'> >>>> <driver name='qemu' type='raw' cache='none' io='native'/> >>>> <target dev='sdb' bus='scsi'/> >>>> <readonly/> >>>> </disk> >>>> The problem comes from the combination of 'cache' and 'io' options. >>>> >>>> 'io' option is common option and it is removed from block driver >>>> specific options. 'cache' originally is not. The patch makes 'cache' >>>> option common. This works fine as long as cdrom media insertion >>>> later on. >>>> >>>> Signed-off-by: Denis V. Lunev <d...@openvz.org> >>>> CC: Kevin Wolf <kw...@redhat.com> >>>> CC: Max Reitz <mre...@redhat.com> >>>> --- >>>> blockdev.c | 18 +++++++++++++++--- >>>> 1 file changed, 15 insertions(+), 3 deletions(-) >>> There was a Red Hat BZ for this: >>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1342999 >>> >>> There is now a corresponding BZ for libvirt: >>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1377321 >>> >>> The gist is that it was determined to be a problem with libvirt. >>> >>> RHEL has a downstream commit to work around this issue by ignoring the >>> cache mode set for empty CD-ROMs -- but that is only because that was >>> simpler than fixing libvirt. >>> >>> (Note that it's ignoring the cache mode instead of actually evaluating it.) >>> >> This is what I have exactly started from: >> http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHEV/SRPMS/qemu-kvm-rhev-2.6.0-27.el7.src.rpm >> >> This package starts VM well for the above mentioned configuration: >> >> <disk type='file' device='cdrom'> >> <driver name='qemu' type='raw' cache='none' io='native'/> >> <target dev='sdb' bus='scsi'/> >> <readonly/> >> </disk> >> >> but the problem comes later at 'change' moment. It reports >> >> 'Details: internal error: unable to execute QEMU command 'change': >> aio=native was specified, but it requires cache.direct=on, which >> was not specified.)' >> >> >> Thus partial solution implemented by the RedHat is really >> partial and does not work at the second stage. > Yes, because it is not supposed to work for that. The only thing the > downstream patch does is ignore the cache mode so you can still start > the VM even if libvirt decides to specify a cache parameter for an empty > drive (which it should not do). > >> I have started from >> >> diff --git a/blockdev.c b/blockdev.c >> index d280dc4..e2c9053 100644 >> --- a/blockdev.c >> +++ b/blockdev.c >> @@ -608,6 +608,13 @@ static BlockBackend *blockdev_init(const char >> *file, QDict *bs_opts, >> goto early_err; >> } >> >> + if (qdict_haskey(bs_opts, BDRV_OPT_CACHE_DIRECT)) { >> + bdrv_flags |= BDRV_O_NOCACHE; >> + } >> + if (qdict_haskey(bs_opts, BDRV_OPT_CACHE_NO_FLUSH)) { >> + bdrv_flags |= BDRV_O_NO_FLUSH; >> + } >> + >> blk_rs = blk_get_root_state(blk); >> blk_rs->open_flags = bdrv_flags; >> blk_rs->read_only = !(bdrv_flags & BDRV_O_RDWR); >> >> The problem for us is that we have such previously valid configurations >> in the field. >> >>>> May be this has already discussed, but still. AIO=native for CDROM without >>>> media seems important case. >>> First, I personally don't find aio=native very important for CD-ROM >>> drives. Yes, we shouldn't make IDE CD-ROM slower than necessary, but I >>> don't think aio=native will make the difference between "Slow like >>> CD-ROMs are in reality" and "As fast as virtio". >> the problem for me is that each clone() call is costly and counts. That >> is why we are trying to avoid it whenever possible. That is why 'native' >> mode is MUCH better. Also it would be very nice not to use cached >> IO, which is very good for memory overcommit situations. >> >> Here I am fighting not with the performance, which does not make >> any real sense, but with a memory footprint. > Fair enough. Still, specifying a cache mode for an empty drive simply > doesn't make sense. > > Have you considered inserting a null-co:// file at startup as a > workaround? If you use the old 'change' (or blockdev-change-medium) > command, it should retain AIO and cache mode. ok. Thanks for the suggestion.
> >>> Second, all this patch does is revert some changes done by commit >>> 91a097e7478940483e76d52217f05bc05b98d5a5, which was very deliberate. >>> >>> Third, you may then be asking for the recommended way to put an >>> aio=native medium into a CD-ROM drive. Good thing you ask, because we >>> have a way that we want to recommend but can't because it's still >>> considered experimental: >>> >>> The BDS is added using blockdev-add, with all of the appropriate caching >>> and aio options. Then it's inserted into the drive using the >>> x-blockdev-insert-medium command, and the drive is closed using >>> blockdev-close-tray. >> the problem, again, is that with x-blockdev-insert-medium I can not >> deal with block driver options, or may be I am missing something. > x-blockdev-insert-medium just inserts a BDS as a medium. The BDS is > added via blockdev-add which allows you to set all of these options. > >>> There are a couple of issues with this: First, blockdev-add and >>> x-blockdev-insert-medium are still experimental. The good news are that >>> I think the reason why the latter is experimental has actually >>> disappeared and we can just drop the x- by now. The >>> not-so-good-but-not-so-bad-either news are that we plan to get >>> blockdev-add declared non-experimental for 2.9. Let's see how it goes. >> Does this mean that x-blockdev-del would also lose x- prefix? > As far as I'm aware, yes. that sounds good! >>> The other issue is that of course it's more cumbersome to use than a >>> simple 'change' via HMP. I argue that if someone communicates with a VM >>> by hand, they either have to deal with this added complexity or they >>> cannot use aio=native for CD-ROM drives -- which, as I said, I don't >>> think is too bad. >> this is how 'virsh change-media' works at the moment, at least with >> latest downstreams. >> That is why this is not that clear. > Well, that can be easily changed, the question is how fast that will be. > You would still need to make libvirt aware of what cache mode to use for > the new medium, and that may require deeper changes. > > (I supposed libvirt only stores a cache mode per drive while it should > actually store a cache mode per image.) fair enough. libvirt could do that. Thank you for these clarifications. Den