On Thu, Feb 01, 2018 at 11:42:22AM +0100, Andrew Jones wrote:
> On Thu, Feb 01, 2018 at 10:59:54AM +0100, Christoffer Dall wrote:
> > On Thu, Feb 1, 2018 at 10:33 AM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> > > On 1 February 2018 at 09:17, Christoffer Dall <christoffer.d...@linaro.org> wrote:
> > >> On Wed, Jan 31, 2018 at 9:15 PM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> > >>> On 31 January 2018 at 19:12, Christoffer Dall <christoffer.d...@linaro.org> wrote:
> > >>>> On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> > >>>>> On 31 January 2018 at 17:39, Christoffer Dall <christoffer.d...@linaro.org> wrote:
> > >>>>>> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> > >>>>>>> On 31 January 2018 at 16:53, Christoffer Dall <christoffer.d...@linaro.org> wrote:
> > >>>>>>>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> > >>>>>>>>> On 31 January 2018 at 09:53, Christoffer Dall <christoffer.d...@linaro.org> wrote:
> > >>>>>>>>>> On Mon, Jan 29, 2018 at 10:32:12AM +0000, Marc Zyngier wrote:
> > >>>>>>>>>>> On 29/01/18 10:04, Peter Maydell wrote:
> > >>>>>>>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
> > >>>>>>>>>>> >> * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > >>>>>>>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
> > >>>>>>>>>>> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote:
> > >>>>>>>>>>> >>>>> I think the correct fix here is that your test code should turn
> > >>>>>>>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work
> > >>>>>>>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device video memory
> > >>>>>>>>>>> >>>>> doesn't work). If it's RAM your guest has to arrange to map it as
> > >>>>>>>>>>> >>>>> Normal Cacheable, and then everything should work fine.
> > >>>>>>>>>>> >>>>
> > >>>>>>>>>>> >>>> Does this cause problems with migrating at just the wrong point
> > >>>>>>>>>>> >>>> during a VM boot?
> > >>>>>>>>>>> >>>
> > >>>>>>>>>>> >>> It wouldn't surprise me if it did, but I don't think I've ever
> > >>>>>>>>>>> >>> tried to provoke that problem...
> > >>>>>>>>>>> >>
> > >>>>>>>>>>> >> If you think it'll get the RAM contents wrong, it might be best to
> > >>>>>>>>>>> >> fail the migration if you can detect the cache is disabled in the
> > >>>>>>>>>>> >> guest.
> > >>>>>>>>>>> >
> > >>>>>>>>>>> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit
> > >>>>>>>>>>> > in the guest's system registers, and refuse migration if it's off...
> > >>>>>>>>>>> >
> > >>>>>>>>>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end
> > >>>>>>>>>>> > of the stick about how thin the ice is in the period before the
> > >>>>>>>>>>> > guest turns on its MMU...)
> > >>>>>>>>>>>
> > >>>>>>>>>>> Once MMU and caches are on, we should be in a reasonable place for QEMU
> > >>>>>>>>>>> to have a consistent view of the memory. The trick is to prevent the
> > >>>>>>>>>>> vcpus from changing that.
> > >>>>>>>>>>> A guest could perfectly well turn off its MMU at any given time if it
> > >>>>>>>>>>> needs to (and it is actually required on some HW if you want to
> > >>>>>>>>>>> mitigate headlining CVEs), and KVM won't know about that.
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> (Clarification: KVM can detect this if it bothers to check the VCPU's
> > >>>>>>>>>> system registers, but we don't trap to KVM when the VCPU turns off its
> > >>>>>>>>>> caches, right?)
> > >>>>>>>>>>
> > >>>>>>>>>>> You may have to pause the vcpus before starting the migration, or
> > >>>>>>>>>>> introduce a new KVM feature that would automatically pause a vcpu that
> > >>>>>>>>>>> is trying to disable its MMU while the migration is on. This would
> > >>>>>>>>>>> involve trapping all the virtual memory related system registers, with
> > >>>>>>>>>>> an obvious cost. But that cost would be limited to the time it takes
> > >>>>>>>>>>> to migrate the memory, so maybe that's acceptable.
> > >>>>>>>>>>>
> > >>>>>>>>>> Is that even sufficient?
> > >>>>>>>>>>
> > >>>>>>>>>> What if the following happened: (1) guest turns off MMU, (2) guest
> > >>>>>>>>>> writes some data directly to ram, (3) qemu stops the vcpu, (4) qemu
> > >>>>>>>>>> reads guest ram. QEMU's view of guest ram is now incorrect (stale,
> > >>>>>>>>>> incoherent, ...).
> > >>>>>>>>>>
> > >>>>>>>>>> I'm also not really sure if pausing one VCPU because it turned off its
> > >>>>>>>>>> MMU will go very well when trying to migrate a large VM (wouldn't this
> > >>>>>>>>>> lead to all the other VCPUs starting to complain that the stopped VCPU
> > >>>>>>>>>> appears to be dead?). As a short-term 'fix' it's probably better to
> > >>>>>>>>>> refuse migration if you detect that a VCPU has begun turning off its
> > >>>>>>>>>> MMU.
> > >>>>>>>>>>
> > >>>>>>>>>> On the larger scale of things, this appears to me to be another case
> > >>>>>>>>>> of us really needing some way to coherently access memory between QEMU
> > >>>>>>>>>> and the VM, but in the case of the VCPU turning off the MMU prior to
> > >>>>>>>>>> migration, we don't even know where it may have written data, and I'm
> > >>>>>>>>>> therefore not really sure what the 'proper' solution would be.
> > >>>>>>>>>>
> > >>>>>>>>>> (cc'ing Ard who has thought about this problem before in the context
> > >>>>>>>>>> of UEFI and VGA.)
> > >>>>>>>>>>
> > >>>>>>>>> Actually, the VGA case is much simpler because the host is not
> > >>>>>>>>> expected to write to the framebuffer, only read from it, and the guest
> > >>>>>>>>> is not expected to create a cacheable mapping for it, so any
> > >>>>>>>>> incoherency can be trivially solved by cache invalidation on the host
> > >>>>>>>>> side. (Note that this has nothing to do with DMA coherency, but only
> > >>>>>>>>> with PCI MMIO BARs that are backed by DRAM in the host)
> > >>>>>>>>
> > >>>>>>>> In the case of the running guest, the host will also only read from the
> > >>>>>>>> cached mapping. Of course, at restore, the host will also write
> > >>>>>>>> through a cached mapping, but shouldn't the latter case be solvable by
> > >>>>>>>> having KVM clean the cache lines when faulting in any page?
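
(For reference, the "refuse migration while a VCPU has its MMU/caches off"
check mentioned above could look roughly like the sketch below: userspace
reads SCTLR_EL1 of a vcpu that is currently out of KVM_RUN via
KVM_GET_ONE_REG and looks at the M and C bits. This is only an
illustrative sketch for an AArch64 vcpu whose fd we already hold;
vcpu_mmu_and_cache_on() is a made-up name, and an AArch32 guest would
need the corresponding 32-bit SCTLR instead.)

#include <stdint.h>
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* SCTLR_EL1 is op0=3, op1=0, CRn=1, CRm=0, op2=0 */
#define SYSREG_SCTLR_EL1 \
	(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM64_SYSREG | \
	 (3ULL << KVM_REG_ARM64_SYSREG_OP0_SHIFT) | \
	 (0ULL << KVM_REG_ARM64_SYSREG_OP1_SHIFT) | \
	 (1ULL << KVM_REG_ARM64_SYSREG_CRN_SHIFT) | \
	 (0ULL << KVM_REG_ARM64_SYSREG_CRM_SHIFT) | \
	 (0ULL << KVM_REG_ARM64_SYSREG_OP2_SHIFT))

#define SCTLR_EL1_M	(1ULL << 0)	/* stage 1 MMU enable */
#define SCTLR_EL1_C	(1ULL << 2)	/* data cache enable */

/* Returns true only if the vcpu is running with MMU and D-cache enabled. */
static bool vcpu_mmu_and_cache_on(int vcpu_fd)
{
	uint64_t sctlr = 0;
	struct kvm_one_reg reg = {
		.id   = SYSREG_SCTLR_EL1,
		.addr = (uint64_t)(uintptr_t)&sctlr,
	};

	if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
		return false;	/* be conservative on error */

	return (sctlr & SCTLR_EL1_M) && (sctlr & SCTLR_EL1_C);
}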
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> We are still talking about the contents of the framebuffer, right? In
> > >>>>>>> that case, yes, afaict.
> > >>>>>>>
> > >>>>>>
> > >>>>>> I was talking about normal RAM actually... not sure if that changes
> > >>>>>> anything?
> > >>>>>>
> > >>>>>
> > >>>>> The main difference is that with a framebuffer BAR, it is pointless
> > >>>>> for the guest to map it cacheable, given that the purpose of a
> > >>>>> framebuffer is its side effects, which are not guaranteed to occur in
> > >>>>> a timely fashion if the mapping is cacheable.
> > >>>>>
> > >>>>> If we are talking about normal RAM, then why are we discussing it here
> > >>>>> and not down there?
> > >>>>>
> > >>>>
> > >>>> Because I was trying to figure out how the challenge of accessing the
> > >>>> VGA framebuffer differs from the challenge of accessing guest RAM
> > >>>> which may have been written by the guest with the MMU off.
> > >>>>
> > >>>> To a first approximation, they are extremely similar because the guest
> > >>>> is writing uncached to memory, which the host now has to access via a
> > >>>> cached mapping.
> > >>>>
> > >>>> But I'm guessing that a "clean+invalidate before read on the host"
> > >>>> solution will result in terrible performance for a framebuffer and
> > >>>> therefore isn't a good solution for that problem...
> > >>>>
> > >>>
> > >>> That highly depends on where 'not working' resides on the performance
> > >>> scale. Currently, VGA on KVM simply does not work at all, and so
> > >>> working but slow would be a huge improvement over the current
> > >>> situation.
> > >>>
> > >>> Also, the performance hit is caused by the fact that the data needs to
> > >>> make a round trip to memory, and the invalidation (without cleaning)
> > >>> performed by the host shouldn't make that much worse than it
> > >>> fundamentally is to begin with.
> > >>>
> > >>
> > >> True, but Marc pointed out (on IRC) that the cache maintenance
> > >> instructions are broadcast across all CPUs and may therefore adversely
> > >> affect the performance of your entire system when running an emulated
> > >> VGA framebuffer on a subset of the host CPUs. However, he also
> > >> pointed out that the necessary cache maintenance can be done in EL0
>
> I had some memory of this not being the case, but I just checked and
> indeed 'dc civac' will work from el0 with sctlr_el1.uci=1. Note, I
> see that the gcc builtin __builtin___clear_cache() does not use that
> instruction (it uses 'dc cvau'), so we'd need to implement a wrapper
> ourselves (maybe that's why I thought it wasn't available...)
>
> > >> and we wouldn't even need a new ioctl or anything else from KVM or the
> > >> kernel to do this. I think we should measure the effect of this
> > >> before dismissing it though.
> > >>
> > >
> > > If you could use dirty page tracking to only perform the cache
> > > invalidation when the framebuffer memory has been updated, you can at
> > > least limit the impact to cases where the framebuffer is actually
> > > used, rather than sitting idle with a nice wallpaper image.
>
> Yes, this is the exact approach I took back when I experimented with
> this. I must have screwed up my PoC in some way (like using the gcc
> builtin), because it wasn't working for me...
>

Does that mean you have some code you feel like reviving and sending out
as an RFC? ;)
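
The wrapper itself should be tiny; a minimal sketch (assuming an AArch64
Linux host where the kernel sets SCTLR_EL1.UCI so that 'dc civac' is
usable from EL0, as noted above, and that CTR_EL0 is readable from
userspace; dcache_clean_invalidate_range() is just a name I made up):

#include <stddef.h>
#include <stdint.h>

static inline uint64_t dcache_line_size(void)
{
	uint64_t ctr;

	/* CTR_EL0.DminLine (bits [19:16]) is log2 of the smallest
	 * D-cache line size in 4-byte words. */
	asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
	return 4ULL << ((ctr >> 16) & 0xf);
}

/*
 * Clean+invalidate a range by VA to the point of coherency. Unlike
 * __builtin___clear_cache() (which uses 'dc cvau'), this pushes dirty
 * lines out to memory and discards them, so the next read observes
 * what the guest wrote with its caches off.
 */
static void dcache_clean_invalidate_range(void *start, size_t len)
{
	uint64_t line = dcache_line_size();
	uintptr_t addr = (uintptr_t)start & ~(line - 1);
	uintptr_t end = (uintptr_t)start + len;

	for (; addr < end; addr += line)
		asm volatile("dc civac, %0" : : "r" (addr) : "memory");

	asm volatile("dsb ish" : : : "memory");
}

A pure invalidate ('dc ivac') would be enough for the read-only
framebuffer case, but that one is not available at EL0, so
clean+invalidate is the best we can do from userspace.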
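
Tying that to dirty page tracking, as suggested above, could then look
something like the sketch below (again purely illustrative: it assumes
the framebuffer memslot was registered with KVM_MEM_LOG_DIRTY_PAGES, and
'vm_fd', 'slot', 'ram', 'size' and sync_dirty_fb_pages() are hypothetical
names; it reuses dcache_clean_invalidate_range() from the previous
sketch):

#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Clean+invalidate only the framebuffer pages the guest has touched
 * since the last call. 'vm_fd' is the KVM VM fd, 'slot' the memslot id
 * covering the framebuffer, 'ram' its userspace mapping, 'size' its
 * length.
 */
static int sync_dirty_fb_pages(int vm_fd, uint32_t slot, void *ram, size_t size)
{
	long psize = sysconf(_SC_PAGESIZE);
	size_t npages = (size + psize - 1) / psize;
	uint64_t *bitmap = calloc((npages + 63) / 64, sizeof(*bitmap));
	struct kvm_dirty_log log = {
		.slot = slot,
		.dirty_bitmap = bitmap,
	};
	size_t i;

	if (!bitmap)
		return -1;

	/* Fetch and reset the dirty bitmap for this memslot. */
	if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0) {
		free(bitmap);
		return -1;
	}

	for (i = 0; i < npages; i++)
		if (bitmap[i / 64] & (1ULL << (i % 64)))
			dcache_clean_invalidate_range((char *)ram + i * psize,
						      psize);

	free(bitmap);
	return 0;
}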
-Christoffer