Re: code-gen options for disabling multi-operand AArch64 and ARM instructions

Richard Biener Thu, 07 Jun 2018 02:47:51 -0700

On Thu, Jun 7, 2018 at 11:45 AM Ard Biesheuvel
<ard.biesheu...@linaro.org> wrote:
>
> On 7 June 2018 at 11:35, Richard Biener <richard.guent...@gmail.com> wrote:
> > On Thu, Jun 7, 2018 at 10:45 AM Ard Biesheuvel
> > <ard.biesheu...@linaro.org> wrote:
> >>
> >> On 7 June 2018 at 10:21, Christoffer Dall <christoffer.d...@arm.com> wrote:
> >> > On Thu, Jun 07, 2018 at 09:56:18AM +0200, Ard Biesheuvel wrote:
> >> >> On 7 June 2018 at 09:48, Christoffer Dall <christoffer.d...@arm.com> wrote:
> >> >> > [+Will]
> >> >> >
> >> >> > On Tue, Jun 05, 2018 at 03:07:14PM +0200, Laszlo Ersek wrote:
> >> >> >> On 06/05/18 13:30, Richard Biener wrote:
> >> >> >> > On Mon, Jun 4, 2018 at 8:11 PM Laszlo Ersek <ler...@redhat.com> wrote:
> >> >> >> >>
> >> >> >> >> Hi!
> >> >> >> >>
> >> >> >> >> Apologies if this isn't the right place for asking. For the problem
> >> >> >> >> statement, I'll simply steal Ard's writeup [1]:
> >> >> >> >>
> >> >> >> >>> KVM on ARM refuses to decode load/store instructions used to perform
> >> >> >> >>> I/O to emulated devices, and instead relies on the exception syndrome
> >> >> >> >>> information to describe the operand register, access size, etc. This
> >> >> >> >>> is only possible for instructions that have a single input/output
> >> >> >> >>> register (as opposed to ones that increment the offset register, or
> >> >> >> >>> load/store pair instructions, etc). Otherwise, QEMU crashes with the
> >> >> >> >>> following error
> >> >> >> >>>
> >> >> >> >>>   error: kvm run failed Function not implemented
> >> >> >> >>>   [...]
> >> >> >> >>>   QEMU: Terminated
> >> >> >> >>>
> >> >> >> >>> and KVM produces a warning such as the following in the kernel log
> >> >> >> >>>
> >> >> >> >>>   kvm [17646]: load/store instruction decoding not implemented
> >> >> >> >
> >> >> >> > This looks like a kvm/qemu issue to me.  Whatever that exception syndrome
> >> >> >> > thing is, it surely has a pointer to the offending instruction it could decode?
> >> >> >>
> >> >> >> I believe so -- the instruction decoding is theoretically possible (to
> >> >> >> my understanding); KVM currently doesn't do it because it's super
> >> >> >> complex (again, to my understanding).
> >> >> >>
> >> >> > The instruction decoding was considered and discarded because the
> >> >> > understanding at the time was that any instruction that didn't generate
> >> >> > valid decoding hints in the syndrome register (such as multiple output
> >> >> > register operations) would not be safe to use on device memory, and
> >> >> > therefore should be used neither on real hardware nor in VM guests.
> >> >> >
> >> >>
> >> >> How is it unsafe for a load or store with writeback to be used on
> >> >> device memory? That does not make sense to me.
> >> >
> >> > I don't understand that either, which is why I cc'ed Will who argued for
> >> > this last IIRC.
> >> >
> >> >> In any case, I suppose that *decoding* the instruction is not the
> >> >> problem, it is loading the opcode in the first place, given that it is
> >> >> not recorded in any system registers when the exception is taken. ELR
> >> >> will contain a virtual guest address [which could be in userland], and
> >> >> the host should translate that (which involves guest page tables that
> >> >> may be modified by other VCPUs concurrently) and map it to be able to
> >> >> get to the actual bits.
> >> >>
> >> >> > If this still holds, it's not a question of an architecture bug or a
> >> >> > missing feature in KVM, but a question of a guest doing something wrong.
> >> >> >
> >> >>
> >> >> Do you have a mutt macro for that response? :-)
> >> >>
> >> >
> >> > No I don't.  And I wouldn't mind adding instruction decoding to KVM.  I
> >> > already wrote it once, but the maintainer didn't want to merge the code
> >> > unless I unified all instruction decoding in the arm kernel, which I was
> >> > unable to do.
> >> >
> >>
> >> Yikes.
> >>
> >> So how does your code actually load the opcode?
> >>
> >> > Sarcasm and instruction decoding stories aside, we've had a number of
> >> > reports of this kind of error in the past where the problem was simply
> >> > people using the wrong DT with their guest kernel.  I don't think
> >> > we've seen an actual case of a real guest that was using the 'wrong'
> >> > instruction to actually do I/O.
> >> >
> >>
> >> Currently, LTO builds of EDK2 for 32-bit mach-virt are broken because
> >> of this. The MMIO accessors are written in C using volatile pointers,
> >> allowing LTO to merge adjacent accesses or loops performing MMIO,
> >> resulting in, e.g., instructions with writeback being emitted.
> >
> > I'd like to see a testcase where GCC does merging on volatile accesses.
> > That would be a GCC bug.  So I suspect the C code isn't quite using volatile
> > accesses...
> >
>
> The accesses themselves are not being merged. But code such as
>
> MmioRead32:
>   ldr   w0, [x0]
>   ret
>
> SomeOtherFunction:
>   ...
> 0:mov   x20, x0
>   bl    MmioRead32
>   ...
>   add   x20, x20, #4
>   ...
>   b.xx  0b
>
>
> (where the two are based on C code but from different compilation
> units) may under LTO be turned into code involving a post increment on
> the memory address of the ldr, resulting in an instruction that has
> two outputs, triggering the KVM error.


Ah, I see!  Of course this doesn't have anything to do with LTO per se,
you are just lucky it doesn't happen without ;)

Richard.
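
For illustration only: one way a guest could avoid depending on the
compiler's choice of addressing mode is to issue the load from inline
assembly, so the instruction form is fixed regardless of inlining. The
sketch below is not EDK2's code and was not posted in this thread; the
name mmio_read32_fixed is hypothetical, and the non-AArch64 branch is
only there so the example builds on other hosts.

```c
#include <stdint.h>

/* Hypothetical accessor, named for this sketch only. */
static inline uint32_t mmio_read32_fixed(const volatile void *addr)
{
    uint32_t val;
#if defined(__aarch64__)
    /*
     * The load is spelled out with plain base-register addressing, so the
     * compiler cannot fold a later "add x20, x20, #4" into it as a
     * post-increment, even when this function is inlined into a loop.
     */
    __asm__ volatile("ldr %w0, [%1]" : "=r"(val) : "r"(addr) : "memory");
#else
    /* Fallback for other hosts: an ordinary volatile access, with no
     * guarantee about which addressing mode the compiler picks. */
    val = *(const volatile uint32_t *)addr;
#endif
    return val;
}
```

With the asm form, a caller that walks a register block in a loop still
gets a separate add for the address update, which is the single-register
shape the syndrome information can describe.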
