On 07/06/18 10:46, Richard Biener wrote:
> On Thu, Jun 7, 2018 at 11:45 AM Ard Biesheuvel
> <ard.biesheu...@linaro.org> wrote:
>>
>> On 7 June 2018 at 11:35, Richard Biener <richard.guent...@gmail.com> wrote:
>>> On Thu, Jun 7, 2018 at 10:45 AM Ard Biesheuvel
>>> <ard.biesheu...@linaro.org> wrote:
>>>>
>>>> On 7 June 2018 at 10:21, Christoffer Dall <christoffer.d...@arm.com> wrote:
>>>>> On Thu, Jun 07, 2018 at 09:56:18AM +0200, Ard Biesheuvel wrote:
>>>>>> On 7 June 2018 at 09:48, Christoffer Dall <christoffer.d...@arm.com>
>>>>>> wrote:
>>>>>>> [+Will]
>>>>>>>
>>>>>>> On Tue, Jun 05, 2018 at 03:07:14PM +0200, Laszlo Ersek wrote:
>>>>>>>> On 06/05/18 13:30, Richard Biener wrote:
>>>>>>>>> On Mon, Jun 4, 2018 at 8:11 PM Laszlo Ersek <ler...@redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi!
>>>>>>>>>>
>>>>>>>>>> Apologies if this isn't the right place for asking. For the problem
>>>>>>>>>> statement, I'll simply steal Ard's writeup [1]:
>>>>>>>>>>
>>>>>>>>>>> KVM on ARM refuses to decode load/store instructions used to perform
>>>>>>>>>>> I/O to emulated devices, and instead relies on the exception
>>>>>>>>>>> syndrome information to describe the operand register, access size,
>>>>>>>>>>> etc. This is only possible for instructions that have a single
>>>>>>>>>>> input/output register (as opposed to ones that increment the offset
>>>>>>>>>>> register, or load/store pair instructions, etc). Otherwise, QEMU
>>>>>>>>>>> crashes with the following error
>>>>>>>>>>>
>>>>>>>>>>> error: kvm run failed Function not implemented
>>>>>>>>>>> [...]
>>>>>>>>>>> QEMU: Terminated
>>>>>>>>>>>
>>>>>>>>>>> and KVM produces a warning such as the following in the kernel log
>>>>>>>>>>>
>>>>>>>>>>> kvm [17646]: load/store instruction decoding not implemented
>>>>>>>>>
>>>>>>>>> This looks like a kvm/qemu issue to me. Whatever that exception
>>>>>>>>> syndrome thing is, it surely has a pointer to the offending
>>>>>>>>> instruction it could decode?
>>>>>>>>
>>>>>>>> I believe so -- the instruction decoding is theoretically possible (to
>>>>>>>> my understanding); KVM currently doesn't do it because it's super
>>>>>>>> complex (again, to my understanding).
>>>>>>>>
>>>>>>> The instruction decoding was considered and discarded because the
>>>>>>> understanding at the time was that any instruction that didn't generate
>>>>>>> valid decoding hints in the syndrome register (such as multiple output
>>>>>>> register operations) would not be safe to use on device memory, and
>>>>>>> therefore shouldn't be used either on real hardware or in VM guests.
>>>>>>>
>>>>>>
>>>>>> How is it unsafe for a load or store with writeback to be used on
>>>>>> device memory? That does not make sense to me.
>>>>>
>>>>> I don't understand that either, which is why I cc'ed Will who argued for
>>>>> this last IIRC.
>>>>>
>>>>>> In any case, I suppose that *decoding* the instruction is not the
>>>>>> problem, it is loading the opcode in the first place, given that it is
>>>>>> not recorded in any system registers when the exception is taken. ELR
>>>>>> will contain a virtual guest address [which could be in userland], and
>>>>>> the host should translate that (which involves guest page tables that
>>>>>> may be modified by other VCPUs concurrently) and map it to be able to
>>>>>> get to the actual bits.
>>>>>>
>>>>>>> If this still holds, it's not a question of an architecture bug or a
>>>>>>> missing feature in KVM, but a question of a guest doing something wrong.
>>>>>>>
>>>>>>
>>>>>> Do you have a mutt macro for that response? :-)
>>>>>>
>>>>>
>>>>> No I don't. And I wouldn't mind adding instruction decoding to KVM. I
>>>>> already wrote it once, but the maintainer didn't want to merge the code
>>>>> unless I unified all instruction decoding in the arm kernel, which I was
>>>>> unable to do.
>>>>>
>>>>
>>>> Yikes.
>>>>
>>>> So how does your code actually load the opcode?
>>>>
>>>>> Sarcasm and instruction decoding stories aside, we've had a number of
>>>>> reports of this kind of error in the past where the problem was simply
>>>>> people using the wrong DT with their guest kernel. I don't think
>>>>> we've seen an actual case of a real guest that was using the 'wrong'
>>>>> instruction to actually do I/O.
>>>>>
>>>>
>>>> Currently, LTO builds of EDK2 for 32-bit mach-virt are broken because
>>>> of this. The MMIO accessors are written in C using volatile pointers,
>>>> allowing LTO to merge adjacent accesses or loops performing MMIO,
>>>> resulting in, e.g., instructions with writeback to be emitted.
>>>
>>> I'd like to see a testcase where GCC does merging on volatile accesses.
>>> That would be a GCC bug. So I suspect the C code isn't quite using volatile
>>> accesses...
>>>
>>
>> The accesses themselves are not being merged. But code such as
>>
>> MmioRead32:
>>    ldr w0, [x0]
>>    ret
>>
>> SomeOtherFunction:
>>    ...
>> 0: mov x20, x0
>>    bl MmioRead32
>>    ...
>>    add x20, x20, #4
>>    ...
>>    b.xx 0b
>>
>>
>> (where the two are based on C code but from different compilation
>> units) may under LTO be turned into code involving a post increment on
>> the memory address of the ldr, resulting in an instruction that has
>> two outputs, triggering the KVM error.
>
> Ah, I see! Of course this doesn't have anything to do with LTO per se,
> you are just lucky it doesn't happen without ;)
>
> Richard.
>
I didn't think we supported pre/post increments on volatiles.

R.
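
For reference, one way a guest could sidestep the question entirely is to stop leaving the addressing mode up to the compiler. The sketch below is only an illustration under stated assumptions: the name mmio_read32_fixed and the helper sum_registers are hypothetical, this is not EDK2's actual MmioRead32, and nothing in this thread says it is the fix that was adopted. The idea is that an inline-asm load spells out a plain base-register ldr, so neither inlining nor LTO can fold the caller's address increment into a writeback form. Non-ARM builds fall back to an ordinary volatile read so the snippet still compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical accessor (name is an assumption, not an EDK2 API).
 * On ARM targets the load is written out as inline asm using only a
 * base register, so the compiler cannot pick a pre/post-indexed form.
 */
static inline uint32_t mmio_read32_fixed(const volatile void *addr)
{
    uint32_t value;

    /* 32-bit device registers are expected to be naturally aligned. */
    assert(((uintptr_t)addr & 3u) == 0);

#if defined(__aarch64__)
    __asm__ volatile("ldr %w0, [%1]" : "=r"(value) : "r"(addr) : "memory");
#elif defined(__arm__)
    __asm__ volatile("ldr %0, [%1]" : "=r"(value) : "r"(addr) : "memory");
#else
    /* Fallback for other hosts: a plain volatile access. */
    value = *(const volatile uint32_t *)addr;
#endif
    return value;
}

/*
 * Hypothetical caller mirroring the loop in the quoted assembly: the
 * address advances by 4 each iteration, but the increment stays a
 * separate add because the load instruction itself is fixed.
 */
static inline uint32_t sum_registers(const volatile uint32_t *base, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += mmio_read32_fixed(&base[i]);
    return sum;
}
```

The cost is that the compiler loses any freedom to schedule or combine the access, which is presumably what one wants for MMIO to an emulated device anyway.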