On Thu, Jun 7, 2018 at 11:45 AM Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>
> On 7 June 2018 at 11:35, Richard Biener <richard.guent...@gmail.com> wrote:
> > On Thu, Jun 7, 2018 at 10:45 AM Ard Biesheuvel
> > <ard.biesheu...@linaro.org> wrote:
> >>
> >> On 7 June 2018 at 10:21, Christoffer Dall <christoffer.d...@arm.com> wrote:
> >> > On Thu, Jun 07, 2018 at 09:56:18AM +0200, Ard Biesheuvel wrote:
> >> >> On 7 June 2018 at 09:48, Christoffer Dall <christoffer.d...@arm.com> wrote:
> >> >> > [+Will]
> >> >> >
> >> >> > On Tue, Jun 05, 2018 at 03:07:14PM +0200, Laszlo Ersek wrote:
> >> >> >> On 06/05/18 13:30, Richard Biener wrote:
> >> >> >> > On Mon, Jun 4, 2018 at 8:11 PM Laszlo Ersek <ler...@redhat.com> wrote:
> >> >> >> >>
> >> >> >> >> Hi!
> >> >> >> >>
> >> >> >> >> Apologies if this isn't the right place for asking. For the problem
> >> >> >> >> statement, I'll simply steal Ard's writeup [1]:
> >> >> >> >>
> >> >> >> >>> KVM on ARM refuses to decode load/store instructions used to perform
> >> >> >> >>> I/O to emulated devices, and instead relies on the exception syndrome
> >> >> >> >>> information to describe the operand register, access size, etc. This
> >> >> >> >>> is only possible for instructions that have a single input/output
> >> >> >> >>> register (as opposed to ones that increment the offset register, or
> >> >> >> >>> load/store pair instructions, etc). Otherwise, QEMU crashes with the
> >> >> >> >>> following error
> >> >> >> >>>
> >> >> >> >>>   error: kvm run failed Function not implemented
> >> >> >> >>>   [...]
> >> >> >> >>>   QEMU: Terminated
> >> >> >> >>>
> >> >> >> >>> and KVM produces a warning such as the following in the kernel log
> >> >> >> >>>
> >> >> >> >>>   kvm [17646]: load/store instruction decoding not implemented
> >> >> >> >
> >> >> >> > This looks like a kvm/qemu issue to me. Whatever that exception syndrome
> >> >> >> > thing is, it surely has a pointer to the offending instruction it
> >> >> >> > could decode?
> >> >> >>
> >> >> >> I believe so -- the instruction decoding is theoretically possible (to
> >> >> >> my understanding); KVM currently doesn't do it because it's super
> >> >> >> complex (again, to my understanding).
> >> >> >>
> >> >> > The instruction decoding was considered and discarded because the
> >> >> > understanding at the time was that any instruction that didn't generate
> >> >> > valid decoding hints in the syndrome register (such as multiple output
> >> >> > register operations) would not be safe to use on device memory, and
> >> >> > therefore should be used neither on real hardware nor in VM guests.
> >> >> >
> >> >>
> >> >> How is it unsafe for a load or store with writeback to be used on
> >> >> device memory? That does not make sense to me.
> >> >
> >> > I don't understand that either, which is why I cc'ed Will who argued for
> >> > this last IIRC.
> >> >
> >> >> In any case, I suppose that *decoding* the instruction is not the
> >> >> problem, it is loading the opcode in the first place, given that it is
> >> >> not recorded in any system registers when the exception is taken. ELR
> >> >> will contain a virtual guest address [which could be in userland], and
> >> >> the host should translate that (which involves guest page tables that
> >> >> may be modified by other VCPUs concurrently) and map it to be able to
> >> >> get to the actual bits.
> >> >>
> >> >> > If this still holds, it's not a question of an architecture bug or a
> >> >> > missing feature in KVM, but a question of a guest doing something
> >> >> > wrong.
> >> >> >
> >> >>
> >> >> Do you have a mutt macro for that response? :-)
> >> >>
> >> >
> >> > No I don't. And I wouldn't mind adding instruction decoding to KVM. I
> >> > already wrote it once, but the maintainer didn't want to merge the code
> >> > unless I unified all instruction decoding in the arm kernel, which I was
> >> > unable to do.
> >> >
> >>
> >> Yikes.
> >>
> >> So how does your code actually load the opcode?
> >>
> >> > Sarcasm and instruction decoding stories aside, we've had a number of
> >> > reports of this kind of error in the past where the problem was simply
> >> > people using the wrong DT with their guest kernel. I don't think
> >> > we've seen an actual case of a real guest that was using the 'wrong'
> >> > instruction to actually do I/O.
> >> >
> >>
> >> Currently, LTO builds of EDK2 for 32-bit mach-virt are broken because
> >> of this. The MMIO accessors are written in C using volatile pointers,
> >> allowing LTO to merge adjacent accesses or loops performing MMIO,
> >> resulting in, e.g., instructions with writeback to be emitted.
> >
> > I'd like to see a testcase where GCC does merging on volatile accesses.
> > That would be a GCC bug. So I suspect the C code isn't quite using volatile
> > accesses...
> >
>
> The accesses themselves are not being merged. But code such as
>
> MmioRead32:
>     ldr w0, [x0]
>     ret
>
> SomeOtherFunction:
>     ...
> 0:  mov x20, x0
>     bl MmioRead32
>     ...
>     add x20, x20, #4
>     ...
>     b.xx 0b
>
>
> (where the two are based on C code but from different compilation
> units) may under LTO be turned into code involving a post increment on
> the memory address of the ldr, resulting in an instruction that has
> two outputs, triggering the KVM error.
Ah, I see! Of course this doesn't have anything to do with LTO per se;
you are just lucky it doesn't happen without ;)

Richard.
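
For readers following along, the situation Ard describes can be sketched
in C roughly as below. This is only an illustration under assumptions,
not code from EDK2 or from this thread: the names mmio_read32,
mmio_read32_pinned and sum_regs are hypothetical, and the inline asm
variant assumes GCC- or Clang-style extended asm on AArch64. The plain
accessor uses a volatile pointer, which preserves the access itself but
leaves the compiler free to choose the addressing mode once the accessor
is inlined into a loop that also advances the address. The "pinned"
variant is one possible way to keep the load in the base-register-only
form.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain C accessor: volatile keeps the access from being dropped or merged,
 * but it does not constrain the addressing mode chosen after inlining. */
static inline uint32_t mmio_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

/* Hypothetical alternative: on AArch64, extended asm fixes the load to the
 * base-register-only form, so the address register is never written back. */
static inline uint32_t mmio_read32_pinned(uintptr_t addr)
{
#if defined(__aarch64__)
    uint32_t val;
    __asm__ volatile("ldr %w0, [%1]" : "=r"(val) : "r"(addr) : "memory");
    return val;
#else
    /* Fallback so the sketch also builds on other targets. */
    return mmio_read32(addr);
#endif
}

/* Caller that walks consecutive 32-bit registers, similar in shape to the
 * SomeOtherFunction loop quoted above. */
uint32_t sum_regs(uintptr_t base, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += mmio_read32_pinned(base + 4 * i);
    return sum;
}
```

Whether a given compiler actually emits a post-indexed load for the
plain accessor depends on the compiler version and optimization
settings, so the disassembly of the caller (for example with
objdump -d) is the thing to check.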