On Thu, Jun 7, 2018 at 10:35 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Thu, Jun 7, 2018 at 10:45 AM Ard Biesheuvel > <ard.biesheu...@linaro.org> wrote: >> >> On 7 June 2018 at 10:21, Christoffer Dall <christoffer.d...@arm.com> wrote: >> > On Thu, Jun 07, 2018 at 09:56:18AM +0200, Ard Biesheuvel wrote: >> >> On 7 June 2018 at 09:48, Christoffer Dall <christoffer.d...@arm.com> >> >> wrote: >> >> > [+Will] >> >> > >> >> > On Tue, Jun 05, 2018 at 03:07:14PM +0200, Laszlo Ersek wrote: >> >> >> On 06/05/18 13:30, Richard Biener wrote: >> >> >> > On Mon, Jun 4, 2018 at 8:11 PM Laszlo Ersek <ler...@redhat.com> >> >> >> > wrote: >> >> >> >> >> >> >> >> Hi! >> >> >> >> >> >> >> >> Apologies if this isn't the right place for asking. For the problem >> >> >> >> statement, I'll simply steal Ard's writeup [1]: >> >> >> >> >> >> >> >>> KVM on ARM refuses to decode load/store instructions used to >> >> >> >>> perform >> >> >> >>> I/O to emulated devices, and instead relies on the exception >> >> >> >>> syndrome >> >> >> >>> information to describe the operand register, access size, etc. >> >> >> >>> This >> >> >> >>> is only possible for instructions that have a single input/output >> >> >> >>> register (as opposed to ones that increment the offset register, or >> >> >> >>> load/store pair instructions, etc). Otherwise, QEMU crashes with >> >> >> >>> the >> >> >> >>> following error >> >> >> >>> >> >> >> >>> error: kvm run failed Function not implemented >> >> >> >>> [...] >> >> >> >>> QEMU: Terminated >> >> >> >>> >> >> >> >>> and KVM produces a warning such as the following in the kernel log >> >> >> >>> >> >> >> >>> kvm [17646]: load/store instruction decoding not implemented >> >> >> > >> >> >> > This looks like a kvm/qemu issue to me. Whatever that exception >> >> >> > syndrome >> >> >> > thing is, it surely has a pointer to the offending instruction it >> >> >> > could decode? >> >> >> >> >> >> I believe so -- the instruction decoding is theoretically possible (to >> >> >> my understanding); KVM currently doesn't do it because it's super >> >> >> complex (again, to my understanding). >> >> >> >> >> > The instruction decoding was considered and discarded because the >> >> > understanding at the time was that any instruction that didn't generate >> >> > valid decoding hints in the syndrome register (such as multiple output >> >> > register operations) would not be safe to use on device memory, and >> >> > therefore shouldn't be used neither on real hardware nor in VM guests. >> >> > >> >> >> >> How is it unsafe for a load or store with writeback to be used on >> >> device memory? That does not make sense to me. >> > >> > I don't understand that either, which is why I cc'ed Will who argued for >> > this last IIRC. >> > >> >> In any case, I suppose that *decoding* the instruction is not the >> >> problem, it is loading the opcode in the first place, given that it is >> >> not recorded in any system registers when the exception is taken. ELR >> >> will contain a virtual guest address [which could be in userland], and >> >> the host should translate that (which involves guest page tables that >> >> may be modified by other VCPUs concurrently) and map it to be able to >> >> get to the actual bits. >> >> >> >> > If this still holds, it's not a question of an architecture bug or a >> >> > missing feature in KVM, but a question of a guest doing something wrong. >> >> > >> >> >> >> Do you have a mutt macro for that response? :-) >> >> >> > >> > No I don't. And I wouldn't mind adding instruction decoding to KVM. I >> > already wrote it once, but the maintainer didn't want to merge the code >> > unless I unified all instruction decoding in the arm kernel, which I was >> > unable to do. >> > >> >> Yikes. >> >> So how does your code actually load the opcode? >> >> > Sarkasm and instruction decoding stories aside, we've had a number of >> > reports of this kind of error in the past where the problem was simply >> > people using the wrong the DT with their guest kernel. I don't think >> > we've seen an actual case of a real guest that was using the 'wrong' >> > instruction to actually do I/O. >> > >> >> Currently, LTO builds of EDK2 for 32-bit mach-virt are broken because >> of this. The MMIO accessors are written in C using volatile pointers, >> allowing LTO to merge adjacent accesses or loops performing MMIO, >> resulting in, e.g., instructions with writeback to be emitted. > > I'd like to see a testcase where GCC does merging on volatile accesses. > That would be a GCC bug. So I suspect the C code isn't quite using volatile > accesses... >
I'd also like to see a proper testcase to action. IIRC we have checks preventing such merges for volatiles irrespective of LTO or not. Ramana