On Thu, Jun 7, 2018 at 10:45 AM Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > > On 7 June 2018 at 10:21, Christoffer Dall <christoffer.d...@arm.com> wrote: > > On Thu, Jun 07, 2018 at 09:56:18AM +0200, Ard Biesheuvel wrote: > >> On 7 June 2018 at 09:48, Christoffer Dall <christoffer.d...@arm.com> wrote: > >> > [+Will] > >> > > >> > On Tue, Jun 05, 2018 at 03:07:14PM +0200, Laszlo Ersek wrote: > >> >> On 06/05/18 13:30, Richard Biener wrote: > >> >> > On Mon, Jun 4, 2018 at 8:11 PM Laszlo Ersek <ler...@redhat.com> wrote: > >> >> >> > >> >> >> Hi! > >> >> >> > >> >> >> Apologies if this isn't the right place for asking. For the problem > >> >> >> statement, I'll simply steal Ard's writeup [1]: > >> >> >> > >> >> >>> KVM on ARM refuses to decode load/store instructions used to perform > >> >> >>> I/O to emulated devices, and instead relies on the exception > >> >> >>> syndrome > >> >> >>> information to describe the operand register, access size, etc. This > >> >> >>> is only possible for instructions that have a single input/output > >> >> >>> register (as opposed to ones that increment the offset register, or > >> >> >>> load/store pair instructions, etc). Otherwise, QEMU crashes with the > >> >> >>> following error > >> >> >>> > >> >> >>> error: kvm run failed Function not implemented > >> >> >>> [...] > >> >> >>> QEMU: Terminated > >> >> >>> > >> >> >>> and KVM produces a warning such as the following in the kernel log > >> >> >>> > >> >> >>> kvm [17646]: load/store instruction decoding not implemented > >> >> > > >> >> > This looks like a kvm/qemu issue to me. Whatever that exception > >> >> > syndrome > >> >> > thing is, it surely has a pointer to the offending instruction it > >> >> > could decode? > >> >> > >> >> I believe so -- the instruction decoding is theoretically possible (to > >> >> my understanding); KVM currently doesn't do it because it's super > >> >> complex (again, to my understanding). > >> >> > >> > The instruction decoding was considered and discarded because the > >> > understanding at the time was that any instruction that didn't generate > >> > valid decoding hints in the syndrome register (such as multiple output > >> > register operations) would not be safe to use on device memory, and > >> > therefore shouldn't be used neither on real hardware nor in VM guests. > >> > > >> > >> How is it unsafe for a load or store with writeback to be used on > >> device memory? That does not make sense to me. > > > > I don't understand that either, which is why I cc'ed Will who argued for > > this last IIRC. > > > >> In any case, I suppose that *decoding* the instruction is not the > >> problem, it is loading the opcode in the first place, given that it is > >> not recorded in any system registers when the exception is taken. ELR > >> will contain a virtual guest address [which could be in userland], and > >> the host should translate that (which involves guest page tables that > >> may be modified by other VCPUs concurrently) and map it to be able to > >> get to the actual bits. > >> > >> > If this still holds, it's not a question of an architecture bug or a > >> > missing feature in KVM, but a question of a guest doing something wrong. > >> > > >> > >> Do you have a mutt macro for that response? :-) > >> > > > > No I don't. And I wouldn't mind adding instruction decoding to KVM. I > > already wrote it once, but the maintainer didn't want to merge the code > > unless I unified all instruction decoding in the arm kernel, which I was > > unable to do. > > > > Yikes. > > So how does your code actually load the opcode? > > > Sarkasm and instruction decoding stories aside, we've had a number of > > reports of this kind of error in the past where the problem was simply > > people using the wrong the DT with their guest kernel. I don't think > > we've seen an actual case of a real guest that was using the 'wrong' > > instruction to actually do I/O. > > > > Currently, LTO builds of EDK2 for 32-bit mach-virt are broken because > of this. The MMIO accessors are written in C using volatile pointers, > allowing LTO to merge adjacent accesses or loops performing MMIO, > resulting in, e.g., instructions with writeback to be emitted.
I'd like to see a testcase where GCC does merging on volatile accesses. That would be a GCC bug. So I suspect the C code isn't quite using volatile accesses... > We have a fix for that: either disable LTO to build the MMIO access > library, or use assembler for the actual ldr/str instructions. > > In the kernel, I think the hygiene when it comes to using the > appropriate accessors for MMIO is much higher, and given that the > exception in the original report was actually taken from USR32 mode, I > am highly skeptical about whether this has anything to do with using > the wrong instructions for MMIO. > > >> > I added Will here, because he provided the rationale for why the premise > >> > held last we discussed this, and he may be able to update us on that. > >> > > >> > As Ard pointed out, KVM could probably provide a more helpful error > >> > message or be a bit more clever in trying to find out what happened. > >> > Some times, if guests are misconfigured, and for example think that > >> > normal memory is placed in the guest physical memory map where the > >> > hypervisor placed an I/O device, guests will issue non-decodable > >> > instructions against I/O memory and KVM simply provides this misleading > >> > error message. > >> > > >> > >> As I pointed out in the BZ entry, this smells like another exception > >> that gets reported at EL2, and lacks the syndrome information for > >> unrelated reasons (i.e., nothing to do with the opcode) > > > > I don't understand how that would result in the error message reported. > > kvm_handle_guest_abort() has: > > > > /* Check the stage-2 fault is trans. fault or write fault */ > > if (fault_status != FSC_FAULT && fault_status != FSC_PERM && > > fault_status != FSC_ACCESS) { > > kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n", > > kvm_vcpu_trap_get_class(vcpu), > > (unsigned long)kvm_vcpu_trap_get_fault(vcpu), > > (unsigned long)kvm_vcpu_get_hsr(vcpu)); > > return -EFAULT; > > } > > > > Are you saying that we can get a stage 2 data abort, which reports the > > DFSC as translation, permission, or access fault, while it's something > > different completely? > > > > Ah ok, i missed that bit. Thanks for pointing it out. > > So is the assumption here that the ISV bit will always be 1 for those > DFSC classes? (provided that the other conditions [0] are met) > > > In any case, I was just trying to help here, by providing some > > additional background and context. I agree that KVM's error messaging > > could be improved, and if we should implement instruction decoding for > > certain instructions that can validly be used for I/O access, then we > > can add that to our to-do list. > > > > Much appreciated. And please don't take my sarcasm the wrong way. > > [0] > """ > ISV, bit [24] > Instruction syndrome valid. Indicates whether the syndrome information > in ISS[23:14] is valid. > 0 No valid instruction syndrome. ISS[23:14] are RES 0. > 1 ISS[23:14] hold a valid instruction syndrome. > This bit is 0 for all faults reported in ESR_EL2 except the following > stage 2 aborts: > • AArch64 loads and stores of a single general-purpose register > (including the register > specified with 0b11111 ), including those with Acquire/Release > semantics, but excluding Load > Exclusive or Store Exclusive and excluding those with writeback. > • AArch32 instructions where the instruction: > — Is an LDR, LDA, LDRT, LDRSH, LDRSHT, LDRH, LDAH, LDRHT, LDRSB, > LDRSBT, LDRB, LDAB, LDRBT, STR, STL, STRT, STRH, STLH, STRHT, STRB, > STLB, or STRBT instruction. > — Is not performing register writeback. > — Is not using R15 as a source or destination register. > For these cases, ISV is UNKNOWN if the exception was generated in Debug stat > """