On Thu, Jun 07, 2018 at 10:45:13AM +0200, Ard Biesheuvel wrote:
> On 7 June 2018 at 10:21, Christoffer Dall <christoffer.d...@arm.com> wrote:
> > On Thu, Jun 07, 2018 at 09:56:18AM +0200, Ard Biesheuvel wrote:
> >> On 7 June 2018 at 09:48, Christoffer Dall <christoffer.d...@arm.com> wrote:
> >> > [+Will]
> >> >
> >> > On Tue, Jun 05, 2018 at 03:07:14PM +0200, Laszlo Ersek wrote:
> >> >> On 06/05/18 13:30, Richard Biener wrote:
> >> >> > On Mon, Jun 4, 2018 at 8:11 PM Laszlo Ersek <ler...@redhat.com> wrote:
> >> >> >>
> >> >> >> Hi!
> >> >> >>
> >> >> >> Apologies if this isn't the right place for asking. For the problem
> >> >> >> statement, I'll simply steal Ard's writeup [1]:
> >> >> >>
> >> >> >>> KVM on ARM refuses to decode load/store instructions used to perform
> >> >> >>> I/O to emulated devices, and instead relies on the exception 
> >> >> >>> syndrome
> >> >> >>> information to describe the operand register, access size, etc. This
> >> >> >>> is only possible for instructions that have a single input/output
> >> >> >>> register (as opposed to ones that increment the offset register, or
> >> >> >>> load/store pair instructions, etc). Otherwise, QEMU crashes with the
> >> >> >>> following error
> >> >> >>>
> >> >> >>>   error: kvm run failed Function not implemented
> >> >> >>>   [...]
> >> >> >>>   QEMU: Terminated
> >> >> >>>
> >> >> >>> and KVM produces a warning such as the following in the kernel log
> >> >> >>>
> >> >> >>>   kvm [17646]: load/store instruction decoding not implemented
> >> >> >
> >> >> > This looks like a kvm/qemu issue to me.  Whatever that exception 
> >> >> > syndrome
> >> >> > thing is, it surely has a pointer to the offending instruction it 
> >> >> > could decode?
> >> >>
> >> >> I believe so -- the instruction decoding is theoretically possible (to
> >> >> my understanding); KVM currently doesn't do it because it's super
> >> >> complex (again, to my understanding).
> >> >>
> >> > The instruction decoding was considered and discarded because the
> >> > understanding at the time was that any instruction that didn't generate
> >> > valid decoding hints in the syndrome register (such as multiple output
> >> > register operations) would not be safe to use on device memory, and
> >> > therefore shouldn't be used neither on real hardware nor in VM guests.
> >> >
> >>
> >> How is it unsafe for a load or store with writeback to be used on
> >> device memory? That does not make sense to me.
> >
> > I don't understand that either, which is why I cc'ed Will who argued for
> > this last IIRC.
> >
> >> In any case, I suppose that *decoding* the instruction is not the
> >> problem, it is loading the opcode in the first place, given that it is
> >> not recorded in any system registers when the exception is taken. ELR
> >> will contain a virtual guest address [which could be in userland], and
> >> the host should translate that (which involves guest page tables that
> >> may be modified by other VCPUs concurrently) and map it to be able to
> >> get to the actual bits.
> >>
> >> > If this still holds, it's not a question of an architecture bug or a
> >> > missing feature in KVM, but a question of a guest doing something wrong.
> >> >
> >>
> >> Do you have a mutt macro for that response? :-)
> >>
> >
> > No I don't.  And I wouldn't mind adding instruction decoding to KVM.  I
> > already wrote it once, but the maintainer didn't want to merge the code
> > unless I unified all instruction decoding in the arm kernel, which I was
> > unable to do.
> >
> 
> Yikes.
> 
> So how does your code actually load the opcode?
> 

It was ages ago, and for the original 32-bit only KVM/ARM
implementation, but yes, it reads back the instruction, which is
absolutely horrible on an SMP guest, because other VCPUs could swap out
the page of the trapped instruction, so it requires something like
stopping the world to get a consistent read of the offending
instruction.

> > Sarkasm and instruction decoding stories aside, we've had a number of
> > reports of this kind of error in the past where the problem was simply
> > people using the wrong the DT with their guest kernel.  I don't think
> > we've seen an actual case of a real guest that was using the 'wrong'
> > instruction to actually do I/O.
> >
> 
> Currently, LTO builds of EDK2 for 32-bit mach-virt are broken because
> of this. The MMIO accessors are written in C using volatile pointers,
> allowing LTO to merge adjacent accesses or loops performing MMIO,
> resulting in, e.g., instructions with writeback to be emitted.

ok, it would be good to understand if there are any instructions that
are safe to use for I/O on the 32-bit side, and if anyone cares enough
about this use case, that would be a good argument for merging such
patches.

> 
> We have a fix for that: either disable LTO to build the MMIO access
> library, or use assembler for the actual ldr/str instructions.

Will fixed this for 32-bit Linux in
195bbcac2e5c12f7fb99cdcc492c3000c5537f4a.

The commit text only talks about running under a hypervisor, and
unfortunately I can't seem to dig up the explanation for why we should
just drop the instruction decoding.

> 
> In the kernel, I think the hygiene when it comes to using the
> appropriate accessors for MMIO is much higher, and given that the
> exception in the original report was actually taken from USR32 mode, I
> am highly skeptical about whether this has anything to do with using
> the wrong instructions for MMIO.
> 

I agree, it's unlikely.

> >> > I added Will here, because he provided the rationale for why the premise
> >> > held last we discussed this, and he may be able to update us on that.
> >> >
> >> > As Ard pointed out, KVM could probably provide a more helpful error
> >> > message or be a bit more clever in trying to find out what happened.
> >> > Some times, if guests are misconfigured, and for example think that
> >> > normal memory is placed in the guest physical memory map where the
> >> > hypervisor placed an I/O device, guests will issue non-decodable
> >> > instructions against I/O memory and KVM simply provides this misleading
> >> > error message.
> >> >
> >>
> >> As I pointed out in the BZ entry, this smells like another exception
> >> that gets reported at EL2, and lacks the syndrome information for
> >> unrelated reasons (i.e., nothing to do with the opcode)
> >
> > I don't understand how that would result in the error message reported.
> > kvm_handle_guest_abort() has:
> >
> >         /* Check the stage-2 fault is trans. fault or write fault */
> >         if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
> >             fault_status != FSC_ACCESS) {
> >                 kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
> >                         kvm_vcpu_trap_get_class(vcpu),
> >                         (unsigned long)kvm_vcpu_trap_get_fault(vcpu),
> >                         (unsigned long)kvm_vcpu_get_hsr(vcpu));
> >                 return -EFAULT;
> >         }
> >
> > Are you saying that we can get a stage 2 data abort, which reports the
> > DFSC as translation, permission, or access fault, while it's something
> > different completely?
> >
> 
> Ah ok, i missed that bit. Thanks for pointing it out.
> 
> So is the assumption here that the ISV bit will always be 1 for those
> DFSC classes? (provided that the other conditions [0] are met)
> 

Yes, I think that was the assumption when we wrote that code.

> > In any case, I was just trying to help here, by providing some
> > additional background and context.  I agree that KVM's error messaging
> > could be improved, and if we should implement instruction decoding for
> > certain instructions that can validly be used for I/O access, then we
> > can add that to our to-do list.
> >
> 
> Much appreciated. And please don't take my sarcasm the wrong way.
> 

No problem.

Thanks,
-Christoffer

> [0]
> """
> ISV, bit [24]
> Instruction syndrome valid. Indicates whether the syndrome information
> in ISS[23:14] is valid.
> 0 No valid instruction syndrome. ISS[23:14] are RES 0.
> 1 ISS[23:14] hold a valid instruction syndrome.
> This bit is 0 for all faults reported in ESR_EL2 except the following
> stage 2 aborts:
> • AArch64 loads and stores of a single general-purpose register
> (including the register
> specified with 0b11111 ), including those with Acquire/Release
> semantics, but excluding Load
> Exclusive or Store Exclusive and excluding those with writeback.
> • AArch32 instructions where the instruction:
> — Is an LDR, LDA, LDRT, LDRSH, LDRSHT, LDRH, LDAH, LDRHT, LDRSB,
> LDRSBT, LDRB, LDAB, LDRBT, STR, STL, STRT, STRH, STLH, STRHT, STRB,
> STLB, or STRBT instruction.
> — Is not performing register writeback.
> — Is not using R15 as a source or destination register.
> For these cases, ISV is UNKNOWN if the exception was generated in Debug stat
> """

Reply via email to