On Thu, Jun 7, 2018 at 10:35 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Thu, Jun 7, 2018 at 10:45 AM Ard Biesheuvel
> <ard.biesheu...@linaro.org> wrote:
>>
>> On 7 June 2018 at 10:21, Christoffer Dall <christoffer.d...@arm.com> wrote:
>> > On Thu, Jun 07, 2018 at 09:56:18AM +0200, Ard Biesheuvel wrote:
>> >> On 7 June 2018 at 09:48, Christoffer Dall <christoffer.d...@arm.com> 
>> >> wrote:
>> >> > [+Will]
>> >> >
>> >> > On Tue, Jun 05, 2018 at 03:07:14PM +0200, Laszlo Ersek wrote:
>> >> >> On 06/05/18 13:30, Richard Biener wrote:
>> >> >> > On Mon, Jun 4, 2018 at 8:11 PM Laszlo Ersek <ler...@redhat.com> 
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi!
>> >> >> >>
>> >> >> >> Apologies if this isn't the right place for asking. For the problem
>> >> >> >> statement, I'll simply steal Ard's writeup [1]:
>> >> >> >>
>> >> >> >>> KVM on ARM refuses to decode load/store instructions used to 
>> >> >> >>> perform
>> >> >> >>> I/O to emulated devices, and instead relies on the exception 
>> >> >> >>> syndrome
>> >> >> >>> information to describe the operand register, access size, etc. 
>> >> >> >>> This
>> >> >> >>> is only possible for instructions that have a single input/output
>> >> >> >>> register (as opposed to ones that increment the offset register, or
>> >> >> >>> load/store pair instructions, etc). Otherwise, QEMU crashes with 
>> >> >> >>> the
>> >> >> >>> following error
>> >> >> >>>
>> >> >> >>>   error: kvm run failed Function not implemented
>> >> >> >>>   [...]
>> >> >> >>>   QEMU: Terminated
>> >> >> >>>
>> >> >> >>> and KVM produces a warning such as the following in the kernel log
>> >> >> >>>
>> >> >> >>>   kvm [17646]: load/store instruction decoding not implemented
>> >> >> >
>> >> >> > This looks like a kvm/qemu issue to me.  Whatever that exception 
>> >> >> > syndrome
>> >> >> > thing is, it surely has a pointer to the offending instruction it 
>> >> >> > could decode?
>> >> >>
>> >> >> I believe so -- the instruction decoding is theoretically possible (to
>> >> >> my understanding); KVM currently doesn't do it because it's super
>> >> >> complex (again, to my understanding).
>> >> >>
>> >> > The instruction decoding was considered and discarded because the
>> >> > understanding at the time was that any instruction that didn't generate
>> >> > valid decoding hints in the syndrome register (such as multiple output
>> >> > register operations) would not be safe to use on device memory, and
>> >> > therefore shouldn't be used neither on real hardware nor in VM guests.
>> >> >
>> >>
>> >> How is it unsafe for a load or store with writeback to be used on
>> >> device memory? That does not make sense to me.
>> >
>> > I don't understand that either, which is why I cc'ed Will who argued for
>> > this last IIRC.
>> >
>> >> In any case, I suppose that *decoding* the instruction is not the
>> >> problem, it is loading the opcode in the first place, given that it is
>> >> not recorded in any system registers when the exception is taken. ELR
>> >> will contain a virtual guest address [which could be in userland], and
>> >> the host should translate that (which involves guest page tables that
>> >> may be modified by other VCPUs concurrently) and map it to be able to
>> >> get to the actual bits.
>> >>
>> >> > If this still holds, it's not a question of an architecture bug or a
>> >> > missing feature in KVM, but a question of a guest doing something wrong.
>> >> >
>> >>
>> >> Do you have a mutt macro for that response? :-)
>> >>
>> >
>> > No I don't.  And I wouldn't mind adding instruction decoding to KVM.  I
>> > already wrote it once, but the maintainer didn't want to merge the code
>> > unless I unified all instruction decoding in the arm kernel, which I was
>> > unable to do.
>> >
>>
>> Yikes.
>>
>> So how does your code actually load the opcode?
>>
>> > Sarkasm and instruction decoding stories aside, we've had a number of
>> > reports of this kind of error in the past where the problem was simply
>> > people using the wrong the DT with their guest kernel.  I don't think
>> > we've seen an actual case of a real guest that was using the 'wrong'
>> > instruction to actually do I/O.
>> >
>>
>> Currently, LTO builds of EDK2 for 32-bit mach-virt are broken because
>> of this. The MMIO accessors are written in C using volatile pointers,
>> allowing LTO to merge adjacent accesses or loops performing MMIO,
>> resulting in, e.g., instructions with writeback to be emitted.
>
> I'd like to see a testcase where GCC does merging on volatile accesses.
> That would be a GCC bug.  So I suspect the C code isn't quite using volatile
> accesses...
>

I'd also like to see a proper testcase to action. IIRC we have checks
preventing such merges for volatiles irrespective of LTO or not.


Ramana

Reply via email to