(I hit 'Send' too soon, apologies for the two stage reply)

On 5 June 2018 at 10:18, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
> On 5 June 2018 at 10:16, Laszlo Ersek <ler...@redhat.com> wrote:
>> On 06/05/18 08:04, Ard Biesheuvel wrote:
>>> On 4 June 2018 at 20:10, Laszlo Ersek <ler...@redhat.com> wrote:
>>>> Hi!
>>>>
>>>> Apologies if this isn't the right place for asking. For the problem
>>>> statement, I'll simply steal Ard's writeup [1]:
>>>>
>>>>> KVM on ARM refuses to decode load/store instructions used to perform
>>>>> I/O to emulated devices, and instead relies on the exception syndrome
>>>>> information to describe the operand register, access size, etc. This
>>>>> is only possible for instructions that have a single input/output
>>>>> register (as opposed to ones that increment the offset register, or
>>>>> load/store pair instructions, etc). Otherwise, QEMU crashes with the
>>>>> following error
>>>>>
>>>>>   error: kvm run failed Function not implemented
>>>>>   [...]
>>>>>   QEMU: Terminated
>>>>>
>>>>> and KVM produces a warning such as the following in the kernel log
>>>>>
>>>>>   kvm [17646]: load/store instruction decoding not implemented
>>>>>
>>>>> GCC with LTO enabled will emit such instructions for Mmio[Read|Write]
>>>>> invocations performed in a loop, so we need to disable LTO [...]
>>>>
>>>> We have a Red Hat Bugzilla about the (very likely) same issue [2].
>>>>
>>>> Earlier, we had to work around the same on AArch64 too [3].
>>>>
>>>> Would it be possible to introduce a dedicated -mXXX option, for ARM and
>>>> AArch64, that disabled the generation of such multi-operand
>>>> instructions?
>>>>
>>>> I note there are several similar instructions (for other architectures):
>>>> * -mno-multiple (ppc)
>>>> * -mno-fused-madd (ia64)
>>>> * -mno-mmx and a lot of friends (x86)
>>>>
>>>> Obviously, if the feature request is deemed justified, we should provide
>>>> the exact family of instructions to disable. I'll leave that to others
>>>> on the CC list with more ARM/AArch64 expertise; I just wanted to get
>>>> this thread started. (Sorry if the option is already being requested
>>>> elsewhere; I admit I didn't search the GCC bugzilla.)
>>>>
>>>
>>> I am not convinced that tweaking GCC code generation is the correct
>>> approach here, to be honest.
>>>
>>> The issue only occurs when load/store instructions trap into KVM,
>>> which (correct me if I am wrong) mostly only occurs when emulating
>>> MMIO. The case I have been looking into (UEFI) uses MMIO accessors
>>> correctly, but due to the way they are implemented (in C), LTO code
>>> generation may result in load/store instructions with multiple outputs
>>> to be used.
>>>
>>> So first of all, I would like to understand the magnitude of the
>>> problem. If all cases we can identify involve performing MMIO using C
>>> memory references, I think we should fix the code rather than the
>>> compiler.
>>
>> To my understanding, Daniel has the opposite preference; namely, the
>> above approach doesn't scale to a large and moving target like the
>> kernel. Because the instructions in question work on the bare metal
>> (IOW, the guest code is not "broken" in any sense of the word), people
>> will continue writing kernel MMIO code in C that "lures" gcc into
>> generating such ARM/AArch64 assembly that contains those instructions.
>>

Kernel MMIO code is not written in C, it used readl/writel and friends.

>> The RHBZ I linked earlier remains elusive; the issue is not easy to
>> trigger, and when it does trigger, one has to investigate the symptoms
>> (the guest code at the trap address) every time, trace it back to
>> C-language source code, and either tweak that C code, or else tweak the
>> compiler flags specifically for that code / module. AIUI Daniel prefers
>> to work around the KVM issue without having to analyze every guest site,
>> as they pop up over time. The expression "all cases we can identify" is
>> the core of the problem; it's not a well-defined set.
>>
>> Your edk2 ArmVirtQemu patch adds a heavy-weight flag (-fno-lto) to a
>> pin-point location; another possibility (that might scale better to
>> humans) is a new, lighter-weight flag, such as "-mno-multiple", that is
>> applied universally to a codebase.
>>
>
> That will affect *all* memory references, which will undoubtedly hurt
> performance.

... given that it will prevent us from using load/store pair
instructions, which are used *everywhere*. Every function prologue
consists of a collection of ldp instructions, every function epilogue
has stp instructions. Optimized copy routines use ldp/stp as well.

So rebuilding the world with -mno-multiple is not going to fly, really.

We should probably start with improving the diagnostics produced by
KVM when this situation hits. And given that this is (imo) a bug in
the architecture, it should be up to KVM to work around it.

Reply via email to