Re: code-gen options for disabling multi-operand AArch64 and ARM instructions

Ard Biesheuvel Tue, 05 Jun 2018 01:19:59 -0700

On 5 June 2018 at 10:16, Laszlo Ersek <[email protected]> wrote:
> On 06/05/18 08:04, Ard Biesheuvel wrote:
>> On 4 June 2018 at 20:10, Laszlo Ersek <[email protected]> wrote:
>>> Hi!
>>>
>>> Apologies if this isn't the right place for asking. For the problem
>>> statement, I'll simply steal Ard's writeup [1]:
>>>
>>>> KVM on ARM refuses to decode load/store instructions used to perform
>>>> I/O to emulated devices, and instead relies on the exception syndrome
>>>> information to describe the operand register, access size, etc. This
>>>> is only possible for instructions that have a single input/output
>>>> register (as opposed to ones that increment the offset register, or
>>>> load/store pair instructions, etc). Otherwise, QEMU crashes with the
>>>> following error
>>>>
>>>>   error: kvm run failed Function not implemented
>>>>   [...]
>>>>   QEMU: Terminated
>>>>
>>>> and KVM produces a warning such as the following in the kernel log
>>>>
>>>>   kvm [17646]: load/store instruction decoding not implemented
>>>>
>>>> GCC with LTO enabled will emit such instructions for Mmio[Read|Write]
>>>> invocations performed in a loop, so we need to disable LTO [...]
>>>
>>> We have a Red Hat Bugzilla about the (very likely) same issue [2].
>>>
>>> Earlier, we had to work around the same on AArch64 too [3].
>>>
>>> Would it be possible to introduce a dedicated -mXXX option, for ARM and
>>> AArch64, that disabled the generation of such multi-operand
>>> instructions?
>>>
>>> I note there are several similar options (for other architectures):
>>> * -mno-multiple (ppc)
>>> * -mno-fused-madd (ia64)
>>> * -mno-mmx and a lot of friends (x86)
>>>
>>> Obviously, if the feature request is deemed justified, we should provide
>>> the exact family of instructions to disable. I'll leave that to others
>>> on the CC list with more ARM/AArch64 expertise; I just wanted to get
>>> this thread started. (Sorry if the option is already being requested
>>> elsewhere; I admit I didn't search the GCC bugzilla.)
>>>
>>
>> I am not convinced that tweaking GCC code generation is the correct
>> approach here, to be honest.
>>
>> The issue only occurs when load/store instructions trap into KVM,
>> which (correct me if I am wrong) mostly happens when emulating MMIO.
>> The case I have been looking into (UEFI) uses MMIO accessors
>> correctly, but because they are implemented in C, LTO code
>> generation may end up using load/store instructions with multiple
>> outputs.
>>
>> So first of all, I would like to understand the magnitude of the
>> problem. If all cases we can identify involve performing MMIO using C
>> memory references, I think we should fix the code rather than the
>> compiler.
>
> To my understanding, Daniel has the opposite preference; namely, the
> above approach doesn't scale to a large and moving target like the
> kernel. Because the instructions in question work on the bare metal
> (IOW, the guest code is not "broken" in any sense of the word), people
> will continue writing kernel MMIO code in C that "lures" gcc into
> generating such ARM/AArch64 assembly that contains those instructions.
>
> The RHBZ I linked earlier remains elusive; the issue is not easy to
> trigger, and when it does trigger, one has to investigate the symptoms
> (the guest code at the trap address) every time, trace it back to
> C-language source code, and either tweak that C code, or else tweak the
> compiler flags specifically for that code / module. AIUI Daniel prefers
> to work around the KVM issue without having to analyze every guest site,
> as they pop up over time. The expression "all cases we can identify" is
> the core of the problem; it's not a well-defined set.
>
> Your edk2 ArmVirtQemu patch adds a heavy-weight flag (-fno-lto) to a
> pin-point location; another possibility (that might scale better to
> humans) is a new, lighter-weight flag, such as "-mno-multiple", that is
> applied universally to a codebase.
>


That will affect *all* memory references, which will undoubtedly hurt
performance.
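
For illustration only, here is a minimal sketch of what "fixing the code
rather than the compiler" could look like: the MMIO accessors pin the
instruction form with inline assembly, so only MMIO accesses are
restricted and ordinary memory references stay fully optimizable. The
names mmio_read32, mmio_write32 and sum_registers are hypothetical and
are not taken from edk2 or the kernel. The non-AArch64 branch is just a
volatile fallback so the sketch builds elsewhere; 32-bit ARM would need
its own asm variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor: read one 32-bit device register. */
static inline uint32_t mmio_read32(const volatile void *addr)
{
    uint32_t val;
#if defined(__aarch64__)
    /* Plain LDR, base register only: a single transfer register and
     * no writeback, so the compiler cannot substitute a post-indexed
     * or load-pair form here. */
    __asm__ volatile("ldr %w0, [%1]" : "=r"(val) : "r"(addr) : "memory");
#else
    /* Fallback for other targets: ordinary volatile access. */
    val = *(const volatile uint32_t *)addr;
#endif
    return val;
}

/* Hypothetical accessor: write one 32-bit device register. */
static inline void mmio_write32(volatile void *addr, uint32_t val)
{
#if defined(__aarch64__)
    /* Plain STR, base register only, no writeback. */
    __asm__ volatile("str %w0, [%1]" : : "r"(val), "r"(addr) : "memory");
#else
    *(volatile uint32_t *)addr = val;
#endif
}

/* Accessors called in a loop, the pattern that reportedly tempts the
 * compiler into writeback or pair forms when they are plain C
 * dereferences. The pointer increment happens in separate
 * instructions, outside the asm statements. */
static uint32_t sum_registers(const volatile uint32_t *base, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += mmio_read32(&base[i]);
    return sum;
}
```

The trade-off is that every MMIO access goes through a fixed
instruction, but that cost is confined to the accessors instead of
being applied to the whole codebase by a global flag.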
