On 5 June 2018 at 10:16, Laszlo Ersek <ler...@redhat.com> wrote: > On 06/05/18 08:04, Ard Biesheuvel wrote: >> On 4 June 2018 at 20:10, Laszlo Ersek <ler...@redhat.com> wrote: >>> Hi! >>> >>> Apologies if this isn't the right place for asking. For the problem >>> statement, I'll simply steal Ard's writeup [1]: >>> >>>> KVM on ARM refuses to decode load/store instructions used to perform >>>> I/O to emulated devices, and instead relies on the exception syndrome >>>> information to describe the operand register, access size, etc. This >>>> is only possible for instructions that have a single input/output >>>> register (as opposed to ones that increment the offset register, or >>>> load/store pair instructions, etc). Otherwise, QEMU crashes with the >>>> following error >>>> >>>> error: kvm run failed Function not implemented >>>> [...] >>>> QEMU: Terminated >>>> >>>> and KVM produces a warning such as the following in the kernel log >>>> >>>> kvm [17646]: load/store instruction decoding not implemented >>>> >>>> GCC with LTO enabled will emit such instructions for Mmio[Read|Write] >>>> invocations performed in a loop, so we need to disable LTO [...] >>> >>> We have a Red Hat Bugzilla about the (very likely) same issue [2]. >>> >>> Earlier, we had to work around the same on AArch64 too [3]. >>> >>> Would it be possible to introduce a dedicated -mXXX option, for ARM and >>> AArch64, that disabled the generation of such multi-operand >>> instructions? >>> >>> I note there are several similar instructions (for other architectures): >>> * -mno-multiple (ppc) >>> * -mno-fused-madd (ia64) >>> * -mno-mmx and a lot of friends (x86) >>> >>> Obviously, if the feature request is deemed justified, we should provide >>> the exact family of instructions to disable. I'll leave that to others >>> on the CC list with more ARM/AArch64 expertise; I just wanted to get >>> this thread started. (Sorry if the option is already being requested >>> elsewhere; I admit I didn't search the GCC bugzilla.) >>> >> >> I am not convinced that tweaking GCC code generation is the correct >> approach here, to be honest. >> >> The issue only occurs when load/store instructions trap into KVM, >> which (correct me if I am wrong) mostly only occurs when emulating >> MMIO. The case I have been looking into (UEFI) uses MMIO accessors >> correctly, but due to the way they are implemented (in C), LTO code >> generation may result in load/store instructions with multiple outputs >> to be used. >> >> So first of all, I would like to understand the magnitude of the >> problem. If all cases we can identify involve performing MMIO using C >> memory references, I think we should fix the code rather than the >> compiler. > > To my understanding, Daniel has the opposite preference; namely, the > above approach doesn't scale to a large and moving target like the > kernel. Because the instructions in question work on the bare metal > (IOW, the guest code is not "broken" in any sense of the word), people > will continue writing kernel MMIO code in C that "lures" gcc into > generating such ARM/AArch64 assembly that contains those instructions. > > The RHBZ I linked earlier remains elusive; the issue is not easy to > trigger, and when it does trigger, one has to investigate the symptoms > (the guest code at the trap address) every time, trace it back to > C-language source code, and either tweak that C code, or else tweak the > compiler flags specifically for that code / module. AIUI Daniel prefers > to work around the KVM issue without having to analyze every guest site, > as they pop up over time. The expression "all cases we can identify" is > the core of the problem; it's not a well-defined set. > > Your edk2 ArmVirtQemu patch adds a heavy-weight flag (-fno-lto) to a > pin-point location; another possibility (that might scale better to > humans) is a new, lighter-weight flag, such as "-mno-multiple", that is > applied universally to a codebase. >
That will affect *all* memory references, which will undoubtedly hurt performance.