(I hit 'Send' too soon, apologies for the two stage reply) On 5 June 2018 at 10:18, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 5 June 2018 at 10:16, Laszlo Ersek <ler...@redhat.com> wrote: >> On 06/05/18 08:04, Ard Biesheuvel wrote: >>> On 4 June 2018 at 20:10, Laszlo Ersek <ler...@redhat.com> wrote: >>>> Hi! >>>> >>>> Apologies if this isn't the right place for asking. For the problem >>>> statement, I'll simply steal Ard's writeup [1]: >>>> >>>>> KVM on ARM refuses to decode load/store instructions used to perform >>>>> I/O to emulated devices, and instead relies on the exception syndrome >>>>> information to describe the operand register, access size, etc. This >>>>> is only possible for instructions that have a single input/output >>>>> register (as opposed to ones that increment the offset register, or >>>>> load/store pair instructions, etc). Otherwise, QEMU crashes with the >>>>> following error >>>>> >>>>> error: kvm run failed Function not implemented >>>>> [...] >>>>> QEMU: Terminated >>>>> >>>>> and KVM produces a warning such as the following in the kernel log >>>>> >>>>> kvm [17646]: load/store instruction decoding not implemented >>>>> >>>>> GCC with LTO enabled will emit such instructions for Mmio[Read|Write] >>>>> invocations performed in a loop, so we need to disable LTO [...] >>>> >>>> We have a Red Hat Bugzilla about the (very likely) same issue [2]. >>>> >>>> Earlier, we had to work around the same on AArch64 too [3]. >>>> >>>> Would it be possible to introduce a dedicated -mXXX option, for ARM and >>>> AArch64, that disabled the generation of such multi-operand >>>> instructions? >>>> >>>> I note there are several similar instructions (for other architectures): >>>> * -mno-multiple (ppc) >>>> * -mno-fused-madd (ia64) >>>> * -mno-mmx and a lot of friends (x86) >>>> >>>> Obviously, if the feature request is deemed justified, we should provide >>>> the exact family of instructions to disable. I'll leave that to others >>>> on the CC list with more ARM/AArch64 expertise; I just wanted to get >>>> this thread started. (Sorry if the option is already being requested >>>> elsewhere; I admit I didn't search the GCC bugzilla.) >>>> >>> >>> I am not convinced that tweaking GCC code generation is the correct >>> approach here, to be honest. >>> >>> The issue only occurs when load/store instructions trap into KVM, >>> which (correct me if I am wrong) mostly only occurs when emulating >>> MMIO. The case I have been looking into (UEFI) uses MMIO accessors >>> correctly, but due to the way they are implemented (in C), LTO code >>> generation may result in load/store instructions with multiple outputs >>> to be used. >>> >>> So first of all, I would like to understand the magnitude of the >>> problem. If all cases we can identify involve performing MMIO using C >>> memory references, I think we should fix the code rather than the >>> compiler. >> >> To my understanding, Daniel has the opposite preference; namely, the >> above approach doesn't scale to a large and moving target like the >> kernel. Because the instructions in question work on the bare metal >> (IOW, the guest code is not "broken" in any sense of the word), people >> will continue writing kernel MMIO code in C that "lures" gcc into >> generating such ARM/AArch64 assembly that contains those instructions. >>
Kernel MMIO code is not written in C, it used readl/writel and friends. >> The RHBZ I linked earlier remains elusive; the issue is not easy to >> trigger, and when it does trigger, one has to investigate the symptoms >> (the guest code at the trap address) every time, trace it back to >> C-language source code, and either tweak that C code, or else tweak the >> compiler flags specifically for that code / module. AIUI Daniel prefers >> to work around the KVM issue without having to analyze every guest site, >> as they pop up over time. The expression "all cases we can identify" is >> the core of the problem; it's not a well-defined set. >> >> Your edk2 ArmVirtQemu patch adds a heavy-weight flag (-fno-lto) to a >> pin-point location; another possibility (that might scale better to >> humans) is a new, lighter-weight flag, such as "-mno-multiple", that is >> applied universally to a codebase. >> > > That will affect *all* memory references, which will undoubtedly hurt > performance. ... given that it will prevent us from using load/store pair instructions, which are used *everywhere*. Every function prologue consists of a collection of ldp instructions, every function epilogue has stp instructions. Optimized copy routines use ldp/stp as well. So rebuilding the world with -mno-multiple is not going to fly, really. We should probably start with improving the diagnostics produced by KVM when this situation hits. And given that this is (imo) a bug in the architecture, it should be up to KVM to work around it.