[PATCH] D79916: Map -O to -O1 instead of -O2

Brandon Bergren via Phabricator via cfe-commits Thu, 17 Sep 2020 11:13:14 -0700

Bdragon28 added a comment.

In D79916#2279863 <https://reviews.llvm.org/D79916#2279863>, @Bdragon28 wrote:


> In D79916#2279816 <https://reviews.llvm.org/D79916#2279816>, @jrtc27 wrote:
>
>> In D79916#2279812 <https://reviews.llvm.org/D79916#2279812>, @Bdragon28 
>> wrote:
>>
>>> In D79916#2279045 <https://reviews.llvm.org/D79916#2279045>, @jrtc27 wrote:
>>>
>>>> This has significantly regressed FreeBSD's performance with the new 
>>>> version of Clang. It seems Clang does not inline functions at -O1, unlike 
>>>> GCC, and since FreeBSD currently compiles its kernel with -O whenever 
>>>> debug symbols are enabled[1] (which, of course, is almost always true), 
>>>> this results in all its `static inline` helper functions not being inlined 
>>>> at all, a pattern that is common in the kernel, used for things like 
>>>> `get_curthread` and the atomics implementations.
>>>>
>>>> [1] This is a dubious decision made in r140400 in 2005 to provide "truer 
>>>> debugger stack traces" (well, before then there was ping-ponging between 
>>>> -O and -O2 based on concerns around correctness vs performance, but amd64 
>>>> is an exception that has always used -O2 since r127180 it seems). Given 
>>>> that GCC will inline at -O, at least these days, the motivation seems to 
>>>> no longer exist, and compiling a kernel at anything other than -O2 (or 
>>>> maybe -O3) seems like a silly thing to do, but nevertheless it's what is 
>>>> currently done.
>>>>
>>>> Cc: @dim @trasz
>>>
>>> This is actually SUCH a bad idea that a kernel built with -O will *not work 
>>> at all* on 32 bit powerpc platforms (presumably due to allocating stack 
>>> frames in the middle of assembly fragments in the memory management that 
>>> are supposed to be inlined at all times.) I had to hack kern.pre.mk to 
>>> rquest -O2 at all times just to get a functioning kernel.
>>
>> Well, -O0, -O1, -O2 and -O should all produce working kernels, and any cases 
>> where they don't are compiler bugs (or kernel bugs if they rely on UB) that 
>> should be fixed, not worked around by tweaking the compiler flags in a 
>> fragile way until you get the behaviour relied on. Correctness and 
>> performance are very different issues here.
>
> As an example:
>
>   static __inline void
>   mtsrin(vm_offset_t va, register_t value)
>   {
>   
>           __asm __volatile ("mtsrin %0,%1; isync" :: "r"(value), "r"(va));
>   }
>
> This code is used in the mmu when bootstrapping the cpu like so:
>
>   for (i = 0; i < 16; i++)
>           mtsrin(i << ADDR_SR_SHFT, kernel_pmap->pm_sr[i]);
>   powerpc_sync();
>   
>   sdr = (u_int)moea_pteg_table | (moea_pteg_mask >> 10);
>   __asm __volatile("mtsdr1 %0" :: "r"(sdr));
>   isync();
>   
>   tlbia();
>
> During the loop there, we are in the middle of programming the MMU segment 
> registers in real mode, and is supposed to be doing all work out of 
> registers. (and powerpc_sync() and isync() should be expanded to their single 
> assembly instruction, not a function call. The whole point of calling those 
> is that we are in an inconsistent hardware state and need to sync up before 
> continuing execution)
>
> If there isn't a way to force inlining, we will have to change to using 
> preprocessor macros in cpufunc.h.

Actually, this is probably a bad example. Since we're in real mode it doesn't 
really matter. But I can see other places where powerpc_sync() / isync() are 
dangerous to expand to a function call.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79916/new/

https://reviews.llvm.org/D79916

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D79916: Map -O to -O1 instead of -O2

Reply via email to