[Bug target/94391] gcc refers to absolute symbols with R_X86_64_PC32 relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94391

Fangrui Song changed:

What|Removed |Added
CC||i at maskray dot me

--- Comment #5 from Fangrui Song ---
This bug exposes several problems:

* GNU ld does not reject a PC-relative relocation referencing a SHN_ABS symbol.
* GCC should not produce R_X86_64_PC32 referencing an external symbol in -fpie mode.

% gcc -fuse-ld=lld -nostdlib -fpie -pie a.c
% objdump -dr a.o
...
   0: 55                    push %rbp
   1: 48 89 e5              mov  %rsp,%rbp
   4: 48 8d 05 00 00 00 00  lea  0x0(%rip),%rax   # b
        7: R_X86_64_PC32 _binary_a_c_size-0x4
   b: 5d                    pop  %rbp
   c: c3                    retq

% gcc -fuse-ld=bfd -nostdlib -fpie -pie a.c b.o -o a
/usr/bin/ld.bfd: warning: cannot find entry symbol _start; defaulting to 1000
% objdump -dr a
...
1000 :
 1000: 55                    push %rbp
 1001: 48 89 e5              mov  %rsp,%rbp
 1004: 48 8d 05 39 f0 ff ff  lea  -0xfc7(%rip),%rax   # 44 <_binary_a_c_size>
 100b: 5d                    pop  %rbp
 100c: c3                    retq

It is incorrect to reference a non-preemptible symbol with a PC-relative relocation in a -pie link. GNU ld allows it, but the code can be incorrect at runtime.

    lea -0xfc7(%rip),%rax   # loads 0x44 to %rax only if the load base is 0

Due to ASLR (-pie), this is simply not true. lld correctly rejects the relocation.

To fix this, I had a write-up last year: https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html

We should change the configure-time HAVE_LD_PIE_COPYRELOC to an option, probably -f(no-)direct-access-extern.

In clang, the counterpart of HAVE_LD_PIE_COPYRELOC is a compile-time option: -mpie-copy-relocations. But I think we should improve the option name. At the very least, we can also let -fno-pic code reference an external symbol via the GOT to avoid copy relocations. -f(no-)direct-access-extern may be a candidate.
[Bug target/94391] gcc refers to absolute symbols with R_X86_64_PC32 relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94391

--- Comment #6 from Fangrui Song ---
> It is incorrect to reference a non-preemptible symbol with a PC relative
> relocation in a -pie link. GNU ld allows it but the code can be incorrect at
> runtime.

Correction: It is incorrect to reference a non-preemptible SHN_ABS symbol with a PC-relative relocation in a PIC (-shared or -pie) link. Such a reference is not representable because, with ASLR, the load base is not fixed at link time.
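The arithmetic behind this correction can be sketched in C. This is only a model, not code from GCC or any linker: the helper name `lea_rip_result` and the sample load base 0x555555554000 are assumptions for illustration, while the displacement -0xfc7 and the next-instruction address 0x100b are taken from the objdump output in Comment #5.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value that `lea disp(%rip),%rax` produces when the
   image is loaded at load_base. link_next_ip is the link-time address of the
   instruction after the lea; at run time %rip is load_base + link_next_ip. */
static uint64_t lea_rip_result(uint64_t load_base, uint64_t link_next_ip,
                               int64_t disp)
{
    return load_base + link_next_ip + (uint64_t)disp;
}

/* Usage sketch with the numbers from Comment #5. */
static void demo(void)
{
    /* Load base 0: the absolute value 0x44 is recovered. */
    assert(lea_rip_result(0, 0x100b, -0xfc7) == 0x44);

    /* Any other load base (assumed value): the result is base + 0x44,
       which is no longer the absolute value the symbol stands for. */
    assert(lea_rip_result(0x555555554000ULL, 0x100b, -0xfc7)
           == 0x555555554044ULL);
}
```

The displacement is fixed at link time, so the load base always leaks into the result. That is harmless for section-relative symbols, which move with the image, but wrong for SHN_ABS symbols, which do not.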
[Bug target/94391] gcc refers to absolute symbols with R_X86_64_PC32 relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94391

--- Comment #10 from Fangrui Song ---
> extern unsigned long _binary_a_c_size;
> unsigned long foo() { return _binary_a_c_size; }

This is incorrect. The code will treat the value of _binary_a_c_size as an address (load base + size) and dereference that address:

    mov -0xfc3(%rip),%rax   # 44 <_binary_a_c_size>

> NO LLD is not implemented the ABI as PIE COPYRELOC is required by the ABI
> these days.

My objdump -d output in Comment #5 demonstrates that code linked by GNU ld will be incorrect at runtime. It can be argued that either the user code or GCC does the wrong thing, but the linker is not responsible for the mistake. (I have argued that lld does the right thing by erroring at link time.)

The compiler can ask the assembler to produce an indirect (GOT) reference. The code (`unsigned long foo() { return (unsigned long)_binary_a_c_size; }`) will work perfectly.

> Also it is wrong for a person to assume a normal C variable could be SHN_ABS;
> that is the bug here.
> It is a bug in the user code.
> I showed up to fix it by using an top level inline-asm.

-fno-pic and -fpic work fine. -fpie worked fine before commit 77ad54d911dd7cb88caf697ac213929f6132fdcf ("x86-64: Optimize access to globals in PIE with copy reloc"); that commit is responsible for the -fpie change.

In 2015, H.J. invented R_X86_64_{REX_,}GOTPCRELX. The linker relaxation is a perfect solution. We can retire HAVE_LD_PIE_COPYRELOC now.

    // The code will still be faulty but we can argue that it is a user error.
    __attribute__((visibility("hidden"))) extern unsigned long _binary_a_c_size;
    unsigned long foo() { return _binary_a_c_size; }

The relaxed R_X86_64_{REX_,}GOTPCRELX code sequence will be a bit longer than the R_X86_64_PC32 one. The difference is small enough that it should not matter for practical use cases.

For those who care about the tiny regression, we can invent an option -fdirect-access-extern (clang currently calls it -mpie-copy-relocations, but we can design a better name). It is more useful on non-x86 architectures for a mostly statically linked program.

    extern int var;
    int foo(void) { return var; }

    // clang -target aarch64 -fPIE -O3
    adrp x8, :got:var
    ldr  x8, [x8, :got_lo12:var]
    ldr  w0, [x8]
    ret

    // clang -target aarch64 -fPIE -O3 -mpie-copy-relocations
    adrp x8, var
    ldr  w0, [x8, :lo12:var]
    ret

    // x86-64
    // clang -O3 -fPIE a.c -Wa,--mrelax-relocations=yes
    0: 48 8b 05 00 00 00 00  mov 0x0(%rip),%rax   # 7
         3: R_X86_64_REX_GOTPCRELX var-0x4
    7: 8b 00                 mov (%rax),%eax
    9: c3                    retq

    // clang -O3 -fPIE a.c -mpie-copy-relocations
    0: 8b 05 00 00 00 00     mov 0x0(%rip),%eax   # 6
         2: R_X86_64_PC32 var-0x4
    6: c3                    retq
[Bug target/94391] gcc refers to absolute symbols with R_X86_64_PC32 relocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94391

--- Comment #23 from Fangrui Song ---
(In reply to Andrew Pinski from comment #18)
> (In reply to Yuxuan Shui from comment #17)
> > Sorry, I am here to report a bug, not to find a workaround for my use case.
>
> I gave you the correct usage for your use case. If you don't like it is not
> my fault.

A wontfix/invalid does not seem a proper resolution to the bug(s). We need a solution instead of a workaround (the SHN_ABS symbol _binary_*_size can be replaced with _binary_*_end minus _binary_*_start).

Let me repeat. The code has worked fine for a long time.

1. -fno-pie code can only be linked with -no-pie. A PC32 relocation can be resolved to a SHN_ABS definition.
2. -fpie code can be linked with either -no-pie or -pie.
3. -fpic code can be linked with -no-pie, -pie or -shared. GCC produces a GOT relocation. The linker fills in the GOT entry at link time. It is a constant at runtime.

1 and 3 always work. For 2 (-fpie -pie), it is incorrect to reference a non-preemptible SHN_ABS symbol with a PC-relative relocation in a PIC (-shared or -pie) link (missed GNU ld diagnostic: https://sourceware.org/bugzilla/show_bug.cgi?id=25749). A GOT relocation was produced until commit 77ad54d911dd7cb88caf697ac213929f6132fdcf ("x86-64: Optimize access to globals in PIE with copy reloc").

I have proposed my solution in Comment 10: revert the patch. It has had very little value since H.J. invented GOTPCRELX in 2015.

As compensation, we can invent a pair of new -f options, -f(no-)direct-access-extern-object.

-fno-pie defaults to -fdirect-access-extern-object.
-fpie defaults to -fno-direct-access-extern-object.

-fno-pie users who really want to get rid of copy relocations can enable -fno-direct-access-extern-object. Note: a copy relocation is needed if the definition turns out to be provided by a shared object.

-fpie users who really care about the GOT slowdown can enable -fdirect-access-extern-object. This is more relevant on non-x86 architectures because they lack the linker relaxation (R_X86_64_{REX_,}GOTPCRELX).
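The workaround mentioned in this comment (computing the size from the boundary symbols instead of reading the SHN_ABS size symbol) can be sketched as follows. This is a minimal illustration under assumptions: the helper `blob_size` and the stand-in array `fake_blob` are hypothetical, and in a real build the two pointers would be the `_binary_a_c_start` and `_binary_a_c_end` symbols provided by the object that embeds the file.

```c
#include <assert.h>
#include <stddef.h>

/* In a real build these would be declared as
       extern const char _binary_a_c_start[], _binary_a_c_end[];
   Both are ordinary section-relative symbols, so they move together with the
   image and their difference does not depend on the load base. */

/* Hypothetical helper: size of an embedded blob from its boundary symbols. */
static size_t blob_size(const char *start, const char *end)
{
    return (size_t)(end - start);
}

static void demo(void)
{
    /* Stand-in for the embedded file contents (assumption for this sketch). */
    static const char fake_blob[] = "int main(void) { return 0; }\n";
    const char *start = fake_blob;
    const char *end = fake_blob + sizeof fake_blob - 1;

    assert(blob_size(start, end) == sizeof fake_blob - 1);
}
```

Because both operands are relocated by the same load base, the subtraction yields the same value in -no-pie, -pie and -shared links, which is what makes it a workaround rather than a fix for the relocation problem itself.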
[Bug preprocessor/77488] Proposal for __FILENAME_ONLY__
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77488 Fangrui Song changed: What|Removed |Added CC||i at maskray dot me --- Comment #8 from Fangrui Song --- Clang since version 9 supports `__FILE_NAME__` (basename) as an extension https://reviews.llvm.org/D61756 I don't know whether it has been proposed on WG14 or WG21 mailing lists, though (seems not).
[Bug target/95095] New: Feature request: support -fno-unique-section-names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

Bug ID: 95095
Summary: Feature request: support -fno-unique-section-names
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: i at maskray dot me
Target Milestone: ---

-ffunction-sections produces sections .text.foo, .text.bar, etc., whose names can take a significant amount of string table space. In clang, -fno-unique-section-names emits multiple ".text" sections which can share the section name. Multiple sections with the same name require the new GNU as feature https://sourceware.org/bugzilla/show_bug.cgi?id=25380 (binutils 2.35).

For .text.exit.* .text.unlikely.* .text.hot.* .text.startup.*, the preferred section names are .text.exit. .text.unlikely. .text.hot. .text.startup.

The trailing dots avoid a linker problem described in https://reviews.llvm.org/D79600, pasted below for your convenience.

GNU ld's internal linker script uses (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=add44f8d5c5c05e08b11e033127a744d61c26aee)

    .text :
    {
      *(.text.unlikely .text.*_unlikely .text.unlikely.*)
      *(.text.exit .text.exit.*)
      *(.text.startup .text.startup.*)
      *(.text.hot .text.hot.*)
      *(SORT(.text.sorted.*))
      *(.text .stub .text.* .gnu.linkonce.t.*)
      /* .gnu.warning sections are handled specially by elf.em. */
      *(.gnu.warning)
    }

Because *(.text.exit .text.exit.*) is ordered before *(.text .text.*), in a -ffunction-sections build the C library function exit will be placed before other functions. gold's -z keep-text-section-prefix has the same problem.

In lld, -z keep-text-section-prefix recognizes .text.{exit,hot,startup,unlikely,unknown}.*, but not .text.{exit,hot,startup,unlikely,unknown}, to avoid the strange placement problem.
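The effect of the trailing dot can be checked with a small glob-matching sketch. It uses POSIX `fnmatch` as a stand-in for the linker's input-section pattern matching, which is an assumption made for illustration; the helper `matches` is hypothetical and not part of any linker.

```c
#include <assert.h>
#include <fnmatch.h>

/* Hypothetical helper: returns 1 if section name `sec` matches glob `pat`. */
static int matches(const char *pat, const char *sec)
{
    return fnmatch(pat, sec, 0) == 0;
}

static void demo(void)
{
    /* Section that -ffunction-sections creates for a C function named exit. */
    const char *function_exit = ".text.exit";
    /* Proposed compiler-generated section name with a trailing dot. */
    const char *special_exit = ".text.exit.";

    /* The rule *(.text.exit .text.exit.*) captures both names. */
    assert(matches(".text.exit", function_exit));
    assert(matches(".text.exit.*", special_exit));

    /* A rule that only has the .text.exit.* form leaves function exit alone,
       because the glob requires the dot after "exit". */
    assert(!matches(".text.exit.*", function_exit));
}
```

With the trailing-dot names, a linker only needs the `.text.exit.*` style patterns, so a user function that happens to be called `exit`, `hot` or `startup` is no longer pulled into the special groups.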
[Bug debug/95096] New: Feature request: add -fsplit-dwarf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95096

Bug ID: 95096
Summary: Feature request: add -fsplit-dwarf
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: debug
Assignee: unassigned at gcc dot gnu.org
Reporter: i at maskray dot me
Target Milestone: ---

-gsplit-dwarf has an undesired property: it sets the debug info level to 2. When plugged into a build system, this can enable debug info unnecessarily (when the user does not specify -g or specifies -g0).

-fsplit-dwarf would enable .dwo output but would not enable debug info by itself. Its interaction with -g1 may need some thought: whether line tables in .dwo will be beneficial. As a start, we can add the option first, which should be simple (for a beginner like me :/ )
[Bug debug/95096] Feature request: add -fsplit-dwarf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95096 --- Comment #1 from Fangrui Song --- Created https://sourceware.org/pipermail/gcc-patches/2020-May/545638.html
[Bug target/95095] Feature request: support -fno-unique-section-names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

--- Comment #1 from Fangrui Song ---
I just learned that `int main() {}` is placed in .text.startup at -O2 or -Os.

It seems that .text.startup. (with a trailing dot) would be the better name, so that a C function named `startup` is not moved accidentally (`startup.` is not a valid C identifier).
[Bug debug/95482] New: Feature request: add -gsplit-dwarf=single
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95482

Bug ID: 95482
Summary: Feature request: add -gsplit-dwarf=single
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: debug
Assignee: unassigned at gcc dot gnu.org
Reporter: i at maskray dot me
Target Milestone: ---

DWARF v5 Appendix F says

> The sections that do not require relocation, however, can be written to the
> relocatable object (.o) file but ignored by the linker, or they can be
> written to a separate DWARF object (.dwo) file that need not be accessed by
> the linker.

GCC/clang -gsplit-dwarf writes a separate DWARF object (.dwo) file.

clang in addition supports -gsplit-dwarf=single (https://reviews.llvm.org/D52296 ) to write the sections (with the SHF_EXCLUDE flag) in the .o file. Linkers ignore SHF_EXCLUDE sections in non -r mode.

Note, SHF_EXCLUDE (0x80000000) is in the range of processor-specific bits and clashes with several processors' (obsoleted?) flags (see https://sourceware.org/pipermail/binutils/2020-April/110691.html )
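The note about SHF_EXCLUDE living in the processor-specific bit range can be checked against the constants in the system `<elf.h>`. This is a small sketch that assumes a Linux-style `<elf.h>` providing `SHF_EXCLUDE`, `SHF_MASKPROC` and `SHF_ALLOC`; the helper `in_processor_specific_range` is hypothetical.

```c
#include <assert.h>
#include <elf.h>

/* Hypothetical helper: returns 1 if every bit of a non-zero section flag
   lies inside the processor-specific mask SHF_MASKPROC. */
static int in_processor_specific_range(unsigned long flag)
{
    return flag != 0 && (flag & SHF_MASKPROC) == flag;
}

static void demo(void)
{
    /* SHF_EXCLUDE sits in the bits reserved for processor-specific flags,
       which is why it can clash with per-processor definitions. */
    assert(in_processor_specific_range(SHF_EXCLUDE));

    /* A generic flag such as SHF_ALLOC does not. */
    assert(!in_processor_specific_range(SHF_ALLOC));
}
```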
[Bug gcov-profile/96092] New: Should --coverage respect -ffile-prefix-map?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96092

Bug ID: 96092
Summary: Should --coverage respect -ffile-prefix-map?
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: gcov-profile
Assignee: unassigned at gcc dot gnu.org
Reporter: i at maskray dot me
CC: marxin at gcc dot gnu.org
Target Milestone: ---

% gcc-10 -ffile-prefix-map=/tmp/c=/src --coverage -c -g /tmp/c/a.c

# -ffile-prefix-map implies -fdebug-prefix-map
% llvm-dwarfdump -debug-info a.o | grep /src
  DW_AT_name("/src/a.c")
  DW_AT_comp_dir("/src")
  DW_AT_decl_file ("/src/a.c")

# --coverage is not affected
% r2 -qc 'pxw `?v $s`' a.gcno
0x 0x67636e6f 0x42303065 0x27b4c272 0x0002 oncge00Br..'
0x0010 0x706d742f 0x632f 0x0001 0x0100 /tmp/c..
0x0020 0x000f 0x067072eb 0x40058857 0xdb5de9e8 .rp.W..@..].
0x0030 0x0002 0x6e69616d 0x 0x main
0x0040 0x0003 0x706d742f 0x612f632f 0x632e /tmp/c/a.c..
0x0050 0x0001 0x0005 0x0001 0x000c
0x0060 0x0141 0x0001 0x0004 0x0143 ..A...C.
0x0070 0x0003 0x 0x0002 0x0004
0x0080 0x0143 0x0003 0x0002 0x0003 ..C.
0x0090 0x0005 0x0143 0x0003 0x0003 ..C.
0x00a0 0x0001 0x0001 0x0145 0x0009 ..E.
0x00b0 0x0002 0x 0x0003 0x706d742f /tmp
0x00c0 0x612f632f 0x632e 0x0001 0x /c/a.c..
0x00d0 0x

I created this issue because I saw a clang-side proposal https://reviews.llvm.org/D83154 (add -fcoverage-prefix-map) today.
[Bug gcov-profile/96092] Should --coverage respect -ffile-prefix-map?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96092

--- Comment #3 from Fangrui Song ---
(In reply to Martin Liška from comment #2)
> Apparently we've got a patch in queue that does something similar:
>
> +fprofile-prefix-path=
> +Common Joined RejectNegative Var(profile_prefix_path)
> +remove prefix from absolute path before manging name for -fprofile-generate= and -fprofile-use=.

Can we generalize the option to -fprofile-prefix-map= and let it be part of -ffile-prefix-map? We can let the clang side add -fprofile-prefix-map= as well (https://reviews.llvm.org/D83154#2146085 ). clang may not support -fprofile-prefix-path= as it can be emulated by -fprofile-prefix-map=.

(IIUC, in GCC, -fprofile-generate uses gcov, so either -fprofile-prefix-map= or -fcoverage-prefix-map= will be an ok name. In clang, -fprofile-generate is an instrumentation different from --coverage (gcov).)
[Bug driver/93645] Support Clang 12 --ld-path=
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645 --- Comment #4 from Fangrui Song --- https://sourceware.org/pipermail/gcc-patches/2020-July/550659.html [PATCH v3] Add --ld-path= to specify an arbitrary executable as the linker I changed the title to --ld-path because -fuse-ld=/absolute/path/to/ld is not a good design. -fuse-ld= can mean the linker flavor (there can be option dispatch on this option) & --ld-path can specify the path overriding -fuse-ld='s default choice. -f* options are usually about code generation or language features. --ld-path does not belong to the category so -f is not very appropriate. Clang 12 will have --ld-path.
[Bug middle-end/192] String literals don't obey -fdata-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

Fangrui Song changed:

What|Removed |Added
CC||i at maskray dot me

--- Comment #19 from Fangrui Song ---
(In reply to Jakub Jelinek from comment #14)
> This doesn't really look like a good idea to me. Instead, perhaps ld's
> --gc-sections or new special option should just remove unused string
> literals from mergeable sections.
> With your patch, I bet you lose e.g. all tail merging. Consider:
> const char *used1 () { return "foo bar baz blah blah"; }
> in one TU and
> const char *used2 () { return "bar baz blah blah"; }
> in another. The linker necessarily knows which strings (or other data) in
> mergeable sections are used and which are unused.

I second Jakub's idea that the linker should perform the constant merge (which is implemented in LLD): the cost of a section header (sizeof(Elf64_Shdr)=64) plus a section name (".rodata.xxx.str1.1") is quite large.

Created a GNU ld (and gold) feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=26622
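The tail merging that Jakub's example relies on can be illustrated with a suffix check. This is only a sketch of the property a linker can exploit, not how GNU ld or LLD implement string merging; the helper `can_tail_merge` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `shorter` is a suffix of `longer`.
   Since both are NUL-terminated, a linker could then make `shorter` point
   into the tail of `longer` instead of storing a second copy. */
static int can_tail_merge(const char *longer, const char *shorter)
{
    size_t n = strlen(longer);
    size_t m = strlen(shorter);
    return m <= n && memcmp(longer + (n - m), shorter, m) == 0;
}

static void demo(void)
{
    /* The two literals from comment #14, living in different TUs. */
    assert(can_tail_merge("foo bar baz blah blah", "bar baz blah blah"));

    /* A prefix is not enough: the terminating NUL would not line up. */
    assert(!can_tail_merge("foo bar baz blah blah", "foo bar"));
}
```

Putting every literal into its own section would hide this relationship from a section-granular merge, which is the loss comment #14 warns about.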
[Bug gcov-profile/97062] New: [gcov] Don't repeat display of inline functions in headers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97062 Bug ID: 97062 Summary: [gcov] Don't repeat display of inline functions in headers Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: gcov-profile Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me CC: marxin at gcc dot gnu.org Target Milestone: --- This is a minor display issue. >a.cc cat<b.cc cat<a.h cat<
[Bug gcov-profile/91601] gcov: ICE in handle_cycle, at gcov.c:699 happen which get code coverage with lcov.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91601

Fangrui Song changed:

What|Removed |Added
CC||i at maskray dot me

--- Comment #17 from Fangrui Song ---
The algorithm is Donald B. Johnson's "Finding all the elementary circuits of a directed graph" (1975). (Hawick and James's version implements the same algorithm with a different graph representation.)

I am wondering why we enumerate every elementary cycle, find the minimum edge, reduce the edge weights, and repeat the process. What do we lose if we don't use the costly algorithm? (The time complexity is O(n*e*(c+1)). However, many implementations (Boost and gcov.c) do not use a hash set for the blocked list, and thus I suspect the actual complexity is higher.)

Do we have other low-cost approaches? (e.g. repeatedly finding strongly connected components and reducing)
[Bug gcov-profile/97065] New: Support -fprofile-update=set (boolean counters)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97065

Bug ID: 97065
Summary: Support -fprofile-update=set (boolean counters)
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: gcov-profile
Assignee: unassigned at gcc dot gnu.org
Reporter: i at maskray dot me
CC: marxin at gcc dot gnu.org
Target Milestone: ---

I understand that defaulting to -fprofile-update=prefer-atomic in GCC 7, and using atomic counters when -pthread is specified, has a very good reason: imprecise line execution counts can be very confusing. However, atomic counters can lead to drastic performance degradation when contention is high (e.g. bug 80952).

Sometimes users just need to know whether a statement is executed or not. For example, lcov does not really need to know the number. A boolean mode -fprofile-update=set may be useful. 'set' is the name used by Go -covermode=set.
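The difference between the two counter styles can be sketched with C11 atomics. This is an illustration of the idea only, not what GCC emits for any -fprofile-update mode; the names `exec_count`, `executed`, `hit_atomic` and `hit_set` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_ulong exec_count; /* models an atomic execution counter */
static atomic_bool executed;    /* models a "set" style boolean counter */

/* Atomic counting: every execution performs a read-modify-write on shared
   memory, which is where contention comes from. */
static void hit_atomic(void)
{
    atomic_fetch_add_explicit(&exec_count, 1, memory_order_relaxed);
}

/* Boolean coverage: after the first execution the location is only read,
   so hot code stops writing to the shared cache line. */
static void hit_set(void)
{
    if (!atomic_load_explicit(&executed, memory_order_relaxed))
        atomic_store_explicit(&executed, true, memory_order_relaxed);
}

static void demo(void)
{
    for (int i = 0; i < 1000; i++) {
        hit_atomic();
        hit_set();
    }
    assert(atomic_load(&exec_count) == 1000);
    assert(atomic_load(&executed));
}
```

The boolean variant loses the execution count but still answers the "was this statement executed?" question that a line-coverage report needs.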
[Bug gcov-profile/85351] [GCOV] Wrong coverage with exit() executed in a if statement within a called function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85351

Fangrui Song changed:

What|Removed |Added
CC||i at maskray dot me

--- Comment #5 from Fangrui Song ---
I am a bit curious how GCC instruments functions which may alter control flow:

* exit/execve/execl/etc
* fork
* functions which may throw or call any of the above functions

If you force a basic block split after such a call, you get correct counts, but you pay the cost of one more basic block and two more arcs. With -fprofile-arcs you need to pay the instrumentation cost of one arc (after taking into account the spanning tree optimization based on Kirchhoff's circuit law).

If you assume every external function call may alter control flow, you pay a rather large overhead for things you probably care little about (since I know some of the underlying mechanism, I don't trust line counts after special functions).
[Bug driver/93645] Support Clang 12 --ld-path=
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645 --- Comment #5 from Fangrui Song --- Ping
[Bug c++/92413] New: [temp.explicit] Explicit template instantiations should not define member functions that are not defined at the point of instantiation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92413

Bug ID: 92413
Summary: [temp.explicit] Explicit template instantiations should not define member functions that are not defined at the point of instantiation
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: i at maskray dot me
Target Milestone: ---

https://wg21.cmeerw.net/cwg/issue546

Change 17.8.2 [temp.explicit] paragraph 8 as follows (the words "whose definition is visible" are replaced by "that have been defined"):

An explicit instantiation definition that names a class template specialization explicitly instantiates the class template specialization and is only an explicit instantiation definition of members that have been defined at the point of instantiation.

    template struct C {void foo();};
    template struct C;
    template void C::foo() {}

GCC<4.9 does not define C::foo(), while GCC>=4.9 defines C::foo().

I am not sure whether this example is non-conforming, but -Wall -Wextra -pedantic gives no diagnostic.

(clang 3.0~trunk does not define C::foo(). You may read the discussions at https://bugs.llvm.org/show_bug.cgi?id=43937)
[Bug c/93194] New: -fpatchable-function-entries : __patchable_function_entries has wrong sh_flags and sh_addralign
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93194

Bug ID: 93194
Summary: -fpatchable-function-entries : __patchable_function_entries has wrong sh_flags and sh_addralign
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: i at maskray dot me
Target Milestone: ---

% cat a.c
void f(){}
% gcc -fpatchable-function-entry=3 -c a.c
% readelf -S a.o
...
  [Nr] Name                              Type     Address Off    Size ES Flg Lk Inf Al
  [ 0]                                   NULL             00     00   00      0   0  0
  [ 1] .text                             PROGBITS         40     0a   00  AX  0   0  1
  [ 2] .data                             PROGBITS         4a     00   00  WA  0   0  1
  [ 3] .bss                              NOBITS           4a     00   00  WA  0   0  1
  [ 4] __patchable_function_entries      PROGBITS         4a     08   00   A  0   0  1
  [ 5] .rela__patchable_function_entries RELA             0001a0 18   18   I 10   4  8

sh_addralign of __patchable_function_entries should be 8 on ELF64 platforms and 4 on ELF32 platforms, instead of 1.

__patchable_function_entries should have the SHF_WRITE flag. A __patchable_function_entries entry is relocated by a symbolic relocation (e.g. R_X86_64_64, R_AARCH64_ABS64, R_PPC64_ADDR64). In -shared or -pie mode, the linker will create a dynamic relocation:

* non-preemptible (STB_LOCAL / non-STV_DEFAULT / -Bsymbolic / not-shared / --dynamic-list excluded / etc): relative relocation (e.g. R_X86_64_RELATIVE)
* preemptible: symbolic relocation (e.g. R_X86_64_64)

(We can't emit an offset relative to the image base (.quad .Lfoo - .Lbase), because differences across sections are generally not representable. A symbolic relocation gives the runtime code information about the symbol names, which may be desirable.)
[Bug middle-end/93194] -fpatchable-function-entries : __patchable_function_entries has wrong sh_flags and sh_addralign
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93194 --- Comment #1 from Fangrui Song --- The SHF_WRITE issue has been fixed. https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00271.html will fix sh_addralign
[Bug middle-end/93195] New: -fpatchable-function-entries : __patchable_function_entries should consider comdat groups
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195

Bug ID: 93195
Summary: -fpatchable-function-entries : __patchable_function_entries should consider comdat groups
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: i at maskray dot me
Target Milestone: ---

% cat a.cc
inline void foo() {}
void bar() { foo(); }
% cat b.cc
inline void foo() {}
void bar1() { foo(); }
% g++ -fpatchable-function-entry=1 -c a.cc b.cc

Linkers don't allow a relocation to a discarded symbol (foo).

% ld.bfd a.o b.o
...
`.text._Z3foov' referenced in section `__patchable_function_entries' of b.o: defined in discarded section `.text._Z3foov[_Z3foov]' of b.o

% gold a.o b.o
b.o(__patchable_function_entries+0x0): error: relocation refers to local symbol "" [5], which is defined in a discarded section
  section group signature: "_Z3foov"
  prevailing definition is from a.o

% ld.lld a.o b.o
ld.lld: error: relocation refers to a discarded section: .text._Z3foov
>>> defined in b.o
>>> referenced by b.cc
>>> b.o:(__patchable_function_entries+0x0)
[Bug middle-end/93197] New: -fpatchable-function-entries : __patchable_function_entries does not survive under --gc-sections
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93197

Bug ID: 93197
Summary: -fpatchable-function-entries : __patchable_function_entries does not survive under --gc-sections
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: i at maskray dot me
Target Milestone: ---

__patchable_function_entries is not a GC root and is not referenced by a retained section. It will thus be garbage collected.

The only solution I can think of requires fixes to both GCC and GNU ld.

* GNU ld: implement interaction between SHF_LINK_ORDER and --gc-sections https://sourceware.org/bugzilla/show_bug.cgi?id=24526
* GCC: create one __patchable_function_entries section for each function. For each function `foo`,
  + If foo needs to be placed in a comdat group, place its __patchable_function_entries section in the comdat group
  + Otherwise, set the SHF_LINK_ORDER flag of its __patchable_function_entries section and set its sh_link to reference the section containing `foo`
[Bug target/92424] [aarch64] Broken code with -fpatchable-function-entry and BTI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424

Fangrui Song changed:

What|Removed |Added
CC||i at maskray dot me

--- Comment #8 from Fangrui Song ---
Where shall we place .cfi_startproc?

Clang HEAD (and clang 10)'s placement is:

    foo:
      .loc 1 3 0        # line number
      .cfi_startproc    # CFI
      bti c
    .Lpatch0:           # __patchable_function_entries label
      nop

Not placing .cfi_startproc at the function entry, as is currently the case in GCC, will make addr2line on the function entry address print ??:0

For M>0, clang does not attach line number information for NOPs before the function entry label.
[Bug target/93492] Broken code with -fpatchable-function-entry and -fcf-protection=full
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93492

Fangrui Song changed:

What|Removed |Added
CC||i at maskray dot me

--- Comment #2 from Fangrui Song ---
On Clang's side. After https://reviews.llvm.org/D73760 , with

    clang -target x86_64 -fpatchable-function-entry=2,0 -fcf-protection=branch -S a.c -g

.cfi_startproc is placed at the function entry, so that NOPs after the function entry are in the CFI region. The .loc directive is handled similarly. The idea is that addr2line at the function address should show the correct filename and line, instead of ??:0.

    foo:                    # @foo
    .Lfoo$local:
    .Lfunc_begin0:
      .file 1 "/tmp/c" "a.c"
      .loc 1 3 0            # a.c:3:0
      .cfi_startproc
    # %bb.0:                # %entry
      endbr64
    .Lpatch0:
      xchgw %ax, %ax
      ...
      .section __patchable_function_entries,"awo",@progbits,foo,unique,0
      .p2align 3
      .quad .Lpatch0

The section flag "o" and the linkage "unique" (LLVM assembly extensions) are used to fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93197 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195 . I have filed GNU as feature requests (https://sourceware.org/bugzilla/show_bug.cgi?id=25380 https://sourceware.org/bugzilla/show_bug.cgi?id=25381). GNU ld needs the required garbage collection semantics (https://sourceware.org/ml/binutils/2019-11/msg00266.html).
[Bug target/93492] Broken code with -fpatchable-function-entry and -fcf-protection=full
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93492

--- Comment #7 from Fangrui Song ---
> Is -fasynchronous-unwind-tables compatible with -fpatchable-function-entry?

Apparently the Linux kernel does not care about it. To make it usable in userspace, we should place .cfi_startproc in a reasonable place. (A more concerning issue is that __patchable_function_entries can be stripped by -Wl,--gc-sections, as the bug I linked above describes.)

Interaction with -g1 (line table):

% clang -g -fpatchable-function-entry=2 a.c -o a   # latest clang
% addr2line -e a 0x$(nm a | awk '/ main/{print $1}')
/tmp/c/a.c:1

% gcc -g -fpatchable-function-entry=2 a.c -o a
% addr2line -e a 0x$(nm a | awk '/ main/{print $1}')
??:?

For M>0, I think it is fine to leave NOPs before the function entry uncovered by line table information. The clang -fpatchable-function-entry=2,1 layout is the same as #c2, except for a NOP above foo:

% clang -g -fpatchable-function-entry=2,1 a.c -o a   # or gcc -g -fpatchable-function-entry=2,1 a.c -o a
% addr2line -e a $(nm a | ruby -ane 'print ($F[0].to_i(16)-1).to_s(16) if / main/')
crtstuff.c:?
[Bug middle-end/93195] -fpatchable-function-entries : __patchable_function_entries should consider comdat groups
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195

--- Comment #1 from Fangrui Song ---
This is similar to --gc-sections (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93536) but a bit different. The only reasonable fix I can think of is to place __patchable_function_entries in the same section group.

The ELF spec says:

> A symbol table entry with STB_LOCAL binding that is defined relative to one
> of a group's sections, and that is contained in a symbol table section that
> is not part of the group, must be discarded if the group members are
> discarded. References to this symbol table entry from outside the group are
> not allowed.

Both GCC and clang reference a .L local symbol in __patchable_function_entries. The __patchable_function_entries section must be discarded when the associated text section is discarded.

We don't want __patchable_function_entries.foo __patchable_function_entries.bar because that can waste lots of bytes in .shstrtab.

clang -fpatchable-function-entry=2 -S a.cc b.cc

    # COMDAT and SHF_LINK_ORDER are used at the same time
      .section __patchable_function_entries,"awo",@progbits,_Z3barv,unique,0
      .p2align 3
      .quad .Lfunc_begin0

      .section __patchable_function_entries,"aGwo",@progbits,_Z3foov,comdat,_Z3foov,unique,1
      .p2align 3
      .quad .Lfunc_begin1

GNU as and ld don't have these features yet. So when -no-integrated-as is specified (the output is expected to be consumable by GNU as):

clang -fpatchable-function-entry=2 -no-integrated-as -S a.cc b.cc

    ## The assembler will combine sections with the same name.
    ## If either .Lfunc_begin0 or .Lfunc_begin1 is discarded, the linker will report an error.
      .section __patchable_function_entries,"aw",@progbits
      .p2align 3
      .quad .Lfunc_begin0

      .section __patchable_function_entries,"aw",@progbits
      .p2align 3
      .quad .Lfunc_begin1
[Bug target/93492] Broken code with -fpatchable-function-entry and -fcf-protection=full
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93492

--- Comment #11 from Fangrui Song ---
(In reply to H.J. Lu from comment #8)
> Created attachment 47762 [details]
> A patch to handle targetm.asm_out.post_cfi_startproc

I don't work on GCC, so I am hoping other x86 maintainers can review. (I know close to zero about its build system. "How to work on GCC" is not well documented. I can play with stage1-gcc/xgcc -B stage1-gcc -fsyntax-only /tmp/c/a.c but I don't even know how to build stage1 only.)

For tests, I think at least 3 configurations should be tested:

    -fpatchable-function-entry=0 -fcf-protection=branch
    -fpatchable-function-entry=1 -fcf-protection=branch
    -fpatchable-function-entry=2,1 -fcf-protection=branch

I am a bit concerned about the introduction of cfi_startproc_emitted.

My idea is that NOPs after the function entry label should really be an arch-specific feature. It should be implemented like a pass beside make_pass_insert_endbranch. We build the function body, then prepend NOPs, then prepend endbr64. That may be cleaner.
[Bug driver/93645] New: Support -fuse-ld=/absolute/path/to/ld
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645 Bug ID: 93645 Summary: Support -fuse-ld=/absolute/path/to/ld Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- This feature request generalizes -fuse-ld=bfd -fuse-ld=gold https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55470 and -fuse-ld=lld clang -fuse-ld= also supports the following forms: -fuse-ld=/path/to/binutils-gdb/Debug/ld/ld-new -fuse-ld=/path/to/ld.lld -fuse-ld=/usr/bin/ld.lld-9
[Bug driver/93645] Support -fuse-ld=/absolute/path/to/ld
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645 --- Comment #1 from Fangrui Song --- Posted a patch https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00510.html I agree with https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59321#c4 we should use a new option, instead of overloading --print-prog-name=ld for a different meaning gcc --print-prog-name=ld -fuse-ld=bfd => ld.bfd
[Bug driver/52982] add option to select particular linker
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52982 Fangrui Song changed: What|Removed |Added CC||i at maskray dot me --- Comment #4 from Fangrui Song --- I posted a patch https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00510.html to make -fuse-ld=linker generic (absolute path or ld.linker)
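To make the "absolute path or ld.linker" rule concrete, here is a minimal sketch in C. It is not the code from the posted patch: the helper name resolve_fuse_ld and its buffer-based interface are hypothetical, and the only rule it encodes is the one stated above (an argument starting with '/' is used verbatim, anything else becomes ld.<arg>).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT GCC's driver code: map the argument of
   -fuse-ld= to the linker program name the driver would look for.
   Returns 0 on success, -1 if the result does not fit in OUT.  */
int resolve_fuse_ld(const char *arg, char *out, size_t out_size)
{
    assert(arg != NULL && out != NULL);

    int absolute = (arg[0] == '/');
    /* "ld." adds 3 characters; +1 for the terminating NUL.  */
    size_t need = strlen(arg) + (absolute ? 0 : 3) + 1;
    if (need > out_size)
        return -1;

    if (absolute)
        snprintf(out, out_size, "%s", arg);      /* -fuse-ld=/usr/bin/ld.lld-9 */
    else
        snprintf(out, out_size, "ld.%s", arg);   /* -fuse-ld=bfd => ld.bfd */
    return 0;
}
```

With this rule -fuse-ld=bfd, -fuse-ld=gold and -fuse-ld=lld keep their current meaning, and no per-linker special case is needed in the driver.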
[Bug c/99587] New: warning: ‘retain’ attribute ignored while __has_attribute(retain) is true
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99587 Bug ID: 99587 Summary: warning: ‘retain’ attribute ignored while __has_attribute(retain) is true Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- If configure-time ld does not support SHF_GNU_RETAIN, __has_attribute(retain) may be true while using it will cause a warning. % cat x.c #if defined(__has_attribute) && __has_attribute(retain) __attribute__((used, retain)) int a; #endif % ~/Dev/gcc/out/release/gcc/xgcc -B ~/Dev/gcc/out/release/gcc -c x.c x.c:1:1: warning: ‘retain’ attribute ignored [-Wattributes] 1 | __attribute__((used, retain)) int a; | ^ % ~/Dev/gcc/out/release/gcc/xgcc --version xgcc (GCC) 11.0.1 20210313 (experimental) ... __has_attribute(retain) should return 0 in this case.
[Bug c/99587] warning: ‘retain’ attribute ignored while __has_attribute(retain) is 1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99587 --- Comment #6 from Fangrui Song --- (In reply to Jakub Jelinek from comment #5) > (In reply to Florian Weimer from comment #4) > > For retain, something along these lines might work: > > > > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c > > index c1f652d1dc9..cdae464ab8a 100644 > > --- a/gcc/c-family/c-attribs.c > > +++ b/gcc/c-family/c-attribs.c > > @@ -329,8 +329,10 @@ const struct attribute_spec c_common_attribute_table[] > > = > > handle_used_attribute, NULL }, > >{ "unused", 0, 0, false, false, false, false, > > handle_unused_attribute, NULL }, > > +#if SUPPORTS_SHF_GNU_RETAIN > >{ "retain", 0, 0, true, false, false, false, > > handle_retain_attribute, NULL }, > > +#endif > >{ "externally_visible", 0, 0, true, false, false, false, > > handle_externally_visible_attribute, NULL }, > >{ "no_reorder",0, 0, true, false, false, false, > > > > In other cases, it's more difficult because those are subtarget-dependent. > > Doing the above would "fix" __has_attribute, but on the other side would mean > the compiler would not know how many and what kind of operands the attribute > has, whether it is for function declarations, other declarations, types or > what > etc., so for invalid code it would have inconsistent diagnostics. Are you willing to properly fix it? :) I implemented the attribute on clang (https://reviews.llvm.org/D97447). __has_attribute(retain) is always 1 and there is no ignored diagnostic, regardless of the target (even if non-ELF), and __has_attribute(retain) works in assembly mode as well. This is intentional so that: with bleeding-edge toolchain, non-ELF targets don't need macros to decide whether 'retain' should be added. 
Ultimately, I want the glibc static linking problem with ld -z start-stop-gc fixed https://sourceware.org/pipermail/libc-alpha/2021-March/123833.html (glibc has -Wattributes, so __has_attribute(retain)=1 && "warning: ‘retain’ attribute ignored" can cause some inconvenience.) And I hope eventually ld -z start-stop-gc can be the default.
[Bug libgcc/99759] New: morestack.S should support .init_array.0 besides .ctors.65535
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99759 Bug ID: 99759 Summary: morestack.S should support .init_array.0 besides .ctors.65535 Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libgcc Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- to drop reliance on ld's default linker script .init_array: { PROVIDE_HIDDEN (__init_array_start = .); KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) PROVIDE_HIDDEN (__init_array_end = .); } The input section description is quite close but does not sort .init_array.* and .ctors.* with the same priority together.
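For readers wondering why .ctors.65535 pairs with .init_array.0: as far as I understand the GNU ld documentation of SORT_BY_INIT_PRIORITY, the number in .init_array.N is the init priority itself, while the number in .ctors.N is 65535 minus the init priority. A small sketch of that conversion follows; the function name is made up for illustration.

```c
#include <assert.h>

enum { MAX_INIT_PRIORITY = 65535 };

/* Illustrative helper (hypothetical name): convert the numeric suffix
   of a .ctors.N section into the suffix of the equivalent
   .init_array.N section.  The mapping is its own inverse.  */
unsigned ctors_to_init_array(unsigned ctors_suffix)
{
    assert(ctors_suffix <= MAX_INIT_PRIORITY);
    return MAX_INIT_PRIORITY - ctors_suffix;
}
```

So the .ctors.65535 used by morestack.S corresponds to priority 0, i.e. .init_array.0 as in the summary.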
[Bug target/99836] New: aarch64: -fpatchable-function-entry=N[,0] should place .cfi_startproc before NOPs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99836 Bug ID: 99836 Summary: aarch64: -fpatchable-function-entry=N[,0] should place .cfi_startproc before NOPs Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: ---

Extracted from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424#c8

% echo 'int main() {}' > a.c
% clang --target=aarch64 -fpatchable-function-entry=2 -mbranch-protection=standard -S a.c -o -
...
main:                                   // @main
.Lfunc_begin0:
        .cfi_startproc
// %bb.0:                               // %entry
        hint #34
.Lpatch0:
        nop
        nop

% /tmp/glibc-many/install/compilers/aarch64-linux-gnu/bin/aarch64-glibc-linux-gnu-g++ -fpatchable-function-entry=2 -mbranch-protection=standard -S a.c -o -
        .arch armv8-a
        .file "a.c"
        .text
        .align 2
        .global main
        .type main, %function
main:
        hint 34 // bti c
        .section __patchable_function_entries,"aw",@progbits
        .align 3
        .8byte .LPFE1
        .text
.LPFE1:
        nop
        nop
.LFB0:
        .cfi_startproc

For -fpatchable-function-entry=N[,0], placing .cfi_startproc before NOPs makes more sense and can make unwinding work in that region. For N[,M] where M>0, that is a very narrow use case by the Linux kernel. I prefer not to place .cfi_startproc above the function label.
[Bug gcov-profile/97507] New: Move __gcov_exit from per-object .fini_array.00100 to libgcov
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97507 Bug ID: 97507 Summary: Move __gcov_exit from per-object .fini_array.00100 to libgcov Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: gcov-profile Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me CC: marxin at gcc dot gnu.org Target Milestone: --- Per object file .fini_array.00100 wastes space. __gcov_exit can be called in libgcov. It can be registered via atexit (if first run) in __gcov_init. The Linux kernel does not call destructors and currently discards .fini_array and .fini_array.* `gcc -fprofile-arcs` is currently one reason that .fini_array needs to be discarded (another reason is kasan. I don't know other reasons)
[Bug tree-optimization/66512] PRE fails to optimize calls to pure functions in C++, ok in C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66512 Fangrui Song changed: What|Removed |Added CC||i at maskray dot me --- Comment #4 from Fangrui Song --- Should this be reopened? https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html 'const' is not clarified on its interaction with threads (https://gcc.gnu.org/legacy-ml/gcc/2015-09/msg00365.html) and void f() { for (;;) g(p()); } is still pessimized for C++ (I tend to agree that 'const' should imply 'nothrow'; even if no, the #c2 case should be resolved properly)
[Bug target/98063] New: Emit R_X86_64_GOTOFF64 instead of R_X86_64_GOTPCRELX for -mcmodel=large -fno-plt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98063 Bug ID: 98063 Summary: Emit R_X86_64_GOTOFF64 instead of R_X86_64_GOTPCRELX for -mcmodel=large -fno-plt Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: ---

% cat a.c
#include <stdio.h>
int main() { puts("meow"); }
% gcc -mcmodel=large -fno-plt -O1 -S a.c -fpic -o - -O2 -fno-asynchronous-unwind-tables
...
main:
.L2:
        movabsq $_GLOBAL_OFFSET_TABLE_-.L2, %r11
        subq $8, %rsp
        leaq .L2(%rip), %rax
        movabsq $.LC0@GOTOFF, %rdx
        addq %r11, %rax
        leaq (%rax,%rdx), %rdi
        call *puts@GOTPCREL(%rip)
        xorl %eax, %eax
        addq $8, %rsp
        ret

The distance between the GOT entry and the instruction following the call may not fit in 32 bits, so an R_X86_64_GOTPCRELX relocation cannot be used.
[Bug c/98112] New: Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112 Bug ID: 98112 Summary: Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: ---

After "x86-64: Optimize access to globals in PIE with copy reloc", GCC x86-64 asks the assembler to produce an R_X86_64_PC32 for an external data access.

* It introduced a configure-time variable HAVE_LD_PIE_COPYRELOC which has a misleading name: PC32 does not necessarily cause a copy relocation. If the external data
* It affects users who want to configure GCC not to emit R_X86_64_PC32 for an external data access so that copy relocations can be avoided if the data turns out to be defined in a different shared object/executable
* While it made sense (in terms of performance) before H.J. Lu added GOTPCRELX to x86-64, it hardly matters, if at all, nowadays.
* This optimization can actually benefit non-x86-64. An option is more suitable.

In Clang, the GCC style HAVE_LD_PIE_COPYRELOC is implemented as -mpie-copy-relocations, which has a misleading name. I agree that this should be implemented as an option, instead of a configure-time variable.

I suggest that we add a new architecture-independent option -f[no-]direct-access-external-data (I am happy to add a similar one in Clang once consensus is reached) and delete HAVE_LD_PIE_COPYRELOC. The option controls whether a direct access (PC-relative relocation) can be generated for an external data access. The value can default to true for -fno-pic code (it seems that most architectures behave this way). For non-x86-64, the value defaults to false for -fpie/-fpic code (I believe most architectures use a GOT).
In the future, for x86-64, please consider defaulting to -fno-direct-access-external-data for -fpie/-fpic so that issues related to STV_PROTECTED data can be properly fixed (see my analysis last year https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html )
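The proposed defaults can be written down as a tiny decision function. This is only a sketch of the rules described in this report, not code from GCC or Clang: all names are hypothetical, only -fno-pic and -fpie are modelled, and "x86-64 -fpie is direct by default" reflects today's HAVE_LD_PIE_COPYRELOC behavior as I described it.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical and exist only for illustration.  */
enum pic_mode { MODE_NO_PIC, MODE_PIE };
enum tristate { OPT_UNSET, OPT_ON, OPT_OFF };  /* -f[no-]direct-access-external-data */
enum access   { ACCESS_DIRECT, ACCESS_GOT };

/* How is an undefined data symbol (extern int var;) accessed?  */
enum access external_data_access(enum pic_mode mode, bool is_x86_64,
                                 enum tristate opt)
{
    assert(mode == MODE_NO_PIC || mode == MODE_PIE);

    bool direct;
    if (opt != OPT_UNSET)
        direct = (opt == OPT_ON);   /* an explicit option always wins */
    else if (mode == MODE_NO_PIC)
        direct = true;              /* absolute/PC-relative relocation */
    else
        direct = is_x86_64;         /* -fpie: PC32 on x86-64 today, GOT elsewhere */

    return direct ? ACCESS_DIRECT : ACCESS_GOT;
}
```

Changing the x86-64 -fpie default in the future would then be a one-line change in the last branch, which is the point of having an option instead of a configure-time variable.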
[Bug target/98112] Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112 --- Comment #2 from Fangrui Song ---
Note: -fdirect-access-external-data is architecture-independent. For example, currently Clang on aarch64 can perform the following optimization:

  // clang -target aarch64 -fPIE -O3
  adrp x8, :got:var
  ldr x8, [x8, :got_lo12:var]
  ldr w0, [x8]
  ret

  // clang -target aarch64 -fPIE -O3 -mpie-copy-relocations
  adrp x8, var
  ldr w0, [x8, :lo12:var]
  ret

A better name for -mpie-copy-relocations is -fdirect-access-external-data:

1. the option can affect -fno-pic and -fpic
2. for -no-pie and -pie links, there is not necessarily a copy relocation

(-fpic can use this option as well, but keep in mind that DSOs do not support copy relocations. So if such code is used for -shared links and the data turns out to be undefined, the linker will reject the object file)

---

The second thing about the feature request is that x86-64 should default to -fno-direct-access-external-data for -fpie to address the protected symbol issues. (-fno-direct-access-external-data for -fpie is the behavior on most architectures.)

(1) PC32 referencing a protected function is unnecessarily rejected in a -shared link (this also affects aarch64)

  // gcc -fpic -fuse-ld=bfd -shared -fvisibility=protected b.c => relocation R_X86_64_PC32 against protected symbol `f' can not be used when making a shared object
  // aarch64-linux-gnu-gcc -fpic -fuse-ld=bfd -shared -fvisibility=protected b.c => relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `f' which may bind externally can not be used when making a shared object; recompile with -fPIC
  // gold is good
  void f() {}
  void *g() { return &f; }

This can be fixed by making GNU ld more permissive.

(2) Protected data access can use the slightly more efficient PC32. Currently it uses the slightly pessimized REX_GOTPCRELX.

  int a __attribute__((visibility("protected")));
  int f() { return a; }
[Bug gcov-profile/98257] New: Replace Donald B. Johnson's cycle enumeration with iterative loop finding
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98257 Bug ID: 98257 Summary: Replace Donald B. Johnson's cycle enumeration with iterative loop finding Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: gcov-profile Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me CC: marxin at gcc dot gnu.org Target Milestone: ---

gcov used _J. C. Tiernan, An Efficient Search Algorithm to Find the Elementary Circuits of a Graph, Comm ACM 1970_. The worst-case time bound is exponential in the number of elementary circuits. It enumerated cycles (aka simple circuit, aka elementary circuit) and performed cycle cancelling.

In 2016, the resolution to PR67992 switched to Donald B. Johnson's algorithm to improve performance. The theoretical time complexity is $O((V+E)(c+1))$ where $c$ is the number of cycles, which is exponential in the size of the graph. (Boost attributed the algorithm to K. A. Hawick and H. A. James, and gcov inherited this name. However, that paper did not improve Johnson's algorithm.)

Actually every step of cycle cancelling decreases the count of at least one arc to 0, so there are at most $O(E)$ cycles. The resolution to PR90380 skipped non-positive arcs and decreased the time complexity to $O(V*E^2)$ (in theory it could be $O(E^2)$ but the implementation has a linear scan).

This is all unnecessary. We can just iteratively find cycles (using the classical tri-color DFS) and perform cycle cancelling. There are at most $O(E)$ cycles and the overall time complexity is $O(E^2)$.

( We are processing a reducible flow graph (there is no intuitive cycle count for an irreducible flow graph). Every natural loop is identified by a back edge. By constructing a dominator tree, finding back edges, identifying natural loops and clearing the arc counters (we will compute incoming counts so we clear counters to prevent duplicates), the time complexity can be decreased to $O(depthOfNestedLoops*E)$. 
In practice, the semi-NCA algorithm (time complexity: $O(V^2)$, but considered faster than the almost linear Lengauer-Tarjan's algorithm) is not difficult to implement, but identifying natural loops is troublesome. So the method is not useful.)
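A minimal sketch of the iterative approach (tri-color DFS plus cycle cancelling) is below. It is not gcov's code: the graph representation, the fixed array sizes and all names are assumptions made for illustration. To keep it short the DFS scans the whole arc array for each vertex, so one DFS costs O(V*E) here; with adjacency lists it would be O(V+E), giving the O(E^2) total mentioned above.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_V = 64, MAX_E = 256 };
enum color { WHITE, GRAY, BLACK };

struct arc { int src, dst; int64_t count; };
struct graph { int num_v, num_e; struct arc arcs[MAX_E]; };

static enum color state[MAX_V];
static int in_arc[MAX_V];   /* arc used to enter a vertex on the DFS path */

/* Tri-color DFS over arcs with a positive count.  Returns the index of
   a back arc (one that points at a GRAY vertex), or -1 if none.  */
static int dfs(const struct graph *g, int u)
{
    state[u] = GRAY;
    for (int i = 0; i < g->num_e; i++) {
        const struct arc *a = &g->arcs[i];
        if (a->src != u || a->count <= 0)
            continue;
        if (state[a->dst] == GRAY)
            return i;                       /* closes a cycle */
        if (state[a->dst] == WHITE) {
            in_arc[a->dst] = i;
            int r = dfs(g, a->dst);
            if (r >= 0)
                return r;
        }
    }
    state[u] = BLACK;
    return -1;
}

/* Repeatedly find a cycle, subtract its minimum arc count from every
   arc on it, and accumulate that minimum.  Each round zeroes at least
   one arc, so there are at most num_e rounds.  */
int64_t cancel_cycles(struct graph *g)
{
    assert(g->num_v <= MAX_V && g->num_e <= MAX_E);
    int64_t total = 0;
    for (;;) {
        int back = -1;
        for (int v = 0; v < g->num_v; v++)
            state[v] = WHITE;
        for (int v = 0; v < g->num_v && back < 0; v++)
            if (state[v] == WHITE)
                back = dfs(g, v);
        if (back < 0)
            return total;

        /* The cycle is the back arc plus the DFS path head -> ... -> src.  */
        int head = g->arcs[back].dst;
        int64_t m = g->arcs[back].count;
        for (int x = g->arcs[back].src; x != head; x = g->arcs[in_arc[x]].src)
            if (g->arcs[in_arc[x]].count < m)
                m = g->arcs[in_arc[x]].count;

        g->arcs[back].count -= m;
        for (int x = g->arcs[back].src; x != head; x = g->arcs[in_arc[x]].src)
            g->arcs[in_arc[x]].count -= m;
        total += m;
    }
}
```

Skipping arcs whose count is not positive plays the same role as the PR90380 change: cancelled arcs can never be part of a later cycle.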
[Bug target/97827] bootstrap error building the amdgcn-amdhsa offload compiler with LLVM 11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97827 Fangrui Song changed: What|Removed |Added CC||i at maskray dot me --- Comment #9 from Fangrui Song --- I want to know whether this is really a wontfix on GCC's side. Richard Sandiford on https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559572.html "I'm not saying we should bend over backwards to support difficult quirks. But here we're talking about a choice between (a) doing something that works “everywhere” unconditionally (and keeping things simple) vs. (b) having both code that takes a shortcut and code that doesn't take a shortcut and trying to predict which one we should do." This makes a lot of sense to me. For the LLVM "fix", we had not known this PR before https://reviews.llvm.org/D92052#2452577 To me personally, I might have a different opinion if I knew this is not an entire dead end on gcc -S output.
[Bug target/97827] bootstrap error building the amdgcn-amdhsa offload compiler with LLVM 11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97827 --- Comment #10 from Fangrui Song --- Note: the section key is not just (name, group name "G"). It is a quadruple: (name, group name "G", linked-to "o", unique ID) Keeping just name works for the simplest case. If GCC decides to support PR95095 -fno-unique-section-names, unique ID can be common. https://sourceware.org/bugzilla/show_bug.cgi?id=25490#c3 added the support for `.section ,unique` to GNU as.
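To illustrate why keying on the name alone is not enough, here is a sketch of the quadruple as a C struct with two comparison functions. The struct layout and function names are hypothetical (this is neither GNU as nor LLVM code), and representing an absent group or linked-to symbol by an empty string is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of the section key described above.  */
struct section_key {
    const char *name;        /* e.g. "__patchable_function_entries" */
    const char *group;       /* "G" group signature, "" if none */
    const char *linked_to;   /* "o" linked-to symbol, "" if none */
    unsigned unique_id;      /* the N in `unique,N` */
};

/* Two .section directives refer to the same section only if all four
   components match.  */
bool same_section(const struct section_key *a, const struct section_key *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a->name, b->name) == 0
        && strcmp(a->group, b->group) == 0
        && strcmp(a->linked_to, b->linked_to) == 0
        && a->unique_id == b->unique_id;
}

/* The "simplest case" shortcut: compare the name only.  */
bool same_name_only(const struct section_key *a, const struct section_key *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a->name, b->name) == 0;
}
```

Two __patchable_function_entries sections with different linked-to symbols compare equal under same_name_only but not under same_section, which is the case where the name-only shortcut merges sections that must stay separate.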
[Bug target/98112] Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112 --- Comment #3 from Fangrui Song --- Are you happy with the option name -f[no-]direct-access-external-data ? https://reviews.llvm.org/D92633 is what I want to add to Clang. I want GCC and Clang to use the same option names...
[Bug middle-end/93195] -fpatchable-function-entries : __patchable_function_entries should consider comdat groups
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195 --- Comment #10 from Fangrui Song --- (In reply to Jakub Jelinek from comment #9) > I believe this broke building the kernel, see > https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561974.html > for details. For > ld: .init.data has both ordered [`__patchable_function_entries' in > init/main.o] and unordered [`.init.data' in > ./drivers/firmware/efi/libstub/vsprintf.stub.o] sections ld should be flexible in mixed SHF_LINK_ORDER & non-SHF_LINK_ORDER components in an output section https://sourceware.org/bugzilla/show_bug.cgi?id=26256
[Bug c/94722] implement __attribute__((no_stack_protector)) function attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722 Fangrui Song changed: What|Removed |Added CC||i at maskray dot me --- Comment #7 from Fangrui Song --- (In reply to Martin Liška from comment #6)
> Implemented.

  #include <string.h>
  void foo(const char *a) { char b[34]; strcpy(b, a); }
  __attribute__((no_stack_protector)) void bar(const char *a) { foo(a); }

  #include <string.h>
  __attribute__((no_stack_protector)) void foo(const char *a) { char b[34]; strcpy(b, a); }
  void bar(const char *a) { foo(a); }

In both cases, foo can be inlined.

In Clang, the recent resolution https://reviews.llvm.org/D91816 is that a ssp function cannot be inlined into a nossp function and a nossp function cannot be inlined into a ssp function.

I think one argument for the no-inline behavior is that ssp conveys the security enforcement intention, and the GCC behavior may degrade the security hardening when inlining a ssp chunk.

Previously Clang upgraded the caller from nossp to ssp after inlining. However, that behavior caused https://lore.kernel.org/lkml/20200422192113.gg26...@zn.tnic/T/#t (the caller may not have set up %gs and upgrading it to ssp can break it)

The new Clang behavior also disallows a nossp callee from being inlined into a ssp caller. That makes the rules easier to explain, but I haven't thought very clearly about the implications.
[Bug target/98112] Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112 --- Comment #5 from Fangrui Song --- (In reply to Segher Boessenkool from comment #4) > (In reply to Fangrui Song from comment #3) > > Are you happy with the option name -f[no-]direct-access-external-data ? > > Not at all, no :-( > > The name does not explain its purpose at all, and the whole concept only > makes sense for a fraction of all targets. > A -mcopy-relocs ("generate copy > relocations if that is a good idea"), defined *per target*, would be a lot > better, or a -mpic-use-copy-relocs (since you say it is *not* just for pie), > or something like that. Please read my first comment why copy relocs is a bad name. The compiler behavior is whether the external data symbol is accessed directly/indirectly. Copy relocs is just the inferred ELF linker behavior (in -no-pie/-pie link mode) when the symbol is external. The option name should mention the direct behavior, instead of the inferred behavior at the linking stage. -fdirect-access-external-data makes sense on other binary formats, though I won't ask GCC to implement relevant behaviors for other binary formats. * For example, on COFF, the behavior is like always -fdirect-access-external-data. __declspec(dllimport) is needed to use indirect access. * On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic (only available on arm) and the opposite for -fpic. If you don't want to think of non-ELF, feel free to make the option specific to ELF. Also feel free to make it specific to -fno-pic/-fpie (disallowed for -fpic). I have no plan to implement Clang -fdirect-access-external-data for -fpic as well. 
> You want to have this a generic option, while it is > not clear at all what it would mean, what it would *do*, which is especially > important if you want this to be an option used by multiple compilers: if it > is not clear to every user what simple, sensible thing a flag is the knob > for, that flag simply cannot be used at all -- or worse, some users *will* > use it, but then their intentions are not clear to humans, and different > compilers can (and will!) think the user wanted something else! To be clear, GCC botched things with the inappropriate HAVE_LD_PIE_COPYRELOC and I made the proposal to (1) let non-x86-64 leverage the missing optimization for -pie (2) eventually fix the x86-64 STV_PROTECTED story. I have considered all the potential simplification of internal representations for Clang this option will enable. (llvm/lib/Target/TargetMachine.cpp shouldAssumeDSOLocal can be further simplified with this option)
[Bug target/98112] Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112 --- Comment #7 from Fangrui Song --- (In reply to Segher Boessenkool from comment #6) > (In reply to Fangrui Song from comment #5) > > Please read my first comment why copy relocs is a bad name. > > Since I reply to some of that (namely, your argument 1)), you could assume I > have read your comment already ;-) > > > The compiler > > behavior is whether the external data symbol is accessed > > directly/indirectly. > > Not really, no. It isn't clear at all what "directly" even means! > > Copy relocs is just the inferred ELF linker behavior > > (in -no-pie/-pie link mode) when the symbol is external. The option name > > should mention the direct behavior, instead of the inferred behavior at the > > linking stage. > > Yes. But your proposed solution just makes this worse :-( I try to use one term to describe absolute/PC-relative relocation types (e.g. R_X86_64_64, R_X86_64_PC32)... "Indirect" means GOT-generating relocation types and (PowerPC64) TOC-generating relocation types. "direct/indirect" are more descriptive and more accurate than "copy relocs" (which is not the case if the symbol turns out to be defined locally; this term does not apply to other binary formats). > > -fdirect-access-external-data makes sense on other binary formats, though I > > won't ask GCC to > > implement relevant behaviors for other binary formats. > > But what does that *mean*? "direct access"? (And, "external data", for that > matter! This isn't as obvious as it was thirty years ago.) In PowerPC64 ELF v2, the term "GOT-indirect addressing" is used, In x86-64 psABI, there is a section "Indirect Call via the GOT Slot". Indirect calls/jumps are pretty common - so it is understood that GOT relocation types generally mean "indirect". 
"external data" is the best term I find for things like `extern int var;` It means the data symbol is undefined in the current translation unit but may be defined in another translation unit or another linked unit. > > * For example, on COFF, the behavior is like always > > -fdirect-access-external-data. __declspec(dllimport) is needed to use > > indirect access. > > I don't know what "declspec" is. Something something mswindows? Yes. `extern int var; int foo() { return var; }` compiles to `movl var(%rip), %eax` (a "direct access" (PC-relative) relocation type). Its behavior is like always -fdirect-access-external-data. __declspec(dllimport) annotation can override the command line option. > > * On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic > > (only available on arm) and the opposite for -fpic. > > So what you want is that object that are globally visible will be implemented > as-is? For if you do not do whole-program optimisation, for example? So > that > a) those objects will actually *exist*, and b) they will be laid out in the > way > the program expects? Undefined global objects and address-taken functions in the current translation unit are affected. A function taken address is very like a data symbol: ``` // gcc -fno-pic generates an absolute relocation type. If foo is defined in a DSO, // it will require a "canonical PLT entry" (st_shndx=0, st_value!=0) - a hack agreed by the linker and ld.so extern void foo(); void *addr() { return foo; } ``` The default ELF behavior on most architectures is: -fno-pic uses an absolute relocation type while (non-x86-64) -fpie uses a GOT-generating relocation type (x86-64) -fpie uses PC-relative. If -fno-direct-access-external-data is specified, -fno-pic/-fpie will use GOT-generating relocation types to prevent * copy relocations if the symbol turns out to be undefined in the module. * canonical PLT entry for an address-taken function. 
The proposed option is local to a translation unit (like most options). However, if this information is recorded in LTO IR files, the optimizer can assume the variable can be referenced via a direct relocation type in the combined IR file. > > If you don't want to think of non-ELF, feel free to make the option specific > > to ELF. > > The problem is not that I don't want to think about it, but that the way it > seems to be defined only applies to ELF (and to some specific (sub-)targets > using ELF, even). As I mentioned earlier, this applies to other binary formats. I'll just show you evidence by pointing you directly to the code ;-) In LLVM, generally speaking, a dso_local undefined global object is accessed directly while a non-dso_local undefined global object is accessed via GOT indirection. In Clang, dso_local annotation is added in https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CodeGenModule.cpp#L913-L988 (The internal abstraction is currently a bit unfortunate. LLVM IR has another set of rules (many are duplicated) https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/TargetMachine.cpp#L94-L178 I intend to eventually clean up the LLVM IR side rules) (Attributes generally supersede
[Bug target/95095] Feature request: support -fno-unique-section-names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095 --- Comment #3 from Fangrui Song --- (In reply to Segher Boessenkool from comment #2) > Can't we use ".text%name" for -ffunction-sections, like we did originally, > in 1996? See cf4403481dd6. This does not conflict with other section > names, and does not have all the problems you get from doing anything that > is not a simple prefix. A function named 'foo' compiles to '.text%foo'? It might have been better to avoid conflicts with '.text.startup' '.text.hot' etc but now such a change would just inconvenience users (think of various Linux kernel linker script fragments). .text%name does not address -fno-unique-section-names.
[Bug libstdc++/98785] New: _Unwind_ForcedUnwind going through a non-empty exception specification
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98785 Bug ID: 98785 Summary: _Unwind_ForcedUnwind going through a non-empty exception specification Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- gcc/testsuite/g++.dg/eh/forced3.C says forced unwinding calls std::unexpected going through a throw() function. gcc/testsuite/g++.dg/eh/forced4.C says forced unwinding does not call std::unexpected going through a throw(int) function. The behavior looks strange: if we consider forced unwinding a special exception type, both throw() and throw(int) should catch it. Note: for nothrow, GCC emits minimum amount of .gcc_except_table section. forced unwinding calls std::terminate.
[Bug target/95095] Feature request: support -fno-unique-section-names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095 --- Comment #5 from Fangrui Song --- Linux kernel include/asm-generic/vmlinux.lds.h currently has #define TEXT_TEXT \ ALIGN_FUNCTION(); \ *(.text.hot .text.hot.*)\ *(TEXT_MAIN .text.fixup)\ *(.text.unlikely .text.unlikely.*) \ *(.text.unknown .text.unknown.*)\ NOINSTR_TEXT\ *(.text..refcount) \ *(.ref.text)\ MEM_KEEP(init.text*)\ MEM_KEEP(exit.text*)\ If you change .text.* to .text%* , this script will need a change, along with other projects which use or adapt GNU ld's built-in linker script .text : { *(.text.unlikely .text.*_unlikely .text.unlikely.*) *(.text.exit .text.exit.*) *(.text.startup .text.startup.*) *(.text.hot .text.hot.*) *(SORT(.text.sorted.*)) *(.text .stub .text.* .gnu.linkonce.t.*) /* .gnu.warning sections are handled specially by elf.em. */ *(.gnu.warning) } By default, -fno-unique-section-names produces '.text' instead of '.text.foo' in the normal -ffunction-sections case. For PGO, -fno-unique-section-names produces '.text.hot.' instead of '.text.hot.foo' in the normal -ffunction-sections case. '.text.hot.' is an attempt to distinguish PGO caused 'hot' from a regular functions named 'hot'.
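The naming rule in the last paragraph can be sketched as a small function. This is a model of the scheme as described in this comment, not code from Clang or GCC; the function name, the "kind" parameter (empty for plain .text, otherwise e.g. "hot") and the buffer interface are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the -ffunction-sections name for FUNC.
   KIND is "" for plain .text or a PGO-style prefix such as "hot".
   Returns 0 on success, -1 if the name does not fit in OUT.  */
int function_section_name(const char *kind, const char *func,
                          bool unique_names, char *out, size_t out_size)
{
    assert(kind != NULL && func != NULL && out != NULL);

    int n;
    if (strlen(kind) == 0)
        n = unique_names ? snprintf(out, out_size, ".text.%s", func)
                         : snprintf(out, out_size, ".text");
    else
        /* The trailing dot keeps a PGO "hot" section distinct from
           the section of an ordinary function named "hot".  */
        n = unique_names ? snprintf(out, out_size, ".text.%s.%s", kind, func)
                         : snprintf(out, out_size, ".text.%s.", kind);

    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

All four results still match the existing `*(.text.hot .text.hot.*)` and `*(.text .text.*)` style patterns in the linker scripts quoted above, which is why no script change is needed.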
[Bug target/95095] Feature request: support -fno-unique-section-names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095 --- Comment #7 from Fangrui Song --- (In reply to Segher Boessenkool from comment #6) > I was under the impression this unique section thing needed the trailing > dot thing. This probably is not true. > > I still think the old "%" thing is much superior to the trailing dot thing, > but that then is orthogonal to the "unique section" thing, so let's ignore > it now :-) > > It still remains that this flag needs a name that says what it *does*, as I > mentioned at the end of Comment 4. -ffunction-sections -fno-unique-section-names => .text.% .text.startup.% .text.hot.% .text.cold.% ... ? I agree that it is superior. If GCC wants to support this scheme, that looks fine to me. It is likely that I can migrate Clang to this scheme as well. I think .text% .text.startup% .text.hot% .text.cold% ... is slightly worse.
[Bug target/95095] Feature request: support -fno-unique-section-names
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095 --- Comment #9 from Fangrui Song --- (In reply to Segher Boessenkool from comment #8) > I say nothing like that. I say that > .text.hot. > is nasty (is easily mistaken for .text.hot). > > I also say that and that named-per-function sections are better as > .text%name > than as > .text.name > (just as they were long ago), because this doesn't conflict with things like > .text.hot > (and there is a very long history of such conflicts giving real-world > problems). .text%name and .text.hot%name will break existing output section descriptions for .text My scheme .text.% .text.hot.% is backward compatible.
[Bug c/99282] New: Emit .cfi_sections without arguments for -fno-asynchronous-unwind-tables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99282 Bug ID: 99282 Summary: Emit .cfi_sections without arguments for -fno-asynchronous-unwind-tables Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- .cfi_* in inline asm is rare, but can be useful if the user wants precise unwind information. % cat a.c int main() { asm("pushl 0\n.cfi_adjust_cfa_offset 4\npop %%eax\n.cfi_adjust_cfa_offset -4" ::: "eax"); } % gcc -m32 -c -fomit-frame-pointer -fno-asynchronous-unwind-tables a.c a.c: Assembler messages: a.c:3: Error: CFI instruction used without previous .cfi_startproc a.c:5: Error: CFI instruction used without previous .cfi_startproc -fasynchronous-unwind-tables & -fno-asynchronous-unwind-tables do not have a predefined macro, so it is difficult for the inline asm to know whether CFI directives should be used. For ergonomics, users just want to write CFI directives and hope they will be silently ignored in -fno-asynchronous-unwind-tables mode. However, GNU as errors for .cfi_* without .cfi_startproc . I suggest that (1) GCC emits ".cfi_sections" (no argument) at the beginning, (2) GNU as suppresses the error if no .eh_frame/.debug_frame is needed (feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=27472).
[Bug inline-asm/99282] Emit .cfi_sections without arguments for -fno-asynchronous-unwind-tables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99282 --- Comment #2 from Fangrui Song --- (In reply to Jakub Jelinek from comment #1) > There is the __GCC_HAVE_DWARF2_CFI_ASM predefined macro that tells if .cfi* > directives are used or not. And, inline asm that wishes to be usable in > both can use that. Thanks. I did not know this macro. So the user writing inline asm does have a way to know whether .cfi_* should be inserted. If you think emitting `.cfi_sections` is unnecessary, I am fine and happy that this is closed. (GCC already generates `.cfi_sections .debug_frame\n`, so perhaps supporting `.cfi_sections\n` is not that costly? :) Users with a newer toolchain can be a bit happier - they don't need to do `#ifdef __GCC_HAVE_DWARF2_CFI_ASM`).
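A minimal sketch of how inline asm could use the macro Jakub mentions. The helper names CFI, CFI_ASM_ENABLED, push_pop_template and template_has_cfi are hypothetical, and the template is kept as a plain string (rather than placed in an asm statement) so that the sketch builds on any target:

```c
#include <string.h>

/* Hypothetical helper: expands to a CFI directive line only when the
   compiler itself emits .cfi_* directives for this translation unit. */
#ifdef __GCC_HAVE_DWARF2_CFI_ASM
#define CFI(directive) directive "\n"
#define CFI_ASM_ENABLED 1
#else
#define CFI(directive) ""
#define CFI_ASM_ENABLED 0
#endif

/* The i386 template from the report with its CFI directives wrapped.
   In real code this string would be the template of an asm() statement. */
static const char push_pop_template[] =
    "pushl 0\n"
    CFI(".cfi_adjust_cfa_offset 4")
    "pop %%eax\n"
    CFI(".cfi_adjust_cfa_offset -4");

/* Returns 1 when the template contains CFI directives, 0 otherwise. */
static int template_has_cfi(void) {
    return strstr(push_pop_template, ".cfi_adjust_cfa_offset") != NULL;
}
```

With -fno-asynchronous-unwind-tables the directives are expected to drop out of the template, which is the effect the report wanted to get without an #ifdef.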
[Bug demangler/100437] New: libiberty: Support more characters for function clones
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100437 Bug ID: 100437 Summary: libiberty: Support more characters for function clones Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: demangler Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- In the demangler, the ('.' (alpha|'_')+) ('.' digit+)* scheme as implemented for PR40831 allows a decimal but not a hexadecimal. It'd be great to support a hexadecimal (or more characters e.g. base64). There are at least two use cases in clang now. 1. In Clang ThinLTO, a local symbol needs to be promoted to a global symbol so that it can be imported into other modules. Such a symbol gets a suffix with a hash (a simple increasing ID scheme cannot avoid collision), e.g. _ZL5localv.llvm.104029495979337208 % c++filt <<< _ZL5localv.llvm.104029495979337208 local() [clone .llvm.104029495979337208] # A suffix with mixed digits and letters (e.g. many hexadecimals) doesn't work. % c++filt <<< _ZL5localv.llvm.11aa _ZL5localv.llvm.11aa 2. clang -funique-internal-linkage-names -c a.cc # use clang trunk (Improve profile accuracy for local symbols) There is a long decimal representation of a MD5 module hash. _ZL5localv.__uniq.247706070344499593425200173608446019371 If more digits are allowed, clang can switch to that so that shorter symbol names can be used, saving .strtab space. I understand that the original digit/letter separation is to allow multiple clones. There should be some way supporting more characters. If it is not useful to know there are 4 clones, just lift the restriction? % c++filt <<< _ZL5localv.llvm.aaa.000.bbb.111.ccc.222 local() [clone .llvm] [clone .aaa.000] [clone .bbb.111] [clone .ccc.222]
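The grammar quoted in the report can be sketched as a small matcher. This is only an illustration of the ('.' (alpha|'_')+) ('.' digit+)* scheme as written above, applied repeatedly; it is not libiberty's code, and the function name is_clone_suffix is made up for the example:

```c
#include <ctype.h>

/* Sketch of the scheme from the report:
     suffix := group+
     group  := '.' (alpha | '_')+ ('.' digit+)*
   Returns 1 if the whole string matches, 0 otherwise. */
static int is_clone_suffix(const char *s) {
    int groups = 0;
    while (*s == '.') {
        const unsigned char *p = (const unsigned char *)s + 1;
        if (!(isalpha(*p) || *p == '_'))
            return 0;               /* '.' must be followed by alpha or '_' */
        while (isalpha(*p) || *p == '_')
            ++p;
        while (p[0] == '.' && isdigit(p[1])) {
            p += 2;                 /* consume '.' and the first digit */
            while (isdigit(*p))
                ++p;
        }
        s = (const char *)p;
        ++groups;
    }
    return groups > 0 && *s == '\0';
}
```

Under this sketch ".llvm.104029495979337208" and ".llvm.aaa.000.bbb.111.ccc.222" match, while ".llvm.11aa" does not, because "11aa" is neither a pure letter run nor a pure digit run.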
[Bug c/100483] New: Extend -fno-semantic-interposition to global variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483 Bug ID: 100483 Summary: Extend -fno-semantic-interposition to global variables Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- % cat a.c int var; int foo() { return var; } (I implemented this for clang 11 x86) % clang -fpic -fno-semantic-interposition -O2 -S a.c % cat a.s ... foo:# @foo .Lfoo$local: # %bb.0:# %entry movl.Lvar$local(%rip), %eax retq ... var: .Lvar$local: .long 0 # 0x0 .size var, 4 # On x86-64, because of R_X86_64_REX_GOTPCRELX, it isn't too bad without the optimization. # This is more useful on other architectures without GOT optimization. # With my clang patch https://reviews.llvm.org/D101873 % clang -target aarch64 -fpic -fno-semantic-interposition -fno-asynchronous-unwind-tables -O2 -S a.c % cat a.s ... foo:// @foo .Lfoo$local: // %bb.0: // %entry adrpx8, .Lvar$local ldr w0, [x8, :lo12:.Lvar$local] ret
[Bug c/100593] New: [ELF] -fno-pic: Use GOT to take address of an external default visibility function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 Bug ID: 100593 Summary: [ELF] -fno-pic: Use GOT to take address of an external default visibility function Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- Most ELF targets use an absolute relocation (e.g. R_X86_64_32) to take the address of a default visibility non-definition function declaration. The absolute relocation can cause a canonical PLT entry (st_shndx=0, st_value!=0; the term is jargon among a few LLD developers and is not broadly adopted). If the defining DSO is linked with -Bsymbolic-functions (or -Bsymbolic), the addresses taken within the DSO and outside of the DSO will be different. Since C++ requires uniqueness of the address, this violates the language standard. Outside of the GNU ELF world, many dynamic linking implementations have shifted to a direct binding and non-interposition by default world. We have rants from people complaining about shared object performance. (e.g. https://lore.kernel.org/lkml/CAHk-=whs8QZf3YnifdLv57+FhBi5_WeNTG1B-suOES=rcus...@mail.gmail.com/ "Re: Very slow clang kernel config .." https://www.facebook.com/dan.colascione/posts/10107358290728348 "Python is 1.3x faster when compiled in a way that re-examines shitty technical decisions from the 1990s.") I believe ld -Bsymbolic-functions can materialize most of the savings other implementations provide, without introducing complex things to ELF. However, since -Bsymbolic-functions doesn't play well with -fno-pic's canonical PLT entries, we should fix -fno-pic. Converting a direct access to a GOT access for a function symbol cannot be in a performance critical path, so let's just do it. Static linking is happy, too - the linker can either optimize out the GOT (x86-64 GOTPCRELX, PPC64 TOC) or prefill the GOT entry with a constant. 
Once -fno-pic has the sane behavior (GOT by default), more and more shared objects can be optionally built with -Bsymbolic-functions - if they don't intend to support interposition, while still being compatible with -fno-pic executables. How effective is -Bsymbolic-functions? As a data point, my x86_64 Linux kernel defconfig build with -Bsymbolic-functions linked Clang is 15% faster. (83% JUMP_SLOT relocations are eliminated!) % cat a.c extern void fun(); void *get() { return (void*)fun; } % gcc -fno-pic -S a.c -O2 -o - get: .LFB0: .cfi_startproc movl$fun, %eax ret % aarch64-linux-gnu-gcc -fno-pic -S a.c -O2 -o - ... adrpx0, fun add x0, x0, :lo12:fun # good, ppc64 elfv2 always uses TOC % powerpc64le-linux-gnu-gcc -fno-pic -S a.c -O2 -o - ... addis 3,2,.LC0@toc@ha ld 3,.LC0@toc@l(3)
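The address-uniqueness expectation behind this request can be written down as a tiny check. This is only a sketch: fun is defined locally so the example links on its own, and get_from_dso and addresses_agree are hypothetical stand-ins for code that would live inside the defining shared object. In a single program the check trivially holds; the report is about the case where it stops holding once fun lives in a -Bsymbolic-functions DSO and the executable is built with -fno-pic.

```c
/* Stand-in for the external default-visibility function from the report.
   Defined here only so that the sketch is self-contained. */
void fun(void) {}

/* Address as seen by the executable (the get() from the report). */
void *get(void) { return (void *)fun; }

/* Hypothetical second observer, standing in for code inside the DSO. */
void *get_from_dso(void) {
    void (*volatile p)(void) = fun; /* volatile: keep a real address load */
    return (void *)p;
}

/* The language-level expectation: both observers see the same address. */
int addresses_agree(void) { return get() == get_from_dso(); }
```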
[Bug c/100483] Extend -fno-semantic-interposition to global variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483 --- Comment #1 from Fangrui Song --- Another request is a new option: -fno-semantic-interposition-function. With this option, we only assume functions cannot be interposed. -fno-semantic-interposition assumes both functions and variables cannot be interposed.
[Bug c/100618] New: Add a -fno-semantic-interposition variant which allows variable interposition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100618 Bug ID: 100618 Summary: Add a -fno-semantic-interposition variant which allows variable interposition Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- Extracted from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483 The documentation says -fno-semantic-interposition applies to variables. Having an option which only apply to external linkage function definitions will be useful. Assuming no-variable-interposition is unfortunately incompatible with the plethora of copy relocations: -fno-pic emits direct access relocations referencing a global variable. If the global variable turns out to be defined in a shared object, there will be a copy relocation in the executable. The object the shared object sees and the executable sees will be different. See https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic#the-last-alliance-of-elf-and-men for more context.
[Bug c/100618] Add a -fno-semantic-interposition variant which allows variable interposition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100618 --- Comment #1 from Fangrui Song --- Perhaps -fsemantic-interposition=function,variable (default -fpic/-fPIC) -fsemantic-interposition=variable (compatible with copy relocations but enable function optimizations) -fsemantic-interposition= (alias: -fno-semantic-interposition) ?
[Bug c/100483] Extend -fno-semantic-interposition to global variables
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483 Fangrui Song changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #5 from Fangrui Song --- (In reply to Jan Hubicka from comment #3) Thanks for the clarification. I misinterpreted the documentation. Then it seems that -fno-semantic-interposition is a very safe optimization for distributions to default to. Closing as intended. I will try changing Clang to drop the local aliases for variables. It is tricky not to use local aliases for address taking of functions, though. Fortunately, this will not cause any problems once we do https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 (In reply to H.J. Lu from comment #4) I would much appreciate it if you want to fix some copy relocation/canonical PLT entry issues so that it will be easier for distributions to switch to something like a default -Wl,-Bsymbolic-global-functions. What does ld.so do for the proposed GNU_PROPERTY_SINGLE_GLOBAL_DEFINITION? Does it apply to STB_GLOBAL or also STB_WEAK? Does it add all definitions to a global namespace to enforce single definition for every candidate? If it does the additional check, this would further slow down dynamic linking. And I believe we should do the function oriented non-interposition-by-default plan, which will not be blocked by copy relocation elimination. (https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic#copy-relocations)
[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 --- Comment #2 from Fangrui Song --- (In reply to Alexander Monakov from comment #1) > It is not necessary to change -fno-pic code generation to gain most of the > -Bsymbolic benefit It is necessary, otherwise the function address taken from the -Bsymbolic/-Bsymbolic-functions/-Bsymbolic-global-functions shared object may be different from the address taken from the -fno-pic code. The ELF hack is called canonical PLT entry, similar to copy relocations. > as you say, the most important point is to avoid jumping > via PLT trampolines (or, with -fno-plt, GOT loads) for function calls, so > the linker could do -Bsymbolic relaxation for sites where address doesn't > matter (calls and jumps) while keeping a dynamic relocation for address > loads? Under some new option of course, like -Bsymbolic-plt. Right? There are two points: (1) R_*_JUMP_SLOT symbol lookup cost (2) whether call sites get penalized by the PLT indirection. -fno-pic code must use GOT (instead of an absolute relocation) for default visibility external function access to be compatible with a -Bsymbolic/-Bsymbolic-functions/-Bsymbolic-global-functions shared object.
[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 --- Comment #4 from Fangrui Song --- (In reply to Alexander Monakov from comment #3) > I understand what you're saying, but it seems we're talking past each other. > > I agree that if a library is linked with any -Bsymbolic* flag, the main > executable is at risk of broken address uniqueness unless it uses GOT > indirection. > > I am saying that if the library was linked with a more restrictive variant > of -Bsymbolic (that I called -Bsymbolic-plt), it would still get most the > benefit of -Bsymbolic, while remaining compatible with unmodified > executables. > > Would you agree? You misunderstand this. Emitting GOT-generating relocation in -fno-pic mode is the only way to avoid canonical PLT entry, if the function turns out to be defined in a shared object. No -Bsymbolic variant can make this compatible. Our goal is to eliminate symbol lookup for the function definition in the shared object. We must eliminate symbolic dynamic relocations, i.e. no JUMP_SLOT, no GLOB_DAT, no R_X86_64_64. The linker must set an address in the shared object and bind references to that address. In many programs (not long-running, not all code paths are exercised), the symbol lookup may cost more than the PLT indirection, given the sheer amount of symbol lookups. Now a -fno-pic program uses an absolute/PC-relative relocation => the linker must set an address in the executable's address space as well. The traditional ELF hack (st_value!=0, st_shndx=0) achieves this and let the shared object symbol reference bind to the executable definition. Note that we have explicitly eliminated symbol lookup for the defining shared object so the pointer equality cannot be satisfied at all.
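The "st_value!=0, st_shndx=0" condition can be expressed as a predicate over a dynamic symbol. A minimal sketch, assuming a simplified stand-in for Elf64_Sym (struct sym and the constants with trailing underscores are made up for this example so it does not depend on <elf.h>):

```c
#include <stdint.h>

/* Simplified stand-in for the Elf64_Sym fields used here. Field names
   follow the ELF specification; the struct layout does not. */
struct sym {
    unsigned char st_info; /* binding in the high 4 bits, type in the low 4 */
    uint16_t st_shndx;     /* 0 (SHN_UNDEF) for an undefined symbol */
    uint64_t st_value;
};

enum { SHN_UNDEF_ = 0, STT_FUNC_ = 2 };

/* An undefined function symbol that nevertheless carries an address in the
   executable is what the comment calls a canonical PLT entry. */
static int is_canonical_plt_entry(const struct sym *s) {
    return (s->st_info & 0xf) == STT_FUNC_ &&
           s->st_shndx == SHN_UNDEF_ &&
           s->st_value != 0;
}
```

An ordinary undefined function reference has st_value == 0 and is not flagged by this predicate.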
[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 --- Comment #6 from Fangrui Song --- (In reply to Alexander Monakov from comment #5) > Hm, I still don't think I'm misunderstanding what you're saying. I'm > familiar with the ELF standard (and FWIW I have read your blog posts on > related matters). I am responding to this sentiment from the opening comment: > > > I believe ld -Bsymbolic-functions can materialize most of the savings other > > implementations provide, without introducing complex things to ELF. > > However, since -Bsymbolic-functions doesn't play well with -fno-pic's > > canonical PLT entries, we should fix -fno-pic. > > I am saying that fixing -fno-pic is not the only possible way forward. > Rather, a restricted -Bsymbolic-functions that relaxes relocations that are > not address-significant allows to still get some (but not all) of the > benefits for unchanged -fno-pic executables. You are right. A pure linker approach is possible. However, I think the approach is inelegant, because the linker would have different preemptibility ideas on different relocation types and (as you said) indirect calls like vtable definitions are not optimized. Let's say the proposed linker option for shared objects is -Bsymbolic-plt. The discussion below focuses on default visibility definitions which would otherwise be preemptible. Let categorize relocation types first. PLT-generating: R_X86_64_PLT32 GOT-generating: R_X86_64_GOTPCREL, R_X86_64_GOTPCRELX, R_X86_64_REX_GOTPCRELX absolute (symbolic): R_X86_64_64 There are three choices. (a) If all relocation types are PLT-generating, bind branch targets directly and suppress the PLT entry. If GOT-generating/absolute relocations are present, don't change behaviors. This choice is less effective for some otherwise address-insignificant functions, e.g. non-vague-linkage virtual functions. b) If all relocation types are R_X86_64_PLT32 or GOT-generating, bind branch targets directly and suppress the PLT entry. 
If GOT-generating relocations are present, produce a GOT entry and an associated R_X86_64_GLOB_DAT. If absolute relocations are present, don't change behaviors. c) Always bind branch targets directly and suppress the PLT entry. If GOT-generating relocations are present, produce a GOT entry and an associated R_X86_64_GLOB_DAT. If absolute relocations are present, produce outstanding dynamic relocations of the same type. > > You misunderstand this. Emitting GOT-generating relocation in -fno-pic mode > > is the only way to avoid canonical PLT entry, if the function turns out to > > be defined in a shared object. No -Bsymbolic variant can make this > > compatible. > > Well, if you frame the goal as "eliminate canonical PLT entries", then yes, > but that in itself surely is not the end goal? The end goals are reducing > startup time (which my idea helps only partially since it may bind direct > calls but not e.g. vtable definitions) and runtime overheads (where again my > proposal is weaker but not significantly so, assuming address loads are > rarely on hot paths). Yes, the end goal is to reduce startup time and bind call targets directly if feasible. Yes, -Bsymbolic-plt can help the goal partially. > > To clarify once more. I am not outright rejecting the idea in your opening > comment. I am saying that there potentially is a lighter-weight alternative, > which may be implementable purely in the linker, and still gets most of the > benefit you're promoting (like in your Clang example). Which is nice, > because it can be rolled out sooner, individual libraries/distros/users can > opt-in and experiment as they like, etc. Such a -Bsymbolic-plt can achieve some goals. But given that the function pointer equality problems are usually benign (-fno-pic is relatively uncommon in many areas; making use of such pointer equality is not a common practice), I'd hope we just don't add that intermediate linker option.
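The three choices (a), (b) and (c) can be compared with a small decision function. This is a sketch of the rules as stated in the comment, not of any linker's implementation; the enum and function names are invented for the example, and only the question "are branch targets bound directly?" is modeled:

```c
/* Relocation categories from the comment, as a per-symbol bitmask. */
enum reloc_kind {
    REL_PLT = 1 << 0, /* PLT-generating: R_X86_64_PLT32 */
    REL_GOT = 1 << 1, /* GOT-generating: R_X86_64_GOTPCREL and the *X forms */
    REL_ABS = 1 << 2  /* absolute (symbolic): R_X86_64_64 */
};

enum choice { CHOICE_A, CHOICE_B, CHOICE_C };

/* Returns 1 if branches to the symbol would be bound directly (PLT entry
   suppressed) under the given choice, 0 if behavior is unchanged. */
static int binds_branches_directly(enum choice c, unsigned kinds) {
    if (!(kinds & REL_PLT))
        return 0; /* no branch relocation to bind */
    switch (c) {
    case CHOICE_A: return kinds == REL_PLT;       /* only PLT relocations */
    case CHOICE_B: return !(kinds & REL_ABS);     /* PLT and/or GOT only */
    case CHOICE_C: return 1;                      /* always */
    }
    return 0;
}
```

For a virtual function that is both called and referenced from a vtable (an absolute relocation), only choice (c) binds the calls directly, which matches the remark that (a) is less effective for such functions.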
[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 --- Comment #8 from Fangrui Song --- Seems that -fno-plt -fno-pic does have the required properties. A side effect is that all external calls use the (x86-64) call *f@GOTPCREL(%rip) (x86-32) call *f@GOT form. The instruction is one byte longer. (Calling a function is a common case. Taking the address in a non-vtable case is uncommon. So I'd rather punish the uncommon address taking). When the linker notices that the branch target is defined in the executable, it can optimize out the GOT to use an addr32 prefix instead. (gold and ld.lld haven't implemented the optimization for 32-bit) __attribute__((noplt)) int f(); void h() {} void *g() { h(); // call h f(); // call *f@GOTPCREL(%rip) return f; // movq f@GOTPCREL(%rip), %rax }
[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 --- Comment #9 from Fangrui Song --- I have a patch to implement this in Clang. It'd be good to have a name even if GCC wants to postpone the implementation for now. How about -fdirect-access-external-function & -fno-direct-access-external-function ? It is similar to the feature request -fdirect-access-external-data
[Bug c/100618] Add a -fno-semantic-interposition variant which allows variable interposition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100618 Fangrui Song changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #4 from Fangrui Song --- Clang 13 -fno-semantic-interposition will be mostly consistent with GCC -fno-semantic-interposition. It looks like this was a misunderstanding on my side.
[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 --- Comment #11 from Fangrui Song --- (In reply to Alexander Monakov from comment #10) > Is there something wrong or undesirable with making this under -fno-plt (or > the noplt attribute as in your example)? > > (after all, it is a kind of PLT-avoidance transformation, just for > addressing rather than direct calling/jumping) -fno-plt is generally undesirable due to longer branch instructions and the performance loss when the branch target is defined in the exe/so and the linker is gold/ld.lld (they cannot optimize jmp *got to jmp target). For non-x86, -fno-plt doesn't exist at all. If implemented, it would require many more instructions, which is certainly undesirable. So -fno-plt can never be a default. Using GOT to take the address of an external function in -fno-pic is just a better default. I want the behavior to become the default behavior, so it should not be under -fno-plt.
[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 --- Comment #13 from Fangrui Song --- (In reply to H.J. Lu from comment #12) > We should handle it in the whole Linux software stack: > > https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/8 > > not just in compiler. It is great that you have the desire to fix these fundamental issues :) I think a GNU_PROPERTY marker is over-engineering. See https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/8 for details. Many things (including this and PR98112) can be changed today. When -fno-direct-access-external-data/-fno-direct-access-external-function as the -fno-pic default becomes prevalent, make ld warn by default for R_*_COPY/canonical PLT entries. After a while (say one or two years), let glibc ld.so warn for R_*_COPY/canonical PLT entries.
[Bug driver/100937] New: configure: Add --enable-default-semantic-interposition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100937 Bug ID: 100937 Summary: configure: Add --enable-default-semantic-interposition Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- Add a configure option --enable-default-semantic-interposition to customize -f(no-)semantic-interposition default. The suppression of interprocedural optimizations and inlining for such default visibility non-vague-linkage function definitions is the biggest difference between -fPIE/-fPIC. Distributions may want to enable default -fno-semantic-interposition to reclaim the lost performance from -fPIC (e.g. CPython is said to be 27% faster; Clang is 3% faster).
[Bug driver/100937] configure: Add --enable-default-semantic-interposition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100937 Fangrui Song changed: What|Removed |Added Resolution|WONTFIX |--- Status|RESOLVED|UNCONFIRMED --- Comment #2 from Fangrui Song --- How is it a portability problem? clang -fpic has always been allowing interprocedural optimizations for non-vague-linkage function definitions. FreeBSD uses clang and software works with no problem. For a vague-linkage function definition, a call site in the same translation unit may inline the callee. Whether -fno-semantic-interposition is enabled/disabled has no effect. For a non-vague-linkage function definition, by default (-fsemantic-interposition) the -fpic mode does not allow a call site in the same translation unit to inline the callee or perform other interprocedural optimizations. -fno-semantic-interposition re-enables interprocedural optimizations. If a caller inlines a callee, using LD_PRELOAD to interpose the callee will not affect the caller. But many other LD_PRELOAD usage still work. We consider the small LD_PRELOAD limitation a good trade off for the speedup.
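A minimal illustration of the non-vague-linkage case described above. The function names are hypothetical; the comments restate the behavior described in this thread rather than guarantee what a particular compiler version does:

```c
/* Non-vague-linkage, default-visibility definition. With -fpic and the
   default -fsemantic-interposition, GCC assumes it may be replaced at run
   time (e.g. via LD_PRELOAD) and does not inline it into callers. */
int callee(void) { return 1; }

/* With -fno-semantic-interposition the compiler may inline callee() here.
   An LD_PRELOAD replacement of callee() then no longer affects caller(),
   although other users of callee() can still be interposed. */
int caller(void) { return callee() + 1; }
```

Building this with gcc -O2 -fpic -S, with and without -fno-semantic-interposition, and comparing the body of caller is one way to observe the difference.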
[Bug driver/100937] configure: Add --enable-default-semantic-interposition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100937 --- Comment #6 from Fangrui Song --- Then can you add a -fvisibility=protected variant which only applies to non-weak defined functions? Two issues need to be fixed: (1): https://sourceware.org/bugzilla/show_bug.cgi?id=27973 __attribute__((visibility("protected"))) void *foo () { return (void *)foo; } % gcc -fpic -shared -fuse-ld=bfd a.s /usr/bin/ld.bfd: /tmp/ccWPJCLw.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object /usr/bin/ld.bfd: final link failed: bad value collect2: error: ld returned 1 exit status (2): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 [ELF] -fno-pic: Use GOT to take address of an external default visibility function Distributions that want fast C++ non-vague-linkage functions can enable this option.
[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 Fangrui Song changed: What|Removed |Added CC||i at maskray dot me --- Comment #7 from Fangrui Song --- Some notes. {gcc,clang} -fsanitize-coverage={trace-pc,trace-cmp} is another coverage feature. It uses no_sanitize_coverage instead of no_instrument_function. The GCC support for no_sanitize_coverage is very new (by Martin, in 2021-05-25). (In Clang, the feature has more modes, e.g. you can control func/bb/edge.) The Linux kernel use case (include/linux/compiler_types.h ) uses 'noinline' so inlining is not a concern. /* Section for code which can't be instrumented at all */ #define noinstr \ noinline notrace __attribute((__section__(".noinstr.text")))\ __no_kcsan __no_sanitize_address Clang supports another filtering mechanism, -fprofile-list= (https://reviews.llvm.org/D94820). But the kernel use case seems to prefer a function attribute.
[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 --- Comment #8 from Fangrui Song --- I am thinking of __attribute__((no_profile)). In Clang, -fprofile-generate(-fcs-profile-generate)/-fprofile-instr-generate/-fprofile-arcs are all different. It will make sense to have an attribute disabling all such profiling related features. I am not sure an umbrella __attribute__((no_instrument_function)) is suitable. The Linux kernel wanting noinstr to exclude -fprofile-* is a very specific characteristic, not suitable for other applications.
[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 --- Comment #14 from Fangrui Song --- (In reply to Martin Liška from comment #13) > What's likely missing is that the attribute should prevent inlining. I'm > going to test how it behaves right now. Then, the issue can be closed. It's not clear to me that no_profile_instrument_function should prevent inlining. I'll argue that attributes should be orthogonal. https://lists.llvm.org/pipermail/llvm-dev/2021-April/150062.html https://reviews.llvm.org/D101011#271 If the user wants to suppress inlining, add noinline. Can a no_profile_instrument_function function be inlined to another no_profile_instrument_function function? Why not. Can a no_profile_instrument_function function be inlined into a function without the attribute? This may be controversial but I'd argue that it can. GCC no_stack_protector behaves this way. no_profile_instrument_function can mean that user does not want profiling when the function is called with its entity, not via another entity.
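The "attributes should be orthogonal" position can be sketched as follows. NO_PROFILE is a hypothetical wrapper macro, and the function names are invented; the attribute is only applied when the compiler reports support for it, so the sketch also builds on toolchains that lack it:

```c
/* Hypothetical wrapper: use no_profile_instrument_function when available. */
#if defined(__has_attribute)
#  if __has_attribute(no_profile_instrument_function)
#    define NO_PROFILE __attribute__((no_profile_instrument_function))
#  endif
#endif
#ifndef NO_PROFILE
#  define NO_PROFILE
#endif

/* Not instrumented itself; under the orthogonal reading the inliner is
   still free to inline it into an instrumented caller. */
NO_PROFILE static int may_be_inlined(int x) { return x + 1; }

/* A user who also wants to suppress inlining says so explicitly. */
NO_PROFILE __attribute__((noinline)) static int never_inlined(int x) {
    return x * 2;
}

/* Ordinary function, instrumented when built with a -fprofile-* option. */
int instrumented_caller(int x) {
    return may_be_inlined(x) + never_inlined(x);
}
```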
[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 --- Comment #18 from Fangrui Song --- (In reply to Nick Desaulniers from comment #15) > (In reply to Fangrui Song from comment #14) > > Can a no_profile_instrument_function function be inlined into a function > > without the attribute? This may be controversial but I'd argue that it can. > > GCC no_stack_protector behaves this way. no_profile_instrument_function can > > mean that user does not want profiling when the function is called with its > > entity, not via another entity. > > I respectfully but strongly disagree. It's surprising to developers when > they ask for no stack protector, or no profiling instrumentation, then get > one anyways. For long call chains, it's hard for developers to diagnose on > their own which function they called that missed such function attribute. > > This reminds me of "what color is your function?" > https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ > As suddenly a developer would need to verify for a no_* attributed function > that they only call no_* attributed functions, or add noinline (which is a > big hammer to all call sites, and games with aliases that have the noinline > attribute are kind of ridiculous). > > It's less surprising to prevent inline substitution upon function attribute > mismatch. Then a developer can self diagnose with -Rpass=inline. Either way, > some form of diagnostics would be helpful for these kinds of issues, and has > been requested by Android platform developers working on Zygote. > > For no_stack_protector in LLVM, I implemented the rules: upon mismatch, > prevent inline substitution unless the user specified always_inline. This > fixed suspend/resume bugs in x86 Linux kernels when built with LTO. > > Though, I'm happy to revisit that behavior in LLVM; we could add > > #define noinline_for_lto __attribute__((__noinline__)) > > then use that in the Linux kernel instead. 
Our problem is that a boolean attribute with 1 bit of information cannot express whether a neg attribute function can be inlined into a pos attribute function. Let's agree to disagree. I don't see why a no_profile_instrument_function function should suppress inlining into a function without the attribute. For the use cases where users want to suppress inlining, they can add noinline. What I worry about is that now GCC has an attitude and if the LLVM side doesn't follow it is like diverging. However, the GCC patch is still in review. I think a similar topic may need to be raised on the llvm-dev side as I feel this is the tip of the iceberg - more attributes can be similarly leveraged. So, how about an llvm-dev discussion?
[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 --- Comment #20 from Fangrui Song --- (In reply to Marco Elver from comment #19) I am ok with "inlining suppression" as an implementation strategy and I agree that it should be useful. What I strongly object to is "promised inlining suppression". For example, if an inlining pass happens after instrumentation, then the function attribute doesn't necessarily need to suppress inlining. After instrumentation is done, we can even treat the noprofile attribute as a no-op. The example applies to the non-LTO case -fsanitize-coverage= . (We don't actually use the noprofile function attribute for -fsanitize-coverage=, but I cannot find a better example in LLVM; I think all other noprofile affected instrumentations happen before the inliner pipeline). So in the documentation, it can be said that the inlined copy (if any) will not get instrumentation, but it **should not** say that a noprofile function cannot be inlined into a function without the attribute.
[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 --- Comment #21 from Fangrui Song --- (In reply to Fangrui Song from comment #20) > For example, if an inlining pass happens after instrumentation, then the > function attribute doesn't necessarily need to suppress inlining. After > instrumentation is done, we can even treat the noprofile attribute as a > no-op. Sent too early:) Amendment: a smart inliner can inline the noprofile callee and then drop instrumentation code. That will also be an approach which does not break the "no instrumenting my code" contract. Other approaches can be (probably more relevant to function specialization/clones): the instrumentation pass can leave an un-instrumented copy which can be used by a subsequent inliner. As we can see, all these approaches are much more complex than simply "suppressing inlining". So I agree that "suppressing inlining" is a good implementation detail here.
[Bug target/108622] New: x86 -fno-pic: use DW_EH_PE_indirect|DW_EH_PE_pcrel for personality/ttype encoding
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108622 Bug ID: 108622 Summary: x86 -fno-pic: use DW_EH_PE_indirect|DW_EH_PE_pcrel for personality/ttype encoding Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- In .eh_frame and .gcc_except_table, the aarch64 and riscv ports use DW_EH_PE_indirect|DW_EH_PE_pcrel for both -fno-pic and PIC code to avoid canonical PLT entry/copy relocation, if the personality and typeinfo objects are defined in a shared object (common case, libstdc++.so.6 or libc++.so.?). AIUI there is no drawback other than a negligible size increase. % g++ -fno-pic -no-pie -fuse-ld=bfd a.cc -o a % readelf -Wr a | grep COPY 00403db8 00090005 R_X86_64_COPY 00403db8 _ZTIi@CXXABI_1.3 + 0 00403dc8 00080005 R_X86_64_COPY 00403dc8 _ZTIPKc@CXXABI_1.3 + 0 % readelf -W --dyn-syms a | grep __gxx_personality_v 10: 00401060 0 FUNCGLOBAL DEFAULT UND __gxx_personality_v0@CXXABI_1.3 (2) % g++ -fpic -no-pie -fuse-ld=bfd a.cc -o a % readelf -Wr a | grep COPY % readelf -W --dyn-syms a | grep __gxx_personality_v0 7: 0 FUNCGLOBAL DEFAULT UND __gxx_personality_v0@CXXABI_1.3 (2) Essentially this applies -mno-direct-extern-access unconditionally to -fno-pic, cleaning up gcc/config/i386/i386.cc:asm_preferred_eh_data_format
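For readers unfamiliar with the pointer-encoding byte, here is a sketch of how the flags combine. The DW_EH_PE_* values are the standard ones; pairing them with DW_EH_PE_sdata4 as the value format is an assumption made for this example (the report only names the indirect and pcrel parts), and the helper function names are invented:

```c
/* Standard pointer-encoding constants used in .eh_frame/.gcc_except_table. */
enum {
    DW_EH_PE_sdata4   = 0x0b, /* value format: signed 4-byte */
    DW_EH_PE_pcrel    = 0x10, /* application: relative to the field address */
    DW_EH_PE_indirect = 0x80  /* the value is the address of the pointer */
};

/* Encoding byte for an indirect, PC-relative, 4-byte reference
   (sdata4 is an assumption of this sketch). */
static unsigned char personality_encoding(void) {
    return DW_EH_PE_indirect | DW_EH_PE_pcrel | DW_EH_PE_sdata4;
}

static int is_indirect(unsigned char enc) {
    return (enc & DW_EH_PE_indirect) != 0;
}

/* The application field occupies bits 4-6 (mask 0x70). */
static int is_pcrel(unsigned char enc) {
    return (enc & 0x70) == DW_EH_PE_pcrel;
}
```

Because the reference goes through a pointer slot that the dynamic loader can fill in, the executable needs neither a copy relocation for the typeinfo object nor a canonical PLT entry for the personality routine.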
[Bug target/108622] x86 -fno-pic: use DW_EH_PE_indirect|DW_EH_PE_pcrel for personality/ttype encoding
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108622 --- Comment #1 from Fangrui Song --- https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611081.html [PATCH] x86: Use DW_EH_PE_indirect|DW_EH_PE_pcrel encodings for -fno-pic code
[Bug c++/108761] New: Add option to produce a unique section for non-COMDAT __attribute__((section("foo"))) object
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108761 Bug ID: 108761 Summary: Add option to produce a unique section for non-COMDAT __attribute__((section("foo"))) object Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- % cat a.cc __attribute__((section("foo"))) void f() {} __attribute__((section("foo"))) void g() {} % g++ -c -ffunction-sections a.cc % readelf -WS a.o | grep foo [ 4] foo PROGBITS 40 0e 00 AX 0 0 1 There is one section named `foo`, with f and g in it (they do not use COMDAT). With ld --gc-sections, f and g are retained or discarded as a unit. If we place f and g in two separate `foo` sections, --gc-sections can discard them independently. (We need the assembler syntax `.section foo,"ax",@progbits,unique,1`, which requires binutils>=2.35.) https://reviews.llvm.org/D143745 proposes to add such a feature with an option name like `-ffunction-sections[=(default,all)]`. I feel that the option argument is non-intuitive but cannot come up with a better name right now. I raise this feature request to seek feedback from GCC :)
[Bug c++/108761] Add option to produce a unique section for non-COMDAT __attribute__((section("foo"))) object
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108761 --- Comment #3 from Fangrui Song --- New syntax setting the flags will be useful. Also, currently there is no way to customize the section type.
[Bug c/108978] New: Add __builtin_FILE_NAME() which behaves like the __FILE_NAME__ macro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108978 Bug ID: 108978 Summary: Add __builtin_FILE_NAME() which behaves like the __FILE_NAME__ macro Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- PR c/42579 added __FILE_NAME__. On the Clang side someone is proposing __builtin_FILE_NAME (https://reviews.llvm.org/D144878) a la __builtin_FILE .
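As a rough illustration of what "behaves like __FILE_NAME__" means: __FILE_NAME__ yields the last path component of the current source file. The sketch below computes the same thing at run time for an arbitrary path. `file_name_of` is a hypothetical helper, not a GCC or Clang builtin, and it only treats '/' as a separator.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the last path component of `path`.
   The returned pointer aliases `path`; nothing is allocated. */
const char *file_name_of(const char *path) {
  assert(path != NULL);
  const char *slash = strrchr(path, '/');
  return slash ? slash + 1 : path;
}
```

For example, `file_name_of("src/dir/a.c")` returns a pointer to "a.c", and a path without any '/' is returned unchanged.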
[Bug target/99888] Add powerpc ELFv2 support for -fpatchable-function-entry*
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99888 Fangrui Song changed: What|Removed |Added CC||i at maskray dot me --- Comment #5 from Fangrui Song --- * There is a restriction on the number of instructions between the function label and the .localentry directive. * For -fpatchable-function-entry=N[,M], M nops must precede the function label. On aarch64/x86/etc, these nops are consecutive. Personally I think this condition can be lifted for PowerPC ELFv2. The runtime library will need to check st_other or do some instruction inspection, which may be fine. nop nop nop foo: .LCF0: .cfi_startproc addis 2,12,.TOC.-.LCF0@ha addi 2,2,.TOC.-.LCF0@l .localentry foo,.-foo nop nop
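The layout in the listing above (three nops before the label, two after the local entry point) is consistent with N=5, M=3; that reading is my assumption, not something stated in the comment. A minimal sketch using the per-function attribute form of the option, with `call_foo` as a hypothetical caller:

```c
#include <assert.h>

/* Request 5 NOPs in total, 3 of them placed before the function label.
   This is the per-function counterpart of -fpatchable-function-entry=5,3. */
__attribute__((patchable_function_entry(5, 3)))
int foo(void) {
  return 7;
}

/* Until a runtime tool patches the NOPs, a call behaves like any other. */
int call_foo(void) {
  int r = foo();
  assert(r == 7);
  return r;
}
```

Whether the two remaining nops land before or after `.localentry` is exactly the target-specific question discussed for PowerPC ELFv2; the attribute itself does not answer it.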
[Bug driver/106897] New: driver: support -gz=zstd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106897 Bug ID: 106897 Summary: driver: support -gz=zstd Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- Translate -gz=zstd to --compress-debug-sections=zstd for as and ld. This requires that binutils supports zstd; feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=29397
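The requested translation can be sketched as a small lookup. This is only an illustration: the real driver does this through spec strings, `compress_debug_option` is a hypothetical name, and treating a missing type as zlib is an assumption of the sketch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not the GCC driver's code: map the argument of
   -gz[=type] to the option handed to as and ld.  Returns NULL for an
   unknown type.  A missing or empty type is treated as zlib here. */
const char *compress_debug_option(const char *type) {
  if (type == NULL || type[0] == '\0' || strcmp(type, "zlib") == 0)
    return "--compress-debug-sections=zlib";
  if (strcmp(type, "zstd") == 0)
    return "--compress-debug-sections=zstd";
  if (strcmp(type, "none") == 0)
    return "--compress-debug-sections=none";
  return NULL;
}

/* Usage: the new format maps one-to-one onto the binutils option. */
void compress_debug_option_demo(void) {
  assert(strcmp(compress_debug_option("zstd"),
                "--compress-debug-sections=zstd") == 0);
  assert(compress_debug_option("lz4") == NULL);
}
```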
[Bug driver/106897] driver: support -gz=zstd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106897 --- Comment #4 from Fangrui Song --- Yes, the change will be straightforward, basically the files touched by the pending https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597586.html ("[PATCH] Remove legacy -gz=zlib-gnu"). I sent it because I knew that we would need a new compression format, and some cleanup would make the logic more maintainable.
[Bug driver/93645] Support Clang 12 --ld-path=
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645 --- Comment #13 from Fangrui Song --- (In reply to Martin Liška from comment #12) > (In reply to Fangrui Song from comment #11) > > (In reply to Martin Liška from comment #10) > > > I replied here: > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573823.html > > > > There are people wanting to use mold > > https://www.reddit.com/r/rust/comments/rhcnzt/mold_a_modern_linker_10_release/ > > I agree that's unfortunate. Note I'm having a patch that adds -fuse-ld=mold: > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=759cdbb29dbe8fc80ba5c1f113a015cafe9eb69c > > I can try suggesting that to the community for GCC 12 (and maybe backport > that). > Are you interested? I think it may be useful to simply allow -fuse-ld=word (`word` cannot include a separator). If that may be troublesome, having -fuse-ld=mold in GCC 12 is still nice. --ld-path is occasionally useful, but I can accept that GCC declines it. > Note the linker is very interesting, but it lacks LTO support.. Right...
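To make the "`word` cannot include a separator" rule concrete, here is a minimal sketch of such a check. `is_plain_linker_word` is a hypothetical name, not GCC code, and only '/' is treated as a separator, which is an assumption for POSIX hosts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check: accept -fuse-ld=<word> only when <word> is a plain
   name, i.e. non-empty and free of directory separators. */
int is_plain_linker_word(const char *word) {
  return word != NULL && word[0] != '\0' && strchr(word, '/') == NULL;
}

/* Usage: plain names pass, paths (the --ld-path use case) do not. */
void linker_word_demo(void) {
  assert(is_plain_linker_word("mold"));
  assert(is_plain_linker_word("lld"));
  assert(!is_plain_linker_word("/usr/local/bin/ld.mold"));
  assert(!is_plain_linker_word(""));
}
```

A driver accepting such a word could then search its usual program path for `ld.<word>`, which keeps full paths out of -fuse-ld= and leaves them to an option like --ld-path.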
[Bug driver/93645] Support Clang 12 --ld-path=
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645 --- Comment #15 from Fangrui Song --- (In reply to Martin Liška from comment #14) > > > > I think it may be useful to simply allow -fuse-ld=word (`word` cannot > > include a separator). > > Sure, but Jakub had some concerns: > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573833.html I do not see an objection to -fuse-ld=word. For --ld-path, -- is definitely rare, but not non-existent. In GCC, there are {-,--}specs and --sysroot. In Clang, there are --cuda-path, --ptxas-path, --hip-path, --classpath, etc. -fuse-ld= users mostly care about whether another linker can build their programs, not whether the option can bootstrap GCC. I actually think ld.lld is quite sufficient for bootstrapping GCC, but if there are edge-case extensions which are not supported, ld.lld developers may not want to burden the project with more obscure options... > > > > If that may be troublesome, having -fuse-ld=mold in GCC 12 is still nice. > > > > I've just done that: > https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587426.html Thanks
[Bug target/100896] --enable-initfini-array should be enabled for cross compiler to Linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100896 Fangrui Song changed: What|Removed |Added CC||i at maskray dot me --- Comment #4 from Fangrui Song --- In gcc/acinclude.m4:285, if cross-compiling, the branch (`if test "x${build}" = "x${target}" && test "x${build}" = "x${host}"; then `) will not be taken. else case "${target}" in aarch64*-linux-gnu*) # AArch64 postdates glibc support for .init_array/.fini_array, # so we don't need the preprocessor test above. gcc_cv_initfini_array=yes ;; *) AC_MSG_CHECKING(cross compile... guessing) gcc_cv_initfini_array=no ;; esac fi]) On non-aarch64 targets, `gcc_cv_initfini_array=no` will run, so `HAVE_INITFINI_ARRAY_SUPPORT` will be 0. compilers/powerpc64le-linux-gnu/gcc/gcc/auto-host.h 1578:#define HAVE_INITFINI_ARRAY_SUPPORT 0 compilers/powerpc64le-linux-gnu/gcc/gcc/config.status 1257:D["HAVE_INITFINI_ARRAY_SUPPORT"]=" 0" compilers/powerpc64le-linux-gnu/gcc/gcc/config.log 6900:| #define HAVE_INITFINI_ARRAY_SUPPORT 0 7169:| #define HAVE_INITFINI_ARRAY_SUPPORT 0 7484:| #define HAVE_INITFINI_ARRAY_SUPPORT 0 8557:#define HAVE_INITFINI_ARRAY_SUPPORT 0 The built GCC will use the legacy .ctors: % many=/tmp/glibc-many % cat a.c __attribute__ ((constructor)) static int foo (void) { return 42; } % /tmp/glibc-many/install/compilers/powerpc64le-linux-gnu/bin/powerpc64le-glibc-linux-gnu-gcc -c a.c && readelf -WS a.o | egrep 'ctors|init_array' [ 4] .ctors PROGBITS 70 08 00 WA 0 0 8 [ 5] .rela.ctors RELA 000218 18 18 I 10 4 8 --- Noticed the problem when using a GCC built by scripts/build-many-glibcs.py (cd ~/Dev/glibc): scripts/build-many-glibcs.py /tmp/glibc-many checkout --shallow scripts/build-many-glibcs.py /tmp/glibc-many host-libraries scripts/build-many-glibcs.py /tmp/glibc-many compilers powerpc64le-linux-gnu --keep all
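For context, the choice between .ctors and .init_array only changes which section holds the pointer to the constructor; the source-level behavior is expected to be the same. A minimal sketch (the names `foo_ran` and `constructor_has_run` are hypothetical, and the function returns void here rather than int as in a.c):

```c
#include <assert.h>

static int foo_ran; /* set by the constructor before main starts */

/* The compiler records a pointer to foo in .init_array, or in the legacy
   .ctors when HAVE_INITFINI_ARRAY_SUPPORT is 0. */
__attribute__((constructor))
static void foo(void) {
  foo_ran = 1;
}

int constructor_has_run(void) {
  return foo_ran;
}

/* Usage: only meaningful when called from main or later. */
void constructor_demo(void) {
  assert(constructor_has_run() == 1);
}
```

Compiling this with `-c` and running `readelf -WS` on the object, as in the comment above, shows which of the two sections a given GCC build picked.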
[Bug target/100896] --enable-initfini-array should be enabled for cross compiler to Linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100896 --- Comment #5 from Fangrui Song --- Ah, ok, my /tmp/glibc-many/src/gcc is at releases/gcc-11 while the fix is for 12.0? Anyway, you may want to clean up gcc/acinclude.m4
[Bug driver/100937] configure: Add --enable-default-semantic-interposition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100937 --- Comment #11 from Fangrui Song --- To enable interposition on Mach-O, one needs a non-default configuration like: ld -interposable, DYLD_FORCE_FLAT_NAMESPACE or __attribute__((section("__DATA,__interpose"))). On PE/COFF, such interposition just doesn't exist. Having an option for -fno-semantic-interposition will actually improve portability. (The -fno-semantic-interposition thing is probably the biggest performance gap between gcc -fpic and clang -fpic.) As I said previously, -fvisibility=protected cannot be used because protected visibility is very broken in the GCC/GNU ld system and there is no signal it will be fixed anytime soon: https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected#summary
[Bug driver/103398] New: configure: Enable --enable-default-pie by default for Linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103398 Bug ID: 103398 Summary: configure: Enable --enable-default-pie by default for Linux Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: driver Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- Many Linux distros configure GCC with --enable-default-pie (at least Arch/Debian/Fedora/Gentoo/Ubuntu). I think it makes sense to default to --enable-default-pie for Linux.
[Bug driver/103398] configure: Enable --enable-default-pie by default for Linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103398 --- Comment #2 from Fangrui Song --- I want to switch the default because: * It seems to me that every Linux distro uses an --enable-default-pie GCC. I wrote "many", but it is likely "most" at this point (2021). * When a user builds GCC on Linux, the generated GCC does not default to PIE. This almost certainly does not match the behavior of their host GCC. On the libc-alpha mailing list, I have seen contributors waste time because they did not notice that a GCC built by scripts/build-many-glibcs.py uses the implicit --disable-default-pie, which behaves differently from the host GCC or a cross compiler provided by system packages.
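A quick way to see which default a given GCC was configured with is to look at the predefined __PIE__ macro when compiling without any -fpie/-fno-pie option. A minimal sketch, with `pie_level` as a hypothetical helper name:

```c
#include <assert.h>

/* GCC and Clang predefine __PIE__ as 1 for -fpie and 2 for -fPIE; the
   macro is absent when PIE code generation is off. */
int pie_level(void) {
#ifdef __PIE__
  return __PIE__;
#else
  return 0;
#endif
}

/* Usage: 0 means this translation unit was not compiled as PIE. */
void pie_level_demo(void) {
  int level = pie_level();
  assert(level == 0 || level == 1 || level == 2);
}
```

Building the same file with the host GCC and with a self-built GCC, without extra flags, and comparing the two results makes the mismatch described above visible.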
[Bug driver/93645] Support Clang 12 --ld-path=
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645 --- Comment #11 from Fangrui Song --- (In reply to Martin Liška from comment #10) > I replied here: > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573823.html There are people wanting to use mold https://www.reddit.com/r/rust/comments/rhcnzt/mold_a_modern_linker_10_release/ "clang does support it but gcc: --ld-path patch has been declined by GCC maintainers, instead they advise to use a workaround: create directory , then ln -s /ld, and then pass -B (-B tells GCC to look for ld in specified location)." :(
[Bug c++/102168] New: -Wnon-virtual-dtor shouldn't fire for protected dtor in a class with a friend declaration
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102168 Bug ID: 102168 Summary: -Wnon-virtual-dtor shouldn't fire for protected dtor in a class with a friend declaration Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- class base; class b { public: void del(base *x); }; class base { friend b; public: virtual void anchor(); protected: virtual // why is this needed? ~base() = default; }; class derived final : public base { public: ~derived() {} }; void b::del(base *x) { delete x; } % g++ -c -Wnon-virtual-dtor a.cc a.cc:8:7: warning: ‘class base’ has virtual functions and accessible non-virtual destructor [-Wnon-virtual-dtor] 8 | class base { | ^~~~ a.cc:17:7: warning: base class ‘class base’ has accessible non-virtual destructor [-Wnon-virtual-dtor] 17 | class derived final : public base { | ^~~ The diagnostic fires because of the friend declaration: technically the friend can invoke the dtor. However, this seems a bit dumb (https://reviews.llvm.org/rG4852c770fe87): it just checks for the existence of a friend, not whether the dtor is actually used. Checking whether the dtor is actually needed requires dataflow analysis (like frontend devirtualization), which is apparently too heavy and may not fit into a compiler diagnostic. In addition, if the friend class ever uses the dtor, it'd trigger -Wdelete-non-virtual-dtor. Now to suppress the diagnostic, we have to add a `virtual`, wasting 2 entries in the vtable and emitting unneeded D0/D2.
[Bug c/102502] New: C11: _Static_assert disallows const int operand in -O0 while allows it in higher -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102502 Bug ID: 102502 Summary: C11: _Static_assert disallows const int operand in -O0 while allows it in higher -O Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: i at maskray dot me Target Milestone: --- Under some circumstances, const size_t allocation_size = 32768; _Static_assert (allocation_size >= sizeof (struct dirent64), "allocation_size < sizeof (struct dirent64)"); -O0 and non -O0 have different behaviors whether the `const int` operand can be used in a constant expression (-O0: `error: expression in static assertion is not constant`). This is different from a bug "fixed for GCC 8 by r8-4755". git clone https://sourceware.org/git/glibc.git cd glibc mkdir -p out/gcc; cd out/gcc ../../configure --prefix=/tmp/glibc/gcc --disable-werror make -j 20 # you can SIGINT after some needed files used below are generated Comment out some lines to allow -O0 compiles: --- i/include/libc-symbols.h +++ w/include/libc-symbols.h @@ -71,9 +71,9 @@ #define _LIBC 1 /* Some files must be compiled with optimization on. */ -#if !defined __ASSEMBLER__ && !defined __OPTIMIZE__ -# error "glibc cannot be compiled without optimization" -#endif +//#if !defined __ASSEMBLER__ && !defined __OPTIMIZE__ +//# error "glibc cannot be compiled without optimization" +//#endif /* -ffast-math cannot be applied to the C library, as it alters the ABI. Some test components that use -ffast-math are currently not part of # My source dir is at $HOME/Dev/glibc . You may need to adjust. 
a=(../sysdeps/unix/sysv/linux/dl-opendir.c -std=gnu11 -fgnu89-inline -g -Wall -Wwrite-strings -Wundef -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common -Wstrict-prototypes -Wold-style-definition -fmath-errno -fPIC -fno-stack-protector -DSTACK_PROTECTOR_LEVEL=0 -mno-mmx -ftls-model=initial-exec -I../include -I$HOME/Dev/glibc/out/gcc/elf -I$HOME/Dev/glibc/out/gcc -I../sysdeps/unix/sysv/linux/x86_64/64 -I../sysdeps/unix/sysv/linux/x86_64 -I../sysdeps/unix/sysv/linux/x86/include -I../sysdeps/unix/sysv/linux/x86 -I../sysdeps/x86/nptl -I../sysdeps/unix/sysv/linux/wordsize-64 -I../sysdeps/x86_64/nptl -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux -I../sysdeps/nptl -I../sysdeps/pthread -I../sysdeps/gnu -I../sysdeps/unix/inet -I../sysdeps/unix/sysv -I../sysdeps/unix/x86_64 -I../sysdeps/unix -I../sysdeps/posix -I../sysdeps/x86_64/64 -I../sysdeps/x86_64/fpu/multiarch -I../sysdeps/x86_64/fpu -I../sysdeps/x86/fpu -I../sysdeps/x86_64/multiarch -I../sysdeps/x86_64 -I../sysdeps/x86/include -I../sysdeps/x86 -I../sysdeps/ieee754/float128 -I../sysdeps/ieee754/ldbl-96/include -I../sysdeps/ieee754/ldbl-96 -I../sysdeps/ieee754/dbl-64 -I../sysdeps/ieee754/flt-32 -I../sysdeps/wordsize-64 -I../sysdeps/ieee754 -I../sysdeps/generic -I.. -I../libio -I. 
-D_LIBC_REENTRANT -include $HOME/Dev/glibc/out/gcc/libc-modules.h -include ../include/libc-symbols.h -DPIC -DSHARED -DTOP_NAMESPACE=glibc -fsyntax-only) cd $HOME/Dev/glibc/elf % gcc-11 $=a -O2 # no diagnostic % gcc-11 $=a -O1 # no diagnostic % gcc-11 $=a -O0 In file included from ../include/features.h:488, from ../posix/sys/types.h:25, from ../include/sys/types.h:1, from ../sysdeps/unix/sysv/linux/dirstream.h:21, from ../include/dirent.h:3, from ../sysdeps/unix/sysv/linux/opendir.c:18, from ../sysdeps/unix/sysv/linux/dl-opendir.c:1: ../sysdeps/unix/sysv/linux/opendir.c: In function ‘__alloc_dir’: ../sysdeps/unix/sysv/linux/opendir.c:107:35: error: expression in static assertion is not constant 107 | _Static_assert (allocation_size >= sizeof (struct dirent64), | ^~~ ../include/sys/cdefs.h:7:59: note: in definition of macro ‘_Static_assert’ 7 | # define _Static_assert(expr, diagnostic) _Static_assert (expr, diagnostic) | ^~~~ gcc-8, gcc-9, and gcc-10 from Debian testing have the same behavior.
[Bug c/102502] C11: _Static_assert disallows const int operand in -O0 while allows it in higher -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102502 --- Comment #3 from Fangrui Song --- OK, Andrew asked me to file it :) I just wanted to fix glibc and run away from the GCC inconsistency. I know that https://www.iso-9899.info/n1570.html#6.6 p10 says "An implementation may accept other forms of constant expressions." Accepting `const int` in C mode is an extension, but it seems odd to be inconsistent (-O0 and -O2 -Wpedantic reject it while -O2 allows it). % cat reduce.i const int __alloc_dir_allocation_size = 8; void __alloc_dir() { _Static_assert(__alloc_dir_allocation_size, ""); } % gcc reduce.i -c -std=c11 reduce.i: In function ‘__alloc_dir’: reduce.i:2:37: error: expression in static assertion is not constant 2 | void __alloc_dir() { _Static_assert(__alloc_dir_allocation_size, ""); } | ^~~ % gcc reduce.i -c -std=c11 -O1 % gcc reduce.i -c -std=c11 -O2 % gcc reduce.i -c -std=c11 -O2 -Wpedantic reduce.i: In function ‘__alloc_dir’: reduce.i:2:37: warning: expression in static assertion is not an integer constant expression [-Wpedantic] 2 | void __alloc_dir() { _Static_assert(__alloc_dir_allocation_size, ""); } | ^~~ Clang just rejects it in all optimization levels. % clang reduce.i -c -std=c11 -O0 reduce.i:2:37: error: static_assert expression is not an integral constant expression void __alloc_dir() { _Static_assert(__alloc_dir_allocation_size, ""); } ^~~ 1 error generated. % clang reduce.i -c -std=c11 -O1 reduce.i:2:37: error: static_assert expression is not an integral constant expression void __alloc_dir() { _Static_assert(__alloc_dir_allocation_size, ""); } ^~~ 1 error generated.
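One way to sidestep the inconsistency in user code is to make the operand an integer constant expression in the strict C11 sense, for example with an enumeration constant. This is a minimal sketch under that assumption; `struct record` is a hypothetical stand-in for struct dirent64 and `checked_allocation_size` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dirent64; only its size matters here. */
struct record { char name[256]; long offset; };

size_t checked_allocation_size(void) {
  /* An enumeration constant is an integer constant expression in C11,
     unlike a const-qualified object, so the assertion below does not
     depend on an implementation extension or on the -O level. */
  enum { allocation_size = 32768 };
  _Static_assert(allocation_size >= sizeof(struct record),
                 "allocation_size < sizeof (struct record)");
  return allocation_size;
}

/* Usage: the value is still available at run time. */
void checked_allocation_size_demo(void) {
  assert(checked_allocation_size() == 32768);
}
```

A `#define` would work equally well; the enum form keeps the constant scoped to the function, as the original `const` object was.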
[Bug libgcc/99759] morestack.S should support .init_array.0 besides .ctors.65535
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99759 Fangrui Song changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #4 from Fangrui Song --- Fixed by f49e3d28be44179f07b8a06159139ce77096dda7 ("libgcc: use .init_stack for constructors if available"). Thanks, Ian!