[Bug target/94391] gcc refers to absolute symbols with R_X86_64_PC32 relocation

2020-03-29 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94391

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #5 from Fangrui Song  ---
This bug exposes several problems:

* GNU ld does not reject a PC-relative relocation referencing a SHN_ABS symbol
* GCC should not produce R_X86_64_PC32 referencing an external symbol in -fpie
mode. 

% gcc -fuse-ld=lld -nostdlib -fpie -pie a.c
% objdump -dr a.o
...
 :
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # b
                        7: R_X86_64_PC32        _binary_a_c_size-0x4
   b:   5d                      pop    %rbp
   c:   c3                      retq

% gcc -fuse-ld=bfd -nostdlib -fpie -pie a.c b.o -o a
/usr/bin/ld.bfd: warning: cannot find entry symbol _start; defaulting to
1000
% objdump -dr a
...
1000 :
1000:   55                      push   %rbp
1001:   48 89 e5                mov    %rsp,%rbp
1004:   48 8d 05 39 f0 ff ff    lea    -0xfc7(%rip),%rax        # 44 <_binary_a_c_size>
100b:   5d                      pop    %rbp
100c:   c3                      retq

It is incorrect to reference a non-preemptible symbol with a PC relative
relocation in a -pie link. GNU ld allows it but the code can be incorrect at
runtime.

lea -0xfc7(%rip),%rax  # loads 0x44 into %rax only if the load base is 0. With
ASLR (-pie), the load base is generally not 0.

lld correctly rejects the relocation.

To fix this, see the write-up I posted last year:
https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html
We should change the configure-time HAVE_LD_PIE_COPYRELOC to an option,
probably -f(no-)direct-access-extern.

In clang, HAVE_LD_PIE_COPYRELOC is a compile-time option:
-mpie-copy-relocations. But I think we should improve the option name. At the
very least, we can also let -fno-pic code reference an external symbol with GOT
to avoid copy relocations. -f(no-)direct-access-extern may be a candidate.

[Bug target/94391] gcc refers to absolute symbols with R_X86_64_PC32 relocation

2020-03-29 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94391

--- Comment #6 from Fangrui Song  ---
> It is incorrect to reference a non-preemptible symbol with a PC relative 
> relocation in a -pie link. GNU ld allows it but the code can be incorrect at 
> runtime.

Correction: It is incorrect to reference a non-preemptible SHN_ABS symbol with
a PC-relative relocation in a PIC (-shared or -pie) link. The absolute value is
not representable because of ASLR (the load base is not fixed at link time).

[Bug target/94391] gcc refers to absolute symbols with R_X86_64_PC32 relocation

2020-03-29 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94391

--- Comment #10 from Fangrui Song  ---
> extern unsigned long _binary_a_c_size;
> unsigned long foo() { return _binary_a_c_size; }

This is incorrect. The code will treat the value of _binary_a_c_size as an
address (load base + size) and dereference that address:
mov    -0xfc3(%rip),%rax        # 44 <_binary_a_c_size>

> NO LLD is not implemented the ABI as PIE COPYRELOC is required by the ABI 
> these days.

My objdump -d output in Comment #5 demonstrates that GNU ld linked code will be
incorrect at runtime.
It can be argued that either the user code or GCC does the wrong thing, but a
linker is not responsible for the mistake.
(I have argued lld does the right thing by erroring at link time.)

The compiler can ask the assembler to produce an indirect (GOT) reference.
The code (`unsigned long foo() { return (unsigned long)&_binary_a_c_size; }`)
will work perfectly.

> Also it is wrong for a person to assume a normal C variable could be SHN_ABS; 
> that is the bug here.
> It is a bug in the user code.
> I showed up to fix it by using an top level inline-asm.

-fno-pic and -fpic work fine. -fpie before commit
77ad54d911dd7cb88caf697ac213929f6132fdcf worked fine.



commit 77ad54d911dd7cb88caf697ac213929f6132fdcf ("x86-64: Optimize access to
globals in PIE with copy reloc")
is responsible for the -fpie change.
In 2015, H.J. invented R_X86_64_{REX_,}GOTPCRELX. The linker relaxation is a
perfect solution.
We can retire HAVE_LD_PIE_COPYRELOC now.


// The code will still be faulty but we can argue that it is a user error.
__attribute__((visibility("hidden"))) extern unsigned long _binary_a_c_size;
unsigned long foo() { return _binary_a_c_size; }


The relaxed R_X86_64_{REX_,}GOTPCRELX code sequence will be a bit longer than
the R_X86_64_PC32 one.
The difference is small enough and should not matter for practical use cases.
For those who care about the tiny regression, we can invent an option
-fdirect-access-extern (clang currently calls it -mpie-copy-relocations but we
can design a better name).
It is more useful on non-x86 architectures for a mostly statically linked
program.

extern int var; int foo(void) { return var; }

// clang -target aarch64 -fPIE -O3
        adrp    x8, :got:var
        ldr     x8, [x8, :got_lo12:var]
        ldr     w0, [x8]
        ret
// clang -target aarch64 -fPIE -O3 -mpie-copy-relocations
        adrp    x8, var
        ldr     w0, [x8, :lo12:var]
        ret

// x86-64
// clang -O3 -fPIE a.c -Wa,--mrelax-relocations=yes
   0:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 7
                        3: R_X86_64_REX_GOTPCRELX       var-0x4
   7:   8b 00                   mov    (%rax),%eax
   9:   c3                      retq
// clang -O3 -fPIE a.c -mpie-copy-relocations
   0:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 6
                        2: R_X86_64_PC32        var-0x4
   6:   c3                      retq

[Bug target/94391] gcc refers to absolute symbols with R_X86_64_PC32 relocation

2020-03-30 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94391

--- Comment #23 from Fangrui Song  ---
(In reply to Andrew Pinski from comment #18)
> (In reply to Yuxuan Shui from comment #17)
> > Sorry, I am here to report a bug, not to find a workaround for my use case.
> 
> I gave you the correct usage for your use case.  If you don't like it is not
> my fault.

A wontfix/invalid does not seem a proper resolution to the bug(s).
We need a solution, instead of a workaround (SHN_ABS _binary_*_size can be
changed to _binary_*_end minus _binary_*_start).

Let me repeat. The code has worked fine for a long time.

1. -fno-pie code can only be linked with -no-pie. A PC32 relocation can be
resolved to a SHN_ABS definition.
2. -fpie code can be linked with either -no-pie or -pie.
3. -fpic code can be linked with -no-pie, -pie or -shared. GCC produces a GOT
relocation.
  The linker will fill up the GOT entry at link time. It is a constant at
runtime.

1 and 3 always work. Case 2 breaks with -fpie -pie: it is incorrect to
reference a non-preemptible SHN_ABS symbol with a PC-relative relocation in a
PIC (-shared or -pie) link (missed GNU ld diagnostic:
https://sourceware.org/bugzilla/show_bug.cgi?id=25749).

A GOT relocation was produced until commit
77ad54d911dd7cb88caf697ac213929f6132fdcf
("x86-64: Optimize access to globals in PIE with copy reloc").

I have proposed my solution in Comment 10: revert the patch.
It has very little value after H.J. invented GOTPCRELX in 2015.
As compensation, we can invent a pair of new -f options,
-f(no-)direct-access-extern-object.

-fno-pie defaults to -fdirect-access-extern-object. -fpie defaults to
-fno-direct-access-extern-object.

-fno-pie users who really want to get rid of copy relocations can enable
-fno-direct-access-extern-object.
  Note: a copy relocation is needed if the definition turns out to be provided
by a shared object.

-fpie users who really care about GOT slowdown can enable
-fdirect-access-extern-object.
  This is more relevant on non-x86 due to the lack of linker relaxation
(R_X86_64_{REX_,}GOTPCRELX)

[Bug preprocessor/77488] Proposal for __FILENAME_ONLY__

2020-04-25 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77488

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #8 from Fangrui Song  ---
Clang since version 9 supports `__FILE_NAME__` (basename) as an extension
https://reviews.llvm.org/D61756

I don't know whether it has been proposed on WG14 or WG21 mailing lists, though
(seems not).

[Bug target/95095] New: Feature request: support -fno-unique-section-names

2020-05-12 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

Bug ID: 95095
   Summary: Feature request: support -fno-unique-section-names
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

-ffunction-sections produces sections .text.foo , .text.bar , etc, which can
take a significant amount of string table space.

In clang, -fno-unique-section-names emits multiple ".text" sections which can
share the section name. Multiple sections with the same name require the new
GNU as feature https://sourceware.org/bugzilla/show_bug.cgi?id=25380 (binutils
2.35).

For .text.exit.* .text.unlikely.* .text.hot.* .text.startup.* , the preferred
sections are .text.exit. .text.unlikely. .text.hot. .text.startup. The trailing
dots can avoid a linker problem described in https://reviews.llvm.org/D79600

- pasted below for your convenience

GNU ld's internal linker script uses
(https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=add44f8d5c5c05e08b11e033127a744d61c26aee)

.text   :
{
  *(.text.unlikely .text.*_unlikely .text.unlikely.*)
  *(.text.exit .text.exit.*)
  *(.text.startup .text.startup.*)
  *(.text.hot .text.hot.*)
  *(SORT(.text.sorted.*))
  *(.text .stub .text.* .gnu.linkonce.t.*)
  /* .gnu.warning sections are handled specially by elf.em.  */
  *(.gnu.warning)
}
Because *(.text.exit .text.exit.*) is ordered before *(.text .text.*), in a
-ffunction-sections build, the C library function exit will be placed before
other functions.
gold's -z keep-text-section-prefix has the same problem.

In lld, -z keep-text-section-prefix recognizes
.text.{exit,hot,startup,unlikely,unknown}.*, but not
.text.{exit,hot,startup,unlikely,unknown}, to avoid the strange placement
problem.

[Bug debug/95096] New: Feature request: add -fsplit-dwarf

2020-05-12 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95096

Bug ID: 95096
   Summary: Feature request: add -fsplit-dwarf
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

-gsplit-dwarf has an undesired property: it sets the debug info level to 2.
When plugged into a build system, this can enable debug info unnecessarily
(when the user does not specify -g or specifies -g0).

-fsplit-dwarf can enable .dwo, but does not enable debug info by itself.

Its interaction with -g1 may need some thought: whether line tables in .dwo
will be beneficial. As a start, we can add the option first, which should be
simple (for a beginner like me:/ )

[Bug debug/95096] Feature request: add -fsplit-dwarf

2020-05-12 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95096

--- Comment #1 from Fangrui Song  ---
Created https://sourceware.org/pipermail/gcc-patches/2020-May/545638.html

[Bug target/95095] Feature request: support -fno-unique-section-names

2020-05-24 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

--- Comment #1 from Fangrui Song  ---
I just learned that `int main() {}` compiles to .text.startup in -O2 or -Os

It seems that the prefix .text.startup. (with the trailing dot) is better, so
that a C function named `startup` is not moved accidentally (`startup.` is not
a valid C identifier).

[Bug debug/95482] New: Feature request: add -gsplit-dwarf=single

2020-06-02 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95482

Bug ID: 95482
   Summary: Feature request: add -gsplit-dwarf=single
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

DWARF v5 Appendix F. says

> The sections that do not require relocation, however, can be written to the 
> relocatable object (.o) file but ignored by the linker, or they can be 
> written to a separate DWARF object (.dwo) file that need not be accessed by 
> the linker.

GCC/clang -gsplit-dwarf write a separate DWARF object (.dwo)

clang in addition supports -gsplit-dwarf=single
(https://reviews.llvm.org/D52296 ) to write the sections (with the SHF_EXCLUDE
flag) in the .o file.
Linkers ignore SHF_EXCLUDE sections in non -r mode.

Note, SHF_EXCLUDE (0x80000000) is in the range of processor-specific bits and
clashes with several processors' (obsoleted?) flags (see
https://sourceware.org/pipermail/binutils/2020-April/110691.html )

[Bug gcov-profile/96092] New: Should --coverage respect -ffile-prefix-map?

2020-07-06 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96092

Bug ID: 96092
   Summary: Should --coverage respect -ffile-prefix-map?
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

% gcc-10 -ffile-prefix-map=/tmp/c=/src --coverage -c -g /tmp/c/a.c

# -ffile-prefix-map implies -fdebug-prefix-map
% llvm-dwarfdump -debug-info a.o | grep /src
  DW_AT_name("/src/a.c")
  DW_AT_comp_dir("/src")
DW_AT_decl_file ("/src/a.c")

# --coverage is not affected
% r2 -qc 'pxw `?v $s`' a.gcno
0x  0x67636e6f 0x42303065 0x27b4c272 0x0002  oncge00Br..'
0x0010  0x706d742f 0x632f 0x0001 0x0100  /tmp/c..
0x0020  0x000f 0x067072eb 0x40058857 0xdb5de9e8  .rp.W..@..].
0x0030  0x0002 0x6e69616d 0x 0x  main
0x0040  0x0003 0x706d742f 0x612f632f 0x632e  /tmp/c/a.c..
0x0050  0x0001 0x0005 0x0001 0x000c  
0x0060  0x0141 0x0001 0x0004 0x0143  ..A...C.
0x0070  0x0003 0x 0x0002 0x0004  
0x0080  0x0143 0x0003 0x0002 0x0003  ..C.
0x0090  0x0005 0x0143 0x0003 0x0003  ..C.
0x00a0  0x0001 0x0001 0x0145 0x0009  ..E.
0x00b0  0x0002 0x 0x0003 0x706d742f  /tmp
0x00c0  0x612f632f 0x632e 0x0001 0x  /c/a.c..
0x00d0  0x


I created this issue because I saw a clang-side proposal
https://reviews.llvm.org/D83154 (add -fcoverage-prefix-map) today.

[Bug gcov-profile/96092] Should --coverage respect -ffile-prefix-map?

2020-07-11 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96092

--- Comment #3 from Fangrui Song  ---
(In reply to Martin Liška from comment #2)
> Apparently we've got a patch in queue that does something similar:
> 
> +fprofile-prefix-path=
> +Common Joined RejectNegative Var(profile_prefix_path)
> +remove prefix from absolute path before manging name for -fprofile-generate= and -fprofile-use=.

Can we generalize the option to -fprofile-prefix-map= and let it be part of
-ffile-prefix-map? We can let clang side add -fprofile-prefix-map= as well
(https://reviews.llvm.org/D83154#2146085 )
clang may not support -fprofile-prefix-path= as it can be emulated by
-fprofile-prefix-map=

(IIUC, in GCC, -fprofile-generate uses gcov so either -fprofile-prefix-map= or
-fcoverage-prefix-map= will be an ok name. In clang, -fprofile-generate is an
instrumentation different from --coverage (gcov).)

[Bug driver/93645] Support Clang 12 --ld-path=

2020-07-24 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645

--- Comment #4 from Fangrui Song  ---
https://sourceware.org/pipermail/gcc-patches/2020-July/550659.html [PATCH v3]
Add --ld-path= to specify an arbitrary executable as the linker


I changed the title to --ld-path because -fuse-ld=/absolute/path/to/ld is not a
good design. -fuse-ld= can mean the linker flavor (there can be option dispatch
on this option) & --ld-path can specify the path overriding -fuse-ld='s default
choice.

-f* options are usually about code generation or language features. --ld-path
does not belong to the category so -f is not very appropriate.

Clang 12 will have --ld-path.

[Bug middle-end/192] String literals don't obey -fdata-sections

2020-09-15 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #19 from Fangrui Song  ---
(In reply to Jakub Jelinek from comment #14)
> This doesn't really look like a good idea to me.  Instead, perhaps ld's
> --gc-sections or new special option should just remove unused string
> literals from mergeable sections.
> With your patch, I bet you lose e.g. all tail merging.  Consider:
> const char *used1 () { return "foo bar baz blah blah"; }
> in one TU and
> const char *used2 () { return "bar baz blah blah"; }
> in another.  The linker necessarily knows which strings (or other data) in
> mergeable sections are used and which are unused.

I second Jakub's idea that the linker should perform the constant merge (which
is implemented in LLD): the cost of a section header (sizeof(Elf64_Shdr)=64) +
a section name (".rodata.xxx.str1.1") is quite large.

Created a GNU ld (and gold) feature request:
https://sourceware.org/bugzilla/show_bug.cgi?id=26622

[Bug gcov-profile/97062] New: [gcov] Don't repeat display of inline functions in headers

2020-09-15 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97062

Bug ID: 97062
   Summary: [gcov] Don't repeat display of inline functions in
headers
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

This is a minor display issue.

>a.cc cat<b.cc cat<a.h cat<

[Bug gcov-profile/91601] gcov: ICE in handle_cycle, at gcov.c:699 happen which get code coverage with lcov.

2020-09-15 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91601

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #17 from Fangrui Song  ---
The algorithm is Donald B. Johnson's "Finding all the elementary circuits of a
directed graph" (1975). (Hawick and James's version just implements the same
algorithm with a different representation of graphs.)

I am wondering why we enumerate every elementary cycle, find the minimum edge,
reduce edge weights, and repeat the process.

What do we lose if we don't use the costly algorithm? (The time complexity of
Johnson's algorithm is O((n+e)(c+1)). However, many implementations (Boost and
gcov.c) do not use a hash set for the blocked list, and thus I suspect the
actual complexity is higher.) Do we have other low-cost approaches? (e.g.
repeatedly finding strongly connected components and reducing)

[Bug gcov-profile/97065] New: Support -fprofile-update=set (boolean counters)

2020-09-16 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97065

Bug ID: 97065
   Summary: Support -fprofile-update=set (boolean counters)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

I understand that defaulting to -fprofile-update=prefer-atomic in GCC 7 and
using atomic counters when -pthread is specified was done for a very good
reason: imprecise line execution counts can be very confusing.

However, atomic counters can lead to drastic performance degradation when
contention is high (e.g. bug 80952).

Sometimes users just need to know whether a statement is executed or not. For
example lcov does not really need to know the number. A boolean mode
-fprofile-update=set may be useful. 'set' is the name used by Go -covermode=set

[Bug gcov-profile/85351] [GCOV] Wrong coverage with exit() executed in a if statement within a called function

2020-09-16 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85351

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #5 from Fangrui Song  ---
I am a bit curious how GCC instruments such functions which may alter control
flows

* exit/execve/execl/etc
* fork
* functions which may throw or call any above functions

If you force a basic block split after such calls, you get correct counts, but
you pay the cost of one more basic block and two more arcs. With
-fprofile-arcs you pay the instrumentation cost of one arc (after taking into
account the spanning tree optimization based on Kirchhoff's circuit law).
If you assume every external function call may alter control flow, you pay a
rather large overhead for something you probably care little about (since I
know some of the underlying mechanism, I don't trust line counts after special
functions).

[Bug driver/93645] Support Clang 12 --ld-path=

2020-09-17 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645

--- Comment #5 from Fangrui Song  ---
Ping

[Bug c++/92413] New: [temp.explicit] Explicit template instantiations should not define member functions that are not defined at the point of instantiation

2019-11-07 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92413

Bug ID: 92413
   Summary: [temp.explicit] Explicit template instantiations
should not define member functions that are not
defined at the point of instantiation
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

https://wg21.cmeerw.net/cwg/issue546

Change 17.8.2 [temp.explicit] paragraph 8 as follows:

An explicit instantiation definition that names a class template specialization
explicitly instantiates the class template specialization and is only an
explicit instantiation definition of members that have been defined
(previously: "whose definition is visible") at the point of instantiation.


template <typename T> struct C {void foo();};
template struct C<int>;
template <typename T> void C<T>::foo() {}


GCC<4.9 does not define C<int>::foo(), while GCC>=4.9 defines C<int>::foo().

I am not sure whether this example is non-conforming, but -Wall -Wextra
-pedantic gives no diagnostic. (clang 3.0~trunk does not define C<int>::foo().
You may read the discussions at https://bugs.llvm.org/show_bug.cgi?id=43937)

[Bug c/93194] New: -fpatchable-function-entries : __patchable_function_entries has wrong sh_flags and sh_addralign

2020-01-07 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93194

Bug ID: 93194
   Summary: -fpatchable-function-entries :
__patchable_function_entries has wrong sh_flags and
sh_addralign
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

% cat a.c
void f(){}
% gcc -fpatchable-function-entry=3 -c a.c
% readelf -S a.o
...
  [Nr] Name                              Type     Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                                   NULL     0000000000000000 000000 000000 00      0   0  0
  [ 1] .text                             PROGBITS 0000000000000000 000040 00000a 00  AX  0   0  1
  [ 2] .data                             PROGBITS 0000000000000000 00004a 000000 00  WA  0   0  1
  [ 3] .bss                              NOBITS   0000000000000000 00004a 000000 00  WA  0   0  1
  [ 4] __patchable_function_entries      PROGBITS 0000000000000000 00004a 000008 00   A  0   0  1
  [ 5] .rela__patchable_function_entries RELA     0000000000000000 0001a0 000018 18   I 10   4  8

sh_addralign of __patchable_function_entries should be 8 on ELF64 platforms, 4
on ELF32 platforms, instead of 1.

__patchable_function_entries should have the SHF_WRITE flag. A
__patchable_function_entries entry is relocated by a symbolic relocation (e.g.
R_X86_64_64, R_AARCH64_ABS64, R_PPC64_ADDR64).
In -shared or -pie mode, the linker will create a dynamic relocation:
* non-preemptible (STB_LOCAL / non-STV_DEFAULT / -Bsymbolic / not-shared /
--dynamic-list excluded / etc): relative relocation (e.g. R_X86_64_RELATIVE)
* preemptible: symbolic relocation (e.g. R_X86_64_64)


(We can't emit an offset relative to the image base (.quad .Lfoo - .Lbase),
because differences across sections are generally not representable. A symbolic
relocation gives the runtime code information about the symbol names, which may
be desirable.)

[Bug middle-end/93194] -fpatchable-function-entries : __patchable_function_entries has wrong sh_flags and sh_addralign

2020-01-07 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93194

--- Comment #1 from Fangrui Song  ---
The SHF_WRITE issue has been fixed.

https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00271.html will fix sh_addralign

[Bug middle-end/93195] New: -fpatchable-function-entries : __patchable_function_entries should consider comdat groups

2020-01-07 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195

Bug ID: 93195
   Summary: -fpatchable-function-entries :
__patchable_function_entries should consider comdat
groups
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

% cat a.cc
inline void foo() {}
void bar() { foo(); }
% cat b.cc
inline void foo() {}
void bar1() { foo(); }
% g++ -fpatchable-function-entry=1 -c a.cc b.cc

Linkers don't allow a relocation to a discarded symbol (foo).

% ld.bfd a.o b.o
...
`.text._Z3foov' referenced in section `__patchable_function_entries' of b.o:
defined in discarded section `.text._Z3foov[_Z3foov]' of b.o

% gold a.o b.o
b.o(__patchable_function_entries+0x0): error: relocation refers to local symbol
"" [5], which is defined in a discarded section
  section group signature: "_Z3foov"
  prevailing definition is from a.o

% ld.lld a.o b.o
ld.lld: error: relocation refers to a discarded section: .text._Z3foov
>>> defined in b.o
>>> referenced by b.cc
>>>   b.o:(__patchable_function_entries+0x0)

[Bug middle-end/93197] New: -fpatchable-function-entries : __patchable_function_entries does not survive under --gc-sections

2020-01-08 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93197

Bug ID: 93197
   Summary: -fpatchable-function-entries :
__patchable_function_entries does not survive under
--gc-sections
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

__patchable_function_entries is not a GC root, and not referenced by a retained
section. It will thus be garbage collected.

The only solution I can think of requires fixes to both GCC and GNU ld.

* GNU ld: implement interaction between SHF_LINK_ORDER and --gc-sections
https://sourceware.org/bugzilla/show_bug.cgi?id=24526
* GCC: Create one __patchable_function_entries section for each function. For
each function `foo`,
  + If foo needs to be placed in a comdat group, place its
__patchable_function_entries section in the comdat group
  + Otherwise, set the SHF_LINK_ORDER flag of its __patchable_function_entries
section and set its sh_link to reference the section containing `foo`

[Bug target/92424] [aarch64] Broken code with -fpatchable-function-entry and BTI

2020-01-30 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #8 from Fangrui Song  ---
Where shall we place .cfi_startproc?

Clang HEAD (and clang 10)'s placement is:

foo:
.loc 1 3 0  # line number
.cfi_startproc  # CFI
  bti c
.Lpatch0:   # __patchable_function_entries label
  nop

GCC currently does not place .cfi_startproc there, which makes addr2line on
the function entry address print ??:0

For M>0, clang does not attach line number information for NOPs before the
function entry label.

[Bug target/93492] Broken code with -fpatchable-function-entry and -fcf-protection=full

2020-01-30 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93492

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #2 from Fangrui Song  ---
On Clang's side. After https://reviews.llvm.org/D73760 , clang -target x86_64
-fpatchable-function-entry=2,0 -fcf-protection=branch -S a.c -g:

.cfi_startproc is placed at the function entry, so that NOPs after the function
entry are in the CFI region
.loc directive is similar. The idea is that addr2line at the function address
should show the correct filename and line, instead of ??:0.

foo:                                    # @foo
.Lfoo$local:
.Lfunc_begin0:
        .file   1 "/tmp/c" "a.c"
        .loc    1 3 0                   # a.c:3:0
        .cfi_startproc
# %bb.0:                                # %entry
        endbr64
.Lpatch0:
        xchgw   %ax, %ax
...
        .section        __patchable_function_entries,"awo",@progbits,foo,unique,0
        .p2align        3
        .quad   .Lpatch0



The section flag "o" and the linkage "unique" (LLVM assembly extensions) are
used to fix https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93197 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195 .

I have filed GNU as feature requests
(https://sourceware.org/bugzilla/show_bug.cgi?id=25380
https://sourceware.org/bugzilla/show_bug.cgi?id=25381). GNU ld needs the
required garbage collection semantics
(https://sourceware.org/ml/binutils/2019-11/msg00266.html)

[Bug target/93492] Broken code with -fpatchable-function-entry and -fcf-protection=full

2020-01-31 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93492

--- Comment #7 from Fangrui Song  ---
> Is -fasynchronous-unwind-tables compatible with -fpatchable-function-entry?

Apparently the Linux kernel does not care about it. To make it usable in
userspace, we should place .cfi_startproc in a reasonable place.
(A more concerning issue is that __patchable_function_entries can be stripped
by -Wl,--gc-sections , as the bug I linked above describes)

 Interaction with -g1 (line table)

% clang -g -fpatchable-function-entry=2 a.c -o a  # latest clang
% addr2line -e a 0x$(nm a | awk '/ main/{print $1}')
/tmp/c/a.c:1

% gcc -g -fpatchable-function-entry=2 a.c -o a
% addr2line -e a 0x$(nm a | awk '/ main/{print $1}')
??:?

For M>0, I think it is fine to leave NOPs before the function entry uncovered
by line table information. clang -fpatchable-function-entry=2,1 layout is the
same as #c2, except for a NOP above foo:

% clang -g -fpatchable-function-entry=2,1 a.c -o a
# or gcc -g -fpatchable-function-entry=2,1 a.c -o a
% addr2line -e a $(nm a | ruby -ane 'print ($F[0].to_i(16)-1).to_s(16) if / main/')
crtstuff.c:?

[Bug middle-end/93195] -fpatchable-function-entries : __patchable_function_entries should consider comdat groups

2020-02-01 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195

--- Comment #1 from Fangrui Song  ---
This is similar to --gc-sections
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93536) but a bit different.

The only reasonable fix I can think of is to place __patchable_function_entries
in the same section group.
The ELF spec says:

> A symbol table entry with STB_LOCAL binding that is defined relative to one 
> of a group's sections, and that is contained in a symbol table section that 
> is not part of the group, must be discarded if the group members are 
> discarded. References to this symbol table entry from outside the group are 
> not allowed.

Both GCC and clang reference a .L local symbol in __patchable_function_entries.
The __patchable_function_entries must be discarded when the associated text
section is discarded.
We don't want __patchable_function_entries.foo __patchable_function_entries.bar
because that can waste lots of bytes in .shstrtab .


clang -fpatchable-function-entry=2 -S a.cc b.cc

# COMDAT and SHF_LINK_ORDER are used at the same time
        .section        __patchable_function_entries,"awo",@progbits,_Z3barv,unique,0
        .p2align        3
        .quad   .Lfunc_begin0

        .section        __patchable_function_entries,"aGwo",@progbits,_Z3foov,comdat,_Z3foov,unique,1
        .p2align        3
        .quad   .Lfunc_begin1

GNU as and ld don't have these features yet, so clang emits plain sections when
-no-integrated-as is specified (the output is expected to be consumable by GNU
as):

clang -fpatchable-function-entry=2 -no-integrated-as -S a.cc b.cc

## The assembler will combine sections with the same name.
## If either .Lfunc_begin0 or .Lfunc_begin1 is discarded, the linker will
## report an error.
        .section        __patchable_function_entries,"aw",@progbits
        .p2align        3
        .quad   .Lfunc_begin0

        .section        __patchable_function_entries,"aw",@progbits
        .p2align        3
        .quad   .Lfunc_begin1

[Bug target/93492] Broken code with -fpatchable-function-entry and -fcf-protection=full

2020-02-02 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93492

--- Comment #11 from Fangrui Song  ---
(In reply to H.J. Lu from comment #8)
> Created attachment 47762 [details]
> A patch to handle targetm.asm_out.post_cfi_startproc

I don't work on GCC, so I am hoping other x86 maintainers can review. (I know
close to zero about its build system. "How to work on GCC" is not well
documented. I can play with stage1-gcc/xgcc -B stage1-gcc -fsyntax-only
/tmp/c/a.c  but I don't even know how to build stage1 only)


For tests, I think at least 3 configurations should be tested.

-fpatchable-function-entry=0 -fcf-protection=branch
-fpatchable-function-entry=1 -fcf-protection=branch
-fpatchable-function-entry=2,1 -fcf-protection=branch

I am a bit concerned about the introduction of cfi_startproc_emitted.

My idea is that NOPs after the function entry label should really be an
arch-specific feature. It should be implemented as a pass beside
make_pass_insert_endbranch. We build the function body, then prepend NOPs, then
prepend endbr64. That may be cleaner.

[Bug driver/93645] New: Support -fuse-ld=/absolute/path/to/ld

2020-02-09 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645

Bug ID: 93645
   Summary: Support -fuse-ld=/absolute/path/to/ld
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

This feature request generalizes -fuse-ld=bfd -fuse-ld=gold
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55470
and
-fuse-ld=lld

clang -fuse-ld= also supports the following forms:

-fuse-ld=/path/to/binutils-gdb/Debug/ld/ld-new
-fuse-ld=/path/to/ld.lld
-fuse-ld=/usr/bin/ld.lld-9

[Bug driver/93645] Support -fuse-ld=/absolute/path/to/ld

2020-02-09 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645

--- Comment #1 from Fangrui Song  ---
Posted a patch https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00510.html


I agree with https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59321#c4 that
we should use a new option, instead of overloading --print-prog-name=ld with a
different meaning

gcc --print-prog-name=ld -fuse-ld=bfd => ld.bfd

[Bug driver/52982] add option to select particular linker

2020-02-09 Thread i at maskray dot me
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52982

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #4 from Fangrui Song  ---
I posted a patch https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00510.html to
make -fuse-ld=linker generic (absolute path or ld.linker)

[Bug c/99587] New: warning: ‘retain’ attribute ignored while __has_attribute(retain) is true

2021-03-14 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99587

Bug ID: 99587
   Summary: warning: ‘retain’ attribute ignored while
__has_attribute(retain) is true
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

If the ld detected at configure time does not support SHF_GNU_RETAIN,
__has_attribute(retain) may still be true even though using the attribute
causes a warning.

% cat x.c
#if defined(__has_attribute) && __has_attribute(retain)
__attribute__((used, retain)) int a;
#endif
% ~/Dev/gcc/out/release/gcc/xgcc -B ~/Dev/gcc/out/release/gcc -c x.c
x.c:1:1: warning: ‘retain’ attribute ignored [-Wattributes]
1 | __attribute__((used, retain)) int a;
  | ^
% ~/Dev/gcc/out/release/gcc/xgcc --version  
xgcc (GCC) 11.0.1 20210313 (experimental)
...


__has_attribute(retain) should return 0 in this case.

[Bug c/99587] warning: ‘retain’ attribute ignored while __has_attribute(retain) is 1

2021-03-16 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99587

--- Comment #6 from Fangrui Song  ---
(In reply to Jakub Jelinek from comment #5)
> (In reply to Florian Weimer from comment #4)
> > For retain, something along these lines might work:
> > 
> > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > index c1f652d1dc9..cdae464ab8a 100644
> > --- a/gcc/c-family/c-attribs.c
> > +++ b/gcc/c-family/c-attribs.c
> > @@ -329,8 +329,10 @@ const struct attribute_spec c_common_attribute_table[] 
> > =
> >   handle_used_attribute, NULL },
> >{ "unused", 0, 0, false, false, false, false,
> >   handle_unused_attribute, NULL },
> > +#if SUPPORTS_SHF_GNU_RETAIN
> >{ "retain", 0, 0, true,  false, false, false,
> >   handle_retain_attribute, NULL },
> > +#endif
> >{ "externally_visible", 0, 0, true,  false, false, false,
> >   handle_externally_visible_attribute, NULL },
> >{ "no_reorder",0, 0, true, false, false, false,
> > 
> > In other cases, it's more difficult because those are subtarget-dependent.
> 
> Doing the above would "fix" __has_attribute, but on the other side would mean
> the compiler would not know how many and what kind of operands the attribute
> has, whether it is for function declarations, other declarations, types or
> what etc., so for invalid code it would have inconsistent diagnostics.

Are you willing to properly fix it? :)

I implemented the attribute in clang (https://reviews.llvm.org/D97447).
__has_attribute(retain) is always 1 and there is no "ignored" diagnostic,
regardless of the target (even if non-ELF), and __has_attribute(retain) works
in assembly mode as well. This is intentional: with a bleeding-edge toolchain,
code for non-ELF targets doesn't need macros to decide whether 'retain' should
be added.


Ultimately, I want the glibc static linking problem with ld -z start-stop-gc
fixed
https://sourceware.org/pipermail/libc-alpha/2021-March/123833.html
(glibc has -Wattributes, so __has_attribute(retain)=1 && "warning: ‘retain’
attribute ignored" can cause some inconvenience.)

And I hope eventually ld -z start-stop-gc can be the default.

[Bug libgcc/99759] New: morestack.S should support .init_array.0 besides .ctors.65535

2021-03-24 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99759

Bug ID: 99759
   Summary: morestack.S should support .init_array.0 besides
.ctors.65535
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgcc
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

This would drop the reliance on ld's default linker script:

  .init_array:
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
    KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors))
    PROVIDE_HIDDEN (__init_array_end = .);
  }

The input section description is quite close but does not sort .init_array.*
and .ctors.* with the same priority together.

[Bug target/99836] New: aarch64: -fpatchable-function-entry=N[,0] should place .cfi_startproc before NOPs

2021-03-30 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99836

Bug ID: 99836
   Summary: aarch64: -fpatchable-function-entry=N[,0] should place
.cfi_startproc before NOPs
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

Extracted from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424#c8

% echo 'int main() {}' > a.c
% clang --target=aarch64 -fpatchable-function-entry=2 -mbranch-protection=standard -S a.c -o -
...
main:   // @main
.Lfunc_begin0:
.cfi_startproc
// %bb.0:   // %entry
hint #34
.Lpatch0:
nop
nop

% /tmp/glibc-many/install/compilers/aarch64-linux-gnu/bin/aarch64-glibc-linux-gnu-g++ -fpatchable-function-entry=2 -mbranch-protection=standard -S a.c -o -
.arch armv8-a
.file   "a.c"
.text
.align  2
.global main
.type   main, %function
main:
hint 34 // bti c
.section __patchable_function_entries,"aw",@progbits
.align  3
.8byte  .LPFE1
.text
.LPFE1:
nop
nop
.LFB0:
.cfi_startproc


For -fpatchable-function-entry=N[,0], placing .cfi_startproc before NOPs makes
more sense and can make unwinding work in that region.

For N[,M] where M>0, that is a very narrow use case by the Linux kernel. I
prefer not to place .cfi_startproc above the function label.

[Bug gcov-profile/97507] New: Move __gcov_exit from per-object .fini_array.00100 to libgcov

2020-10-20 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97507

Bug ID: 97507
   Summary: Move __gcov_exit from per-object .fini_array.00100 to
libgcov
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

A per-object-file .fini_array.00100 entry wastes space. __gcov_exit can be
called in libgcov instead: it can be registered via atexit in __gcov_init (on
its first run).

The Linux kernel does not call destructors and currently discards .fini_array
and .fini_array.* sections. `gcc -fprofile-arcs` is currently one reason that
.fini_array needs to be discarded (another reason is kasan; I don't know of
other reasons)

[Bug tree-optimization/66512] PRE fails to optimize calls to pure functions in C++, ok in C

2020-11-24 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66512

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #4 from Fangrui Song  ---
Should this be reopened?

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html 'const' is
not clarified on its interaction with threads
(https://gcc.gnu.org/legacy-ml/gcc/2015-09/msg00365.html)

and 

void f()
{
  for (;;)
    g(p());
}

is still pessimized for C++ (I tend to agree that 'const' should imply
'nothrow'; even if not, the #c2 case should be resolved properly)

[Bug target/98063] New: Emit R_X86_64_GOTOFF64 instead of R_X86_64_GOTPCRELX for -mcmodel=large -fno-plt

2020-11-29 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98063

Bug ID: 98063
   Summary: Emit R_X86_64_GOTOFF64 instead of R_X86_64_GOTPCRELX
for -mcmodel=large -fno-plt
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

% cat a.c
#include <stdio.h>

int main() {
  puts("meow");
}

% gcc -mcmodel=large -fno-plt -O1 -S a.c -fpic -o - -O2 -fno-asynchronous-unwind-tables
...
main:
.L2:
movabsq $_GLOBAL_OFFSET_TABLE_-.L2, %r11
subq $8, %rsp
leaq .L2(%rip), %rax
movabsq $.LC0@GOTOFF, %rdx
addq %r11, %rax
leaq (%rax,%rdx), %rdi
call *puts@GOTPCREL(%rip)
xorl %eax, %eax
addq $8, %rsp
ret

The distance between the GOT entry and the instruction following the call may
not fit in a signed 32-bit field, so an R_X86_64_GOTPCRELX relocation cannot be
used.

[Bug c/98112] New: Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC

2020-12-02 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112

Bug ID: 98112
   Summary: Add -fdirect-access-external-data & drop
HAVE_LD_PIE_COPYRELOC
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

After "x86-64: Optimize access to globals in PIE with copy reloc", GCC x86-64
asks the assembler to produce an R_X86_64_PC32 for an external data access.

* It introduced a configure-time variable HAVE_LD_PIE_COPYRELOC which has a
misleading name: PC32 does not necessarily cause a copy relocation.
  If the external data
* It affects users who want to configure GCC not to emit R_X86_64_PC32 for an
external data access so that copy relocations can be avoided if the data turns
out to be defined in a different shared object/executable
* While it made sense (in terms of performance) before H.J. Lu added GOTPCRELX
to x86-64, it hardly matters, if at all, nowadays.
* This optimization can actually benefit non-x86-64. An option is more
suitable.

In Clang, the GCC style HAVE_LD_PIE_COPYRELOC is implemented as
-mpie-copy-relocations, which has a misleading name.
I agree that this should be implemented as an option, instead of a
configure-time variable.

I suggest that we add a new architecture-independent option
-f[no-]direct-access-external-data (I am happy to add a similar one in Clang
once consensus is reached) and delete HAVE_LD_PIE_COPYRELOC. The option
controls whether a direct access (PC-relative relocation) can be generated for
an external data access.
The value can default to true for -fno-pic code (it seems that most
architectures behave this way).
For non-x86-64, the value defaults to false for -fpie/-fpic code (I believe
most architectures use a GOT).

In the future, for x86-64, please consider defaulting to
-fno-direct-access-external-data for -fpie/-fpic so that issues related to
STV_PROTECTED data can be properly fixed (see my analysis last year
https://gcc.gnu.org/legacy-ml/gcc/2019-05/msg00215.html )

[Bug target/98112] Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC

2020-12-03 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112

--- Comment #2 from Fangrui Song  ---
Note: -fdirect-access-external-data is architecture-independent. For example,
currently Clang on aarch64 can perform the following optimization:

// clang -target aarch64 -fPIE -O3
  adrp x8, :got:var
  ldr x8, [x8, :got_lo12:var]
  ldr w0, [x8]
  ret
// clang -target aarch64 -fPIE -O3 -mpie-copy-relocations
  adrp x8, var
  ldr w0, [x8, :lo12:var]
  ret

A better name for -mpie-copy-relocations is -fdirect-access-external-data:

  1. the option can affect -fno-pic and -fpic
  2. for -no-pie and -pie links, there is not necessarily a copy relocation
  (-fpic can use this option as well, but keep in mind that DSOs do not support
copy relocations. So if such code is used for -shared links and the data turns
out to be undefined, the linker will reject the object file)

---

The second thing about the feature request is that x86-64 should default to
-fno-direct-access-external-data for -fpie to address the protected symbol
issues.
(-fno-direct-access-external-data for -fpie is the behavior on most
architectures.)

  (1): PC32 referencing a protected function is unnecessarily rejected in a
-shared link (this also affects aarch64)
  // gcc -fpic -fuse-ld=bfd -shared -fvisibility=protected b.c => relocation
  // R_X86_64_PC32 against protected symbol `f' can not be used when making a
  // shared object
  // aarch64-linux-gnu-gcc -fpic -fuse-ld=bfd -shared -fvisibility=protected
  // b.c => relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `f' which may
  // bind externally can not be used when making a shared object; recompile
  // with -fPIC
  // gold is good

  void f() {}
  void *g() { return &f; }

This can be fixed by making GNU ld more permissive.

  (2) protected data access can use slightly more efficient PC32. Currently it
uses the slightly pessimized REX_GOTPCRELX.
  int a __attribute__((visibility("protected")));
  int f() { return a; }

[Bug gcov-profile/98257] New: Replace Donald B. Johnson's cycle enumeration with iterative loop finding

2020-12-12 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98257

Bug ID: 98257
   Summary: Replace Donald B. Johnson's cycle enumeration with
iterative loop finding
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

gcov used _J. C. Tiernan, An Efficient Search Algorithm to Find the Elementary
Circuits of a Graph, Comm ACM 1970_. The worst-case time bound is exponential
in the number of elementary circuits. It enumerated cycles (aka simple circuit,
aka elementary circuit) and performed cycle cancelling.

In 2016, the resolution to PR67992 switched to Donald B. Johnson's algorithm to
improve performance. The theoretical time complexity is $O((V+E)(c+1))$ where
$c$ is the number of cycles, which is exponential in the size of the graph.
(Boost attributed the algorithm to K. A. Hawick and H. A. James, and gcov
inherited this name. However, that paper did not improve Johnson's algorithm.)

Actually every step of cycle cancelling decreases the count of at least one arc
to 0, so there are at most $O(E)$ cycles. The resolution to PR90380 skipped
non-positive arcs and decreased the time complexity to $O(V*E^2)$ (in theory it
could be $O(E^2)$ but the implementation has a linear scan).

This is all unnecessary. We can just iteratively find cycles (using the
classical tri-color DFS) and perform cycle cancelling. There are at most $O(E)$
cycles and each DFS is linear in the size of the graph, so the overall time
complexity is $O(E^2)$.

(
We are processing a reducible flow graph (there is no intuitive cycle count for
an irreducible flow graph).
Every natural loop is identified by a back edge. By constructing a dominator
tree, finding back edges, identifying natural loops and clearing the arc
counters (we will compute incoming counts so we clear counters to prevent
duplicates), the time complexity can be decreased to $O(depthOfNestedLoops*E)$.
In practice, the semi-NCA algorithm (time complexity: $O(V^2)$, but considered
faster than the almost linear Lengauer-Tarjan's algorithm) is not difficult to
implement, but identifying natural loops is troublesome. So the method is not
useful.)

[Bug target/97827] bootstrap error building the amdgcn-amdhsa offload compiler with LLVM 11

2020-12-14 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97827

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #9 from Fangrui Song  ---
I want to know whether this is really a wontfix on GCC's side.

Richard Sandiford on
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559572.html

"I'm not saying we should bend over backwards to support difficult
quirks.  But here we're talking about a choice between (a) doing
something that works “everywhere” unconditionally (and keeping things
simple) vs. (b) having both code that takes a shortcut and code that
doesn't take a shortcut and trying to predict which one we should do."

This makes a lot of sense to me. For the LLVM "fix", we did not know about
this PR until https://reviews.llvm.org/D92052#2452577
Personally, I might have held a different opinion had I known that this was
not an entirely dead end on the gcc -S output side.

[Bug target/97827] bootstrap error building the amdgcn-amdhsa offload compiler with LLVM 11

2020-12-14 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97827

--- Comment #10 from Fangrui Song  ---
Note: the section key is not just (name, group name "G"). It is a quadruple:
(name, group name "G", linked-to "o", unique ID)

Keeping just name works for the simplest case.

If GCC decides to support PR95095  -fno-unique-section-names, unique ID can be
common. https://sourceware.org/bugzilla/show_bug.cgi?id=25490#c3 added the
support for `.section ,unique` to GNU as.

[Bug target/98112] Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC

2020-12-14 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112

--- Comment #3 from Fangrui Song  ---
Are you happy with the option name -f[no-]direct-access-external-data ?
https://reviews.llvm.org/D92633 is what I want to add to Clang.

I want GCC and Clang to use the same option names...

[Bug middle-end/93195] -fpatchable-function-entries : __patchable_function_entries should consider comdat groups

2020-12-16 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93195

--- Comment #10 from Fangrui Song  ---
(In reply to Jakub Jelinek from comment #9)
> I believe this broke building the kernel, see
> https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561974.html
> for details.

For

> ld: .init.data has both ordered [`__patchable_function_entries' in 
> init/main.o] and unordered [`.init.data' in 
> ./drivers/firmware/efi/libstub/vsprintf.stub.o] sections

ld should be flexible in mixed SHF_LINK_ORDER & non-SHF_LINK_ORDER components
in an output section
https://sourceware.org/bugzilla/show_bug.cgi?id=26256

[Bug c/94722] implement __attribute__((no_stack_protector)) function attribute

2020-12-16 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #7 from Fangrui Song  ---
(In reply to Martin Liška from comment #6)
> Implemented.

#include <string.h>
void foo(const char *a) { char b[34]; strcpy(b, a); }
__attribute__((no_stack_protector))
void bar(const char *a) { foo(a); }


#include <string.h>
__attribute__((no_stack_protector))
void foo(const char *a) { char b[34]; strcpy(b, a); }
void bar(const char *a) { foo(a); }

In both cases, foo can be inlined.

In Clang, the recent resolution https://reviews.llvm.org/D91816 is that a ssp
function cannot be inlined into a nossp function and a nossp function cannot be
inlined into a ssp function.

I think one argument for the no-inline behavior is that ssp conveys the
security enforcement intention and the GCC behavior may degrade the security
hardening while inlining a ssp chunk.

Previously Clang upgraded the caller from nossp to ssp after inlining. However,
that behavior caused
https://lore.kernel.org/lkml/20200422192113.gg26...@zn.tnic/T/#t
(the caller may not have set up %gs and upgrading it to ssp can break it)

The new Clang behavior also disallows a nossp callee from being inlined into a
ssp caller. That makes the rules easier to explain, but I haven't thought very
clearly about the implications.

[Bug target/98112] Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC

2020-12-28 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112

--- Comment #5 from Fangrui Song  ---
(In reply to Segher Boessenkool from comment #4)
> (In reply to Fangrui Song from comment #3)
> > Are you happy with the option name -f[no-]direct-access-external-data ?
> 
> Not at all, no :-(
> 
> The name does not explain its purpose at all, and the whole concept only
> makes sense for a fraction of all targets.

> A -mcopy-relocs ("generate copy
> relocations if that is a good idea"), defined *per target*, would be a lot
> better, or a -mpic-use-copy-relocs (since you say it is *not* just for pie),
> or something like that.

Please read my first comment for why copy relocs is a bad name. The compiler
behavior is whether the external data symbol is accessed directly/indirectly.
Copy relocs is just the inferred ELF linker behavior (in -no-pie/-pie link
mode) when the symbol is external. The option name should mention the direct
behavior, instead of the inferred behavior at the linking stage.

-fdirect-access-external-data makes sense on other binary formats, though I
won't ask GCC to implement relevant behaviors for other binary formats.

* For example, on COFF, the behavior is like always
-fdirect-access-external-data.  __declspec(dllimport) is needed to use indirect
access.
* On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic
(only available on arm) and the opposite for -fpic.

If you don't want to think of non-ELF, feel free to make the option specific to
ELF.
Also feel free to make it specific to -fno-pic/-fpie (disallowed for -fpic).
I have no plan to implement Clang -fdirect-access-external-data for -fpic
either.

> You want to have this a generic option, while it is
> not clear at all what it would mean, what it would *do*, which is especially
> important if you want this to be an option used by multiple compilers: if it
> is not clear to every user what simple, sensible thing a flag is the knob
> for, that flag simply cannot be used at all -- or worse, some users *will*
> use it, but then their intentions are not clear to humans, and different
> compilers can (and will!) think the user wanted something else!

To be clear, GCC botched things with the inappropriate HAVE_LD_PIE_COPYRELOC
and I made the proposal to (1) let non-x86-64 leverage the missing optimization
for -pie (2) eventually fix the x86-64 STV_PROTECTED story.
I have considered all the potential simplification of internal representations
for Clang this option will enable.
(llvm/lib/Target/TargetMachine.cpp shouldAssumeDSOLocal can be further
simplified with this option)

[Bug target/98112] Add -fdirect-access-external-data & drop HAVE_LD_PIE_COPYRELOC

2020-12-28 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112

--- Comment #7 from Fangrui Song  ---
(In reply to Segher Boessenkool from comment #6)
> (In reply to Fangrui Song from comment #5)
> > Please read my first comment why copy relocs is a bad name.
> 
> Since I reply to some of that (namely, your argument 1)), you could assume I
> have read your comment already ;-)
> 
> > The compiler
> > behavior is whether the external data symbol is accessed
> > directly/indirectly.
> 
> Not really, no.  It isn't clear at all what "directly" even means!

> > Copy relocs is just the inferred ELF linker behavior
> > (in -no-pie/-pie link mode) when the symbol is external. The option name
> > should mention the direct behavior, instead of the inferred behavior at the
> > linking stage.
> 
> Yes.  But your proposed solution just makes this worse :-(

I use "direct" as one term covering absolute/PC-relative relocation types (e.g.
R_X86_64_64, R_X86_64_PC32).
"Indirect" means GOT-generating relocation types and (PowerPC64) TOC-generating
relocation types.

"direct/indirect" are more descriptive and more accurate than "copy relocs"
(which is not the case if the symbol turns out to be defined locally; this term
does not apply to other binary formats).

> > -fdirect-access-external-data makes sense on other binary formats, though I
> > won't ask GCC to
> > implement relevant behaviors for other binary formats.
> 
> But what does that *mean*?  "direct access"?  (And, "external data", for that
> matter!  This isn't as obvious as it was thirty years ago.)

In PowerPC64 ELF v2, the term "GOT-indirect addressing" is used.
In x86-64 psABI, there is a section "Indirect Call via the GOT Slot".
Indirect calls/jumps are pretty common, so it is understood that GOT
relocation types generally mean "indirect".

"external data" is the best term I find for things like `extern int var;`
It means the data symbol is undefined in the current translation unit but may
be defined in another translation unit or another linked unit.

> > * For example, on COFF, the behavior is like always
> > -fdirect-access-external-data.  __declspec(dllimport) is needed to use
> > indirect access.
> 
> I don't know what "declspec" is.  Something something mswindows?

Yes. `extern int var; int foo() { return var; }` compiles to `movl var(%rip),
%eax` (a "direct access" (PC-relative) relocation type).
Its behavior is like always -fdirect-access-external-data.
__declspec(dllimport) annotation can override the command line option.

> > * On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic
> > (only available on arm) and the opposite for -fpic.
> 
> So what you want is that object that are globally visible will be implemented
> as-is?  For if you do not do whole-program optimisation, for example?  So
> that a) those objects will actually *exist*, and b) they will be laid out in
> the way the program expects?

Undefined global objects and address-taken functions in the current translation
unit are affected.
An address-taken function is very much like a data symbol:

```
// gcc -fno-pic generates an absolute relocation type. If foo is defined in a
// DSO, it will require a "canonical PLT entry" (st_shndx=0, st_value!=0) - a
// hack agreed by the linker and ld.so
extern void foo();
void *addr() { return foo; }
```

The default ELF behavior on most architectures is: -fno-pic uses an absolute
relocation type, (non-x86-64) -fpie uses a GOT-generating relocation type, and
(x86-64) -fpie uses a PC-relative relocation type.

If -fno-direct-access-external-data is specified, -fno-pic/-fpie will use
GOT-generating relocation types
to prevent
* copy relocations if the symbol turns out to be undefined in the module.
* canonical PLT entry for an address-taken function.

The proposed option is local to a translation unit (like most options).
However, if this information is recorded in LTO IR files, the optimizer can
assume the variable can be referenced via a direct relocation type in the
combined IR file.

> > If you don't want to think of non-ELF, feel free to make the option specific
> > to ELF.
> 
> The problem is not that I don't want to think about it, but that the way it
> seems to be defined only applies to ELF (and to some specific (sub-)targets
> using ELF, even).

As I mentioned earlier, this applies to other binary formats.  I'll just show
you evidence by pointing you directly to the code ;-)

In LLVM, generally speaking, a dso_local undefined global object is accessed
directly while a non-dso_local undefined global object is accessed via GOT
indirection.

In Clang, dso_local annotation is added in
https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CodeGenModule.cpp#L913-L988
(The internal abstraction is currently a bit unfortunate. LLVM IR has another
set of rules (many are duplicated)
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/TargetMachine.cpp#L94-L178
I intend to eventually clean up the LLVM IR side rules)
(Attributes generally supersede 

[Bug target/95095] Feature request: support -fno-unique-section-names

2021-01-16 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

--- Comment #3 from Fangrui Song  ---
(In reply to Segher Boessenkool from comment #2)
> Can't we use ".text%name" for -ffunction-sections, like we did originally,
> in 1996?  See cf4403481dd6.  This does not conflict with other section
> names, and does not have all the problems you get from doing anything that
> is not a simple prefix.

A function named 'foo' compiles to '.text%foo'? It might have been better to
avoid conflicts with '.text.startup' '.text.hot' etc but now such a change
would just inconvenience users (think of various Linux kernel linker script
fragments).

.text%name does not address -fno-unique-section-names.

[Bug libstdc++/98785] New: _Unwind_ForcedUnwind going through a non-empty exception specification

2021-01-21 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98785

Bug ID: 98785
   Summary: _Unwind_ForcedUnwind going through a non-empty
exception specification
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

gcc/testsuite/g++.dg/eh/forced3.C says forced unwinding calls std::unexpected
going through a throw() function.

gcc/testsuite/g++.dg/eh/forced4.C says forced unwinding does not call
std::unexpected going through a throw(int) function.

The behavior looks strange: if we consider forced unwinding a special exception
type, both throw() and throw(int) should catch it.


Note: for nothrow, GCC emits a minimal .gcc_except_table section, and forced
unwinding calls std::terminate.

[Bug target/95095] Feature request: support -fno-unique-section-names

2021-01-21 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

--- Comment #5 from Fangrui Song  ---
Linux kernel

include/asm-generic/vmlinux.lds.h currently has
#define TEXT_TEXT                                       \
        ALIGN_FUNCTION();                               \
        *(.text.hot .text.hot.*)                        \
        *(TEXT_MAIN .text.fixup)                        \
        *(.text.unlikely .text.unlikely.*)              \
        *(.text.unknown .text.unknown.*)                \
        NOINSTR_TEXT                                    \
        *(.text..refcount)                              \
        *(.ref.text)                                    \
        MEM_KEEP(init.text*)                            \
        MEM_KEEP(exit.text*)                            \

If you change .text.* to .text%*, this script will need a change, along with
other projects which use or adapt GNU ld's built-in linker script:

  .text   :
  {
*(.text.unlikely .text.*_unlikely .text.unlikely.*)
*(.text.exit .text.exit.*)
*(.text.startup .text.startup.*)
*(.text.hot .text.hot.*)
*(SORT(.text.sorted.*))
*(.text .stub .text.* .gnu.linkonce.t.*)
/* .gnu.warning sections are handled specially by elf.em.  */
*(.gnu.warning)
  }

By default, -fno-unique-section-names produces '.text' instead of '.text.foo'
in the normal -ffunction-sections case.

For PGO, -fno-unique-section-names produces '.text.hot.' instead of
'.text.hot.foo' in the normal -ffunction-sections case.

'.text.hot.' is an attempt to distinguish a PGO-caused 'hot' from a regular
function named 'hot'.
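
To make the naming clash concrete, here is a minimal sketch of a translation
unit to compile with -ffunction-sections and inspect with `objdump -h`. The
function names are hypothetical; the section names in the comments follow the
description above rather than any verified compiler output.

```c
/* With -ffunction-sections this is placed in .text.foo.
   With -fno-unique-section-names (as described above) it would be .text. */
int foo(void) { return 1; }

/* With -ffunction-sections this is placed in .text.hot, which is also the
   section name used for PGO-detected hot code. The trailing dot in
   '.text.hot.' is meant to keep the two cases apart. */
int hot(void) { return 2; }
```

The point of the sketch is only that an ordinary function name can collide
with a reserved-looking section suffix when the name is appended after a dot.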

[Bug target/95095] Feature request: support -fno-unique-section-names

2021-01-25 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

--- Comment #7 from Fangrui Song  ---
(In reply to Segher Boessenkool from comment #6)
> I was under the impression this unique section thing needed the trailing
> dot thing.  This probably is not true.
> 
> I still think the old "%" thing is much superior to the trailing dot thing,
> but that then is orthogonal to the "unique section" thing, so let's ignore
> it now :-)
> 
> It still remains that this flag needs a name that says what it *does*, as I
> mentioned at the end of Comment 4.

-ffunction-sections -fno-unique-section-names =>

.text.%
.text.startup.%
.text.hot.%
.text.cold.%
...

?

I agree that it is superior. If GCC wants to support this scheme, that looks
fine to me. It is likely that I can migrate Clang to this scheme as well.

I think

.text%
.text.startup%
.text.hot%
.text.cold%
...

is slightly worse.

[Bug target/95095] Feature request: support -fno-unique-section-names

2021-01-25 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95095

--- Comment #9 from Fangrui Song  ---
(In reply to Segher Boessenkool from comment #8)
> I say nothing like that.  I say that
>   .text.hot.
> is nasty (is easily mistaken for .text.hot).
> 
> I also say that and that named-per-function sections are better as
>   .text%name
> than as
>   .text.name
> (just as they were long ago), because this doesn't conflict with things like
>   .text.hot
> (and there is a very long history of such conflicts giving real-world
> problems).

.text%name and .text.hot%name will break existing output section descriptions
for .text.

My scheme .text.% .text.hot.% is backward compatible.
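
A rough way to check the compatibility claim is to run the candidate section
names through glob patterns like those in the linker scripts quoted earlier.
This sketch assumes POSIX fnmatch() is a reasonable stand-in for the linker's
own pattern matcher; section_matches is a hypothetical helper name.

```c
#include <fnmatch.h>

/* Returns 1 if an input section name matches a linker-script-style glob.
   Assumption: fnmatch() with no flags approximates the linker's matching. */
int section_matches(const char *pattern, const char *name) {
  return fnmatch(pattern, name, 0) == 0;
}
```

Under that assumption, `.text.%` matches `.text.*` and `.text.hot.%` matches
`.text.hot.*`, while `.text%foo` does not match `.text.*` and `.text.hot%foo`
does not match `.text.hot.*` (it would only be picked up by the catch-all
`.text.*` pattern, losing its hot placement).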

[Bug c/99282] New: Emit .cfi_sections without arguments for -fno-asynchronous-unwind-tables

2021-02-25 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99282

            Bug ID: 99282
           Summary: Emit .cfi_sections without arguments for
                    -fno-asynchronous-unwind-tables
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: i at maskray dot me
  Target Milestone: ---

.cfi_* in inline asm is rare, but can be useful if the user wants precise
unwind information.

% cat a.c
int main() {
  asm("pushl 0\n.cfi_adjust_cfa_offset 4\npop %%eax\n.cfi_adjust_cfa_offset -4"
::: "eax");
}
% gcc -m32 -c -fomit-frame-pointer -fno-asynchronous-unwind-tables a.c
a.c: Assembler messages:
a.c:3: Error: CFI instruction used without previous .cfi_startproc
a.c:5: Error: CFI instruction used without previous .cfi_startproc

-fasynchronous-unwind-tables & -fno-asynchronous-unwind-tables do not have a
predefined macro, so it is difficult for the inline asm to know whether CFI
directives should be used. For ergonomics, users just want to write CFI
directives and hope they will be silently ignored in
-fno-asynchronous-unwind-tables mode. However, GNU as reports an error for
.cfi_* directives without a preceding .cfi_startproc.

I suggest that (1) GCC emits ".cfi_sections" (no argument) at the beginning,
(2) GNU as suppresses the error if no .eh_frame/.debug_frame is needed (feature
request: https://sourceware.org/bugzilla/show_bug.cgi?id=27472).

[Bug inline-asm/99282] Emit .cfi_sections without arguments for -fno-asynchronous-unwind-tables

2021-02-26 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99282

--- Comment #2 from Fangrui Song  ---
(In reply to Jakub Jelinek from comment #1)
> There is the __GCC_HAVE_DWARF2_CFI_ASM predefined macro that tells if .cfi*
> directives are used or not.  And, inline asm that wishes to be usable in
> both can use that.

Thanks. I did not know this macro. So the user writing inline asm does have a
way to know whether .cfi_* should be inserted. If you think emitting
`.cfi_sections` is unnecessary, I am fine and happy that this is closed.

(GCC already generates `.cfi_sections .debug_frame\n`, so perhaps supporting
`.cfi_sections\n` is not that costly? :) Users with a newer toolchain can be a
bit happier - they don't need to do `#ifdef __GCC_HAVE_DWARF2_CFI_ASM`).
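
For reference, a minimal sketch of the `#ifdef __GCC_HAVE_DWARF2_CFI_ASM`
approach. The CFI and HAVE_CFI_ASM macros and the cfi_demo function are
hypothetical names; the directive pair used here was picked only because it
leaves the unwind state unchanged and needs no stack manipulation.

```c
/* __GCC_HAVE_DWARF2_CFI_ASM is predefined when the compiler itself emits
   .cfi_* directives, so a .cfi_startproc is already in effect. */
#ifdef __GCC_HAVE_DWARF2_CFI_ASM
#define CFI(directive) directive "\n"
#define HAVE_CFI_ASM 1
#else
#define CFI(directive) ""
#define HAVE_CFI_ASM 0
#endif

/* Hypothetical helper: the remember/restore pair has no net effect on the
   unwind state; it only shows the directives being dropped or kept. */
int cfi_demo(void) {
  __asm__ volatile(CFI(".cfi_remember_state") CFI(".cfi_restore_state"));
  return HAVE_CFI_ASM;
}
```

With the macro undefined, the asm string collapses to an empty string, so the
same source assembles in both modes.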

[Bug demangler/100437] New: libiberty: Support more characters for function clones

2021-05-05 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100437

            Bug ID: 100437
           Summary: libiberty: Support more characters for function clones
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: demangler
          Assignee: unassigned at gcc dot gnu.org
          Reporter: i at maskray dot me
  Target Milestone: ---

In the demangler, the ('.' (alpha|'_')+) ('.' digit+)* scheme as implemented
for PR40831 allows a decimal but not a hexadecimal.
It'd be great to support a hexadecimal (or more characters e.g. base64).
There are at least two use cases in clang now.

1. In Clang ThinLTO, a local symbol needs to be promoted to a global symbol so
that it can be imported into other modules.
  Such a symbol gets a suffix with a hash (a simple increasing ID scheme cannot
avoid collision), e.g. _ZL5localv.llvm.104029495979337208

  % c++filt <<< _ZL5localv.llvm.104029495979337208
  local() [clone .llvm.104029495979337208]

  # A suffix with mixed digits and letters (e.g. many hexadecimals) doesn't
work.
  % c++filt <<< _ZL5localv.llvm.11aa
  _ZL5localv.llvm.11aa

2. clang -funique-internal-linkage-names -c a.cc  # use clang trunk
  (Improve profile accuracy for local symbols)
  There is a long decimal representation of an MD5 module hash.

  _ZL5localv.__uniq.247706070344499593425200173608446019371

If more digits are allowed, clang can switch to that so that shorter symbol
names can be used, saving .strtab space.


I understand that the original digit/letter separation is to allow multiple
clones.
There should be some way supporting more characters.
If it is not useful to know there are 4 clones, just lift the restriction?

% c++filt <<< _ZL5localv.llvm.aaa.000.bbb.111.ccc.222
local() [clone .llvm] [clone .aaa.000] [clone .bbb.111] [clone .ccc.222]
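
The suffix grammar quoted at the top of this report can be sketched as a small
recognizer. This is not libiberty's code, only an approximation of the
('.' (alpha|'_')+) ('.' digit+)* scheme, applied repeatedly; clone_suffix_ok is
a hypothetical name.

```c
#include <ctype.h>

/* Returns 1 if the whole string is a sequence of clone suffixes of the form
   '.' (alpha|'_')+ ('.' digit+)*, and 0 otherwise. */
int clone_suffix_ok(const char *s) {
  while (*s == '.') {
    const char *p = s + 1;
    if (!(isalpha((unsigned char)*p) || *p == '_'))
      return 0;                      /* a clone name must start the group */
    while (isalpha((unsigned char)*p) || *p == '_')
      ++p;
    while (*p == '.' && isdigit((unsigned char)p[1])) {
      ++p;                           /* skip '.', then the digit run */
      while (isdigit((unsigned char)*p))
        ++p;
    }
    s = p;
  }
  return *s == '\0';                 /* leftover characters mean rejection */
}
```

With this sketch, ".llvm.104029495979337208" is accepted, while ".llvm.11aa" is
rejected because the digit run "11" is followed by letters that belong to
neither a digit group nor a new '.'-introduced name.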

[Bug c/100483] New: Extend -fno-semantic-interposition to global variables

2021-05-07 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483

            Bug ID: 100483
           Summary: Extend -fno-semantic-interposition to global variables
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: i at maskray dot me
  Target Milestone: ---

% cat a.c
int var;
int foo() { return var; }


(I implemented this for clang 11 x86)
% clang -fpic -fno-semantic-interposition -O2 -S a.c
% cat a.s
...
foo:                                    # @foo
.Lfoo$local:
# %bb.0:                                # %entry
        movl    .Lvar$local(%rip), %eax
        retq
...
var:
.Lvar$local:
        .long   0                       # 0x0
        .size   var, 4


# On x86-64, because of R_X86_64_REX_GOTPCRELX, it isn't too bad without the
optimization.
# This is more useful on other architectures without GOT optimization.
# With my clang patch https://reviews.llvm.org/D101873
% clang -target aarch64 -fpic -fno-semantic-interposition
-fno-asynchronous-unwind-tables -O2 -S a.c
% cat a.s
...
foo:                                    // @foo
.Lfoo$local:
// %bb.0:                               // %entry
        adrp    x8, .Lvar$local
        ldr     w0, [x8, :lo12:.Lvar$local]
        ret
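
The effect of the `.Lvar$local` label can be approximated by hand at the source
level with a hidden alias. This is only a sketch for ELF targets using the
GCC/Clang alias and visibility attributes; var_local is a hypothetical name
and is not what the compiler generates.

```c
int var = 0;

/* Hidden alias for var: references through it cannot be preempted by
   another module, so a direct (PC-relative) access is possible even
   with -fpic. */
extern int var_local __attribute__((alias("var"), visibility("hidden")));

int foo(void) { return var_local; }
```

Compiling this with -fpic -O2 -S and comparing against a version of foo that
returns var directly shows the difference between a GOT load and a direct
access on targets without linker GOT optimization.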

[Bug c/100593] New: [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-05-13 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593

            Bug ID: 100593
           Summary: [ELF] -fno-pic: Use GOT to take address of an external
                    default visibility function
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: i at maskray dot me
  Target Milestone: ---

Most ELF targets use an absolute relocation (e.g. R_X86_64_32) to take the
address of a default visibility non-definition function declaration.
The absolute relocation can cause a canonical PLT entry (st_shndx=0,
st_value!=0; The term is a parlance within a few LLD developers, but not
broadly adopted).
If the defining DSO is linked with -Bsymbolic-functions (or -Bsymbolic), the
addresses taken within the DSO and outside of the DSO will be different.
Since C++ requires uniqueness of the address, this violates the language
standard.
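
The "st_shndx=0, st_value!=0" condition can be sketched as a predicate over a
dynamic symbol table entry. This assumes the <elf.h> header found on Linux
systems; is_canonical_plt_entry is a hypothetical helper name, and the STT_FUNC
check is an added assumption to restrict the test to function symbols.

```c
#include <elf.h>

/* Returns 1 if a .dynsym entry looks like a canonical PLT entry: an
   undefined function symbol that nevertheless carries an address. */
int is_canonical_plt_entry(const Elf64_Sym *sym) {
  return ELF64_ST_TYPE(sym->st_info) == STT_FUNC &&
         sym->st_shndx == SHN_UNDEF &&
         sym->st_value != 0;
}
```

Such entries can be looked for in an executable's dynamic symbol table (for
example in `readelf --dyn-syms` output: Ndx UND with a non-zero Value).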

Outside of the GNU ELF world, many dynamic linking implementations have shifted
to a direct binding and non-interposition by default world.
We have rants from people complaining about shared object performance.
(e.g.
https://lore.kernel.org/lkml/CAHk-=whs8QZf3YnifdLv57+FhBi5_WeNTG1B-suOES=rcus...@mail.gmail.com/
"Re: Very slow clang kernel config .."
https://www.facebook.com/dan.colascione/posts/10107358290728348 "Python is 1.3x
faster when compiled in a way that re-examines shitty technical decisions from
the 1990s.")
I believe ld -Bsymbolic-functions can materialize most of the savings other
implementations provide, without introducing complex things to ELF.
However, since -Bsymbolic-functions doesn't play well with -fno-pic's canonical
PLT entries, we should fix -fno-pic.

Converting a direct access to a GOT access for a function symbol cannot be in a
performance critical path,
so let's just do it.
Static linking is happy, too - the linker can either optimize out the GOT
(x86-64 GOTPCRELX, PPC64 TOC) or prefill the GOT entry with
a constant.

Once -fno-pic has the sane behavior (GOT by default), more and more shared
objects can be optionally built with -Bsymbolic-functions -
if they don't intend to support interposition, while still being compatible
with -fno-pic executables.

How effective is -Bsymbolic-functions? As a data point, my x86_64 Linux kernel
defconfig build with -Bsymbolic-functions linked Clang is 15% faster.
(83% JUMP_SLOT relocations are eliminated!)

% cat a.c
extern void fun();
void *get() { return (void*)fun; }

% gcc -fno-pic -S a.c -O2 -o -
get:
.LFB0:
        .cfi_startproc
        movl    $fun, %eax
        ret
% aarch64-linux-gnu-gcc -fno-pic -S a.c -O2 -o -
...
        adrp    x0, fun
        add     x0, x0, :lo12:fun

# good, ppc64 elfv2 always uses TOC
% powerpc64le-linux-gnu-gcc -fno-pic -S a.c -O2 -o -
...
addis 3,2,.LC0@toc@ha
ld 3,.LC0@toc@l(3)

[Bug c/100483] Extend -fno-semantic-interposition to global variables

2021-05-13 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483

--- Comment #1 from Fangrui Song  ---
Another request is a new option: -fno-semantic-interposition-function. With
this option, we only assume functions cannot be interposed.
-fno-semantic-interposition assumes both functions and variables cannot be
interposed.

[Bug c/100618] New: Add a -fno-semantic-interposition variant which allows variable interposition

2021-05-15 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100618

            Bug ID: 100618
           Summary: Add a -fno-semantic-interposition variant which allows
                    variable interposition
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: i at maskray dot me
  Target Milestone: ---

Extracted from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483

The documentation says -fno-semantic-interposition applies to variables.

Having an option which only applies to external linkage function definitions will
be useful. Assuming no-variable-interposition is unfortunately incompatible
with the plethora of copy relocations: -fno-pic emits direct access relocations
referencing a global variable. If the global variable turns out to be defined
in a shared object, there will be a copy relocation in the executable. The
object the shared object sees and the executable sees will be different.

See
https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic#the-last-alliance-of-elf-and-men
for more context.

[Bug c/100618] Add a -fno-semantic-interposition variant which allows variable interposition

2021-05-15 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100618

--- Comment #1 from Fangrui Song  ---
Perhaps

-fsemantic-interposition=function,variable (default -fpic/-fPIC)
-fsemantic-interposition=variable   (compatible with copy relocations but
enable function optimizations)
-fsemantic-interposition=  (alias: -fno-semantic-interposition)

?

[Bug c/100483] Extend -fno-semantic-interposition to global variables

2021-05-16 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483

Fangrui Song  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #5 from Fangrui Song  ---
(In reply to Jan Hubicka from comment #3)

Thanks for the clarification. I misinterpreted the documentation.
Then it seems that -fno-semantic-interposition is a very safe optimization for
distributions to default to. Closing as intended.

I will try changing Clang to drop the local aliases for variables.
It is tricky not to use local aliases for address taking of functions, though.
Fortunately, this will not cause any problems once we do
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593

(In reply to H.J. Lu from comment #4)

I would much appreciate it if you could fix some of the copy relocation and
canonical PLT entry issues, so that it becomes easier for distributions to
switch to something like a default -Wl,-Bsymbolic-global-functions.
What does ld.so do for the proposed GNU_PROPERTY_SINGLE_GLOBAL_DEFINITION? Does
it apply to STB_GLOBAL or also STB_WEAK?
Does it add all definitions to a global namespace to enforce single definition
for every candidate?
If it does the additional check, this would further slow down dynamic linking.

And I believe we should do the function oriented non-interposition-by-default
plan, which will not be blocked by copy relocation elimination.
(https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic#copy-relocations)

[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-05-16 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593

--- Comment #2 from Fangrui Song  ---
(In reply to Alexander Monakov from comment #1)
> It is not necessary to change -fno-pic code generation to gain most of the
> -Bsymbolic benefit

It is necessary, otherwise the function address taken from the
-Bsymbolic/-Bsymbolic-functions/-Bsymbolic-global-functions shared object may
be different from the address taken from the -fno-pic code.

The ELF hack is called canonical PLT entry, similar to copy relocations.

> as you say, the most important point is to avoid jumping
> via PLT trampolines (or, with -fno-plt, GOT loads) for function calls, so
> the linker could do -Bsymbolic relaxation for sites where address doesn't
> matter (calls and jumps) while keeping a dynamic relocation for address
> loads? Under some new option of course, like -Bsymbolic-plt. Right?

There are two points: (1) R_*_JUMP_SLOT symbol lookup cost (2) whether call
sites get penalized by the PLT indirection.

-fno-pic code must use GOT (instead of an absolute relocation) for default
visibility external function access to be compatible with a
-Bsymbolic/-Bsymbolic-functions/-Bsymbolic-global-functions shared object.

[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-05-17 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593

--- Comment #4 from Fangrui Song  ---
(In reply to Alexander Monakov from comment #3)
> I understand what you're saying, but it seems we're talking past each other.
> 
> I agree that if a library is linked with any -Bsymbolic* flag, the main
> executable is at risk of broken address uniqueness unless it uses GOT
> indirection.
> 
> I am saying that if the library was linked with a more restrictive variant
> of -Bsymbolic (that I called -Bsymbolic-plt), it would still get most the
> benefit of -Bsymbolic, while remaining compatible with unmodified
> executables.
> 
> Would you agree?

You misunderstand this. Emitting GOT-generating relocation in -fno-pic mode is
the only way to avoid canonical PLT entry, if the function turns out to be
defined in a shared object. No -Bsymbolic variant can make this compatible.

Our goal is to eliminate symbol lookup for the function definition in the
shared object. We must eliminate symbolic dynamic relocations, i.e. no
JUMP_SLOT, no GLOB_DAT, no R_X86_64_64. The linker must set an address in the
shared object and bind references to that address. In many programs (not
long-running, not all code paths are exercised), the symbol lookup may cost
more than the PLT indirection, given the sheer amount of symbol lookups.

Now a -fno-pic program uses an absolute/PC-relative relocation => the linker
must set an address in the executable's address space as well. The traditional
ELF hack (st_value!=0, st_shndx=0) achieves this and let the shared object
symbol reference bind to the executable definition. Note that we have
explicitly eliminated symbol lookup for the defining shared object so the
pointer equality cannot be satisfied at all.

[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-05-17 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593

--- Comment #6 from Fangrui Song  ---
(In reply to Alexander Monakov from comment #5)
> Hm, I still don't think I'm misunderstanding what you're saying. I'm
> familiar with the ELF standard (and FWIW I have read your blog posts on
> related matters). I am responding to this sentiment from the opening comment:
> 
> > I believe ld -Bsymbolic-functions can materialize most of the savings other
> > implementations provide, without introducing complex things to ELF.
> > However, since -Bsymbolic-functions doesn't play well with -fno-pic's
> > canonical PLT entries, we should fix -fno-pic.
> 
> I am saying that fixing -fno-pic is not the only possible way forward.
> Rather, a restricted -Bsymbolic-functions that relaxes relocations that are
> not address-significant allows to still get some (but not all) of the
> benefits for unchanged -fno-pic executables.

You are right. A pure linker approach is possible. However, I think the
approach is inelegant, because the linker would have different preemptibility
ideas on different relocation types and (as you said) indirect calls like
vtable definitions are not optimized.

Let's say the proposed linker option for shared objects is -Bsymbolic-plt.
The discussion below focuses on default visibility definitions which would
otherwise be preemptible.

Let's categorize relocation types first.

PLT-generating: R_X86_64_PLT32
GOT-generating: R_X86_64_GOTPCREL, R_X86_64_GOTPCRELX, R_X86_64_REX_GOTPCRELX
absolute (symbolic): R_X86_64_64

There are three choices.

(a) If all relocation types are PLT-generating, bind branch targets directly
and suppress the PLT entry.
If GOT-generating/absolute relocations are present, don't change behaviors.
This choice is less effective for some otherwise address-insignificant
functions, e.g. non-vague-linkage virtual functions.

(b) If all relocation types are R_X86_64_PLT32 or GOT-generating, bind branch
targets directly and suppress the PLT entry.
If GOT-generating relocations are present, produce a GOT entry and an
associated R_X86_64_GLOB_DAT.
If absolute relocations are present, don't change behaviors.

(c) Always bind branch targets directly and suppress the PLT entry.
If GOT-generating relocations are present, produce a GOT entry and an
associated R_X86_64_GLOB_DAT.
If absolute relocations are present, produce outstanding dynamic relocations of
the same type.
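
The three choices can be summarized as a small decision function over the set
of relocation kinds that reference a symbol. This is a sketch of the rules as
stated above, not an implementation from any linker; all names are
hypothetical.

```c
/* Kinds of relocations referencing one default visibility definition. */
enum { REL_PLT = 1, REL_GOT = 2, REL_ABS = 4 };

enum choice { CHOICE_A, CHOICE_B, CHOICE_C };

/* Returns 1 if branch targets are bound directly and the PLT entry is
   suppressed under the given choice, 0 if behavior is left unchanged. */
int binds_branches_directly(enum choice c, unsigned kinds) {
  switch (c) {
  case CHOICE_A: /* only when nothing but PLT-generating relocations exist */
    return (kinds & (REL_GOT | REL_ABS)) == 0;
  case CHOICE_B: /* GOT-generating relocations are tolerated, absolute not */
    return (kinds & REL_ABS) == 0;
  case CHOICE_C: /* always */
    return 1;
  }
  return 0;
}
```

For a symbol referenced by both R_X86_64_PLT32 and R_X86_64_GOTPCREL, the
function returns 0 for (a) and 1 for (b) and (c), which is where the choices
start to differ for address-taken functions.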


> > You misunderstand this. Emitting GOT-generating relocation in -fno-pic mode
> > is the only way to avoid canonical PLT entry, if the function turns out to
> > be defined in a shared object. No -Bsymbolic variant can make this
> > compatible.
> 
> Well, if you frame the goal as "eliminate canonical PLT entries", then yes,
> but that in itself surely is not the end goal? The end goals are reducing
> startup time (which my idea helps only partially since it may bind direct
> calls but not e.g. vtable definitions) and runtime overheads (where again my
> proposal is weaker but not significantly so, assuming address loads are
> rarely on hot paths).

Yes, the end goal is to reduce startup time and bind call targets directly if
feasible.
Yes, -Bsymbolic-plt can help the goal partially.

> 
> To clarify once more. I am not outright rejecting the idea in your opening
> comment. I am saying that there potentially is a lighter-weight alternative,
> which may be implementable purely in the linker, and still gets most of the
> benefit you're promoting (like in your Clang example). Which is nice,
> because it can be rolled out sooner, individual libraries/distros/users can
> opt-in and experiment as they like, etc.

Such a -Bsymbolic-plt can achieve some goals.
But given that the function pointer equality problems are usually benign
(-fno-pic is relatively uncommon in many areas; making use of such pointer
equality is not a common practice),
I'd hope we just don't add that intermediate linker option.

[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-05-18 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593

--- Comment #8 from Fangrui Song  ---
Seems that -fno-plt -fno-pic does have the required properties.
A side effect is that all external calls use the `call *f@GOTPCREL(%rip)`
(x86-64) or `call *f@GOT` (x86-32) form.

The instruction is one byte longer. (Calling a function is a common case.
Taking the address in a non-vtable case is uncommon. So I'd rather punish the
uncommon address taking).
When the linker notices that the branch target is defined in the executable, it
can optimize out the GOT to use an addr32 prefix instead.
(gold and ld.lld haven't implemented the optimization for 32-bit)

__attribute__((noplt))
int f();
void h() {}

void *g()
{
  h();   // call h
  f();   // call *f@GOTPCREL(%rip)
  return f;  // movq f@GOTPCREL(%rip), %rax
}

[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-05-26 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593

--- Comment #9 from Fangrui Song  ---
I have a patch to implement this in Clang.

It'd be good to have a name even if GCC wants to postpone the implementation
for now. How about -fdirect-access-external-function &
-fno-direct-access-external-function ?  It is similar to the feature request
-fdirect-access-external-data

[Bug c/100618] Add a -fno-semantic-interposition variant which allows variable interposition

2021-06-04 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100618

Fangrui Song  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Fangrui Song  ---
Clang 13 -fno-semantic-interposition will be mostly consistent with GCC
-fno-semantic-interposition. It looks like a misunderstanding on my side.

[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-06-04 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593

--- Comment #11 from Fangrui Song  ---
(In reply to Alexander Monakov from comment #10)
> Is there something wrong or undesirable with making this under -fno-plt (or
> the noplt attribute as in your example)?
> 
> (after all, it is a kind of PLT-avoidance transformation, just for
> addressing rather than direct calling/jumping)

-fno-plt is generally undesired due to longer branch instructions and the
performance loss when the branch target is defined in the exe/so and the
linker is gold/ld.lld (they cannot optimize jmp *got to jmp target).

For non-x86, -fno-plt doesn't exist at all. If implemented, it would require
many more instructions, which is certainly undesirable.

So -fno-plt can never be a default.

Using GOT to take the address of an external function in -fno-pic is just a
better default. I want the behavior to become the default behavior, so it
should not be under -fno-plt.

[Bug middle-end/100593] [ELF] -fno-pic: Use GOT to take address of an external default visibility function

2021-06-06 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593

--- Comment #13 from Fangrui Song  ---
(In reply to H.J. Lu from comment #12)
> We should handle it in the whole Linux software stack:
> 
> https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/8
> 
> not just in compiler.

It is great that you have the desire to fix these fundamental issues :)

I think a GNU_PROPERTY marker is over-engineering. See
https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/8 for details. Many things
(including this and PR98112) can be changed today. When
-fno-direct-access-external-data/-fno-direct-access-external-function as
-fno-pic default becomes prevailing, make ld warn by default for
R_*_COPY/canonical PLT entries. After a while (say one or two years), let glibc
ld.so warn for R_*_COPY/canonical PLT entries.

[Bug driver/100937] New: configure: Add --enable-default-semantic-interposition

2021-06-06 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100937

            Bug ID: 100937
           Summary: configure: Add --enable-default-semantic-interposition
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: driver
          Assignee: unassigned at gcc dot gnu.org
          Reporter: i at maskray dot me
  Target Milestone: ---

Add a configure option --enable-default-semantic-interposition to customize
-f(no-)semantic-interposition default.

The suppression of interprocedural optimizations and inlining for default
visibility non-vague-linkage function definitions is the biggest difference
between -fPIE and -fPIC.

Distributions may want to enable default -fno-semantic-interposition to
reclaim the lost performance from -fPIC (e.g. CPython is said to be 27% faster;
Clang is 3% faster).

[Bug driver/100937] configure: Add --enable-default-semantic-interposition

2021-06-06 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100937

Fangrui Song  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|WONTFIX                     |---
             Status|RESOLVED                    |UNCONFIRMED

--- Comment #2 from Fangrui Song  ---
How is it a portability problem?

clang -fpic has always been allowing interprocedural optimizations for
non-vague-linkage function definitions. FreeBSD uses clang and software works
with no problem.




For a vague-linkage function definition, a call site in the same
translation unit may inline the callee. Whether
-fno-semantic-interposition is enabled/disabled has no effect.

For a non-vague-linkage function definition, by default
(-fsemantic-interposition) the -fpic mode does not allow a call site
in the same translation unit to inline the callee or perform other
interprocedural optimizations.
-fno-semantic-interposition re-enables interprocedural optimizations.

If a caller inlines a callee, using LD_PRELOAD to interpose the callee
will not affect the caller. But many other LD_PRELOAD usages still
work.
We consider the small LD_PRELOAD limitation a good trade-off for the speedup.
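
A minimal sketch of the non-vague-linkage case, to be compiled with
`-fpic -O2 -S` with and without -fno-semantic-interposition and compared. The
function names are hypothetical, and the comments describe the behavior
explained above rather than verified output.

```c
/* Default visibility, non-vague linkage: preemptible when linked into a
   -fpic shared object. */
int callee(void) { return 42; }

/* By default (-fsemantic-interposition) the call below is kept, so that an
   interposed callee would be honored. With -fno-semantic-interposition the
   compiler may inline callee here. */
int caller(void) { return callee() + 1; }
```

If caller inlines callee, an LD_PRELOAD library that defines callee no longer
changes what caller returns, which is the limitation referred to above.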

[Bug driver/100937] configure: Add --enable-default-semantic-interposition

2021-06-09 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100937

--- Comment #6 from Fangrui Song  ---
Then can you add a -fvisibility=protected variant which only applies to
non-weak defined functions?

Two issues need to be fixed:

(1): https://sourceware.org/bugzilla/show_bug.cgi?id=27973

__attribute__((visibility("protected"))) void *foo () {
  return (void *)foo;
}
% gcc -fpic -shared -fuse-ld=bfd a.s
/usr/bin/ld.bfd: /tmp/ccWPJCLw.o: relocation R_X86_64_PC32 against protected
symbol `foo' can not be used when making a shared object
/usr/bin/ld.bfd: final link failed: bad value
collect2: error: ld returned 1 exit status

(2): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593
[ELF] -fno-pic: Use GOT to take address of an external default visibility
function 


Distributions that want fast C++ non-vague-linkage functions can enable this
option.

[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation

2021-06-14 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223

Fangrui Song  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |i at maskray dot me

--- Comment #7 from Fangrui Song  ---
Some notes.

{gcc,clang} -fsanitize-coverage={trace-pc,trace-cmp} is another coverage
feature. It uses no_sanitize_coverage instead of no_instrument_function. The
GCC support for no_sanitize_coverage is very new (by Martin, on 2021-05-25).
(In Clang, the feature has more modes, e.g. you can control func/bb/edge.)

The Linux kernel use case (include/linux/compiler_types.h) uses 'noinline', so
inlining is not a concern.

/* Section for code which can't be instrumented at all */
#define noinstr                                                         \
        noinline notrace __attribute((__section__(".noinstr.text")))   \
        __no_kcsan __no_sanitize_address

Clang supports another filtering mechanism, -fprofile-list=
(https://reviews.llvm.org/D94820). But the kernel use case seems to prefer a
function attribute.

[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation

2021-06-15 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223

--- Comment #8 from Fangrui Song  ---
I am thinking of __attribute__((no_profile)).

In Clang,
-fprofile-generate(-fcs-profile-generate)/-fprofile-instr-generate/-fprofile-arcs
are all different. It would make sense to have an attribute disabling all such
profiling-related features.

I am not sure an umbrella __attribute__((no_instrument_function)) is suitable.
The Linux kernel wanting noinstr to exclude -fprofile-* is a very specific
characteristic, not suitable for other applications.

[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation

2021-06-21 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223

--- Comment #14 from Fangrui Song  ---
(In reply to Martin Liška from comment #13)
> What's likely missing is that the attribute should prevent inlining. I'm
> going to test how it behaves right now. Then, the issue can be closed.

It's not clear to me that no_profile_instrument_function should prevent
inlining. I'll argue that attributes should be orthogonal.
https://lists.llvm.org/pipermail/llvm-dev/2021-April/150062.html
https://reviews.llvm.org/D101011#271

If the user wants to suppress inlining, add noinline.

Can a no_profile_instrument_function function be inlined to another
no_profile_instrument_function function? Why not.

Can a no_profile_instrument_function function be inlined into a function
without the attribute? This may be controversial but I'd argue that it can. GCC
no_stack_protector behaves this way. no_profile_instrument_function can mean
that the user does not want profiling when the function is called with its entity,
not via another entity.
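
To make the orthogonality argument concrete, here is a minimal sketch. The
function names are hypothetical; only the attribute spellings come from this
thread. One helper opts out of profiling only, the other additionally opts out
of inlining, as two separate requests:

```c
/* Sketch only: helper names are hypothetical. A compiler that does not know
   no_profile_instrument_function will warn and ignore the attribute. */

/* Opts out of profile instrumentation; inlining is left to the optimizer. */
__attribute__((no_profile_instrument_function))
int add_one(int x) { return x + 1; }

/* Opts out of instrumentation AND of inlining, stated independently. */
__attribute__((no_profile_instrument_function, noinline))
int twice(int x) { return x * 2; }

/* An ordinary caller that may be instrumented. */
int caller(int x) { return add_one(x) + twice(x); }
```

Under the view argued here, add_one may still be inlined into caller, while
twice may not, because only twice asked for that.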

[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation

2021-06-23 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223

--- Comment #18 from Fangrui Song  ---
(In reply to Nick Desaulniers from comment #15)
> (In reply to Fangrui Song from comment #14)
> > Can a no_profile_instrument_function function be inlined into a function
> > without the attribute? This may be controversial but I'd argue that it can.
> > GCC no_stack_protector behaves this way. no_profile_instrument_function can
> > mean that user does not want profiling when the function is called with its
> > entity, not via another entity.
> 
> I respectfully but strongly disagree. It's surprising to developers when
> they ask for no stack protector, or no profiling instrumentation, then get
> one anyways.  For long call chains, it's hard for developers to diagnose on
> their own which function they called that missed such function attribute.
> 
> This reminds me of "what color is your function?"
> https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/
> As suddenly a developer would need to verify for a no_* attributed function
> that they only call no_* attributed functions, or add noinline (which is a
> big hammer to all call sites, and games with aliases that have the noinline
> attribute are kind of ridiculous).
> 
> It's less surprising to prevent inline substitution upon function attribute
> mismatch. Then a developer can self diagnose with -Rpass=inline. Either way,
> some form of diagnostics would be helpful for these kinds of issues, and has
> been requested by Android platform developers working on Zygote.
> 
> For no_stack_protector in LLVM, I implemented the rules: upon mismatch,
> prevent inline substitution unless the user specified always_inline.  This
> fixed suspend/resume bugs in x86 Linux kernels when built with LTO.
> 
> Though, I'm happy to revisit that behavior in LLVM; we could add
> 
> #define noinline_for_lto __attribute__((__noinline__))
> 
> then use that in the Linux kernel instead.

Our problem is that a boolean attribute with 1 bit information cannot express
whether a neg attribute function can be inlined into a pos attribute function.

Let's agree to disagree. I don't see why a no_profile_instrument_function
function should suppress inlining into a function without the attribute. For the
use cases where users want to suppress inlining, they can add noinline. What I
worry about is that GCC has now taken a position, and if the LLVM side doesn't
follow, the two will diverge. However, the GCC patch is still in review. I think
a similar topic may need to be raised on the llvm-dev side, as I feel this is the
tip of the iceberg - more attributes can be similarly leveraged. So, how about
an llvm-dev discussion?

[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation

2021-06-23 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223

--- Comment #20 from Fangrui Song  ---
(In reply to Marco Elver from comment #19)

I am ok with "inlining suppression" as an implementation strategy and I agree
that it should be useful. What I strongly object to is "promised inlining
suppression".

For example, if an inlining pass happens after instrumentation, then the
function attribute doesn't necessarily need to suppress inlining. After
instrumentation is done, we can even treat the noprofile attribute as a no-op.

The example applies to the non-LTO case -fsanitize-coverage= . (We don't
actually use the noprofile function attribute for -fsanitize-coverage=, but I
cannot find a better example in LLVM; I think all other noprofile affected
instrumentations happen before the inliner pipeline).

So in the documentation, it can be said that the inlined copy (if any) will not
get instrumentation, but it **should not** say that a noprofile function cannot
be inlined into a function without the attribute.

[Bug gcov-profile/80223] RFE: Exclude functions from profile instrumentation

2021-06-23 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223

--- Comment #21 from Fangrui Song  ---
(In reply to Fangrui Song from comment #20)
> For example, if an inlining pass happens after instrumentation, then the
> function attribute doesn't necessarily need to suppress inlining. After
> instrumentation is done, we can even treat the noprofile attribute as a
> no-op.

Sent too early:)

Amendment: a smart inliner can inline the noprofile callee and then drop
instrumentation code. That will also be an approach which does not break the
"no instrumenting my code" contract. Other approaches can be (probably more
relevant to function specialization/clones): the instrumentation pass can leave
an un-instrumented copy which can be used by a subsequent inliner.

As we can see, all these approaches are much more complex than simply
"suppressing inlining". So I agree that "suppressing inlining" is a good
implementation detail here.

[Bug target/108622] New: x86 -fno-pic: use DW_EH_PE_indirect|DW_EH_PE_pcrel for personality/ttype encoding

2023-01-31 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108622

Bug ID: 108622
   Summary: x86 -fno-pic: use DW_EH_PE_indirect|DW_EH_PE_pcrel for
personality/ttype encoding
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

In .eh_frame and .gcc_except_table, the aarch64 and riscv ports use
DW_EH_PE_indirect|DW_EH_PE_pcrel for both -fno-pic and PIC code to avoid
canonical PLT entry/copy relocation, if the personality and typeinfo objects
are defined in a shared object (common case, libstdc++.so.6 or libc++.so.?).

AIUI there is no drawback other than a negligible size increase.

% g++ -fno-pic -no-pie -fuse-ld=bfd a.cc -o a
% readelf -Wr a | grep COPY
00403db8  00090005 R_X86_64_COPY  00403db8 _ZTIi@CXXABI_1.3 + 0
00403dc8  00080005 R_X86_64_COPY  00403dc8 _ZTIPKc@CXXABI_1.3 + 0
% readelf -W --dyn-syms a | grep __gxx_personality_v
10: 00401060 0 FUNC GLOBAL DEFAULT  UND __gxx_personality_v0@CXXABI_1.3 (2)

% g++ -fpic -no-pie -fuse-ld=bfd a.cc -o a
% readelf -Wr a | grep COPY
% readelf -W --dyn-syms a | grep __gxx_personality_v0
 7:  0 FUNC GLOBAL DEFAULT  UND __gxx_personality_v0@CXXABI_1.3 (2)

Essentially this applies -mno-direct-extern-access unconditionally to -fno-pic,
cleaning up gcc/config/i386/i386.cc:asm_preferred_eh_data_format
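
As a reading aid, the encoding byte in question can be sketched as below. The
constant values are the standard DW_EH_PE_* ones; the helper names are
hypothetical, and the choice of a 4-byte signed value format
(DW_EH_PE_sdata4) is an assumption made for illustration:

```c
/* Standard DW_EH_PE_* pointer-encoding constants (subset). */
enum {
  DW_EH_PE_absptr   = 0x00,
  DW_EH_PE_udata4   = 0x03,
  DW_EH_PE_sdata4   = 0x0b,
  DW_EH_PE_pcrel    = 0x10,
  DW_EH_PE_indirect = 0x80
};

/* Low 4 bits: value format. Bits 4-6: how the value is applied. */
unsigned eh_format(unsigned enc)      { return enc & 0x0f; }
unsigned eh_application(unsigned enc) { return enc & 0x70; }
int eh_is_indirect(unsigned enc)      { return (enc & DW_EH_PE_indirect) != 0; }

/* Assumption: a 4-byte signed offset is used as the value format. */
unsigned proposed_encoding(void) {
  return DW_EH_PE_indirect | DW_EH_PE_pcrel | DW_EH_PE_sdata4;
}
```

With the indirect bit set, the field holds a PC-relative offset to a pointer
slot rather than the address of the symbol itself, so the executable does not
need a copy relocation or a canonical PLT entry for the referenced symbol.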

[Bug target/108622] x86 -fno-pic: use DW_EH_PE_indirect|DW_EH_PE_pcrel for personality/ttype encoding

2023-01-31 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108622

--- Comment #1 from Fangrui Song  ---
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611081.html [PATCH]
x86: Use DW_EH_PE_indirect|DW_EH_PE_pcrel encodings for -fno-pic code

[Bug c++/108761] New: Add option to produce a unique section for non-COMDAT __attribute__((section("foo"))) object

2023-02-10 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108761

Bug ID: 108761
   Summary: Add option to produce a unique section for non-COMDAT
__attribute__((section("foo"))) object
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

% cat a.cc
__attribute__((section("foo"))) void f() {}
__attribute__((section("foo"))) void g() {}
% g++ -c -ffunction-sections a.cc
% readelf -WS a.o | grep foo
  [ 4] foo   PROGBITS 40 0e 00  AX 
0   0  1

There is one section named `foo`, with f and g in it (they do not use COMDAT).
In ld --gc-sections, f and g are retained or discarded as a unit.

If we place f and g in two `foo` sections, --gc-sections can discard them
separately.
(We need assembler syntax `.section foo,"ax",@progbits,unique,1` which requires
binutils>=2.35.)


https://reviews.llvm.org/D143745 proposes to add such a feature with an option
name like `-ffunction-sections[=(default,all)]`.
I feel that the option argument is non-intuitive but cannot come up with a
better name right now.
I raise this feature request to seek feedback from GCC :)

[Bug c++/108761] Add option to produce a unique section for non-COMDAT __attribute__((section("foo"))) object

2023-02-12 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108761

--- Comment #3 from Fangrui Song  ---
New syntax setting the flags will be useful. Also, currently there is no way to
customize the section type.

[Bug c/108978] New: Add __builtin_FILE_NAME() which behaves like the __FILE_NAME__ macro

2023-02-28 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108978

Bug ID: 108978
   Summary: Add __builtin_FILE_NAME() which behaves like the
__FILE_NAME__ macro
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

PR c/42579 added __FILE_NAME__. On the Clang side someone is proposing
__builtin_FILE_NAME (https://reviews.llvm.org/D144878) a la __builtin_FILE .
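
For readers unfamiliar with the macro, a rough sketch of what "file name
without the directory part" means. The helper names are hypothetical, and the
fallback only approximates __FILE_NAME__ by computing the last path component
of __FILE__ at run time:

```c
#include <string.h>

/* Hypothetical helper: return the last '/'-separated component of `path`. */
const char *last_component(const char *path) {
  const char *slash = strrchr(path, '/');
  return slash ? slash + 1 : path;
}

/* Use __FILE_NAME__ where the compiler provides it; otherwise derive a
   similar string from __FILE__. */
const char *this_file_name(void) {
#ifdef __FILE_NAME__
  return __FILE_NAME__;
#else
  return last_component(__FILE__);
#endif
}
```

A builtin form would presumably relate to __FILE_NAME__ the way
__builtin_FILE relates to __FILE__.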

[Bug target/99888] Add powerpc ELFv2 support for -fpatchable-function-entry*

2022-08-11 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99888

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #5 from Fangrui Song  ---
* There is a restriction on the number of instructions between the function
label and the .localentry directive.
* For -fpatchable-function-entry=N[,M], M nops must precede the function label.

On aarch64/x86/etc, these nops are consecutive. Personally I think this
condition can be lifted for PowerPC ELFv2. The runtime library will need to
check st_other or do some instruction inspection, which may be fine.


nop
nop
nop
foo:
.LCF0:
.cfi_startproc
addis 2,12,.TOC.-.LCF0@ha
addi 2,2,.TOC.-.LCF0@l
.localentry foo,.-foo
nop
nop
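
The listing above looks consistent with N=5, M=3 (three nops before the label,
two after .localentry). A minimal sketch of requesting the same split per
function with the attribute form; the function body is hypothetical, and how
the remaining nops are placed on PowerPC ELFv2 is exactly what this bug
discusses:

```c
/* Sketch: 5 nops in total, 3 of them before the function label. This is the
   attribute counterpart of -fpatchable-function-entry=5,3. */
__attribute__((patchable_function_entry(5, 3)))
int foo(int x) { return x + 1; }
```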

[Bug driver/106897] New: driver: support -gz=zstd

2022-09-09 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106897

Bug ID: 106897
   Summary: driver: support -gz=zstd
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

Translate -gz=zstd to --compress-debug-sections=zstd for as and ld. This
requires that binutils supports zstd, feature request:
https://sourceware.org/bugzilla/show_bug.cgi?id=29397
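
The requested translation can be pictured as a simple mapping. This is a
hypothetical helper for illustration, not GCC's actual spec machinery; in
particular, treating a bare -gz as zlib is an assumption:

```c
#include <string.h>

/* Hypothetical helper: map a -gz option to the flag that would be forwarded
   to as and ld. Returns NULL for unrecognized input.
   Assumption: a bare -gz is treated as zlib. */
const char *translate_gz(const char *opt) {
  if (strcmp(opt, "-gz") == 0 || strcmp(opt, "-gz=zlib") == 0)
    return "--compress-debug-sections=zlib";
  if (strcmp(opt, "-gz=zstd") == 0)
    return "--compress-debug-sections=zstd";
  if (strcmp(opt, "-gz=none") == 0)
    return "--compress-debug-sections=none";
  return NULL;
}
```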

[Bug driver/106897] driver: support -gz=zstd

2022-09-10 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106897

--- Comment #4 from Fangrui Song  ---
Yes, the change will be straightforward, basically the files touched by the
pending https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597586.html
("[PATCH] Remove legacy -gz=zlib-gnu").

I sent it because I knew that we would need a new compression format, and some
cleanup would make the logic more maintainable.

[Bug driver/93645] Support Clang 12 --ld-path=

2021-12-22 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645

--- Comment #13 from Fangrui Song  ---
(In reply to Martin Liška from comment #12)
> (In reply to Fangrui Song from comment #11)
> > (In reply to Martin Liška from comment #10)
> > > I replied here:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573823.html
> > 
> > There are people wanting to use mold
> > https://www.reddit.com/r/rust/comments/rhcnzt/mold_a_modern_linker_10_release/
> 
> I agree that's unfortunate. Note I'm having a patch that adds -fuse-ld=mold:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=759cdbb29dbe8fc80ba5c1f113a015cafe9eb69c
> 
> I can try suggesting that to the community for GCC 12 (and maybe backport
> that).
> Are you interested?

I think it may be useful to simply allow -fuse-ld=word (`word` cannot include a
separator).

If that may be troublesome, having -fuse-ld=mold in GCC 12 is still nice.

--ld-path is occasionally useful, but I can accept that GCC declines it.

> Note the linker is very interesting, but it lacks LTO support..

Right...

[Bug driver/93645] Support Clang 12 --ld-path=

2021-12-28 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645

--- Comment #15 from Fangrui Song  ---
-- is definitely rare, but not non-existent.
In GCC, there is {-,--}specs.
In Clang, there are --cuda-path, --ptxas-path, --hip-path, --classpath, etc.
(In reply to Martin Liška from comment #14)
> > 
> > I think it may be useful to simply allow -fuse-ld=word (`word` cannot
> > include a separator).
> 
> Sure, but Jakub had some concerns:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573833.html

I do not see an objection to -fuse-ld=word.

For --ld-path, -- is definitely rare, but not non-existent.
In GCC, there are {-,--}specs --sysroot.
In Clang, there are --cuda-path, --ptxas-path, --hip-path, --classpath, etc.

-fuse-ld= users mostly care about whether another linker can build their
programs, not whether the option can bootstrap GCC. I actually think ld.lld is
quite sufficient for bootstrapping GCC, but if there are edge-case extensions
which are not supported, ld.lld developers may not want to burden the project
with more obscure options...

> > 
> > If that may be troublesome, having -fuse-ld=mold in GCC 12 is still nice.
> > 
> 
> I've just done that:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587426.html

Thanks

[Bug target/100896] --enable-initfini-array should be enabled for cross compiler to Linux

2021-11-05 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100896

Fangrui Song  changed:

   What|Removed |Added

 CC||i at maskray dot me

--- Comment #4 from Fangrui Song  ---
In gcc/acinclude.m4:285, if cross-compiling, (`if test "x${build}" =
"x${target}" && test "x${build}" = "x${host}"; then `) will not be taken.

  else
case "${target}" in
  aarch64*-linux-gnu*)
# AArch64 postdates glibc support for .init_array/.fini_array,
# so we don't need the preprocessor test above.
gcc_cv_initfini_array=yes
;;

  *)
AC_MSG_CHECKING(cross compile... guessing)
gcc_cv_initfini_array=no
;;
esac
  fi])

On non-aarch64, `gcc_cv_initfini_array=no` will run,
`HAVE_INITFINI_ARRAY_SUPPORT` will therefore be 0.

  compilers/powerpc64le-linux-gnu/gcc/gcc/auto-host.h
  1578:#define HAVE_INITFINI_ARRAY_SUPPORT 0

  compilers/powerpc64le-linux-gnu/gcc/gcc/config.status
  1257:D["HAVE_INITFINI_ARRAY_SUPPORT"]=" 0"

  compilers/powerpc64le-linux-gnu/gcc/gcc/config.log
  6900:| #define HAVE_INITFINI_ARRAY_SUPPORT 0
  7169:| #define HAVE_INITFINI_ARRAY_SUPPORT 0
  7484:| #define HAVE_INITFINI_ARRAY_SUPPORT 0
  8557:#define HAVE_INITFINI_ARRAY_SUPPORT 0

The built GCC will use the legacy .ctors

% many=/tmp/glibc-many
% cat a.c
__attribute__ ((constructor)) static int foo (void) { return 42; }
% /tmp/glibc-many/install/compilers/powerpc64le-linux-gnu/bin/powerpc64le-glibc-linux-gnu-gcc -c a.c && readelf -WS a.o | egrep 'ctors|init_array'
  [ 4] .ctorsPROGBITS 70 08 00  WA 
0   0  8
  [ 5] .rela.ctors   RELA 000218 18 18   I
10   4  8
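
For reference, the observable effect of a constructor should be the same
whichever section records it. A minimal sketch with hypothetical names that
only checks that the constructor ran before ordinary code:

```c
static int ctor_ran;

/* Runs before main. The compiler records its address in .init_array or, on a
   toolchain configured like the one above, in the legacy .ctors section. */
__attribute__((constructor)) static void mark(void) { ctor_ran = 1; }

/* Hypothetical accessor so other code can observe the effect. */
int constructor_has_run(void) { return ctor_ran; }
```

The difference discussed in this bug is therefore about the section layout in
the object file, not about what user code observes.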


---

Noticed the problem when using a GCC built by scripts/build-many-glibcs.py:

(cd ~/Dev/glibc)
scripts/build-many-glibcs.py /tmp/glibc-many checkout --shallow
scripts/build-many-glibcs.py /tmp/glibc-many host-libraries
scripts/build-many-glibcs.py /tmp/glibc-many compilers powerpc64le-linux-gnu --keep all

[Bug target/100896] --enable-initfini-array should be enabled for cross compiler to Linux

2021-11-05 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100896

--- Comment #5 from Fangrui Song  ---
Ah, ok, my /tmp/glibc-many/src/gcc is at releases/gcc-11 while the fix is for
12.0?

Anyway, you may want to clean up  gcc/acinclude.m4

[Bug driver/100937] configure: Add --enable-default-semantic-interposition

2021-11-22 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100937

--- Comment #11 from Fangrui Song  ---
To enable interposition on Mach-O, one needs a non-default configuration like:
ld -interposable, DYLD_FORCE_FLAT_NAMESPACE or
__attribute__((section("__DATA,__interpose"))).
On PE/COFF, such interposition just doesn't exist.

Having an option for -fno-semantic-interposition will actually improve
portability.

(The -fno-semantic-interposition thing is probably the biggest performance gap
between gcc -fpic and clang -fpic.)
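
A minimal sketch of where that gap comes from (hypothetical function names).
With gcc -fpic -O2, the call in use_value is kept as a real call because
another definition of get_value could preempt this one at run time; with
-fno-semantic-interposition GCC is allowed to inline it:

```c
/* Default visibility: in a shared object, another definition of get_value
   may preempt this one at run time under ELF interposition rules. */
int get_value(void) { return 42; }

/* gcc -fpic -O2 keeps this as an interposable call;
   -fno-semantic-interposition permits inlining get_value here. */
int use_value(void) { return get_value() + 1; }
```

Comparing the output of `gcc -fpic -O2 -S` with and without
-fno-semantic-interposition on this file should show the difference.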

As I said previously, -fvisibility=protected cannot be used because protected
visibility is very broken in the GCC/GNU ld system and there is no signal it
will be fixed anytime soon:
https://maskray.me/blog/2021-01-09-copy-relocations-canonical-plt-entries-and-protected#summary

[Bug driver/103398] New: configure: Enable --enable-default-pie by default for Linux

2021-11-23 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103398

Bug ID: 103398
   Summary: configure: Enable --enable-default-pie by default for
Linux
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

Many Linux distros configure GCC with --enable-default-pie (at least
Arch/Debian/Fedora/Gentoo/Ubuntu). I think it makes sense to default to
--enable-default-pie for Linux.

[Bug driver/103398] configure: Enable --enable-default-pie by default for Linux

2021-11-23 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103398

--- Comment #2 from Fangrui Song  ---
I want to switch the default because:

* It seems to me that every Linux distro uses --enable-default-pie GCC. I use
"many", but it is likely "most" at this point (2021).
* When a user builds GCC on Linux, the generated GCC does not default to PIE.
This almost certainly does not match the behavior of their host GCC.

On the libc-alpha mailing list, I have seen contributors waste time because
they don't notice that a GCC built by scripts/build-many-glibcs.py uses the
implicit --disable-default-pie, which behaves differently from the host GCC or
a cross compiler provided by system packages.

[Bug driver/93645] Support Clang 12 --ld-path=

2021-12-22 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93645

--- Comment #11 from Fangrui Song  ---
(In reply to Martin Liška from comment #10)
> I replied here:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573823.html

There are people wanting to use mold
https://www.reddit.com/r/rust/comments/rhcnzt/mold_a_modern_linker_10_release/

"clang does support it but gcc: --ld-path patch has been declined by GCC
maintainers, instead they advise to use a workaround: create directory
, then ln -s  /ld, and then pass -B
(-B tells GCC to look for ld in specified location)."

:(

[Bug c++/102168] New: -Wnon-virtual-dtor shouldn't fire for protected dtor in a class with a friend declaration

2021-09-01 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102168

Bug ID: 102168
   Summary: -Wnon-virtual-dtor shouldn't fire for protected dtor
in a class with a friend declaration
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

class base;

class b {
public:
  void del(base *x);
};

class base {
  friend b;
public:
  virtual void anchor();
protected:
  virtual  // why is this needed?
  ~base() = default;
};

class derived final : public base { 
public: 
  ~derived() {}
};

void b::del(base *x) {
  delete x;
}

% g++ -c -Wnon-virtual-dtor a.cc
a.cc:8:7: warning: ‘class base’ has virtual functions and accessible
non-virtual destructor [-Wnon-virtual-dtor]
8 | class base {
  |   ^~~~
a.cc:17:7: warning: base class ‘class base’ has accessible non-virtual
destructor [-Wnon-virtual-dtor]
   17 | class derived final : public base {
  |   ^~~


This diagnostic is due to a friend declaration because technically the friend
can invoke the dtor.

However, this seems a bit dumb (https://reviews.llvm.org/rG4852c770fe87).
It just checks for the existence of a friend, without checking whether the dtor
is actually used.
Checking whether the dtor is actually needed requires dataflow analysis (like
frontend devirtualization), which is apparently too heavy and may not fit into
a compiler diagnostic.
In addition, if the friend class ever uses the dtor, it'd trigger
-Wdelete-non-virtual-dtor.

Now to suppress the diagnostic, we have to add a `virtual`, wasting 2 entries
in the vtable and emitting unneeded D0/D2.

[Bug c/102502] New: C11: _Static_assert disallows const int operand in -O0 while allows it in higher -O

2021-09-27 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102502

Bug ID: 102502
   Summary: C11: _Static_assert disallows const int operand in -O0
while allows it in higher -O
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: i at maskray dot me
  Target Milestone: ---

Under some circumstances,

  const size_t allocation_size = 32768;
  _Static_assert (allocation_size >= sizeof (struct dirent64), "allocation_size
< sizeof (struct dirent64)");

-O0 and higher optimization levels differ on whether the `const int` operand
can be used in a constant expression (-O0: `error: expression in static
assertion is not constant`).
This is different from a bug "fixed for GCC 8 by r8-4755".


git clone https://sourceware.org/git/glibc.git
cd glibc
mkdir -p out/gcc; cd out/gcc
../../configure --prefix=/tmp/glibc/gcc --disable-werror
make -j 20   # you can SIGINT after some needed files used below are generated

Comment out some lines to allow -O0 compiles:

--- i/include/libc-symbols.h
+++ w/include/libc-symbols.h
@@ -71,9 +71,9 @@
 #define _LIBC  1

 /* Some files must be compiled with optimization on.  */
-#if !defined __ASSEMBLER__ && !defined __OPTIMIZE__
-# error "glibc cannot be compiled without optimization"
-#endif
+//#if !defined __ASSEMBLER__ && !defined __OPTIMIZE__
+//# error "glibc cannot be compiled without optimization"
+//#endif

 /* -ffast-math cannot be applied to the C library, as it alters the ABI.
Some test components that use -ffast-math are currently not part of



# My source dir is at $HOME/Dev/glibc . You may need to adjust.
a=(../sysdeps/unix/sysv/linux/dl-opendir.c -std=gnu11 -fgnu89-inline -g -Wall
-Wwrite-strings -Wundef -fmerge-all-constants -frounding-math
-fno-stack-protector -fno-common -Wstrict-prototypes -Wold-style-definition
-fmath-errno -fPIC -fno-stack-protector -DSTACK_PROTECTOR_LEVEL=0 -mno-mmx
-ftls-model=initial-exec -I../include -I$HOME/Dev/glibc/out/gcc/elf
-I$HOME/Dev/glibc/out/gcc -I../sysdeps/unix/sysv/linux/x86_64/64
-I../sysdeps/unix/sysv/linux/x86_64 -I../sysdeps/unix/sysv/linux/x86/include
-I../sysdeps/unix/sysv/linux/x86 -I../sysdeps/x86/nptl
-I../sysdeps/unix/sysv/linux/wordsize-64 -I../sysdeps/x86_64/nptl
-I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux
-I../sysdeps/nptl -I../sysdeps/pthread -I../sysdeps/gnu -I../sysdeps/unix/inet
-I../sysdeps/unix/sysv -I../sysdeps/unix/x86_64 -I../sysdeps/unix
-I../sysdeps/posix -I../sysdeps/x86_64/64 -I../sysdeps/x86_64/fpu/multiarch
-I../sysdeps/x86_64/fpu -I../sysdeps/x86/fpu -I../sysdeps/x86_64/multiarch
-I../sysdeps/x86_64 -I../sysdeps/x86/include -I../sysdeps/x86
-I../sysdeps/ieee754/float128 -I../sysdeps/ieee754/ldbl-96/include
-I../sysdeps/ieee754/ldbl-96 -I../sysdeps/ieee754/dbl-64
-I../sysdeps/ieee754/flt-32 -I../sysdeps/wordsize-64 -I../sysdeps/ieee754
-I../sysdeps/generic -I.. -I../libio -I. -D_LIBC_REENTRANT -include
$HOME/Dev/glibc/out/gcc/libc-modules.h -include ../include/libc-symbols.h -DPIC
-DSHARED -DTOP_NAMESPACE=glibc -fsyntax-only)

cd $HOME/Dev/glibc/elf


% gcc-11 $=a -O2  # no diagnostic
% gcc-11 $=a -O1  # no diagnostic
% gcc-11 $=a -O0
In file included from ../include/features.h:488,
 from ../posix/sys/types.h:25,
 from ../include/sys/types.h:1,
 from ../sysdeps/unix/sysv/linux/dirstream.h:21,
 from ../include/dirent.h:3,
 from ../sysdeps/unix/sysv/linux/opendir.c:18,
 from ../sysdeps/unix/sysv/linux/dl-opendir.c:1:
../sysdeps/unix/sysv/linux/opendir.c: In function ‘__alloc_dir’:
../sysdeps/unix/sysv/linux/opendir.c:107:35: error: expression in static
assertion is not constant
  107 |   _Static_assert (allocation_size >= sizeof (struct dirent64),
  |   ^~~
../include/sys/cdefs.h:7:59: note: in definition of macro ‘_Static_assert’
7 | # define _Static_assert(expr, diagnostic) _Static_assert (expr,
diagnostic)
  |   ^~~~


gcc-8, gcc-9, and gcc-10 from Debian testing have the same behavior.

[Bug c/102502] C11: _Static_assert disallows const int operand in -O0 while allows it in higher -O

2021-09-27 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102502

--- Comment #3 from Fangrui Song  ---
OK, Andrew asked me to file it :)
I just wanted to fix glibc and run away from the GCC inconsistency.

I know that
https://www.iso-9899.info/n1570.html#6.6 p10 says
"An implementation may accept other forms of constant expressions."

Accepting `const int` in C mode is an extension, but it seems odd to be
inconsistent (-O0 and -O2 -Wpedantic reject it while -O2 allows it).


% cat reduce.i
const int __alloc_dir_allocation_size = 8;
void __alloc_dir() { _Static_assert(__alloc_dir_allocation_size, ""); }

% gcc reduce.i -c -std=c11 
reduce.i: In function ‘__alloc_dir’:
reduce.i:2:37: error: expression in static assertion is not constant
2 | void __alloc_dir() { _Static_assert(__alloc_dir_allocation_size, ""); }
  | ^~~
% gcc reduce.i -c -std=c11 -O1
% gcc reduce.i -c -std=c11 -O2
% gcc reduce.i -c -std=c11 -O2 -Wpedantic
reduce.i: In function ‘__alloc_dir’:
reduce.i:2:37: warning: expression in static assertion is not an integer
constant expression [-Wpedantic]
2 | void __alloc_dir() { _Static_assert(__alloc_dir_allocation_size, ""); }
  | ^~~


Clang just rejects it in all optimization levels.


% clang reduce.i -c -std=c11 -O0
reduce.i:2:37: error: static_assert expression is not an integral constant
expression
void __alloc_dir() { _Static_assert(__alloc_dir_allocation_size, ""); }
^~~
1 error generated.
% clang reduce.i -c -std=c11 -O1
reduce.i:2:37: error: static_assert expression is not an integral constant
expression
void __alloc_dir() { _Static_assert(__alloc_dir_allocation_size, ""); }
^~~
1 error generated.
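
For code that has to compile at every optimization level and with both
compilers, one portable option is to avoid the extension entirely. A sketch,
assuming an enumeration constant is acceptable in place of the `const` object;
`struct record` is a hypothetical stand-in for the type checked in the report:

```c
/* Hypothetical stand-in for the record type checked in the report. */
struct record { char name[256]; long offset; };

/* An enumeration constant is an integer constant expression in C, so this
   assertion does not rely on accepting a `const` object as an extension. */
enum { allocation_size = 32768 };
_Static_assert(allocation_size >= sizeof(struct record),
               "allocation_size < sizeof (struct record)");

unsigned long get_allocation_size(void) { return allocation_size; }
```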

[Bug libgcc/99759] morestack.S should support .init_array.0 besides .ctors.65535

2021-10-08 Thread i at maskray dot me via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99759

Fangrui Song  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #4 from Fangrui Song  ---
Fixed by f49e3d28be44179f07b8a06159139ce77096dda7 ("libgcc: use .init_stack for
constructors if available").

Thanks, Ian!
