[Bug testsuite/94036] [9 regression] gcc.target/powerpc/pr72804.c fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94036 Richard Biener changed: What|Removed |Added Target Milestone|--- |9.3
[Bug target/94037] Runtime varies 2x just by order of two int assignments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94037 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Target||x86_64-*-*, i?86-*-* Status|UNCONFIRMED |NEW Last reconfirmed||2020-03-05 Component|rtl-optimization|target Ever confirmed|0 |1
--- Comment #1 from Richard Biener ---
The only apparent difference is

        setge   %sil
        movzbl  %sil, %esi
        ...
        setl    %sil
        movzbl  %sil, %esi
        ...

vs. clang's

        xorl    %edx, %edx
        xorl    %esi, %esi
        ...
        setle   %dl
        setg    %sil

where the xors are probably "free" and the setg/zext cause excessive latency.

But it's all quite ugly and there has to be a better way to conditionally exchange two values in memory (fitting in a register) without branches (which is probably to avoid mispredicts).

Note with GCC 10 you'll see the v[2] = { a, b } store "vectorized", -DFAST is still faster for me (Haswell), 7.4s vs. 10s.

Fast loop body:

.L12:
        movl    (%rax), %ecx
        vmovd   (%r11), %xmm1
        cmpl    %ecx, %esi
        setge   %dl
        movzbl  %dl, %edx
        vpinsrd $1, %ecx, %xmm1, %xmm0
        movl    %r8d, %ecx
        setge   %dil
        subl    %edx, %ecx
        vmovq   %xmm0, 120(%rsp)
        movslq  %ecx, %rcx
        movl    120(%rsp,%rdx,4), %edx
        movl    120(%rsp,%rcx,4), %ecx
        addq    $4, %rax
        movl    %ecx, -4(%rax)
        movl    %edx, (%r11)
        movzbl  %dil, %edx
        leaq    (%r11,%rdx,4), %r11
        cmpq    24(%rsp), %rax
        jb      .L12

slow one:

.L12:
        movl    (%rax), %esi
        vmovd   (%r10), %xmm1
        cmpl    %esi, %edx
        vpinsrd $1, %esi, %xmm1, %xmm0
        setge   %dil
        setl    %sil
        vmovq   %xmm0, 120(%rsp)
        movzbl  %dil, %edi
        movzbl  %sil, %esi
        movl    120(%rsp,%rdi,4), %edi
        movl    120(%rsp,%rsi,4), %esi
        setge   %cl
        movl    %edi, (%r10)
        movzbl  %cl, %ecx
        movl    %esi, (%rax)
        addq    $4, %rax
        leaq    (%r10,%rcx,4), %r10
        cmpq    8(%rsp), %rax
        jb      .L12
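For readers without the testcase at hand, the branchless conditional exchange discussed here can be sketched in C roughly as follows. This is only an illustration loosely modeled on the GIMPLE quoted later in this thread; the function name `partition_step` and its signature are hypothetical and not taken from the bug report.

```c
#include <assert.h>

/* Hypothetical sketch: conditionally exchange *left and *right without a
   branch by indexing a two-element array with the comparison results.
   Returns the (possibly advanced) left pointer.  */
static int *partition_step(int *left, int *right, int pivot)
{
    assert(left && right);
    int r = *right;
    int v[2] = { *left, r };   /* the v[2] = { a, b } store */
    int le = r <= pivot;       /* 0 or 1, typically a setcc */
    *left = v[le];             /* keeps *left, or takes r */
    *right = v[r > pivot];     /* the complementary selection */
    return left + le;          /* advance only if r went left */
}
```

The two comparisons `r <= pivot` and `r > pivot` are complements of each other, which is why the thread asks whether one setcc result could be derived from the other.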
[Bug c++/94041] [10 Regression] temporary object destructor called before the end of the full-expression since r10-5577
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94041 Richard Biener changed: What|Removed |Added Priority|P3 |P1
[Bug bootstrap/94042] [10 Regression] Bootstrap fails on ppc-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94042
--- Comment #7 from Richard Biener ---
Just to quote configury used:

../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib --libexecdir=/usr/lib --enable-languages=c,c++,objc,fortran,obj-c++,ada,go --disable-werror --with-gxx-include-dir=/usr/include/c++/10 --enable-ssp --disable-libssp --disable-libvtv --disable-cet --disable-libcc1 --enable-plugin --with-bugurl=https://bugs.opensuse.org/ '--with-pkgversion=SUSE Linux' --with-slibdir=/lib --with-system-zlib --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --with-gcc-major-version-only --enable-linker-build-id --enable-linux-futex --enable-gnu-indirect-function --program-suffix=-10 --without-system-libunwind --with-cpu=default32 --with-cpu-64=power4 --enable-secureplt --with-long-double-128 --build=powerpc64-suse-linux --host=powerpc64-suse-linux

Note it worked with fa1160f6e50500aa38162fefb43bfb10c25e0363 but now fails since at least 778a77357cad11e8dd4c810544330af0fbe843b1, so it's a recent regression.
[Bug tree-optimization/92645] Hand written vector code is 450 times slower when compiled with GCC compared to Clang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92645
--- Comment #20 from Richard Biener ---
Small C testcase for one of the patterns we fail to optimize/vectorize:

void foo (char * __restrict src, short * __restrict dest)
{
  union {
      __int128_t i;
      char v[16];
  } u;
  __builtin_memcpy (&u.i, src, 16);
  dest[0] = u.v[0];
  dest[1] = u.v[1];
  dest[2] = u.v[2];
  dest[3] = u.v[3];
  dest[4] = u.v[4];
  dest[5] = u.v[5];
  dest[6] = u.v[6];
  dest[7] = u.v[7];
  dest[8] = u.v[8];
  dest[9] = u.v[9];
  dest[10] = u.v[10];
  dest[11] = u.v[11];
  dest[12] = u.v[12];
  dest[13] = u.v[13];
  dest[14] = u.v[14];
  dest[15] = u.v[15];
}

presents itself as

  _19 = MEM <__int128 unsigned> [(char * {ref-all})src_18(D)];
  _37 = (char) _19;
  _1 = (short int) _37;
  *dest_20(D) = _1;
  _38 = BIT_FIELD_REF <_19, 8, 8>;
  _2 = (short int) _38;
  MEM[(short int *)dest_20(D) + 2B] = _2;
  _39 = BIT_FIELD_REF <_19, 8, 16>;
  _3 = (short int) _39;
  MEM[(short int *)dest_20(D) + 4B] = _3;
...
  _16 = (short int) _52;
  MEM[(short int *)dest_20(D) + 30B] = _16;
  return;

where SLP vectorization is confused about (char) _19 vs. BIT_FIELD_REF but also wouldn't handle BIT_FIELD_REFs. Nor does it vectorize the stores into a single store from a CTOR, which forwprop could then pattern-match.
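Semantically, the unrolled testcase above is a 16-element char-to-short widening copy. A minimal loop form of the same operation might look like the sketch below; the name `widen16` is made up for illustration, and nothing here asserts that any particular GCC version vectorizes either form.

```c
#include <assert.h>

/* Hypothetical loop equivalent of the unrolled testcase: widen 16 chars
   from src into 16 shorts at dest.  */
void widen16(const char *restrict src, short *restrict dest)
{
    assert(src && dest);
    for (int i = 0; i < 16; ++i)
        dest[i] = src[i];   /* implicit char -> short conversion */
}
```

Comparing the code generated for this loop with that for the union-based version is one way to see how much the BIT_FIELD_REF representation costs.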
[Bug c++/94044] [10 Regression] internal compiler error: in comptypes, at cp/typeck.c:1490 on riscv64-unknown-linux-gnu and arm-eabi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94044 Richard Biener changed: What|Removed |Added Target Milestone|--- |10.0 Summary|internal compiler error: in comptypes, at cp/typeck.c:1490 on riscv64-unknown-linux-gnu and arm-eabi |[10 Regression] internal compiler error: in comptypes, at cp/typeck.c:1490 on riscv64-unknown-linux-gnu and arm-eabi
[Bug target/94037] Runtime varies 2x just by order of two int assignments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94037
--- Comment #4 from Richard Biener ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Jakub Jelinek from comment #2)
> > The
> >         setge   %sil
> >         movzbl  %sil, %esi
> > to
> >         xorl    %esi, %esi
> >         setge   %sil
>
> This is quite important conversion, as the latter avoids partial register
> stall.

Couldn't we fix this by pretending setge and friends produce SImode and always emit xor + setCC? So not rely on a peephole but emit the xor already during RTL expansion, possibly eliding it later if that's ever necessary.
[Bug target/94037] Runtime varies 2x just by order of two int assignments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94037 --- Comment #6 from Richard Biener --- (In reply to Jakub Jelinek from comment #2) > The > setge %sil > movzbl %sil, %esi > to > xorl%esi, %esi > setge %sil > transformation is something GCC does too with the > ;; Convert setcc + movzbl to xor + setcc if operands don't overlap. > peephole2s. > The reason it doesn't trigger here is that the single comparison has 3 uses > (why not 2 and something didn't CSE those?). > So, at least for the two setcc quite long after cmp case, we might need to > either not have the comparison in the peephole2 and instead walk backwards, > looking for the FLAGS_REG setter (or stopping on something that clobbers it) > and stopping on if the register we'd want to xor first is mentioned by any > insn in between. > We have at the start of peephole2: > (insn 37 36 38 5 (set (reg:CCGC 17 flags) > (compare:CCGC (reg/v:SI 1 dx [orig:86 pivot ] [86]) > (reg:SI 0 ax [orig:242 pretmp_404 ] [242]))) "pr94037.C":6:19 11 > {*cmpsi_1} > (expr_list:REG_DEAD (reg:SI 0 ax [orig:242 pretmp_404 ] [242]) > (nil))) > (note 38 37 39 5 NOTE_INSN_DELETED) > (note 39 38 1178 5 NOTE_INSN_DELETED) > (insn 1178 39 1179 5 (set (reg:QI 2 cx [288]) > (lt:QI (reg:CCGC 17 flags) > (const_int 0 [0]))) "pr94037.C":6:21 732 {*setcc_qi} > (nil)) > (insn 1179 1178 41 5 (set (reg:DI 2 cx [288]) > (zero_extend:DI (reg:QI 2 cx [288]))) "pr94037.C":6:21 115 > {zero_extendqidi2} > (nil)) > (insn 41 1179 42 5 (set (reg:SI 2 cx [289]) > (mem:SI (plus:DI (plus:DI (mult:DI (reg:DI 2 cx [288]) > (const_int 4 [0x4])) > (reg/f:DI 7 sp)) > (const_int 120 [0x78])) [1 MEM[(int[2] *)_213] S4 A32])) > "pr94037.C":6:15 67 {*movsi_internal} > (expr_list:REG_EQUIV (mem:SI (reg/v/f:DI 36 r8 [orig:283 begin ] [283]) > [1 *begin_329+0 S4 A32]) > (nil))) > (insn 42 41 45 5 (set (mem:SI (reg/v/f:DI 36 r8 [orig:283 begin ] [283]) [1 > *begin_329+0 S4 A32]) > (reg:SI 2 cx [289])) "pr94037.C":6:15 67 {*movsi_internal} > (expr_list:REG_DEAD (reg:SI 2 cx 
[289]) > (nil))) > (note 45 42 1180 5 NOTE_INSN_DELETED) > (insn 1180 45 1181 5 (set (reg:QI 0 ax [290]) > (ge:QI (reg:CCGC 17 flags) > (const_int 0 [0]))) "pr94037.C":15:10 732 {*setcc_qi} > (expr_list:REG_DEAD (reg:CCGC 17 flags) > (nil))) > (insn 1181 1180 47 5 (set (reg:DI 0 ax [290]) > (zero_extend:DI (reg:QI 0 ax [290]))) "pr94037.C":15:10 115 > {zero_extendqidi2} > (nil)) > and current peephole2 manages to handle the first setcc+movzbl. The second > one actually isn't doable because of the RA decisions, as the comparison > uses ax and we'd need to clear rax before the comparison. > Similarly, in partition we have 3 setcc+movzbls, but the first one has the > movzbl not adjacent to the setcc, second one would be doable if the > peephole2 pattern didn't include the FLAGS_REG setter and walked back and > the third one isn't doable because the comparison uses cx, i.e. the register > that's set by setcc and extended by movzbl. > > Now, on the GIMPLE level, we have e.g. in partition: > _2 = MEM[base: right_35, offset: 0B]; > _3 = _2 <= pivot_12; > _4 = (int) _3; > _19 = *left_31; > _7 = {_19, _2}; > MEM [(int *)&v] = _7; > _21 = v[_4]; > *left_31 = _21; > _22 = _2 > pivot_12; > _23 = (int) _22; > _24 = v[_23]; > MEM[base: right_35, offset: 0] = _24; > v ={v} {CLOBBER}; > _5 = _2 <= pivot_12 ? 4 : 0; > so I wonder why sccvn has not at least replaced the last _2 <= pivot_12 with > _3. > And, if SCCVN could be taught that if we have _3 = _2 <= pivot_12 and later > _22 = _2 > pivot_12, we can simplify the latter to _22 = 1 - _3; > 1 -

VN has a hard time here because _2 <= pivot_12 has no def and thus isn't value-numbered independently (IIRC I had some local hacks trying to fix this but thought we should fix GIMPLE instead). VN also does not open-code the GENERIC compare, which makes the special-casing even more ugly...

That said, it _could_ be fixed during elimination by looking up the condition operand, but then IIRC at least for VEC_COND_EXPR we don't really like CSEing the conditions even if they are available in an SSA name.
[Bug target/94037] Runtime varies 2x just by order of two int assignments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94037
--- Comment #7 from Richard Biener ---
(In reply to Uroš Bizjak from comment #5)
> (In reply to Richard Biener from comment #4)
> > (In reply to Uroš Bizjak from comment #3)
> > > (In reply to Jakub Jelinek from comment #2)
> > > > The
> > > >         setge   %sil
> > > >         movzbl  %sil, %esi
> > > > to
> > > >         xorl    %esi, %esi
> > > >         setge   %sil
> > >
> > > This is quite important conversion, as the latter avoids partial register
> > > stall.
> >
> > Couldn't we fix this by pretending setge and friends produce SImode
> > and always emit xor + setCC? So not rely on a peephole but emit
> > the xor already during RTL expansion, eventually eliding it later
> > if that's ever necessary.
>
> xor clobbers flags, so they would be killed before setCC. OTOH, "mov $0,
> %reg" doesn't clobber flags, but it also doesn't break partial reg
> dependency.

Oh, ok. That means if we want to pursue this more aggressively we need something before RA. I guess splitting it before RA would then depend on some scheduling moving the zeroing somewhere before the CC computation... Or we even pull the actual CC computation into the early non-split pattern.
[Bug tree-optimization/94043] [9/10 Regression] ICE in superloop_at_depth, at cfgloop.c:78
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94043 Richard Biener changed: What|Removed |Added CC||rguenth at gcc dot gnu.org Target Milestone|--- |9.3
[Bug middle-end/94045] [i686] Compiler hang with -O2 -g -m32 -march=i686 -mtune=generic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94045 Richard Biener changed: What|Removed |Added Keywords||compile-time-hog Target||i?86-*-* Component|c++ |middle-end --- Comment #2 from Richard Biener --- Probably another CSElib issue.
[Bug rtl-optimization/94045] [i686] Compiler hang with -O2 -g -m32 -march=i686 -mtune=generic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94045 Richard Biener changed: What|Removed |Added Component|middle-end |rtl-optimization
--- Comment #4 from Richard Biener ---
Backtrace ends in var-tracking:

#5811 0x00e1e893 in find_base_term (x=0x7fb52431af30) at /space/rguenther/src/gcc-work2/gcc/alias.c:2113
#5812 0x00e21e18 in true_dependence_1 (mem=0x7fb517ac51c8, mem_mode=E_SImode, mem_addr=0x7fb52431af30, x=0x7fb517ab3a50, x_addr=0x7fb52431aed0, mem_canonicalized=true) at /space/rguenther/src/gcc-work2/gcc/alias.c:3026
#5813 0x00e22092 in canon_true_dependence (mem=0x7fb517ac51c8, mem_mode=E_SImode, mem_addr=0x7fb52431af30, x=0x7fb517ab3a50, x_addr=0x7fb52431aed0) at /space/rguenther/src/gcc-work2/gcc/alias.c:3068
#5814 0x01ac0ec7 in vt_canon_true_dep (set=0x9b6cfe0, mloc=0x7fb517ac51c8, maddr=0x7fb52431af30, loc=0x7fb517ab3a50) at /space/rguenther/src/gcc-work2/gcc/var-tracking.c:2257
#5815 0x01ac0fa1 in drop_overlapping_mem_locs (slot=0xa78c598, coms=0x7ffde2e57670) at /space/rguenther/src/gcc-work2/gcc/var-tracking.c:2295
#5816 0x01adeda7 in hash_table::traverse_noresize (this=0x585c020, argument=0x7ffde2e57670) at /space/rguenther/src/gcc-work2/gcc/hash-table.h:1081
#5817 0x01add6f1 in hash_table::traverse (this=0x585c020, argument=0x7ffde2e57670) at /space/rguenther/src/gcc-work2/gcc/hash-table.h:1102
#5818 0x01ac128b in clobber_overlapping_mems (set=0x9b6cfe0, loc=0x7fb517ac51c8) at /space/rguenther/src/gcc-work2/gcc/var-tracking.c:2361
#5819 0x01ac1628 in val_bind (set=0x9b6cfe0, val=0x933e258, loc=0x7fb517ac51c8, modified=true) at /space/rguenther/src/gcc-work2/gcc/var-tracking.c:2473
#5820 0x01ac18aa in val_store (set=0x9b6cfe0, val=0x933e258, loc=0x7fb517ac51c8, insn=0x7fb518ffa0c0, modified=true) at /space/rguenther/src/gcc-work2/gcc/var-tracking.c:2534
#5821 0x01acff9f in compute_bb_dataflow (bb=) at /space/rguenther/src/gcc-work2/gcc/var-tracking.c:6940

Note the actual recursion is deep, but the main issue is the complexity of vt_find_locations and the memory handling, which makes that expensive with large loc lists.
[Bug bootstrap/94042] [10 Regression] Bootstrap fails on ppc-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94042 --- Comment #9 from Richard Biener --- You can also run big-endian kvm guests on a little-endian host.
[Bug debug/93888] Incorrect DW_AT_location generated for copy-constructed function argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93888 Richard Biener changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #6 from Richard Biener --- Fixed. Technically not a regression but it looks safe to backport if there's desire (I've pushed it to our LTS gcc 7 flavor).
[Bug target/94059] [10 Regression] m68k: Bootstrap fails configuring libiberty with 'cannot compute sizeof (long long)'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94059 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |WAITING Last reconfirmed||2020-03-06 Ever confirmed|0 |1
--- Comment #1 from Richard Biener ---
It's odd that this is stage3-libiberty only; I'd have expected obvious stage2 issues to pop up when building stage2 target libraries.

Note the build log itself isn't very useful. How the sizeof (long long) test fails might be interesting, as would sharing the configury in a more direct way than via the build log.

Bisecting also might help ... possibly related is PR94042, which had a bisection, so you might check whether that bisection applies to m68k as well.
[Bug libstdc++/94069] [9 Regression] doesn't compile unless PTHREAD_RWLOCK_INITIALIZER is defined
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94069 Richard Biener changed: What|Removed |Added Target Milestone|--- |9.3
[Bug tree-optimization/94071] Missed optimization with endian and alignment independent memory access
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94071 Richard Biener changed: What|Removed |Added Keywords||easyhack, ||missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed||2020-03-06 Version|unknown |10.0 Ever confirmed|0 |1
--- Comment #1 from Richard Biener ---
Hmm, bswap load is supposed to detect this. Not sure what it doesn't like here, possibly the (int) promotion before the shift (a wider representation than what it likely initializes the lattice from, uint16). Shouldn't be hard to fix.

  [local count: 1073741824]:
  _1 = data[addr_10(D)];
  _2 = (signed short) _1;
  _3 = addr_10(D) + 1;
  _4 = data[_3];
  _5 = (int) _4;
  _6 = _5 << 8;
  _7 = (signed short) _6;
  _8 = _2 | _7;
  _11 = (uint16_t) _8;
  return _11;
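The kind of source that leads to GIMPLE of this shape is an endian- and alignment-independent 16-bit load assembled from two bytes. The sketch below is an assumption about the general pattern, not the reporter's actual testcase; the name `load_le16` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: read a little-endian 16-bit value byte by byte.
   The shift operand is promoted to int before shifting, which is the
   promotion the comment above suspects.  */
uint16_t load_le16(const uint8_t *data, size_t addr)
{
    assert(data);
    return (uint16_t)(data[addr] | (data[addr + 1] << 8));
}
```

On a little-endian target the ideal result is a single 16-bit load, which is what the bswap pass is expected to recognize.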
[Bug target/94072] [10 Regression] ICE: SIGSEGV due to infinite recursion in expand_expr/expand_expr_real_1 with -msve-vector-bits=512
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94072 Richard Biener changed: What|Removed |Added Target Milestone|--- |10.0
[Bug tree-optimization/94086] Missed optimization when converting a bitfield to an integer on x86-64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94086 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2020-03-09 --- Comment #2 from Richard Biener --- Confirmed. This is another task for a combined bswap/store-merging where the bswap tracking would need to be extended to cover bits. Also part of the reason for the missed optimization is that on GIMPLE we think 'half' is memory but in reality it is in a register.
[Bug tree-optimization/94094] New: [meta-bug] store-merging and/or bswap load/store-merging missed optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94094 Bug ID: 94094 Summary: [meta-bug] store-merging and/or bswap load/store-merging missed optimizations Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Target Milestone: --- Bug tracking missed cases, both passes could/should be merged.
[Bug target/94088] [10 Regression] ICE: in extract_insn, at recog.c:2294 (error: unrecognizable insn), or ICE: in elimination_costs_in_insn, at reload1.c:3538
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94088 Richard Biener changed: What|Removed |Added Target Milestone|--- |10.0
[Bug tree-optimization/94092] Code size and performance degradations after -ftree-loop-distribute-patterns was enabled at -O[2s]+
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092
--- Comment #4 from Richard Biener ---
With profile feedback we (target or middle-end) can produce specialized RTL expansion doing small copies inline and larger ones out of line.

The idea of GIMPLE level pattern detection is that even for small sizes the target usually knows how to expand the copy optimally while the user may have written a byte copying loop. Of course that requires targets to pay attention.

Note most compiler optimization involves some heuristics and clearly heuristics can be off. I wonder if you can obtain better coremark results by using link-time optimization. If you're only after benchmark numbers...
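As an illustration of the "byte copying loop" mentioned above, the following is the sort of hand-written loop that GIMPLE-level pattern detection is meant to recognize as a block copy. The function name `copy_bytes` is made up, and whether a given compiler version and option set actually turns it into a memcpy call is not asserted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-copying loop; restrict tells the compiler the two
   buffers do not overlap, so the loop has memcpy semantics.  */
void copy_bytes(char *restrict dst, const char *restrict src, size_t n)
{
    assert(n == 0 || (dst && src));
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```

For very small constant `n` an inline expansion is usually cheaper than a library call, which is the trade-off the code size and performance complaint in this bug is about.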
[Bug target/94093] -malign-double changes alignment of double type only and not long double
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94093 Richard Biener changed: What|Removed |Added Known to fail||2.95.2, 3.2.3, 3.4.6, 4.0.0 Keywords|wrong-code |documentation
--- Comment #3 from Richard Biener ---
Probably. IMHO support for -malign-double should go away since its effects on the psABI are not fully documented.

Btw, clang appears to have alignof(long double) == 8, not matching GCC's behavior. GCC 4.0.0 is also "wrong", so are GCC 2.95, 3.2, 3.3 and 3.4. So I guess it works as designed and documentation should be fixed instead.
[Bug target/94093] -malign-double changes alignment of double type only and not long double
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94093 Richard Biener changed: What|Removed |Added Known to fail|2.95.2 | Known to work||2.95.2 --- Comment #4 from Richard Biener --- Pilot error, GCC 2.95 "properly" aligns long double but GCC 3.2+ do not.
[Bug target/94096] New: amdgcn build instructions missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94096 Bug ID: 94096 Summary: amdgcn build instructions missing Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Target Milestone: ---

https://gcc.gnu.org/wiki/Offloading#How_to_try_offloading_enabled_GCC documents how to enable offloading for nvptx, intel-mic and hsa but lacks any information on amdgcn. install.texi lacks everything. Please update.
[Bug target/94103] Wrong optimization: reading value of a variable changes its representation for optimizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94103 Richard Biener changed: What|Removed |Added Component|middle-end |target Keywords||wrong-code Status|UNCONFIRMED |NEW Last reconfirmed||2020-03-10 Target||x86_64-*-*, i?86-*-*, ||m68k-*-* CC||law at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Ever confirmed|0 |1
--- Comment #4 from Richard Biener ---
There's a related bug about x87 stores not storing all bytes which was closed as INVALID; the testcase was using a union to pun long double and a character array IIRC. Here the situation is quite similar. The IL has (from your unused load):

  x.1_6 = x;
  _18 = MEM[(char * {ref-all})&x];
  MEM[(char * {ref-all})&u] = _18;
  _7 = u[0];
  _8 = u[1];
  printf ("%016lX %016lX\n", _8, _7);

where x.1_6 is of type long double and _18 is __int128. Value-numbering then decides that it can elide the load to _18 and also optimize the loads from u[] as:

  x.1_6 = x;
  _31 = VIEW_CONVERT_EXPR<__int128 unsigned>(x.1_6);
  MEM[(char * {ref-all})&u] = _31;
  _32 = (long unsigned int) _31;
  _33 = BIT_FIELD_REF <_31, 64, 64>;
  printf ("%016lX %016lX\n", _33, _32);

which is because the backend tells us the FP load x.1_6 = x loads all bits and does not modify the underlying representation. Now, for x87 modes GET_MODE_SIZE isn't in agreement with what the actual instruction does, nor does the load reflect the fact that an x87 load (not the long double variant) can end up doing a rounding step. m68k is probably similarly affected.

And yes, compared to the other bug, which I was conveniently able to close as INVALID, this one looks "real" (if also artificially constructed and unfortunate...).

Jeff, you also were involved in the other bug, do you agree here? I still don't see any good solution though.

I agree that the decimal float variant is an entirely different bug, maybe you can open a new one for this?
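The access pattern behind the IL above is a byte-wise copy of a long double's object representation. A minimal sketch of that pattern, assuming nothing about the reporter's exact testcase (the helper name `object_bytes` is hypothetical), could be:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the full object representation of *x,
   including any padding bytes, into out.  On x86-64 this is 16 bytes,
   of which only 10 carry the x87 value.  */
void object_bytes(const long double *x, unsigned char out[sizeof(long double)])
{
    assert(x && out);
    memcpy(out, x, sizeof *x);
}
```

The bug concerns what happens when the compiler replaces such a memory copy with the value obtained from an earlier floating-point load of the same variable: the padding bytes and possibly the value bits need not match what is in memory.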
[Bug c/94106] [8/9/10 Regression] error on a function redeclaration with attribute transaction_safe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94106 Richard Biener changed: What|Removed |Added Target Milestone|--- |8.5
[Bug middle-end/94111] Wrong constant folding: decimal floating-point infinity casted to double -> zero
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94111 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW CC||rguenth at gcc dot gnu.org Keywords||wrong-code Last reconfirmed||2020-03-10 Ever confirmed|0 |1 Summary|Wrong optimization: decimal floating-point infinity casted to double -> zero |Wrong constant folding: decimal floating-point infinity casted to double -> zero
--- Comment #1 from Richard Biener ---
This goes wrong somewhere in constant folding:

  d.1_2 = d_13;
  _3 = (double) d.1_2;

->

  d.1_2 = Inf;
  _3 = 0.0;

so (double) Inf is computed incorrectly.
[Bug tree-optimization/94114] [8/9/10 Regression] ICE in gimplify_modify_expr, at gimplify.c:5936
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94114 Richard Biener changed: What|Removed |Added Priority|P3 |P2
[Bug middle-end/94122] Wrong optimization: reading value of a decimal FP variable changes its representation for optimizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94122 Richard Biener changed: What|Removed |Added Last reconfirmed||2020-03-10 Status|UNCONFIRMED |NEW Ever confirmed|0 |1
--- Comment #1 from Richard Biener ---
This works for me with GCC 9. On trunk this is wrong FRE:

Value numbering stmt = _3 = MEM[(unsigned char *)&x];
Setting value number of _3 to 127 (changed)
Value numbering stmt = _4 = _3 + 1;
Match-and-simplified _3 + 1 to 128
RHS _3 + 1 simplified to 128
Setting value number of _4 to 128 (changed)
Value numbering stmt = MEM[(unsigned char *)&x] = _4;
No store match
Value numbering store MEM[(unsigned char *)&x] to 128
Setting value number of .MEM_12 to .MEM_12 (changed)
Value numbering stmt = x.2_5 = x;
Successfully combined 2 partial definitions
Setting value number of x.2_5 to 0 (changed)
Value numbering stmt = _13 = MEM [(char * {ref-all})&x];
Setting value number of _13 to 847249408 (changed)

possibly caused by real_{to,from}_target doing normalization during encoding/decoding? (that would IMHO be wrong?)
[Bug target/94123] [10 regression] r10-7093 causes gcc.target/powerpc/pr87507.c to fail
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94123 Richard Biener changed: What|Removed |Added Target Milestone|--- |10.0
[Bug tree-optimization/94125] [9/10 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94125 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED Priority|P1 |P2 Known to work||8.3.0 --- Comment #3 from Richard Biener --- I'll have a look. Note the bisection point is a correctness fix only possibly resulting in less optimization.
[Bug fortran/94129] Using almost any openacc !$acc directive causes ICE "compressed stream: data error"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94129 Richard Biener changed: What|Removed |Added Keywords||openacc CC||doko at gcc dot gnu.org, ||marxin at gcc dot gnu.org
--- Comment #1 from Richard Biener ---
You should report this to Ubuntu. I suspect a build problem there, possibly enabling zstd compression for gfortran-10 but zlib compression for gcc-10-offload-nvptx (I'm not sure we inter-operate here - Martin?)
[Bug tree-optimization/94130] [8/9/10 Regression] Unintended result with optimization option when assigning two structures, memset and 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94130 Richard Biener changed: What|Removed |Added CC||law at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Status|UNCONFIRMED |NEW Target Milestone|--- |8.5 Last reconfirmed||2020-03-11 Known to work||6.5.0 Ever confirmed|0 |1 Priority|P3 |P2 Summary|Unintended result with optimization option when assigning two structures, memset and 0 |[8/9/10 Regression] Unintended result with optimization option when assigning two structures, memset and 0 Known to fail||7.5.0, 8.3.0, 9.2.0
--- Comment #1 from Richard Biener ---
Confirmed. DSE prunes the memset from the wrong side it seems. Jeff? Breaks with -O -fno-inline already, fixed with -fno-tree-dse.
[Bug tree-optimization/94130] [8/9/10 Regression] Unintended result with optimization option when assigning two structures, memset and 0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94130 --- Comment #2 from Richard Biener --- Doing memset(&s_myvalue, 0, sizeof(s_myvalue)); s_myreq.m_data = &s_myvalue; also works around the issue. Odd.
[Bug tree-optimization/94131] [10 Regression] ICE: tree check: expected integer_cst, have plus_expr in get_len, at tree.h:5927
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94131 Richard Biener changed: What|Removed |Added Target Milestone|--- |10.0 Priority|P3 |P1 CC||msebor at gcc dot gnu.org
[Bug c++/94132] Valid usage of flexible array member failing to compile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94132 Richard Biener changed: What|Removed |Added CC||msebor at gcc dot gnu.org --- Comment #3 from Richard Biener --- I guess at _least_ [[no_unique_address]] would be needed but then who knows how exactly previous behavior was in this regard? Martin, you tightened the checks (I agree to those), any input?
[Bug target/94136] GCC doc for built-in function __builtin___clear_cache() not 100% correct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94136 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Keywords||documentation Component|other |target Target||riscv Last reconfirmed||2020-03-11 Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- Confirmed.
[Bug tree-optimization/94125] [9/10 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94125 Richard Biener changed: What|Removed |Added CC||amker at gcc dot gnu.org
--- Comment #4 from Richard Biener ---
Hm, it computes a dependence distance of two and in the end sorts partitions in the wrong order from a bogus partition dependence edge. The odd thing is that for

  for (int c = 0; c <= 2; c++)
    {
      b = f;
      *g = k[c + 3];
      k[c + 1] = 0;
    }

we compute a distance of minus two (two + DDR_REVERSED_P), but in both cases we need to use the same partition ordering, with the memset after the partition containing the k[c+3] load. Yet the partition dependence code, going by the DDRs' apparently different directions, handles the two cases differently.

Something is missing here. Not sure what - Bin, any idea?
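To make the required partition ordering concrete, here is a simplified variant of the loop. Unlike the original, it records every loaded value in a hypothetical `out` array so that the ordering becomes observable; all function names are invented for this sketch. The load of k[c + 3] in iteration 0 reads the element that the store k[c + 1] = 0 overwrites in iteration 2, so the load partition has to run before the memset partition.

```c
#include <assert.h>
#include <string.h>

/* Reference semantics of the (simplified) loop.  */
void loop_reference(int k[6], int out[3])
{
    for (int c = 0; c <= 2; c++) {
        out[c] = k[c + 3];
        k[c + 1] = 0;
    }
}

/* Distributed form with the correct order: loads first, then memset.  */
void loop_distributed(int k[6], int out[3])
{
    for (int c = 0; c <= 2; c++)
        out[c] = k[c + 3];
    memset(&k[1], 0, 3 * sizeof k[0]);
}

/* Distributed form with the wrong order: the memset clobbers k[3]
   before it is loaded.  */
void loop_misordered(int k[6], int out[3])
{
    memset(&k[1], 0, 3 * sizeof k[0]);
    for (int c = 0; c <= 2; c++)
        out[c] = k[c + 3];
}
```

With k = {10, 11, 12, 13, 14, 15}, the first two functions produce out[0] == 13 while the misordered one produces out[0] == 0.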
[Bug fortran/94129] Using almost any openacc !$acc directive causes ICE "compressed stream: data error"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94129 --- Comment #4 from Richard Biener --- So I tried it with the SUSE GCC 10 packages and it works fine (I've double-checked nvptx is offloaded). But my packages are only configured for zlib ... (I'm testing on Leap 15.1 which doesn't have zstd I think).
[Bug fortran/94129] Using almost any openacc !$acc directive causes ICE "compressed stream: data error"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94129 --- Comment #5 from Richard Biener --- Btw, your backtrace ends up in lto_uncompression_zlib but Matthias shows the Ubuntu packages have zstd enabled. I'd have expected only zstd compressed sections there. Matthias, can you reproduce the issue?
[Bug libfortran/94143] [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143 Richard Biener changed: What|Removed |Added Priority|P3 |P4 Target Milestone|--- |9.3
[Bug target/94145] Longcalls mis-optimize loading the function address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94145
--- Comment #5 from Richard Biener ---
So what prevents GIMPLE from doing the transform to an indirect call and hoisting the call address computation out of the loop? I fear your volatile marking is papering over an entirely different issue. Of course it will likely work as a workaround since nobody is going to do the above-mentioned dance. Maybe code like

void foo();
void bar()
{
  void (* volatile fn)() = foo;
  void (*fn2)() = fn;
  for (int i = 0; i<1; ++i)
    fn2();
}

will expose the same "issue", whatever it really is?
[Bug middle-end/94146] [10 Regression] Merging functions with same bodies stopped working
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94146
--- Comment #4 from Richard Biener ---
(In reply to Jakub Jelinek from comment #3)
> If not already marked clearly as an ICF created thunk, I'd say it should be
> and then inliner should take it into account (and only inline if the
> function became very small or not at all).

It looks like ICF really creates a forwarding call:

ternary2 (int i)
{
  int retval.4;

  [local count: 1073741824]:
  retval.4_3 = ternary (i_2(D)); [tail call]
  return retval.4_3;
}

so IMHO for small functions the inlining is good (but why don't we create an alias or an alternate entry symbol instead of a full (aligned) function?) For big functions the inlining shouldn't happen indeed, possibly by detecting this kind of forwarder? So we're missing a testcase showing the regression IMHO. It still works with -Os for the testcase.
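At the source level, the forwarding call shown in the dump corresponds to something like the sketch below. The body of `ternary` is purely hypothetical, since the testcase is not quoted here; only the shape of `ternary2` reflects the dump.

```c
#include <assert.h>

/* Hypothetical body; the real testcase's function is not shown above.  */
int ternary(int i)
{
    return i > 0 ? 1 : (i < 0 ? -1 : 0);
}

/* What the merged duplicate looks like after ICF in the dump above:
   a wrapper that simply tail-calls the surviving copy.  */
int ternary2(int i)
{
    return ternary(i);
}
```

Inlining `ternary` back into such a wrapper recreates the duplicate body, which is harmless for small functions but defeats the merge for large ones.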
[Bug rtl-optimization/94148] The DF framework uses bb->aux, which is for passes only
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94148 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2020-03-12 --- Comment #1 from Richard Biener --- In fact it even assumes it is cleared on entry!
[Bug lto/94150] Improve LTO diagnosis for LTO triggered warnings/error: print source.o or source.a(lib.o) when printing location
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94150 Richard Biener changed: What|Removed |Added Keywords||diagnostic Last reconfirmed||2020-03-12 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW --- Comment #1 from Richard Biener --- Confirmed. Probably easy for the special LTO aware diagnostics but harder in general since there's no concept of a translation unit in our locations and locations are also used for debug info generation.
[Bug c++/94152] New: Mistyped destructor name diagnostic suboptimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94152 Bug ID: 94152 Summary: Mistyped destructor name diagnostic suboptimal Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Target Milestone: ---

struct _X { ~_X(); };
_X::~X () {}

produces

t.C:2:8: error: expected class-name before ‘(’ token
 _X::~X () {}
        ^

where clang says

t.C:2:6: error: expected the class name after '~' to name a destructor
_X::~X () {}
     ^
     _X

which is clearly better.
[Bug target/94145] Longcalls mis-optimize loading the function address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94145 Richard Biener changed: What|Removed |Added Keywords||missed-optimization --- Comment #7 from Richard Biener --- Ah, I read "mis-optimize" as produce wrong-code ... OTOH CSEing the load from the PLT once it is resolved _would_ be an optimization. Asks for loop peeling I guess?
[Bug target/94103] Wrong optimization: reading value of a variable changes its representation for optimizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94103 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED Known to work||10.0 --- Comment #7 from Richard Biener --- commit 1dc00a8ec9aeba86b74b16bff6f171824bb7b4a1 (HEAD -> master, origin/master, origin/HEAD) Author: Richard Biener Date: Thu Mar 12 14:18:35 2020 +0100 tree-optimization/94103 avoid CSE of loads with padding VN currently replaces a load of a 16 byte entity with 128 bits of mode precision (TImode) with the result of a load of a 16 byte entity with 80 bits of mode precision (XFmode). That will go downhill since if the padding bits are not actually filled with memory contents those bits are missing. 2020-03-12 Richard Biener PR tree-optimization/94103 * tree-ssa-sccvn.c (visit_reference_op_load): Avoid type punning when the mode precision is not sufficient. * gcc.target/i386/pr94103.c: New testcase.
[Bug target/94103] Wrong optimization: reading value of a variable changes its representation for optimizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94103 Richard Biener changed: What|Removed |Added Known to fail||9.3.0 --- Comment #8 from Richard Biener --- The old VN doesn't value-number unused defs so the actual testcase doesn't fail there. Still broken with GCC 9 though. Adjusted testcases that actually use the FP load would trigger with older compilers as well.
[Bug target/94145] Longcalls mis-optimize loading the function address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94145 --- Comment #9 from Richard Biener --- (In reply to Alan Modra from comment #8) > (In reply to Richard Biener from comment #7) > > OTOH CSEing the load from the PLT once it is resolved _would_ be an > > optimization. > > Possibly. Sometimes making the call sequence seem more efficient runs into > stalls particularly when the called function is short. > > > Asks for loop peeling I guess? > > Yeah, that might be one way to get the first call of a function inside a > loop over and done with. And so you'd know the PLT entry was resolved and > thus no longer volatile. I suppose there's no (portable) way to "speculate" the call, thus _just_ eventually resolve the PLT? That way we could do such hoisted PLT loads as load PLT + speculate call load PLT or rather always do resolve-and-load-PLT since the times we want to lazily load the PLT with resolving later are scarce?
[Bug tree-optimization/94163] [8/9 Regression] ICE in set_ptr_info_alignment with -O2 and __builtin_assume_aligned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94163 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Priority|P3 |P2 Ever confirmed|0 |1 Last reconfirmed||2020-03-13 --- Comment #2 from Richard Biener --- Likely - mine.
[Bug tree-optimization/94163] [8/9 Regression] ICE in set_ptr_info_alignment with -O2 and __builtin_assume_aligned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94163 --- Comment #3 from Richard Biener --- From /* There's no CCP pass after PRE which would re-compute alignment information so make sure we re-materialize this here. */ if (gimple_call_builtin_p (call, BUILT_IN_ASSUME_ALIGNED) && args.length () - 2 <= 1 && tree_fits_uhwi_p (args[1]) && (args.length () != 3 || tree_fits_uhwi_p (args[2]))) { unsigned HOST_WIDE_INT halign = tree_to_uhwi (args[1]); unsigned HOST_WIDE_INT hmisalign = args.length () == 3 ? tree_to_uhwi (args[2]) : 0; if ((halign & (halign - 1)) == 0 && (hmisalign & ~(halign - 1)) == 0) set_ptr_info_alignment (get_ptr_info (forcedname), halign, hmisalign); } where set_ptr_info_alignment ICEs for align == 0. set_ptr_info_alignment takes unsigned int args but the above computes HWI quantities that get truncated here.
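The truncation described above can be reproduced outside the compiler. The following is a minimal sketch, not GCC code: `uhwi` stands in for `unsigned HOST_WIDE_INT` (assumed to be 64 bits), `unsigned int` is assumed to be 32 bits, and `passes_pow2_guard` and `as_seen_by_callee` are hypothetical helpers that mirror the guard in the quoted snippet and a callee with a narrower parameter type.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for unsigned HOST_WIDE_INT; assumed to be 64 bits wide.  */
typedef uint64_t uhwi;

/* Mirrors the power-of-two and misalignment checks from the snippet.  */
static int passes_pow2_guard (uhwi halign, uhwi hmisalign)
{
  return (halign & (halign - 1)) == 0
	 && (hmisalign & ~(halign - 1)) == 0;
}

/* Hypothetical callee with an unsigned int parameter, like
   set_ptr_info_alignment.  Returns the value it actually received.  */
static unsigned int as_seen_by_callee (unsigned int align)
{
  return align;
}
```

With `halign = (uhwi) 1 << 32` the guard succeeds because the value is a power of two in 64 bits, yet `as_seen_by_callee ((uhwi) 1 << 32)` receives 0 after the implicit conversion, which is the `align == 0` case that triggers the ICE.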
[Bug tree-optimization/94163] [8/9 Regression] ICE in set_ptr_info_alignment with -O2 and __builtin_assume_aligned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94163 --- Comment #6 from Richard Biener --- The patch in testing does the same as CCP. I agree that we possibly want saturation behavior but that can be done separately for GCC 11.
[Bug libstdc++/94164] [10 Regression] std::unintialized_fill_n fails to compile
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94164 Richard Biener changed: What|Removed |Added Target Milestone|--- |10.0 Summary|[Regression 10] |[10 Regression] |std::unintialized_fill_n|std::unintialized_fill_n |fails to compile|fails to compile
[Bug tree-optimization/94163] [8/9 Regression] ICE in set_ptr_info_alignment with -O2 and __builtin_assume_aligned
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94163 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #7 from Richard Biener --- Fixed everywhere.
[Bug debug/92468] gcc generates wrong debug information at -O2 and -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92468 Richard Biener changed: What|Removed |Added Last reconfirmed||2020-03-16 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #2 from Richard Biener --- Confirmed. The issue is there isn't any real instruction at line 20 since the inner loop is unrolled completely and the if (g) stmt is elided as false. The f loop is also unrolled completely and debug stmts and breakpoints align with the case where k is one.
[Bug c/94179] [10 Regression] ICE in gimplify_expr, at gimplify.c:14399 since r10-7127-gcb99630f254aaec6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94179 --- Comment #3 from Richard Biener --- (In reply to Jakub Jelinek from comment #2) > Created attachment 48036 [details] > gcc10-pr94179.patch > > Untested fix. > And/or we could limit the match.pd optimization to GIMPLE only, as at least > the C FE doesn't seem to be prepared to handle MEM_REFs and it is unclear if > this is the only spot that needs fixing. Doing it GIMPLE only may need additional "fixes" when people disable forwprop though (didn't check). IMHO the FEs better deal with all GENERIC if they call into middle-end routines. But I'm not against doing this and I don't think we need to care about people doing -fno-tree-forwprop, it's just you brought it up ... ;)
[Bug middle-end/68785] [6 Regression] valgrind reports issues with folding on x86_64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68785 --- Comment #15 from Richard Biener --- (In reply to David Binderman from comment #14) > Interestingly, I ran the code in comment 8 through a valgrind version > of recent gcc trunk, with the compiler flag -O2, and got this: > > ./gcc.dg/pr68785.c > ==49861== Invalid read of size 1 > ==49861==at 0xD9CDDD: count_nonzero_bytes(tree_node*, unsigned long, > unsigned long, unsigned int*, bool*, bool*, bool*, vr_values const*, > ssa_name_limit_t&) (tree-ssa-strlen.c:4891) > ==49861==by 0xD9CF17: count_nonzero_bytes(tree_node*, unsigned long, > unsigned long, unsigned int*, bool*, bool*, bool*, vr_values const*, > ssa_name_limit_t&) (tree-ssa-strlen.c:4801) > ==49861==by 0xDA19EE: count_nonzero_bytes (tree-ssa-strlen.c:4920) > ==49861==by 0xDA19EE: handle_integral_assign(gimple_stmt_iterator*, > bool*, vr_values const*) (tree-ssa-strlen.c:5547) Can you file a new bug please?
[Bug tree-optimization/94125] [9 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94125 Richard Biener changed: What|Removed |Added Summary|[9/10 Regression] wrong |[9 Regression] wrong code |code at -O3 on |at -O3 on x86_64-linux-gnu |x86_64-linux-gnu| Known to fail||9.3.0 Known to work||10.0 --- Comment #10 from Richard Biener --- Thanks Bin, fixed on trunk so far.
[Bug target/94185] [10 Regression] crashes with "error: unable to generate reloads for {*zero_extendsidi2} internal compiler error: in curr_insn_transform, at lra-constraints.c:4006
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94185 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2020-03-16 Target Milestone|--- |10.0 Summary|gcc-10 crashes with "error: |[10 Regression] crashes |unable to generate reloads |with "error: unable to |for {*zero_extendsidi2} |generate reloads for |internal compiler error: in |{*zero_extendsidi2} |curr_insn_transform, at |internal compiler error: in |lra-constraints.c:4006 |curr_insn_transform, at ||lra-constraints.c:4006 Priority|P3 |P1 Keywords||ra Known to work||9.3.0 Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- Must be a recent regression.
[Bug c/94188] [10 Regression] error: request for member ‘node’ in something not a structure or union since r10-7127-gcb99630f254aaec6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94188 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #1 from Richard Biener --- I have a patch.
[Bug sanitizer/94191] ubsan bootstrap memory hog with -enable-checking=rtl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94191 Richard Biener changed: What|Removed |Added Keywords||memory-hog --- Comment #2 from Richard Biener --- I suggest using -O1; that should help.
[Bug c/94188] [10 Regression] error: request for member ‘node’ in something not a structure or union since r10-7127-gcb99630f254aaec6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94188 --- Comment #3 from Richard Biener --- Created attachment 48043 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48043&action=edit patch in testing This is what I have now, bootstrapped OK after the extra two hunks but I still see ICEs during testing. Still fixing build_fold_addr_expr_with_type looks inevitable ... (other option would be to remove the optimization when the type doesn't match, but I guess that will regress as well).
[Bug middle-end/51663] Desirable/undesirable elimination of unused variables & functions at -O0, -O0 -flto and -O0 -fwhole-program
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51663 --- Comment #12 from Richard Biener --- (In reply to Tom de Vries from comment #11) > Cross-referencing PR gdb/25684 - "gdb testing with gcc -flto" ( > https://sourceware.org/bugzilla/show_bug.cgi?id=25684 ). > > Ideally there would be a way to enable the lto infrastructure without > actually optimizing, such that when running the gdb testsuite with and > without flto and comparing results, any regression would indicate something > that needs fixing. > > In the current situation, each individual regression needs investigation > whether something needs fixing or whether the failure is just an > optimization artifact. And due to the fact there are optimizations, there > are thousands of such regressions. I suppose we're talking about -O0 -flto here. What kind of transforms are undesirable? I think at -O0 you'll get - more aggressive unused variable/function removal - promotion of variables from global to local some of the transforms are unavoidable due to partitioning(?) but we could default to 1:1 partitioning at -O0 ...
[Bug middle-end/51663] Desirable/undesirable elimination of unused variables & functions at -O0, -O0 -flto and -O0 -fwhole-program
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51663 --- Comment #14 from Richard Biener --- (In reply to Tom de Vries from comment #13)
> (In reply to Richard Biener from comment #12)
> > (In reply to Tom de Vries from comment #11)
> > > Cross-referencing PR gdb/25684 - "gdb testing with gcc -flto" (
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=25684 ).
> > >
> > > Ideally there would be a way to enable the lto infrastructure without
> > > actually optimizing, such that when running the gdb testsuite with and
> > > without flto and comparing results, any regression would indicate
> > > something
> > > that needs fixing.
> > >
> > > In the current situation, each individual regression needs investigation
> > > whether something needs fixing or whether the failure is just an
> > > optimization artifact. And due to the fact there are optimizations, there
> > > are thousands of such regressions.
> >
> > I suppose we're talking about -O0 -flto here.
>
> Right, and ideally -flto plain, with -O0 implicit.
>
> > What kind of transforms
> > are undesirable? I think at -O0 you'll get
> >
> > - more aggressive unused variable/function removal
> > - promotion of variables from global to local
> >
>
> Right, is there a way to switch these off?

Not at the moment I think. The main unused variable/function removal is in cgraphunit.c:analyze_functions. I guess the simplest thing would be to try

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index a9dd288be57..3aa8137efad 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1158,7 +1158,7 @@ analyze_functions (bool first_time)
 	{
 	  /* Convert COMDAT group designators to IDENTIFIER_NODEs. */
 	  node->get_comdat_group_id ();
-	  if (node->needed_p ())
+	  if (!optimize || node->needed_p ())
 	    {
 	      enqueue_node (node);
 	      if (!changed && symtab->dump_file)

but I'm not sure this is enough since we remove unreferenced things at several points during the compilation (including during partitioning I think).

Another possibility would be to make all nodes force_output when not optimizing like with

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index aa4cdc95506..b07bf9745d0 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -115,7 +115,7 @@ public:
       transparent_alias (false), weakref (false), cpp_implicit_alias (false),
       symver (false), analyzed (false), writeonly (false),
       refuse_visibility_changes (false), externally_visible (false),
-      no_reorder (false), force_output (false), forced_by_abi (false),
+      no_reorder (false), force_output (optimize), forced_by_abi (false),
       unique_name (false), implicit_section (false), body_removed (false),
       used_from_other_partition (false), in_other_partition (false),
       address_taken (false), in_init_priority_hash (false),

or in a more suitable place. That should eventually also avoid promotion to local.

> > some of the transforms are unavoidable due to partitioning(?) but we could
> > default to 1:1 partitioning at -O0 ...
>
> At this point I'm not interested in defaults yet. I can achieve 1:1
> partition by testing target board unix/-flto/-flto-partition=1to1.
>
> For now I'm interested in a combination of flags that exercises the specific
> type of debug info generation as is done for lto, without actually doing any
> optimizations.
>
> F.i., an open question for me is the following: I'm now using
> -flto-partition=none for testing, but maybe 1to1 should yield better results?
[Bug middle-end/51663] Desirable/undesirable elimination of unused variables & functions at -O0, -O0 -flto and -O0 -fwhole-program
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51663 --- Comment #15 from Richard Biener --- (In reply to Tom de Vries from comment #13) > F.i., an open question for me is the following: I'm now using > -flto-partition=none for testing, but maybe 1to1 should yield better results? I guess it better mimics how the testcases are set up, but I think the differences will be due to the other issues so the exact partitioning shouldn't matter (you likely always get a single partition and thus behavior equal to -flto-partition=none or -flto-partition=one by default). -flto-partition=none will be faster because it elides the LTRANS IL streaming.
[Bug middle-end/94206] Wrong optimization: memset of n-bit integer types (from bit-fields) is truncated to n bits (instead of sizeof)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94206 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2020-03-18 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED Known to fail||4.8.5 --- Comment #2 from Richard Biener --- Broken since long via memset folding: __MEM ((void *)&x) = _Literal (uint33) 8589934591;
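To see why the folded store is wrong, compare what `memset` has to write with what a store of the 33-bit all-ones literal writes. This is only a model, assuming the object occupies 8 bytes: it uses `uint64_t` instead of an actual bit-field type, and `fill_with_memset` and `fill_33_bits` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What memset (&x, -1, sizeof x) must produce: every byte set to 0xff.  */
static uint64_t fill_with_memset (void)
{
  uint64_t x = 0;
  memset (&x, -1, sizeof x);
  return x;
}

/* What the folded store amounts to when only the 33 value bits are
   written and the remaining bits of the storage stay zero.  */
static uint64_t fill_33_bits (void)
{
  return ((uint64_t) 1 << 33) - 1;	/* 8589934591 */
}
```

The two results differ in the upper 31 bits, which are exactly the bytes the truncated store leaves untouched.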
[Bug c/94188] [10 Regression] error: request for member ‘node’ in something not a structure or union since r10-7127-gcb99630f254aaec6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94188 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #6 from Richard Biener --- Fixed.
[Bug tree-optimization/94211] [9/10 Regression] -fcompare-debug failure on phi-opt-13.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94211 Richard Biener changed: What|Removed |Added Known to fail||9.3.0 Target Milestone|--- |9.4 Known to work||7.4.0
[Bug middle-end/94206] Wrong optimization: memset of n-bit integer types (from bit-fields) is truncated to n bits (instead of sizeof)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94206 Richard Biener changed: What|Removed |Added Keywords||wrong-code Known to work||10.0 --- Comment #4 from Richard Biener --- Fixed on trunk so far.
[Bug tree-optimization/94212] [8/9/10 Regression] Incorrect vectorization of loop with FP calculations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94212 Richard Biener changed: What|Removed |Added Known to work||7.4.0 Summary|Incorrect vectorization of |[8/9/10 Regression] |loop with FP calculations |Incorrect vectorization of ||loop with FP calculations Component|middle-end |tree-optimization Version|tree-ssa|10.0 Target Milestone|--- |8.5 CC||rguenth at gcc dot gnu.org, ||rsandifo at gcc dot gnu.org Target||aarch64 --- Comment #5 from Richard Biener --- The vectorizer vectorizes the reduction in-order but apparently something goes wrong there.
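As background on why an FP reduction has to stay in order, here is a small sketch that is unrelated to the testcase in this PR. It assumes IEEE double arithmetic with round-to-nearest and no -ffast-math; `sum_in_order`, `sum_two_lanes` and `demo_input` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Input chosen so that the association order matters.  */
static const double demo_input[4] = { 1e16, 1.0, -1e16, 1.0 };

/* Strict left-to-right summation, as the scalar loop does.  */
static double sum_in_order (const double *a, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; ++i)
    s += a[i];
  return s;
}

/* Two independent partial sums over even and odd elements, combined at
   the end, the way a reassociating two-lane vectorization would.  */
static double sum_two_lanes (const double *a, size_t n)
{
  double lane[2] = { 0.0, 0.0 };
  for (size_t i = 0; i < n; ++i)
    lane[i & 1] += a[i];
  return lane[0] + lane[1];
}
```

Under the stated assumptions `sum_in_order (demo_input, 4)` yields 1.0 while `sum_two_lanes (demo_input, 4)` yields 2.0, because 1e16 + 1.0 rounds back to 1e16 in the in-order chain. An in-order vectorized reduction is expected to match the first result.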
[Bug tree-optimization/94212] [8/9/10 Regression] Incorrect vectorization of loop with FP calculations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94212 --- Comment #7 from Richard Biener --- (In reply to Dmitrij Pochepko from comment #6) > Just checked: non-vectorized assembly for aarch64 (O2) is using fmadd and > fmsub intensively. Try with -ffp-contract=off then. Note due to effective unrolling of the loop with vectorization we might end up forming "different" fmadd groups. So you might also want to check whether the vectorized loop still sees fmadd use.
[Bug ipa/94217] [10 Regression] ICE in ipa_find_agg_cst_for_param, at ipa-prop.c:3467 since r10-7237-g4e3d3e40726e1b68
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94217 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #6 from Richard Biener --- Mine.
[Bug tree-optimization/94216] [10 Regression] ICE in maybe_canonicalize_mem_ref_addr, at gimple-fold.c:4899 since r10-7237-g4e3d3e40726e1b68bf52fa205c68495124ea60b8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94216 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #4 from Richard Biener --- Mine.
[Bug tree-optimization/94216] [10 Regression] ICE in maybe_canonicalize_mem_ref_addr, at gimple-fold.c:4899 since r10-7237-g4e3d3e40726e1b68bf52fa205c68495124ea60b8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94216 --- Comment #5 from Richard Biener --- (In reply to Jakub Jelinek from comment #1)
> I wonder if we shouldn't do:
> --- gcc/fold-const.c.jj 2020-03-18 12:47:36.0 +0100
> +++ gcc/fold-const.c 2020-03-18 17:34:14.586455801 +0100
> @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.
> #include "attribs.h"
> #include "tree-vector-builder.h"
> #include "vec-perm-indices.h"
> +#include "tree-ssa.h"
>
> /* Nonzero if we are folding constants inside an initializer; zero
> otherwise. */
> @@ -10262,6 +10263,10 @@ fold_binary_loc (location_t loc, enum tr
> switch (code)
> {
> case MEM_REF:
> + STRIP_USELESS_TYPE_CONVERSION (arg0);

We already applied STRIP_NOPS to arg0

> + if (arg0 != op0)
> + return fold_build2 (MEM_REF, type, arg0, op1);
> +
> /* MEM[&MEM[p, CST1], CST2] -> MEM[p, CST1 + CST2]. */
> if (TREE_CODE (arg0) == ADDR_EXPR
> && TREE_CODE (TREE_OPERAND (arg0, 0)) == MEM_REF)
> to catch all similar issues. Otherwise, we'd need to strip the useless type
> conversion at least in the case which triggers this:
> return fold_build2 (MEM_REF, type,
> build_fold_addr_expr (base),
> int_const_binop (PLUS_EXPR, arg1,
> size_int (coffset)));
> a few lines below this, where build_fold_addr_expr now returns a NOP_EXPR
> that we really want to strip again, even when op0 wasn't a NOP_EXPR.

True. But note there could be a non-useless type conversion here, for example for MEM [&a] with void *a. Here I think the better fix is (again) to use build1 and then, in case the base was a MEM_REF, recurse to the preceding pattern. I'm testing such a patch.
[Bug ipa/94217] [10 Regression] ICE in ipa_find_agg_cst_for_param, at ipa-prop.c:3467 since r10-7237-g4e3d3e40726e1b68
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94217 --- Comment #7 from Richard Biener --- Testing patch.
[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264 --- Comment #15 from Richard Biener --- WRF initial_config has very very very many (nested) loops to initialize globals. IIRC there's a related bug running into the very same issue when prefetching is enabled.
[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264 --- Comment #17 from Richard Biener --- Created attachment 48061 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48061&action=edit cache base term I wonder if we could simply cache the base terms in elt_loc_list? Does that make a difference?
[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264 --- Comment #18 from Richard Biener --- Note also that param_max_find_base_term_values limits recursion depth but not width (the loc list traversals). The original visited_vals thing was to prevent infinite recursion only. If the global caching works a safer approach would be to turn that visited_vals things into a local cache and see if that's enough as well.
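The depth-versus-width point can be illustrated with a toy cost model. This is not the find_base_term code: it assumes every value has the same number of locations and that each location leads to another value, and the function `visits` is hypothetical.

```c
#include <assert.h>

/* Toy model: each value has WIDTH locations, each leading to another
   value.  Returns how many values get visited when the recursion is
   cut off after MAX_DEPTH levels and nothing is cached.  */
static unsigned long visits (unsigned width, unsigned max_depth)
{
  unsigned long n = 1;		/* the value we are looking at */
  if (max_depth == 0)
    return n;
  for (unsigned i = 0; i < width; ++i)
    n += visits (width, max_depth - 1);
  return n;
}
```

With the same depth limit of 4, `visits (3, 4)` is 121 while `visits (10, 4)` is 11111, so in this model a cap on depth alone does not bound the work when the location lists are long.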
[Bug tree-optimization/94216] [10 Regression] ICE in maybe_canonicalize_mem_ref_addr, at gimple-fold.c:4899 since r10-7237-g4e3d3e40726e1b68bf52fa205c68495124ea60b8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94216 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #7 from Richard Biener --- Fixed.
[Bug ipa/94217] [10 Regression] ICE in ipa_find_agg_cst_for_param, at ipa-prop.c:3467 since r10-7237-g4e3d3e40726e1b68
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94217 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #9 from Richard Biener --- Fixed.
[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264 --- Comment #21 from Richard Biener --- (In reply to Jakub Jelinek from comment #19) > I think caching is problematic, for a couple of reasons: > 1) for non-cselib_preserved_value_p, the loc list is dynamic and keeps > changing, locs are added and removed as we go through the basic blocks Sure, but if you look at my patch I'm caching on individual locs, not on the whole list. That then still doesn't avoid traversing the whole list if we don't find a base but we'd at least elide further recursion. The base we pick also depends on the ordering of the list currently if there ever are two possible choices. > 2) because of the recursion prevention, doesn't it matter from exactly what > VALUE we start walking (in case we have a cycle or cycles)? I think the outcome is essentially random anyways since we pick the first base we find. I'm quite sure that if we collected "all" bases we'd find they are not equivalent in some cases. > 3) plus the new param on visited_vals, if we reach it, caching is unreliable Sure, but does anybody care? Note what I'd really propose would be find_base_term-local caching, eliding both the recursion prevention and the param.
[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264 --- Comment #22 from Richard Biener --- Created attachment 48063 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48063&action=edit more localized caching Like this. Martin, can you also check the effect on this one?
[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264 --- Comment #23 from Richard Biener --- (In reply to Richard Biener from comment #22) > Created attachment 48063 [details] > more localized caching > > Like this. Martin, can you also check the effect on this one? We can actually simplify since we won't ever use non-NULL cached vals which also means this patch is a no-op and should be worse than the existing state :/
[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264 Richard Biener changed: What|Removed |Added Attachment #48063|0 |1 is obsolete|| --- Comment #25 from Richard Biener --- Created attachment 48064 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48064&action=edit more localized caching Updated and simplified patch. Maybe it does help depending on how we have shared locs for multiple values ... The update is to not bother updating cache values with non-NULL found ones.
[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264 --- Comment #26 from Richard Biener --- (In reply to Martin Liška from comment #24) > (In reply to Richard Biener from comment #23) > > (In reply to Richard Biener from comment #22) > > > Created attachment 48063 [details] > > > more localized caching > > > > > > Like this. Martin, can you also check the effect on this one? > > > > We can actually simplify since we won't ever use non-NULL cached vals which > > also means this patch is a no-op and should be worse than the existing > > state :/ > > So no testing is needed? would be still interesting (but surprising) if it helps (doesn't matter whether the original or the updated patch, the update is just constant time improvement)
[Bug fortran/94221] Explicit assignment in type is ignored in some cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94221 Richard Biener changed: What|Removed |Added Known to fail||10.0 Status|UNCONFIRMED |NEW Last reconfirmed||2020-03-19 Ever confirmed|0 |1 Keywords||wrong-code --- Comment #1 from Richard Biener --- test2 lacks a return stmt and initialization of the return value: test2 () { { struct __st_parameter_dt dt_parm.0; dt_parm.0.common.filename = &"t.f90"[1]{lb: 1 sz: 1}; dt_parm.0.common.line = 15; dt_parm.0.common.flags = 128; dt_parm.0.common.unit = 6; _gfortran_st_write (&dt_parm.0); _gfortran_transfer_character_write (&dt_parm.0, &"running test - here test%val is not initialized"[1]{lb: 1 sz: 1}, 47); _gfortran_st_write_done (&dt_parm.0); } } but the caller still expects one: a = test (); indeed a frontend issue.
[Bug middle-end/94226] [10 regression] r10-7272 eliminated some warning messages
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94226 Richard Biener changed: What|Removed |Added Target Milestone|--- |10.0 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #2 from Richard Biener --- I will have a look.
[Bug middle-end/94226] [10 regression] r10-7272 eliminated some warning messages
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94226 --- Comment #3 from Richard Biener --- The issue is most likely get_origin_and_offset which looks at the pointer argument pointed-to-type: tree xtype = TREE_TYPE (TREE_TYPE (x)); /* The byte offset of the most basic struct member the byte offset *OFF corresponds to, or for a (multidimensional) array member, the byte offset of the array element. */ HOST_WIDE_INT index = 0; if ((RECORD_OR_UNION_TYPE_P (xtype) && field_at_offset (xtype, *off, &index)) || (TREE_CODE (xtype) == ARRAY_TYPE && TREE_CODE (TREE_TYPE (xtype)) == ARRAY_TYPE && array_elt_at_offset (xtype, *off, &index))) { *fldoff += index; *off -= index; } here we now see a single-dimensional array because the MEM[&MEM] propagation simply preserves the original pointer type. Since the pointer type doesn't have any semantics heuristics should better look at the type of the underlying object (if there is any). The following fixes this testcase (not that I agree with the way all this is written): diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c index 13640e0fd36..1879686ce0a 100644 --- a/gcc/gimple-ssa-sprintf.c +++ b/gcc/gimple-ssa-sprintf.c @@ -2331,7 +2331,9 @@ get_origin_and_offset (tree x, HOST_WIDE_INT *fldoff, HOST_WIDE_INT *off) if (off) { - tree xtype = TREE_TYPE (TREE_TYPE (x)); + tree xtype + = (TREE_CODE (x) == ADDR_EXPR + ? TREE_TYPE (TREE_OPERAND (x, 0)) : TREE_TYPE (TREE_TYPE (x))); /* The byte offset of the most basic struct member the byte offset *OFF corresponds to, or for a (multidimensional)
[Bug middle-end/93437] [9 Regression] bogus -Warray-bounds on protobuf generated code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93437 Richard Biener changed: What|Removed |Added Priority|P3 |P2 Target Milestone|--- |9.4
[Bug c/93572] [8/9/10 Regression] internal compiler error: q from h referenced in main
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93572 Richard Biener changed: What|Removed |Added Target Milestone|--- |8.5 Target||x86_64-linux-gnu Priority|P3 |P4
[Bug c/93573] [8/9/10 Regression] internal compiler error: in force_constant_size, at gimplify.c:733
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93573 Richard Biener changed: What|Removed |Added Target Milestone|--- |8.5 Priority|P3 |P4
[Bug target/93932] PowerPC vec_extract with variable element number has code regressions for V2DI/V2DF vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93932 Richard Biener changed: What|Removed |Added Keywords||missed-optimization --- Comment #5 from Richard Biener --- Fixed on trunk?
[Bug c++/94223] [10 Regression] -fcompare-debug -O0 failure on cpp1z/init-statement6.C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94223 Richard Biener changed: What|Removed |Added Target Milestone|--- |10.0 CC||rguenth at gcc dot gnu.org --- Comment #2 from Richard Biener --- lhd_set_decl_assembler_name seems to only do this for local decls though so it shouldn't matter for actual generated code but is just a compare-debug artifact? If it matters for code-generation then yes, a local counter should do (does it really need to be GTY?)
[Bug tree-optimization/91322] [10 regression] alias-4 test failure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91322 Richard Biener changed: What|Removed |Added Priority|P3 |P1
[Bug lto/91028] [10 Regression] g++.dg/lto/alias-2 FAILs with -fno-use-linker-plugin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91028 Richard Biener changed: What|Removed |Added Priority|P3 |P1
[Bug target/91498] [10 Regression] STV change in r274481 causes 300.twolf regression on Haswell
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91498 Richard Biener changed: What|Removed |Added Status|NEW |WAITING Blocks||26163 --- Comment #20 from Richard Biener --- The patch from comment#4 got installed so re-analysis is necessary I think. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 [Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
[Bug target/91634] [10 Regression] 508.namd_r (and 435.gromacs) speed regression after r274994
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91634 Richard Biener changed: What|Removed |Added Status|NEW |WAITING --- Comment #1 from Richard Biener --- Analysis needed.
[Bug tree-optimization/92029] [10 Regression] 'libgomp.fortran/pr90779.f90' ICE for nvptx offloading
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92029 Richard Biener changed: What|Removed |Added Priority|P3 |P2 --- Comment #7 from Richard Biener --- More-or-less a latent issue.