from:"rguenth at gcc dot gnu.org"

[Bug testsuite/94036] [9 regression] gcc.target/powerpc/pr72804.c fails

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94036

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |9.3

[Bug target/94037] Runtime varies 2x just by order of two int assignments

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94037

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Target||x86_64-*-*, i?86-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-03-05
  Component|rtl-optimization|target
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
The only apparent difference is

setge   %sil
movzbl  %sil, %esi
...
setl    %sil
movzbl  %sil, %esi
...

vs. clang's

xorl    %edx, %edx
xorl    %esi, %esi
...
setle   %dl
setg    %sil

where possibly the xors are "free" and the setg/zext pairs cause excessive
latency.  But it's all quite ugly and there has to be a better way to
conditionally exchange two values in memory (each fitting in a register)
without branches (which is probably done to avoid mispredicts).

Note with GCC 10 you'll see the v[2] = { a, b } store "vectorized",
-DFAST is still faster for me (Haswell), 7.4s vs. 10s.

Fast loop body:

.L12:
movl    (%rax), %ecx
vmovd   (%r11), %xmm1
cmpl    %ecx, %esi
setge   %dl
movzbl  %dl, %edx
vpinsrd $1, %ecx, %xmm1, %xmm0
movl    %r8d, %ecx
setge   %dil
subl    %edx, %ecx
vmovq   %xmm0, 120(%rsp)
movslq  %ecx, %rcx
movl    120(%rsp,%rdx,4), %edx
movl    120(%rsp,%rcx,4), %ecx
addq    $4, %rax
movl    %ecx, -4(%rax)
movl    %edx, (%r11)
movzbl  %dil, %edx
leaq    (%r11,%rdx,4), %r11
cmpq    24(%rsp), %rax
jb      .L12

slow one:

.L12:
movl    (%rax), %esi
vmovd   (%r10), %xmm1
cmpl    %esi, %edx
vpinsrd $1, %esi, %xmm1, %xmm0
setge   %dil
setl    %sil
vmovq   %xmm0, 120(%rsp)
movzbl  %dil, %edi
movzbl  %sil, %esi
movl    120(%rsp,%rdi,4), %edi
movl    120(%rsp,%rsi,4), %esi
setge   %cl
movl    %edi, (%r10)
movzbl  %cl, %ecx
movl    %esi, (%rax)
addq    $4, %rax
leaq    (%r10,%rcx,4), %r10
cmpq    8(%rsp), %rax
jb      .L12
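
For readers without the testcase at hand, the branchless exchange under
discussion has roughly the following shape in C.  This is only a sketch
reconstructed from the v[2] = { a, b } store and the indexed loads visible in
the loops above; the function name, signature and the assert are assumptions,
not the reporter's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: exchange *left and *right when *right <= pivot,
   using an indexed load from a two-element array instead of a branch.
   Returns 1 if the values were exchanged, 0 otherwise.  */
static int exchange_if_le(int *left, int *right, int pivot)
{
    assert(left != NULL && right != NULL);
    int r = *right;
    int v[2] = { *left, r };   /* the store GCC 10 "vectorizes" */
    int le = r <= pivot;       /* becomes setCC + zero-extension */
    *left = v[le];             /* picks r when exchanging */
    *right = v[r > pivot];     /* picks the old *left when exchanging */
    return le;
}
```

A partition loop would then advance its left pointer by the return value,
which is where the third use of the comparison comes from.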

[Bug c++/94041] [10 Regression] temporary object destructor called before the end of the full-expression since r10-5577

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94041

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug bootstrap/94042] [10 Regression] Bootstrap fails on ppc-linux-gnu

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94042

--- Comment #7 from Richard Biener  ---
Just to quote configury used:

../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man
--libdir=/usr/lib --libexecdir=/usr/lib
--enable-languages=c,c++,objc,fortran,obj-c++,ada,go --disable-werror
--with-gxx-include-dir=/usr/include/c++/10 --enable-ssp --disable-libssp
--disable-libvtv --disable-cet --disable-libcc1 --enable-plugin
--with-bugurl=https://bugs.opensuse.org/ '--with-pkgversion=SUSE Linux'
--with-slibdir=/lib --with-system-zlib --enable-libstdcxx-allocator=new
--disable-libstdcxx-pch --enable-version-specific-runtime-libs
--with-gcc-major-version-only --enable-linker-build-id --enable-linux-futex
--enable-gnu-indirect-function --program-suffix=-10 --without-system-libunwind
--with-cpu=default32 --with-cpu-64=power4 --enable-secureplt
--with-long-double-128 --build=powerpc64-suse-linux --host=powerpc64-suse-linux

note it worked with fa1160f6e50500aa38162fefb43bfb10c25e0363 but now fails
since at least 778a77357cad11e8dd4c810544330af0fbe843b1 so it's a recent
regression.

[Bug tree-optimization/92645] Hand written vector code is 450 times slower when compiled with GCC compared to Clang

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92645

--- Comment #20 from Richard Biener  ---
Small C testcase for one of the patterns we fail to optimize/vectorize:

void foo (char * __restrict src, short * __restrict dest)
{
  union {
    __int128_t i;
    char v[16];
  } u;
  __builtin_memcpy (&u.i, src, 16);
  dest[0] = u.v[0];
  dest[1] = u.v[1];
  dest[2] = u.v[2];
  dest[3] = u.v[3];
  dest[4] = u.v[4];
  dest[5] = u.v[5];
  dest[6] = u.v[6];
  dest[7] = u.v[7];
  dest[8] = u.v[8];
  dest[9] = u.v[9];
  dest[10] = u.v[10];
  dest[11] = u.v[11];
  dest[12] = u.v[12];
  dest[13] = u.v[13];
  dest[14] = u.v[14];
  dest[15] = u.v[15];
}

presents itself as

  _19 = MEM <__int128 unsigned> [(char * {ref-all})src_18(D)];
  _37 = (char) _19;
  _1 = (short int) _37;
  *dest_20(D) = _1;
  _38 = BIT_FIELD_REF <_19, 8, 8>;
  _2 = (short int) _38;
  MEM[(short int *)dest_20(D) + 2B] = _2;
  _39 = BIT_FIELD_REF <_19, 8, 16>;
  _3 = (short int) _39;
  MEM[(short int *)dest_20(D) + 4B] = _3;
...
  _16 = (short int) _52;
  MEM[(short int *)dest_20(D) + 30B] = _16;
  return;

where SLP vectorization is confused about (char) _19 vs. BIT_FIELD_REF
but also wouldn't handle BIT_FIELD_REFs.  Nor does it vectorize the
stores into a single store from a CTOR, which forwprop could then
pattern-match.
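
To make the intended result concrete: the sixteen scalar stores amount to one
widening conversion of 16 chars to 16 shorts.  A loop form of the same
computation might look like the sketch below; foo_loop is a made-up name and
the local array stands in for the union, so treat it as an illustration of
the semantics rather than a replacement testcase.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical loop form of foo(): copy 16 bytes, then widen each char to
   short (sign- or zero-extending depending on whether plain char is signed
   on the target).  */
void foo_loop(const char *restrict src, short *restrict dest)
{
    char v[16];
    assert(src != NULL && dest != NULL);
    memcpy(v, src, sizeof v);
    for (int i = 0; i < 16; i++)
        dest[i] = v[i];
}
```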

[Bug c++/94044] [10 Regression] internal compiler error: in comptypes, at cp/typeck.c:1490 on riscv64-unknown-linux-gnu and arm-eabi

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94044

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.0
Summary|internal compiler error: in |[10 Regression] internal
   |comptypes, at   |compiler error: in
   |cp/typeck.c:1490 on |comptypes, at
   |riscv64-unknown-linux-gnu   |cp/typeck.c:1490 on
   |and arm-eabi|riscv64-unknown-linux-gnu
   ||and arm-eabi

[Bug target/94037] Runtime varies 2x just by order of two int assignments

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94037

--- Comment #4 from Richard Biener  ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Jakub Jelinek from comment #2)
> > The
> > setge   %sil
> > movzbl  %sil, %esi
> > to
> > xorl    %esi, %esi
> > setge   %sil
> 
> This is quite important conversion, as the later avoids partial register
> stall.

Couldn't we fix this by pretending setge and friends produce SImode
and always emit xor + setCC?  So not rely on a peephole but emit
the xor already during RTL expansion, eventually eliding it later
if that's ever necessary.

[Bug target/94037] Runtime varies 2x just by order of two int assignments

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94037

--- Comment #6 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #2)
> The
> setge   %sil
> movzbl  %sil, %esi
> to
> xorl    %esi, %esi
> setge   %sil
> transformation is something GCC does too with the
> ;; Convert setcc + movzbl to xor + setcc if operands don't overlap.
> peephole2s.
> The reason it doesn't trigger here is that the single comparison has 3 uses
> (why not 2 and something didn't CSE those?).
> So, at least for the two setcc quite long after cmp case, we might need to
> either not have the comparison in the peephole2 and instead walk backwards,
> looking for the FLAGS_REG setter (or stopping on something that clobbers it)
> and stopping on if the register we'd want to xor first is mentioned by any
> insn in between.
> We have at the start of peephole2:
> (insn 37 36 38 5 (set (reg:CCGC 17 flags)
> (compare:CCGC (reg/v:SI 1 dx [orig:86 pivot ] [86])
> (reg:SI 0 ax [orig:242 pretmp_404 ] [242]))) "pr94037.C":6:19 11
> {*cmpsi_1}
>  (expr_list:REG_DEAD (reg:SI 0 ax [orig:242 pretmp_404 ] [242])
> (nil)))
> (note 38 37 39 5 NOTE_INSN_DELETED)
> (note 39 38 1178 5 NOTE_INSN_DELETED)
> (insn 1178 39 1179 5 (set (reg:QI 2 cx [288])
> (lt:QI (reg:CCGC 17 flags)
> (const_int 0 [0]))) "pr94037.C":6:21 732 {*setcc_qi}
>  (nil))
> (insn 1179 1178 41 5 (set (reg:DI 2 cx [288])
> (zero_extend:DI (reg:QI 2 cx [288]))) "pr94037.C":6:21 115
> {zero_extendqidi2}
>  (nil))
> (insn 41 1179 42 5 (set (reg:SI 2 cx [289])
> (mem:SI (plus:DI (plus:DI (mult:DI (reg:DI 2 cx [288])
> (const_int 4 [0x4]))
> (reg/f:DI 7 sp))
> (const_int 120 [0x78])) [1 MEM[(int[2] *)_213] S4 A32]))
> "pr94037.C":6:15 67 {*movsi_internal}
>  (expr_list:REG_EQUIV (mem:SI (reg/v/f:DI 36 r8 [orig:283 begin ] [283])
> [1 *begin_329+0 S4 A32])
> (nil)))
> (insn 42 41 45 5 (set (mem:SI (reg/v/f:DI 36 r8 [orig:283 begin ] [283]) [1
> *begin_329+0 S4 A32])
> (reg:SI 2 cx [289])) "pr94037.C":6:15 67 {*movsi_internal}
>  (expr_list:REG_DEAD (reg:SI 2 cx [289])
> (nil)))
> (note 45 42 1180 5 NOTE_INSN_DELETED)
> (insn 1180 45 1181 5 (set (reg:QI 0 ax [290])
> (ge:QI (reg:CCGC 17 flags)
> (const_int 0 [0]))) "pr94037.C":15:10 732 {*setcc_qi}
>  (expr_list:REG_DEAD (reg:CCGC 17 flags)
> (nil)))
> (insn 1181 1180 47 5 (set (reg:DI 0 ax [290])
> (zero_extend:DI (reg:QI 0 ax [290]))) "pr94037.C":15:10 115
> {zero_extendqidi2}
>  (nil))
> and current peephole2 manages to handle the first setcc+movzbl.  The second
> one actually isn't doable because of the RA decisions, as the comparison
> uses ax and we'd need to clear rax before the comparison.
> Similarly, in partition we have 3 setcc+movzbls, but the first one has the
> movzbl not adjacent to the setcc, second one would be doable if the
> peephole2 pattern didn't include the FLAGS_REG setter and walked back and
> the third one isn't doable because the comparison uses cx, i.e. the register
> that's set by setcc and extended by movzbl.
> 
> Now, on the GIMPLE level, we have e.g. in partition:
>   _2 = MEM[base: right_35, offset: 0B];
>   _3 = _2 <= pivot_12;
>   _4 = (int) _3;
>   _19 = *left_31;
>   _7 = {_19, _2};
>   MEM  [(int *)&v] = _7;
>   _21 = v[_4];
>   *left_31 = _21;
>   _22 = _2 > pivot_12;
>   _23 = (int) _22;
>   _24 = v[_23];
>   MEM[base: right_35, offset: 0] = _24;
>   v ={v} {CLOBBER};
>   _5 = _2 <= pivot_12 ? 4 : 0;
> so I wonder why sccvn has not at least replaced the last _2 <= pivot_12 with
> _3.
> And, if SCCVN could be taught that if we have _3 = _2 <= pivot_12 and later
> _22 = _2 > pivot_12, we can simplify the latter to _22 = 1 - _3;
> 1 -

VN has a hard time here because _2 <= pivot_12 has no def and thus
isn't value-numbered independently (IIRC I had some local hacks trying
to fix this but thought we should fix GIMPLE instead).  VN also
does not open-code the GENERIC compare which makes the special-casing
even more ugly...

That said, it _could_ be fixed during elimination by looking up the
condition operand but then IIRC at least for VEC_COND_EXPR we don't
really like CSEing the conditions even if they are available in
a SSA name.
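
For reference, the simplification suggested in the quoted text rests on the
fact that for integer operands the two comparisons are exact complements.
A small sketch of that identity, with made-up helper names, is below.  Note
it does not carry over to floating-point operands, where both comparisons
are false if either side is a NaN.

```c
#include <assert.h>

/* Hypothetical helpers mirroring the two GIMPLE comparisons.  */
static int le(int x, int pivot) { return x <= pivot; }   /* like _3 */

/* The proposed replacement for _22 = _2 > pivot_12.  */
static int gt_from_le(int x, int pivot) { return 1 - le(x, pivot); }

/* Check the identity for one pair of operands.  */
static void check_identity(int x, int pivot)
{
    assert(gt_from_le(x, pivot) == (x > pivot));
}
```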

[Bug target/94037] Runtime varies 2x just by order of two int assignments

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94037

--- Comment #7 from Richard Biener  ---
(In reply to Uroš Bizjak from comment #5)
> (In reply to Richard Biener from comment #4)
> > (In reply to Uroš Bizjak from comment #3)
> > > (In reply to Jakub Jelinek from comment #2)
> > > > The
> > > > setge   %sil
> > > > movzbl  %sil, %esi
> > > > to
> > > > xorl    %esi, %esi
> > > > setge   %sil
> > > 
> > > This is quite important conversion, as the later avoids partial register
> > > stall.
> > 
> > Couldn't we fix this by pretending setge and friends produce SImode
> > and always emit xor + setCC?  So not rely on a peephole but emit
> > the xor already during RTL expansion, eventually eliding it later
> > if that's ever necessary.
> 
> xor clobbers flags, so they would be killed before setCC. OTOH, "mov $0,
> %reg" doesn't clobber flags, but it also doesn't break partial reg
> dependency.

Oh, ok.  That means if we want to pursue this more aggressively we need
something before RA.  I guess splitting it before RA would then depend on
some scheduling moving the zeroing somewhere before the CC computation...

Or we even pull in the actual CC computation into the early non-split
pattern.

[Bug tree-optimization/94043] [9/10 Regression] ICE in superloop_at_depth, at cfgloop.c:78

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94043

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org
   Target Milestone|--- |9.3

[Bug middle-end/94045] [i686] Compiler hang with -O2 -g -m32 -march=i686 -mtune=generic

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94045

Richard Biener  changed:

   What|Removed |Added

   Keywords||compile-time-hog
 Target||i?86-*-*
  Component|c++ |middle-end

--- Comment #2 from Richard Biener  ---
Probably another CSElib issue.

[Bug rtl-optimization/94045] [i686] Compiler hang with -O2 -g -m32 -march=i686 -mtune=generic

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94045

Richard Biener  changed:

   What|Removed |Added

  Component|middle-end  |rtl-optimization

--- Comment #4 from Richard Biener  ---
Backtrace ends in var-tracking:

#5811 0x00e1e893 in find_base_term (x=0x7fb52431af30) at
/space/rguenther/src/gcc-work2/gcc/alias.c:2113
#5812 0x00e21e18 in true_dependence_1 (mem=0x7fb517ac51c8,
mem_mode=E_SImode, mem_addr=0x7fb52431af30, x=0x7fb517ab3a50,
x_addr=0x7fb52431aed0, mem_canonicalized=true) at
/space/rguenther/src/gcc-work2/gcc/alias.c:3026
#5813 0x00e22092 in canon_true_dependence (mem=0x7fb517ac51c8,
mem_mode=E_SImode, mem_addr=0x7fb52431af30, x=0x7fb517ab3a50,
x_addr=0x7fb52431aed0) at /space/rguenther/src/gcc-work2/gcc/alias.c:3068
#5814 0x01ac0ec7 in vt_canon_true_dep (set=0x9b6cfe0,
mloc=0x7fb517ac51c8, maddr=0x7fb52431af30, loc=0x7fb517ab3a50) at
/space/rguenther/src/gcc-work2/gcc/var-tracking.c:2257
#5815 0x01ac0fa1 in drop_overlapping_mem_locs (slot=0xa78c598,
coms=0x7ffde2e57670) at /space/rguenther/src/gcc-work2/gcc/var-tracking.c:2295
#5816 0x01adeda7 in hash_table::traverse_noresize (this=0x585c020,
argument=0x7ffde2e57670) at
/space/rguenther/src/gcc-work2/gcc/hash-table.h:1081
#5817 0x01add6f1 in hash_table::traverse (this=0x585c020,
argument=0x7ffde2e57670) at
/space/rguenther/src/gcc-work2/gcc/hash-table.h:1102
#5818 0x01ac128b in clobber_overlapping_mems (set=0x9b6cfe0,
loc=0x7fb517ac51c8) at /space/rguenther/src/gcc-work2/gcc/var-tracking.c:2361
#5819 0x01ac1628 in val_bind (set=0x9b6cfe0, val=0x933e258,
loc=0x7fb517ac51c8, modified=true) at
/space/rguenther/src/gcc-work2/gcc/var-tracking.c:2473
#5820 0x01ac18aa in val_store (set=0x9b6cfe0, val=0x933e258,
loc=0x7fb517ac51c8, insn=0x7fb518ffa0c0, modified=true) at
/space/rguenther/src/gcc-work2/gcc/var-tracking.c:2534
#5821 0x01acff9f in compute_bb_dataflow (bb=) at /space/rguenther/src/gcc-work2/gcc/var-tracking.c:6940

note the actual recursions are deep but the main issue is the complexity
in vt_find_locations and memory handling which makes that expensive with
large loc lists.

[Bug bootstrap/94042] [10 Regression] Bootstrap fails on ppc-linux-gnu

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94042

--- Comment #9 from Richard Biener  ---
You can also run big-endian kvm guests on a little-endian host.

[Bug debug/93888] Incorrect DW_AT_location generated for copy-constructed function argument

2020-03-05 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93888

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Richard Biener  ---
Fixed.  Technically not a regression but it looks safe to backport if there's
desire (I've pushed it to our LTS gcc 7 flavor).

[Bug target/94059] [10 Regression] m68k: Bootstrap fails configuring libiberty with 'cannot compute sizeof (long long)'

2020-03-06 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94059

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2020-03-06
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
It's odd that this is stage3-libiberty only; I'd have expected obvious
stage2 issues to pop up when building stage2 target libraries.

Note the build log itself isn't very useful.  How the sizeof (long long) test
fails might be interesting, as would sharing the configury in a more direct
way than via the build log.

Bisecting also might help ... possibly related is PR94042, which had a
bisection, so you might check whether that bisection applies to m68k as well.

[Bug libstdc++/94069] [9 Regression] doesn't compile unless PTHREAD_RWLOCK_INITIALIZER is defined

2020-03-06 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94069

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |9.3

[Bug tree-optimization/94071] Missed optimization with endian and alignment independent memory access

2020-03-06 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94071

Richard Biener  changed:

   What|Removed |Added

   Keywords||easyhack,
   ||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-03-06
Version|unknown |10.0
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Hmm, bswap load is supposed to detect this.  Not sure what it doesn't like
here - possibly the (int) promotion before the shift (a wider representation
than the uint16 it likely initializes the lattice from).  Shouldn't be
hard to fix.

   [local count: 1073741824]:
  _1 = data[addr_10(D)];
  _2 = (signed short) _1;
  _3 = addr_10(D) + 1;
  _4 = data[_3];
  _5 = (int) _4;
  _6 = _5 << 8;
  _7 = (signed short) _6;
  _8 = _2 | _7;
  _11 = (uint16_t) _8;
  return _11;
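
The source pattern behind this dump is presumably the usual endian- and
alignment-independent 16-bit load.  The sketch below is reconstructed from
the GIMPLE only; the declaration of data, its size, the function name and the
bounds assert are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed declaration; the dump only shows that data is indexed bytewise.  */
static uint8_t data[16];

/* Read a little-endian 16-bit value starting at data[addr].  The high byte
   is promoted to int before the shift, which is the promotion mentioned
   above.  */
uint16_t load_le16(size_t addr)
{
    assert(addr + 1 < sizeof data);
    return (uint16_t)(data[addr] | (data[addr + 1] << 8));
}
```

On a little-endian target this is what the bswap pass would ideally turn
into a single 16-bit load.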

[Bug target/94072] [10 Regression] ICE: SIGSEGV due to infinite recursion in expand_expr/expand_expr_real_1 with -msve-vector-bits=512

2020-03-09 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94072

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.0

[Bug tree-optimization/94086] Missed optimization when converting a bitfield to an integer on x86-64

2020-03-09 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94086

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-03-09

--- Comment #2 from Richard Biener  ---
Confirmed.  This is another task for a combined bswap/store-merging where the
bswap tracking would need to be extended to cover bits.

Also part of the reason for the missed optimization is that on GIMPLE we
think 'half' is memory but in reality it is in a register.

[Bug tree-optimization/94094] New: [meta-bug] store-merging and/or bswap load/store-merging missed optimizations

2020-03-09 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94094

Bug ID: 94094
   Summary: [meta-bug] store-merging and/or bswap
load/store-merging missed optimizations
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Bug tracking missed cases, both passes could/should be merged.

[Bug target/94088] [10 Regression] ICE: in extract_insn, at recog.c:2294 (error: unrecognizable insn), or ICE: in elimination_costs_in_insn, at reload1.c:3538

2020-03-09 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94088

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.0

[Bug tree-optimization/94092] Code size and performance degradations after -ftree-loop-distribute-patterns was enabled at -O[2s]+

2020-03-09 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092

--- Comment #4 from Richard Biener  ---
With profile feedback we (target or middle-end) can produce specialized
RTL expansion doing small copies inline and larger ones offline.  The
idea of GIMPLE level pattern detection is that even for small sizes
the target usually knows how to expand the copy optimally while the
user may have written a byte copying loop.

Of course that requires targets to pay attention.

Note most compiler optimization involves some heuristics and clearly heuristics
can be off.  I wonder if you can obtain better coremark results by using
link-time optimization.  Iff you're only after benchmark numbers...
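
As an illustration of the kind of user code meant above, consider a
hand-written byte copying loop like the sketch below (the function name and
the assert are made up for the example).  With
-ftree-loop-distribute-patterns GCC may recognize such a loop and replace it
with a call to memcpy, leaving the choice of expansion to the target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte copying loop of the kind pattern detection targets.
   The restrict qualifiers tell the compiler the buffers do not overlap.  */
void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```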

[Bug target/94093] -malign-double changes alignment of double type only and not long double

2020-03-09 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94093

Richard Biener  changed:

   What|Removed |Added

  Known to fail||2.95.2, 3.2.3, 3.4.6, 4.0.0
   Keywords|wrong-code  |documentation

--- Comment #3 from Richard Biener  ---
Probably.  IMHO support for -malign-double should go away since its effects
on the psABI are not fully documented.

Btw, clang appears to have alignof(long double) == 8, not matching GCC's
behavior.

GCC 4.0.0 is also "wrong", so are GCC 2.95, 3.2, 3.3 and 3.4.

So I guess it works as designed and documentation should be fixed instead.
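
A quick way to observe the layout effect is to look at member offsets rather
than at alignof directly.  The sketch below uses made-up struct and function
names; building it for 32-bit x86 with and without -malign-double and
comparing the output shows which types are affected.  Going by the report
title, only the double offset is expected to change with GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical probe structs: the member offset reveals the alignment the
   compiler actually uses when laying out the type inside a struct.  */
struct probe_d  { char c; double d; };
struct probe_ld { char c; long double ld; };

/* Print the offsets chosen for the double and long double members.  */
void print_member_offsets(void)
{
    size_t off_d = offsetof(struct probe_d, d);
    size_t off_ld = offsetof(struct probe_ld, ld);
    /* A member following a single char cannot sit at offset 0.  */
    assert(off_d > 0 && off_ld > 0);
    printf("offset of double member:      %zu\n", off_d);
    printf("offset of long double member: %zu\n", off_ld);
}
```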

[Bug target/94093] -malign-double changes alignment of double type only and not long double

2020-03-09 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94093

Richard Biener  changed:

   What|Removed |Added

  Known to fail|2.95.2  |
  Known to work||2.95.2

--- Comment #4 from Richard Biener  ---
Pilot error, GCC 2.95 "properly" aligns long double but GCC 3.2+ do not.

[Bug target/94096] New: amdgcn build instructions missing

2020-03-09 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94096

Bug ID: 94096
   Summary: amdgcn build instructions missing
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

https://gcc.gnu.org/wiki/Offloading#How_to_try_offloading_enabled_GCC

has documentation on how to enable offloading for nvptx, intel-mic and hsa but
lacks any information on amdgcn.  install.texi lacks everything.

Please update.

[Bug target/94103] Wrong optimization: reading value of a variable changes its representation for optimizer

2020-03-10 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94103

Richard Biener  changed:

   What|Removed |Added

  Component|middle-end  |target
   Keywords||wrong-code
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-03-10
 Target||x86_64-*-*, i?86-*-*,
   ||m68k-*-*
 CC||law at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #4 from Richard Biener  ---
There's a related bug about x87 stores not storing all bytes which was closed
as INVALID, the testcase was using a union to pun long double and a character
array IIRC.  Here the situation is quite similar - the IL has (from your
unused load):

  x.1_6 = x;
  _18 = MEM[(char * {ref-all})&x];
  MEM[(char * {ref-all})&u] = _18;
  _7 = u[0];
  _8 = u[1];
  printf ("%016lX %016lX\n", _8, _7);

where x.1_6 is of type long double and _18 is __int128.  Value-numbering
then decides that it can elide the load to _18 and also optimize the loads
from u[] as:

  x.1_6 = x;
  _31 = VIEW_CONVERT_EXPR<__int128 unsigned>(x.1_6);
  MEM[(char * {ref-all})&u] = _31;
  _32 = (long unsigned int) _31;
  _33 = BIT_FIELD_REF <_31, 64, 64>;
  printf ("%016lX %016lX\n", _33, _32);

which is because the backend tells us the FP load x.1_6 = x loads all
bits and does not modify the underlying representation.  Now, for x87
modes GET_MODE_SIZE isn't in agreement with what the actual instruction
does nor does the load reflect the fact that an x87 load (not the
long double variant) can end up doing a rounding step.  m68k is
probably similarly affected.  And yes, compared to the other bug I
was able to close as INVALID conveniently this one looks "real"
(if also artificially constructed and unfortunate...).  Jeff, you
also were involved in the other bug, do you agree here?  I still don't
see any good solution though.

I agree that the decimal float variant is an entirely different bug,
maybe you can open a new one for this?
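
For context, on x86 a long double holds a 10-byte x87 extended value inside
a 12- or 16-byte object, so some bytes of the object are padding.  The sketch
below (a made-up helper, not the reporter's testcase) shows the kind of
bytewise access to the object representation that the
MEM[(char * {ref-all})&x] load in the IL stands for.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the full object representation of a long
   double, padding bytes included, into a byte buffer.  */
static void dump_long_double(long double x,
                             unsigned char out[sizeof(long double)])
{
    assert(out != NULL);
    memcpy(out, &x, sizeof x);
}
```

Whether the padding bytes survive a load and store of the long double value
itself is exactly what is in question here.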

[Bug c/94106] [8/9/10 Regression] error on a function redeclaration with attribute transaction_safe

2020-03-10 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94106

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |8.5

[Bug middle-end/94111] Wrong constant folding: decimal floating-point infinity casted to double -> zero

2020-03-10 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94111

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC||rguenth at gcc dot gnu.org
   Keywords||wrong-code
   Last reconfirmed||2020-03-10
 Ever confirmed|0   |1
Summary|Wrong optimization: decimal |Wrong constant folding:
   |floating-point infinity |decimal floating-point
   |casted to double -> zero|infinity casted to double
   ||-> zero

--- Comment #1 from Richard Biener  ---
This goes wrong somewhere in constant folding:

  d.1_2 = d_13;
  _3 = (double) d.1_2;

 ->

  d.1_2 =  Inf;
  _3 = 0.0;

so (double) Inf is computed wrong.

[Bug tree-optimization/94114] [8/9/10 Regression] ICE in gimplify_modify_expr, at gimplify.c:5936

2020-03-10 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94114

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug middle-end/94122] Wrong optimization: reading value of a decimal FP variable changes its representation for optimizer

2020-03-10 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94122

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2020-03-10
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
This works for me with GCC 9.  On trunk this is wrong FRE:

Value numbering stmt = _3 = MEM[(unsigned char *)&x];
Setting value number of _3 to 127 (changed)
Value numbering stmt = _4 = _3 + 1;
Match-and-simplified _3 + 1 to 128
RHS _3 + 1 simplified to 128
Setting value number of _4 to 128 (changed)
Value numbering stmt = MEM[(unsigned char *)&x] = _4;
No store match
Value numbering store MEM[(unsigned char *)&x] to 128
Setting value number of .MEM_12 to .MEM_12 (changed)
Value numbering stmt = x.2_5 = x;
Successfully combined 2 partial definitions
Setting value number of x.2_5 to 0 (changed)
Value numbering stmt = _13 = MEM  [(char * {ref-all})&x];
Setting value number of _13 to 847249408 (changed)

possibly caused by real_{to,from}_target doing normalization during
encoding/decoding? (that would IMHO be wrong?)

[Bug target/94123] [10 regression] r10-7093 causes gcc.target/powerpc/pr87507.c to fail

2020-03-11 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94123

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.0

[Bug tree-optimization/94125] [9/10 Regression] wrong code at -O3 on x86_64-linux-gnu

2020-03-11 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94125

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED
   Priority|P1  |P2
  Known to work||8.3.0

--- Comment #3 from Richard Biener  ---
I'll have a look.  Note the bisection point is a correctness fix only possibly
resulting in less optimization.

[Bug fortran/94129] Using almost any openacc !$acc directive causes ICE "compressed stream: data error"

2020-03-11 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94129

Richard Biener  changed:

   What|Removed |Added

   Keywords||openacc
 CC||doko at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
You should report this to Ubuntu; I suspect a build problem there, possibly
enabling zstd compression for gfortran-10 but zlib compression for
gcc-10-offload-nvptx (I'm not sure we inter-operate here - Martin?)

[Bug tree-optimization/94130] [8/9/10 Regression] Unintended result with optimization option when assigning two structures, memset and 0

2020-03-11 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94130

Richard Biener  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
   Target Milestone|--- |8.5
   Last reconfirmed||2020-03-11
  Known to work||6.5.0
 Ever confirmed|0   |1
   Priority|P3  |P2
Summary|Unintended result with  |[8/9/10 Regression]
   |optimization option when|Unintended result with
   |assigning two structures,   |optimization option when
   |memset and 0|assigning two structures,
   ||memset and 0
  Known to fail||7.5.0, 8.3.0, 9.2.0

--- Comment #1 from Richard Biener  ---
Confirmed.  DSE prunes the memset from the wrong side it seems.  Jeff?

Breaks with -O -fno-inline already, fixed with -fno-tree-dse.

[Bug tree-optimization/94130] [8/9/10 Regression] Unintended result with optimization option when assigning two structures, memset and 0

2020-03-11 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94130

--- Comment #2 from Richard Biener  ---
Doing

memset(&s_myvalue, 0, sizeof(s_myvalue));
s_myreq.m_data = &s_myvalue;

also works around the issue.  Odd.

[Bug tree-optimization/94131] [10 Regression] ICE: tree check: expected integer_cst, have plus_expr in get_len, at tree.h:5927

2020-03-11 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94131

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.0
   Priority|P3  |P1
 CC||msebor at gcc dot gnu.org

[Bug c++/94132] Valid usage of flexible array member failing to compile

2020-03-11 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94132

Richard Biener  changed:

   What|Removed |Added

 CC||msebor at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
I guess at _least_ [[no_unique_address]] would be needed but then who knows
how exactly previous behavior was in this regard?

Martin, you tightened the checks (I agree to those), any input?

[Bug target/94136] GCC doc for built-in function builtin_clear_cache() not 100% correct

2020-03-11 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94136

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Keywords||documentation
  Component|other   |target
 Target||riscv
   Last reconfirmed||2020-03-11
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Confirmed.

[Bug tree-optimization/94125] [9/10 Regression] wrong code at -O3 on x86_64-linux-gnu

2020-03-11 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94125

Richard Biener  changed:

   What|Removed |Added

 CC||amker at gcc dot gnu.org

--- Comment #4 from Richard Biener  ---
Hm, it computes a dependence distance of two and in the end sorts partitions
in the wrong order from a bogus partition dependence edge.  The odd thing is
that for

  for (int c = 0; c <= 2; c++)
{
  b = f;
  *g = k[c + 3];
  k[c + 1] = 0;
}

we compute a distance of minus two (two + DDR_REVERSED_P) but in both
cases we need to use the same partition ordering, memset after the
partition containing the k[c+3] load, yet the partition dependence code
handles the two cases differently because of the DDRs' apparently
different direction.

Something is missing here.  Not sure what - Bin, any idea?
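
To spell out the required ordering: iteration c reads k[c + 3] and zeroes
k[c + 1], so k[3] is read in the first iteration and only zeroed in the last
one.  The sketch below is a variation of the loop above that records each
load in an output array (an assumption made so the ordering is observable;
the names are made up) and shows the distributed form that preserves the
result.

```c
#include <assert.h>
#include <string.h>

/* Original loop shape: read k[c + 3], then zero k[c + 1].  */
static void fused(int *k, int *out)
{
    assert(k != NULL && out != NULL);
    for (int c = 0; c <= 2; c++) {
        out[c] = k[c + 3];
        k[c + 1] = 0;
    }
}

/* Distributed form: the partition with the loads runs first, the memset
   partition second.  Swapping the two would make the k[3] read see 0.  */
static void distributed(int *k, int *out)
{
    assert(k != NULL && out != NULL);
    for (int c = 0; c <= 2; c++)
        out[c] = k[c + 3];
    memset(&k[1], 0, 3 * sizeof(int));
}
```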

[Bug fortran/94129] Using almost any openacc !$acc directive causes ICE "compressed stream: data error"

2020-03-11 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94129

--- Comment #4 from Richard Biener  ---
So I tried it with the SUSE GCC 10 packages and it works fine (I've
double-checked nvptx is offloaded).  But my packages are only configured for
zlib ...
(I'm testing on Leap 15.1 which doesn't have zstd I think).

[Bug fortran/94129] Using almost any openacc !$acc directive causes ICE "compressed stream: data error"

2020-03-11 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94129

--- Comment #5 from Richard Biener  ---
Btw, your backtrace ends up in lto_uncompression_zlib but Matthias shows the
Ubuntu packages have zstd enabled.  I'd have expected only zstd compressed
sections there.  Matthias, can you reproduce the issue?

[Bug libfortran/94143] [9/10 Regression] Asynchronous execute_command_line() breaks following synchronous calls

2020-03-12 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94143

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P4
   Target Milestone|--- |9.3

[Bug target/94145] Longcalls mis-optimize loading the function address

2020-03-12 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94145

--- Comment #5 from Richard Biener  ---
So what prevents GIMPLE from doing the transform to an indirect call and
hoisting the call address computation out of the loop?  I fear your volatile
marking is papering over an entirely different issue.  Of course it will
likely work as a workaround since nobody is going to do the above-mentioned
dance.  Maybe code like

void foo();

void bar()
{
  void (* volatile fn)() = foo;
  void (*fn2)() = fn;
  for (int i = 0; i<1; ++i)
    fn2();
}

will expose the same "issue" whatever it really is?

[Bug middle-end/94146] [10 Regression] Merging functions with same bodies stopped working

2020-03-12 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94146

--- Comment #4 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #3)
> If not already marked clearly as an ICF created thunk, I'd say it should be
> and then inliner should take it into account (and only inline if the
> function became very small or not at all).

It looks like ICF really creates a forwarding call:

ternary2 (int i)
{
  int retval.4;

   [local count: 1073741824]:
  retval.4_3 = ternary (i_2(D)); [tail call]
  return retval.4_3;

}

so IMHO for small functions the inlining is good (but why don't we create
an alias or an alternate entry symbol instead of a full (aligned) function?)

For big functions the inlining shouldn't happen indeed, possibly by detecting
this kind of forwarders?

So we're missing a testcase showing the regression IMHO.  It still works
with -Os for the testcase.

[Bug rtl-optimization/94148] The DF framework uses bb->aux, which is for passes only

2020-03-12 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94148

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-03-12

--- Comment #1 from Richard Biener  ---
In fact it even assumes it is cleared on entry!

[Bug lto/94150] Improve LTO diagnosis for LTO triggered warnings/error: print source.o or source.a(lib.o) when printing location

2020-03-12 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94150

Richard Biener  changed:

   What|Removed |Added

   Keywords||diagnostic
   Last reconfirmed||2020-03-12
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Biener  ---
Confirmed.  Probably easy for the special LTO aware diagnostics but harder in
general since there's no concept of a translation unit in our locations
and locations are also used for debug info generation.

[Bug c++/94152] New: Mistyped destructor name diagnostic suboptimal

2020-03-12 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94152

Bug ID: 94152
   Summary: Mistyped destructor name diagnostic suboptimal
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

struct _X { ~_X(); };
_X::~X () {}

produces

t.C:2:8: error: expected class-name before ‘(’ token
 _X::~X () {}
^

where clang says

t.C:2:6: error: expected the class name after '~' to name a destructor
_X::~X () {}
 ^
 _X

which is clearly better.

[Bug target/94145] Longcalls mis-optimize loading the function address

2020-03-12 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94145

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #7 from Richard Biener  ---
Ah, I read "mis-optimize" as produce wrong-code ...  OTOH CSEing the load from
the PLT once it is resolved _would_ be an optimization.  Asks for loop
peeling I guess?

[Bug target/94103] Wrong optimization: reading value of a variable changes its representation for optimizer

2020-03-12 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94103

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED
  Known to work||10.0

--- Comment #7 from Richard Biener  ---
commit 1dc00a8ec9aeba86b74b16bff6f171824bb7b4a1 (HEAD -> master, origin/master,
origin/HEAD)
Author: Richard Biener 
Date:   Thu Mar 12 14:18:35 2020 +0100

tree-optimization/94103 avoid CSE of loads with padding

VN currently replaces a load of a 16 byte entity with 128 bits of precision
(TImode) with the result of a load of a 16 byte entity with 80 bits of
mode precision (XFmode).  That will go downhill since if the padding
bits are not actually filled with memory contents those bits are
missing.

2020-03-12  Richard Biener  

PR tree-optimization/94103
* tree-ssa-sccvn.c (visit_reference_op_load): Avoid type
punning when the mode precision is not sufficient.

* gcc.target/i386/pr94103.c: New testcase.
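
As a rough illustration of the commit message (not the testcase added by the patch), the sketch below contrasts a byte-wise copy with a copy through the floating-point type.  It relies on the situation described above, where the storage is 16 bytes but the mode carries only 80 bits; the helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Copy SIZE bytes of object representation; padding bytes are preserved.  */
void copy_bytes (void *dst, const void *src, size_t size)
{
  assert (dst != NULL && src != NULL);
  memcpy (dst, src, size);
}

/* Copy through the floating-point type.  Only the value has to survive;
   any padding bytes in *DST are unspecified afterwards.  */
void copy_value (long double *dst, const long double *src)
{
  *dst = *src;
}

/* Returns 1 if a byte-wise round trip keeps every storage byte intact.  */
int bytes_round_trip (void)
{
  unsigned char in[sizeof (long double)], out[sizeof (long double)];
  memset (in, 0xa5, sizeof in);
  memset (out, 0, sizeof out);
  copy_bytes (out, in, sizeof in);
  return memcmp (in, out, sizeof in) == 0;
}

/* Returns 1 if a copy through the type keeps the value intact.  */
int value_round_trip (void)
{
  long double a = 1.5L, b = 0.0L;
  copy_value (&b, &a);
  return b == a;
}
```

A full-width integer load behaves like `copy_bytes`, so replacing it with the result of something like `copy_value` is only safe when the mode precision covers all the bits that are read.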

[Bug target/94103] Wrong optimization: reading value of a variable changes its representation for optimizer

2020-03-12 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94103

Richard Biener  changed:

   What|Removed |Added

  Known to fail||9.3.0

--- Comment #8 from Richard Biener  ---
The old VN doesn't value-number unused defs so the actual testcase doesn't fail
there.  Still broken with GCC 9 though.  Adjusted testcases that actually
use the FP load would trigger with older compilers as well.

[Bug target/94145] Longcalls mis-optimize loading the function address

2020-03-12 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94145

--- Comment #9 from Richard Biener  ---
(In reply to Alan Modra from comment #8)
> (In reply to Richard Biener from comment #7)
> > OTOH CSEing the load from the PLT once it is resolved _would_ be an
> > optimization.
> 
> Possibly.  Sometimes making the call sequence seem more efficient runs into
> stalls particularly when the called function is short.
>
> >  Asks for loop peeling I guess?
> 
> Yeah, that might be one way to get the first call of a function inside a
> loop over and done with.  And so you'd know the PLT entry was resolved and
> thus no longer volatile.

I suppose there's no (portable) way to "speculate" the call, thus _just_
eventually resolve the PLT?  That way we could do such hoisted PLT loads
as

  load PLT + speculate call
  load PLT

or rather always do

  resolve-and-load-PLT

since the times we want to lazily load the PLT with resolving later are
scarce?

[Bug tree-optimization/94163] [8/9 Regression] ICE in set_ptr_info_alignment with -O2 and __builtin_assume_aligned

2020-03-13 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94163

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Priority|P3  |P2
 Ever confirmed|0   |1
   Last reconfirmed||2020-03-13

--- Comment #2 from Richard Biener  ---
Likely - mine.

[Bug tree-optimization/94163] [8/9 Regression] ICE in set_ptr_info_alignment with -O2 and __builtin_assume_aligned

2020-03-13 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94163

--- Comment #3 from Richard Biener  ---
From

  /* There's no CCP pass after PRE which would re-compute alignment
     information so make sure we re-materialize this here.  */
  if (gimple_call_builtin_p (call, BUILT_IN_ASSUME_ALIGNED)
      && args.length () - 2 <= 1
      && tree_fits_uhwi_p (args[1])
      && (args.length () != 3 || tree_fits_uhwi_p (args[2])))
    {
      unsigned HOST_WIDE_INT halign = tree_to_uhwi (args[1]);
      unsigned HOST_WIDE_INT hmisalign
        = args.length () == 3 ? tree_to_uhwi (args[2]) : 0;
      if ((halign & (halign - 1)) == 0
          && (hmisalign & ~(halign - 1)) == 0)
        set_ptr_info_alignment (get_ptr_info (forcedname),
                                halign, hmisalign);
    }

where set_ptr_info_alignment ICEs for align == 0.  set_ptr_info_alignment
takes unsigned int args but the above computes HWI quantities that get
truncated here.
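
A small standalone sketch of the truncation, assuming a 64-bit HWI and a 32-bit unsigned int (modelled with `uint64_t` and `uint32_t`).  The function names, including the guard at the end, are hypothetical and do not describe the patch that was eventually tested.

```c
#include <assert.h>
#include <stdint.h>

/* The power-of-two test from the quoted snippet (it also accepts 0).  */
int passes_pow2_check (uint64_t halign)
{
  return (halign & (halign - 1)) == 0;
}

/* Model of passing a HWI to a parameter of type unsigned int.  */
uint32_t truncate_to_uint (uint64_t halign)
{
  return (uint32_t) halign;
}

/* Hypothetical guard: forward only alignments that are nonzero and
   survive the narrowing unchanged.  */
int safe_to_forward (uint64_t halign)
{
  uint32_t narrowed = truncate_to_uint (halign);
  return passes_pow2_check (halign) && narrowed != 0 && narrowed == halign;
}

/* Usage: 2^32 passes the check but arrives at the callee as 0.  */
void demo (void)
{
  assert (passes_pow2_check (UINT64_C (1) << 32));
  assert (truncate_to_uint (UINT64_C (1) << 32) == 0);
}
```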

[Bug tree-optimization/94163] [8/9 Regression] ICE in set_ptr_info_alignment with -O2 and __builtin_assume_aligned

2020-03-13 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94163

--- Comment #6 from Richard Biener  ---
The patch in testing does the same as CCP.  I agree that we possibly want
saturation behavior but that can be done separately for GCC 11.

[Bug libstdc++/94164] [10 Regression] std::unintialized_fill_n fails to compile

2020-03-13 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94164

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.0
Summary|[Regression 10] |[10 Regression]
   |std::unintialized_fill_n|std::unintialized_fill_n
   |fails to compile|fails to compile

[Bug tree-optimization/94163] [8/9 Regression] ICE in set_ptr_info_alignment with -O2 and __builtin_assume_aligned

2020-03-13 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94163

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #7 from Richard Biener  ---
Fixed everywhere.

[Bug debug/92468] gcc generates wrong debug information at -O2 and -O3

2020-03-16 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92468

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2020-03-16
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #2 from Richard Biener  ---
Confirmed.  The issue is there isn't any real instruction at line 20 since
the inner loop is unrolled completely and the if (g) stmt is elided as false.
The f loop is also unrolled completely and debug stmts and breakpoints
align with the case where k is one.

[Bug c/94179] [10 Regression] ICE in gimplify_expr, at gimplify.c:14399 since r10-7127-gcb99630f254aaec6

2020-03-16 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94179

--- Comment #3 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #2)
> Created attachment 48036 [details]
> gcc10-pr94179.patch
> 
> Untested fix.
> And/or we could limit the match.pd optimization to GIMPLE only, as at least
> the C FE doesn't seem to be prepared to handle MEM_REFs and it is unclear if
> this is the only spot that needs fixing.

Doing it GIMPLE only may need additional "fixes" when people disable
forwprop though (didn't check).  IMHO the FEs better deal with all GENERIC
if they call into middle-end routines.  But I'm not against doing this
and I don't think we need to care about people doing -fno-tree-forwprop,
it's just you brought it up ... ;)

[Bug middle-end/68785] [6 Regression] valgrind reports issues with folding on x86_64

2020-03-16 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68785

--- Comment #15 from Richard Biener  ---
(In reply to David Binderman from comment #14)
> Interestingly, I ran the code in comment 8 through a valgrind version
> of recent gcc trunk, with the compiler flag -O2, and got this:
> 
> ./gcc.dg/pr68785.c
> ==49861== Invalid read of size 1
> ==49861==at 0xD9CDDD: count_nonzero_bytes(tree_node*, unsigned long,
> unsigned long, unsigned int*, bool*, bool*, bool*, vr_values const*,
> ssa_name_limit_t&) (tree-ssa-strlen.c:4891)
> ==49861==by 0xD9CF17: count_nonzero_bytes(tree_node*, unsigned long,
> unsigned long, unsigned int*, bool*, bool*, bool*, vr_values const*,
> ssa_name_limit_t&) (tree-ssa-strlen.c:4801)
> ==49861==by 0xDA19EE: count_nonzero_bytes (tree-ssa-strlen.c:4920)
> ==49861==by 0xDA19EE: handle_integral_assign(gimple_stmt_iterator*,
> bool*, vr_values const*) (tree-ssa-strlen.c:5547)

Can you file a new bug please?

[Bug tree-optimization/94125] [9 Regression] wrong code at -O3 on x86_64-linux-gnu

2020-03-16 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94125

Richard Biener  changed:

   What|Removed |Added

Summary|[9/10 Regression] wrong |[9 Regression] wrong code
   |code at -O3 on  |at -O3 on x86_64-linux-gnu
   |x86_64-linux-gnu|
  Known to fail||9.3.0
  Known to work||10.0

--- Comment #10 from Richard Biener  ---
Thanks Bin, fixed on trunk so far.

[Bug target/94185] [10 Regression] crashes with "error: unable to generate reloads for {*zero_extendsidi2} internal compiler error: in curr_insn_transform, at lra-constraints.c:4006

2020-03-16 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94185

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-03-16
   Target Milestone|--- |10.0
Summary|gcc-10 crashes with "error: |[10 Regression] crashes
   |unable to generate reloads  |with "error: unable to
   |for {*zero_extendsidi2} |generate reloads for
   |internal compiler error: in |{*zero_extendsidi2}
   |curr_insn_transform, at |internal compiler error: in
   |lra-constraints.c:4006  |curr_insn_transform, at
   ||lra-constraints.c:4006
   Priority|P3  |P1
   Keywords||ra
  Known to work||9.3.0
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Must be a recent regression.

[Bug c/94188] [10 Regression] error: request for member ‘node’ in something not a structure or union since r10-7127-gcb99630f254aaec6

2020-03-16 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94188

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #1 from Richard Biener  ---
I have a patch.

[Bug sanitizer/94191] ubsan bootstrap memory hog with -enable-checking=rtl

2020-03-16 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94191

Richard Biener  changed:

   What|Removed |Added

   Keywords||memory-hog

--- Comment #2 from Richard Biener  ---
I suggest using -O1, that should help.

[Bug c/94188] [10 Regression] error: request for member ‘node’ in something not a structure or union since r10-7127-gcb99630f254aaec6

2020-03-16 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94188

--- Comment #3 from Richard Biener  ---
Created attachment 48043
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48043&action=edit
patch in testing

This is what I have now; it bootstrapped OK after the extra two hunks, but I
still see ICEs during testing.  Still, fixing build_fold_addr_expr_with_type
looks inevitable ... (the other option would be to remove the optimization when
the type doesn't match, but I guess that will regress as well).

[Bug middle-end/51663] Desirable/undesirable elimination of unused variables & functions at -O0, -O0 -flto and -O0 -fwhole-program

2020-03-17 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51663

--- Comment #12 from Richard Biener  ---
(In reply to Tom de Vries from comment #11)
> Cross-referencing PR gdb/25684 - "gdb testing with gcc -flto" (
> https://sourceware.org/bugzilla/show_bug.cgi?id=25684 ).
> 
> Ideally there would be a way to enable the lto infrastructure without
> actually optimizing, such that when running the gdb testsuite with and
> without flto and comparing results, any regression would indicate something
> that needs fixing.
> 
> In the current situation, each individual regression needs investigation
> whether something needs fixing or whether the failure is just an
> optimization artifact. And due to the fact there are optimizations, there
> are thousands of such regressions.

I suppose we're talking about -O0 -flto here.  What kind of transforms
are undesirable?  I think at -O0 you'll get

 - more aggressive unused variable/function removal
 - promotion of variables from global to local

some of the transforms are unavoidable due to partitioning(?) but we could
default to 1:1 partitioning at -O0 ...

[Bug middle-end/51663] Desirable/undesirable elimination of unused variables & functions at -O0, -O0 -flto and -O0 -fwhole-program

2020-03-17 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51663

--- Comment #14 from Richard Biener  ---
(In reply to Tom de Vries from comment #13)
> (In reply to Richard Biener from comment #12)
> > (In reply to Tom de Vries from comment #11)
> > > Cross-referencing PR gdb/25684 - "gdb testing with gcc -flto" (
> > > https://sourceware.org/bugzilla/show_bug.cgi?id=25684 ).
> > > 
> > > Ideally there would be a way to enable the lto infrastructure without
> > > actually optimizing, such that when running the gdb testsuite with and
> > > without flto and comparing results, any regression would indicate 
> > > something
> > > that needs fixing.
> > > 
> > > In the current situation, each individual regression needs investigation
> > > whether something needs fixing or whether the failure is just an
> > > optimization artifact. And due to the fact there are optimizations, there
> > > are thousands of such regressions.
> > 
> > I suppose we're talking about -O0 -flto here.
> 
> Right, and ideally -flto plain, with -O0 implicit.
> 
> >  What kind of transforms
> > are undesirable?  I think at -O0 you'll get
> > 
> >  - more aggressive unused variable/function removal
> >  - promotion of variables from global to local
> > 
> 
> Right, is there a way to switch these off?

Not at the moment I think.  The main unused variable/function removal
is in cgraphunit.c:analyze_functions.  I guess the simplest thing would
be to try

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index a9dd288be57..3aa8137efad 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1158,7 +1158,7 @@ analyze_functions (bool first_time)
{
  /* Convert COMDAT group designators to IDENTIFIER_NODEs.  */
  node->get_comdat_group_id ();
- if (node->needed_p ())
+ if (!optimize || node->needed_p ())
{
  enqueue_node (node);
  if (!changed && symtab->dump_file)

but I'm not sure this is enough since we remove things that are not referred to
at several points during the compilation (including during partitioning
I think).  Another possibility would be to make all nodes force_output
when not optimizing like with

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index aa4cdc95506..b07bf9745d0 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -115,7 +115,7 @@ public:
   transparent_alias (false), weakref (false), cpp_implicit_alias (false),
   symver (false), analyzed (false), writeonly (false),
   refuse_visibility_changes (false), externally_visible (false),
-  no_reorder (false), force_output (false), forced_by_abi (false),
+  no_reorder (false), force_output (!optimize), forced_by_abi (false),
   unique_name (false), implicit_section (false), body_removed (false),
   used_from_other_partition (false), in_other_partition (false),
   address_taken (false), in_init_priority_hash (false),

or in a more suitable place.  That should eventually also avoid
promotion to local.

> > some of the transforms are unavoidable due to partitioning(?) but we could
> > default to 1:1 partitioning at -O0 ...
> 
> At this point I'm not interested in defaults yet. I can achieve 1:1
> partition by testing target board unix/-flto/-flto-partition=1to1.
> 
> For now I'm interested in a combination of flags that exercises the specific
> type of debug info generation as is done for lto, without actually doing any
> optimizations.
> 
> F.i., an open question for me is the following: I'm now using
> -flto-partition=none for testing, but maybe 1to1 should yield better results?

[Bug middle-end/51663] Desirable/undesirable elimination of unused variables & functions at -O0, -O0 -flto and -O0 -fwhole-program

2020-03-17 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51663

--- Comment #15 from Richard Biener  ---
(In reply to Tom de Vries from comment #13)
> F.i., an open question for me is the following: I'm now using
> -flto-partition=none for testing, but maybe 1to1 should yield better results?

I guess it better mimics how the testcases are set up, but I think the
differences will be due to the other issues so the exact partitioning
shouldn't matter (you likely always get a single partition and thus
behavior equal to -flto-partition=none or -flto-partition=one by default).
-flto-partition=none will be faster because it elides the LTRANS IL streaming.

[Bug middle-end/94206] Wrong optimization: memset of n-bit integer types (from bit-fields) is truncated to n bits (instead of sizeof)

2020-03-18 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94206

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2020-03-18
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED
  Known to fail||4.8.5

--- Comment #2 from Richard Biener  ---
Broken for a long time via memset folding:

  __MEM  ((void *)&x) = _Literal (uint33) 8589934591;
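
For reference, 8589934591 is 2^33 - 1, i.e. only the low 33 bits set.  The sketch below shows the difference to what memset has to produce, assuming the 33-bit type occupies 8 bytes of storage (modelled here with `uint64_t`); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What memset (&x, -1, sizeof x) must produce for an 8-byte object.  */
uint64_t memset_all_bytes (void)
{
  uint64_t x = 0;
  memset (&x, -1, sizeof x);
  return x;
}

/* What a store of an all-ones 33-bit literal produces: low 33 bits only.  */
uint64_t store_33_bit_ones (void)
{
  return (UINT64_C (1) << 33) - 1;
}

/* Usage: the folded store leaves the upper 31 bits unset.  */
void demo (void)
{
  assert (store_33_bit_ones () == UINT64_C (8589934591));
  assert (memset_all_bytes () != store_33_bit_ones ());
}
```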

[Bug c/94188] [10 Regression] error: request for member ‘node’ in something not a structure or union since r10-7127-gcb99630f254aaec6

2020-03-18 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94188

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Richard Biener  ---
Fixed.

[Bug tree-optimization/94211] [9/10 Regression] -fcompare-debug failure on phi-opt-13.c

2020-03-18 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94211

Richard Biener  changed:

   What|Removed |Added

  Known to fail||9.3.0
   Target Milestone|--- |9.4
  Known to work||7.4.0

[Bug middle-end/94206] Wrong optimization: memset of n-bit integer types (from bit-fields) is truncated to n bits (instead of sizeof)

2020-03-18 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94206

Richard Biener  changed:

   What|Removed |Added

   Keywords||wrong-code
  Known to work||10.0

--- Comment #4 from Richard Biener  ---
Fixed on trunk so far.

[Bug tree-optimization/94212] [8/9/10 Regression] Incorrect vectorization of loop with FP calculations

2020-03-18 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94212

Richard Biener  changed:

   What|Removed |Added

  Known to work||7.4.0
Summary|Incorrect vectorization of  |[8/9/10 Regression]
   |loop with FP calculations   |Incorrect vectorization of
   ||loop with FP calculations
  Component|middle-end  |tree-optimization
Version|tree-ssa|10.0
   Target Milestone|--- |8.5
 CC||rguenth at gcc dot gnu.org,
   ||rsandifo at gcc dot gnu.org
 Target||aarch64

--- Comment #5 from Richard Biener  ---
The vectorizer vectorizes the reduction in-order but apparently something goes
wrong there.
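
As background on why the order of an FP reduction matters, here is a generic sketch; it is not the testcase from this PR.  It assumes IEEE double arithmetic without -ffast-math or excess precision, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* In-order (fold-left) reduction, matching the scalar loop.  */
double sum_in_order (const double *x, size_t n)
{
  double acc = 0.0;
  assert (x != NULL);
  for (size_t i = 0; i < n; i++)
    acc += x[i];
  return acc;
}

/* Two-lane reduction as a reassociating vectorizer might do it: even and
   odd elements are accumulated separately and combined at the end.  */
double sum_two_lanes (const double *x, size_t n)
{
  double lane[2] = { 0.0, 0.0 };
  assert (x != NULL);
  for (size_t i = 0; i < n; i++)
    lane[i % 2] += x[i];
  return lane[0] + lane[1];
}

/* Returns 1 if the two orders disagree for a cancellation-heavy input.  */
int orders_disagree (void)
{
  const double x[4] = { 1e16, 1.0, -1e16, 1.0 };
  return sum_in_order (x, 4) != sum_two_lanes (x, 4);
}
```

An in-order vectorized reduction is expected to match `sum_in_order` bit for bit, so any difference from the scalar result points at a bug rather than at reassociation.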

[Bug tree-optimization/94212] [8/9/10 Regression] Incorrect vectorization of loop with FP calculations

2020-03-18 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94212

--- Comment #7 from Richard Biener  ---
(In reply to Dmitrij Pochepko from comment #6)
> Just checked: non-vectorized assembly for aarch64 (O2) is using fmadd and
> fmsub intensively.

Try with -ffp-contract=off then.  Note due to effective unrolling of
the loop with vectorization we might end up forming "different" fmadd
groups.  So you might also want to check whether the vectorized loop still
sees fmadd use.

[Bug ipa/94217] [10 Regression] ICE in ipa_find_agg_cst_for_param, at ipa-prop.c:3467 since r10-7237-g4e3d3e40726e1b68

2020-03-18 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94217

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener  ---
Mine.

[Bug tree-optimization/94216] [10 Regression] ICE in maybe_canonicalize_mem_ref_addr, at gimple-fold.c:4899 since r10-7237-g4e3d3e40726e1b68bf52fa205c68495124ea60b8

2020-03-18 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94216

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from Richard Biener  ---
Mine.

[Bug tree-optimization/94216] [10 Regression] ICE in maybe_canonicalize_mem_ref_addr, at gimple-fold.c:4899 since r10-7237-g4e3d3e40726e1b68bf52fa205c68495124ea60b8

2020-03-18 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94216

--- Comment #5 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #1)
> I wonder if we shouldn't do:
> --- gcc/fold-const.c.jj   2020-03-18 12:47:36.0 +0100
> +++ gcc/fold-const.c  2020-03-18 17:34:14.586455801 +0100
> @@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.
>  #include "attribs.h"
>  #include "tree-vector-builder.h"
>  #include "vec-perm-indices.h"
> +#include "tree-ssa.h"
>  
>  /* Nonzero if we are folding constants inside an initializer; zero
> otherwise.  */
> @@ -10262,6 +10263,10 @@ fold_binary_loc (location_t loc, enum tr
>switch (code)
>  {
>  case MEM_REF:
> +  STRIP_USELESS_TYPE_CONVERSION (arg0);

We already applied STRIP_NOPS to arg0

> +  if (arg0 != op0)
> + return fold_build2 (MEM_REF, type, arg0, op1);
> +
>/* MEM[&MEM[p, CST1], CST2] -> MEM[p, CST1 + CST2].  */
>if (TREE_CODE (arg0) == ADDR_EXPR
> && TREE_CODE (TREE_OPERAND (arg0, 0)) == MEM_REF)
> to catch all similar issues.  Otherwise, we'd need to strip the useless type
> conversion at least in the case which triggers this:
>   return fold_build2 (MEM_REF, type,
>   build_fold_addr_expr (base),
>   int_const_binop (PLUS_EXPR, arg1,
>size_int (coffset)));
> a few lines below this, where build_fold_addr_expr now returns a NOP_EXPR
> that we really want to strip again, even when op0 wasn't a NOP_EXPR.

True.  But note there could be a type conversion here that is not useless,
for example for MEM [&a] and void *a.  Here I think
the better fix is (again) to use build1 and then in case the base was
a MEM_REF recurse to the preceding pattern.

I'm testing such a patch.
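
The MEM[&MEM[p, CST1], CST2] -> MEM[p, CST1 + CST2] pattern quoted above can be pictured in plain C as follows.  This is only a loose model using byte offsets; `load_at` and `load_nested` are hypothetical helpers, not GCC functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read an int at byte offset OFF from P, loosely modelling MEM[p, off].  */
int load_at (const void *p, size_t off)
{
  int v;
  assert (p != NULL);
  memcpy (&v, (const char *) p + off, sizeof v);
  return v;
}

/* MEM[&MEM[p, cst1], cst2]: take the address of the inner reference and
   apply the outer offset to it.  */
int load_nested (const void *p, size_t cst1, size_t cst2)
{
  const char *inner = (const char *) p + cst1;
  return load_at (inner, cst2);
}

/* Returns 1 if the nested form reads the same element as the combined one.  */
int nested_equals_combined (void)
{
  int a[4] = { 10, 20, 30, 40 };
  int nested = load_nested (a, sizeof (int), 2 * sizeof (int));
  int combined = load_at (a, 3 * sizeof (int));
  return nested == combined && combined == 40;
}
```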

[Bug ipa/94217] [10 Regression] ICE in ipa_find_agg_cst_for_param, at ipa-prop.c:3467 since r10-7237-g4e3d3e40726e1b68

2020-03-19 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94217

--- Comment #7 from Richard Biener  ---
Testing patch.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #15 from Richard Biener  ---
WRF initial_config has very very very many (nested) loops to initialize
globals.
IIRC there's a related bug running into the very same issue when prefetching
is enabled.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #17 from Richard Biener  ---
Created attachment 48061
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48061&action=edit
cache base term

I wonder if we could simply cache the base terms in elt_loc_list?  Does that
make a difference?

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #18 from Richard Biener  ---
Note also that param_max_find_base_term_values limits recursion depth but not
width (the loc list traversals).  The original visited_vals thing was to
prevent infinite recursion only.  If the global caching works a safer approach
would be to turn that visited_vals thing into a local cache and see if that's
enough as well.

[Bug tree-optimization/94216] [10 Regression] ICE in maybe_canonicalize_mem_ref_addr, at gimple-fold.c:4899 since r10-7237-g4e3d3e40726e1b68bf52fa205c68495124ea60b8

2020-03-19 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94216

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #7 from Richard Biener  ---
Fixed.

[Bug ipa/94217] [10 Regression] ICE in ipa_find_agg_cst_for_param, at ipa-prop.c:3467 since r10-7237-g4e3d3e40726e1b68

2020-03-19 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94217

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Richard Biener  ---
Fixed.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #21 from Richard Biener  ---
(In reply to Jakub Jelinek from comment #19)
> I think caching is problematic, for a couple of reasons:
> 1) for non-cselib_preserved_value_p, the loc list is dynamic and keeps
> changing, locs are added and removed as we go through the basic blocks

Sure, but if you look at my patch I'm caching on individual locs, not on
the whole list.  That then still doesn't avoid traversing the whole list
if we don't find a base but we'd at least elide further recursion.

The base we pick also depends on the ordering of the list currently
if there ever are two possible choices.

> 2) because of the recursion prevention, doesn't it matter from exactly what
> VALUE we start walking (in case we have a cycle or cycles)?

I think the outcome is essentially random anyways since we pick the first
base we find.  I'm quite sure that if we collected "all" bases we'd find
they are not equivalent in some cases.

> 3) plus the new param on visited_vals, if we reach it, caching is unreliable

Sure, but does anybody care?  Note what I'd really propose would be
find_base_term-local caching, eliding both the recursion prevention and
the param.
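
To illustrate the idea of query-local caching, here is a toy model in C.  It is not the attached patch and not the real find_base_term: `struct val`, its fields and both functions are hypothetical stand-ins for values with loc lists.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCS 2

/* Hypothetical stand-in for a value with a list of locations.  */
struct val
{
  int is_base;                 /* nonzero if this value is itself a base */
  int n_locs;
  struct val *locs[MAX_LOCS];  /* values referenced from the loc list */
  int on_stack;                /* recursion prevention marker */
  int seen;                    /* query-local cache, reset between queries */
};

long visits;

/* Recursion prevention only: the marker is cleared on the way out, so a
   shared sub-value is walked once per path that reaches it.  */
struct val *find_base_uncached (struct val *v)
{
  struct val *res = NULL;
  if (v->on_stack)
    return NULL;
  visits++;
  if (v->is_base)
    return v;
  v->on_stack = 1;
  for (int i = 0; i < v->n_locs && !res; i++)
    res = find_base_uncached (v->locs[i]);
  v->on_stack = 0;
  return res;
}

/* Query-local caching: a value already seen is not walked again.  The
   marker stays set, so it also prevents infinite recursion.  */
struct val *find_base_cached (struct val *v)
{
  struct val *res = NULL;
  if (v->seen)
    return NULL;
  v->seen = 1;
  visits++;
  if (v->is_base)
    return v;
  for (int i = 0; i < v->n_locs && !res; i++)
    res = find_base_cached (v->locs[i]);
  return res;
}

/* Both locs of node i point to node i + 1 and no node is a base.
   Returns the number of values visited by one query.  */
long count_visits (int cached)
{
  struct val nodes[11] = { 0 };
  struct val *res;
  for (int i = 0; i < 10; i++)
    {
      nodes[i].n_locs = 2;
      nodes[i].locs[0] = nodes[i].locs[1] = &nodes[i + 1];
    }
  visits = 0;
  res = cached ? find_base_cached (&nodes[0]) : find_base_uncached (&nodes[0]);
  assert (res == NULL);
  return visits;
}
```

In this toy graph the walk without caching visits 2047 values while the cached walk visits 11.  Since the search stops at the first base found, only the negative results ever get reused.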

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #22 from Richard Biener  ---
Created attachment 48063
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48063&action=edit
more localized caching

Like this.  Martin, can you also check the effect on this one?

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #23 from Richard Biener  ---
(In reply to Richard Biener from comment #22)
> Created attachment 48063 [details]
> more localized caching
> 
> Like this.  Martin, can you also check the effect on this one?

We can actually simplify since we won't ever use non-NULL cached vals which
also means this patch is a no-op and should be worse than the existing
state :/

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

Richard Biener  changed:

   What|Removed |Added

  Attachment #48063 is obsolete|0   |1

--- Comment #25 from Richard Biener  ---
Created attachment 48064
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48064&action=edit
more localized caching

Updated and simplified patch.  Maybe it does help depending on how we have
shared locs for multiple values ...

The update is to not bother updating cache values with non-NULL found ones.

[Bug rtl-optimization/92264] [10 Regression] Compile time hog in 521.wrf_r with -Ofast -march=znver2 -g since r276318

2020-03-19 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92264

--- Comment #26 from Richard Biener  ---
(In reply to Martin Liška from comment #24)
> (In reply to Richard Biener from comment #23)
> > (In reply to Richard Biener from comment #22)
> > > Created attachment 48063 [details]
> > > more localized caching
> > > 
> > > Like this.  Martin, can you also check the effect on this one?
> > 
> > We can actually simplify since we won't ever use non-NULL cached vals which
> > also means this patch is a no-op and should be worse than the existing
> > state :/
> 
> So no testing is needed?

It would still be interesting (but surprising) if it helps (it doesn't
matter whether you test the original or the updated patch, the update is
just a constant time improvement).

[Bug fortran/94221] Explicit assignment in type is ignored in some cases

2020-03-19 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94221

Richard Biener  changed:

   What|Removed |Added

  Known to fail||10.0
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-03-19
 Ever confirmed|0   |1
   Keywords||wrong-code

--- Comment #1 from Richard Biener  ---
test2 lacks a return stmt and initialization of the return value:

test2 ()
{
  {
    struct __st_parameter_dt dt_parm.0;

    dt_parm.0.common.filename = &"t.f90"[1]{lb: 1 sz: 1};
    dt_parm.0.common.line = 15;
    dt_parm.0.common.flags = 128;
    dt_parm.0.common.unit = 6;
    _gfortran_st_write (&dt_parm.0);
    _gfortran_transfer_character_write (&dt_parm.0, &"running test - here test%val is not initialized"[1]{lb: 1 sz: 1}, 47);
    _gfortran_st_write_done (&dt_parm.0);
  }
}

but the caller still expects one:

  a = test ();

indeed a frontend issue.

[Bug middle-end/94226] [10 regression] r10-7272 eliminated some warning messages

2020-03-20 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94226

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.0
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #2 from Richard Biener  ---
I will have a look.

[Bug middle-end/94226] [10 regression] r10-7272 eliminated some warning messages

2020-03-20 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94226

--- Comment #3 from Richard Biener  ---
The issue is most likely get_origin_and_offset, which looks at the
pointed-to type of the pointer argument:

  tree xtype = TREE_TYPE (TREE_TYPE (x));

  /* The byte offset of the most basic struct member the byte
     offset *OFF corresponds to, or for a (multidimensional)
     array member, the byte offset of the array element.  */
  HOST_WIDE_INT index = 0;

  if ((RECORD_OR_UNION_TYPE_P (xtype)
       && field_at_offset (xtype, *off, &index))
      || (TREE_CODE (xtype) == ARRAY_TYPE
          && TREE_CODE (TREE_TYPE (xtype)) == ARRAY_TYPE
          && array_elt_at_offset (xtype, *off, &index)))
    {
      *fldoff += index;
      *off -= index;
    }

Here we now see a single-dimensional array because the MEM[&MEM] propagation
simply preserves the original pointer type.  Since the pointer type doesn't
carry any semantics, heuristics should rather look at the type of the
underlying object (if there is one).

The following fixes this testcase (not that I agree with the way all this
is written):

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 13640e0fd36..1879686ce0a 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -2331,7 +2331,9 @@ get_origin_and_offset (tree x, HOST_WIDE_INT *fldoff, HOST_WIDE_INT *off)

   if (off)
{
- tree xtype = TREE_TYPE (TREE_TYPE (x));
+ tree xtype
+   = (TREE_CODE (x) == ADDR_EXPR
+  ? TREE_TYPE (TREE_OPERAND (x, 0)) : TREE_TYPE (TREE_TYPE (x)));

  /* The byte offset of the most basic struct member the byte
 offset *OFF corresponds to, or for a (multidimensional)
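
As a source-level illustration of why the pointed-to type of a pointer is a
weak hint compared to the type of the underlying object, here is a small C
sketch. It is not the testcase from this bug; the array grid and the helper
function names are hypothetical. Three pointers of different types refer to
the same address, and only the object itself says it is a two-dimensional
array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-dimensional buffer standing in for the underlying
   object.  */
static char grid[2][8];

/* All three pointers designate the same address; they differ only in
   their static pointed-to type.  Returns 1 if the addresses match.  */
int
same_address (void)
{
  char (*whole)[2][8] = &grid;
  char (*row)[8] = &grid[0];
  char *elt = &grid[0][0];

  return (void *) whole == (void *) row && (void *) row == (void *) elt;
}

/* Size implied by a pointer to one row: a one-dimensional array.  */
size_t
size_via_row_pointer (void)
{
  char (*row)[8] = &grid[0];
  return sizeof *row;
}

/* Size implied by a pointer to the first element: a single char.  */
size_t
size_via_element_pointer (void)
{
  char *elt = &grid[0][0];
  return sizeof *elt;
}

/* Size of the underlying object itself.  */
size_t
size_of_object (void)
{
  return sizeof grid;
}
```

Only size_of_object reflects the full two-dimensional array; the two pointer
views report a single row or a single element for the very same address.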

[Bug middle-end/93437] [9 Regression] bogus -Warray-bounds on protobuf generated code

2020-03-20 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93437

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
   Target Milestone|--- |9.4

[Bug c/93572] [8/9/10 Regression] internal compiler error: q from h referenced in main

2020-03-20 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93572

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |8.5
 Target||x86_64-linux-gnu
   Priority|P3  |P4

[Bug c/93573] [8/9/10 Regression] internal compiler error: in force_constant_size, at gimplify.c:733

2020-03-20 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93573

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |8.5
   Priority|P3  |P4

[Bug target/93932] PowerPC vec_extract with variable element number has code regressions for V2DI/V2DF vectors

2020-03-20 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93932

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #5 from Richard Biener  ---
Fixed on trunk?

[Bug c++/94223] [10 Regression] -fcompare-debug -O0 failure on cpp1z/init-statement6.C

2020-03-20 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94223

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.0
 CC||rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
lhd_set_decl_assembler_name seems to only do this for local decls though
so it shouldn't matter for actual generated code but is just a
compare-debug artifact?  If it matters for code-generation then yes,
a local counter should do (does it really need to be GTY?)

[Bug tree-optimization/91322] [10 regression] alias-4 test failure

2020-03-20 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91322

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug lto/91028] [10 Regression] g++.dg/lto/alias-2 FAILs with -fno-use-linker-plugin

2020-03-20 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91028

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug target/91498] [10 Regression] STV change in r274481 causes 300.twolf regression on Haswell

2020-03-20 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91498

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |WAITING
 Blocks||26163

--- Comment #20 from Richard Biener  ---
The patch from comment#4 got installed so re-analysis is necessary I think.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

[Bug target/91634] [10 Regression] 508.namd_r (and 435.gromacs) speed regression after r274994

2020-03-20 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91634

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |WAITING

--- Comment #1 from Richard Biener  ---
Analysis needed.

[Bug tree-optimization/92029] [10 Regression] 'libgomp.fortran/pr90779.f90' ICE for nvptx offloading

2020-03-20 Thread rguenth at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92029

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

--- Comment #7 from Richard Biener  ---
More-or-less a latent issue.
