[Bug testsuite/96574] FAIL: gcc.target/i386/pr92865-1.c scan-assembler-times vmovdq[au]16[\t ] 6

2020-09-07 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96574

--- Comment #6 from CVS Commits  ---
The releases/gcc-10 branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:ce3001ff1d734e0763a1a5e434272bf89df1fe06

commit r10-8715-gce3001ff1d734e0763a1a5e434272bf89df1fe06
Author: liuhongt 
Date:   Tue Aug 18 13:18:03 2020 +0800

Adjust testcase.

Since this testcase is meant to check the generation of AVX512 vector
comparisons, the scan-assembler for the vmov instruction can be deleted. Also,
-mprefer-vector-width=512 is added to avoid the impact of different default
arch/tune settings of GCC.

gcc/testsuite
PR target/96574
* gcc.target/i386/pr92865-1.c: Adjust testcase.

[Bug libstdc++/96942] std::pmr::monotonic_buffer_resource causes CPU cache misses

2020-09-07 Thread dmitriy.ovdienko at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

--- Comment #3 from Dmitriy Ovdienko  ---
Created attachment 49189
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49189&action=edit
Original implementation (simplified, single threaded)

Attached is a simplified original version of the benchmark.

[Bug libstdc++/96942] std::pmr::monotonic_buffer_resource causes CPU cache misses

2020-09-07 Thread dmitriy.ovdienko at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

--- Comment #4 from Dmitriy Ovdienko  ---
Created attachment 49190
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49190&action=edit
Modified solution with custom allocator based on malloc (simplified, single
threaded)

Attached is a modified, simplified, single-threaded benchmark based on a malloc
allocator.

The following are the execution times for different tree depths:

                 depth_17  depth_18  depth_19
bt_pmr_0thrd     0.105s    0.313s    0.577s
bt_malloc_0thrd  0.087s    0.147s    0.448s

The command lines are:

  time ./bt_pmr_0thrd 
  time ./bt_malloc_0thrd 

At depth 18 the difference is about 2x (0.313s vs. 0.147s).

[Bug target/96933] rs6000: inefficient code for char/short vec CTOR

2020-09-07 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933

--- Comment #6 from Kewen Lin  ---
(In reply to Kewen Lin from comment #5)
> (In reply to Segher Boessenkool from comment #4)
> > Yes, timing suggests there is some SHL/LHS flush.
> > 
> > On p9 and later we can use mtvsrdd instead of mtvsrd (moving two
> > bytes into place at one), which reduces the number of moves from
> > 16 to 8, and the number of merges from 15 to 7 (and reduces path
> > length by 1).  This sounds like a no-brainer win with that :-)
> 
> Good idea, it looks better on P9. One thing to double confirm, currently
> there are no instructions like vmrgob and vmrgoh, so of the mergings you
> mentioned from vector bytes to vector short and vector short to vector word
> needs artificial control vector?

I improved the patch to support mtvsrdd; the asm for char now looks like:

 :
   0:   00 00 4c 3c addis   r2,r12,0
0: R_PPC64_REL16_HA .TOC.
   4:   00 00 42 38 addi    r2,r2,0
4: R_PPC64_REL16_LO .TOC.+0x4
   8:   e8 ff a1 fb std r29,-24(r1)
   c:   00 00 a2 3f addis   r29,r2,0
c: R_PPC64_TOC16_HA .rodata.cst16
  10:   f0 ff c1 fb std r30,-16(r1)
  14:   f8 ff e1 fb std r31,-8(r1)
  18:   67 1b 24 7c mtvsrdd vs33,r4,r3
  1c:   67 3b 28 7d mtvsrdd vs41,r8,r7
  20:   68 00 c1 8b lbz r30,104(r1)
  24:   78 00 e1 8b lbz r31,120(r1)
  28:   00 00 bd 3b addi    r29,r29,0
28: R_PPC64_TOC16_LO .rodata.cst16
  2c:   60 00 81 89 lbz r12,96(r1)
  30:   70 00 61 89 lbz r11,112(r1)
  34:   80 00 81 88 lbz r4,128(r1)
  38:   88 00 61 88 lbz r3,136(r1)
  3c:   90 00 01 89 lbz r8,144(r1)
  40:   98 00 e1 88 lbz r7,152(r1)
  44:   67 2b 46 7c mtvsrdd vs34,r6,r5
  48:   67 4b aa 7d mtvsrdd vs45,r10,r9
  4c:   09 00 9d f5 lxv vs44,0(r29)
  50:   67 63 5e 7d mtvsrdd vs42,r30,r12
  54:   67 5b 1f 7c mtvsrdd vs32,r31,r11
  58:   e8 ff a1 eb ld  r29,-24(r1)
  5c:   f0 ff c1 eb ld  r30,-16(r1)
  60:   67 23 63 7d mtvsrdd vs43,r3,r4
  64:   f8 ff e1 eb ld  r31,-8(r1)
  68:   3b 0b 42 10 vpermr  v2,v2,v1,v12
  6c:   67 43 27 7c mtvsrdd vs33,r7,r8
  70:   3b 4b ad 11 vpermr  v13,v13,v9,v12
  74:   3b 53 00 10 vpermr  v0,v0,v10,v12
  78:   3b 5b 21 10 vpermr  v1,v1,v11,v12
  7c:   97 11 4d f0 xxmrglw vs34,vs45,vs34
  80:   97 01 01 f0 xxmrglw vs32,vs33,vs32
  84:   57 13 40 f0 xxmrgld vs34,vs32,vs34
  88:   20 00 80 4e blr

The time evaluation on Power9 looks like:
  1) mtvsrdd under TARGET_DIRECT_MOVE_128: 7.28s
  2) mtvsrd under TARGET_DIRECT_MOVE:      7.41s
  3) original:                             18.19s
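The testcase itself is not quoted in this thread. As a hedged illustration of what a "char vec CTOR" is, the sketch below builds a 16-byte vector from sixteen scalar arguments using GCC's generic vector extension instead of AltiVec types. make_v16qu and v16qu are hypothetical names, and the argument shape is only inferred from the disassembly above (eight values arriving in r3-r10, the rest loaded from the stack).

```c
#include <assert.h>

/* GCC generic vector of 16 unsigned chars (128 bits). */
typedef unsigned char v16qu __attribute__ ((vector_size (16)));

/* Hypothetical reproducer: the brace initializer becomes a vector
   CONSTRUCTOR, which the rs6000 backend must expand into GPR-to-VSR
   moves followed by merges or permutes. */
v16qu
make_v16qu (unsigned char c0, unsigned char c1, unsigned char c2,
            unsigned char c3, unsigned char c4, unsigned char c5,
            unsigned char c6, unsigned char c7, unsigned char c8,
            unsigned char c9, unsigned char c10, unsigned char c11,
            unsigned char c12, unsigned char c13, unsigned char c14,
            unsigned char c15)
{
  v16qu v = { c0, c1, c2, c3, c4, c5, c6, c7,
              c8, c9, c10, c11, c12, c13, c14, c15 };
  return v;
}
```

Compiling this with -O2 for a Power9 target and inspecting the output is one way to compare the mtvsrdd and mtvsrd expansions.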

[Bug target/96861] Integer min/max optimization failed under -march=skylake-avx512

2020-09-07 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96861

Hongtao.liu  changed:

   What|Removed |Added

 Status|RESOLVED|NEW
 Resolution|FIXED   |---

--- Comment #4 from Hongtao.liu  ---
Closed by mistake.

[Bug debug/94235] worse debug info with O0 than with O2 with flto

2020-09-07 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94235

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:fea13fcd0da0353520eb2675ad24c2f296611b85

commit r11-3026-gfea13fcd0da0353520eb2675ad24c2f296611b85
Author: Jakub Jelinek 
Date:   Mon Sep 7 09:54:38 2020 +0200

lto: Stream edge goto_locus [PR94235]

The following patch adds streaming of edge goto_locus (both LOCATION_LOCUS
and LOCATION_BLOCK from it). The PR shows a testcase (inappropriate for the
gcc testsuite) where the lack of streaming of goto_locus results in worse
debug info.
An earlier version of the patch (without the output_function changes) failed
miserably because of an order mismatch: input_function would first input_cfg,
then input_eh_regions and then input_bb (all of which now have locations),
while output_function used output_eh_regions, then output_bb and then
output_cfg.  *_cfg went to a separate stream...
Now, is there a reason why the order is different?

If the intent is that the cfg could be read separately from the rest of the
function or vice versa, we'd alternatively need to call clear_line_info ();
before output_eh_regions and before/after output_cfg to make them
independent.
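The order mismatch matters because location streaming is stateful: as the message above implies, what is written for a location depends on what was streamed before it, unless clear_line_info () resets that state. The toy codec below is a hypothetical illustration of that property only (delta-encoded line numbers); it is not GCC's lto-streamer format, and line_state, encode_line and decode_line are invented names.

```c
#include <assert.h>

/* Hypothetical toy codec: each line number is written as a delta from
   the previously written one, so the reader must replay exactly the
   same sequence to recover the original values. */
struct line_state
{
  int last_line;
};

/* Similar in spirit to clear_line_info (): reset to a known state. */
static void
clear_line_state (struct line_state *s)
{
  s->last_line = 0;
}

static int
encode_line (struct line_state *s, int line)
{
  int delta = line - s->last_line;
  s->last_line = line;
  return delta;
}

static int
decode_line (struct line_state *s, int delta)
{
  s->last_line += delta;
  return s->last_line;
}
```

Decoding the deltas in a different order than they were encoded yields wrong values, which is the toy analogue of input_function reading cfg, EH regions and bbs in a different order than output_function wrote them.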

2020-09-07  Jakub Jelinek  

PR debug/94235
* lto-streamer-out.c (output_cfg): Also stream goto_locus for edges.
Use bp_pack_var_len_unsigned instead of streamer_write_uhwi to stream
e->dest->index and e->flags.
(output_function): Call output_cfg before output_ssa_name, rather than
after streaming all bbs.
* lto-streamer-in.c (input_cfg): Stream in goto_locus for edges.
Use bp_unpack_var_len_unsigned instead of streamer_read_uhwi to stream
in dest_index and edge_flags.

[Bug target/96939] LTO vs. different arm arch options

2020-09-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96939

--- Comment #8 from Jakub Jelinek  ---
So I think this bug was introduced with
https://gcc.gnu.org/legacy-ml/gcc-patches/2016-12/msg01390.html
I think the right change is:
--- gcc/config/arm/arm.c.jj 2020-07-30 15:04:38.136293101 +0200
+++ gcc/config/arm/arm.c	2020-09-07 10:43:54.809561852 +0200
@@ -3037,10 +3037,6 @@ arm_override_options_after_change_1 (str
 static void
 arm_override_options_after_change (void)
 {
-  arm_configure_build_target (&arm_active_target,
-                              TREE_TARGET_OPTION (target_option_default_node),
-                              &global_options_set, false);
-
   arm_override_options_after_change_1 (&global_options);
 }

@@ -32338,6 +32334,8 @@ arm_set_current_function (tree fndecl)
   cl_target_option_restore (&global_options, TREE_TARGET_OPTION (new_tree));

   save_restore_target_globals (new_tree);
+
+  arm_override_options_after_change_1 (&global_options);
 }

 /* Implement TARGET_OPTION_PRINT.  */
because random Optimization option changes, even temporary ones, shouldn't
change what arm_arch_string etc. is; only option parsing or
set_current_function (which invokes cl_target_option_restore ->
arm_option_restore) should.

[Bug tree-optimization/96951] New: strncpy truncation warning does not recognize truncation check

2020-09-07 Thread fw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96951

Bug ID: 96951
   Summary: strncpy truncation warning does not recognize
truncation check
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: fw at gcc dot gnu.org
  Target Milestone: ---

This code example produces a warning:

#include <string.h>

struct buffer {
  char string[10];
};

int
f (struct buffer *p, const char *s)
{
  strncpy (p->string, s, sizeof (p->string));
  if (p->string[sizeof (p->string) - 1] != '\0')
    return -1;
  return 0;
}

t.c: In function ‘f’:
t.c:10:3: warning: ‘strncpy’ specified bound 10 equals destination size
[-Wstringop-truncation]
   10 |   strncpy (p->string, s, sizeof (p->string));
  |   ^~

There is an explicit truncation check, however, so the warning does not apply.

Suggested by Kim Barrett here:
https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-September/042890.html

-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

[Bug tree-optimization/96951] strncpy truncation warning does not recognize truncation check

2020-09-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96951

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
With the exception of code that needs to ensure the rest of the buffer is
filled with zeros (typically code dealing with passwords etc.), strncpy is
pretty much never something one should use.  Even the above example, if it is
only meant to truncate and return -1 if it doesn't fit, but otherwise just copy
the string, is wasting time on the extra clearing (granted, with just 10
bytes that isn't a big deal).

[Bug tree-optimization/96951] strncpy truncation warning does not recognize truncation check

2020-09-07 Thread fw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96951

--- Comment #2 from Florian Weimer  ---
Then perhaps the warning should recommend using memccpy?

  if (memccpy (p->string, s, '\0', sizeof (p->string)) == NULL)
    return -1;
  return 0;
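Florian's suggestion can be fleshed out into a small helper. This is a sketch under stated assumptions: copy_or_fail is a hypothetical name, memccpy is taken from POSIX (XSI) and C23, and on truncation the helper still terminates the buffer. Unlike strncpy, it does not zero-fill the unused tail, which avoids the extra clearing Jakub mentions.

```c
#define _XOPEN_SOURCE 700  /* expose memccpy (XSI) in <string.h> */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the string src into dst, which holds size
   bytes.  Returns 0 if the whole string (including its NUL) fit, or -1
   if it was truncated.  The unused tail of dst is left untouched rather
   than zero-filled. */
static int
copy_or_fail (char *dst, size_t size, const char *src)
{
  assert (size > 0);
  /* memccpy stops after copying the first '\0' and returns a pointer just
     past it in dst, or NULL if no '\0' occurred within size bytes. */
  if (memccpy (dst, src, '\0', size) == NULL)
    {
      dst[size - 1] = '\0';  /* still leave a terminated (truncated) string */
      return -1;
    }
  return 0;
}
```

In the bug's example this would be called as copy_or_fail (p->string, sizeof (p->string), s).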


[Bug target/53929] Bug in the use of Intel asm syntax when a global is named "and"

2020-09-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53929

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
The problem is that the Intel asm syntax is just badly defined (broken by
design).  I'm not aware of any compiler that would emit for such testcases
something that could be assembled correctly with gas.

[Bug fortran/96896] Bogus 'Different ranks in pointer assignment' with 'array-variable = scalar' if LHS is a function

2020-09-07 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96896

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Tobias Burnus :

https://gcc.gnu.org/g:2b0df0a6ac79b34f5fac4f3d456e8e14db220e4a

commit r11-3029-g2b0df0a6ac79b34f5fac4f3d456e8e14db220e4a
Author: Tobias Burnus 
Date:   Mon Sep 7 12:29:05 2020 +0200

Fortran: Fixes for pointer function call as variable (PR96896)

gcc/fortran/ChangeLog:

PR fortran/96896
* resolve.c (get_temp_from_expr): Also reset proc_pointer +
use_assoc attribute.
(resolve_ptr_fcn_assign): Use information from the LHS.

gcc/testsuite/ChangeLog:

PR fortran/96896
* gfortran.dg/ptr_func_assign_4.f08: Update dg-error.
* gfortran.dg/ptr-func-3.f90: New test.

[Bug fortran/96896] Bogus 'Different ranks in pointer assignment' with 'array-variable = scalar' if LHS is a function

2020-09-07 Thread burnus at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96896

Tobias Burnus  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from Tobias Burnus  ---
FIXED on GCC 11 mainline.

[Bug libstdc++/96946] libstdc++ std::shared_ptr makes an "unrelated cast" that causes Clang's Control Flow Integrity sanitiser to crash

2020-09-07 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96946

--- Comment #1 from Jonathan Wakely  ---
I don't know what the problem is; the code should be roughly equivalent to:

#include <new>

namespace x __attribute__((visibility("default")))
{
template<typename T>
struct buffer
{
  alignas(__alignof__(T)) unsigned char buf[sizeof(T)];

  void* addr() { return static_cast<void*>(buf); }

  T* ptr() { return static_cast<T*>(addr()); }
};
}

struct IReporterFactory {
    virtual ~IReporterFactory() = default;
};

class ReporterFactory : public IReporterFactory {};

int main()
{
  auto p = new x::buffer<ReporterFactory>;
  ::new(p->addr()) ReporterFactory;
  p->ptr()->~ReporterFactory();
  delete p;
}

And clang has no problem with that (as expected).

Is it normal for cfi-unrelated-cast to just trap on a UD2 instruction, rather
than giving a UBsan diagnostic?

[Bug target/53929] Bug in the use of Intel asm syntax when a global is named "and"

2020-09-07 Thread u1049321969 at caramail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53929

--- Comment #4 from tk  ---
I have found that if I manually change
lea rax, bx[rip]
to something like
lea rax, __bx[rip]
...
.weakref __bx, bx
the assembly pass succeeds, with the correct results.

(It seems that the names "bx" and "and" only pose problems when they are used
within expressions.  If they are used in a context which unequivocally demands
a symbol, then gas can parse them.)

Thank you!

[Bug libstdc++/96942] std::pmr::monotonic_buffer_resource causes CPU cache misses

2020-09-07 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov  ---
You raise valid points (i.e. it would be good to understand why preallocation
is not beneficial, or what's causing the performance gap w.r.t. malloc), but
looking at the cache-misses counter does not make sense here (perf is not
explicit about that, but it counts misses in L3, and as you can see the count
is three orders of magnitude lower than the cycle and instruction counts, so
it's not the main factor in the overall performance picture).

As for the comparison against Rust, it spreads more work over the available
cores: you can see that its "user time" is higher, though its "wall-clock time"
is the same or lower. In other words, the C++ variant does not achieve good
multicore scaling.

The main gotcha here is that m_b_r does not allocate on construction, but
rather allocates 2x the preallocation size on the first call to 'allocate',
and then deallocates when 'release' is called. So it repeatedly calls
malloc/free in the inner benchmark loop, whereas your custom allocator
allocates on construction and deallocates on destruction, avoiding repeated
malloc/free calls in the loop and the associated lock contention when
multithreaded.

(Also, it obviously simply does more work in 'allocate', which costs extra
cycles.)
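The allocation pattern described above can be sketched in C with a hypothetical bump allocator that takes its buffer once at construction, rewinds on reset, and frees only on destruction, so a benchmark loop that calls arena_reset never goes back to malloc/free. This illustrates the custom-allocator strategy, not libstdc++'s monotonic_buffer_resource; all names are invented, and a full arena simply returns NULL instead of chaining another buffer as a real memory resource would.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena: one malloc in arena_init, one free in arena_destroy. */
struct arena
{
  unsigned char *base;
  size_t cap;
  size_t used;
};

static int
arena_init (struct arena *a, size_t cap)
{
  a->base = malloc (cap);
  a->cap = a->base ? cap : 0;
  a->used = 0;
  return a->base ? 0 : -1;
}

/* Bump allocation aligned to max_align_t; returns NULL when the buffer
   is exhausted (a real memory resource would chain a new buffer). */
static void *
arena_alloc (struct arena *a, size_t n)
{
  const size_t align = alignof (max_align_t);
  assert ((align & (align - 1)) == 0);  /* power of two */
  size_t start = (a->used + align - 1) & ~(align - 1);
  if (start > a->cap || n > a->cap - start)
    return NULL;
  a->used = start + n;
  return a->base + start;
}

/* Rewind without freeing, so the next iteration reuses the same memory. */
static void
arena_reset (struct arena *a)
{
  a->used = 0;
}

static void
arena_destroy (struct arena *a)
{
  free (a->base);
  a->base = NULL;
  a->cap = a->used = 0;
}
```

In a benchmark loop, calling arena_reset at the end of each iteration plays the role of 'release' but keeps the buffer for the next round.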

[Bug target/53929] Bug in the use of Intel asm syntax when a global is named "and"

2020-09-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53929

--- Comment #5 from Jakub Jelinek  ---
It is far easier to use (the default) assembler syntax that is properly
designed and doesn't have flaws like this.

[Bug sanitizer/96332] Asan (libasan) deadlock inside a malloc

2020-09-07 Thread raster at rasterman dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96332

--- Comment #3 from Carsten Haitzler  ---
Anyone able to reproduce?

[Bug preprocessor/96952] New: __builtin_thread_pointer support cannot be probed

2020-09-07 Thread sorear at fastmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952

Bug ID: 96952
   Summary: __builtin_thread_pointer support cannot be probed
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: preprocessor
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sorear at fastmail dot com
  Target Milestone: ---

I would like to use __builtin_thread_pointer instead of inline asm to access
the thread pointer (concretely, in kernel or libc code for risc-v), but it's
only supported by extremely recent versions of gcc and clang.  So, I would like
to detect it at compile time, but this does not work:

#if __has_builtin(__builtin_thread_pointer)
void *get_tp() { return __builtin_thread_pointer(); }
#else
/* inline asm fallback */
#endif

x.c: In function 'get_tp':
x.c:2:25: error: '__builtin_thread_pointer' is not supported on this target
2 | void *get_tp() { return __builtin_thread_pointer(); }
  | ^~

__has_builtin seems to be falsely returning true?

[Bug c++/96953] New: junk at end of line: convert const char[] to std::string

2020-09-07 Thread alex.wolf at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96953

Bug ID: 96953
   Summary: junk at end of line: convert const char[] to
std::string
   Product: gcc
   Version: 8.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: alex.wolf at gmail dot com
  Target Milestone: ---

The compiler sends badly formed optimized output to the assembler when compiling.

g++-8 --version 
g++-8 (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0


content of: test.cpp

#include <string>
const char a[] = "\n";
std::string f() {
    return std::string(a);
}

g++-8 -std=c++11 -Og -dA -ggdb -c test.cpp

Workarounds: 
- use different compiler flags
- declare 'a' as ptr: const char *a = "\n";

[Bug preprocessor/96952] __builtin_thread_pointer support cannot be probed

2020-09-07 Thread fw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952

Florian Weimer  changed:

   What|Removed |Added

 CC||fw at gcc dot gnu.org
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=96200

--- Comment #1 from Florian Weimer  ---

[Bug c++/96953] junk at end of line: convert const char[] to std::string

2020-09-07 Thread alex.wolf at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96953

--- Comment #1 from Alex  ---
Created attachment 49191
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49191&action=edit
g++-8 -std=c++11 -Og -dA -ggdb -S ~/test/abc.cpp

Here you can see the quote character being generated on the wrong line, at
lines 523/524.

[Bug c++/96953] junk at end of line: convert const char[] to std::string

2020-09-07 Thread alex.wolf at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96953

--- Comment #2 from Alex  ---
Created attachment 49192
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49192&action=edit
g++-8 -std=c++11 -Og -dA -ggdb -c ~/test/abc.cpp 2>out.txt

This is the output of the compile command:

/tmp/cce5iZjM.s: Assembler messages:
/tmp/cce5iZjM.s:524: Error: invalid character (0xa) in mnemonic
/tmp/cce5iZjM.s:525: Error: invalid character (0xa) in mnemonic
/tmp/cce5iZjM.s:526: Warning: missing closing `"'
/tmp/cce5iZjM.s:526: Error: invalid character (0xa) in mnemonic
/tmp/cce5iZjM.s:582: Error: junk at end of line, first unrecognized character
valued 0x9
/tmp/cce5iZjM.s:583: Error: junk at end of line, first unrecognized character
valued 0x9
/tmp/cce5iZjM.s:584: Error: junk at end of line, first unrecognized character
valued 0x9

This repeats for about 16k lines and then:


/tmp/cce5iZjM.s:18983: Error: junk at end of line, first unrecognized character
valued 0x9
/tmp/cce5iZjM.s:18989: Error: junk at end of line, first unrecognized character
valued 0x9
/tmp/cce5iZjM.s:18991: Error: junk at end of line, first unrecognized character
valued 0x9
/tmp/cce5iZjM.s:20870: Warning: end of file in string; '"' inserted
/tmp/cce5iZjM.s:21072: Warning: missing closing '"'
/tmp/cce5iZjM.s:21072: Error: invalid character (0xa) in mnemonic
/tmp/cce5iZjM.s: Error: open CFI at the end of file; missing .cfi_endproc
directive
/tmp/cce5iZjM.s:18765: Error: leb128 operand is an undefined symbol: .LVU79
/tmp/cce5iZjM.s:18777: Error: leb128 operand is an undefined symbol: .LVU79
/tmp/cce5iZjM.s:18778: Error: leb128 operand is an undefined symbol: .LVU79

This repeats for about 30 lines and then:

/tmp/cce5iZjM.s:18947: Error: leb128 operand is an undefined symbol: .LVU79
/tmp/cce5iZjM.s:18948: Error: leb128 operand is an undefined symbol: .LVU79
/tmp/cce5iZjM.s:18949: Error: leb128 operand is an undefined symbol: .LVU79
/tmp/cce5iZjM.s:15745: Error: can't resolve `.LFE1019' {*UND* section} -
`.LFB1019' {.text section}
/tmp/cce5iZjM.s:15814: Error: can't resolve `.LBE191' {*UND* section} -
`.LBB191' {.text section}
/tmp/cce5iZjM.s:15867: Error: can't resolve `.LBE195' {*UND* section} -
`.LBB195' {*UND* section}
/tmp/cce5iZjM.s:15882: Error: can't resolve `.LBE196' {*UND* section} -
`.LBB196' {*UND* section}
/tmp/cce5iZjM.s:15900: Error: can't resolve `.LBE197' {*UND* section} -
`.LBB197' {*UND* section}
/tmp/cce5iZjM.s:15906: Error: can't resolve `.LBE198' {*UND* section} -
`.LBB198' {*UND* section}
/tmp/cce5iZjM.s:15926: Error: can't resolve `.LBE199' {*UND* section} -
`.LBB199' {*UND* section}
/tmp/cce5iZjM.s:15959: Error: can't resolve `.LBE202' {*UND* section} -
`.LBB202' {*UND* section}
/tmp/cce5iZjM.s:15970: Error: can't resolve `.LBE203' {*UND* section} -
`.LBB203' {*UND* section}

[Bug preprocessor/96952] __builtin_thread_pointer support cannot be probed

2020-09-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
__has_builtin does what is documented:

The special operator @code{__has_builtin (@var{operand})} may be used in
constant integer contexts and in preprocessor @samp{#if} and @samp{#elif}
expressions to test whether the symbol named by its @var{operand} is
recognized as a built-in function by GCC in the current language and
conformance mode.  It evaluates to a constant integer with a nonzero
value if the argument refers to such a function, and to zero otherwise.

__builtin_thread_pointer is recognized as a built-in function by GCC, so you
get 1 from __has_builtin.  It is not feasible to have at preprocessing time
complete knowledge of the details how the builtin will be lowered (if at all),
expanded (if at all), whether it needs any optabs etc.  A lot of those details
are also dependent on the builtin arguments and other details (e.g. from which
function it is called, what function specific options are enabled etc.).
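Under that documented meaning, __has_builtin is most reliable for builtins that every target can expand. A minimal sketch of the usual probing pattern, assuming __builtin_popcount as the example (GCC can fall back to a libgcc routine when the target lacks a population-count instruction); count_bits is a hypothetical name. The #ifndef guard covers compilers that predate __has_builtin itself.

```c
#include <assert.h>

/* Compilers without __has_builtin: treat every builtin as unavailable. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical helper: count set bits, preferring the builtin. */
static unsigned
count_bits (unsigned x)
{
#if __has_builtin(__builtin_popcount)
  return (unsigned) __builtin_popcount (x);
#else
  /* Portable fallback: clear the lowest set bit until none remain. */
  unsigned n = 0;
  while (x)
    {
      x &= x - 1;
      n++;
    }
  return n;
#endif
}
```

For target-specific builtins such as __builtin_thread_pointer, a nonzero result only says the name is known, so a per-target guard is still needed.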

[Bug c++/96953] junk at end of line: convert const char[] to std::string

2020-09-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96953

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Jakub Jelinek  ---
Dup.

*** This bug has been marked as a duplicate of bug 93399 ***

[Bug middle-end/93399] [8 Regression] Annotate assembler option failure

2020-09-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93399

Jakub Jelinek  changed:

   What|Removed |Added

 CC||alex.wolf at gmail dot com

--- Comment #12 from Jakub Jelinek  ---
*** Bug 96953 has been marked as a duplicate of this bug. ***

[Bug libstdc++/96946] libstdc++ std::shared_ptr makes an "unrelated cast" that causes Clang's Control Flow Integrity sanitiser to crash

2020-09-07 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96946

--- Comment #2 from Jonathan Wakely  ---
Ah, I see what's wrong. We do the static_cast before creating an object at that
address:

#include <new>

namespace x __attribute__((visibility("default")))
{
template<typename T>
struct buffer
{
  alignas(__alignof__(T)) unsigned char buf[sizeof(T)];

  void* addr() { return static_cast<void*>(buf); }

  T* ptr() { return static_cast<T*>(addr()); }
};
}

struct IReporterFactory {
    virtual ~IReporterFactory() = default;
};

class ReporterFactory : public IReporterFactory {};

int main()
{
  auto p = new x::buffer<ReporterFactory>;
  auto p2 = p->ptr();  // undefined here
  ::new(static_cast<void*>(p2)) ReporterFactory;
  p->ptr()->~ReporterFactory();
  delete p;
}

But we have to do this, because we need to pass a T* to the
allocator_traits::construct function. This clang sanitizer seems incompatible
with any allocator that allocates bytes and then casts std::byte* or unsigned
char* to T*. For example, this also fails at runtime:

#include <memory>

struct IReporterFactory {
    virtual ~IReporterFactory() = default;
};

class ReporterFactory : public IReporterFactory {};

int main()
{
  alignas(alignof(ReporterFactory)) unsigned char b[sizeof(ReporterFactory)];
  using A = std::allocator<ReporterFactory>;
  A a;
  std::allocator_traits<A>::construct(a,
      reinterpret_cast<ReporterFactory*>(b));
  std::allocator_traits<A>::destroy(a, reinterpret_cast<ReporterFactory*>(b));
}

[Bug libstdc++/96946] libstdc++ std::shared_ptr makes an "unrelated cast" that causes Clang's Control Flow Integrity sanitiser to crash

2020-09-07 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96946

Jonathan Wakely  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Jonathan Wakely  ---
And this fails:

#include <list>

struct IReporterFactory {
    virtual ~IReporterFactory() = default;
};

class ReporterFactory : public IReporterFactory {};

int main()
{
  std::list<ReporterFactory> l(1);
}

This sanitizer is crap.

[Bug target/85830] vec_popcntd is improperly defined in altivec.h

2020-09-07 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85830

--- Comment #10 from Segher Boessenkool  ---
Thanks Carl!

[Bug target/96827] [10/11 Regression] __m128i from _mm_set_epi32 is backwards with -O3

2020-09-07 Thread joel.hutton at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96827

--- Comment #8 from Joel Hutton  ---
I'm working on this.

I believe this may have been introduced by my earlier SLP vector constructor
patch (commit 10d1592).

What I believe to be the relevant section:

+  else if (constructor)
+{
+  tree rhs = gimple_assign_rhs1 (stmt_info->stmt);
+  tree val;
+  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
+   {
+ if (TREE_CODE (val) == SSA_NAME)
+   {
+ gimple* def = SSA_NAME_DEF_STMT (val);
+ stmt_vec_info def_info = vinfo->lookup_stmt (def);
+ /* Value is defined in another basic block.  */
+ if (!def_info)
+   return false;
+ scalar_stmts.safe_push (def_info);
+   }
+ else
+   return false;
+   }
+}

I'm investigating, but I suspect that pushing to a stack which is later popped
from has reversed the element order.
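To make "backwards" concrete: _mm_set_epi32 lists its arguments from the highest 32-bit lane down to the lowest, whereas _mm_setr_epi32 lists them in memory order. The sketch below (x86 with SSE2 only; lanes_epi32 is a hypothetical helper) stores a vector to memory and checks the lane order, which is the property the reported miscompilation breaks.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics (x86 only) */

/* Hypothetical helper: spill the four 32-bit lanes of v to out[],
   lowest lane first, so their order can be checked. */
static void
lanes_epi32 (__m128i v, int out[4])
{
  _mm_storeu_si128 ((__m128i *) out, v);
}
```

With a correct compiler, _mm_set_epi32 (3, 2, 1, 0) stores as {0, 1, 2, 3}; a build that stores {3, 2, 1, 0} instead shows the reversal described in this PR.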

[Bug c++/96954] New: gcc times out with -O2/-O3/-Os

2020-09-07 Thread tangyixuan at mail dot dlut.edu.cn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96954

Bug ID: 96954
   Summary: gcc times out with -O2/-O3/-Os
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tangyixuan at mail dot dlut.edu.cn
  Target Milestone: ---

Hi, gcc times out during execution of ./a.out with -O2/-O3/-Os:

$ cat s.cpp

template < class A > int B ( const A &){
int a = B ( a );
return a ;
}
int main (){ return B (0);}

$ g++ -O2 1337-2-runtime-out.C && ./a.out
times out

$ g++ -O0 1337-2-runtime-out.C && ./a.out
Segmentation fault (core dumped)

$ g++ -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-20200823/bin/g++
COLLECT_LTO_WRAPPER=/usr/local/gcc-20200823/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-11-20200823/configure --prefix=/usr/local/gcc-20200823
--enable-checking=release --enable-languages=c,c++ --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20200823 (experimental) (GCC)

[Bug c++/96954] gcc times out with -O2/-O3/-Os

2020-09-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96954

Jakub Jelinek  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED
 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
GCC does not time out nor crash, it is the buggy testcase with infinite
recursion that does.  With -O2 when it is tail call optimized there is an
infinite loop (so hangs), while with -O0 it is not optimized and therefore it
segfaults when it runs out of stack.

[Bug target/96955] New: Implement __builtin_thread_pointer

2020-09-07 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96955

Bug ID: 96955
   Summary: Implement __builtin_thread_pointer
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hjl.tools at gmail dot com
CC: crazylht at gmail dot com, wwwhhhyyy333 at gmail dot com
Blocks: 96200
  Target Milestone: ---
Target: i386,x86-64

On Linux/x86-64, the %fs segment register is used to implement the thread
pointer. The linear address of the thread pointer is stored at offset 0
relative to the %fs segment register. The following code loads the thread
pointer into the %rax register:

movq %fs:0, %rax

On Linux/i386, the %gs segment register is used:

movl %gs:0, %eax

We need to

1. Implement __builtin_thread_pointer for Linux/x86.
2. Document its behavior.
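Until such a builtin exists, code commonly reads the thread pointer with inline asm that follows the convention described above. A minimal sketch, assuming Linux on x86-64 or i386 (read_thread_pointer is a hypothetical name); on any other target it defers to __builtin_thread_pointer, which, as bug 96952 shows, may still fail to compile where the target cannot expand it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: load the thread pointer, which the Linux x86 TLS
   ABI stores at offset 0 of the TLS segment (%fs on x86-64, %gs on i386). */
static inline void *
read_thread_pointer (void)
{
  void *tp;
#if defined (__linux__) && defined (__x86_64__)
  __asm__ ("mov %%fs:0, %0" : "=r" (tp));
#elif defined (__linux__) && defined (__i386__)
  __asm__ ("mov %%gs:0, %0" : "=r" (tp));
#else
  /* Assumption: other targets provide and can expand the builtin. */
  tp = __builtin_thread_pointer ();
#endif
  return tp;
}
```

The asm is deliberately not volatile: the thread pointer is fixed for the lifetime of a thread, so letting the compiler reuse a previous load is harmless.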


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96200
[Bug 96200] Implement __builtin_thread_pointer() and
__builtin_set_thread_pointer() if TLS is supported

[Bug c/96956] New: When gcc does not see a label used in a goto it gives the wrong label address &&label

2020-09-07 Thread thomas.lynch at reasoningtechnology dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96956

Bug ID: 96956
   Summary: When gcc does not see a label used in a goto it gives
the wrong label  address &&label
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: thomas.lynch at reasoningtechnology dot com
  Target Milestone: ---

Created attachment 49193
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49193&action=edit
.c .i .s gcc version system info

The attachment contains an example distilled from much larger code. It makes
use of the gcc extension of taking the address of a label. All optimizations
are turned off (`-O0`).

The && operator returns the address of a different label than the one
requested. Specifically, in this example `&&nominal` returns the address of a
different label, `&&test0`.

The code example shown uses an inlined nested function to hide the goto
from gcc. However, the error occurs before the nested function is called. Note
the printf %p statements.

With small variations of this code, gcc will return the correct address for
`&&nominal`. If gcc sees an explicit goto to the label `nominal`, then the
printf %p will give the correct value.

See attached for .c .i .s gcc version and sys info.


/* broken.c 

gcc -std=gnu2x -Wall -O0 -ggdb  -o broken broken.c 

*/

#include <stdio.h>
#include <sys/types.h>

typedef void **CV·Ptr; 
CV·Ptr target_pt;
uint i = 0;

int main(){

  goto test0;

  inline void do_jmp(CV·Ptr target_pt){
    goto *target_pt;
  }

  test0:;

  i++; // gets optimized away
  printf("%x\n", i);
  goto report;

  nominal:
  i++;
  goto tests_finished;

  report:;
  target_pt = &&nominal;
  printf("test0: %p\n", &&test0);
  // this will be identical to &&test0 and we haven't even called do_jmp:
  printf("nominal: %p\n", target_pt); 

  if( i == 2 ){
    printf("foo!\n");
    goto tests_finished;
  }
  do_jmp(target_pt);

  tests_finished:;
}

[Bug middle-end/96200] Implement __builtin_thread_pointer() and __builtin_set_thread_pointer() if TLS is supported

2020-09-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96200

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek  ---
(In reply to Florian Weimer from comment #6)
> (In reply to H.J. Lu from comment #4)
> > On Linux/i386 and Linux/x86-64, thread pointer access is done via syscall.
> > On Linux/x86-64, __builtin_thread_pointer and __builtin_set_thread_pointer
> > may be implemented with FSGSBASE ISA.  Is it possible to implement these
> > builtins on Linux/i386 and Linux/x86-64 for all processors?
> 
> It's effectively part of the x86-64 ABI, but I think it's currently
> undocumented. On x86-64, it looks like this:
> 
> static inline void *
> thread_pointer (void)
> {
>   void *result;
>   asm ("mov %%fs:0, %0" : "=r" (result));
>   return result;
> }
> 
> i386 is similar, but with %gs, I think.
> 
> This is ABI since the early NPTL days, and GCC knows about this very
> explicitly, to implement the -mno-tls-direct-seg-refs option.

It is documented, see https://akkadia.org/drepper/tls.pdf
"The only requirement about this register is that the actual thread pointer tpt
can be loaded from the absolute address 0 via the %gs register. The following
code would load the thread pointer in the %eax register:
movl %gs:0, %eax"
for ia32.  For x86_64, it is not spelled explicitly, but it is implicit from:
"The x86-64 ABI is at its base virtually the same as the IA-32 ABI. The
difference is mainly in different size of variables containing pointers and
that it only provides one variant which closely matches the IA-32 GNU variant.
Instead of segment register %gs it uses the %fs segment register."
as well as all the explicit insn sequences that use it.  The psABI doesn't have
TLS details I believe, so Ulrich's tls.pdf is the ABI document that covers
that.
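To make the ABI point concrete, here is a minimal sketch, assuming x86-64 or i386 Linux with glibc or musl and GCC-style inline asm. It loads the word at offset 0 of the TLS segment the way tls.pdf describes and checks that this word points back to itself, which is what makes `mov %fs:0` (or `%gs:0` on i386) a valid way to obtain the thread pointer. The names `tp_load` and `tp_is_self_pointer` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: read the word at offset 0 of the TLS segment.
   Assumes the x86 TLS ABI from tls.pdf: %fs on x86-64, %gs on i386. */
static inline void *tp_load(void)
{
  void *result;
#if defined(__x86_64__)
  __asm__ ("mov %%fs:0, %0" : "=r" (result));
#elif defined(__i386__)
  __asm__ ("movl %%gs:0, %0" : "=r" (result));
#else
#error "this sketch only covers i386 and x86-64"
#endif
  return result;
}

/* The first word of the thread control block holds the thread pointer
   itself, so dereferencing it yields the same address. Returns 1 if so. */
int tp_is_self_pointer(void)
{
  void *tp = tp_load();
  return tp != NULL && *(void **) tp == tp;
}
```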

[Bug target/96955] Implement __builtin_thread_pointer for x86 TLS

2020-09-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96955

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
And if possible, optimize, so that if one does say
int *p = (int *)__builtin_thread_pointer ();
return p[4];
or
return p[i];
it will not read %fs:0 into a register and then read 16(%reg), but rather read
%fs:16 directly (of course only without -mno-tls-direct-seg-refs), and likewise
read %fs:16(,%regI,4) rather than 16(%reg,%regI,4), etc.

[Bug libstdc++/96946] libstdc++ std::shared_ptr makes an "unrelated cast" that causes Clang's Control Flow Integrity sanitiser to crash

2020-09-07 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96946

--- Comment #4 from Jonathan Wakely  ---
Libc++ seems to pass the std::list example in comment 3 because they rely on
different UB: https://godbolt.org/z/xvdaxh

The libc++ std::list node is allocated as though it's raw storage, and so is
never constructed. That means they don't need to cast from raw bytes to a T*,
because they just take the address of a T subobject inside an object which
doesn't exist.

[Bug target/96933] rs6000: inefficient code for char/short vec CTOR

2020-09-07 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933

--- Comment #7 from Segher Boessenkool  ---
There are vmrglb and vmrghb etc.?

[Bug c++/96957] New: No name-lookup into base class when using an non dependent base class via template alias with dummy parameter.

2020-09-07 Thread anders.granlund.0 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96957

Bug ID: 96957
   Summary: No name-lookup into base class when using an non
dependent base class via template alias with dummy
parameter.
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: anders.granlund.0 at gmail dot com
  Target Milestone: ---

Consider the following c++ program:

  class A
  {
  protected:
  int x;
  };

  template<typename>
  using B = A;

  template<typename T>
  class C : public B<T>
  {
  public:
  void f()
  {
  x = 0;
  }
  };

  int main()
  {
  }

Compile it with "-std=c++17 -pedantic-errors".

Expected behaviour:

  Since the base class is not dependent (see http://wg21.link/cwg1390 ), the
  name lookup of x should succeed and no error message should be output
  during the compilation.

Observed behaviour:

  A compilation error about the failed name lookup of x is output during
  the compilation.

GCC correctly compiles the program without any error messages. See the
discussion in: https://bugs.llvm.org/show_bug.cgi?id=47435

[Bug libstdc++/86419] codecvt::in() and out() incorrectly return ok in some cases.

2020-09-07 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86419

--- Comment #11 from Jonathan Wakely  ---
(In reply to Dimitrij Mijoski from comment #10)
> Well, whatever, I will pause my work on this.

Thanks again. There are a few code conversion issues in my TODO list, so I'll
get around to properly reviewing everything above next time I get a chance to
work on it.

[Bug c/96956] When gcc does not see a label used in a goto it gives the wrong label address &&label

2020-09-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96956

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Jakub Jelinek  ---
Your testcase is invalid.
https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Labels-as-Values.html#Labels-as-Values
"You may not use this mechanism to jump to code in a different function. If you
do that, totally unpredictable things happen."
GCC has a different extension, non-local labels, described in the Local Labels
chapter, but in that case the goto must jump directly to the non-local label
rather than use a computed goto.
What you see is the result of the compiler not adding any edges in the CFG from
the nested function to the label whose address is taken; the compiler only adds
such edges for non-local labels, or between computed gotos in the current
function and the labels whose address is taken.
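For comparison, a minimal sketch of the form the documentation does allow, assuming GCC (nested functions are a GCC extension that Clang does not support): the label is declared with `__label__` in the containing function, and the nested function reaches it with a direct, non-computed `goto`. The names `run` and `bail_if` are hypothetical.

```c
/* Returns 1 on the normal path, or -1 when the nested function performs
   a non-local goto back into run(). GCC-only (nested functions). */
int run(int trigger)
{
  __label__ failed;         /* explicit declaration makes 'failed' usable from nested functions */

  void bail_if(int cond)    /* nested function (GCC extension) */
  {
    if (cond)
      goto failed;          /* direct goto: leaves bail_if's frame and resumes in run() */
  }

  bail_if(trigger);
  return 1;

failed:
  return -1;
}
```

Because the goto names the label directly, GCC knows the nested function can transfer control there, unlike the computed `goto *target_pt` in the report.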

[Bug c++/96957] No name-lookup into base class when using an non dependent base class via template alias with dummy parameter.

2020-09-07 Thread anders.granlund.0 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96957

--- Comment #1 from Anders Granlund  ---
Also see the following stack overflow post:

https://stackoverflow.com/questions/63761866/difference-in-behaviour-between-clang-and-gcc-when-trying-to-confuse-them-by-usi

[Bug debug/93865] .debug_line with LTO refers to bogus file-names

2020-09-07 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93865

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
I guess this isn't only about the main source files, but about any includes (if
they are relative, not absolute).
Perhaps when streaming out lto we should stream for each TU also the
get_src_pwd () string, and in canon_file_name in lto-streamer-in.c take into
account the src pwd read from the current TU vs. get_src_pwd () for the LTO
link.
If they are the same, don't make any changes; absolute paths likewise need no
adjustment. Otherwise, canonicalize relative paths to account for the
difference between the two directories.
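One way the proposed rule could look, sketched under assumptions: the function name `canon_rel_path` and its buffer-based interface are hypothetical and not GCC's `canon_file_name`. POSIX `/` separators are assumed, and this version simply anchors a relative name at the TU's source directory (making it absolute) rather than rewriting it relative to the link directory.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: NAME was recorded while compiling in TU_PWD and the
   LTO link runs in LINK_PWD. Returns NAME unchanged when no adjustment is
   needed; otherwise writes TU_PWD "/" NAME into BUF and returns BUF, or
   NULL if BUF is too small. */
const char *canon_rel_path(const char *name, const char *tu_pwd,
                           const char *link_pwd, char *buf, size_t bufsize)
{
  if (name[0] == '/')                  /* absolute path: no difference */
    return name;
  if (strcmp(tu_pwd, link_pwd) == 0)   /* same directory: no change */
    return name;
  int n = snprintf(buf, bufsize, "%s/%s", tu_pwd, name);
  if (n < 0 || (size_t) n >= bufsize)  /* result would be truncated */
    return NULL;
  return buf;
}
```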

[Bug preprocessor/96952] __builtin_thread_pointer support cannot be probed

2020-09-07 Thread bugdal at aerifal dot cx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952

Rich Felker  changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #3 from Rich Felker  ---
This answer does not seem satisfactory. The question is not whether it will be
optimized, only whether it is semantically defined. That should either be
universally true on GCC versions that offer the builtin (via a libgcc function
if nothing else is available) or target-specific (which is known at
preprocessing time).

[Bug libstdc++/96958] New: Long Double in Hash Table policy forces soft-float calculations

2020-09-07 Thread jgreenhalgh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96958

Bug ID: 96958
   Summary: Long Double in Hash Table policy forces soft-float
calculations
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jgreenhalgh at gcc dot gnu.org
  Target Milestone: ---

It was pointed out that some forks of GCC (
https://github.com/FEX-Emu/gcc/commit/8a2b7389f50a50a4e26ec98101d47fb1fc1c1bcd
) reduce the hashtable policy implementation from a long double to a double.
Doing this replaces a soft-float calculation with hardware floating-point.

The libstdc++ discussion from when this code was introduced suggests the
intention was to provide a great deal of forward compatibility for very big
hash tables. We're taking quite an efficiency hit for that future-proofing.
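To see where the type matters, here is a minimal sketch of the kind of calculation involved: the smallest bucket count b with n / b <= max_load. `min_buckets_ld` and `min_buckets_d` are hypothetical names, not libstdc++'s actual `_Prime_rehash_policy` code. On AArch64 Linux, `long double` is IEEE binary128 and its arithmetic goes through libgcc's software routines, while `double` uses the FPU. A `double` represents every element count up to 2^53 exactly, so the two versions only diverge for astronomically large tables.

```c
#include <stddef.h>

/* Hypothetical: ceil(n / max_load) computed in long double. */
size_t min_buckets_ld(size_t n, float max_load)
{
  long double q = (long double) n / max_load;
  size_t b = (size_t) q;                 /* truncates toward zero */
  return (long double) b < q ? b + 1 : b;
}

/* The same calculation in double (hardware FP on common targets). */
size_t min_buckets_d(size_t n, float max_load)
{
  double q = (double) n / max_load;
  size_t b = (size_t) q;
  return (double) b < q ? b + 1 : b;
}
```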

[Bug libstdc++/96942] std::pmr::monotonic_buffer_resource causes CPU cache misses

2020-09-07 Thread dmitriy.ovdienko at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

--- Comment #6 from Dmitriy Ovdienko  ---
> looking at cache-misses counter does not make sense here

Well, if you compare Rust and C++, the cache-misses CPU counter differs
dramatically, and so do page faults, while the number of instructions is about
the same.

Page faults, by the way, can significantly affect performance too. They could
be the reason.

I've put all numbers into one table for convenience:


| CPU counter      | PMR            | Malloc         | Rust           |
|------------------|----------------|----------------|----------------|
| cache-references |     45,104,136 |     40,713,525 |     29,268,774 |
| cache-misses     |     24,448,475 |     14,147,648 |     12,147,041 |
| cycles           | 19,904,251,283 | 14,823,743,812 | 24,539,557,585 |
| instructions     | 30,462,013,065 | 22,306,442,507 | 31,784,741,964 |
| branches         |  4,834,392,341 |  4,331,968,591 |  4,829,547,556 |
| faults           |        234,796 |         60,227 |         68,023 |
| migrations       |              2 |              6 |              8 |
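A derived comparison from the counters in the table (plain arithmetic, rounded) separates the miss ratio from the absolute miss count and puts a number on the page-fault gap:

```latex
\text{miss ratio} = \frac{\text{cache-misses}}{\text{cache-references}}:\quad
\text{PMR} = \frac{24{,}448{,}475}{45{,}104{,}136} \approx 54.2\%,\quad
\text{Malloc} = \frac{14{,}147{,}648}{40{,}713{,}525} \approx 34.7\%,\quad
\text{Rust} = \frac{12{,}147{,}041}{29{,}268{,}774} \approx 41.5\%

\frac{\text{misses}_{\text{PMR}}}{\text{misses}_{\text{Rust}}} = \frac{24{,}448{,}475}{12{,}147{,}041} \approx 2.0,
\qquad
\frac{\text{faults}_{\text{PMR}}}{\text{faults}_{\text{Malloc}}} = \frac{234{,}796}{60{,}227} \approx 3.9
```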

> The main gotcha here is m_b_r does not allocate on construction, but rather 
> allocates 2x of the preallocation size on first call to 'allocate'

In the two previous posts I attached code that does not create any threads and
allocates/deallocates memory in a loop, so both samples behave the same way.

[Bug target/94595] gcc.target/arm/thumb2-cond-cmp-*.c fail for cortex-m

2020-09-07 Thread clyon at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94595

--- Comment #1 from Christophe Lyon  ---
For thumb2-cond-cmp-4.c (if ( (i >= '+') ? (j <= '-') : 1) ) we generate:

* cortex-m0:
f:
cmp r0, #42
ble .L3
movs r3, #45
movs r2, #0
lsrs r0, r1, #31
cmp r3, r1
adcs r0, r0, r2
.L1:
bx  lr
.L3:
movs r0, #1
b   .L1

* cortex-m3:
f:
cmp r0, #42
ble .L3
cmp r1, #45
ite gt
movgt   r0, #0
movle   r0, #1
bx  lr
.L3:
movs r0, #1
bx  lr

* cortex-m7:
f:
cmp r1, #45
it  gt
cmpgt   r0, #42
ite le
movle   r0, #1
movgt   r0, #0
bx  lr

* cortex-a9:
f:
cmp r1, #45
it  gt
cmpgt   r0, #42
ite le
movle   r0, #1
movgt   r0, #0
bx  lr
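The cortex-m7 and cortex-a9 sequences evaluate the condition without branches. They compare j with 45, conditionally compare i with 42, and return 1 on `le`, which computes `!(j > 45 && i > 42)`. A minimal C sketch of that equivalence, assuming `int` arguments as in the asm. `f_ternary` mirrors the condition quoted from the test, and `f_ccmp` is a hypothetical hand-written form of what the conditional-compare code computes (`'+'` is 43 and `'-'` is 45).

```c
/* Condition from thumb2-cond-cmp-4.c: (i >= '+') ? (j <= '-') : 1 */
int f_ternary(int i, int j)
{
  return (i >= '+') ? (j <= '-') : 1;
}

/* cmp j,#45; (if gt) cmp i,#42; le -> 1, gt -> 0 */
int f_ccmp(int i, int j)
{
  return !(j > 45 && i > 42);
}
```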

[Bug libstdc++/96958] Long Double in Hash Table policy forces soft-float calculations

2020-09-07 Thread jgreenhalgh at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96958

--- Comment #1 from James Greenhalgh  ---
Asleep at the wheel today, I had intended to link to the
https://gcc.gnu.org/pipermail/libstdc++/2011-September/036420.html original
discussion rather than leave it as a tedious exercise for the reader.

[Bug tree-optimization/96951] strncpy truncation warning does not recognize truncation check

2020-09-07 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96951

Martin Sebor  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Severity|normal  |enhancement
 Ever confirmed|0   |1
   Last reconfirmed||2020-09-07
 Blocks||88781

--- Comment #3 from Martin Sebor  ---
The warning suppression only considers an explicit nul assignment, not a test
for one.  If in the code the test case was derived from the string member is
not necessarily meant to be a string then declaring it with attribute nonstring
avoids the warning:

struct buffer {
  __attribute__((nonstring)) char string[10];
};

The attribute will trigger other warnings in uses of the member that aren't
clearly nul-terminated.

There are different ways to write the code to avoid the warning and I'm not
sure that any one of them is necessarily the best.  I'd rather users read up on
the problem with strncpy (and strncat) and choose what works best for them.  I
wrote a blog post about this some time ago:
https://developers.redhat.com/blog/2018/05/24/detecting-string-truncation-with-gcc-8/

As for memccpy, the warning doesn't know about it yet, but adding it is on my
to-do list to resolve pr88814.  Ideally, I'd also like to improve Glibc's
memccpy to make it one-pass unless someone beats me to it (hint, hint ;-)
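Two ways of writing such a copy, sketched under assumptions. The helper names are hypothetical, `memccpy` is POSIX (and standard C only from C23), and which GCC versions warn on either form is not claimed here. The first follows `strncpy` with the explicit nul assignment described above. The second uses `memccpy`, whose null return signals truncation.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose POSIX memccpy in strict ISO modes */
#endif
#include <string.h>

struct buffer {
  char string[10];
};

/* Pattern 1: strncpy followed immediately by an explicit nul assignment. */
void copy_truncating(struct buffer *b, const char *src)
{
  strncpy(b->string, src, sizeof b->string - 1);
  b->string[sizeof b->string - 1] = '\0';
}

/* Pattern 2: memccpy stops after copying the terminating nul and returns
   NULL if none was found within the limit, i.e. on truncation.
   Returns 1 if SRC fit, 0 if it was truncated (result still terminated). */
int copy_checked(struct buffer *b, const char *src)
{
  if (memccpy(b->string, src, '\0', sizeof b->string) != NULL)
    return 1;
  b->string[sizeof b->string - 1] = '\0';
  return 0;
}
```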


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88781
[Bug 88781] [meta-bug] bogus/missing -Wstringop-truncation warnings

[Bug libstdc++/96958] Long Double in Hash Table policy forces soft-float calculations

2020-09-07 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96958

Jonathan Wakely  changed:

   What|Removed |Added

   Last reconfirmed||2020-09-07
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

[Bug c++/96960] New: ICE in tsubst_copy_and_build, at cp/pt.c:20531 from lambda in return-type-requirement

2020-09-07 Thread hstong at ca dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96960

Bug ID: 96960
   Summary: ICE in tsubst_copy_and_build, at cp/pt.c:20531 from
lambda in return-type-requirement
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hstong at ca dot ibm.com
  Target Milestone: ---

The following program ICEs with GCC.

The ICE appears to be related to the lambda expression in the
return-type-requirement.

### SOURCE ():
template  concept C0 = true;

template 
concept C =
requires(T t) {
  { 42 } -> C0;
};

static_assert(C);

### COMPILER INVOCATION:
g++ -fsyntax-only -std=c++20 -xc++ -


### ACTUAL OUTPUT:
:6:23: internal compiler error: in tsubst_copy_and_build, at cp/pt.c:20531
0x5c9d7a tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
../../source/gcc/cp/pt.c:20531
0x72c6bd tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
../../source/gcc/cp/pt.c:19873
0x72c6bd tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
../../source/gcc/cp/pt.c:19873
0x73e1e4 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
../../source/gcc/cp/pt.c:19266
0x73e1e4 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
../../source/gcc/cp/pt.c:18879
0x72f59b tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
../../source/gcc/cp/pt.c:17933
0x72f59b tsubst(tree_node*, tree_node*, int, tree_node*)
../../source/gcc/cp/pt.c:15349
0x72f1e3 tsubst(tree_node*, tree_node*, int, tree_node*)
../../source/gcc/cp/pt.c:15792
0x742ec2 tsubst_template_args(tree_node*, tree_node*, int, tree_node*)
../../source/gcc/cp/pt.c:13215
0x72c209 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
../../source/gcc/cp/pt.c:19333
0x73e1e4 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
../../source/gcc/cp/pt.c:19266
0x73e1e4 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
../../source/gcc/cp/pt.c:18879
0x647cf7 tsubst_constraint(tree_node*, tree_node*, int, tree_node*)
../../source/gcc/cp/constraint.cc:2402
0x647cf7 type_deducible_p
../../source/gcc/cp/constraint.cc:1932
0x64c23a tsubst_requires_expr(tree_node*, tree_node*, int, tree_node*)
../../source/gcc/cp/constraint.cc:2013
0x72c55e tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
../../source/gcc/cp/pt.c:20566
0x73e1e4 tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool)
../../source/gcc/cp/pt.c:19266
0x73e1e4 tsubst_expr(tree_node*, tree_node*, int, tree_node*, bool)
../../source/gcc/cp/pt.c:18879
0x64ba61 satisfy_constraint_r
../../source/gcc/cp/constraint.cc:2610
0x64bee8 satisfy_constraint
../../source/gcc/cp/constraint.cc:2692
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.


### EXPECTED OUTPUT:
(clean compile)


### COMPILER VERSION INFO (g++ -v):
Using built-in specs.
COLLECT_GCC=/opt/wandbox/gcc-head/bin/g++
COLLECT_LTO_WRAPPER=/opt/wandbox/gcc-head/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../source/configure --prefix=/opt/wandbox/gcc-head
--enable-languages=c,c++ --disable-multilib --without-ppl --without-cloog-ppl
--enable-checking=release --disable-nls --enable-lto
LDFLAGS=-Wl,-rpath,/opt/wandbox/gcc-head/lib,-rpath,/opt/wandbox/gcc-head/lib64,-rpath,/opt/wandbox/gcc-head/lib32
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20200906 (experimental) (GCC)

[Bug c++/96959] New: GCC allows ill-formed explicit capture of requires-expression local parameter

2020-09-07 Thread hstong at ca dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96959

Bug ID: 96959
   Summary: GCC allows ill-formed explicit capture of
requires-expression local parameter
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: accepts-invalid
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hstong at ca dot ibm.com
  Target Milestone: ---

A local parameter of a requires-expression is not a local entity (the name is
not introduced into a block scope and is, therefore, not a variable with
automatic storage duration). It is required that a simple-capture names a local
entity.

GCC appears to accept simple-captures that name local parameters of
requires-expressions.

Note: Even the IFNDR interpretation of this should not consider the concept to
be satisfied.

### SOURCE ():
template  concept C0 = true;

template 
concept C =
requires(T t) {
  [t] { };
};

static_assert(C);


### COMPILER INVOCATION:
g++ -fsyntax-only -std=c++20 -Wall -Wextra -pedantic-errors -xc++ -


### ACTUAL OUTPUT:
(clean compile)


### EXPECTED OUTPUT:
(error diagnostic)


### COMPILER VERSION INFO (g++ -v):
Using built-in specs.
COLLECT_GCC=/opt/wandbox/gcc-head/bin/g++
COLLECT_LTO_WRAPPER=/opt/wandbox/gcc-head/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../source/configure --prefix=/opt/wandbox/gcc-head
--enable-languages=c,c++ --disable-multilib --without-ppl --without-cloog-ppl
--enable-checking=release --disable-nls --enable-lto
LDFLAGS=-Wl,-rpath,/opt/wandbox/gcc-head/lib,-rpath,/opt/wandbox/gcc-head/lib64,-rpath,/opt/wandbox/gcc-head/lib32
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.0.0 20200906 (experimental) (GCC)

[Bug c++/96961] New: ICE default lambda as non-type template with default argument

2020-09-07 Thread bastien.penavayre at epitech dot eu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96961

Bug ID: 96961
   Summary: ICE default lambda as non-type template with default
argument
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bastien.penavayre at epitech dot eu
  Target Milestone: ---

version:

g++ (Compiler-Explorer-Build) 11.0.0 20200906 (experimental)

error:

In instantiation of 'constexpr int in_template() [with auto X = main()::{}]':
16:25:   required from here
9:13: internal compiler error: tree check: expected function_type or method_type, have integer_type in set_flags_from_callee, at cp/call.c:328
9 | return X();
  |~^~
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.

code:

#include 

using sl = std::experimental::source_location;
constexpr int line(sl s = sl::current()) { return s.line(); }

template
constexpr int in_template() { return X(); }

template class p;

int main() { sizeof(p); }

[Bug analyzer/96962] New: [11 Regression] ICE in gimple_call_arg, at gimple.h:3256

2020-09-07 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96962

Bug ID: 96962
   Summary: [11 Regression] ICE in gimple_call_arg, at
gimple.h:3256
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: analyzer
  Assignee: dmalcolm at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
  Target Milestone: ---
Target: x86_64-unknown-linux

gcc-11.0.0-alpha20200906 snapshot (g:23f8b90c401842afcbaa50a7fd3c2f37818f4396)
ICEs when compiling the following testcase w/ -mptwrite -fanalyzer:

void
t5 (unsigned long long b4)
{
  __builtin_ia32_ptwrite64 (b4);
}

% gcc-11.0.0 -mptwrite -fanalyzer -c hr2tho5k.c
during IPA pass: analyzer
hr2tho5k.c: In function 't5':
hr2tho5k.c:4:3: internal compiler error: in gimple_call_arg, at gimple.h:3256
4 |   __builtin_ia32_ptwrite64 (b4);
  |   ^
0x723b86 gimple_call_arg
    /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/gimple.h:3256
0x723c82 gimple_call_arg
    /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/tree.h:3295
0x723c82 ana::call_details::get_arg_tree(unsigned int) const
    /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/analyzer/region-model-impl-calls.cc:103
0x723c82 ana::call_details::get_arg_svalue(unsigned int) const
    /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/analyzer/region-model-impl-calls.cc:111
0x723c82 ana::region_model::impl_call_memset(ana::call_details const&)
    /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/analyzer/region-model-impl-calls.cc:284
0x111c7fd ana::region_model::on_call_pre(gcall const*, ana::region_model_context*)
    /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/analyzer/region-model.cc:677
0x10fb2a4 ana::exploded_node::on_stmt(ana::exploded_graph&, ana::supernode const*, gimple const*, ana::program_state*) const
    /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/analyzer/engine.cc:1083
0x10fc10d ana::exploded_graph::process_node(ana::exploded_node*)
    /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/analyzer/engine.cc:2526
0x10fcbfa ana::exploded_graph::process_worklist()
    /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/analyzer/engine.cc:2341
0x10fed4f ana::impl_run_checkers(ana::logger*)
    /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/analyzer/engine.cc:4107
0x10ff97c ana::run_checkers()
    /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/analyzer/engine.cc:4175
0x10f42f8 execute
    /var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/analyzer/analyzer-pass.cc:84

[Bug rtl-optimization/96796] [9/10/11 Regression] aarch64: ICE during RTL pass: reload

2020-09-07 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96796

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:6001db79c477b03eacc7e7049560921fb54b7845

commit r11-3041-g6001db79c477b03eacc7e7049560921fb54b7845
Author: Richard Sandiford 
Date:   Mon Sep 7 20:15:36 2020 +0100

lra: Avoid cycling on certain subreg reloads [PR96796]

This PR is about LRA cycling for a reload of the form:

   

Changing pseudo 196 in operand 1 of insn 103 on equiv [r105:DI*0x8+r140:DI]
  Creating newreg=287, assigning class ALL_REGS to slow/invalid mem
r287
  Creating newreg=288, assigning class ALL_REGS to slow/invalid mem
r288
  103: r203:SI=r288:SI<<0x1+r196:DI#0
  REG_DEAD r196:DI
Inserting slow/invalid mem reload before:
  316: r287:DI=[r105:DI*0x8+r140:DI]
  317: r288:SI=r287:DI#0
   


The problem is with r287.  We rightly give it a broad starting class of
POINTER_AND_FP_REGS (reduced from ALL_REGS by preferred_reload_class).
However, we never make forward progress towards narrowing it down to
a specific choice of class (POINTER_REGS or FP_REGS).

I think in practice we rely on two things to narrow a reload pseudo's
class down to a specific choice:

(1) a restricted class is specified when the pseudo is created

This happens for input address reloads, where the class is taken
from the target's chosen base register class.  It also happens
for simple REG reloads, where the class is taken from the chosen
alternative's constraints.

(2) uses of the reload pseudo as a direct input operand

In this case get_reload_reg tries to reuse the existing register
and narrow its class, instead of creating a new reload pseudo.

However, neither occurs here.  As described above, r287 rightly
starts out with a wide choice of class, ultimately derived from
ALL_REGS, so we don't get (1).  And as the comments in the PR
explain, r287 is never used as an input reload, only the subreg is,
so we don't get (2):

   

 Choosing alt 13 in insn 317:  (0) r  (1) w {*movsi_aarch64}
  Creating newreg=291, assigning class FP_REGS to r291
  317: r288:SI=r291:SI
Inserting insn reload before:
  320: r291:SI=r287:DI#0
   


IMO, in this case we should rely on the reload in insn 316 to narrow
down the class of r287.  Currently we do:

   

 Choosing alt 7 in insn 316:  (0) r  (1) m {*movdi_aarch64}
  Creating newreg=289 from oldreg=287, assigning class GENERAL_REGS to
r289
  316: r289:DI=[r105:DI*0x8+r140:DI]
Inserting insn reload after:
  318: r287:DI=r289:DI
---

i.e. we create a new pseudo register r289 and give *that* pseudo
GENERAL_REGS instead.  This is because get_reload_reg only narrows
down the existing class for OP_IN and OP_INOUT, not OP_OUT.

But if we have a reload pseudo in a reload instruction and have chosen
a specific class for the reload pseudo, I think we should simply install
it for OP_OUT reloads too, if the class is a subset of the existing class.
We will need to pick such a register whatever happens (for r289 in the
example above).  And as explained in the PR, doing this actually avoids
an unnecessary move via the FP registers too.

The patch is quite aggressive in that it does this for all reload
pseudos in all reload instructions.  I wondered about reusing the
condition for a reload move in in_class_p:

  INSN_UID (curr_insn) >= new_insn_uid_start
  && curr_insn_set != NULL
  && ((OBJECT_P (SET_SRC (curr_insn_set))
       && ! CONSTANT_P (SET_SRC (curr_insn_set)))
      || (GET_CODE (SET_SRC (curr_insn_set)) == SUBREG
          && OBJECT_P (SUBREG_REG (SET_SRC (curr_insn_set)))
          && ! CONSTANT_P (SUBREG_REG (SET_SRC (curr_insn_set)))))

but I can't really justify that on first principles.  I think we
should apply the rule consistently until we have a specific reason
for doing otherwise.
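A toy model of the narrowing rule, sketched under assumptions. This is not LRA code: register classes are modeled as hypothetical bitmasks rather than `enum reg_class`, and `narrow_reload_class` and the mask values are illustrative names. As in the r287 example, a wide class covering both general and FP registers narrows to the general-register subset once that class is chosen, while a class that is not a subset leaves the existing class untouched.

```c
#include <stdint.h>

/* Hypothetical register classes as bitmasks: bit r set means hard
   register r is allowed. Values are illustrative only. */
typedef uint64_t regclass_mask;
#define GP_MASK       ((regclass_mask) 0x00000000ffffffffULL)
#define FP_MASK       ((regclass_mask) 0xffffffff00000000ULL)
#define GP_OR_FP_MASK (GP_MASK | FP_MASK)

/* Install CHOSEN as the reload pseudo's class when it is a subset of
   CURRENT (applied to OP_OUT reloads as well); otherwise keep CURRENT,
   in which case a fresh reload pseudo would be needed instead. */
regclass_mask narrow_reload_class(regclass_mask current, regclass_mask chosen)
{
  return (chosen & ~current) == 0 ? chosen : current;
}
```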

gcc/
PR rtl-optimization/96796
* lra-constraints.c (in_class_p): Add a default-false
allow_all_reload_class_changes_p parameter.  Do not treat
reload moves specially when the parameter is true.
(get_reload_reg): Try to narrow the class of an existing OP_OUT
reload if we're reloading a reload pseudo in a reload instr

[Bug fortran/96711] Internal Compiler Error on NINT() Function

2020-09-07 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96711

--- Comment #15 from CVS Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:9164caf25cb210ad0a69357b226e39913aff00d1

commit r11-3042-g9164caf25cb210ad0a69357b226e39913aff00d1
Author: Harald Anlauf 
Date:   Mon Sep 7 21:41:45 2020 +0200

PR fortran/96711 - ICE with NINT() for integer(16) result

When rounding a real to the nearest integer, temporarily convert the real
argument to a longer real kind when the result is of type/kind integer(16).

gcc/fortran/ChangeLog:

* trans-intrinsic.c (build_round_expr): Use temporary with
appropriate kind for conversion before rounding to nearest
integer when the result precision is 128 bits.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr96711.f90: New test.
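A C analogue of the idea, sketched under assumptions: GCC or Clang on a 64-bit target with `__int128`, and a `long double` wider than `double` (x87 80-bit on x86-64, binary128 on AArch64). `nint_i128` is hypothetical and not the gfortran implementation. It rounds halves away from zero, as Fortran NINT does, in the longer floating type first and then converts to the 128-bit integer. The wider significand keeps `x + 0.5` exact for every `double` input. On targets where `long double` is the same as `double`, the 0.49999999999999994 case would round up incorrectly.

```c
/* Hypothetical: NINT-style rounding (halves away from zero) of a double
   to a 128-bit integer, computed in long double. */
__int128 nint_i128(double x)
{
  long double t = (long double) x;
  t = (t >= 0) ? t + 0.5L : t - 0.5L;
  return (__int128) t;   /* conversion truncates toward zero */
}
```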

[Bug target/96964] New: [nvptx] Implement __atomic_test_and_set

2020-09-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96964

Bug ID: 96964
   Summary: [nvptx] Implement __atomic_test_and_set
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Currently __atomic_test_and_set for nvptx falls back onto the "Failing all
else, assume a single threaded environment and simply perform the operation"
case in expand_atomic_test_and_set, so it doesn't map onto an actual atomic
operation.

So, for test-case test.c:
...
int a;

int
main (void)
{
  int res = __atomic_test_and_set (&a, __ATOMIC_SEQ_CST);
  return res;
}
...
we get:
...
$ gcc test.c -S -o-
// BEGIN PREAMBLE
.version 3.1
.target sm_30
.address_size 64
// END PREAMBLE


// BEGIN GLOBAL FUNCTION DECL: main
.visible .func (.param.u32 %value_out) main (.param.u32 %in_ar0, .param.u64 %in_ar1);

// BEGIN GLOBAL FUNCTION DEF: main
.visible .func (.param.u32 %value_out) main (.param.u32 %in_ar0, .param.u64 %in_ar1)
{
.reg.u32 %value;
.local .align 16 .b8 %frame_ar[16];
.reg.u64 %frame;
cvta.local.u64 %frame, %frame_ar;
.reg.u32 %r22;
.reg.u32 %r23;
.reg.u32 %r24;
.reg.u32 %r25;
.reg.u32 %r26;
ld.global.u8 %r25, [a];
mov.u32 %r26, 1;
st.global.u8 [a], %r26;
cvt.u32.u8 %r22, %r25;
st.u32 [%frame], %r22;
ld.u32 %r23, [%frame];
mov.u32 %r24, %r23;
mov.u32 %value, %r24;
st.param.u32 [%value_out], %value;
ret;
}


// BEGIN GLOBAL VAR DEF: a
.visible .global .align 4 .u32 a[1];
...
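For reference, here are the semantics a real atomic implementation has to provide, sketched as ordinary host C on a target where `__atomic_test_and_set` is lock-free. This is not nvptx code, and `spin_lock`, `spin_unlock` and `try_lock` are hypothetical names. The builtin atomically sets the byte and returns whether it was already set, so two threads can never both observe `false`. The plain `ld.global.u8`/`st.global.u8` pair in the PTX above gives no such guarantee.

```c
#include <stdbool.h>

/* Minimal test-and-set spinlock built on the GCC __atomic builtins. */
typedef struct { bool flag; } spinlock;

void spin_lock(spinlock *l)
{
  /* Spin until this thread is the one that changed flag from clear to set. */
  while (__atomic_test_and_set(&l->flag, __ATOMIC_ACQUIRE))
    ;
}

void spin_unlock(spinlock *l)
{
  __atomic_clear(&l->flag, __ATOMIC_RELEASE);
}

/* Returns true if the lock was free and is now held by the caller. */
bool try_lock(spinlock *l)
{
  return !__atomic_test_and_set(&l->flag, __ATOMIC_ACQUIRE);
}
```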

[Bug tree-optimization/96963] New: -Wstringop-overflow false positive on -O3 or -O2 -ftree-vectorize when assigning consecutive char struct members

2020-09-07 Thread gcc_bugzilla at venus dot thegavinli.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96963

Bug ID: 96963
   Summary: -Wstringop-overflow false positive on -O3 or -O2
-ftree-vectorize when assigning consecutive char
struct members
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gcc_bugzilla at venus dot thegavinli.com
  Target Milestone: ---

Created attachment 49194
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49194&action=edit
Test case that triggers bug

Versions affected: 10.2.0 and git master (commit 703bc188f4)

System type: both x86_64-pc-linux-gnu and arm-none-eabi affected

Compilation flags: I tested a couple of compilers with the same result:
1) Arch Linux toolchain, configured with: /build/gcc/src/gcc/configure
--prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --with-isl
--with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit
--enable-cet=auto --enable-checking=release --enable-clocale=gnu
--enable-default-pie --enable-default-ssp --enable-gnu-indirect-function
--enable-gnu-unique-object --enable-install-libiberty --enable-linker-build-id
--enable-lto --enable-multilib --enable-plugin --enable-shared
--enable-threads=posix --disable-libssp --disable-libstdcxx-pch
--disable-libunwind-exceptions --disable-werror
gdc_include_dir=/usr/include/dlang/gdc
2) Built from git, configured with: ./configure --prefix=/tmp/gcc-master
--disable-multilib

Command lines that trigger bug:
$ gcc sample.c -c -O3
$ gcc sample.c -c -O2 -ftree-vectorize

Code that triggers bug:

struct foo {
    int i;
    char a;
    char b;
};

void heh(char *);

void clr(struct foo *f) {
    heh(&f->a);
    f->a = 0;
    f->b = 0;
}

Output of compiler:

sample.c: In function ‘clr’:
sample.c:11:10: warning: writing 2 bytes into a region of size 1
[-Wstringop-overflow=]
   11 | f->a = 0;
  | ~^~~
sample.c:3:10: note: at offset 0 to object ‘a’ with size 1 declared here
3 | char a;
  |  ^

-fdump-tree-optimized:

;; Function clr (clr, funcdef_no=0, decl_uid=1936, cgraph_uid=1,
symbol_order=0)

clr (struct foo * f)
{
  char * _1;

   [local count: 1073741824]:
  _1 = &f_2(D)->a;
  heh (_1);
  MEM  [(char *)f_2(D) + 4B] = { 0, 0 };
  return;

}
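To see why the warning is a false positive, here is the merged store from the dump written out by hand. This is a minimal sketch that assumes only that `b` directly follows `a` (no padding between two `char` members, as on the reported targets); the helper names `clr_merged` and `a_b_adjacent` are hypothetical. Deriving the address from the struct pointer, as the dump's `(char *)f_2(D) + 4B` does, shows that one 2-byte store at the offset of `a` writes only `a` and `b`, both inside `*f`.

```c
#include <stddef.h>
#include <string.h>

struct foo {
    int i;
    char a;
    char b;
};

/* Hand-written equivalent of the vectorized store: a single 2-byte write
   of zeros starting at the offset of a. The address is computed from the
   struct pointer, so the write covers a and b and stays inside *f. */
void clr_merged(struct foo *f)
{
    static const char zeros[2] = { 0, 0 };
    memcpy((char *)f + offsetof(struct foo, a), zeros, sizeof zeros);
}

/* Nonzero when b immediately follows a, which is what makes merging the
   two 1-byte stores into one 2-byte store valid. */
int a_b_adjacent(void)
{
    return offsetof(struct foo, b) == offsetof(struct foo, a) + 1;
}
```

The `i` member is untouched by `clr_merged`, which is the observable behavior the original two assignments have.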

[Bug target/96964] [nvptx] Implement __atomic_test_and_set

2020-09-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96964

--- Comment #1 from Tom de Vries  ---
This is an attempt to implement it by using a fallback in libatomic (see also
PR96898):
...
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4168190fa42..612240661f8 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -54,6 +54,7 @@
UNSPECV_LOCK
UNSPECV_CAS
UNSPECV_XCHG
+   UNSPECV_TAS
UNSPECV_BARSYNC
UNSPECV_MEMBAR
UNSPECV_MEMBAR_CTA
@@ -1667,6 +1668,35 @@
   "%.\\tatom%A1.b%T0.\\t%0, %1, %2;"
   [(set_attr "atomic" "true")])

+(define_insn "atomic_test_and_set"
+  [(set (match_operand:QI 0 "nvptx_register_operand" "=R")
+(unspec_volatile:QI
+  [(match_operand:QI 1 "memory_operand" "+m")
+  (match_operand:SI 2 "const_int_operand") ;; model
+ ]
+  UNSPECV_TAS))
+   (set (match_dup 1)
+(unspec_volatile:QI [(match_dup 1)] UNSPECV_TAS))]
+  ""
+  { operands[1] = XEXP (operands[1], 0);
+return
+  "// BEGIN GLOBAL FUNCTION DECL: __atomic_test_and_set_1\n"
+  ".extern .func (.param .u32 %%value_out)"
+  " __atomic_test_and_set_1 (.param .u64 %%in_ar0, .param .u32 %%in_ar1);\n"
+  "{\n"
+  " .param .u32 %%value_in;\n"
+  " .param .u64 %%out_arg1;\n"
+  " .reg.u64 %%ptr;\n"
+  " cvta.global.u64 %%ptr, %1;\n"
+  " st.param.u64 [%%out_arg1],%%ptr;\n"
+  " .param .u32 %%out_arg2;\n"
+  " st.param.u32 [%%out_arg2],%2;\n"
+  " call (%%value_in),__atomic_test_and_set_1,(%%out_arg1,%%out_arg2);\n"
+  " ld.param.u32 %0,[%%value_in];\n"
+  "}";
+  }
+[(set_attr "atomic" "true")])
+
 (define_insn "nvptx_barsync"
   [(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri")
 (match_operand:SI 1 "const_int_operand")]
...

Funnily enough, doing this has the side-effect that the fallback
__atomic_test_and_set_1 in libatomic is fixed.
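For reference, the semantics being implemented: `__atomic_test_and_set` sets the byte at its argument to a nonzero "set" value and returns whether it was already set, and `__atomic_clear` resets it. The minimal spinlock below is a hypothetical illustration (the `tas_lock` type and its functions are not from the patch). With a change like the one above, each `__atomic_test_and_set` would presumably be lowered on nvptx to a call to `__atomic_test_and_set_1`; on other hosts it compiles to native atomics.

```c
#include <stdbool.h>

/* Hypothetical test-and-set spinlock, only to illustrate the builtins. */
typedef struct {
    bool locked;
} tas_lock;

/* Returns true if this call took the lock: the builtin returns the
   previous state, so false means the flag was clear before. */
bool tas_try_acquire(tas_lock *l)
{
    return !__atomic_test_and_set(&l->locked, __ATOMIC_ACQUIRE);
}

void tas_acquire(tas_lock *l)
{
    while (__atomic_test_and_set(&l->locked, __ATOMIC_ACQUIRE))
        ; /* spin until the current holder clears the flag */
}

void tas_release(tas_lock *l)
{
    __atomic_clear(&l->locked, __ATOMIC_RELEASE);
}
```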

[Bug libstdc++/96942] std::pmr::monotonic_buffer_resource causes CPU cache misses

2020-09-07 Thread dmitriy.ovdienko at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

--- Comment #7 from Dmitriy Ovdienko  ---
Following are CPU counters for single threaded code. Pre-allocation is enabled.
Memory pool is created inside the loop.

```cpp
int poolSize(int depth)
{
    return (1 << (depth + 1)) * sizeof(Node);
}

int count = 0;
for (int i = 0; i < 20; ++i)
{
    MemoryPool store(poolSize(stretch_depth));

    Node *c = make(stretch_depth, store);
    count += c->check();
    store.release();
}
```

Depth = 21, pool size = 134,217,728 bytes

| Counter           | PMR            | Malloc        |
|-------------------|---------------:|--------------:|
| cache-references  | 60,180,483     | 60,205,187    |
| cache-misses      | 50,288,765     | 50,426,418    |
| cycles            | 7,587,314,879  | 6,076,106,356 |
| instructions      | 14,347,088,112 | 8,138,591,245 |
| branches          | 2,224,641,671  | 1,550,701,277 |
| branch-misses     | 8,074,211      | 7,307,996     |
| faults            | 655,503        | 655,485       |
| migrations        | 1              | 2             |
| time elapsed (s)  | 2.16           | 1.75          |
| time, user (s)    | 1.46           | 1.01          |
| time, sys (s)     | 0.69           | 0.73          |

Depth = 18, pool size = 16,777,216 bytes

| Counter           | PMR           | Malloc      |
|-------------------|--------------:|------------:|
| cache-references  | 8,186,788     | 3,450,642   |
| cache-misses      | 6,504,691     | 1,592,945   |
| cycles            | 992,526,559   | 472,979,689 |
| instructions      | 1,806,230,679 | 766,527,818 |
| branches          | 279,352,274   | 151,353,530 |
| branch-misses     | 1,072,404     | 474,648     |
| faults            | 82,063        | 8,314       |
| migrations        | 0             | 0           |
| time elapsed (s)  | 0.28          | 0.14        |
| time, user (s)    | 0.17          | 0.13        |
| time, sys (s)     | 0.11          | 0.01        |

Depth = 17, pool size = 8,388,608 bytes

| Counter           | PMR         | Malloc      |
|-------------------|------------:|------------:|
| cache-references  | 1,624,992   | 1,707,061   |
| cache-misses      | 867,310     | 718,011     |
| cycles            | 312,687,116 | 255,951,365 |
| instructions      | 765,410,795 | 389,671,180 |
| branches          | 118,619,222 | 74,686,565  |
| branch-misses     | 272,286     | 263,916     |
| faults            | 4,221       | 4,219       |
| migrations        | 0           | 0           |
| time elapsed (s)  | 0.10        | 0.08        |
| time, user (s)    | 0.10        | 0.07        |
| time, sys (s)     | 0.00        | 0.01        |
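The reported pool sizes follow from `poolSize`: a full binary tree of depth d has 2^(d+1) - 1 nodes, rounded up here to 2^(d+1), times `sizeof(Node)`. A node size of 32 bytes (an assumption inferred from the numbers, not stated in the comment) reproduces all three sizes. Below is a sketch of the same arithmetic in `size_t`; the name `pool_bytes` is hypothetical. Unlike the `int`-returning `poolSize`, it keeps working once the product no longer fits in an `int` (depth 25 with 32-byte nodes gives 2^31 bytes).

```c
#include <stddef.h>

/* Pool bytes for a full binary tree of the given depth: 2^(depth+1) node
   slots (one more than the 2^(depth+1) - 1 nodes actually needed) times
   node_size. Computed entirely in size_t, so no int truncation. */
size_t pool_bytes(unsigned depth, size_t node_size)
{
    return ((size_t)1 << (depth + 1)) * node_size;
}
```

With `node_size = 32`, depths 17, 18 and 21 give 8,388,608, 16,777,216 and 134,217,728 bytes, matching the headings above.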

[Bug libstdc++/96942] std::pmr::monotonic_buffer_resource causes CPU cache misses

2020-09-07 Thread dmitriy.ovdienko at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942

--- Comment #8 from Dmitriy Ovdienko  ---
Same as above, for depth = 19:

| Counter           | PMR           | Malloc        |
|-------------------|--------------:|--------------:|
| cache-references  | 16,571,923    | 16,260,256    |
| cache-misses      | 13,576,560    | 13,197,813    |
| cycles            | 2,000,406,391 | 1,566,192,030 |
| instructions      | 3,566,826,120 | 2,021,390,552 |
| branches          | 554,315,206   | 386,308,307   |
| branch-misses     | 1,832,371     | 1,790,865     |
| faults            | 163,983       | 163,966       |
| migrations        | 0             | 0             |
| time elapsed (s)  | 0.58          | 0.45          |
| time, user (s)    | 0.40          | 0.24          |
| time, sys (s)     | 0.17          | 0.21          |

[Bug analyzer/96962] [11 Regression] ICE in gimple_call_arg, at gimple.h:3256

2020-09-07 Thread dmalcolm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96962

David Malcolm  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2020-09-07
 Ever confirmed|0   |1

--- Comment #1 from David Malcolm  ---
Thanks for filing this; confirmed.

[Bug analyzer/96950] ICE in apply_ctor_val_to_range, at analyzer/store.cc:475

2020-09-07 Thread dmalcolm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96950

David Malcolm  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2020-09-07

--- Comment #1 from David Malcolm  ---
Thanks for filing this, confirmed (on x86_64 fwiw).

[Bug rtl-optimization/96965] New: combine RMW and flags

2020-09-07 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96965

Bug ID: 96965
   Summary: combine RMW and flags
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: segher at gcc dot gnu.org
  Reporter: aoliva at gcc dot gnu.org
  Target Milestone: ---

Consider:

typedef unsigned char T;
T i[2];
int f() {
  T *p = &i[0], *q = &i[1];
  T c = __builtin_add_overflow(*p, 1, p);
  *q += c;
}

The desired code sequence on x86_64 is:

  addb $1, i(%rip)
  adcb $0, i+1(%rip)

Instead of the desired memory-destination addb, we get separate load, addb, and
store instructions.  There are two reasons why combine does not merge them into
the addb:

- When we try_combine the three of them, the flag-store insn is still present
between M (add) and W (store), so can_combine_p fails.  After we combine the
flag-store into adcb, we do not retry.

- If I manually force the retry, we break up the M parallel insn into a naked
add in i2 and a flag-setting non-canonical compare in i0.  We substitute R and
M into W, yielding an add without flag-setting.  Finally, we deal with
added_sets, building a new parallel to hold the RMW add and appending the
flag-setter as the second item, after the combined add.  Alas, recog won't
match them in this order: *add3_cc_overflow_1 requires the flag-setter before
the reg-setter.

Here's the discussion, with combine dumps, of a slightly different testcase that
triggers the same combine behavior:

https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553242.html
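Read as data, the testcase above increments a two-byte little-endian counter stored in `i[0]` (low byte) and `i[1]` (high byte), which is exactly what the desired `addb $1` / `adcb $0` pair computes. Here is a minimal restatement on an explicit array, useful for checking that any combined sequence preserves this behavior; the name `inc_pair` is hypothetical.

```c
typedef unsigned char T;

/* Same computation as f() in the report: add 1 to v[0] and propagate the
   carry (overflow of the low byte) into v[1]. Viewed as one little-endian
   16-bit counter, this is an increment. */
void inc_pair(T v[2])
{
    T c = __builtin_add_overflow(v[0], 1, &v[0]);
    v[1] += c;
}
```

For example, `{0xff, 0x01}` (the 16-bit value 0x01ff) becomes `{0x00, 0x02}` (0x0200), while `{5, 7}` becomes `{6, 7}` with no carry.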

[Bug analyzer/96949] ICE in get_bit_offset, at analyzer/analyzer.h:164

2020-09-07 Thread dmalcolm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96949

David Malcolm  changed:

   What|Removed |Added

   Last reconfirmed||2020-09-07
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from David Malcolm  ---
Thanks for filing this.  Confirmed (on x86_64)

[Bug rtl-optimization/96965] combine RMW and flags

2020-09-07 Thread aoliva at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96965

--- Comment #1 from Alexandre Oliva  ---
One nit: I wrote the flag-setting non-canonical compare ended up in i0, but it
actually becomes i1, with the original i1 (R) moved to i0.

[Bug target/96898] [nvptx] libatomic support

2020-09-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

--- Comment #6 from Tom de Vries  ---
Created attachment 49195
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49195&action=edit
Tentative patch

Introduces an option -fatomic-libcalls (analogous to -fsync-libcalls) such that
__atomic_test_and_set maps onto libatomic function __atomic_test_and_set_1.
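If a target only offers word-sized atomic compare-and-swap (an assumption made here for illustration), one common way to provide a byte-wide test-and-set is a CAS loop on the aligned 32-bit word that contains the byte. The sketch below shows that technique. It is not libatomic's actual code, and `emulated_test_and_set_1` is a hypothetical name. It assumes little-endian byte order and that the whole containing word may be accessed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical byte-wide test-and-set built from a 32-bit CAS. */
bool emulated_test_and_set_1(unsigned char *p)
{
    uintptr_t addr = (uintptr_t)p;
    uint32_t *word = (uint32_t *)(addr & ~(uintptr_t)3); /* aligned word */
    unsigned shift = (unsigned)(addr & 3) * 8;           /* little-endian */
    uint32_t mask = (uint32_t)0xff << shift;
    uint32_t old = __atomic_load_n(word, __ATOMIC_RELAXED);
    uint32_t desired;

    do {
        /* Set the target byte to 1 and leave its neighbours untouched.
           On failure, the CAS reloads the current word into old. */
        desired = (old & ~mask) | ((uint32_t)1 << shift);
    } while (!__atomic_compare_exchange_n(word, &old, desired, false,
                                          __ATOMIC_SEQ_CST, __ATOMIC_RELAXED));

    /* True iff the byte was already nonzero before this call. */
    return (old & mask) != 0;
}
```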

[Bug target/96898] [nvptx] libatomic support

2020-09-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

--- Comment #7 from Tom de Vries  ---
(In reply to Tom de Vries from comment #6)
> Created attachment 49195 [details]
> Tentative patch
> 
> Introduces an option -fatomic-libcalls (analogous to -fsync-libcalls) such
> that __atomic_test_and_set maps onto libatomic function
> __atomic_test_and_set_1.

I've now achieved the same in the target, not relying on a new option:
...
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4168190fa42..6178e6a0f77 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1667,6 +1667,22 @@
   "%.\\tatom%A1.b%T0.\\t%0, %1, %2;"
   [(set_attr "atomic" "true")])

+(define_expand "atomic_test_and_set"
+  [(match_operand:SI 0 "nvptx_register_operand")   ;; output
+   (match_operand:QI 1 "memory_operand")   ;; memory
+   (match_operand:SI 2 "const_int_operand")]   ;; model
+  ""
+{
+  rtx libfunc;
+  rtx addr;
+  libfunc = init_one_libfunc ("__atomic_test_and_set_1");
+  addr = convert_memory_address (ptr_mode, XEXP (operands[1], 0));
+  emit_library_call_value (libfunc, operands[0], LCT_NORMAL, SImode,
+ addr, ptr_mode,
+ operands[2], SImode);
+  DONE;
+})
+
 (define_insn "nvptx_barsync"
   [(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri")
 (match_operand:SI 1 "const_int_operand")]
...

[Bug tree-optimization/96963] -Wstringop-overflow false positive on -O3 or -O2 -ftree-vectorize when assigning consecutive char struct members

2020-09-07 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96963

Martin Sebor  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Keywords||diagnostic
   Last reconfirmed||2020-09-07
 Ever confirmed|0   |1
 Blocks||88443
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=93200
 CC||msebor at gcc dot gnu.org

--- Comment #1 from Martin Sebor  ---
Confirmed.  The vectorizer replaces the two character assignments with a single
two-byte store into f->a.  The fix for pr93200 added a hack to handle some of
these cases, but not this one.  This instance of the warning is issued from the
strlen pass, whose dump shows the cause of the problem:

$ gcc -O3 -S -Wall -fdump-tree-strlen=/dev/stdout pr96963.c

;; Function clr (clr, funcdef_no=0, decl_uid=1937, cgraph_uid=1,
symbol_order=0)

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2
;; 2 succs { 1 }
pr96963.c: In function ‘clr’:
pr96963.c:11:14: warning: writing 2 bytes into a region of size 1
[-Wstringop-overflow=]
   11 | f->a = 0;
  | ~^~~
pr96963.c:3:14: note: at offset 0 to object ‘a’ with size 1 declared here
3 | char a;
  |  ^
clr (struct foo * f)
{
  vector(2) char * vectp.4;
  vector(2) char * vectp_f.3;
  char * _1;

   [local count: 1073741824]:
  _1 = &f_2(D)->a;
  heh (_1);
  MEM  [(char *)_1] = { 0, 0 };   <<< _1 points to f_2(D)->a
  return;
}


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88443
[Bug 88443] [meta-bug] bogus/missing -Wstringop-overflow warnings

[Bug tree-optimization/96966] New: redundant memcpy not eliminated after pointer subtraction

2020-09-07 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96966

Bug ID: 96966
   Summary: redundant memcpy not eliminated after pointer
subtraction
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: msebor at gcc dot gnu.org
  Target Milestone: ---

In each of the functions below, the second call (to memcpy in f, to mempcpy in
g) is redundant and can be eliminated.  GCC doesn't notice the redundancy and
emits both copies in each function.  Clang emits just one call to memcpy in f,
but it too fails to eliminate the redundant call to mempcpy in g.

$ cat z.c && gcc -O2 -S -o /dev/stdout z.c
extern char a[32];

void f (const void *s)
{
  char *p = (char*)__builtin_memcpy (a, s, 16) + 16;
  __builtin_memcpy (p - 16, s, 16);
}


void g (const void *s)
{
  char *p = (char*)__builtin_mempcpy (a, s, 16);
  __builtin_mempcpy (p - 16, s, 16);
}
.file   "z.c"
.text
.p2align 4
.globl  f
.type   f, @function
f:
.LFB0:
.cfi_startproc
movdqu  (%rdi), %xmm0
movups  %xmm0, a(%rip)
movdqu  (%rdi), %xmm1
movups  %xmm1, a(%rip)
ret
.cfi_endproc
.LFE0:
.size   f, .-f
.p2align 4
.globl  g
.type   g, @function
g:
.LFB3:
.cfi_startproc
movdqu  (%rdi), %xmm0
movups  %xmm0, a(%rip)
movdqu  (%rdi), %xmm1
movups  %xmm1, a(%rip)
ret
.cfi_endproc
.LFE3:
.size   g, .-g
.ident  "GCC: (GNU) 11.0.0 20200902 (experimental)"
.section        .note.GNU-stack,"",@progbits
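Why the second copy in `f` is redundant can be stated in code. `memcpy` returns its destination, so `p - 16 == a`; and because `memcpy` requires non-overlapping buffers, the first copy cannot change `s`, so the second copy writes identical bytes. In this sketch, `a` is defined rather than `extern` so it links on its own, `memcpy` from `<string.h>` stands in for `__builtin_memcpy`, and `f_reduced` is a hypothetical name for the expected result.

```c
#include <string.h>

/* Defined here (the report declares it extern) so the sketch links. */
char a[32];

/* f from the report: memcpy returns a, so p == a + 16 and p - 16 == a. */
void f(const void *s)
{
    char *p = (char *)memcpy(a, s, 16) + 16;
    memcpy(p - 16, s, 16);  /* rewrites a[0..15] with the same bytes */
}

/* What f could be reduced to: one copy. s cannot overlap a (memcpy's
   precondition), so the first copy leaves s unchanged. */
void f_reduced(const void *s)
{
    memcpy(a, s, 16);
}
```

The same argument applies to `g`, since `mempcpy` returns the end of the destination, `a + 16`; it is left out because `mempcpy` is a GNU extension.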

[Bug tree-optimization/96966] [8/9/10/11 Regression] redundant memcpy not eliminated after pointer subtraction

2020-09-07 Thread msebor at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96966

Martin Sebor  changed:

   What|Removed |Added

Summary|redundant memcpy not|[8/9/10/11 Regression]
   |eliminated after pointer|redundant memcpy not
   |subtraction |eliminated after pointer
   ||subtraction
  Known to work||8.1.0, 8.2.0
  Known to fail||10.2.0, 11.0, 7.3.0, 8.3.0,
   ||9.2.0
   Keywords||missed-optimization

--- Comment #1 from Martin Sebor  ---
According to Godbolt, GCC 8.1 and 8.2 emit optimal code for both functions, but
GCC 8.3 emits the suboptimal code for f and has g jump to it.  Starting with
10.1, GCC emits the same suboptimal code for both functions.

[Bug tree-optimization/96967] New: [11 Regression] ICE in decompose, at wide-int.h:984

2020-09-07 Thread asolokha at gmx dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96967

Bug ID: 96967
   Summary: [11 Regression] ICE in decompose, at wide-int.h:984
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
  Target Milestone: ---

gcc-11.0.0-alpha20200906 snapshot (g:23f8b90c401842afcbaa50a7fd3c2f37818f4396)
ICEs when compiling the following testcase w/ -O2 -fshort-enums:

enum re {
  o3,
};

int
uj (int mq, enum re dn)
{
  enum re nr = mq;

  switch (nr)
    {
    case 4:
      if (dn == 0)
        goto wdev_inactive_unlock;
      break;

    default:
      break;
    }

  switch (nr)
    {
    case 0:
    case 4:
      return 0;

    default:
      break;
    }

 wdev_inactive_unlock:
  return 1;
}

% gcc-11.0.0 -O2 -fshort-enums -c cvihn9ij.c
during GIMPLE pass: vrp
cvihn9ij.c: In function 'uj':
cvihn9ij.c:6:1: internal compiler error: in decompose, at wide-int.h:984
6 | uj (int mq, enum re dn)
  | ^~
0x70e4c0 wi::int_traits >
>::decompose(long*, unsigned int, generic_wide_int > const&)
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/wide-int.h:984
0x1070b85 wi::int_traits >
>::decompose(long*, unsigned int, generic_wide_int > const&)
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/wide-int.h:1931
0x1070b85 wide_int_ref_storage::wide_int_ref_storage > >(generic_wide_int > const&,
unsigned int)
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/wide-int.h:1034
0x1070b85 generic_wide_int
>::generic_wide_int >
>(generic_wide_int > const&, unsigned int)
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/wide-int.h:790
0x1070b85 bool wi::ltu_p >,
generic_wide_int >
>(generic_wide_int > const&,
generic_wide_int > const&)
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/wide-int.h:1935
0x106ed94 bool wi::lt_p >,
generic_wide_int >
>(generic_wide_int > const&,
generic_wide_int > const&, signop)
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/wide-int.h:1961
0x106ed94 irange::irange_intersect(irange const&)
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/value-range.cc:1701
0x106f049 irange::intersect(irange const*)
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/value-range.cc:1539
0x1029e02 find_case_label_range(gswitch*, irange const*)
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/tree-vrp.c:3831
0xf88a63 simplify_control_stmt_condition
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/tree-ssa-threadedge.c:554
0xf89840 thread_through_normal_block
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/tree-ssa-threadedge.c:1101
0xf8b50d thread_through_normal_block
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/tree-ssa-threadedge.c:1302
0xf8b50d thread_across_edge
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/tree-ssa-threadedge.c:1259
0xf8b7cf thread_outgoing_edges(basic_block_def*, gcond*, const_and_copies*,
avail_exprs_stack*, evrp_range_analyzer*, tree_node* (*)(gimple*, gimple*,
avail_exprs_stack*, basic_block_def*))
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/tree-ssa-threadedge.c:1463
0x1024748 vrp_dom_walker::after_dom_children(basic_block_def*)
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/tree-vrp.c:4286
0x16d8687 dom_walker::walk(basic_block_def*)
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/domwalk.c:352
0x10337f1 identify_jump_threads
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/tree-vrp.c:4341
0x10337f1 execute_vrp
   
/var/tmp/portage/sys-devel/gcc-11.0.0_alpha20200906/work/gcc-11-20200906/gcc/tree-vrp.c:4480

[Bug target/87767] Missing AVX512 memory broadcast for constant vector

2020-09-07 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

Hongtao.liu  changed:

   What|Removed |Added

 CC||vmakarov at redhat dot com

--- Comment #15 from Hongtao.liu  ---
(In reply to Jakub Jelinek from comment #12)
> What I mean is that we should try to simplify the md file, instead of adding
> hundreds of new *_bcst patterns.
> We have e.g.
> (define_insn "*3"
>   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
> (plusminus:VI_AVX2
>   (match_operand:VI_AVX2 1 "vector_operand" "0,v")
>   (match_operand:VI_AVX2 2 "vector_operand" "xBm,vm")))]
>   "TARGET_SSE2 && ix86_binary_operator_ok (, mode, operands)"
>   "@
>p\t{%2, %0|%0, %2}
>vp\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "isa" "noavx,avx")
>(set_attr "type" "sseiadd")
>(set_attr "prefix_data16" "1,*")
>(set_attr "prefix" "orig,vex")
>(set_attr "mode" "")])
> 
> (define_insn "*sub3_bcst"
>   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
> (minus:VI48_AVX512VL
>   (match_operand:VI48_AVX512VL 1 "register_operand" "v")
>   (vec_duplicate:VI48_AVX512VL
> (match_operand: 2 "memory_operand" "m"]
>   "TARGET_AVX512F && ix86_binary_operator_ok (MINUS, mode, operands)"
>   "vpsub\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "type" "sseiadd")
>(set_attr "prefix" "evex")
>(set_attr "mode" "")])
> 
> What I meant is we could have just:
> (define_insn "*3"
>   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
> (plusminus:VI_AVX2
>   (match_operand:VI_AVX2 1 "vector_bcst_operand" "0,v")
>   (match_operand:VI_AVX2 2 "vector_bcst_operand" "xBm,vBb")))]
>   "TARGET_SSE2 && ix86_binary_operator_ok (, mode, operands)"
>   "@
>p\t{%2, %0|%0, %2}
>vp\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "isa" "noavx,avx")
>(set_attr "type" "sseiadd")
>(set_attr "prefix_data16" "1,*")
>(set_attr "prefix" "orig,vex")
>(set_attr "mode" "")])
> where vector_bcst_operand is either vector_operand, or for TARGET_AVX512F
> a VEC_DUPLICATE of the right mode with a MEM inside of it with the element
> mode of the VEC_DUPLICATE mode, similarly Bb constraint is either m, or for
> TARGET_AVX512F also again the VEC_DUPLICATE with MEM inside of it, and that
> ix86_binary_operator_ok would treat a VEC_DUPLICATE wrapping MEM the same as
> MEM (in particular ensure one e.g. doesn't have one VEC_DUPLICATE and one
> MEM operand, or two VEC_DUPLICATE operands) and that the output code would
> handle emitting an operand with VEC_DUPLICATE of a MEM properly.
> Or perhaps the constraint there could be just for the broadcast and one
> could write vmBb.  Still, I think the predicate needs to be accurate, i.e.
> for some instructions we want e.g. vector_operand or TARGET_AVX512F and
> bcst_mem_operand,
> for others vector_operand or TARGET_AVX512VL and bcst_mem_operand etc.
> 
> Anyway, if we go down this route, might be best to handle just a couple of
> patterns, then ask for review and see what Kirill (or if Uros would be
> interested) think about it and only later convert more.

Hi Vladimir Makarov: 
  I saw you added DEFINE_SPECIAL_MEMORY_CONSTRAINT in PR69299. We are currently
encountering a problem similar to PR69299: we want to add a
special_memory_constraint for a broadcast memory operand (call it
bcst_mem_operand below), but bcst_mem_operand is not MEM_P; it looks like
(vec_duplicate:V4SF (mem:SF (reg:...))), so pass_reload can't handle this
constraint properly (it always assumes the operand is MEM_P). So the question
is: can we enhance the handling of special_memory_constraint so that it is not
restricted to MEM_P, but also accepts an operand containing a memory_operand
inside (i.e. bcst_mem_operand)?
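PR 87767 is about constant vectors with identical lanes. A minimal example of such an operand, using GCC's vector extension (the type `v4si` and function `add5` are hypothetical): adding a scalar to a vector applies it to every lane, so the second operand of the add is the constant vector {5, 5, 5, 5}. With AVX-512VL that constant could in principle be a 4-byte memory broadcast (`{1to4}`) rather than a full 16-byte constant; this sketch makes no claim about what any particular GCC version emits.

```c
/* 128-bit vector of four ints (GCC vector extension). */
typedef int v4si __attribute__((vector_size(16)));

/* The scalar 5 is broadcast to every lane, so this is x + {5, 5, 5, 5}. */
v4si add5(v4si x)
{
    return x + 5;
}
```

Inspecting the assembly for `add5` under flags such as `-O2 -mavx512vl` is one way to check whether the constant is loaded in full or broadcast.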

[Bug target/96933] rs6000: inefficient code for char/short vec CTOR

2020-09-07 Thread linkw at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96933

--- Comment #8 from Kewen Lin  ---
(In reply to Segher Boessenkool from comment #7)
> There are vmrglb and vrghb etc.?

But these only handle the low or high part separately. With mtvsrdd, both the
low and high parts (doublewords) already hold the values, and we don't have a
Vector Merge Even/Odd for char or short to merge them. For now I use an
artificial control vector for the merging; correct me if I'm missing something.

[Bug target/96968] New: aarch64 : ICE in vregs pass lowering __builtin_aarch64_get_fpcr

2020-09-07 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96968

Bug ID: 96968
   Summary: aarch64 : ICE in vregs pass lowering
__builtin_aarch64_get_fpcr
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: iains at gcc dot gnu.org
  Target Milestone: ---

Testcase from FX:

int main (void) {
  unsigned int fpcr;
  fpcr = __builtin_aarch64_get_fpcr ();
}

gcc11 / master r11-3035.

$ ./gcc/xgcc -Bgcc ~/fpcr-test-a.c -S
.../fpcr-test-a.c: In function ‘main’:
.../fpcr-test-a.c:7:1: error: unrecognizable insn:
7 | }
  | ^
(insn 5 2 6 2 (set (mem/c:SI (plus:DI (reg/f:DI 87 virtual-stack-vars)
(const_int -4 [0xfffc])) [1 fpcr+0 S4 A32])
(unspec_volatile:SI [
(const_int 0 [0])
] UNSPECV_GET_FPCR)) ".../fpcr-test-a.c":5:10 -1
 (nil))
during RTL pass: vregs
.../fpcr-test-a.c:7:1: internal compiler error: in extract_insn, at
recog.c:2294
0x61cf73 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
../../src/gcc/rtl-error.c:108
0x61cfa7 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../../src/gcc/rtl-error.c:116
0xbdd5ef extract_insn(rtx_insn*)
../../src/gcc/recog.c:2294
0x96407b instantiate_virtual_regs_in_insn
../../src/gcc/function.c:1607
0x96407b instantiate_virtual_regs
../../src/gcc/function.c:1977
0x96407b execute
../../src/gcc/function.c:2026

[Bug target/96968] aarch64 : ICE in vregs pass lowering __builtin_aarch64_get_fpcr

2020-09-07 Thread iains at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96968

Iain Sandoe  changed:

   What|Removed |Added

   Keywords||ice-on-valid-code
 Target||aarch64-linux-gnu,
   ||aarch64-darwin
 Ever confirmed|0   |1
   Last reconfirmed||2020-09-08
 Status|UNCONFIRMED |NEW

[Bug debug/93865] .debug_line with LTO refers to bogus file-names

2020-09-07 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93865

--- Comment #4 from rguenther at suse dot de  ---
On Mon, 7 Sep 2020, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93865
> 
> Jakub Jelinek  changed:
> 
>What|Removed |Added
> 
>  CC||jakub at gcc dot gnu.org
> 
> --- Comment #3 from Jakub Jelinek  ---
> I guess this isn't only about the main source files, but about any includes
> (if they are relative, not absolute).
> Perhaps when streaming out lto we should stream for each TU also the
> get_src_pwd () string, and in canon_file_name in lto-streamer-in.c take into
> account the src pwd read from the current TU vs. get_src_pwd () for the LTO
> link.
> If they are the same, don't do any changes, similarly for absolute paths no
> difference, otherwise canonicalize relative paths for the difference in the
> paths.

Something like that, but note that there's another "copy" of .debug_line
in the early debug data (though with the "correct" CWD).  Note that one
"natural" place to stream the CWD string is attached to the
TRANSLATION_UNIT_DECL, but then streaming the original location file and
intending to massage it later is probably awkward, which means the easiest fix
would certainly be to canonicalize / concatenate the CWD and the location file
at location stream-out time.  The question is whether we want to "undo" /
re-canonicalize any of that afterwards (and what "CWD" to use for the LTRANS
.debug_line).
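A minimal sketch of the "concatenate CWD and location file at stream-out time" option, assuming POSIX-style paths. `anchor_to_src_pwd` is a hypothetical helper, not GCC's `canon_file_name` or any existing streamer code. Relative names are prefixed with the directory the TU was compiled in, so they no longer depend on where the LTO link runs; absolute names pass through unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Writes an absolute form of file into out: relative names are joined to
   src_pwd, absolute names are copied as-is. No "." or ".." folding and no
   Windows drive-letter handling. Returns 0 on success, -1 if out is too
   small (or on an encoding error). */
int anchor_to_src_pwd(const char *src_pwd, const char *file,
                      char *out, size_t outsz)
{
    int n;

    if (file[0] == '/') {
        n = snprintf(out, outsz, "%s", file);
    } else {
        size_t len = strlen(src_pwd);
        /* Avoid a doubled slash when src_pwd already ends in '/'. */
        const char *sep = (len > 0 && src_pwd[len - 1] == '/') ? "" : "/";
        n = snprintf(out, outsz, "%s%s%s", src_pwd, sep, file);
    }
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```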

[Bug libstdc++/96958] Long Double in Hash Table policy forces soft-float calculations

2020-09-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96958

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization

--- Comment #2 from Richard Biener  ---
IMHO _any_ FP calculation in that spot is unwanted (but _M_max_load_factor is an
FP value?)

[Bug analyzer/96962] [11 Regression] ICE in gimple_call_arg, at gimple.h:3256

2020-09-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96962

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug tree-optimization/96963] -Wstringop-overflow false positive on -O3 or -O2 -ftree-vectorize when assigning consecutive char struct members

2020-09-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96963

--- Comment #2 from Richard Biener  ---
store-merging will also happily store a (short)0 there, but likely runs after
strlen.

[Bug target/96964] [nvptx] Implement __atomic_test_and_set

2020-09-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96964

--- Comment #2 from Tom de Vries  ---
Patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553393.html

[Bug target/96898] [nvptx] libatomic support

2020-09-07 Thread vries at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96898

--- Comment #8 from Tom de Vries  ---
Patch submitted:
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553393.html

[Bug target/96968] aarch64 : ICE in vregs pass lowering __builtin_aarch64_get_fpcr

2020-09-07 Thread fxcoudert at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96968

Francois-Xavier Coudert  changed:

   What|Removed |Added

 CC||fxcoudert at gcc dot gnu.org

--- Comment #1 from Francois-Xavier Coudert  ---
Exact same issue also occurs with __builtin_aarch64_get_fpsr().

With __builtin_aarch64_get_fpsr64() and __builtin_aarch64_get_fpcr64(), I get
"error: unrecognizable insn" at -O0, and "internal compiler error: Segmentation
fault" at -O1.

[Bug c++/96957] No name-lookup into base class when using an non dependent base class via template alias with dummy parameter.

2020-09-07 Thread anders.granlund.0 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96957

--- Comment #2 from Anders Granlund  ---
Correction to my first comment:

"GCC correctly compiles the program without emitting any error messages. See
the discussion in: https://bugs.llvm.org/show_bug.cgi?id=47435"

should be:

"Clang correctly rejects the program with an error message. See the
discussion in: https://bugs.llvm.org/show_bug.cgi?id=47435"

[Bug target/96955] Implement __builtin_thread_pointer for x86 TLS

2020-09-07 Thread crazylht at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96955

--- Comment #2 from Hongtao.liu  ---
Do we also need "__builtin_set_thread_pointer" ?

[Bug c++/96884] Missing diagnostics when applying the member operator on this in class template

2020-09-07 Thread anders.granlund.0 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96884

Anders Granlund  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Anders Granlund  ---
Yes, you are correct: this is not a bug. The standard allows both possibilities
in this case.