Re: [PATCH] PR tree-optimization/101403: Incorrect folding of ((T)bswap(x))>>C

2021-07-12 Thread Jakub Jelinek via Gcc-patches
On Sun, Jul 11, 2021 at 10:48:17AM +0100, Roger Sayle wrote:
> /* { dg-do run } */
> /* { dg-options "-O2" } */
> unsigned int foo (unsigned int a)
> {
>   unsigned int u;

Can you please change the above line to
  unsigned int u = 0;
or add some other initializer,
or make it
  static unsigned int u;
?
With all those the testcase is still miscompiled without your patch,
but it doesn't use an indeterminate value in the comma expression's
lhs operand.

>   unsigned short b = __builtin_bswap16 (a);
>   return b >> (u, 12);
> }
> 
> int main (void)
> {
>   unsigned int x = foo (0x80);
>   if (x != 0x0008)
>     __builtin_abort ();
>   return 0;
> }
> 


Jakub
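
For reference, the value the testcase expects can be checked by hand with a
small sketch.  ref_bswap16 and foo_ref are illustrative names, not part of
the patch.  The sketch assumes the "static unsigned int u;" variant suggested
above, so the left operand of the comma expression is never indeterminate;
the (void) cast is only there to silence an unused-value warning.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for __builtin_bswap16: swap the two bytes of X.  */
static uint16_t
ref_bswap16 (uint16_t x)
{
  return (uint16_t) ((x << 8) | (x >> 8));
}

/* Mirrors foo from the testcase, with the shift count as a parameter.  */
unsigned int
foo_ref (unsigned int a, unsigned int shift)
{
  static unsigned int u;   /* zero-initialized, so reading it is well defined */
  assert (shift < 16);
  unsigned short b = ref_bswap16 ((uint16_t) a);
  return b >> ((void) u, shift);
}
```

With a == 0x80 the byte swap gives 0x8000, and 0x8000 >> 12 is 0x8, which is
the value main compares against.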



Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-12 Thread Richard Sandiford via Gcc-patches
Martin Jambor  writes:
> On Thu, Jul 08 2021, Qing Zhao wrote:
>> (Resend this email since the previous one didn’t quote, I changed one
>> setting in my mail client, hopefully that can fix this issue).
>>
>> Hi, Martin,
>>
>> Thank you for the review and comment.
>>
>>> On Jul 8, 2021, at 8:29 AM, Martin Jambor  wrote:
 diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
 index c05d22f3e8f1..35051d7c6b96 100644
 --- a/gcc/tree-sra.c
 +++ b/gcc/tree-sra.c
 @@ -384,6 +384,13 @@ static struct
 
   /* Numbber of components created when splitting aggregate parameters.  */
   int param_reductions_created;
 +
 +  /* Number of deferred_init calls that are modified.  */
 +  int deferred_init;
 +
 +  /* Number of deferred_init calls that are created by
 + generate_subtree_deferred_init.  */
 +  int subtree_deferred_init;
 } sra_stats;
 
 static void
 @@ -4096,6 +4103,110 @@ get_repl_default_def_ssa_name (struct access 
 *racc, tree reg_type)
   return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
 }
 
 +
 +/* Generate statements to call .DEFERRED_INIT to initialize scalar 
 replacements
 +   of accesses within a subtree ACCESS; all its children, siblings and 
 their
 +   children are to be processed.
 +   GSI is a statement iterator used to place the new statements.  */
 +static void
 +generate_subtree_deferred_init (struct access *access,
 +  tree init_type,
 +  tree is_vla,
 +  gimple_stmt_iterator *gsi,
 +  location_t loc)
 +{
 +  do
 +{
 +  if (access->grp_to_be_replaced)
 +  {
 +tree repl = get_access_replacement (access);
 +gimple *call
 +  = gimple_build_call_internal (IFN_DEFERRED_INIT, 3,
 +TYPE_SIZE_UNIT (TREE_TYPE (repl)),
 +init_type, is_vla);
 +gimple_call_set_lhs (call, repl);
 +gsi_insert_before (gsi, call, GSI_SAME_STMT);
 +update_stmt (call);
 +gimple_set_location (call, loc);
 +sra_stats.subtree_deferred_init++;
 +  }
 +  else if (access->grp_to_be_debug_replaced)
 +  {
 +tree drepl = get_access_replacement (access);
 +tree call = build_call_expr_internal_loc
 +   (UNKNOWN_LOCATION, IFN_DEFERRED_INIT,
 +TREE_TYPE (drepl), 3,
 +TYPE_SIZE_UNIT (TREE_TYPE (drepl)),
 +init_type, is_vla);
 +gdebug *ds = gimple_build_debug_bind (drepl, call,
 +  gsi_stmt (*gsi));
 +gsi_insert_before (gsi, ds, GSI_SAME_STMT);
>>> 
>>> Is handling of grp_to_be_debug_replaced accesses necessary here?  If so,
>>> why?  grp_to_be_debug_replaced accesses are there only to facilitate
>>> debug information about a part of an aggregate decl that is likely
>>> going to be entirely removed - so that debuggers can sometimes show
>>> users information about what they would contain had they not been removed.
>>> It seems strange you need to mark them as uninitialized because they
>>> should not have any consumers.  (But perhaps it is also harmless.)
>>
>> This part has been discussed during the 2nd version of the patch, but
>> I think that more discussion might be necessary.
>>
>> In the previous discussion, Richard Sandiford mentioned:
>> (https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568620.html):
>>
>> =
>>
>> I guess the thing we need to decide here is whether -ftrivial-auto-var-init
>> should affect debug-only constructs too.  If it doesn't, examining removed
>> components in a debugger might show uninitialised values in cases where
>> the user was expecting initialised ones.  There would be no security
>> concern, but it might be surprising.
>>
>> I think in principle the DRHS can contain a call to DEFERRED_INIT.
>> Doing that would probably require further handling elsewhere though.
>>
>> =
>>
>> I am still not very confident now for this part of the change.
>
> I see.  I still tend to think that with or without the generation of
> gimple_build_debug_binds, the debugger would still not display any value
> for the component in question.  Without it there would be no information
> about the component at any place in code affected by this, with it the
> component would be explicitly uninitialized.  But OK.

FTR, I don't have a strong opinion here.  You know the code better
than I do, so if you think not generating debug binds is better then
let's do that.

Thanks,
Richard


Re: [PATCH] New hook adjust_iv_update_pos

2021-07-12 Thread Xionghu Luo via Gcc-patches



On 2021/7/7 21:20, Richard Biener wrote:
> On Tue, Jun 29, 2021 at 11:19 AM Xionghu Luo  wrote:
>>
>>
>>
>> On 2021/6/28 16:25, Richard Biener wrote:
>>> On Mon, Jun 28, 2021 at 10:07 AM Xionghu Luo  wrote:



 On 2021/6/25 18:02, Richard Biener wrote:
> On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo  wrote:
>>
>>
>>
>> On 2021/6/25 16:54, Richard Biener wrote:
>>> On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
>>>  wrote:

 From: Xiong Hu Luo 

 adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
 on Power.  For example, it generates mismatched address offset after
 adjust iv update statement position:

  [local count: 70988443]:
 _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
 ivtmp.30_415 = ivtmp.30_414 + 1;
 _34 = ref_180 + 18446744073709551615;
 _86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
 if (_84 == _86)
   goto ; [94.50%]
   else
   goto ; [5.50%]

 Disabling it produces:

  [local count: 70988443]:
 _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
 _86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
 ivtmp.30_415 = ivtmp.30_414 + 1;
 if (_84 == _86)
   goto ; [94.50%]
   else
   goto ; [5.50%]

 Then the later loop unroll pass could benefit from the same address offset
 with different base addresses, which reduces register dependency.
 This patch could improve performance by 10% for a typical case on Power;
 no performance change is observed for X86 or Aarch64 because small loops
 are not unrolled on these platforms.  Any comments?
>>>
>>> The case you quote is special in that if we hoisted the IV update before
>>> the other MEM _also_ used in the condition it would be fine again.
>>
>> Thanks.  I tried to hoist the IV update statement before the first MEM
>> (Fix 2); it shows even worse performance because the loop is not unrolled
>> (two more "base-1" are generated in gimple, so loop->ninsns is 11 and
>> small loops are not unrolled).  Changing the threshold from 10 to 12 in
>> rs6000_loop_unroll_adjust would make it also unroll 2 times; the
>> performance is then the SAME as with the IV update statement in the
>> *MIDDLE* (trunk).  From the ASM, we can see the index register %r4 is used
>> in two iterations, which may be a bottleneck for hiding instruction latency?
>>
>> So it seems reasonable that the performance would be better if we keep the
>> IV update statement at *LAST* (Fix 1).
>>
>> (Fix 2):
>>   [local count: 70988443]:
>>  ivtmp.30_415 = ivtmp.30_414 + 1;
>>  _34 = ip_229 + 18446744073709551615;
>>  _84 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
>>  _33 = ref_180 + 18446744073709551615;
>>  _86 = MEM[(uint8_t *)_33 + ivtmp.30_415 * 1];
>>  if (_84 == _86)
>>goto ; [94.50%]
>>  else
>>goto ; [5.50%]
>>
>>
>> .L67:
>>lbzx %r12,%r24,%r4
>>lbzx %r25,%r7,%r4
>>cmpw %cr0,%r12,%r25
>>bne %cr0,.L11
>>mr %r26,%r4
>>addi %r4,%r4,1
>>lbzx %r12,%r24,%r4
>>lbzx %r25,%r7,%r4
>>mr %r6,%r26
>>cmpw %cr0,%r12,%r25
>>bne %cr0,.L11
>>mr %r26,%r4
>> .L12:
>>cmpdi %cr0,%r10,1
>>addi %r4,%r26,1
>>mr %r6,%r26
>>addi %r10,%r10,-1
>>bne %cr0,.L67
>>
>>>
>>> Now, adjust_iv_update_pos doesn't seem to check that the
>>> condition actually uses the IV use stmt def, so it likely applies to
>>> too many cases.
>>>
>>> Unfortunately the introducing rev didn't come with a testcase,
>>> but still I think fixing up adjust_iv_update_pos is better than
>>> introducing a way to short-cut it per target decision.
>>>
>>> One "fix" might be to add a check that either the condition
>>> lhs or rhs is the def of the IV use and the other operand
>>> is invariant.  Or if it's of similar structure hoist across the
>>> other iv-use as well.  Not that I understand the argument
>>> about the overlapping life-range.
>>>
>>> You also don't provide a complete testcase ...
>>>
>>
>> Attached the test code; I will also add it to the patch in a future version.
>> The issue comes from a very small hot loop:
>>
>>do {
>>  len++;
>>} while(len < maxlen && ip[len] == ref[len]);
>
> unsigned int foo (unsigned char *ip, unsigned char *ref, unsigned int maxlen)
> {
>  unsigned int len = 2;
>  do
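
The "mismatched address offset" in the two gimple dumps quoted in this
message can be checked with plain unsigned arithmetic.  The sketch below
treats addresses as 64-bit integers (an assumption made for illustration);
the function names are hypothetical.  Adding 18446744073709551615 is the
same as subtracting 1 modulo 2^64, so both forms compute the same address;
they differ only in which base and which IV value the load depends on.

```c
#include <stdint.h>

/* Address of the second load when the IV update sits between the two
   loads: the base is pre-decremented and the bumped IV is used.  */
uint64_t
addr_iv_update_middle (uint64_t ref, uint64_t iv)
{
  uint64_t iv_next = iv + 1;                      /* ivtmp.30_415 */
  uint64_t base = ref + 18446744073709551615ULL;  /* _34, i.e. ref - 1 */
  return base + iv_next * 1;
}

/* Address of the same load when the IV update is placed last: the
   original base and the un-bumped IV are used.  */
uint64_t
addr_iv_update_last (uint64_t ref, uint64_t iv)
{
  return ref + iv * 1;
}
```

In the second form both loads share the offset ivtmp.30_414 and differ only
in their base, which is the property the message says the later unroll pass
benefits from.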

Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-12 Thread guojiufu via Gcc-patches

On 2021-07-12 14:20, Richard Biener wrote:

On Fri, 9 Jul 2021, Segher Boessenkool wrote:


On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
> I wonder if there's a way to query the target what modes the doloop
> pattern can handle (not being too familiar with the doloop code).

You can look what modes are allowed for operand 0 of doloop_end,
perhaps?  Although that is a define_expand, not a define_insn, so it is
hard to introspect.

> Why do you need to do any checks besides the new type being able to
> represent all IV values?  The original doloop IV will never wrap
> (OTOH if niter is U*_MAX then we compute niter + 1 which will become
> zero ... I suppose the doloop might still do the correct thing here
> but it also still will with a IV with larger type).


The issue comes from U*_MAX (original short MAX), as you said: on which
niter + 1 becomes zero.  And because the step for doloop is -1, 'zero - 1'
will be a very large number on the larger type (e.g. 0xff...ff), but on the
original short type 'zero - 1' is a small value (e.g. "0xff").



doloop_valid_p guarantees it is simple and doesn't wrap.

> I'd have expected sth like
>
>ntype = lang_hooks.types.type_for_mode (word_mode, TYPE_UNSIGNED
> (ntype));
>
> thus the decision made using a mode - which is also why I wonder
> if there's a way to query the target for this.  As you say,
> it _may_ be fast, so better check (somehow).



I was also thinking of using hooks like type_for_size/type_for_mode.
/* Using a word-size type may be faster.  */
if (TYPE_PRECISION (ntype) < BITS_PER_WORD
    && wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype))))
  {
    ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
    base = fold_convert (ntype, base);
  }

As you pointed out, this does not query the mode from targets.
As Segher pointed out, "doloop_end" checks for unsupported modes, but it
does not seem easy to use it in tree-ssa-loop-ivopts.c.
For implementations of doloop_end, targets like rs6000/aarch64/ia64 require
Pmode/DImode, while there are other targets that work on other modes
(e.g. SI).



In doloop_optimize, there is code:

```
mode = desc->mode;
.
doloop_reg = gen_reg_rtx (mode);
rtx_insn *doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label);

word_mode_size = GET_MODE_PRECISION (word_mode);
word_mode_max = (HOST_WIDE_INT_1U << (word_mode_size - 1) << 1) - 1;
if (! doloop_seq
    && mode != word_mode
    /* Before trying mode different from the one in that # of iterations is
       computed, we must be sure that the number of iterations fits into
       the new mode.  */
    && (word_mode_size >= GET_MODE_PRECISION (mode)
        || wi::leu_p (iterations_max, word_mode_max)))
  {
    if (word_mode_size > GET_MODE_PRECISION (mode))
      count = simplify_gen_unary (ZERO_EXTEND, word_mode, count, mode);
    else
      count = lowpart_subreg (word_mode, count, mode);
    PUT_MODE (doloop_reg, word_mode);
    doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label);
  }
if (! doloop_seq)
  {
    if (dump_file)
      fprintf (dump_file,
               "Doloop: Target unwilling to use doloop pattern!\n");
    return false;
  }
```
The above code first tries the mode of niter_desc by calling
targetm.gen_doloop_end to see if the target can generate doloop insns;
if that fails, it then tries 'word_mode' with gen_doloop_end.
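
As a rough illustration of the word_mode_max computation and the retry
condition quoted above, here is a standalone sketch.  It assumes a 64-bit
host integer; mode_max and can_retry_in_word_mode are hypothetical names,
and the "mode != word_mode" part of the real condition is left out.  The
double shift presumably exists because shifting a 64-bit value by 64 in one
step is undefined in C, while shifting by 63 and then by 1 is well defined
for unsigned operands.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones value for a mode that is BITS bits wide, 1 <= BITS <= 64.  */
uint64_t
mode_max (unsigned int bits)
{
  assert (bits >= 1 && bits <= 64);
  return ((UINT64_C (1) << (bits - 1)) << 1) - 1;
}

/* Nonzero if a count computed in a MODE_BITS-wide mode may be moved to a
   WORD_BITS-wide mode: either the word mode is at least as wide, or the
   maximum iteration count is known to fit into it.  */
int
can_retry_in_word_mode (unsigned int word_bits, unsigned int mode_bits,
                        uint64_t iterations_max)
{
  return word_bits >= mode_bits || iterations_max <= mode_max (word_bits);
}
```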




Almost all targets just use Pmode, but there is no such guarantee I
think, and esp. some targets that do not have machine insns for this
(but want to generate different code for this anyway) can do pretty much
anything.

Maybe using just Pmode here is good enough though?


I think Pmode is a particularly bad choice and I'd prefer word_mode
if we go for any hardcoded mode.  s390x for example seems to handle
both SImode and DImode (but names the helper gen_doloop_si64
for SImode?!).  But indeed it looks like somehow querying doloop_end
is going to be difficult since the expander doesn't have any mode,
so we'd have to actually try emit RTL here.


Instead of using a hardcoded mode, maybe we could add a hook for targets to
return the preferred mode.


Thanks for those valuable comments!

Jiufu Guo





Richard.


[PATCH] libgomp: Include early to avoid link failure with glibc 2.34

2021-07-12 Thread Florian Weimer via Gcc-patches
 is included indirectly in the #pragma GCC visibility hidden
block.  With glibc 2.34,  needs a declaration of the sysconf
function, and including it under hidden visibility turns other calls
to sysconf into hidden references, leading to a linker failure.

libgomp/ChangeLog:

* libgomp.h: Include .

---
 libgomp/libgomp.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 8d25dc8e2a8..1fe209429d1 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -46,6 +46,7 @@
 #include "libgomp-plugin.h"
 #include "gomp-constants.h"
 
+#include 
 #ifdef HAVE_PTHREAD_H
 #include <pthread.h>
 #endif



Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-12 Thread Richard Biener
On Mon, 12 Jul 2021, guojiufu wrote:

> On 2021-07-12 14:20, Richard Biener wrote:
> > On Fri, 9 Jul 2021, Segher Boessenkool wrote:
> > 
> >> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
> >> > I wonder if there's a way to query the target what modes the doloop
> >> > pattern can handle (not being too familiar with the doloop code).
> >> 
> >> You can look what modes are allowed for operand 0 of doloop_end,
> >> perhaps?  Although that is a define_expand, not a define_insn, so it is
> >> hard to introspect.
> >> 
> >> > Why do you need to do any checks besides the new type being able to
> >> > represent all IV values?  The original doloop IV will never wrap
> >> > (OTOH if niter is U*_MAX then we compute niter + 1 which will become
> >> > zero ... I suppose the doloop might still do the correct thing here
> >> > but it also still will with a IV with larger type).
> 
> The issue comes from U*_MAX (original short MAX), as you said: on which
> niter + 1 becomes zero.  And because the step for doloop is -1, 'zero - 1'
> will be a very large number on the larger type (e.g. 0xff...ff), but on the
> original short type 'zero - 1' is a small value (e.g. "0xff").

But for the larger type the small type MAX + 1 fits and does not yield
zero so it should still work exactly as before, no?  Of course you
have to compute the + 1 in the larger type.
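
A small sketch of the point about where the "+ 1" is computed, using an
8-bit type as the "short" type and a 64-bit type as the wider one (both
choices, and all function names, are assumptions made for illustration).
The doloop is modelled as "do body; while (--count != 0)".

```c
#include <stdint.h>

/* niter + 1 computed in the original 8-bit type: 255 wraps to 0.  */
uint8_t
count_in_narrow_type (uint8_t niter)
{
  return (uint8_t) (niter + 1);
}

/* niter + 1 computed after widening: 255 becomes 256, no wrap.  */
uint64_t
count_in_wide_type (uint8_t niter)
{
  return (uint64_t) niter + 1;
}

/* Number of trips of the doloop with an 8-bit counter.  A start value of
   0 decrements to 255 first, so it still gives 256 trips.  */
uint64_t
trips_narrow (uint8_t count)
{
  uint64_t trips = 0;
  do
    trips++;
  while (--count != 0);
  return trips;
}

/* Same loop with a 64-bit counter.  Do not call this with 0: the counter
   would decrement to 0xff...ff and the loop would run 2^64 times.  */
uint64_t
trips_wide (uint64_t count)
{
  uint64_t trips = 0;
  do
    trips++;
  while (--count != 0);
  return trips;
}
```

For niter == 255 both variants run 256 times, provided the wide counter
starts from the widened sum (256) and not from the wrapped narrow value (0).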

> >> 
> >> doloop_valid_p guarantees it is simple and doesn't wrap.
> >> 
> >> > I'd have expected sth like
> >> >
> >> >ntype = lang_hooks.types.type_for_mode (word_mode, TYPE_UNSIGNED
> >> > (ntype));
> >> >
> >> > thus the decision made using a mode - which is also why I wonder
> >> > if there's a way to query the target for this.  As you say,
> >> > it _may_ be fast, so better check (somehow).
> 
> 
> I was also thinking of using hooks like type_for_size/type_for_mode.
> /* Using a word-size type may be faster.  */
> if (TYPE_PRECISION (ntype) < BITS_PER_WORD
>     && wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype))))
>   {
>     ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
>     base = fold_convert (ntype, base);
>   }
> 
> As you pointed out, this does not query the mode from targets.
> As Segher pointed out, "doloop_end" checks for unsupported modes, but it
> does not seem easy to use it in tree-ssa-loop-ivopts.c.
> For implementations of doloop_end, targets like rs6000/aarch64/ia64 require
> Pmode/DImode, while there are other targets that work on other modes
> (e.g. SI).
> 
> 
> In doloop_optimize, there is code:
> 
> ```
> mode = desc->mode;
> .
> doloop_reg = gen_reg_rtx (mode);
> rtx_insn *doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label);
>
> word_mode_size = GET_MODE_PRECISION (word_mode);
> word_mode_max = (HOST_WIDE_INT_1U << (word_mode_size - 1) << 1) - 1;
> if (! doloop_seq
>     && mode != word_mode
>     /* Before trying mode different from the one in that # of iterations is
>        computed, we must be sure that the number of iterations fits into
>        the new mode.  */
>     && (word_mode_size >= GET_MODE_PRECISION (mode)
>         || wi::leu_p (iterations_max, word_mode_max)))
>   {
>     if (word_mode_size > GET_MODE_PRECISION (mode))
>       count = simplify_gen_unary (ZERO_EXTEND, word_mode, count, mode);
>     else
>       count = lowpart_subreg (word_mode, count, mode);
>     PUT_MODE (doloop_reg, word_mode);
>     doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label);
>   }
> if (! doloop_seq)
>   {
>     if (dump_file)
>       fprintf (dump_file,
>                "Doloop: Target unwilling to use doloop pattern!\n");
>     return false;
>   }
> ```
> The above code first tries the mode of niter_desc by calling
> targetm.gen_doloop_end to see if the target can generate doloop insns;
> if that fails, it then tries 'word_mode' with gen_doloop_end.
> 
> 
> >> 
> >> Almost all targets just use Pmode, but there is no such guarantee I
> >> think, and esp. some targets that do not have machine insns for this
> >> (but want to generate different code for this anyway) can do pretty much
> >> anything.
> >> 
> >> Maybe using just Pmode here is good enough though?
> > 
> > I think Pmode is a particularly bad choice and I'd prefer word_mode
> > if we go for any hardcoded mode.  s390x for example seems to handle
> > both SImode and DImode (but names the helper gen_doloop_si64
> > for SImode?!).  But indeed it looks like somehow querying doloop_end
> > is going to be difficult since the expander doesn't have any mode,
> > so we'd have to actually try emit RTL here.
> 
> Instead of using a hardcoded mode, maybe we could add a hook for targets to
> return the preferred mode.

That's a possibility of course.  Like the following which just shows the
default implementation then (pass in current mode, return a more preferred
mode or the mo

Re: [PATCH] New hook adjust_iv_update_pos

2021-07-12 Thread Hongtao Liu via Gcc-patches
On Mon, Jul 12, 2021 at 4:14 PM Xionghu Luo via Gcc-patches
 wrote:
>
>
>
> On 2021/7/7 21:20, Richard Biener wrote:
> > On Tue, Jun 29, 2021 at 11:19 AM Xionghu Luo  wrote:
> >>
> >>
> >>
> >> On 2021/6/28 16:25, Richard Biener wrote:
> >>> On Mon, Jun 28, 2021 at 10:07 AM Xionghu Luo  wrote:
> 
> 
> 
>  On 2021/6/25 18:02, Richard Biener wrote:
> > On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo  
> > wrote:
> >>
> >>
> >>
> >> On 2021/6/25 16:54, Richard Biener wrote:
> >>> On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
> >>>  wrote:
> 
>  From: Xiong Hu Luo 
> 
>  adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help performance
>  on Power.  For example, it generates mismatched address offset after
>  adjust iv update statement position:
> 
>   [local count: 70988443]:
>  _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
>  ivtmp.30_415 = ivtmp.30_414 + 1;
>  _34 = ref_180 + 18446744073709551615;
>  _86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
>  if (_84 == _86)
>    goto ; [94.50%]
>    else
>    goto ; [5.50%]
> 
>  Disable it will produce:
> 
>   [local count: 70988443]:
>  _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
>  _86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
>  ivtmp.30_415 = ivtmp.30_414 + 1;
>  if (_84 == _86)
>    goto ; [94.50%]
>    else
>    goto ; [5.50%]
> 
>  Then the later loop unroll pass could benefit from the same address offset
>  with different base addresses, which reduces register dependency.
>  This patch could improve performance by 10% for a typical case on Power;
>  no performance change is observed for X86 or Aarch64 because small loops
>  are not unrolled on these platforms.  Any comments?
> >>>
> >>> The case you quote is special in that if we hoisted the IV update 
> >>> before
> >>> the other MEM _also_ used in the condition it would be fine again.
> >>
> >> Thanks.  I tried to hoist the IV update statement before the first MEM
> >> (Fix 2); it shows even worse performance because the loop is not unrolled
> >> (two more "base-1" are generated in gimple, so loop->ninsns is 11 and
> >> small loops are not unrolled).  Changing the threshold from 10 to 12 in
> >> rs6000_loop_unroll_adjust would make it also unroll 2 times; the
> >> performance is then the SAME as with the IV update statement in the
> >> *MIDDLE* (trunk).  From the ASM, we can see the index register %r4 is used
> >> in two iterations, which may be a bottleneck for hiding instruction latency?
> >>
> >> So it seems reasonable that the performance would be better if we keep the
> >> IV update statement at *LAST* (Fix 1).
> >>
> >> (Fix 2):
> >>   [local count: 70988443]:
> >>  ivtmp.30_415 = ivtmp.30_414 + 1;
> >>  _34 = ip_229 + 18446744073709551615;
> >>  _84 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
> >>  _33 = ref_180 + 18446744073709551615;
> >>  _86 = MEM[(uint8_t *)_33 + ivtmp.30_415 * 1];
> >>  if (_84 == _86)
> >>goto ; [94.50%]
> >>  else
> >>goto ; [5.50%]
> >>
> >>
> >> .L67:
> >>lbzx %r12,%r24,%r4
> >>lbzx %r25,%r7,%r4
> >>cmpw %cr0,%r12,%r25
> >>bne %cr0,.L11
> >>mr %r26,%r4
> >>addi %r4,%r4,1
> >>lbzx %r12,%r24,%r4
> >>lbzx %r25,%r7,%r4
> >>mr %r6,%r26
> >>cmpw %cr0,%r12,%r25
> >>bne %cr0,.L11
> >>mr %r26,%r4
> >> .L12:
> >>cmpdi %cr0,%r10,1
> >>addi %r4,%r26,1
> >>mr %r6,%r26
> >>addi %r10,%r10,-1
> >>bne %cr0,.L67
> >>
> >>>
> >>> Now, adjust_iv_update_pos doesn't seem to check that the
> >>> condition actually uses the IV use stmt def, so it likely applies to
> >>> too many cases.
> >>>
> >>> Unfortunately the introducing rev didn't come with a testcase,
> >>> but still I think fixing up adjust_iv_update_pos is better than
> >>> introducing a way to short-cut it per target decision.
> >>>
> >>> One "fix" might be to add a check that either the condition
> >>> lhs or rhs is the def of the IV use and the other operand
> >>> is invariant.  Or if it's of similar structure hoist across the
> >>> other iv-use as well.  Not that I understand the argument
> >>> about the overlapping life-range.
> >>>
> >>> You also don't provide a complete testcase ...
> >>>
> >>
> >> Attached the test code, will also add i

[PATCH] offloading: fix -foffload hinting

2021-07-12 Thread Martin Liška

We should not call candidates_list_and_hint if there are no candidates.

Ready after testing finishes?
Thanks,
Martin

sanitizer/101425

gcc/ChangeLog:

* gcc.c (check_offload_target_name): Call
  candidates_list_and_hint only if we have a candidate.
---
 gcc/gcc.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/gcc.c b/gcc/gcc.c
index f086dd47b91..16a8aa8f17b 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -4015,15 +4015,18 @@ check_offload_target_name (const char *target, 
ptrdiff_t len)
 
   error ("GCC is not configured to support %qs as offload target", target2);
 
-  const char *hint = candidates_list_and_hint (target2, s, candidates);

   if (candidates.is_empty ())
inform (UNKNOWN_LOCATION, "no offloading targets configured");
-  else if (hint)
-   inform (UNKNOWN_LOCATION,
-   "valid offload targets are: %s; did you mean %qs?", s, hint);
   else
-   inform (UNKNOWN_LOCATION, "valid offload targets are: %s", s);
-  XDELETEVEC (s);
+   {
+ const char *hint = candidates_list_and_hint (target2, s, candidates);
+ if (hint)
+   inform (UNKNOWN_LOCATION,
+   "valid offload targets are: %s; did you mean %qs?", s, 
hint);
+ else
+   inform (UNKNOWN_LOCATION, "valid offload targets are: %s", s);
+ XDELETEVEC (s);
+   }
   return false;
 }
   return true;
--
2.32.0



Re: [PATCH] offloading: fix -foffload hinting

2021-07-12 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 12, 2021 at 11:01:39AM +0200, Martin Liška wrote:
> We should not call candidates_list_and_hint if there are no candidates.
> 
> Ready after testing finishes?
> Thanks,
> Martin
> 
>   sanitizer/101425
> 
> gcc/ChangeLog:
> 
>   * gcc.c (check_offload_target_name): Call
> candidates_list_and_hint only if we have a candidate.
> ---
>  gcc/gcc.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/gcc.c b/gcc/gcc.c
> index f086dd47b91..16a8aa8f17b 100644
> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -4015,15 +4015,18 @@ check_offload_target_name (const char *target, 
> ptrdiff_t len)
>error ("GCC is not configured to support %qs as offload target", 
> target2);
> -  const char *hint = candidates_list_and_hint (target2, s, candidates);
>if (candidates.is_empty ())
>   inform (UNKNOWN_LOCATION, "no offloading targets configured");
> -  else if (hint)
> - inform (UNKNOWN_LOCATION,
> - "valid offload targets are: %s; did you mean %qs?", s, hint);
>else
> - inform (UNKNOWN_LOCATION, "valid offload targets are: %s", s);
> -  XDELETEVEC (s);
> + {
> +   const char *hint = candidates_list_and_hint (target2, s, candidates);
> +   if (hint)
> + inform (UNKNOWN_LOCATION,
> + "valid offload targets are: %s; did you mean %qs?", s, 
> hint);
> +   else
> + inform (UNKNOWN_LOCATION, "valid offload targets are: %s", s);
> +   XDELETEVEC (s);
> + }
>return false;
>  }
>return true;

Please move char *s; declaration into the new scope.
Ok with that change.

Jakub
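
The shape of the fix, including the narrower scope for s requested above,
can be sketched in isolation.  join_candidates is a hypothetical stand-in
for candidates_list_and_hint (it only builds the list and returns no hint),
and describe_targets writes into a buffer instead of calling inform; neither
is GCC code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: joins N names with spaces into a malloc'ed string.
   As with the real helper in this thread, it must not be called with an
   empty list.  */
static char *
join_candidates (const char *const *names, size_t n)
{
  assert (n > 0);
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    len += strlen (names[i]) + 1;   /* name plus separator or final NUL */
  char *s = malloc (len);
  if (!s)
    abort ();
  s[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      if (i > 0)
        strcat (s, " ");
      strcat (s, names[i]);
    }
  return s;
}

/* Writes the diagnostic text into OUT.  Returns 1 if candidates were
   listed, 0 if there were none.  */
int
describe_targets (const char *const *names, size_t n,
                  char *out, size_t outsz)
{
  if (n == 0)
    {
      snprintf (out, outsz, "no offloading targets configured");
      return 0;
    }
  else
    {
      /* s lives only in the branch that allocates and frees it.  */
      char *s = join_candidates (names, n);
      snprintf (out, outsz, "valid offload targets are: %s", s);
      free (s);
      return 1;
    }
}
```

Keeping the declaration, the use, and the free in one scope means the empty
case never touches s at all.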



RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-07-12 Thread Tamar Christina via Gcc-patches
Hi,

> Richard Sandiford  writes:
> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
> *vinfo,
> >>/* FORNOW.  Can continue analyzing the def-use chain when this stmt in
> a phi
> >>   inside the loop (in case we are analyzing an outer-loop).  */
> >>vect_unpromoted_value unprom0[2];
> >> +  enum optab_subtype subtype = optab_vector;
> >>if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> WIDEN_MULT_EXPR,
> >> -   false, 2, unprom0, &half_type))
> >> +   false, 2, unprom0, &half_type, &subtype))
> >> +return NULL;
> >> +
> >> +  if (subtype == optab_vector_mixed_sign
> >> +  && TYPE_UNSIGNED (unprom_mult.type)
> >> +  && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
> >> + (unprom_mult.type))
> >>  return NULL;
> >
> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
> > I.e. we need to reject the case in which we multiply a signed and an
> > unsigned value to get a (logically) signed result, but then
> > zero-extend it (rather than sign-extend it) to the precision of the 
> > addition.
> >
> > That would make the test:
> >
> >   if (subtype == optab_vector_mixed_sign
> >   && TYPE_UNSIGNED (unprom_mult.type)
> >   && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
> > return NULL;
> >
> > instead.
> 
> And folding that into the existing test gives:
> 
>   /* If there are two widening operations, make sure they agree on the sign
>  of the extension.  The result of an optab_vector_mixed_sign operation
>  is signed; otherwise, the result has the same sign as the operands.  */
>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>   && (subtype == optab_vector_mixed_sign
> ? TYPE_UNSIGNED (unprom_mult.type)
> : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
> return NULL;
> 

I went with the first one which doesn't add the extra constraints for the
normal dotproduct as that makes it too restrictive. It's the type of the
multiplication that determines the operation so dotproduct can be used
a bit more than where we currently do.

This was relaxed in an earlier patch.

Updated patch attached.

Ok for master?

Thanks,
Tamar
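
To illustrate the case the quoted review comment wants rejected, here is a
scalar sketch of one mixed-sign accumulation step.  The element widths
(8-bit inputs, 16-bit product, 32-bit accumulator) and the function names
are assumptions chosen for illustration; the vectorizer code is not
modelled.

```c
#include <stdint.h>

/* unsigned * signed product, logically signed, sign-extended to the
   accumulator precision before the addition.  */
int32_t
usdot_step_sext (uint8_t a, int8_t b, int32_t acc)
{
  int16_t prod = (int16_t) (a * b);   /* |a * b| <= 255 * 128, fits in 16 bits */
  return acc + (int32_t) prod;
}

/* Same product, but zero-extended to the accumulator precision.  This
   gives a different sum whenever the product is negative.  */
int32_t
usdot_step_zext (uint8_t a, int8_t b, int32_t acc)
{
  int16_t prod = (int16_t) (a * b);
  return acc + (int32_t) (uint16_t) prod;
}
```

With a == 200 and b == -3 the sign-extended step adds -600, while the
zero-extended step adds 64936 (65536 - 600).  For non-negative products the
two agree, which is why only the unsigned intermediate type combined with a
wider addition is a problem.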

gcc/ChangeLog:

* optabs.def (usdot_prod_optab): New.
* doc/md.texi: Document it and clarify other dot prod optabs.
* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
* optabs.c (expand_widen_pattern_expr): Likewise.
* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional
optab subtype.
(vect_widened_op_tree): Optionally ignore
mismatch types.
(vect_recog_dot_prod_pattern): Support usdot_prod_optab.

 Inline copy of patch 

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 
1b91814433057b1b377283fd1f40cb970dc3d243..323ba8eab78e2b2e582fa0633752930182e83ee5
 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes an 
additional mask operand
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
+
+Compute the sum of the products of two signed elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+sdot ==
+   res = sign-ext (a) * sign-ext (b) + c
+@dots{}
+@end smallexample
+
 @cindex @code{udot_prod@var{m}} instruction pattern
-@itemx @samp{udot_prod@var{m}}
-Compute the sum of the products of two signed/unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their product, which is of a
-wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or
-wider than the mode of the product. The result is placed in operand 0, which
-is of the same mode as operand 3.
+@item @samp{udot_prod@var{m}}
+
+Compute the sum of the products of two unsigned elements.
+Operand 1 and operand 2 are of the same mode. Their
+product, which is of a wider mode, is computed and added to operand 3.
+Operand 3 is of a mode equal or wider than the mode of the product. The
+result is placed in operand 0, which is of the same mode as operand 3.
+
+Semantically the expressions perform the multiplication in the following signs
+
+@smallexample
+udot ==
+   res = zero-ext (a) * zero-ext (b) + c
+@dots{}
+@end smallexample
+
+
+
+@cindex @code{usdot_prod@var{m}} instruction pattern
+@item @samp{usdot_prod@var{m}}
+Compute the sum of the products of elements of different

Re: [PATCH] offloading: fix -foffload hinting

2021-07-12 Thread Martin Liška

On 7/12/21 11:17 AM, Jakub Jelinek wrote:

Please move char *s; declaration into the new scope.


Sure.


Ok with that change.


I've just pushed to master.

Martin


Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-07-12 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi,
>
>> Richard Sandiford  writes:
>> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
>> *vinfo,
>> >>/* FORNOW.  Can continue analyzing the def-use chain when this stmt in
>> a phi
>> >>   inside the loop (in case we are analyzing an outer-loop).  */
>> >>vect_unpromoted_value unprom0[2];
>> >> +  enum optab_subtype subtype = optab_vector;
>> >>if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
>> WIDEN_MULT_EXPR,
>> >> -  false, 2, unprom0, &half_type))
>> >> +  false, 2, unprom0, &half_type, &subtype))
>> >> +return NULL;
>> >> +
>> >> +  if (subtype == optab_vector_mixed_sign
>> >> +  && TYPE_UNSIGNED (unprom_mult.type)
>> >> +  && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
>> >> + (unprom_mult.type))
>> >>  return NULL;
>> >
>> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
>> > I.e. we need to reject the case in which we multiply a signed and an
>> > unsigned value to get a (logically) signed result, but then
>> > zero-extend it (rather than sign-extend it) to the precision of the 
>> > addition.
>> >
>> > That would make the test:
>> >
>> >   if (subtype == optab_vector_mixed_sign
>> >   && TYPE_UNSIGNED (unprom_mult.type)
>> >   && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
>> > return NULL;
>> >
>> > instead.
>> 
>> And folding that into the existing test gives:
>> 
>>   /* If there are two widening operations, make sure they agree on the sign
>>  of the extension.  The result of an optab_vector_mixed_sign operation
>>  is signed; otherwise, the result has the same sign as the operands.  */
>>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>>   && (subtype == optab_vector_mixed_sign
>>? TYPE_UNSIGNED (unprom_mult.type)
>>: TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
>> return NULL;
>> 
>
> I went with the first one which doesn't add the extra constraints for the
> normal dotproduct as that makes it too restrictive. It's the type of the
> multiplication that determines the operation so dotproduct can be used
> a bit more than where we currently do.
>
> This was relaxed in an earlier patch.

I didn't mean that we should add extra constraints to the normal case
though.  The existing test I was referring to above was:

  /* If there are two widening operations, make sure they agree on
     the sign of the extension.  */
  if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
      && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
    return NULL;

Although this existing test makes sense for the normal case, IMO testing
TYPE_SIGN (half_type) doesn't make sense for the mixed-sign case.  I think
we should therefore replace the existing test with:

  /* If there are two widening operations, make sure they agree on the sign
     of the extension.  The result of an optab_vector_mixed_sign operation
     is signed; otherwise, the result has the same sign as the operands.  */
  if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
      && (subtype == optab_vector_mixed_sign
          ? TYPE_UNSIGNED (unprom_mult.type)
          : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
    return NULL;

rather than add a separate condition for the mixed-sign case.
The behaviour of the normal case is the same both ways.
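
As a concrete illustration of why a zero-extension has to be rejected in
the mixed-sign case, here is a minimal sketch.  It is written in Python
rather than C purely for brevity, fixed-width values are emulated with
masks, and the helper names (to_signed, zero_extend, sign_extend) are made
up for this example; none of this is GCC code.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def zero_extend(value, from_bits):
    """Widen by filling the upper bits with zeros."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value, from_bits):
    """Widen by replicating the sign bit."""
    return to_signed(value, from_bits)


# Mixed-sign multiplication of 8-bit inputs with a 16-bit product.
a = -1                        # models a signed char
b = 2                         # models an unsigned char
product16 = (a * b) & 0xFFFF  # bit pattern 0xFFFE, logically -2

print(sign_extend(product16, 16))  # -2: matches the logical result
print(zero_extend(product16, 16))  # 65534: what an unsigned widening gives
```

The two widenings disagree as soon as the product is negative, which is
the situation the TYPE_UNSIGNED (unprom_mult.type) arm of the combined
test is meant to catch.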

Thanks,
Richard




Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-12 Thread guojiufu via Gcc-patches

On 2021-07-12 16:57, Richard Biener wrote:
> On Mon, 12 Jul 2021, guojiufu wrote:
>
>> On 2021-07-12 14:20, Richard Biener wrote:
>> > On Fri, 9 Jul 2021, Segher Boessenkool wrote:
>> >
>> >> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
>> >> > I wonder if there's a way to query the target what modes the doloop
>> >> > pattern can handle (not being too familiar with the doloop code).
>> >>
>> >> You can look what modes are allowed for operand 0 of doloop_end,
>> >> perhaps?  Although that is a define_expand, not a define_insn, so it is
>> >> hard to introspect.
>> >>
>> >> > Why do you need to do any checks besides the new type being able to
>> >> > represent all IV values?  The original doloop IV will never wrap
>> >> > (OTOH if niter is U*_MAX then we compute niter + 1 which will become
>> >> > zero ... I suppose the doloop might still do the correct thing here
>> >> > but it also still will with a IV with larger type).
>>
>> The issue comes from U*_MAX (original short MAX), as you said: on which
>> niter + 1 becomes zero.  And because the step for doloop is -1; then, on
>> larger type 'zero - 1' will be a very large number on larger type
>> (e.g. 0xff...ff); but on the original short type 'zero - 1' is a small
>> value (e.g. "0xff").
>
> But for the larger type the small type MAX + 1 fits and does not yield
> zero so it should still work exactly as before, no?  Of course you
> have to compute the + 1 in the larger type.

You are right: if the "+ 1" is computed in the larger type it is OK, as in
the code below:

```
    /* Using a word-sized type may be faster.  */
    if (TYPE_PRECISION (ntype) < BITS_PER_WORD)
      {
        ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
        niter = fold_convert (ntype, niter);
      }

    tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter),
                             build_int_cst (ntype, 1));

    add_candidate (data, base, build_int_cst (ntype, -1), true, NULL,
                   NULL, true);
```
The problem with this is that the code generates more statements for doloop.xxx:
  _12 = (unsigned int) xx(D);
  _10 = _12 + 4294967295;
  _24 = (long unsigned int) _10;
  doloop.6_8 = _24 + 1;

With the previous patch, which does the "+ 1" in the original type, the
statements look like:

  _12 = (unsigned int) xx(D);
  doloop.6_8 = (long unsigned int) _12;

This is the reason for checking
   wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype)))
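
To make the trade-off concrete, here is a small model of the two orders of
operation.  It is only a sketch: Python integers with explicit masks stand
in for the narrow and word-sized unsigned types, and the function names are
hypothetical, not anything in tree-ssa-loop-ivopts.c.

```python
def add_one_then_widen(niter, narrow_bits):
    """Compute niter + 1 in the narrow unsigned type, then zero-extend."""
    narrow_mask = (1 << narrow_bits) - 1
    return (niter + 1) & narrow_mask


def widen_then_add_one(niter, narrow_bits, wide_bits):
    """Zero-extend niter first, then compute niter + 1 in the wide type."""
    narrow_mask = (1 << narrow_bits) - 1
    wide_mask = (1 << wide_bits) - 1
    return ((niter & narrow_mask) + 1) & wide_mask


def narrow_add_is_safe(niter_max, narrow_bits):
    """Model of the check: the maximum is strictly below the type's maximum."""
    return niter_max < (1 << narrow_bits) - 1


for niter in (1000, 0xFFFF):
    print(niter,
          add_one_then_widen(niter, 16),
          widen_then_add_one(niter, 16, 64),
          narrow_add_is_safe(niter, 16))
# 1000 1001 1001 True
# 65535 0 65536 False
```

When the check holds, both orders give the same count, so the cheaper
"+ 1 in the original type" form can be kept; when it does not, only the
widened addition gives the right value.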


>> >>
>> >> doloop_valid_p guarantees it is simple and doesn't wrap.
>> >>
>> >> > I'd have expected sth like
>> >> >
>> >> >    ntype = lang_hooks.types.type_for_mode (word_mode, TYPE_UNSIGNED
>> >> > (ntype));
>> >> >
>> >> > thus the decision made using a mode - which is also why I wonder
>> >> > if there's a way to query the target for this.  As you say,
>> >> > it _may_ be fast, so better check (somehow).
>>
>>
>> I was also thinking of using hooks like type_for_size/type_for_mode.
>> /* Use type in word size may fast.  */
>> if (TYPE_PRECISION (ntype) < BITS_PER_WORD
>>     && wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE
>> (ntype))))
>>   {
>>     ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
>>     base = fold_convert (ntype, base);
>>   }
>>
>> As you pointed out, this does not query the mode from targets.
>> As Segher pointed out "doloop_end" checks unsupported mode, while it seems
>> not easy to use it in tree-ssa-loop-ivopts.c.
>> For implementations of doloop_end, targets like rs6000/aarch64/ia64
>> require
>> Pmode/DImode; while there are other targets that work on other 'mode' (e.g.
>> SI).


>> In doloop_optimize, there is code:
>>
>> ```
>>   mode = desc->mode;
>>   ...
>>   doloop_reg = gen_reg_rtx (mode);
>>   rtx_insn *doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label);
>>
>>   word_mode_size = GET_MODE_PRECISION (word_mode);
>>   word_mode_max = (HOST_WIDE_INT_1U << (word_mode_size - 1) << 1) - 1;
>>   if (! doloop_seq
>>       && mode != word_mode
>>       /* Before trying mode different from the one in that # of iterations is
>>          computed, we must be sure that the number of iterations fits into
>>          the new mode.  */
>>       && (word_mode_size >= GET_MODE_PRECISION (mode)
>>           || wi::leu_p (iterations_max, word_mode_max)))
>>     {
>>       if (word_mode_size > GET_MODE_PRECISION (mode))
>>         count = simplify_gen_unary (ZERO_EXTEND, word_mode, count, mode);
>>       else
>>         count = lowpart_subreg (word_mode, count, mode);
>>       PUT_MODE (doloop_reg, word_mode);
>>       doloop_seq = targetm.gen_doloop_end (doloop_reg, start_label);
>>     }
>>   if (! doloop_seq)
>>     {
>>       if (dump_file)
>>         fprintf (dump_file,
>>                  "Doloop: Target unwilling to use doloop pattern!\n");
>>       return false;
>>     }
>> ```
>> The above code first tries the mode of niter_desc by calling
>> targetm.gen_doloop_end to see if the target can generate doloop insns;
>> if that fails, it then tries 'word_mode' with gen_doloop_end.
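
The retry condition in that snippet can be modelled in a few lines.  This
is only a sketch under stated assumptions: HOST_WIDE_INT is taken to be 64
bits wide, the function names are invented for the example, and the remark
about the two-step shift is my reading of the code rather than something
stated in the source.

```python
HWI_MASK = (1 << 64) - 1  # assumption: unsigned HOST_WIDE_INT is 64 bits


def word_mode_max(word_mode_size):
    """Model of (HOST_WIDE_INT_1U << (word_mode_size - 1) << 1) - 1.

    Shifting in two steps presumably avoids a single shift by the full
    width of the type when word_mode_size is 64, which C leaves undefined.
    """
    shifted = ((1 << (word_mode_size - 1)) << 1) & HWI_MASK
    return (shifted - 1) & HWI_MASK


def can_retry_in_word_mode(mode_size, word_mode_size, iterations_max):
    """True if the iteration count is known to fit into word_mode."""
    return (word_mode_size >= mode_size
            or iterations_max <= word_mode_max(word_mode_size))


print(hex(word_mode_max(32)))                    # 0xffffffff
print(can_retry_in_word_mode(64, 32, 100))       # True
print(can_retry_in_word_mode(64, 32, 1 << 40))   # False
```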


>> >>
>> >> Almost all targets just use Pmode, but there is no such guarantee I
>> >> think, and esp. some targets that do not have machine insns for this
>> >> (but

Re: [PATCH] New hook adjust_iv_update_pos

2021-07-12 Thread Richard Biener via Gcc-patches
On Mon, Jul 12, 2021 at 11:00 AM Hongtao Liu  wrote:
>
> On Mon, Jul 12, 2021 at 4:14 PM Xionghu Luo via Gcc-patches
>  wrote:
> >
> >
> >
> > On 2021/7/7 21:20, Richard Biener wrote:
> > > On Tue, Jun 29, 2021 at 11:19 AM Xionghu Luo  wrote:
> > >>
> > >>
> > >>
> > >> On 2021/6/28 16:25, Richard Biener wrote:
> > >>> On Mon, Jun 28, 2021 at 10:07 AM Xionghu Luo  
> > >>> wrote:
> > 
> > 
> > 
> >  On 2021/6/25 18:02, Richard Biener wrote:
> > > On Fri, Jun 25, 2021 at 11:41 AM Xionghu Luo  
> > > wrote:
> > >>
> > >>
> > >>
> > >> On 2021/6/25 16:54, Richard Biener wrote:
> > >>> On Fri, Jun 25, 2021 at 10:34 AM Xionghu Luo via Gcc-patches
> > >>>  wrote:
> > 
> >  From: Xiong Hu Luo 
> > 
> >  adjust_iv_update_pos in tree-ssa-loop-ivopts doesn't help 
> >  performance
> >  on Power.  For example, it generates mismatched address offset 
> >  after
> >  adjust iv update statement position:
> > 
> >   [local count: 70988443]:
> >  _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
> >  ivtmp.30_415 = ivtmp.30_414 + 1;
> >  _34 = ref_180 + 18446744073709551615;
> >  _86 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
> >  if (_84 == _86)
> >    goto ; [94.50%]
> >    else
> >    goto ; [5.50%]
> > 
> >  Disable it will produce:
> > 
> >   [local count: 70988443]:
> >  _84 = MEM[(uint8_t *)ip_229 + ivtmp.30_414 * 1];
> >  _86 = MEM[(uint8_t *)ref_180 + ivtmp.30_414 * 1];
> >  ivtmp.30_415 = ivtmp.30_414 + 1;
> >  if (_84 == _86)
> >    goto ; [94.50%]
> >    else
> >    goto ; [5.50%]
> > 
> >  Then later pass loop unroll could benefit from same address offset
> >  with different base address and reduces register dependency.
> >  This patch could improve performance by 10% for typical case on 
> >  Power,
> >  no performance change observed for X86 or Aarch64 due to small 
> >  loops
> >  not unrolled on these platforms.  Any comments?
> > >>>
> > >>> The case you quote is special in that if we hoisted the IV update 
> > >>> before
> > >>> the other MEM _also_ used in the condition it would be fine again.
> > >>
> > >> Thanks.  I tried to hoist the IV update statement before the first 
> > >> MEM (Fix 2), it
> > >> shows even worse performance due to not unroll(two more "base-1" is 
> > >> generated in gimple,
> > >> then loop->ninsns is 11 so small loops is not unrolled), change the 
> > >> threshold from
> > >> 10 to 12 in rs6000_loop_unroll_adjust would make it also unroll 2 
> > >> times, the
> > >> performance is SAME to the one that IV update statement in the 
> > >> *MIDDLE* (trunk).
> > >> From the ASM, we can see the index register %r4 is used in two 
> > >> iterations which
> > >> maybe a bottle neck for hiding instruction latency?
> > >>
> > >> Then it seems reasonable the performance would be better if keep the 
> > >> IV update
> > >> statement at *LAST* (Fix 1).
> > >>
> > >> (Fix 2):
> > >>   [local count: 70988443]:
> > >>  ivtmp.30_415 = ivtmp.30_414 + 1;
> > >>  _34 = ip_229 + 18446744073709551615;
> > >>  _84 = MEM[(uint8_t *)_34 + ivtmp.30_415 * 1];
> > >>  _33 = ref_180 + 18446744073709551615;
> > >>  _86 = MEM[(uint8_t *)_33 + ivtmp.30_415 * 1];
> > >>  if (_84 == _86)
> > >>goto ; [94.50%]
> > >>  else
> > >>goto ; [5.50%]
> > >>
> > >>
> > >> .L67:
> > >>lbzx %r12,%r24,%r4
> > >>lbzx %r25,%r7,%r4
> > >>cmpw %cr0,%r12,%r25
> > >>bne %cr0,.L11
> > >>mr %r26,%r4
> > >>addi %r4,%r4,1
> > >>lbzx %r12,%r24,%r4
> > >>lbzx %r25,%r7,%r4
> > >>mr %r6,%r26
> > >>cmpw %cr0,%r12,%r25
> > >>bne %cr0,.L11
> > >>mr %r26,%r4
> > >> .L12:
> > >>cmpdi %cr0,%r10,1
> > >>addi %r4,%r26,1
> > >>mr %r6,%r26
> > >>addi %r10,%r10,-1
> > >>bne %cr0,.L67
> > >>
> > >>>
> > >>> Now, adjust_iv_update_pos doesn't seem to check that the
> > >>> condition actually uses the IV use stmt def, so it likely applies to
> > >>> too many cases.
> > >>>
> > >>> Unfortunately the introducing rev didn't come with a testcase,
> > >>> but still I think fixing up adjust_iv_update_pos is better than
> > >>> introducing a way to short-cut it per target decision.
> > >>>
> > >>> One "fix" might be to add a check that either the condition
> > >>> lhs or rhs is the def 

Re: [ARM] PR66791: Replace builtins for signed vmul_n intrinsics

2021-07-12 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 5 Jul 2021 at 14:47, Prathamesh Kulkarni
 wrote:
>
> Hi,
> This patch replaces builtins with __a * __b for signed variants of
> vmul_n intrinsics.
> As discussed earlier, the patch has issue if __a * __b overflows, and
> whether we wish to leave
> that as UB.
ping 
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=6785eb595981abd93ad85edcfdf1d2e43c0841f5

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh


Re: [ARM] PR66791: Replace builtins for signed vmul_n intrinsics

2021-07-12 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 12 Jul 2021 at 15:23, Prathamesh Kulkarni
 wrote:
>
> On Mon, 5 Jul 2021 at 14:47, Prathamesh Kulkarni
>  wrote:
> >
> > Hi,
> > This patch replaces builtins with __a * __b for signed variants of
> > vmul_n intrinsics.
> > As discussed earlier, the patch has issue if __a * __b overflows, and
> > whether we wish to leave
> > that as UB.
> ping 
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=6785eb595981abd93ad85edcfdf1d2e43c0841f5
Oops sorry, I meant this link:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574428.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh


RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-07-12 Thread Tamar Christina via Gcc-patches



> -----Original Message-----
> From: Richard Sandiford 
> Sent: Monday, July 12, 2021 10:39 AM
> To: Tamar Christina 
> Cc: Richard Biener ; nd ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> Tamar Christina  writes:
> > Hi,
> >
> >> Richard Sandiford  writes:
> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
> >> *vinfo,
> >> >>/* FORNOW.  Can continue analyzing the def-use chain when this
> >> >> stmt in
> >> a phi
> >> >>   inside the loop (in case we are analyzing an outer-loop).  */
> >> >>vect_unpromoted_value unprom0[2];
> >> >> +  enum optab_subtype subtype = optab_vector;
> >> >>if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> >> WIDEN_MULT_EXPR,
> >> >> -false, 2, unprom0, &half_type))
> >> >> +false, 2, unprom0, &half_type, &subtype))
> >> >> +return NULL;
> >> >> +
> >> >> +  if (subtype == optab_vector_mixed_sign
> >> >> +  && TYPE_UNSIGNED (unprom_mult.type)
> >> >> +  && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
> >> >> + (unprom_mult.type))
> >> >>  return NULL;
> >> >
> >> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
> >> > I.e. we need to reject the case in which we multiply a signed and
> >> > an unsigned value to get a (logically) signed result, but then
> >> > zero-extend it (rather than sign-extend it) to the precision of the
> addition.
> >> >
> >> > That would make the test:
> >> >
> >> >   if (subtype == optab_vector_mixed_sign
> >> >   && TYPE_UNSIGNED (unprom_mult.type)
> >> >   && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
> >> > return NULL;
> >> >
> >> > instead.
> >>
> >> And folding that into the existing test gives:
> >>
> >>   /* If there are two widening operations, make sure they agree on the
> sign
> >>  of the extension.  The result of an optab_vector_mixed_sign operation
> >>  is signed; otherwise, the result has the same sign as the operands.  
> >> */
> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
> >>   && (subtype == optab_vector_mixed_sign
> >>  ? TYPE_UNSIGNED (unprom_mult.type)
> >>  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
> >> return NULL;
> >>
> >
> > I went with the first one which doesn't add the extra constraints for
> > the normal dotproduct as that makes it too restrictive. It's the type
> > of the multiplication that determines the operation so dotproduct can
> > be used a bit more than where we currently do.
> >
> > This was relaxed in an earlier patch.
> 
> I didn't mean that we should add extra constraints to the normal case though.
> The existing test I was referring to above was:
> 
>   /* If there are two widening operations, make sure they agree on
>  the sign of the extension.  */
>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>   && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
> return NULL;

But as I mentioned, this restriction is unnecessary and has already been
removed, which is why it's not in my patch set's diff.
It was removed by https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html
which Richi made conditional on the rest of these patches being approved.

For instance, this check needlessly prevents the tests
vect-reduc-dot-[2,3,6,7].c from being recognized as dot-products.

It's also part of the codegen gap between GCC and Clang:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492#c6

Regards,
Tamar

> 
> Although this existing test makes sense for the normal case, IMO testing
> TYPE_SIGN (half_type) doesn't make sense for the mixed-sign case.  I think
> we should therefore replace the existing test with:
> 
>   /* If there are two widening operations, make sure they agree on the sign
>  of the extension.  The result of an optab_vector_mixed_sign operation
>  is signed; otherwise, the result has the same sign as the operands.  */
>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>   && (subtype == optab_vector_mixed_sign
>  ? TYPE_UNSIGNED (unprom_mult.type)
>  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
> return NULL;
> 
> rather than add a separate condition for the mixed-sign case.
> The behaviour of the normal case is the same both ways.
> 
> Thanks,
> Richard
> 



Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-12 Thread Richard Biener
On Mon, 12 Jul 2021, guojiufu wrote:

> On 2021-07-12 16:57, Richard Biener wrote:
> > On Mon, 12 Jul 2021, guojiufu wrote:
> > 
> >> On 2021-07-12 14:20, Richard Biener wrote:
> >> > On Fri, 9 Jul 2021, Segher Boessenkool wrote:
> >> >
> >> >> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
> >> >> > I wonder if there's a way to query the target what modes the doloop
> >> >> > pattern can handle (not being too familiar with the doloop code).
> >> >>
> >> >> You can look what modes are allowed for operand 0 of doloop_end,
> >> >> perhaps?  Although that is a define_expand, not a define_insn, so it is
> >> >> hard to introspect.
> >> >>
> >> >> > Why do you need to do any checks besides the new type being able to
> >> >> > represent all IV values?  The original doloop IV will never wrap
> >> >> > (OTOH if niter is U*_MAX then we compute niter + 1 which will become
> >> >> > zero ... I suppose the doloop might still do the correct thing here
> >> >> > but it also still will with a IV with larger type).
> >> 
> >> The issue comes from U*_MAX (original short MAX), as you said: on which
> >> niter + 1 becomes zero.  And because the step for doloop is -1; then, on
> >> larger type 'zero - 1' will be a very large number on larger type
> >> (e.g. 0xff...ff); but on the original short type 'zero - 1' is a small
> >> value
> >> (e.g. "0xff").
> > 
> > But for the larger type the small type MAX + 1 fits and does not yield
> > zero so it should still work exactly as before, no?  Of course you
> > have to compute the + 1 in the larger type.
> > 
> You are right, if compute the "+ 1" in the larger type it is ok, as below
> code:
> ```
>/* Use type in word size may fast.  */
> if (TYPE_PRECISION (ntype) < BITS_PER_WORD)
>   {
> ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
> niter = fold_convert (ntype, niter);
>   }
> 
> tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter),
>  build_int_cst (ntype, 1));
> 
> 
> add_candidate (data, base, build_int_cst (ntype, -1), true, NULL, NULL,
> true);
> ```
> The issue of this is, this code generates more stmt for doloop.xxx:
>   _12 = (unsigned int) xx(D);
>   _10 = _12 + 4294967295;
>   _24 = (long unsigned int) _10;
>   doloop.6_8 = _24 + 1;
> 
> if use previous patch, "+ 1" on original type, then the stmts will looks like:
>   _12 = (unsigned int) xx(D);
>   doloop.6_8 = (long unsigned int) _12;
> 
> This is the reason for checking
>wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype)))

But this then only works when there's an upper bound on the number
of iterations.  Note you should not use TYPE_MAX_VALUE here but
you can instead use

 wi::ltu_p (niter_desc->max,
            wi::to_widest (wi::max_value (TYPE_PRECISION (ntype),
                                          TYPE_SIGN (ntype))));

I think the -1 above comes from the number of latch iterations vs. header
entries; it's a common source of this kind of issue.  Range analysis
might be able to prove that we can still merge the two adds even with
the intermediate extension.
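
To illustrate what such a merge would rely on, here is a small model of
the quoted statements.  It is a sketch only: 32-bit and 64-bit unsigned
arithmetic are emulated with masks, and two_adds and merged are
hypothetical names for the two statement sequences.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def two_adds(x):
    """_10 = _12 + 4294967295; _24 = (long unsigned int) _10; _24 + 1."""
    t = (x + 0xFFFFFFFF) & MASK32  # x - 1 in 32-bit unsigned arithmetic
    return (t + 1) & MASK64        # + 1 performed in the 64-bit type


def merged(x):
    """doloop = (long unsigned int) _12, i.e. both adds folded away."""
    return x & MASK32


for x in (0, 1, 5, MASK32):
    print(x, two_adds(x), merged(x), two_adds(x) == merged(x))
# 0 4294967296 0 False
# 1 1 1 True
# 5 5 5 True
# 4294967295 4294967295 4294967295 True
```

In this model the two forms differ only for x == 0, so a range fact such as
"x is at least 1" is the kind of information that would justify the merge.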

Is this pre-loop extra add really offsetting the in-loop doloop
improvements?

> >> >>
> >> >> doloop_valid_p guarantees it is simple and doesn't wrap.
> >> >>
> >> >> > I'd have expected sth like
> >> >> >
> >> >> >ntype = lang_hooks.types.type_for_mode (word_mode, TYPE_UNSIGNED
> >> >> > (ntype));
> >> >> >
> >> >> > thus the decision made using a mode - which is also why I wonder
> >> >> > if there's a way to query the target for this.  As you say,
> >> >> > it _may_ be fast, so better check (somehow).
> >> 
> >> 
> >> I was also thinking of using hooks like type_for_size/type_for_mode.
> >> /* Use type in word size may fast.  */
> >> if (TYPE_PRECISION (ntype) < BITS_PER_WORD
> >> && wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE
> >> (ntype))))
> >>   {
> >> ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
> >> base = fold_convert (ntype, base);
> >>   }
> >> 
> >> As you pointed out, this does not query the mode from targets.
> >> As Segher pointed out "doloop_end" checks unsupported mode, while it seems
> >> not easy to use it in tree-ssa-loop-ivopts.c.
> >> For implementations of doloop_end, targets like rs6000/aarch64/ia64
> >> require
> >> Pmode/DImode; while there are other targets that work on other 'mode' (e.g.
> >> SI).
> >> 
> >> 
> >> In doloop_optimize, there is code:
> >> 
> >> ```
> >> mode = desc->mode;
> >> .
> >> doloop_reg = gen_reg_rtx (mode);
> >> rtx_insn *doloop_seq = targetm.gen_doloop_end (doloop_reg,
> >> start_label);
> >> 
> >> word_mode_size = GET_MODE_PRECISION (word_mode);
> >> word_mode_max = (HOST_WIDE_INT_1U << (word_mode_size - 1) << 1) - 
> >> 1;
> >> if (! doloop_seq
> >> && mode != word_mode
> >> /* Before trying mode different from the one in that # of
> >> iterations is
> >>computed, we must be sure that the numb

[PATCH] mklog: support '-b c/101343' format.

2021-07-12 Thread Martin Liška

Hello.

I don't use --fill-up-bug-titles, but I would like
./contrib/mklog.py -b c/101343 to produce:

PR c/101343

...
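
A minimal sketch of the intended normalization, written as a standalone
function so it can be run on its own.  merge_additional_prs is a
hypothetical name used only here; in mklog.py the logic sits inside
generate_changelog.

```python
def merge_additional_prs(prs, additional_prs):
    """Return prs extended with additional_prs.

    Bare 'component/number' entries get the 'PR ' prefix, and entries
    that are already present are skipped.
    """
    merged = list(prs)
    for apr in additional_prs:
        if not apr.startswith('PR ') and '/' in apr:
            apr = 'PR ' + apr
        if apr not in merged:
            merged.append(apr)
    return merged


print(merge_additional_prs([], ['c/101343']))
# ['PR c/101343']
print(merge_additional_prs(['PR c/101343'],
                           ['c/101343', 'PR middle-end/101423']))
# ['PR c/101343', 'PR middle-end/101423']
```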

I'm going to install the patch if there are no comments.

Martin

contrib/ChangeLog:

* mklog.py: Support additional PRs without PR prefix.
---
 contrib/mklog.py | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index ba70af0eef2..d2aea85c7cc 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -157,7 +157,11 @@ def generate_changelog(data, no_functions=False, fill_pr_titles=False,
     global firstpr
 
     if additional_prs:
-        prs = [pr for pr in additional_prs if pr not in prs]
+        for apr in additional_prs:
+            if not apr.startswith('PR ') and '/' in apr:
+                apr = 'PR ' + apr
+            if apr not in prs:
+                prs.append(apr)
     for file in diff:
         # skip files that can't be parsed
         if file.path == '/dev/null':
--
2.32.0



[PATCH] middle-end/101423 - internal calls do not trap

2021-07-12 Thread Richard Biener
This adjusts gimple_could_trap_p so that internal function calls are not
considered to trap, unlike indirect calls or calls to weak functions.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-07-12  Richard Biener  

PR middle-end/101423
* gimple.c (gimple_could_trap_p_1): Internal function calls
do not trap.
* tree-eh.c (tree_could_trap_p): Likewise.
---
 gcc/gimple.c  | 4 +++-
 gcc/tree-eh.c | 5 -
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple.c b/gcc/gimple.c
index 60a90667e4b..cc464547e34 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -2149,8 +2149,10 @@ gimple_could_trap_p_1 (gimple *s, bool include_mem, bool include_stores)
       return gimple_asm_volatile_p (as_a <gasm *> (s));
 
 case GIMPLE_CALL:
+  if (gimple_call_internal_p (s))
+   return false;
   t = gimple_call_fndecl (s);
-  /* Assume that calls to weak functions may trap.  */
+  /* Assume that indirect and calls to weak functions may trap.  */
   if (!t || !DECL_P (t) || DECL_WEAK (t))
return true;
   return false;
diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c
index 601285c401c..57ce8f04a43 100644
--- a/gcc/tree-eh.c
+++ b/gcc/tree-eh.c
@@ -2723,8 +2723,11 @@ tree_could_trap_p (tree expr)
   return TREE_THIS_VOLATILE (expr);
 
 case CALL_EXPR:
+  /* Internal function calls do not trap.  */
+  if (CALL_EXPR_FN (expr) == NULL_TREE)
+   return false;
   t = get_callee_fndecl (expr);
-  /* Assume that calls to weak functions may trap.  */
+  /* Assume that indirect and calls to weak functions may trap.  */
   if (!t || !DECL_P (t))
return true;
   if (DECL_WEAK (t))
-- 
2.26.2


[PATCH] tree-optimization/101394 - fix PRE full redundancy wrt abnormals

2021-07-12 Thread Richard Biener
This avoids adding a copy from an abnormal picked up from PHI
translation much like we'd avoid inserting the translated
expression on pred edges.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-07-12  Richard Biener  

PR tree-optimization/101394
* tree-ssa-pre.c (do_pre_regular_insertion): Avoid inserting
copies from abnormals for a full redundancy.

* gcc.dg/torture/pr101394.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr101394.c | 18 ++
 gcc/tree-ssa-pre.c  |  6 +-
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr101394.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr101394.c 
b/gcc/testsuite/gcc.dg/torture/pr101394.c
new file mode 100644
index 000..87fbdadc152
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr101394.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+
+int a, b, c, d;
+void h();
+int e() __attribute__((returns_twice));
+void f() {
+  int *g = (int *)(__INTPTR_TYPE__)c;
+  if (b) {
+h();
+g--;
+if (a)
+  if (d)
+h();
+  }
+  if (g++)
+e();
+  c = (__INTPTR_TYPE__)g;
+}
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index d86fe26bd07..69141c2f0c9 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3412,7 +3412,11 @@ do_pre_regular_insertion (basic_block block, basic_block 
dom,
  /* If all edges produce the same value and that value is
 an invariant, then the PHI has the same value on all
 edges.  Note this.  */
- else if (!cant_insert && all_same)
+ else if (!cant_insert
+  && all_same
+  && (edoubleprime->kind != NAME
+  || !SSA_NAME_OCCURS_IN_ABNORMAL_PHI
+(PRE_EXPR_NAME (edoubleprime))))
{
  gcc_assert (edoubleprime->kind == CONSTANT
  || edoubleprime->kind == NAME);
-- 
2.26.2


[PATCH] tree-optimization/101373 - avoid PRE across externally throwing call

2021-07-12 Thread Richard Biener
PRE already tries to avoid hoisting possibly trapping expressions
across calls that might not return normally but fails to consider
const calls that throw externally.  The following fixes that and
also plugs the hole of trapping references not being pruned in case
they are not caught by the actual call clobbering them.

At -Os we hit the same issue in RTL PRE and postreload-gcse has
even more incomplete checks so the patch adjusts both of those
as well.
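
As an illustration of the hazard, here is a toy model of the situation the
new testcase exercises.  It is only a sketch in Python: Trap stands in for
a hardware fault on a null dereference, ValueError stands in for the
exception thrown by the const call, and bar_hoisted shows what a wrongly
hoisted load would look like; none of the names come from GCC.

```python
class Trap(Exception):
    """Stands in for a fault caused by dereferencing a null pointer."""


def load(p):
    if p is None:
        raise Trap("null dereference")
    return p[0]


def foo(j):
    """Models a const function that throws for nonzero input."""
    if j != 0:
        raise ValueError("thrown by foo")
    return 0


def bar(p, n):
    """Original order: the load after foo is never reached if foo throws."""
    ret = 0
    if n:
        foo(n)
        ret = load(p)
    ret += load(p)
    return ret


def bar_hoisted(p, n):
    """Invalid transformation: the load is hoisted above the call."""
    tmp = load(p)
    ret = 0
    if n:
        foo(n)
        ret = tmp
    ret += tmp
    return ret


def outcome(fn):
    """Call fn (None, 1) and report which exception, if any, escapes."""
    try:
        return fn(None, 1)
    except Exception as exc:
        return type(exc).__name__


print(outcome(bar))          # ValueError
print(outcome(bar_hoisted))  # Trap
```

The original function leaves through the exception, while the hoisted
variant faults before the call is ever made.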

Bootstrapped and tested on x86_64-unknown-linux-gnu, I'll push
unless I hear any comments from Eric regarding the Ada testcase
(I hope I have correctly altered it).

Thanks,
Richard.

2021-07-08  Richard Biener  

PR tree-optimization/101373
* tree-ssa-pre.c (prune_clobbered_mems): Also prune trapping
references when the BB may not return.
(compute_avail): Pass in the function we're working on and
replace cfun references with it.  Externally throwing
const calls also possibly terminate the function.
(pass_pre::execute): Pass down the function we're working on.
* gcse.c (compute_hash_table_work): Externally throwing
const/pure calls also need record_last_mem_set_info.
* postreload-gcse.c (record_opr_changes): Looping or externally
throwing const/pure calls also need record_last_mem_set_info.

* g++.dg/torture/pr101373.C: New testcase, XFAILed.
* gnat.dg/opt95.adb: Likewise.
---
 gcc/gcse.c  |  3 +-
 gcc/postreload-gcse.c   |  4 ++-
 gcc/testsuite/g++.dg/torture/pr101373.C | 33 
 gcc/testsuite/gnat.dg/opt95.adb | 40 +
 gcc/tree-ssa-pre.c  | 34 +
 5 files changed, 99 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr101373.C
 create mode 100644 gcc/testsuite/gnat.dg/opt95.adb

diff --git a/gcc/gcse.c b/gcc/gcse.c
index ecf7e51aac5..ccd33664af5 100644
--- a/gcc/gcse.c
+++ b/gcc/gcse.c
@@ -1537,7 +1537,8 @@ compute_hash_table_work (struct gcse_hash_table_d *table)
record_last_reg_set_info (insn, regno);
 
  if (! RTL_CONST_OR_PURE_CALL_P (insn)
- || RTL_LOOPING_CONST_OR_PURE_CALL_P (insn))
+ || RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
+ || can_throw_external (insn))
record_last_mem_set_info (insn);
}
 
diff --git a/gcc/postreload-gcse.c b/gcc/postreload-gcse.c
index 0b28247e299..6c95d09a1e5 100644
--- a/gcc/postreload-gcse.c
+++ b/gcc/postreload-gcse.c
@@ -779,7 +779,9 @@ record_opr_changes (rtx_insn *insn)
   EXECUTE_IF_SET_IN_HARD_REG_SET (callee_clobbers, 0, regno, hrsi)
record_last_reg_set_info_regno (insn, regno);
 
-  if (! RTL_CONST_OR_PURE_CALL_P (insn))
+  if (! RTL_CONST_OR_PURE_CALL_P (insn)
+ || RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)
+ || can_throw_external (insn))
record_last_mem_set_info (insn);
 }
 }
diff --git a/gcc/testsuite/g++.dg/torture/pr101373.C 
b/gcc/testsuite/g++.dg/torture/pr101373.C
new file mode 100644
index 000..f8c809739e2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr101373.C
@@ -0,0 +1,33 @@
+// { dg-do run }
+// { dg-xfail-run-if "PR100409" { *-*-* } }
+
+int __attribute__((const,noipa)) foo (int j)
+{
+  if (j != 0)
+throw 1;
+  return 0;
+}
+
+int __attribute__((noipa)) bar (int *p, int n)
+{
+  int ret = 0;
+  if (n)
+{
+   foo (n);
+   ret = *p;
+}
+  ret += *p;
+  return ret;
+}
+
+int main()
+{
+  try
+{
+  return bar (nullptr, 1);
+}
+  catch (...)
+{
+  return 0;
+}
+}
diff --git a/gcc/testsuite/gnat.dg/opt95.adb b/gcc/testsuite/gnat.dg/opt95.adb
new file mode 100644
index 000..2c72582b3f1
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/opt95.adb
@@ -0,0 +1,40 @@
+-- { dg-do run }
+-- { dg-options "-O2 -gnatp" }
+
+procedure Opt95 is
+
+  function Foo (J : Integer) return Integer;
+  pragma Pure_Function (Foo);
+  pragma Machine_Attribute (Foo, "noipa");
+
+  function Foo (J : Integer) return Integer is
+  begin
+if J /= 0 then
+  raise Constraint_Error;
+end if;
+return 0;
+  end;
+
+  function Bar (A : access Integer; N : Integer) return Integer;
+  pragma Machine_Attribute (Bar, "noipa");
+
+  function Bar (A : access Integer; N : Integer) return Integer is
+Ret : Integer := 0;
+Ret2 : Integer := 0;
+  begin
+if N /= 0 then
+  Ret2 := Foo (N);
+  Ret := A.all;
+end if;
+Ret := Ret + A.all;
+return Ret + Ret2;
+  end;
+
+  V : Integer;
+  pragma Volatile (V);
+
+begin
+  V := Bar (null, 1);
+exception
+  when Constraint_Error => null;
+end;
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 69141c2f0c9..aa5244e678c 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -2071,6 +2071,13 @@ prune_clobbered_mems (bitmap_set_t set, basic_block 
block)
  && value_dies_in_bl

[PATCH] Display the number of components BB vectorized

2021-07-12 Thread Richard Biener
This amends the optimization message printed when a basic-block
part is vectorized to mention the number of SLP graph entries.
This helps when debugging vectorization differences in cases where we
end up merging SLP instances for costing purposes.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-07-07  Richard Biener  

* tree-vect-slp.c (vect_slp_region): Show the number of
SLP graph entries in the optimization message.

* g++.dg/vect/slp-pr87105.cc: Adjust.
* gcc.dg/vect/bb-slp-pr54400.c: Likewise.
---
 gcc/testsuite/g++.dg/vect/slp-pr87105.cc   |  2 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c |  2 +-
 gcc/tree-vect-slp.c| 12 
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/g++.dg/vect/slp-pr87105.cc b/gcc/testsuite/g++.dg/vect/slp-pr87105.cc
index d07b1cd46b7..451a117e2d2 100644
--- a/gcc/testsuite/g++.dg/vect/slp-pr87105.cc
+++ b/gcc/testsuite/g++.dg/vect/slp-pr87105.cc
@@ -99,7 +99,7 @@ void quadBoundingBoxA(const Point bez[3], Box& bBox) noexcept 
{
 
 // We should have if-converted everything down to straight-line code
 // { dg-final { scan-tree-dump-times "" 1 "slp2" } }
-// { dg-final { scan-tree-dump-times "basic block part vectorized" 1 "slp2" { xfail { { ! vect_element_align } && { ! vect_hw_misalign } } } } }
+// { dg-final { scan-tree-dump-times "optimized: basic block part" 1 "slp2" { xfail { { ! vect_element_align } && { ! vect_hw_misalign } } } } }
 // It's a bit awkward to detect that all stores were vectorized but the
 // following more or less does the trick
 // { dg-final { scan-tree-dump "vect_\[^\r\m\]* = MIN" "slp2" { xfail { { ! 
vect_element_align } && { ! vect_hw_misalign } } } } }
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c
index 6b427aac774..7c46fa0e464 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c
@@ -39,5 +39,5 @@ main ()
 }
 
 /* We are lacking an effective target for .REDUC_PLUS support.  */
-/* { dg-final { scan-tree-dump-times "basic block part vectorized" 3 "slp2" { target x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block part" 3 "slp2" { target x86_64-*-* } } } */
 /* { dg-final { scan-tree-dump-not " = VEC_PERM_EXPR" "slp2" { target x86_64-*-* } } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 5357cd0e7a4..cd002b3fb7c 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -5827,12 +5827,16 @@ vect_slp_region (vec bbs, 
vec datarefs,
  if (GET_MODE_SIZE
(bb_vinfo->vector_mode).is_constant (&bytes))
dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
-"basic block part vectorized using %wu "
-"byte vectors\n", bytes);
+"basic block part with %u components "
+"vectorized using %wu byte vectors\n",
+instance->subgraph_entries.length (),
+bytes);
  else
dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
-"basic block part vectorized using "
-"variable length vectors\n");
+"basic block part with %u components "
+"vectorized using variable length "
+"vectors\n",
+instance->subgraph_entries.length ());
}
}
}
-- 
2.26.2


Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-07-12 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Monday, July 12, 2021 10:39 AM
>> To: Tamar Christina 
>> Cc: Richard Biener ; nd ; gcc-
>> patc...@gcc.gnu.org
>> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
>> where the sign for the multiplicant changes.
>> 
>> Tamar Christina  writes:
>> > Hi,
>> >
>> >> Richard Sandiford  writes:
>> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
>> >> *vinfo,
>> >> >>/* FORNOW.  Can continue analyzing the def-use chain when this
>> >> >> stmt in
>> >> a phi
>> >> >>   inside the loop (in case we are analyzing an outer-loop).  */
>> >> >>vect_unpromoted_value unprom0[2];
>> >> >> +  enum optab_subtype subtype = optab_vector;
>> >> >>if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
>> >> WIDEN_MULT_EXPR,
>> >> >> -   false, 2, unprom0, &half_type))
>> >> >> +   false, 2, unprom0, &half_type, &subtype))
>> >> >> +return NULL;
>> >> >> +
>> >> >> +  if (subtype == optab_vector_mixed_sign
>> >> >> +  && TYPE_UNSIGNED (unprom_mult.type)
>> >> >> +  && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
>> >> >> + (unprom_mult.type))
>> >> >>  return NULL;
>> >> >
>> >> > Isn't the final condition here instead that TYPE1 is narrower than 
>> >> > TYPE2?
>> >> > I.e. we need to reject the case in which we multiply a signed and
>> >> > an unsigned value to get a (logically) signed result, but then
>> >> > zero-extend it (rather than sign-extend it) to the precision of the
>> addition.
>> >> >
>> >> > That would make the test:
>> >> >
>> >> >   if (subtype == optab_vector_mixed_sign
>> >> >   && TYPE_UNSIGNED (unprom_mult.type)
>> >> >   && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
>> >> > return NULL;
>> >> >
>> >> > instead.
>> >>
>> >> And folding that into the existing test gives:
>> >>
>> >>   /* If there are two widening operations, make sure they agree on the
>> sign
>> >>  of the extension.  The result of an optab_vector_mixed_sign operation
>> >>  is signed; otherwise, the result has the same sign as the operands.  
>> >> */
>> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>> >>   && (subtype == optab_vector_mixed_sign
>> >> ? TYPE_UNSIGNED (unprom_mult.type)
>> >> : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
>> >> return NULL;
>> >>
>> >
>> > I went with the first one which doesn't add the extra constraints for
>> > the normal dotproduct as that makes it too restrictive. It's the type
>> > of the multiplication that determines the operation so dotproduct can
>> > be used a bit more than where we currently do.
>> >
>> > This was relaxed in an earlier patch.
>> 
>> I didn't mean that we should add extra constraints to the normal case though.
>> The existing test I was referring to above was:
>> 
>>   /* If there are two widening operations, make sure they agree on
>>  the sign of the extension.  */
>>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>>   && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
>> return NULL;
>
> But as I mentioned, this restriction is unneeded and has been removed hence 
> why it's not in my patchset's diff.
> It's removed by 
> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html which Richi 
> conditioned on
> the rest of these patches being approved.
>
> This change needlessly blocks test vect-reduc-dot-[2,3,6,7].c from being 
> dotproducts for instance
>
> It's also part of the deficiency between GCC codegen and Clang 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492#c6

Hmm, OK.  Just removing the check regresses:

unsigned long __attribute__ ((noipa))
f (signed short *x, signed short *y)
{
  unsigned long res = 0;
  for (int i = 0; i < 100; ++i)
res += (unsigned int) x[i] * (unsigned int) y[i];
  return res;
}

int
main (void)
{
  signed short x[100], y[100];
  for (int i = 0; i < 100; ++i)
{
  x[i] = -1;
  y[i] = 1;
}
  if (f (x, y) != 0x6400000000ULL - 100)
__builtin_abort ();
  return 0;
}

on SVE.  We then use SDOT even though the result of the multiplication
is zero- rather than sign-extended to 64 bits.  Does something else
in the series stop that from happening?
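
To make the difference concrete, here is a small C++ sketch. It is not
part of the patch, and the two helper names are made up for illustration.
It accumulates the same 32-bit product twice: once zero-extended, as the
source loop requires, and once sign-extended, which is roughly what a
signed dot-product would compute.  It assumes a 32-bit int.

```cpp
#include <cstdint>

// Accumulate the way the source loop does: the product is an unsigned
// 32-bit value, so it is zero-extended when added to the 64-bit sum.
std::uint64_t
sum_zero_extended (const std::int16_t *x, const std::int16_t *y, int n)
{
  std::uint64_t res = 0;
  for (int i = 0; i < n; ++i)
    {
      std::uint32_t prod = (std::uint32_t) x[i] * (std::uint32_t) y[i];
      res += prod;
    }
  return res;
}

// Same inputs, but the product is treated as a signed 32-bit value and
// sign-extended before the addition.
std::uint64_t
sum_sign_extended (const std::int16_t *x, const std::int16_t *y, int n)
{
  std::uint64_t res = 0;
  for (int i = 0; i < n; ++i)
    {
      std::int32_t prod = (std::int32_t) x[i] * (std::int32_t) y[i];
      res += (std::uint64_t) (std::int64_t) prod;
    }
  return res;
}
```

With x[i] = -1 and y[i] = 1 for 100 elements, the first function adds
0xffffffff each time and the second adds all-ones in 64 bits, so the two
sums differ.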

Richard


Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-07-12 Thread Richard Biener via Gcc-patches
On Wed, Jul 7, 2021 at 4:37 PM Martin Sebor  wrote:
>
> On 7/7/21 1:28 AM, Richard Biener wrote:
> > On Tue, Jul 6, 2021 at 5:06 PM Martin Sebor  wrote:
> >>
> >> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573968.html
> >>
> >> Any questions/suggestions on the final patch or is it okay to commit?
> >
> > I don't remember seeing one (aka saying "bootstrapped/tested, OK to commit?"
> > or so) - and the link above doesn't have one.
> >
> > So, can you re-post it please?
>
> The patch is attached to the email above with the following text
> at the end:
>
>Attached is a revised patch with these changes (a superset of
>those I sent in response to Jason's question), tested on x86_64.
>
> I've also attached it to this reply.

Thanks - I was confused about the pipermail way of referencing attachments ...

The pieces where you change vec<> passing to const vec<>& and the few
where you change vec<> * to const vec<> * are OK - this should make the
rest a smaller piece to review.  In general const correctness changes should
be considered obvious (vec<> to const vec<>& passing isn't quite obvious
so I acked the cases explicitly).

I think the vec<> -> vec<>& cases would either benefit from constification
of callers that make using const vec<>& not possible or from a change to
pass array_slice<> (not array_slice<>&), noting that the vec<> contents
are mutated but the vec<> size does not change.
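
As a rough illustration of that distinction, here is a sketch using
std::vector and a made-up `slice` view type.  Neither is GCC's vec<> or
array_slice<>, whose interfaces are richer; the stand-ins only show how
the signature can say "elements may change, length cannot".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning view: a pointer plus a length.
template <typename T>
struct slice
{
  T *ptr;
  std::size_t len;

  T &operator[] (std::size_t i) const { return ptr[i]; }
  std::size_t size () const { return len; }
};

// Read-only access: take the container by const reference.
int
sum (const std::vector<int> &v)
{
  int total = 0;
  for (int e : v)
    total += e;
  return total;
}

// The elements are modified but the length is not, so a view passed by
// value is enough; the callee has no way to grow or shrink the storage.
void
double_all (slice<int> s)
{
  for (std::size_t i = 0; i < s.size (); ++i)
    s[i] *= 2;
}
```

A caller would build the view from the container's data pointer and
length, e.g. `double_all (slice<int> {v.data (), v.size ()})`.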

Somebody with more C++ knowledge than me needs to approve the
vec.h changes - I don't feel competent to assess all effects of the change.

Thanks,
Richard.

> Martin
>
> >
> > Thanks,
> > Richard.
> >
> >> On 6/29/21 7:46 PM, Martin Sebor wrote:
> >>> On 6/29/21 4:58 AM, Richard Biener wrote:
>  On Mon, Jun 28, 2021 at 8:07 PM Martin Sebor  wrote:
> >
> > On 6/28/21 2:07 AM, Richard Biener wrote:
> >> On Sat, Jun 26, 2021 at 12:36 AM Martin Sebor  wrote:
> >>>
> >>> On 6/25/21 4:11 PM, Jason Merrill wrote:
>  On 6/25/21 4:51 PM, Martin Sebor wrote:
> > On 6/1/21 3:38 PM, Jason Merrill wrote:
> >> On 6/1/21 3:56 PM, Martin Sebor wrote:
> >>> On 5/27/21 2:53 PM, Jason Merrill wrote:
>  On 4/27/21 11:52 AM, Martin Sebor via Gcc-patches wrote:
> > On 4/27/21 8:04 AM, Richard Biener wrote:
> >> On Tue, Apr 27, 2021 at 3:59 PM Martin Sebor 
> >> wrote:
> >>>
> >>> On 4/27/21 1:58 AM, Richard Biener wrote:
>  On Tue, Apr 27, 2021 at 2:46 AM Martin Sebor via Gcc-patches
>   wrote:
> >
> > PR 90904 notes that auto_vec is unsafe to copy and assign
> > because
> > the class manages its own memory but doesn't define (or
> > delete)
> > either special function.  Since I first ran into the
> > problem,
> > auto_vec has grown a move ctor and move assignment from
> > a dynamically-allocated vec but still no copy ctor or copy
> > assignment operator.
> >
> > The attached patch adds the two special functions to
> > auto_vec
> > along
> > with a few simple tests.  It makes auto_vec safe to use in
> > containers
> > that expect copyable and assignable element types and passes
> > bootstrap
> > and regression testing on x86_64-linux.
> 
>  The question is whether we want such uses to appear since
>  those
>  can be quite inefficient?  Thus the option is to delete those
>  operators?
> >>>
> >>> I would strongly prefer the generic vector class to have the
> >>> properties
> >>> expected of any other generic container: copyable and
> >>> assignable.  If
> >>> we also want another vector type with this restriction I
> >>> suggest
> >>> to add
> >>> another "noncopyable" type and make that property explicit in
> >>> its name.
> >>> I can submit one in a followup patch if you think we need one.
> >>
> >> I'm not sure (and not strictly against the copy and assign).
> >> Looking around
> >> I see that vec<> does not do deep copying.  Making
> >> auto_vec<> do it
> >> might be surprising (I added the move capability to match
> >> how vec<>
> >> is used - as "reference" to a vector)
> >
> > The vec base classes are special: they have no ctors at all
> > (because
> > of their use in unions).  That's something we might have to
> > live with
> > but it's not a model to follow in ordinary containers.
> >
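
For readers following the PR 90904 discussion, the hazard and the
proposed fix can be sketched with a toy owning array.  The class
`owned_buf` below is hypothetical and is not GCC's auto_vec; it only
shows what defining the copy constructor and copy assignment as deep
copies looks like for a type that manages its own memory.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical owning array used only for illustration.
class owned_buf
{
public:
  explicit owned_buf (std::size_t n)
    : m_len (n), m_data (new int[n] ()) {}

  ~owned_buf () { delete[] m_data; }

  // Deep copy: the new object gets its own storage.
  owned_buf (const owned_buf &o)
    : m_len (o.m_len), m_data (new int[o.m_len])
  {
    std::copy (o.m_data, o.m_data + o.m_len, m_data);
  }

  // Deep assignment: allocate first, then release the old storage.
  owned_buf &operator= (const owned_buf &o)
  {
    if (this != &o)
      {
        int *fresh = new int[o.m_len];
        std::copy (o.m_data, o.m_data + o.m_len, fresh);
        delete[] m_data;
        m_data = fresh;
        m_len = o.m_len;
      }
    return *this;
  }

  int &operator[] (std::size_t i) { return m_data[i]; }
  std::size_t length () const { return m_len; }

private:
  std::size_t m_len;
  int *m_data;
};
```

Without the two user-defined functions, the implicitly generated copies
would share one pointer and both destructors would free it.  The
alternative raised in the thread is to declare both functions `= delete`
so that such copies do not compile at all.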

Re: [RFC, PATCH] Allow means for targets to out out of CTF/BTF support

2021-07-12 Thread Richard Biener via Gcc-patches
On Fri, Jul 9, 2021 at 12:23 AM Indu Bhagat via Gcc-patches
 wrote:
>
> Hello,
>
> It was brought up when discussing PR debug/101283 (Several tests fail on
> Darwin with -gctf/gbtf) that it will be good to provide means for targets to
> opt out of CTF/BTF support.
>
> By and large, it seems to me that CTF/BTF debug formats can be safely enabled
> for all ELF-based targets by default in GCC.
>
> So, at a high level:
>   - By default, CTF/BTF formats can be enabled for all ELF-based targets.
>   - By default, CTF/BTF formats can be disabled for all non ELF-based targets.
>   - If the user passed a -gctf but CTF is not enabled for the target, GCC
>   issues an error to the user (as is done currently with other debug formats) 
> -
>   "target system does not support the 'ctf' debug format".
>
> This is a makeshift patch which fulfills the above requirements and is based 
> on
> the approach taken for DWARF via DWARF2_DEBUGGING_INFO (I still have to see if
> I need some specific handling in common_handle_option in opts.c). On minimal
> testing, the patch works as desired on x86_64-pc-linux-gnu and a darwin-based
> target.
>
> My question is - Looking around in config.gcc etc., it seems defining in 
> elfos.h
> gives targets/platforms means to override it by virtue of the recommended 
> order
> of # includes in $tm_file. What I cannot say for certain is if this is true in
> practice ? On first look, I believe this could work fine. What do you think ?
>
> If you think this approach could work, I will continue on this track and
> test/refine the patch.

I think it looks reasonable.  Note that target macros need to be documented
in tm.texi - I think while we generally do not want new target macros but
hooks this case can be an exception for consistency purposes (unless
somebody else strongly disagrees).
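
A minimal sketch of the macro-based opt-in follows, assuming the macro
is simply defined or not by the target headers.  The helper
`ctf_request_error` is made up for illustration and is not the toplev.c
code from the patch; only the macro name and the diagnostic text come
from the mail above.

```cpp
#include <string>

// Assumption: an ELF target header provides this definition; a target
// that opts out would leave the macro undefined.
#define CTF_DEBUGGING_INFO 1

// Hypothetical helper: returns the diagnostic to issue for a -gctf
// request, or an empty string when no error is needed.
std::string
ctf_request_error (bool user_asked_for_ctf)
{
  if (!user_asked_for_ctf)
    return "";
#ifdef CTF_DEBUGGING_INFO
  return "";
#else
  return "target system does not support the 'ctf' debug format";
#endif
}
```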

Thanks,
Richard.

> Thanks
> Indu
>
> -
>
> gcc/ChangeLog:
>
> * config/elfos.h (CTF_DEBUGGING_INFO): New definition.
> (BTF_DEBUGGING_INFO): Likewise.
> * toplev.c: Guard initialization of debug hooks.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/debug/btf/btf.exp: Do not run BTF testsuite if target does 
> not
> support BTF format.
> * gcc.dg/debug/ctf/ctf.exp: Do not run CTF testsuite if target does 
> not
> support CTF format.
> ---
>  gcc/config/elfos.h |  8 
>  gcc/testsuite/gcc.dg/debug/btf/btf.exp | 11 +--
>  gcc/testsuite/gcc.dg/debug/ctf/ctf.exp | 11 +--
>  gcc/toplev.c   | 11 +--
>  4 files changed, 35 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/elfos.h b/gcc/config/elfos.h
> index 7a736cc..e5cb487 100644
> --- a/gcc/config/elfos.h
> +++ b/gcc/config/elfos.h
> @@ -68,6 +68,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>
>  #define DWARF2_DEBUGGING_INFO 1
>
> +/* All ELF targets can support CTF.  */
> +
> +#define CTF_DEBUGGING_INFO 1
> +
> +/* All ELF targets can support BTF.  */
> +
> +#define BTF_DEBUGGING_INFO 1
> +
>  /* The GNU tools operate better with dwarf2, and it is required by some
> psABI's.  Since we don't have any native tools to be compatible with,
> default to dwarf2.  */
> diff --git a/gcc/testsuite/gcc.dg/debug/btf/btf.exp 
> b/gcc/testsuite/gcc.dg/debug/btf/btf.exp
> index e173515..a3e680c 100644
> --- a/gcc/testsuite/gcc.dg/debug/btf/btf.exp
> +++ b/gcc/testsuite/gcc.dg/debug/btf/btf.exp
> @@ -39,8 +39,15 @@ if ![info exists DEFAULT_CFLAGS] then {
>  dg-init
>
>  # Main loop.
> -dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] ]] \
> -   "" $DEFAULT_CFLAGS
> +set comp_output [gcc_target_compile \
> +"$srcdir/$subdir/../trivial.c" "trivial.S" assembly \
> +"additional_flags=-gbtf"]
> +if { ! [string match "*: target system does not support the * debug format*" 
> \
> +$comp_output] } {
> +remove-build-file "trivial.S"
> +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] ]] \
> +   "" $DEFAULT_CFLAGS
> +}
>
>  # All done.
>  dg-finish
> diff --git a/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp 
> b/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp
> index 0b650ed..c53cd8b 100644
> --- a/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp
> +++ b/gcc/testsuite/gcc.dg/debug/ctf/ctf.exp
> @@ -39,8 +39,15 @@ if ![info exists DEFAULT_CFLAGS] then {
>  dg-init
>
>  # Main loop.
> -dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] ]] \
> -   "" $DEFAULT_CFLAGS
> +set comp_output [gcc_target_compile \
> +"$srcdir/$subdir/../trivial.c" "trivial.S" assembly \
> +"additional_flags=-gctf"]
> +if { ! [string match "*: target system does not support the * debug format*" 
> \
> +$comp_output] } {
> +remove-build-file "trivial.S"
> +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] ]] \
> +   "" $DEFAULT_CFLAGS
> +}
>
>  # All done.
>  dg-finish
> diff --git a/gcc/toplev.c b/gcc/toplev.c
> index 43f1f7d..8103812 100644
> --- a/gcc/t

Re: [PATCH] [PHIOPT/MATCH] Remove the statement to move if not used

2021-07-12 Thread Richard Biener via Gcc-patches
On Fri, Jul 9, 2021 at 10:05 PM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> Instead of waiting for DCE to remove the unused statement,
> and maybe optimize another conditional, it is better if
> we don't move the statement and have the statement
> removed.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.

Thanks,
Richard.

> Changes from v1:
> * v2: Change the order of insertion and check to see if the lhs
>   is used rather than see if the lhs was used in the sequence.
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.c (match_simplify_replacement): Move
> insert of the sequence before the movement of the
> statement. Check to see if the statement is used
> outside of the original phi to see if we should move it.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr96928-1.c: Update to similar as pr96928.c.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr96928-1.c |  5 -
>  gcc/tree-ssa-phiopt.c | 13 ++---
>  2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr96928-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr96928-1.c
> index 2e86620da11..9e505ac9900 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr96928-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr96928-1.c
> @@ -2,7 +2,10 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -fdump-tree-phiopt2 -fdump-tree-optimized" } */
>  /* { dg-final { scan-tree-dump-times " = a_\[0-9]*\\\(D\\\) >> " 5 "phiopt2" 
> } } */
> -/* { dg-final { scan-tree-dump-times " = ~c_\[0-9]*\\\(D\\\);" 1 "phiopt2" } 
> } */
> +/* The following check is done at optimized because a ^ (~b) is rewritten as 
> ~(a^b)
> +   and in the case of match.pd optimizing these ?:, the ~ is moved out 
> already
> +   by the time we get to phiopt2. */
> +/* { dg-final { scan-tree-dump-times "c_\[0-9]*\\\(D\\\) \\\^" 1 "optimized" 
> } } */
>  /* { dg-final { scan-tree-dump-times " = ~" 1 "optimized" } } */
>  /* { dg-final { scan-tree-dump-times " = \[abc_0-9\\\(\\\)D]* \\\^ " 5 
> "phiopt2" } } */
>  /* { dg-final { scan-tree-dump-not "a < 0" "phiopt2" } } */
> diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
> index 7a98b7afdf1..c6adbbd28a0 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -1020,7 +1020,16 @@ match_simplify_replacement (basic_block cond_bb, 
> basic_block middle_bb,
>  return false;
>
>gsi = gsi_last_bb (cond_bb);
> -  if (stmt_to_move)
> +  /* Insert the sequence generated from gimple_simplify_phiopt.  */
> +  if (seq)
> +gsi_insert_seq_before (&gsi, seq, GSI_CONTINUE_LINKING);
> +
> +  /* If there was a statement to move and the result of the statement
> + is going to be used, move it to right before the original
> + conditional.  */
> +  if (stmt_to_move
> +  && (gimple_assign_lhs (stmt_to_move) == result
> + || !has_single_use (gimple_assign_lhs (stmt_to_move
>  {
>if (dump_file && (dump_flags & TDF_DETAILS))
> {
> @@ -1032,8 +1041,6 @@ match_simplify_replacement (basic_block cond_bb, 
> basic_block middle_bb,
>gsi_move_before (&gsi1, &gsi);
>reset_flow_sensitive_info (gimple_assign_lhs (stmt_to_move));
>  }
> -  if (seq)
> -gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT);
>
>replace_phi_edge_with_variable (cond_bb, e1, phi, result);
>
> --
> 2.27.0
>


Re: [PATCH take 2] PR tree-optimization/38943: Preserve trapping instructions with -fpreserve-traps

2021-07-12 Thread Richard Biener via Gcc-patches
On Sat, Jul 10, 2021 at 9:26 AM Roger Sayle  wrote:
>
>
> Hi Richard and Eric,
> Of course, you're both completely right.  Rather than argue that
> -fnon-call-exceptions without -fexceptions (and without
> -fdelete-dead-exceptions) has some implicit undocumented semantics,
> trapping instructions should be completely orthogonal to exception
> handling.
>
> This patch adds a new code generation option -fpreserve-traps, the
> (obvious) semantics of which is demonstrated by the expanded test
> case below.  The current behaviour of gcc is to eliminate calls
> to may_trap_1, may_trap_2, may_trap_3 etc. from foo, but these are
> all retained with -fpreserve-traps.
>
> Historically, the semantics of -fnon-call-exceptions vs. traps has
> been widely misunderstood, with different levels of optimization
> producing different outcomes, as shown by the impressive list of PRs
> affected by this solution.  Hopefully, this new documentation will
> clarify things.

I've reviewed the cited PRs and most of them would have
individual fixes and are not fixed by your patch, though
-fpreserve-traps would offer a workaround in some cases.

Now, -fpreserve-traps follows the unfortunate precedent of
tying IL semantics to a (global) flag rather than to individual
stmts.  I'm not sure -fpreserve-traps is something good to offer
since on its own it looks not too useful and for making use of
it one still needs -fnon-call-exceptions [-fexceptions].
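
For context, the kind of code under discussion looks roughly like the
sketch below.  The function names are made up and this is not the test
case from the patch.  Whether a given GCC version actually deletes the
call depends on the optimization level and flags in use.

```cpp
// The quotient is never used, so the only observable effect of the
// division would be a trap when DIVISOR is zero.
void
maybe_trap (int value, int divisor)
{
  int unused = value / divisor;
  (void) unused;
}

// With the result unused, an optimizer that does not treat a possible
// trap as a side effect is free to drop the call as dead code.
int
caller (int value, int divisor)
{
  maybe_trap (value, divisor);
  return value;
}
```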

There's still the open question what -fnon-call-exceptions on its
own should do - IMHO it doesn't make sense to allow unwinding
from a trapping memory reference but not from the call it resides
in which means -fnon-call-exceptions should better enable
-fexceptions?

There are also valid points made in some of the PRs (some of which
look like dups of each other) that an asm with memory operands
should be trapping and thus throwing with -fnon-call-exceptions
even when it is not volatile and that some builtin functions like
memcpy should not be nothrow with -fnon-call-exceptions.

There are const-correctness pieces in your patch - those are OK
under the obvious rule and you might want to push them separately.

Thanks,
Richard.

>
> This patch has been tested on x86_64-pc-linux-gnu with a "make
> bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-07-09  Roger Sayle  
> Eric Botcazou  
> Richard Biener  
>
> gcc/ChangeLog
> PR tree-optimization/38943
> PR middle-end/39801
> PR middle-end/64711
> PR target/70387
> PR tree-optimization/94357
> * common.opt (fpreserve-traps): New code generation option.
> * doc/invoke.texi (-fpreserve-traps): Document new option.
> * gimple.c (gimple_has_side_effects): Consider trapping to
> be a side-effect when -fpreserve-traps is specified.
> (gimple_could_trap_p_1):  Make S argument a "const gimple*".
> Preserve constness in call to gimple_asm_volatile_p.
> (gimple_could_trap_p): Make S argument a "const gimple*".
> * gimple.h (gimple_could_trap_p_1, gimple_could_trap_p):
> Update function prototypes.
> * ipa-pure-const.c (check_stmt): When preserving traps,
> a trapping statement should be considered a side-effect,
> so the function is neither const nor pure.
>
> gcc/testsuite/ChangeLog
> PR tree-optimization/38943
> PR middle-end/39801
> PR middle-end/64711
> PR target/70387
> PR tree-optimization/94357
> * gcc.dg/pr38943.c: New test case.
>
> --
> Roger Sayle, PhD.
> CEO and founder
> NextMove Software Limited
> Registered in England No. 07588305
> Registered Office: Innovation Centre, 320 Cambridge Science Park, Cambridge, 
> CB4 0WG
>
> -Original Message-
> From: Richard Biener 
> Sent: 08 July 2021 11:19
> To: Roger Sayle ; Eric Botcazou 
> 
> Cc: GCC Patches 
> Subject: Re: [PATCH] PR tree-optimization/38943: Preserve trapping 
> instructions with -fnon-call-exceptions
>
> On Thu, Jul 8, 2021 at 11:54 AM Roger Sayle  
> wrote:
> >
> >
> > This patch addresses PR tree-optimization/38943 where gcc may optimize
> > away trapping instructions even when -fnon-call-exceptions is specified.
> > Interestingly this only affects the C compiler (when -fexceptions is
> > not
> > specified) as g++ (or -fexceptions) supports C++-style exception
> > handling, where -fnon-call-exceptions triggers the stmt_could_throw_p 
> > machinery.
> > Without -fexceptions, trapping instructions aren't always considered
> > visible side-effects.
>
> But -fnon-call-exceptions without -fexceptions doesn't make much sense, does 
> it?  I see the testcase behaves correctly when -fexceptions is also specified.
>
> The call vanishes in DCE because stmt_could_throw_p starts with
>
> bool
> stmt_could_throw_p (function *fun, gimple *stmt) {
>   if (!flag_exceptions)
> return false;
>
> the documentation of -fnon-call-exceptions says
>
> Generate

Re: [PATCH] move the (a-b) CMP 0 ? (a-b) : (b-a) optimization from fold_cond_expr_with_comparison to match

2021-07-12 Thread Richard Biener via Gcc-patches
On Sun, Jul 11, 2021 at 4:12 AM apinski--- via Gcc-patches
 wrote:
>
> From: Andrew Pinski 
>
> This patch moves the (a-b) CMP 0 ? (a-b) : (b-a) optimization
> from fold_cond_expr_with_comparison to match.

So I searched and I guess these transforms are produced from

  /* If we have A op 0 ? A : -A, consider applying the following
 transformations:

 A == 0? A : -Asame as -A
 A != 0? A : -Asame as A
 A >= 0? A : -Asame as abs (A)
 A > 0?  A : -Asame as abs (A)
 A <= 0? A : -Asame as -abs (A)
 A < 0?  A : -Asame as -abs (A)

 None of these transformations work for modes with signed
 zeros.  If A is +/-0, the first two transformations will
 change the sign of the result (from +0 to -0, or vice
 versa).  The last four will fix the sign of the result,
 even though the original expressions could be positive or
 negative, depending on the sign of A.

 Note that all these transformations are correct if A is
 NaN, since the two alternatives (A and -A) are also NaNs.  */
  if (!HONOR_SIGNED_ZEROS (type)
  && (FLOAT_TYPE_P (TREE_TYPE (arg01))
  ? real_zerop (arg01)
  : integer_zerop (arg01))
  && ((TREE_CODE (arg2) == NEGATE_EXPR
   && operand_equal_p (TREE_OPERAND (arg2, 0), arg1, 0))
 /* In the case that A is of the form X-Y, '-A' (arg2) may
have already been folded to Y-X, check for that. */
  || (TREE_CODE (arg1) == MINUS_EXPR
  && TREE_CODE (arg2) == MINUS_EXPR
  && operand_equal_p (TREE_OPERAND (arg1, 0),
  TREE_OPERAND (arg2, 1), 0)
  && operand_equal_p (TREE_OPERAND (arg1, 1),
  TREE_OPERAND (arg2, 0), 0
...

I wonder at which point we can remove the code from fold-const.c?
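
As a sanity check of the integer case, the four families of
transformations can be written out as plain functions and compared.
This is only a sketch with made-up function names; it says nothing
about floating point or signed zeros, and it avoids inputs where a - b
would overflow.

```cpp
#include <cstdlib>

// Conditional forms, written as (A - B) CMP 0 ? (A - B) : (B - A).
int cond_eq (int a, int b) { return (a - b) == 0 ? (a - b) : (b - a); }
int cond_ne (int a, int b) { return (a - b) != 0 ? (a - b) : (b - a); }
int cond_ge (int a, int b) { return (a - b) >= 0 ? (a - b) : (b - a); }
int cond_lt (int a, int b) { return (a - b) < 0 ? (a - b) : (b - a); }

// The forms they are expected to simplify to.
int folded_eq (int a, int b) { return b - a; }
int folded_ne (int a, int b) { return a - b; }
int folded_ge (int a, int b) { return std::abs (a - b); }
int folded_lt (int a, int b) { return -std::abs (a - b); }
```

Comparing each pair over a small range of a and b is enough to see that
the conditional and folded forms agree for ordinary integers.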

Some comments inline below.

> OK? Bootstrapped and tested on x86_64-linux-gnu.
>
> gcc/ChangeLog:
>
> * match.pd ((A-B) CMP 0 ? (A-B) : (B - A)):
> New patterns.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/phi-opt-25.c: New test.
> ---
>  gcc/match.pd   | 48 --
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25.c | 45 
>  2 files changed, 90 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 30680d488ab..aa88381fdcb 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4040,9 +4040,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(cnd (logical_inverted_value truth_valued_p@0) @1 @2)
>(cnd @0 @2 @1)))
>
> -/* abs/negative simplifications moved from fold_cond_expr_with_comparison,
> -   Need to handle (A - B) case as fold_cond_expr_with_comparison does.
> -   Need to handle UN* comparisons.
> +/* abs/negative simplifications moved from fold_cond_expr_with_comparison.
>
> None of these transformations work for modes with signed
> zeros.  If A is +/-0, the first two transformations will
> @@ -4098,6 +4096,50 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (convert (negate (absu:utype @0
> (negate (abs @0)
>   )
> +
> + /* (A - B) == 0 ? (A - B) : (B - A)same as (B - A) */
> + (for cmp (eq uneq)
> +  (simplify
> +   (cnd (cmp (minus@0 @1 @2) zerop) @0 (minus@3 @2 @1))
> +(if (!HONOR_SIGNED_ZEROS (type))
> + @3))
> +  (simplify
> +   (cnd (cmp (minus@0 @1 @2) zerop) integer_zerop (minus@3 @2 @1))

So that makes me think why integer_zerop?  'type' should then be
integer and thus never HONOR_SIGNED_ZEROS.

Don't we also need the inverted condition case for completeness?

> +(if (!HONOR_SIGNED_ZEROS (type))
> + @3))
> +  (simplify
> +   (cnd (cmp @1 @2) integer_zerop (minus@3 @2 @1))

I think this needs to be (cmp:c @1 @2)

> +(if (!HONOR_SIGNED_ZEROS (type))
> + @3))
> + )
> + /* (A - B) != 0 ? (A - B) : (B - A)same as (A - B) */
> + (for cmp (ne ltgt)
> +  (simplify
> +   (cnd (cmp (minus@0 @1 @2) zerop) @0 (minus @2 @1))
> +(if (!HONOR_SIGNED_ZEROS (type))
> + @0))
> + )
> + /* (A - B) >=/> 0 ? (A - B) : (B - A)same as abs (A - B) */
> + (for cmp (ge gt)
> +  (simplify
> +   (cnd (cmp (minus@0 @1 @2) zerop) @0 (minus @2 @1))
> +(if (!HONOR_SIGNED_ZEROS (type)
> +&& !TYPE_UNSIGNED (type))
> + (abs @0
> + /* (A - B) <=/< 0 ? (A - B) : (B - A)same as -abs (A - B) */
> + (for cmp (le lt)
> +  (simplify
> +   (cnd (cmp (minus@0 @1 @2) zerop) @0 (minus @2 @1))
> +(if (!HONOR_SIGNED_ZEROS (type)
> +&& !TYPE_UNSIGNED (type))
> + (if (ANY_INTEGRAL_TYPE_P (type)
> + && !TYPE_OVERFLOW_WRAPS (type))
> +  (with {
> +   tree utype = unsigned_type_for (type);
> +   }
> +   (convert (negate (absu:utype @0
> +   (negate (abs @0)
> + )
>  )
>
>  /* -(type)!A -> (type)A - 1.  */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt

Re: [PATCH v3 1/2] Add -f[no-]direct-extern-access

2021-07-12 Thread H.J. Lu via Gcc-patches
On Sun, Jul 11, 2021 at 11:13 PM Richard Biener
 wrote:
>
> On Fri, Jul 9, 2021 at 4:50 PM H.J. Lu  wrote:
> >
> > -fdirect-extern-access is the default.  With -fno-direct-extern-access:
> >
> > 1. Always use GOT to access undefined data and function symbols,
> >including in PIE and non-PIE.  These will avoid copy relocations
> >in executables.  This is compatible with existing executables and
> >shared libraries.
> > 2. In executable and shared library, bind symbols with the STV_PROTECTED
> >visibility locally:
> >a. The address of data symbol is the address of data body.
> >b. For systems without function descriptor, the function pointer is
> >   the address of function body.
> >    c. The resulting shared libraries may be incompatible with
> >   executables which have copy relocations on protected symbols or
> >   use executable PLT entries as function addresses for protected
> >   functions in shared libraries.
> > 3. Update asm_preferred_eh_data_format to select PC relative EH encoding
> > format with -fno-direct-extern-access to avoid copy relocation.
> > 4. Add ix86_reloc_rw_mask for TARGET_ASM_RELOC_RW_MASK to avoid copy
> > relocation with -fno-direct-extern-access.
>
> Did you check how relocations in .debug_info behave?  I don't remember whether

Yes, I did.   I added ix86_reloc_rw_mask and use PC-relative format for
EH pointer encodings to avoid copy relocation for -fno-direct-extern-access
in read-only sections.

> we're doing anything special there or if we just copy how we emit
> relocs in .text
>
> Richard.
>
> > gcc/
> >
> > PR target/35513
> > PR target/100593
> > * common.opt: Add -fdirect-extern-access.
> > * config/i386/i386-protos.h (ix86_force_load_from_GOT_p): Add a
> > bool argument.
> > * config/i386/i386.c (ix86_force_load_from_GOT_p): Add a bool
> > argument to indicate call operand.  Force non-call load
> > from GOT for -fno-direct-extern-access.
> > (legitimate_pic_address_disp_p): Avoid copy relocation in PIE
> > for -fno-direct-extern-access.
> > (ix86_print_operand): Pass true to ix86_force_load_from_GOT_p
> > for call operand.
> > (asm_preferred_eh_data_format): Use PC-relative format for
> > -fno-direct-extern-access to avoid copy relocation.  Check
> > ptr_mode instead of TARGET_64BIT when selecting DW_EH_PE_sdata4.
> > (ix86_binds_local_p): Don't treat protected data as extern and
> > avoid copy relocation on common symbol with
> > -fno-direct-extern-access.
> > (ix86_reloc_rw_mask): New to avoid copy relocation for
> > -fno-direct-extern-access.
> > (TARGET_ASM_RELOC_RW_MASK): New.
> > * doc/invoke.texi: Document -f[no-]direct-extern-access.
> >
> > gcc/testsuite/
> >
> > PR target/35513
> > PR target/100593
> > * g++.dg/pr35513-1.C: New file.
> > * g++.dg/pr35513-2.C: Likewise.
> > * gcc.target/i386/pr35513-1.c: Likewise.
> > * gcc.target/i386/pr35513-2.c: Likewise.
> > * gcc.target/i386/pr35513-3.c: Likewise.
> > * gcc.target/i386/pr35513-4.c: Likewise.
> > * gcc.target/i386/pr35513-5.c: Likewise.
> > * gcc.target/i386/pr35513-6.c: Likewise.
> > * gcc.target/i386/pr35513-7.c: Likewise.
> > * gcc.target/i386/pr35513-8.c: Likewise.
> > ---
> >  gcc/common.opt|  4 ++
> >  gcc/config/i386/i386-protos.h |  2 +-
> >  gcc/config/i386/i386.c| 50 +++--
> >  gcc/doc/invoke.texi   | 13 ++
> >  gcc/testsuite/g++.dg/pr35513-1.C  | 25 +++
> >  gcc/testsuite/g++.dg/pr35513-2.C  | 53 +++
> >  gcc/testsuite/gcc.target/i386/pr35513-1.c | 16 +++
> >  gcc/testsuite/gcc.target/i386/pr35513-2.c | 15 +++
> >  gcc/testsuite/gcc.target/i386/pr35513-3.c | 15 +++
> >  gcc/testsuite/gcc.target/i386/pr35513-4.c | 15 +++
> >  gcc/testsuite/gcc.target/i386/pr35513-5.c | 15 +++
> >  gcc/testsuite/gcc.target/i386/pr35513-6.c | 14 ++
> >  gcc/testsuite/gcc.target/i386/pr35513-7.c | 15 +++
> >  gcc/testsuite/gcc.target/i386/pr35513-8.c | 41 ++
> >  14 files changed, 278 insertions(+), 15 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/pr35513-1.C
> >  create mode 100644 gcc/testsuite/g++.dg/pr35513-2.C
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-5.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-6.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr35513-7.c
> >  create mode 100644 gcc/testsuite/gcc.

PING^1 [PATCH v5 00/11] Allow TImode/OImode/XImode in op_by_pieces operations

2021-07-12 Thread H.J. Lu via Gcc-patches
On Thu, Jul 1, 2021 at 8:22 AM H.J. Lu  wrote:
>
> Changes in the v5 patches:
>
> 1. Add TARGET_GEN_MEMSET_SCRATCH_RTX to allow the backend to use a hard
> scratch register to avoid stack realignment when expanding memset.
> 2. Use vec_duplicate, instead of adding TARGET_READ_MEMSET_VALUE and
> TARGET_GEN_MEMSET_VALUE, to expand memset if available.
>
> Changes in the v4 patches:
>
> 1. Define x86 MAX_MOVE_MAX to 64, which is the constant maximum number
> of bytes that a single instruction can move quickly between memory and
> registers or between two memory locations.
> 2. Define x86 MOVE_MAX to MOVE_MAX_PIECES, which is the maximum number of
> bytes we can move from memory to memory in one reasonably fast instruction.
> The difference between MAX_MOVE_MAX and MOVE_MAX is that MAX_MOVE_MAX
> must be a constant, independent of compiler options, since it is used in
> reload.h to define struct target_reload and MOVE_MAX can vary, depending
> on compiler options.
>
> Changes in the v3 patches:
>
> 1. Split the TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE changes
> into the generic part and the x86 part.
>
>
> 1. Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
> target instructions to duplicate QImode value to TImode/OImode/XImode
> value for memset.
> 2. x86: Avoid stack realignment when copying data
> 3. x86: Remove MAX_BITSIZE_MODE_ANY_INT.  Only x86 backend defines it.
> 4. x86: Use TImode/OImode/XImode integers for piecewise move and store.
> 5. x86: Add tests for TImode/OImode/XImode for piecewise move and store.
> 6. x86: Adjust existing tests.
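
The idea of duplicating a QImode (single byte) value into a wider value for memset can be sketched in portable C. This is only an illustration under stated assumptions, not the code the patches generate: broadcast_byte and fill32 are hypothetical names, and a 64-bit word stands in for the TImode/OImode/XImode vector registers the series targets.

```c
#include <stdint.h>
#include <string.h>

/* Replicate one byte into all eight bytes of a 64-bit word by
   multiplying with a constant that has 0x01 in every byte.  */
uint64_t
broadcast_byte (unsigned char c)
{
  return (uint64_t) c * UINT64_C (0x0101010101010101);
}

/* Hypothetical helper: fill a 32-byte buffer with four word-sized
   stores instead of 32 byte-sized ones.  A compiler expanding memset
   by pieces may use even wider stores where the target has them.  */
void
fill32 (unsigned char *dst, unsigned char c)
{
  uint64_t word = broadcast_byte (c);
  for (int i = 0; i < 4; ++i)
    memcpy (dst + 8 * i, &word, sizeof word);
}
```

The multiplier is the same 0x101010101010101 constant that shows up as a movabs immediate when a byte is replicated in a general-purpose register.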
>
> On x86-64, SPEC CPU 2017 performance impact is neutral.  Glibc code size
> differences with -O2 build are:
>
>            Before   After
> libc.so   1906572  1906444
>
> Some code sequence differences in libc.so are:
>
> :
> ...
> jne                              | jne
> test   %r15,%r15                   test   %r15,%r15
> je                               | je
> mov    %r13d,(%r14)                mov    %r13d,(%r14)
> lea    0x10(%r14),%rdi             lea    0x10(%r14),%rdi
> mov    $0x1,%ecx                   mov    $0x1,%ecx
> mov    %r13d,%edx                  mov    %r13d,%edx
> mov    %r15,0x40(%r12)             mov    %r15,0x40(%r12)
> mov    %r15,%rsi                   mov    %r15,%rsi
> call                               call
> lea    0xa2f9b(%rip),%rax   #    | lea    0xa2fab(%rip),%rax   #
> xor    %esi,%esi                   xor    %esi,%esi
> mov    %ebp,%edi                   mov    %ebp,%edi
> mov    %rax,0x8(%r12)              mov    %rax,0x8(%r12)
> movzwl 0x12(%rsp),%eax             movzwl 0x12(%rsp),%eax
> mov    $0x8,%edx                 <
> lea    0xc(%rsp),%rcx              lea    0xc(%rsp),%rcx
> mov    %r14,0x48(%r12)           <
> add    $0x40,%r14                <
> mov    $0x4,%r8d                   mov    $0x4,%r8d
>                                  > movq   $0x0,0x1d0(%r14)
>                                  > mov    $0x8,%edx
> rol    $0x8,%ax                    rol    $0x8,%ax
> mov    %ebp,(%r12)               | mov    %r14,0x48(%r12)
> movq   $0x0,0x190(%r14)          | add    $0x40,%r14
> mov    %ax,0x4(%r12)             <
> mov    %r14,0x30(%r12)             mov    %r14,0x30(%r12)
>                                  > mov    %ax,0x4(%r12)
>                                  > mov    %ebp,(%r12)
> movl   $0x1,0xc(%rsp)              movl   $0x1,0xc(%rsp)
> call                               call
> mov    %r12,%rdi                   mov    %r12,%rdi
> movabs $0x101010101010101,%rdx   <
> test   %eax,%eax                   test   %eax,%eax
> mov    $0xff,%eax                  mov    $0xff,%eax
> cmove  %eax,%ebx

Re: [PATCH take 2] PR tree-optimization/38943: Preserve trapping instructions with -fpreserve-traps

2021-07-12 Thread Eric Botcazou
> There's still the open question what -fnon-call-exceptions on its
> own should do - IMHO it doesn't make sense to allow unwinding
> from a trapping memory reference but not from the call it resides
> in which means -fnon-call-exceptions should better enable
> -fexceptions?

Or issue a warning that it requires -fexceptions if the latter is not enabled?

-- 
Eric Botcazou




RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-07-12 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Sandiford 
> Sent: Monday, July 12, 2021 11:26 AM
> To: Tamar Christina 
> Cc: Richard Biener ; nd ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Monday, July 12, 2021 10:39 AM
> >> To: Tamar Christina 
> >> Cc: Richard Biener ; nd ; gcc-
> >> patc...@gcc.gnu.org
> >> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
> >> where the sign for the multiplicant changes.
> >>
> >> Tamar Christina  writes:
> >> > Hi,
> >> >
> >> >> Richard Sandiford  writes:
> >> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> >> >> >>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
> >> >> >>       inside the loop (in case we are analyzing an outer-loop).  */
> >> >> >>    vect_unpromoted_value unprom0[2];
> >> >> >> +  enum optab_subtype subtype = optab_vector;
> >> >> >>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR, WIDEN_MULT_EXPR,
> >> >> >> -                             false, 2, unprom0, &half_type))
> >> >> >> +                             false, 2, unprom0, &half_type, &subtype))
> >> >> >> +    return NULL;
> >> >> >> +
> >> >> >> +  if (subtype == optab_vector_mixed_sign
> >> >> >> +      && TYPE_UNSIGNED (unprom_mult.type)
> >> >> >> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION (unprom_mult.type))
> >> >> >>      return NULL;
> >> >> >
> >> >> > Isn't the final condition here instead that TYPE1 is narrower than TYPE2?
> >> >> > I.e. we need to reject the case in which we multiply a signed
> >> >> > and an unsigned value to get a (logically) signed result, but
> >> >> > then zero-extend it (rather than sign-extend it) to the
> >> >> > precision of the addition.
> >> >> >
> >> >> > That would make the test:
> >> >> >
> >> >> >   if (subtype == optab_vector_mixed_sign
> >> >> >       && TYPE_UNSIGNED (unprom_mult.type)
> >> >> >       && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION (type))
> >> >> >     return NULL;
> >> >> >
> >> >> > instead.
> >> >>
> >> >> And folding that into the existing test gives:
> >> >>
> >> >>   /* If there are two widening operations, make sure they agree on the sign
> >> >>      of the extension.  The result of an optab_vector_mixed_sign operation
> >> >>      is signed; otherwise, the result has the same sign as the operands.  */
> >> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
> >> >>       && (subtype == optab_vector_mixed_sign
> >> >>           ? TYPE_UNSIGNED (unprom_mult.type)
> >> >>           : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
> >> >>     return NULL;
> >> >>
> >> >
> >> > I went with the first one which doesn't add the extra constraints
> >> > for the normal dotproduct as that makes it too restrictive. It's
> >> > the type of the multiplication that determines the operation so
> >> > dotproduct can be used a bit more than where we currently do.
> >> >
> >> > This was relaxed in an earlier patch.
> >>
> >> I didn't mean that we should add extra constraints to the normal case though.
> >> The existing test I was referring to above was:
> >>
> >>   /* If there are two widening operations, make sure they agree on
> >>      the sign of the extension.  */
> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
> >>       && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
> >>     return NULL;
> >
> > But as I mentioned, this restriction is unneeded and has been removed,
> > hence why it's not in my patchset's diff.
> > It's removed by
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html which
> > Richi conditioned on the rest of these patches being approved.
> >
> > This change needlessly blocks test vect-reduc-dot-[2,3,6,7].c from
> > being dotproducts for instance
> >
> > It's also part of the deficiency between GCC codegen and Clang
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492#c6
> 
> Hmm, OK.  Just removing the check regresses:
> 
> unsigned long __attribute__ ((noipa))
> f (signed short *x, signed short *y)
> {
>   unsigned long res = 0;
>   for (int i = 0; i < 100; ++i)
>     res += (unsigned int) x[i] * (unsigned int) y[i];
>   return res;
> }
> 
> int
> main (void)
> {
>   signed short x[100], y[100];
>   for (int i = 0; i < 100; ++i)
>     {
>       x[i] = -1;
>       y[i] = 1;
>     }
>   if (f (x, y) != 0x6400000000ULL - 100)
>     __builtin_abort ();
>   return 0;
> }
> 
> on SVE.  We then use SDOT even though the result of the multiplication is
> zero- rather than sign-extended to 64 bits.  Does something else in the series
> stop that from happening?
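
The difference between the two extensions can be shown without any vector code. The following is a minimal scalar sketch, not part of the patch series; widen_zero_extend, widen_sign_extend and sum100 are hypothetical names, and a 32-bit int is assumed.

```c
#include <stdint.h>

/* What the C source asks for: the product has an unsigned 32-bit type,
   so widening it to 64 bits zero-extends.  */
uint64_t
widen_zero_extend (int16_t a, int16_t b)
{
  uint32_t prod = (uint32_t) a * (uint32_t) b;
  return (uint64_t) prod;
}

/* What a signed dot-product computes: the product is treated as a
   signed value, so widening it to 64 bits sign-extends.  */
uint64_t
widen_sign_extend (int16_t a, int16_t b)
{
  int32_t prod = (int32_t) a * (int32_t) b;
  return (uint64_t) (int64_t) prod;
}

/* Accumulate 100 widened products, as the reduction loop does.  */
uint64_t
sum100 (uint64_t (*widen) (int16_t, int16_t), int16_t a, int16_t b)
{
  uint64_t res = 0;
  for (int i = 0; i < 100; ++i)
    res += widen (a, b);
  return res;
}
```

With a = -1 and b = 1 the zero-extending sum is 100 * 0xffffffff = 0x63ffffff9c, while the sign-extending sum wraps to 0xffffffffffffff9c, so the two forms are not interchangeable once the accumulator is wider than the product.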

No, and I hadn't noticed it before because it looks like the mid-end tests that
are execution tests don't turn

Re: [PATCH take 2] PR tree-optimization/38943: Preserve trapping instructions with -fpreserve-traps

2021-07-12 Thread Richard Biener via Gcc-patches
On Mon, Jul 12, 2021 at 2:22 PM Eric Botcazou  wrote:
>
> > There's still the open question what -fnon-call-exceptions on its
> > own should do - IMHO it doesn't make sense to allow unwinding
> > from a trapping memory reference but not from the call it resides
> > in which means -fnon-call-exceptions should better enable
> > -fexceptions?
>
> Or issue a warning that it requires -fexceptions if the latter is not enabled?

Maybe that as well - I'd just like to avoid having the "undefined" state
flag_non_call_exceptions && ! flag_exceptions in the middle-end.

Well, unless somebody comes up with a good convincing use.

Richard.

> --
> Eric Botcazou
>
>


[Ada] Duplicate Size/Value_Size clause

2021-07-12 Thread Pierre-Marie de Rodat
Give a warning if both Size and Value_Size attributes are specified for
the same type.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch13.adb (Duplicate_Clause): Add a helper routine
Check_One_Attr, with a parameter for the attribute_designator we
are looking for, and one for the attribute_designator of the
current node (which are usually the same). For Size and
Value_Size, call it twice, once for each.
* errout.ads: Fix a typo.

diff --git a/gcc/ada/errout.ads b/gcc/ada/errout.ads
--- a/gcc/ada/errout.ads
+++ b/gcc/ada/errout.ads
@@ -279,7 +279,7 @@ package Errout is
--  The character ? appearing anywhere in a message makes the message
--  warning instead of a normal error message, and the text of the
--  message will be preceded by "warning:" in the normal case. The
-   --  handling of warnings if further controlled by the Warning_Mode
+   --  handling of warnings is further controlled by the Warning_Mode
--  option (-w switch), see package Opt for further details, and also by
--  the current setting from pragma Warnings. This pragma applies only
--  to warnings issued from the semantic phase (not the parser), but


diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -5181,7 +5181,9 @@ package body Sem_Ch13 is
   --  This routine checks if the aspect for U_Ent being given by attribute
   --  definition clause N is for an aspect that has already been specified,
   --  and if so gives an error message. If there is a duplicate, True is
-  --  returned, otherwise if there is no error, False is returned.
+  --  returned, otherwise there is no error, and False is returned. Size
+  --  and Value_Size are considered to conflict, but for compatibility,
+  --  this is merely a warning.
 
   procedure Check_Indexing_Functions;
   --  Check that the function in Constant_Indexing or Variable_Indexing
@@ -6007,7 +6009,47 @@ package body Sem_Ch13 is
   --
 
   function Duplicate_Clause return Boolean is
- A : Node_Id;
+
+ function Check_One_Attr (Attr_1, Attr_2 : Name_Id) return Boolean;
+ --  Check for one attribute; Attr_1 is the attribute_designator we are
+ --  looking for. Attr_2 is the attribute_designator of the current
+ --  node. Normally, this is called just once by Duplicate_Clause, with
+ --  Attr_1 = Attr_2. However, it needs to be called twice for Size and
+ --  Value_Size, because these mean the same thing. For compatibility,
+ --  we allow specifying both Size and Value_Size, but only if the two
+ --  sizes are equal.
+
+ --------------------
+ -- Check_One_Attr --
+ --------------------
+
+ function Check_One_Attr (Attr_1, Attr_2 : Name_Id) return Boolean is
+A : constant Node_Id :=
+  Get_Rep_Item (U_Ent, Attr_1, Check_Parents => False);
+ begin
+if Present (A) then
+   if Attr_1 = Attr_2 then
+  Error_Msg_Name_1 := Attr_1;
+  Error_Msg_Sloc := Sloc (A);
+  Error_Msg_NE ("aspect% for & previously given#", N, U_Ent);
+
+   else
+  pragma Assert (Attr_1 in Name_Size | Name_Value_Size);
+  pragma Assert (Attr_2 in Name_Size | Name_Value_Size);
+
+  Error_Msg_Name_1 := Attr_2;
+  Error_Msg_Name_2 := Attr_1;
+  Error_Msg_Sloc := Sloc (A);
+  Error_Msg_NE ("?% for & conflicts with % #", N, U_Ent);
+   end if;
+
+   return True;
+end if;
+
+return False;
+ end Check_One_Attr;
+
+  --  Start of processing for Duplicate_Clause
 
   begin
  --  Nothing to do if this attribute definition clause comes from
@@ -6019,21 +6061,20 @@ package body Sem_Ch13 is
 return False;
  end if;
 
- --  Otherwise current clause may duplicate previous clause, or a
- --  previously given pragma or aspect specification for the same
- --  aspect.
-
- A := Get_Rep_Item (U_Ent, Chars (N), Check_Parents => False);
+ --  Special cases for Size and Value_Size
 
- if Present (A) then
-Error_Msg_Name_1 := Chars (N);
-Error_Msg_Sloc := Sloc (A);
-
-Error_Msg_NE ("aspect% for & previously given#", N, U_Ent);
+ if (Chars (N) = Name_Size
+   and then Check_One_Attr (Name_Value_Size, Name_Size))
+   or else
+(Chars (N) = Name_Value_Size
+   and then Check_One_Attr (Name_Size, Name_Value_Size))
+ then
 return True;
  end if;
 
- return False;
+ --  Normal case (including Size and Value_Size)
+
+ return Check_One_Attr

[Ada] Add DWARF 5 support to System.Dwarf_Line

2021-07-12 Thread Pierre-Marie de Rodat
The encoding of the debugging line information has substantially changed
in DWARF 5, so this adds the support for it alongside the existing code.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-dwalin.ads: Adjust a few comments left and right.
(Line_Info_Register): Comment out unused components.
(Line_Info_Header): Add DWARF 5 support.
(Dwarf_Context): Likewise.  Rename "prologue" into "header".
* libgnat/s-dwalin.adb: Alphabetize "with" clauses.
(DWARF constants): Add DWARF 5 support and reorder.
(For_Each_Row): Adjust.
(Initialize_Pass): Likewise.
(Initialize_State_Machine): Likewise and fix typo.
(Open): Add DWARF 5 support.
(Parse_Prologue): Rename into...
(Parse_Header): ...this and add DWARF 5 support.
(Read_And_Execute_Isn): Rename into...
(Read_And_Execute_Insn): ...this and adjust.
(To_File_Name): Change parameter name and add DWARF 5 support.
(Read_Entry_Format_Array): New procedure.
(Skip_Form): Add DWARF 5 support and reorder.
(Seek_Abbrev): Do not count entries and add DWARF 5 support.
(Debug_Info_Lookup): Add DWARF 5 support.
(Symbolic_Address.Set_Result): Likewise.
(Symbolic_Address): Adjust.

patch.diff.gz
Description: application/gzip


[Ada] Clean up Uint fields

2021-07-12 Thread Pierre-Marie de Rodat
We add new field types Valid_Uint, Unat, Upos, Nonzero_Uint,
which have predicates that assert the value is a proper
Uint value (i.e. not No_Uint), and that the value is
appropriate. It is not clear that Nonzero_Uint is needed,
but it is useful in testing; we can always remove it later.

We use the new field types for Alignment (which requires
changes) and a few others (which were easy). We intend to
use these for other fields as well. For example, Esize should
be of type Valid_Uint.

Fields of these new subtypes have no default (unlike Uint fields, which
still default to Uint_0); it is required to set them before calling
the getter. This patch fixes various places where that was not true
(so far, mainly for Alignment).

The "unknown" state of Alignment is now represented by the initial zero
value (instead of Uint_0). Unfortunately, we often set Alignment to some
value, and then set it back to unknown, so we need Init_Alignment to
call Reinit_Field_To_Zero. We intend to change other fields, such as
Esize, in a similar way.

Note that "initial zero value" and Uint_0 are two different things.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* uintp.ads, types.h: New subtypes of Uint: Valid_Uint, Unat,
Upos, Nonzero_Uint with predicates. These correspond to new
field types in Gen_IL.
* gen_il-types.ads (Valid_Uint, Unat, Upos, Nonzero_Uint): New
field types.
* einfo-utils.ads, einfo-utils.adb, fe.h (Known_Alignment,
Init_Alignment): Use the initial zero value to represent
"unknown". This will ensure that if Alignment is called before
Set_Alignment, the compiler will blow up (if assertions are
enabled).
* atree.ads, atree.adb, atree.h, gen_il-gen.adb
(Get_Valid_32_Bit_Field): New generic low-level getter for
subtypes of Uint.
(Copy_Alignment): New procedure to copy Alignment field even
when Unknown.
(Init_Object_Size_Align, Init_Size_Align): Do not bypass the
Init_ procedures.
* exp_pakd.adb, freeze.adb, layout.adb, repinfo.adb,
sem_util.adb: Protect calls to Alignment with Known_Alignment.
Use Copy_Alignment when it might be unknown.
* gen_il-gen-gen_entities.adb (Alignment,
String_Literal_Length): Use type Unat instead of Uint, to ensure
that the field is always Set_ before we get it, and that it is
set to a nonnegative value.
(Enumeration_Pos): Unat.
(Enumeration_Rep): Valid_Uint. Can be negative, but must be
valid before fetching.
(Discriminant_Number): Upos.
(Renaming_Map): Remove.
* gen_il-gen-gen_nodes.adb (Char_Literal_Value, Reason): Unat.
(Intval, Corresponding_Integer_Value): Valid_Uint.
* gen_il-internals.ads: New functions for dealing with special
defaults and new subtypes of Uint.
* scans.ads: Correct comments.
* scn.adb (Post_Scan): Do not set Intval to No_Uint; that is no
longer allowed.
* sem_ch13.adb (Analyze_Enumeration_Representation_Clause): Do
not set Enumeration_Rep to No_Uint; that is no longer allowed.
(Offset_Value): Protect calls to Alignment with Known_Alignment.
* sem_prag.adb (Set_Atomic_VFA): Do not use Uint_0 to mean
"unknown"; call Init_Alignment instead.
* sinfo.ads: Minor comment fix.
* treepr.adb: Deal with printing of new field types.
* einfo.ads, gen_il-fields.ads (Renaming_Map): Remove.
* gcc-interface/decl.c (gnat_to_gnu_entity): Use Known_Alignment
before calling Alignment. This preserves some probably buggy
behavior: if the alignment is not set, it previously defaulted
to Uint_0; we now make that explicit.  Use Copy_Alignment,
because "Set_Alignment (Y, Alignment (X));" no longer works when
the Alignment of X has not yet been set.
* gcc-interface/trans.c (process_freeze_entity): Use
Copy_Alignment.

patch.diff.gz
Description: application/gzip


[Ada] Implement support for unconstrained array types with FLB

2021-07-12 Thread Pierre-Marie de Rodat
The fixed lower bound also makes it possible to simplify the formula of
the upper bound used for unconstrained array types.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gcc-interface/decl.c (gnat_to_gnu_entity) : Use a
fixed lower bound if the index subtype is marked so, as well as a
more efficient formula for the upper bound if the array cannot be
superflat.
(flb_cannot_be_superflat): New predicate.
(cannot_be_superflat): Rename into...
(range_cannot_be_superflat): ...this.  Minor tweak.

diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -217,7 +217,8 @@ static void set_reverse_storage_order_on_array_type (tree);
 static bool same_discriminant_p (Entity_Id, Entity_Id);
 static bool array_type_has_nonaliased_component (tree, Entity_Id);
 static bool compile_time_known_address_p (Node_Id);
-static bool cannot_be_superflat (Node_Id);
+static bool flb_cannot_be_superflat (Node_Id);
+static bool range_cannot_be_superflat (Node_Id);
 static bool constructor_address_p (tree);
 static bool allocatable_size_p (tree, bool);
 static bool initial_value_needs_conversion (tree, tree);
@@ -2238,13 +2239,15 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 	 index += (convention_fortran_p ? - 1 : 1),
 	 gnat_index = Next_Index (gnat_index))
 	  {
-	char field_name[16];
+	const bool is_flb
+	  = Is_Fixed_Lower_Bound_Index_Subtype (Etype (gnat_index));
 	tree gnu_index_type = get_unpadded_type (Etype (gnat_index));
 	tree gnu_orig_min = TYPE_MIN_VALUE (gnu_index_type);
 	tree gnu_orig_max = TYPE_MAX_VALUE (gnu_index_type);
 	tree gnu_index_base_type = get_base_type (gnu_index_type);
 	tree gnu_lb_field, gnu_hb_field;
 	tree gnu_min, gnu_max, gnu_high;
+	char field_name[16];
 
 	/* Update the maximum size of the array in elements.  */
 	if (gnu_max_size)
@@ -2278,25 +2281,38 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 
 	/* We can't use build_component_ref here since the template type
 	   isn't complete yet.  */
-	gnu_orig_min = build3 (COMPONENT_REF, TREE_TYPE (gnu_lb_field),
-   gnu_template_reference, gnu_lb_field,
-   NULL_TREE);
+	if (!is_flb)
+	  {
+		gnu_orig_min = build3 (COMPONENT_REF, TREE_TYPE (gnu_lb_field),
+   gnu_template_reference, gnu_lb_field,
+   NULL_TREE);
+		TREE_READONLY (gnu_orig_min) = 1;
+	  }
+
 	gnu_orig_max = build3 (COMPONENT_REF, TREE_TYPE (gnu_hb_field),
    gnu_template_reference, gnu_hb_field,
    NULL_TREE);
-	TREE_READONLY (gnu_orig_min) = TREE_READONLY (gnu_orig_max) = 1;
+	TREE_READONLY (gnu_orig_max) = 1;
 
 	gnu_min = convert (sizetype, gnu_orig_min);
 	gnu_max = convert (sizetype, gnu_orig_max);
 
 	/* Compute the size of this dimension.  See the E_Array_Subtype
 	   case below for the rationale.  */
-	gnu_high
-	  = build3 (COND_EXPR, sizetype,
-			build2 (GE_EXPR, boolean_type_node,
-gnu_orig_max, gnu_orig_min),
-			gnu_max,
-			size_binop (MINUS_EXPR, gnu_min, size_one_node));
+	if (is_flb
+		&& Nkind (gnat_index) == N_Subtype_Indication
+	&& flb_cannot_be_superflat (gnat_index))
+	  gnu_high = gnu_max;
+
+	else
+	  gnu_high
+		= build3 (COND_EXPR, sizetype,
+			  build2 (GE_EXPR, boolean_type_node,
+  gnu_orig_max, gnu_orig_min),
+			  gnu_max,
+			  TREE_CODE (gnu_min) == INTEGER_CST
+			  ? int_const_binop (MINUS_EXPR, gnu_min, size_one_node)
+			  : size_binop (MINUS_EXPR, gnu_min, size_one_node));
 
 	/* Make a range type with the new range in the Ada base type.
 	   Then make an index type with the size range in sizetype.  */
@@ -2595,7 +2611,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 		 this.  If we can prove that the array can never be superflat,
 		 we can just use the high bound of the index type.  */
 	  else if ((Nkind (gnat_index) == N_Range
-		&& cannot_be_superflat (gnat_index))
+		&& range_cannot_be_superflat (gnat_index))
 		   /* Bit-Packed Array Impl. Types are never superflat.  */
 		   || (Is_Packed_Array_Impl_Type (gnat_entity)
 			   && Is_Bit_Packed_Array
@@ -6414,33 +6430,81 @@ compile_time_known_address_p (Node_Id gnat_address)
   return Compile_Time_Known_Value (gnat_address);
 }
 
+/* Return true if GNAT_INDIC, a N_Subtype_Indication node for the index of a
+   FLB, cannot yield superflat objects, i.e. if the inequality HB >= LB - 1
+   is true for these objects.  LB and HB are the low and high bounds.  */
+
+static bool
+flb_cannot_be_superflat (Node_Id gnat_indic)
+{
+  const Entity_Id gnat_type = Entity (Subtype_Mark (gnat_indic));
+  const Entity_Id gnat_subtype = Etype (gnat_indic);
+  Node_Id gnat_scalar_range, gnat_lb, gnat_hb;
+  tree gnu_lb, gnu_hb, gnu_lb_minus_one;

Re: [PATCH] libgomp: Include <limits.h> early to avoid link failure with glibc 2.34

2021-07-12 Thread Florian Weimer via Gcc-patches
* Florian Weimer:

> <limits.h> is included indirectly in the #pragma GCC visibility hidden
> block.  With glibc 2.34, <limits.h> needs a declaration of the sysconf
> function, and including it under hidden visibility turns other calls
> to sysconf into hidden references, leading to a linker failure.
>
> libgomp/ChangeLog:
>
>   * libgomp.h: Include <limits.h>.
>
> ---
>  libgomp/libgomp.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
> index 8d25dc8e2a8..1fe209429d1 100644
> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -46,6 +46,7 @@
>  #include "libgomp-plugin.h"
>  #include "gomp-constants.h"
>  
> +#include <limits.h>
>  #ifdef HAVE_PTHREAD_H
>  #include <pthread.h>
>  #endif

I think this is a real libgomp bug, but if this glibc patch is accepted,
at least libgomp will build again:

  Reduce <limits.h> pollution due to dynamic PTHREAD_STACK_MIN
  

So while I still think libgomp should be fixed, it won't need
backporting to release branches (assuming the glibc workaround goes in,
of course).

Thanks,
Florian



Benefits of using Sphinx documentation format

2021-07-12 Thread Martin Liška

Hello.

Let's make it a separate sub-thread where we can discuss my motivation for
moving to the Sphinx format.

Benefits:
1) modern looking HTML output (before: [1], after: [2]):
   a) syntax highlighting for examples (code, shell commands, etc.)
   b) precise anchors; the current Texinfo anchors are not displayed and
      start at the first line of an option
   c) one can easily copy a link to an anchor (displayed as ¶)
   d) internal links work, e.g. one can easily jump from a listing of options
   e) left menu navigation provides better orientation in the manual
   f) Sphinx provides internal search capability: [3]
2) internal links are also provided in the PDF version of the manual
3) some existing GCC manuals are already written in Sphinx (GNAT manuals and
   libgccjit)
4) support for various output formats; some people are interested in the ePUB
   format
5) Sphinx uses RST, which is a fairly minimal semantic markup language
6) the TOC is automatically generated, with no need for manual navigation like
   that seen here: [5]

Disadvantages:

1) info pages are currently missing the Page description in the TOC
2) rich formatting leads to extra wrapping in info output; this is being
   partially addressed in [4]
3) one needs e.g. Emacs support for inline links (rendered as notes)

I'm willing to address issue 1) in the coming weeks, and I tend to skip emission
of links as mentioned in 3).
Generally speaking, I'm aware that some people still use Info, but I think we
should focus more on modern documentation formats. That's HTML (and partially
PDF).

Martin

[1] 
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fstrict-aliasing
[2] 
https://splichal.eu/gccsphinx-final/html/gcc/gcc-command-options/options-that-control-optimization.html#cmdoption-fstrict-aliasing
[3] 
https://splichal.eu/gccsphinx-final/html/gcc/search.html?q=-fipa-icf&check_keywords=yes&area=default#
[4] https://github.com/sphinx-doc/sphinx/pull/9391
[5] @comment node-name, next,  previous, up
@node Installing GCC, Binaries, , Top


Re: [PATCH] libgomp: Include <limits.h> early to avoid link failure with glibc 2.34

2021-07-12 Thread Jakub Jelinek via Gcc-patches
On Mon, Jul 12, 2021 at 10:26:47AM +0200, Florian Weimer via Gcc-patches wrote:
> <limits.h> is included indirectly in the #pragma GCC visibility hidden
> block.  With glibc 2.34, <limits.h> needs a declaration of the sysconf
> function, and including it under hidden visibility turns other calls
> to sysconf into hidden references, leading to a linker failure.
> 
> libgomp/ChangeLog:
> 
>   * libgomp.h: Include <limits.h>.

If this is because of the config/linux/sem.h #include <limits.h>,
I'd prefer not to include that header instead, we rely on being compiled
by GCC anyway (and clang/icc support __INT_MAX__ anyway).

Or e.g. config/posix/sem.h uses
#ifdef HAVE_ATTRIBUTE_VISIBILITY
# pragma GCC visibility push(default)
#endif

#include <semaphore.h>

#ifdef HAVE_ATTRIBUTE_VISIBILITY
# pragma GCC visibility pop
#endif
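
The wrapping pattern above can be tried in isolation. This is a minimal sketch, assuming GCC or a compatible compiler that supports the visibility pragmas; internal_limit is a hypothetical function name, and <limits.h> is used only as a convenient system header.

```c
/* Everything declared from here on defaults to hidden visibility,
   mirroring the block in which libgomp declares its internals.  */
#pragma GCC visibility push(hidden)

/* Temporarily restore default visibility so that declarations pulled
   in by the system header are not marked hidden.  */
#pragma GCC visibility push(default)
#include <limits.h>
#pragma GCC visibility pop

/* Hypothetical library-internal function.  It is defined while the
   hidden setting is in effect, so it is not exported from a shared
   library built from this file.  */
int
internal_limit (void)
{
  return INT_MAX;
}

#pragma GCC visibility pop
```

Without the inner push/pop pair, any function declared by the header would be treated as hidden in this translation unit, which is the kind of reference that fails to link against a definition living in another shared object.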

2021-07-12  Jakub Jelinek  
Florian Weimer  

* config/linux/sem.h: Don't include limits.h.
(SEM_WAIT): Define to -__INT_MAX__ - 1 instead of INT_MIN.

--- libgomp/config/linux/sem.h.jj   2021-01-18 07:18:42.360339646 +0100
+++ libgomp/config/linux/sem.h  2021-07-12 15:18:10.121178404 +0200
@@ -33,10 +33,8 @@
 #ifndef GOMP_SEM_H
 #define GOMP_SEM_H 1
 
-#include <limits.h> /* For INT_MIN */
-
 typedef int gomp_sem_t;
-#define SEM_WAIT INT_MIN
+#define SEM_WAIT (-__INT_MAX__ - 1)
 #define SEM_INC 1
 
 extern void gomp_sem_wait_slow (gomp_sem_t *, int);


Jakub
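
The replacement definition can be checked against INT_MIN directly. This is a small sketch, assuming a compiler that predefines __INT_MAX__ (GCC does, and the message above notes that clang and icc do as well); sem_has_wait_bit is a hypothetical helper added only to exercise the macro.

```c
#include <limits.h>

/* Same definition as in the patch.  It needs no header because
   __INT_MAX__ is predefined by the compiler.  Writing it as
   -__INT_MAX__ - 1 avoids the overflow that -(__INT_MAX__ + 1)
   would cause.  */
#define SEM_WAIT (-__INT_MAX__ - 1)

/* <limits.h> is included here only for this comparison.  */
_Static_assert (SEM_WAIT == INT_MIN, "SEM_WAIT must equal INT_MIN");

/* Hypothetical helper: nonzero if the top bit of SEM is set.  */
int
sem_has_wait_bit (int sem)
{
  return (sem & SEM_WAIT) != 0;
}
```

The static assertion fails at compile time if the two spellings ever disagree, so nothing needs to be run to validate the substitution.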



[PATCH] produce simple DOT graphs from SLP trees

2021-07-12 Thread Richard Biener
This adds a dot_slp_tree debug function producing a simple DOT
graph from a starting node down the graph.  There's no fancy
direct invocation of dot but the output is directed to a specified
file.  It re-uses vect_print_slp_tree, naming nodes as their
address.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.
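
The general shape of such a producer can be sketched in plain C, independent of the GCC internals. Everything here is an assumption made for illustration: struct node is a stand-in for an SLP node, a visited flag replaces the hash set, and dot_node and dot_graph are hypothetical names rather than the functions in the patch.

```c
#include <stdio.h>

/* Hypothetical stand-in for an SLP node; not GCC's slp_tree.  */
struct node
{
  const char *label;
  struct node *children[2];
  int visited;
};

/* Emit N and everything reachable from it.  Nodes are named by their
   address, and each node is emitted once even if it is shared.  */
void
dot_node (FILE *f, struct node *n)
{
  if (n == NULL || n->visited)
    return;
  n->visited = 1;

  fprintf (f, "\"%p\" [label=\"%s\"];\n", (void *) n, n->label);
  for (int i = 0; i < 2; ++i)
    if (n->children[i])
      fprintf (f, "\"%p\" -> \"%p\";\n", (void *) n, (void *) n->children[i]);
  for (int i = 0; i < 2; ++i)
    dot_node (f, n->children[i]);
}

/* Write a complete digraph for ROOT to FNAME.  Returns 0 on success
   and -1 if the file cannot be opened or closed.  */
int
dot_graph (const char *fname, struct node *root)
{
  FILE *f = fopen (fname, "w");
  if (f == NULL)
    return -1;
  fprintf (f, "digraph {\n");
  dot_node (f, root);
  fprintf (f, "}\n");
  return fclose (f) == 0 ? 0 : -1;
}
```

The resulting file can be rendered with something like "dot -Tsvg out.dot -o out.svg". The sketch checks the result of fopen and ends every edge with a newline, both of which are choices made here and not taken from the patch.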

2021-07-12  Richard Biener  

* dump-context.h (debug_dump_context::debug_dump_context):
Add FILE * parameter defaulted to stderr.
* dumpfile.c (debug_dump_context::debug_dump_context): Adjust.
* tree-vect-slp.c (dot_slp_tree): New functions.
---
 gcc/dump-context.h  |  2 +-
 gcc/dumpfile.c  |  4 ++--
 gcc/tree-vect-slp.c | 38 ++
 3 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/gcc/dump-context.h b/gcc/dump-context.h
index e8ed3743b7b..1a6bf5eb513 100644
--- a/gcc/dump-context.h
+++ b/gcc/dump-context.h
@@ -204,7 +204,7 @@ private:
 class debug_dump_context
 {
  public:
-  debug_dump_context ();
+  debug_dump_context (FILE *f = stderr);
   ~debug_dump_context ();
 
  private:
diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
index 2457df2df0e..8169daf7f59 100644
--- a/gcc/dumpfile.c
+++ b/gcc/dumpfile.c
@@ -2098,14 +2098,14 @@ enable_rtl_dump_file (void)
 /* debug_dump_context's ctor.  Temporarily override the dump_context
(to forcibly enable output to stderr).  */
 
-debug_dump_context::debug_dump_context ()
+debug_dump_context::debug_dump_context (FILE *f)
 : m_context (),
   m_saved (&dump_context::get ()),
   m_saved_flags (dump_flags),
   m_saved_pflags (pflags),
   m_saved_file (dump_file)
 {
-  set_dump_file (stderr);
+  set_dump_file (f);
   dump_context::s_current = &m_context;
   pflags = dump_flags = MSG_ALL_KINDS | MSG_ALL_PRIORITIES;
   dump_context::get ().refresh_dumps_are_enabled ();
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index cd002b3fb7c..86fa3c1b349 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2552,6 +2552,44 @@ debug (slp_tree node)
   node);
 }
 
+/* Recursive helper for the dot producer below.  */
+
+static void
+dot_slp_tree (FILE *f, slp_tree node, hash_set<slp_tree> &visited)
+{
+  if (visited.add (node))
+    return;
+
+  fprintf (f, "\"%p\" [label=\"", (void *)node);
+  vect_print_slp_tree (MSG_NOTE,
+                       dump_location_t::from_location_t (UNKNOWN_LOCATION),
+                       node);
+  fprintf (f, "\"];\n");
+
+
+  for (slp_tree child : SLP_TREE_CHILDREN (node))
+    fprintf (f, "\"%p\" -> \"%p\";", (void *)node, (void *)child);
+
+  for (slp_tree child : SLP_TREE_CHILDREN (node))
+    dot_slp_tree (f, child, visited);
+}
+
+DEBUG_FUNCTION void
+dot_slp_tree (const char *fname, slp_tree node)
+{
+  FILE *f = fopen (fname, "w");
+  fprintf (f, "digraph {\n");
+  fflush (f);
+    {
+      debug_dump_context ctx (f);
+      hash_set<slp_tree> visited;
+      dot_slp_tree (f, node, visited);
+    }
+  fflush (f);
+  fprintf (f, "}\n");
+  fclose (f);
+}
+
 /* Dump a slp tree NODE using flags specified in DUMP_KIND.  */
 
 static void
-- 
2.26.2


Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Eli Zaretskii via Gcc-patches
> Cc: g...@gcc.gnu.org, gcc-patches@gcc.gnu.org, jos...@codesourcery.com
> From: Martin Liška 
> Date: Mon, 12 Jul 2021 15:25:47 +0200
> 
> Let's make it a separate sub-thread where we can discuss motivation why
> do I want moving to Sphinx format.

Thanks for starting this discussion.

> Benefits:
> 1) modern looking HTML output (before: [1], after: [2]):
> a) syntax highlighting for examples (code, shell commands, etc.)
> b) precise anchors, the current Texinfo anchors are not displayed (start 
> with first line of an option)
> c) one can easily copy a link to an anchor (displayed as ¶)
> d) internal links are working, e.g. one can easily jump from listing of 
> options
> e) left menu navigation provides better orientation in the manual
> f) Sphinx provides internal search capability: [3]
> 2) internal links are also provided in PDF version of the manual

How is this different from Texinfo?

> 3) some existing GCC manuals are already written in Sphinx (GNAT manuals and 
> libgccjit)
> 4) support for various output formats, some people are interested in ePUB 
> format

Texinfo likewise supports many output formats.  Someone presented a
very simple package to produce epub format from it.

> 5) Sphinx is using RST which is quite minimal semantic markup language

Is it more minimal than Texinfo?

> 6) TOC is automatically generated - no need for manual navigation like seen 
> here: [5]

That is not needed in Texinfo as well, since long ago.  Nowadays, you
just say

  @node Whatever

and the rest is done automatically, as long as the manual's structure
is a proper tree (which it normally is, I know of only one manual that
is an exception).

> Disadvantages:
> 
> 1) info pages are currently missing Page description in TOC
> 2) rich formatting is leading to extra wrapping in info output - beings 
> partially addresses in [4]
> 3) one needs e.g. Emacs support for inline links (rendered as notes)

 4) The need to learn yet another markup language.
While this is not a problem for simple text, it does require a
serious study of RST and Sphinx to use the more advanced features.

 5) Lack of macros.
AFAIK, only simple textual substitution is available, no macros
with arguments.


Re: [PATCH] libgomp: Include <limits.h> early to avoid link failure with glibc 2.34

2021-07-12 Thread Florian Weimer via Gcc-patches
* Jakub Jelinek:

> On Mon, Jul 12, 2021 at 10:26:47AM +0200, Florian Weimer via Gcc-patches 
> wrote:
>> <limits.h> is included indirectly in the #pragma GCC visibility hidden
>> block.  With glibc 2.34, <limits.h> needs a declaration of the sysconf
>> function, and including it under hidden visibility turns other calls
>> to sysconf into hidden references, leading to a linker failure.
>> 
>> libgomp/ChangeLog:
>> 
>>  * libgomp.h: Include <limits.h>.
>
> If this is because of the config/linux/sem.h #include <limits.h>,
> I'd prefer not to include that header instead, we rely on being compiled
> by GCC anyway (and clang/icc support __INT_MAX__ anyway).
>
> Or e.g. config/posix/sem.h uses
> #ifdef HAVE_ATTRIBUTE_VISIBILITY
> # pragma GCC visibility push(default)
> #endif
>
> #include <semaphore.h>
>
> #ifdef HAVE_ATTRIBUTE_VISIBILITY
> # pragma GCC visibility pop
> #endif
>
> 2021-07-12  Jakub Jelinek  
>   Florian Weimer  
>
>   * config/linux/sem.h: Don't include limits.h.
>   (SEM_WAIT): Define to -__INT_MAX__ - 1 instead of INT_MIN.
>
> --- libgomp/config/linux/sem.h.jj 2021-01-18 07:18:42.360339646 +0100
> +++ libgomp/config/linux/sem.h 2021-07-12 15:18:10.121178404 +0200
> @@ -33,10 +33,8 @@
>  #ifndef GOMP_SEM_H
>  #define GOMP_SEM_H 1
>  
> -#include <limits.h> /* For INT_MIN */
> -
>  typedef int gomp_sem_t;
> -#define SEM_WAIT INT_MIN
> +#define SEM_WAIT (-__INT_MAX__ - 1)
>  #define SEM_INC 1
>  
>  extern void gomp_sem_wait_slow (gomp_sem_t *, int);

I tested this on csky-linux-gnuabiv2 with the glibc version that failed
before, and it works.  So I guess your version is fine, too.
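
As a quick sanity check of the replacement value, here is a small sketch.  It assumes a compiler that predefines __INT_MAX__ (GCC and Clang do), and sem_wait_value is a hypothetical name used only for this illustration.  The header is included here solely to compare against INT_MIN; the point of the change is that sem.h itself no longer needs it.

```cpp
#include <climits>   // only needed for the comparison below

// __INT_MAX__ is a compiler-predefined macro, so no header is required
// to spell the most negative int this way.
constexpr int sem_wait_value = -__INT_MAX__ - 1;

// On two's-complement targets this is exactly INT_MIN.
static_assert (sem_wait_value == INT_MIN,
               "-__INT_MAX__ - 1 should equal INT_MIN");
```

Writing the value as -__INT_MAX__ - 1 keeps every intermediate result within the range of int, which is also how <limits.h> implementations commonly define INT_MIN.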

Thanks,
Florian



Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Jonathan Wakely via Gcc-patches
On Mon, 12 Jul 2021 at 14:41, Eli Zaretskii via Gcc  wrote:
>
> > Cc: g...@gcc.gnu.org, gcc-patches@gcc.gnu.org, jos...@codesourcery.com
> > From: Martin Liška 
> > Date: Mon, 12 Jul 2021 15:25:47 +0200
> >
> > Let's make it a separate sub-thread where we can discuss motivation why
> > do I want moving to Sphinx format.
>
> Thanks for starting this discussion.
>
> > Benefits:
> > 1) modern looking HTML output (before: [1], after: [2]):
> > a) syntax highlighting for examples (code, shell commands, etc.)
> > b) precise anchors, the current Texinfo anchors are not displayed 
> > (start with first line of an option)
> > c) one can easily copy a link to an anchor (displayed as ¶)
> > d) internal links are working, e.g. one can easily jump from listing of 
> > options

For me, these items are enough justification to switch away from
texinfo, which produces crap HTML pages with crap anchors. You can't
find out the anchors without inspecting (and searching) the HTML
source. That's utterly stupid. And even after you do that, the anchor
is at the wrong place:
https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html#index-c
As somebody who spends a lot of time helping users on the mailing
list, IRC, stackoverflow, and elsewhere, this "feature" of the texinfo
HTML has angered me for many years.

Yes, some people like texinfo, but some people also dislike it and
there are serious usability problems with the output. I support
replacing texinfo with anything that isn't texinfo.


> > e) left menu navigation provides better orientation in the manual
> > f) Sphinx provides internal search capability: [3]
> > 2) internal links are also provided in PDF version of the manual
>
> How is this different from Texinfo?
>
> > 3) some existing GCC manuals are already written in Sphinx (GNAT manuals 
> > and libgccjit)
> > 4) support for various output formats, some people are interested in ePUB 
> > format
>
> Texinfo likewise supports many output formats.  Someone presented a
> very simple package to produce epub format from it.
>
> > 5) Sphinx is using RST which is quite minimal semantic markup language
>
> Is it more minimal than Texinfo?
>
> > 6) TOC is automatically generated - no need for manual navigation like seen 
> > here: [5]
>
> That is not needed in Texinfo as well, since long ago.  Nowadays, you
> just say
>
>   @node Whatever
>
> and the rest is done automatically, as long as the manual's structure
> is a proper tree (which it normally is, I know of only one manual that
> is an exception).
>
> > Disadvantages:
> >
> > 1) info pages are currently missing Page description in TOC
> > 2) rich formatting is leading to extra wrapping in info output - beings 
> > partially addresses in [4]
> > 3) one needs e.g. Emacs support for inline links (rendered as notes)
>
>  4) The need to learn yet another markup language.
> While this is not a problem for simple text, it does require a
> serious study of RST and Sphinx to use the more advanced features.

This is a problem with texinfo too.

>
>  5) Lack of macros.
> AFAIK, only simple textual substitution is available, no macros
> with arguments.

Is this a problem for GCC docs though?


Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Jonathan Wakely via Gcc-patches
On Mon, 12 Jul 2021 at 14:53, Jonathan Wakely wrote:
> For me, these items are enough justification to switch away from
> texinfo, which produces crap HTML pages with crap anchors. You can't
> find out the anchors without inspecting (and searching) the HTML
> source. That's utterly stupid. And even after you do that, the anchor
> is at the wrong place:
> https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html#index-c
> As somebody who spends a lot of time helping users on the mailing
> list, IRC, stackoverflow, and elsewhere, this "feature" of the texinfo
> HTML has angered me for many years.

To be clear, I give links to users frequently (several times a week,
every week, for decades) and prefer to give them a link to specific
options. Obviously I link to the online HTML docs rather than telling
them an 'info' command to run, because most people don't use info
pages or know how to navigate them. That means I can't provide decent
links, because the actual option name I'm trying to link to is always
off the top of the page. This is simply unacceptable IMHO. Texinfo
must go.


Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-12 Thread guojiufu via Gcc-patches

On 2021-07-12 18:02, Richard Biener wrote:

On Mon, 12 Jul 2021, guojiufu wrote:


On 2021-07-12 16:57, Richard Biener wrote:
> On Mon, 12 Jul 2021, guojiufu wrote:
>
>> On 2021-07-12 14:20, Richard Biener wrote:
>> > On Fri, 9 Jul 2021, Segher Boessenkool wrote:
>> >
>> >> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
>> >> > I wonder if there's a way to query the target what modes the doloop
>> >> > pattern can handle (not being too familiar with the doloop code).
>> >>
>> >> You can look what modes are allowed for operand 0 of doloop_end,
>> >> perhaps?  Although that is a define_expand, not a define_insn, so it is
>> >> hard to introspect.
>> >>
>> >> > Why do you need to do any checks besides the new type being able to
>> >> > represent all IV values?  The original doloop IV will never wrap
>> >> > (OTOH if niter is U*_MAX then we compute niter + 1 which will become
>> >> > zero ... I suppose the doloop might still do the correct thing here
>> >> > but it also still will with a IV with larger type).
>>
>> The issue comes from U*_MAX (original short MAX), as you said: on which
>> niter + 1 becomes zero.  And because the step for doloop is -1; then, on
>> larger type 'zero - 1' will be a very large number on larger type
>> (e.g. 0xff...ff); but on the original short type 'zero - 1' is a small
>> value
>> (e.g. "0xff").
>
> But for the larger type the small type MAX + 1 fits and does not yield
> zero so it should still work exactly as before, no?  Of course you
> have to compute the + 1 in the larger type.
>
You are right: if the "+ 1" is computed in the larger type it is OK, as in the
code below:
```
/* Using a word-sized type may be faster.  */
if (TYPE_PRECISION (ntype) < BITS_PER_WORD)
  {
    ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
    niter = fold_convert (ntype, niter);
  }

tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter),
                         build_int_cst (ntype, 1));

add_candidate (data, base, build_int_cst (ntype, -1), true, NULL, NULL,
               true);
```
The issue with this is that the code generates more statements for doloop.xxx:
  _12 = (unsigned int) xx(D);
  _10 = _12 + 4294967295;
  _24 = (long unsigned int) _10;
  doloop.6_8 = _24 + 1;

If the previous patch is used ("+ 1" on the original type), the statements
look like:

  _12 = (unsigned int) xx(D);
  doloop.6_8 = (long unsigned int) _12;

This is the reason for checking
   wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype)))
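
The wrap-around that this check guards against can be reproduced with plain fixed-width integers.  The sketch below is only an illustration under assumptions: it uses a 16-bit narrow type and a 64-bit wide type instead of GCC's tree types, and the three function names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// "+ 1" done in the narrow type, then widened: wraps to 0 when
// niter is the narrow type's maximum.
std::uint64_t
start_narrow_then_widen (std::uint16_t niter)
{
  std::uint16_t plus_one = static_cast<std::uint16_t> (niter + 1);
  return plus_one;
}

// Widened first, then "+ 1": never wraps for a 16-bit input.
std::uint64_t
start_widen_then_add (std::uint16_t niter)
{
  return static_cast<std::uint64_t> (niter) + 1;
}

// Analogue of the check above: if the known upper bound of niter is
// strictly below the narrow type's maximum, the narrow "+ 1" cannot
// wrap and both orders give the same start value.
bool
narrow_add_is_safe (std::uint16_t niter_max)
{
  return niter_max < std::numeric_limits<std::uint16_t>::max ();
}
```

When the bound check holds, the cheaper form (add in the narrow type, then convert) is equivalent to the always-correct form (convert, then add).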


But this then only works when there's an upper bound on the number
of iterations.  Note you should not use TYPE_MAX_VALUE here but
you can instead use

 wi::ltu_p (niter_desc->max, wi::to_widest (wi::max_value
(TYPE_PRECISION (ntype), TYPE_SIGN (ntype))));


Ok, Thanks!
I remember you mentioned that:
widest_int::from (wi::max_value (TYPE_PRECISION (ntype), TYPE_SIGN (ntype)),
TYPE_SIGN (ntype))
would be better than
wi::to_widest (TYPE_MAX_VALUE (ntype)).

It seems that "TYPE_MAX_VALUE (ntype)" is
"NUMERICAL_TYPE_CHECK (NODE)->type_non_common.maxval", which does a
numerical check and returns the maxval field; the result is then passed
to wi::to_widest.

The other code, "widest_int::from (wi::max_value (..,..),..)", calls
wi::max_value and widest_int::from.

I'm wondering if wi::to_widest (TYPE_MAX_VALUE (ntype)) is cheaper?



I think the -1 above comes from number of latch iterations vs. header
entries - it's a common source for this kind of issues.  range analysis
might be able to prove that we can still merge the two adds even with
the intermediate extension.

Yes, as you mentioned, it relates to the number of latch iterations.
For a loop such as: while (l < n) or for (i = 0; i < n; i++),
niter becomes 'n - 1' after the loop is transformed
into 'do-while' form.
I would see how to merge these two adds safely at this point
when generating doloop iv. (maybe range info, thanks!
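
To make the "- 1" followed by "+ 1" concrete, here is a small sketch that compares a plain for loop with its guarded do-while count-down form.  The function names are hypothetical, and this is plain C++ rather than anything GCC generates.

```cpp
#include <cstdint>

// Body executions of: for (i = 0; i < n; i++) body;
std::uint64_t
run_for_loop (std::uint32_t n)
{
  std::uint64_t executed = 0;
  for (std::uint32_t i = 0; i < n; i++)
    ++executed;
  return executed;
}

// The same loop in do-while form with a count-down counter.
std::uint64_t
run_doloop (std::uint32_t n)
{
  std::uint64_t executed = 0;
  if (n == 0)                      // guard: the do-while body runs at least once
    return executed;

  std::uint32_t niter = n - 1;     // number of latch iterations
  // Count-down start value, with "+ 1" computed in the wider type.
  std::uint64_t counter = static_cast<std::uint64_t> (niter) + 1;
  do
    ++executed;
  while (--counter != 0);          // step is -1, stop at zero
  return executed;
}
```

Both functions execute the body n times; the second makes visible where the n - 1 comes from and why the count-down start value adds one back.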



Is this pre-loop extra add really offsetting the in-loop doloop
improvements?

I don't fully follow this question, sorry.  I guess your concern
is whether the "+1" is an offset: it may not be; the "+1" may just be there
because doloop.xx counts niter down to 0 (all values > 0).
If I misunderstand, thanks for pointing it out.




>> >>
>> >> doloop_valid_p guarantees it is simple and doesn't wrap.
>> >>
>> >> > I'd have expected sth like
>> >> >
>> >> >ntype = lang_hooks.types.type_for_mode (word_mode, TYPE_UNSIGNED
>> >> > (ntype));
>> >> >
>> >> > thus the decision made using a mode - which is also why I wonder
>> >> > if there's a way to query the target for this.  As you say,
>> >> > it _may_ be fast, so better check (somehow).
>>
>>
>> I was also thinking of using hooks like type_for_size/type_for_mode.
>> /* Use type in word size may fast.  */
>> if (TYPE_PRECISION (ntype) < BITS_PER_WORD
>> && wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE
>> (ntype))))
>>   {
>> ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);

Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Eli Zaretskii via Gcc-patches
> From: Jonathan Wakely 
> Date: Mon, 12 Jul 2021 14:53:44 +0100
> Cc: Martin Liška , 
>   "g...@gcc.gnu.org" , gcc-patches 
> , 
>   "Joseph S. Myers" 
> 
> For me, these items are enough justification to switch away from
> texinfo, which produces crap HTML pages with crap anchors.

If we want to have a serious discussion with useful conclusions, I
suggest to avoid "loaded" terminology.

I get it that you dislike the HTML produced by Texinfo, but without
some examples of such bad HTML it is impossible to know what exactly
do you dislike and why.

> You can't find out the anchors without inspecting (and searching)
> the HTML source. That's utterly stupid.

I don't think I follow: find out the anchors with which means and for
what purposes?

> And even after you do that, the anchor
> is at the wrong place:
> https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html#index-c

IME, the anchor is where you put it.  If you show me the source of
that HTML, maybe we can have a more useful discussion of the issue.

> As somebody who spends a lot of time helping users on the mailing
> list, IRC, stackoverflow, and elsewhere, this "feature" of the texinfo
> HTML has angered me for many years.

As somebody who spends a lot of time helping users on every possible
forum, and as someone who has written a lot of Texinfo, I don't
understand what angers you.  Please elaborate.

> Yes, some people like texinfo, but some people also dislike it and
> there are serious usability problems with the output. I support
> replacing texinfo with anything that isn't texinfo.

"Anything"?  Even plain text?  I hope not.

See, such "arguments" don't help to have a useful discussion.

> >  4) The need to learn yet another markup language.
> > While this is not a problem for simple text, it does require a
> > serious study of RST and Sphinx to use the more advanced features.
> 
> This is a problem with texinfo too.

Not for someone who already knows Texinfo.  We are talking about
switching away from it, so I'm thinking about people who contributed
patches for the manual in the past.  They already know Texinfo, at
least to some extent, and some of them know it very well.

> >  5) Lack of macros.
> > AFAIK, only simple textual substitution is available, no macros
> > with arguments.
> 
> Is this a problem for GCC docs though?

I don't know.  It could be, even if it isn't now.


Re: [PATCH] [wwwdocs] Update description of GM2 and document branch

2021-07-12 Thread Gaius Mulley via Gcc-patches
Gerald Pfeifer  writes:

> I realize this predates your patch (which merely changes version numbers),
> but a reference to back ends could be misunderstood. I assume GNU Modula-2
> doesn't just use the back ends (x86, aarch64,...), but also the middle-end
> and tree optimizers etc.?
>
> What do you think about just saying "with GCC 10 and GCC 11".

Hi Gerald,

yes indeed this sounds more accurate.

>>  Work is in progress to move the front end to
>> +the GCC trunk.  The front end is mostly written in Modula-2, but
>> +includes a bootstrap procedure using mc.
>
> On my system mc refers to Midnight Commander :-), whereas I guess mc
> here is about "Modula Compiler"?  Can you rephrase this for the sake
> of those not so closely involved?

ah yes will do!

> Usually I'd just say "subject", which is a header in our mail systems;
> the term "subject line" isn't widely used.

feel free to overrule and use "subject".  I copied the text from other
branch descriptions :-) (there are 38 uses).  I guess there should be
consistency on the web page - perhaps they could all be changed though -
what do you think?

> Thanks (and okay considering the above),
> Gerald

thanks for the suggestions and maintaining the pages.  Below are the
proposed updated patches

regards,
Gaius



possible ChangeLog/commit entry:

htdocs/frontends.html: Update the description of GNU Modula-2.
htdocs/git.html: Document the new devel/modula-2 branch.

updated patches:

diff --git a/htdocs/frontends.html b/htdocs/frontends.html
index bec33b7b..7c8d84bc 100644
--- a/htdocs/frontends.html
+++ b/htdocs/frontends.html
@@ -42,10 +42,10 @@ has a back end that generates assembler directly, using the GCC back end.

 http://www.nongnu.org/gm2/";>GNU Modula-2 implements
 the PIM2, PIM3, PIM4 and ISO dialects of the language.  The compiler
-is fully operational with the GCC 4.1.2 back end (on GNU/Linux x86
-systems).  Work is in progress to move the front end to the GCC trunk.
-The front end is mostly written in Modula-2, but includes a bootstrap
-procedure via a heavily modified version of p2c.
+is fully operational with the GCC 10 and GCC 11 (on
+GNU/Linux x86 systems).  Work is in progress to move the front end to
+the GCC trunk.  The front end is mostly written in Modula-2 and it
+includes a bootstrap tool which translates Modula-2 into C/C++.

 Modula-3 (for links see http://www.modula3.org/";>www.modula3.org); SRC M3 is based on an old
diff --git a/htdocs/git.html b/htdocs/git.html
index 2bbfc334..c112980b 100644
--- a/htdocs/git.html
+++ b/htdocs/git.html
@@ -471,6 +471,17 @@ in Git.
   Further information can be found on the
   https://github.com/Intrepid/GUPC";>GNU UPC page.

+  modula-2
+  This branch is for the
+http://nongnu.org/gm2/homepage.html";>GNU Modula-2
+front end to GCC prior to its integration with the mainline.  The
+branch will be regularly rebased against the mainline.  It is
+maintained by
+mailto:gaius.mul...@southwales.ac.uk";>Gaius Mulley.
+Patches should be
+prefixed with [modula-2] in the subject line.
+  
+
   pph
   This branch implements https://gcc.gnu.org/wiki/pph";> Pre-Parsed
   Headers for C++.  It is maintained by 

Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Eli Zaretskii via Gcc-patches
> From: Jonathan Wakely 
> Date: Mon, 12 Jul 2021 15:05:11 +0100
> Cc: Martin Liška , 
>   "g...@gcc.gnu.org" , gcc-patches 
> , 
>   "Joseph S. Myers" 
> 
> To be clear, I give links to users frequently (several times a week,
> every week, for decades) and prefer to give them a link to specific
> options. Obviously I link to the online HTML docs rather than telling
> them an 'info' command to run, because most people don't use info
> pages or know how to navigate them. That means I can't provide decent
> links, because the actual option name I'm trying to link to is always
> off the top of the page. This is simply unacceptable IMHO. Texinfo
> must go.

"Texinfo must go" is one possible conclusion from your description.
But it isn't the only one.  An alternative is "the Texinfo source of
the GCC manual must be improved to fix this problem."  And yes, this
problem does have a solution in Texinfo.


Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Martin Liška

On 7/12/21 4:12 PM, Eli Zaretskii wrote:
>> From: Jonathan Wakely
>> Date: Mon, 12 Jul 2021 14:53:44 +0100
>> Cc: Martin Liška ,
>>   "g...@gcc.gnu.org" , gcc-patches
>> ,
>>   "Joseph S. Myers"
>>
>> For me, these items are enough justification to switch away from
>> texinfo, which produces crap HTML pages with crap anchors.
>
> If we want to have a serious discussion with useful conclusions, I
> suggest to avoid "loaded" terminology.
>
> I get it that you dislike the HTML produced by Texinfo, but without
> some examples of such bad HTML it is impossible to know what exactly
> do you dislike and why.

Please follow my 1) from Benefits and *read* bullet points a) to f). That will
give you an answer.

>> You can't find out the anchors without inspecting (and searching)
>> the HTML source. That's utterly stupid.
>
> I don't think I follow: find out the anchors with which means and for
> what purposes?

Benefits, 1c).

>> And even after you do that, the anchor
>> is at the wrong place:
>> https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html#index-c
>
> IME, the anchor is where you put it.  If you show me the source of
> that HTML, maybe we can have a more useful discussion of the issue.

The problem is that Texinfo emits poor HTML where the # link points to the
wrong place.
Open the given page, view source and search for .

>> As somebody who spends a lot of time helping users on the mailing
>> list, IRC, stackoverflow, and elsewhere, this "feature" of the texinfo
>> HTML has angered me for many years.
>
> As somebody who spends a lot of time helping users on every possible
> forum, and as someone who has written a lot of Texinfo, I don't
> understand what angers you.  Please elaborate.

You can't point to the documentation of a specific option.

>> Yes, some people like texinfo, but some people also dislike it and
>> there are serious usability problems with the output. I support
>> replacing texinfo with anything that isn't texinfo.
>
> "Anything"?  Even plain text?  I hope not.
>
> See, such "arguments" don't help to have a useful discussion.
>
>>>  4) The need to learn yet another markup language.
>>> While this is not a problem for simple text, it does require a
>>> serious study of RST and Sphinx to use the more advanced features.
>>
>> This is a problem with texinfo too.
>
> Not for someone who already knows Texinfo.  We are talking about
> switching away from it, so I'm thinking about people who contributed
> patches for the manual in the past.  They already know Texinfo, at
> least to some extent, and some of them know it very well.

Yes, people will have to learn a new syntax. Similarly, with the transition
from SVN, people also had to learn a more modern tool.

>>>  5) Lack of macros.
>>> AFAIK, only simple textual substitution is available, no macros
>>> with arguments.
>>
>> Is this a problem for GCC docs though?
>
> I don't know.  It could be, even if it isn't now.

Then it's not an argument, sorry.

Martin



Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Martin Liška

On 7/12/21 4:16 PM, Eli Zaretskii wrote:
>> From: Jonathan Wakely
>> Date: Mon, 12 Jul 2021 15:05:11 +0100
>> Cc: Martin Liška ,
>>   "g...@gcc.gnu.org" , gcc-patches
>> ,
>>   "Joseph S. Myers"
>>
>> To be clear, I give links to users frequently (several times a week,
>> every week, for decades) and prefer to give them a link to specific
>> options. Obviously I link to the online HTML docs rather than telling
>> them an 'info' command to run, because most people don't use info
>> pages or know how to navigate them. That means I can't provide decent
>> links, because the actual option name I'm trying to link to is always
>> off the top of the page. This is simply unacceptable IMHO. Texinfo
>> must go.
>
> "Texinfo must go" is one possible conclusion from your description.
> But it isn't the only one.  An alternative is "the Texinfo source of
> the GCC manual must be improved to fix this problem."  And yes, this
> problem does have a solution in Texinfo.

No, the alternative is more powerful output given by Texinfo, in particular
more modern HTML pages.

Martin



Re: [PATCH] libgomp: Include <limits.h> early to avoid link failure with glibc 2.34

2021-07-12 Thread Florian Weimer via Gcc-patches
* Florian Weimer:

> I tested this on csky-linux-gnuabiv2 with the glibc version that failed
> before, and it works.  So I guess your version is fine, too.

Build on powerpc64-linux-gnu and other targets now fails with:

/home/bmg/src/gcc/libgomp/config/linux/affinity.c: In function ‘gomp_init_affinity’:
/home/bmg/src/gcc/libgomp/config/linux/affinity.c:53:41: error: ‘ULONG_MAX’ undeclared (first use in this function)
   53 |   if (!gomp_affinity_init_level (1, ULONG_MAX, true))
  | ^

So affinity.c will need to include <limits.h>.

Thanks,
Florian



Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Martin Liška

On 7/12/21 3:39 PM, Eli Zaretskii wrote:
>> Cc: g...@gcc.gnu.org, gcc-patches@gcc.gnu.org, jos...@codesourcery.com
>> From: Martin Liška
>> Date: Mon, 12 Jul 2021 15:25:47 +0200
>>
>> Let's make it a separate sub-thread where we can discuss motivation why
>> do I want moving to Sphinx format.
>
> Thanks for starting this discussion.
>
>> Benefits:
>> 1) modern looking HTML output (before: [1], after: [2]):
>>  a) syntax highlighting for examples (code, shell commands, etc.)
>>  b) precise anchors, the current Texinfo anchors are not displayed (start
>> with first line of an option)
>>  c) one can easily copy a link to an anchor (displayed as ¶)
>>  d) internal links are working, e.g. one can easily jump from listing of
>> options
>>  e) left menu navigation provides better orientation in the manual
>>  f) Sphinx provides internal search capability: [3]
>> 2) internal links are also provided in PDF version of the manual
>
> How is this different from Texinfo?

Texinfo does not emit them. See e.g. links in option listing (-O2, -Os, ...).

>> 3) some existing GCC manuals are already written in Sphinx (GNAT manuals and
>> libgccjit)
>> 4) support for various output formats, some people are interested in ePUB format
>
> Texinfo likewise supports many output formats.  Someone presented a
> very simple package to produce epub format from it.

Good to know.

>> 5) Sphinx is using RST which is quite minimal semantic markup language
>
> Is it more minimal than Texinfo?

I would say it's pretty easy to learn, and about as complex as Texinfo.

>> 6) TOC is automatically generated - no need for manual navigation like seen
>> here: [5]
>
> That is not needed in Texinfo as well, since long ago.  Nowadays, you
> just say
>
>   @node Whatever
>
> and the rest is done automatically, as long as the manual's structure
> is a proper tree (which it normally is, I know of only one manual that
> is an exception).

All right, then we are likely doing extra work right now.

>> Disadvantages:
>>
>> 1) info pages are currently missing Page description in TOC
>> 2) rich formatting is leading to extra wrapping in info output - beings
>> partially addresses in [4]
>> 3) one needs e.g. Emacs support for inline links (rendered as notes)
>
>  4) The need to learn yet another markup language.
> While this is not a problem for simple text, it does require a
> serious study of RST and Sphinx to use the more advanced features.

No, the majority of the documentation is pretty simple: basic formatting,
links, tables and code examples.

Martin

>  5) Lack of macros.
> AFAIK, only simple textual substitution is available, no macros
> with arguments.



Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-12 Thread Richard Biener
On Mon, 12 Jul 2021, guojiufu wrote:

> On 2021-07-12 18:02, Richard Biener wrote:
> > On Mon, 12 Jul 2021, guojiufu wrote:
> > 
> >> On 2021-07-12 16:57, Richard Biener wrote:
> >> > On Mon, 12 Jul 2021, guojiufu wrote:
> >> >
> >> >> On 2021-07-12 14:20, Richard Biener wrote:
> >> >> > On Fri, 9 Jul 2021, Segher Boessenkool wrote:
> >> >> >
> >> >> >> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
> >> >> >> > I wonder if there's a way to query the target what modes the doloop
> >> >> >> > pattern can handle (not being too familiar with the doloop code).
> >> >> >>
> >> >> >> You can look what modes are allowed for operand 0 of doloop_end,
> >> >> >> perhaps?  Although that is a define_expand, not a define_insn, so it
> >> >> >> is
> >> >> >> hard to introspect.
> >> >> >>
> >> >> >> > Why do you need to do any checks besides the new type being able to
> >> >> >> > represent all IV values?  The original doloop IV will never wrap
> >> >> >> > (OTOH if niter is U*_MAX then we compute niter + 1 which will
> >> >> >> > become
> >> >> >> > zero ... I suppose the doloop might still do the correct thing here
> >> >> >> > but it also still will with a IV with larger type).
> >> >>
> >> >> The issue comes from U*_MAX (original short MAX), as you said: on which
> >> >> niter + 1 becomes zero.  And because the step for doloop is -1; then, on
> >> >> larger type 'zero - 1' will be a very large number on larger type
> >> >> (e.g. 0xff...ff); but on the original short type 'zero - 1' is a small
> >> >> value
> >> >> (e.g. "0xff").
> >> >
> >> > But for the larger type the small type MAX + 1 fits and does not yield
> >> > zero so it should still work exactly as before, no?  Of course you
> >> > have to compute the + 1 in the larger type.
> >> >
> >> You are right, if compute the "+ 1" in the larger type it is ok, as below
> >> code:
> >> ```
> >>/* Use type in word size may fast.  */
> >> if (TYPE_PRECISION (ntype) < BITS_PER_WORD)
> >>   {
> >> ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
> >> niter = fold_convert (ntype, niter);
> >>   }
> >> 
> >> tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter),
> >>  build_int_cst (ntype, 1));
> >> 
> >> 
> >> add_candidate (data, base, build_int_cst (ntype, -1), true, NULL, NULL,
> >> true);
> >> ```
> >> The issue of this is, this code generates more stmt for doloop.xxx:
> >>   _12 = (unsigned int) xx(D);
> >>   _10 = _12 + 4294967295;
> >>   _24 = (long unsigned int) _10;
> >>   doloop.6_8 = _24 + 1;
> >> 
> >> if use previous patch, "+ 1" on original type, then the stmts will looks
> >> like:
> >>   _12 = (unsigned int) xx(D);
> >>   doloop.6_8 = (long unsigned int) _12;
> >> 
> >> This is the reason for checking
> >>wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype)))
> > 
> > But this then only works when there's an upper bound on the number
> > of iterations.  Note you should not use TYPE_MAX_VALUE here but
> > you can instead use
> > 
> >  wi::ltu_p (niter_desc->max, wi::to_widest (wi::max_value
> > (TYPE_PRECISION (ntype), TYPE_SIGN (ntype))));
> 
> Ok, Thanks!
> I remember you mentioned that:
> widest_int::from (wi::max_value (TYPE_PRECISION (ntype), TYPE_SIGN (ntype)),
> TYPE_SIGN (ntype))
> would be better than
> wi::to_widest (TYPE_MAX_VALUE (ntype)).
> 
> It seems that:
> "TYPE_MAX_VALUE (ntype)" is "NUMERICAL_TYPE_CHECK
> (NODE)->type_non_common.maxval"
> which does a numerical check and returns the maxval field; the result is then
> passed to wi::to_widest.
> 
> The other code "widest_int::from (wi::max_value (..,..),..)" calls
> wi::max_value
> and widest_int::from.
> 
> I'm wondering if wi::to_widest (TYPE_MAX_VALUE (ntype)) is cheaper?

TYPE_MAX_VALUE can be "surprising": it does not necessarily match the
underlying mode's precision.  At some point we've tried to eliminate
most of its uses, not sure what the situation/position is right now.
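
To make the wraparound discussed above concrete, here is a minimal
standalone sketch in plain C (not GCC code).  It assumes uint16_t stands
in for the original narrow IV type and uint64_t for the word-sized doloop
type; all function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* "+ 1" done in the narrow type, then widened.  Wraps to 0 when
   niter == UINT16_MAX.  */
uint64_t
base_plus_one_narrow (uint16_t niter)
{
  return (uint64_t) (uint16_t) (niter + 1);
}

/* Widened first, then "+ 1" done in the wide type.  Cannot wrap for
   any uint16_t input.  */
uint64_t
base_plus_one_wide (uint16_t niter)
{
  return (uint64_t) niter + 1;
}

/* The guard: if the known upper bound on niter is strictly below the
   narrow type's maximum, both orders give the same value.  */
int
narrow_add_is_safe (uint16_t niter_max)
{
  return niter_max < UINT16_MAX;
}

/* Use the narrow add only when the guard allows it.  */
uint64_t
doloop_base (uint16_t niter, uint16_t niter_max)
{
  assert (niter <= niter_max);
  return narrow_add_is_safe (niter_max)
         ? base_plus_one_narrow (niter)
         : base_plus_one_wide (niter);
}
```

With niter == UINT16_MAX the narrow form yields 0, and a doloop counter
starting there and stepping by -1 in the wide type would next see
0xff...ff.  For any bound strictly below the narrow maximum the two forms
agree, which is the property the wi::ltu_p check relies on.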

> > I think the -1 above comes from number of latch iterations vs. header
> > entries - it's a common source for this kind of issues.  range analysis
> > might be able to prove that we can still merge the two adds even with
> > the intermediate extension.
> Yes, as you mentioned here, it relates to the number of latch iterations.
> For a loop that looks like: while (l < n) or for (i = 0; i < n; i++),
> niter is typically 'n - 1' after the loop is transformed
> into 'do-while' form.
> I will see how to merge these two adds safely at this point
> when generating the doloop iv (maybe using range info).  Thanks!
>
> > 
> > Is this pre-loop extra add really offsetting the in-loop doloop
> > improvements?
> I'm not sure I follow this question, sorry.  I guess your concern
> is whether the "+1" is an offset: it may not be; the "+1" may just be there
> because doloop.xx decreases niter until 0 (all numbers > 0).
> If I have misunderstood, thanks for pointing it out.

I'm questioning the argument that not being able to eliminate the +1-1
pair effect

Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Jonathan Wakely via Gcc-patches
On Mon, 12 Jul 2021 at 15:13, Eli Zaretskii  wrote:
>
> > From: Jonathan Wakely 
> > Date: Mon, 12 Jul 2021 14:53:44 +0100
> > Cc: Martin Liška ,
> >   "g...@gcc.gnu.org" , gcc-patches 
> > ,
> >   "Joseph S. Myers" 
> >
> > For me, these items are enough justification to switch away from
> > texinfo, which produces crap HTML pages with crap anchors.
>
> If we want to have a serious discussion with useful conclusions, I
> suggest to avoid "loaded" terminology.

But the results *are* crap.

>
> I get it that you dislike the HTML produced by Texinfo, but without
> some examples of such bad HTML it is impossible to know what exactly
> do you dislike and why.
>
> > You can't find out the anchors without inspecting (and searching)
> > the HTML source. That's utterly stupid.
>
> I don't think I follow: find out the anchors with which means and for
> what purposes?

I want to point a user at the documentation for the -c option. I can't
do that without examining the HTML source to find the anchor, then
manually editing the URL to append the anchor. It's a tedious process,
and the result is an anchor that doesn't even point to the option but
to the text following it. The process is unnecessarily difficult and
the results are bad.

You participated in a discussion about this very topic previously:
https://lists.gnu.org/archive/html/help-texinfo/2019-02/msg0.html

>
> > And even after you do that, the anchor
> > is at the wrong place:
> > https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html#index-c
>
> IME, the anchor is where you put it.  If you show me the source of
> that HTML, maybe we can have a more useful discussion of the issue.

@item -c
@opindex c
Compile or assemble the source files, but do not link.  The linking
stage simply is not done.  The ultimate output is in the form of an
object file for each source file.

Putting the @opindex before the @item causes the anchor to be placed
on the previous item, which is not desirable.


>
> > As somebody who spends a lot of time helping users on the mailing
> > list, IRC, stackoverflow, and elsewhere, this "feature" of the texinfo
> > HTML has angered me for many years.
>
> As somebody who spends a lot of time helping users on every possible
> forum, and as someone who has written a lot of Texinfo, I don't
> understand what angers you.  Please elaborate.

I don't know what part of my email you don't understand. The HTML
anchors that texinfo creates are in the wrong place, and not
"discoverable". If you don't understand that, then you're clearly not
using the GCC HTML docs, and so I'm not surprised you think there's no
reason to ditch texinfo. As a regular user of the HTML (for myself and
end users of GCC), the HTML output has major usability problems.


> > Yes, some people like texinfo, but some people also dislike it and
> > there are serious usability problems with the output. I support
> > replacing texinfo with anything that isn't texinfo.
>
> "Anything"?  Even plain text?  I hope not.

Plain text with a tool to generate good HTML might be better than texinfo.

> See, such "arguments" don't help to have a useful discussion.

Your insistence that texinfo is fine doesn't either. It's not fine.

> > >  4) The need to learn yet another markup language.
> > > While this is not a problem for simple text, it does require a
> > > serious study of RST and Sphinx to use the more advanced features.
> >
> > This is a problem with texinfo too.
>
> Not for someone who already knows Texinfo.  We are talking about
> switching away from it, so I'm thinking about people who contributed
> patches for the manual in the past.  They already know Texinfo, at
> least to some extent, and some of them know it very well.

I've contributed dozens of patches to the manual, and I don't want to
have to use texinfo to do it in future.

> > >  5) Lack of macros.
> > > AFAIK, only simple textual substitution is available, no macros
> > > with arguments.
> >
> > Is this a problem for GCC docs though?
>
> I don't know.  It could be, even if it isn't now.

So not a problem then.


Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Jonathan Wakely via Gcc-patches
On Mon, 12 Jul 2021 at 15:52, Jonathan Wakely  wrote:
>
> On Mon, 12 Jul 2021 at 15:13, Eli Zaretskii  wrote:
> >
> > > From: Jonathan Wakely 
> > > This is a problem with texinfo too.
> >
> > Not for someone who already knows Texinfo.  We are talking about
> > switching away from it, so I'm thinking about people who contributed
> > patches for the manual in the past.  They already know Texinfo, at
> > least to some extent, and some of them know it very well.
>
> I've contributed dozens of patches to the manual, and I don't want to
> have to use texinfo to do it in future.

And some of the people already know sphinx, and some know it very
well. And it seems likely that future contributors are more likely to
know a more modern tool than they are to know texinfo.

You like texinfo. We get it.


Re: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

2021-07-12 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Monday, July 12, 2021 11:26 AM
>> To: Tamar Christina 
>> Cc: Richard Biener ; nd ; gcc-
>> patc...@gcc.gnu.org
>> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
>> where the sign for the multiplicant changes.
>> 
>> Tamar Christina  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford 
>> >> Sent: Monday, July 12, 2021 10:39 AM
>> >> To: Tamar Christina 
>> >> Cc: Richard Biener ; nd ; gcc-
>> >> patc...@gcc.gnu.org
>> >> Subject: Re: [PATCH 1/4]middle-end Vect: Add support for dot-product
>> >> where the sign for the multiplicant changes.
>> >>
>> >> Tamar Christina  writes:
>> >> > Hi,
>> >> >
>> >> >> Richard Sandiford  writes:
>> >> >> >> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info
>> >> >> *vinfo,
>> >> >> >>/* FORNOW.  Can continue analyzing the def-use chain when
>> >> >> >> this stmt in
>> >> >> a phi
>> >> >> >>   inside the loop (in case we are analyzing an outer-loop).  */
>> >> >> >>vect_unpromoted_value unprom0[2];
>> >> >> >> +  enum optab_subtype subtype = optab_vector;
>> >> >> >>if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
>> >> >> WIDEN_MULT_EXPR,
>> >> >> >> -false, 2, unprom0, &half_type))
>> >> >> >> +false, 2, unprom0, &half_type, &subtype))
>> >> >> >> +return NULL;
>> >> >> >> +
>> >> >> >> +  if (subtype == optab_vector_mixed_sign
>> >> >> >> +  && TYPE_UNSIGNED (unprom_mult.type)
>> >> >> >> +  && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
>> >> >> >> + (unprom_mult.type))
>> >> >> >>  return NULL;
>> >> >> >
>> >> >> > Isn't the final condition here instead that TYPE1 is narrower than
>> TYPE2?
>> >> >> > I.e. we need to reject the case in which we multiply a signed
>> >> >> > and an unsigned value to get a (logically) signed result, but
>> >> >> > then zero-extend it (rather than sign-extend it) to the
>> >> >> > precision of the
>> >> addition.
>> >> >> >
>> >> >> > That would make the test:
>> >> >> >
>> >> >> >   if (subtype == optab_vector_mixed_sign
>> >> >> >   && TYPE_UNSIGNED (unprom_mult.type)
>> >> >> >   && TYPE_PRECISION (unprom_mult.type) < TYPE_PRECISION
>> (type))
>> >> >> > return NULL;
>> >> >> >
>> >> >> > instead.
>> >> >>
>> >> >> And folding that into the existing test gives:
>> >> >>
>> >> >>   /* If there are two widening operations, make sure they agree on
>> >> >> the
>> >> sign
>> >> >>  of the extension.  The result of an optab_vector_mixed_sign
>> operation
>> >> >>  is signed; otherwise, the result has the same sign as the 
>> >> >> operands.
>> */
>> >> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>> >> >>   && (subtype == optab_vector_mixed_sign
>> >> >>  ? TYPE_UNSIGNED (unprom_mult.type)
>> >> >>  : TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type)))
>> >> >> return NULL;
>> >> >>
>> >> >
>> >> > I went with the first one which doesn't add the extra constraints
>> >> > for the normal dotproduct as that makes it too restrictive. It's
>> >> > the type of the multiplication that determines the operation so
>> >> > dotproduct can be used a bit more than where we currently do.
>> >> >
>> >> > This was relaxed in an earlier patch.
>> >>
>> >> I didn't mean that we should add extra constraints to the normal case
>> though.
>> >> The existing test I was referring to above was:
>> >>
>> >>   /* If there are two widening operations, make sure they agree on
>> >>  the sign of the extension.  */
>> >>   if (TYPE_PRECISION (unprom_mult.type) != TYPE_PRECISION (type)
>> >>   && TYPE_SIGN (unprom_mult.type) != TYPE_SIGN (half_type))
>> >> return NULL;
>> >
>> > But as I mentioned, this restriction is unneeded and has been removed
>> hence why it's not in my patchset's diff.
>> > It's removed by
>> > https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569851.html which
>> Richi conditioned on the rest of these patches being approved.
>> >
>> > This change needlessly blocks test vect-reduc-dot-[2,3,6,7].c from
>> > being dotproducts for instance
>> >
>> > It's also part of the deficiency between GCC codegen and Clang
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492#c6
>> 
>> Hmm, OK.  Just removing the check regresses:
>> 
>> unsigned long __attribute__ ((noipa))
>> f (signed short *x, signed short *y)
>> {
>>   unsigned long res = 0;
>>   for (int i = 0; i < 100; ++i)
>> res += (unsigned int) x[i] * (unsigned int) y[i];
>>   return res;
>> }
>> 
>> int
>> main (void)
>> {
>>   signed short x[100], y[100];
>>   for (int i = 0; i < 100; ++i)
>> {
>>   x[i] = -1;
>>   y[i] = 1;
>> }
>>   if (f (x, y) != 0x6400000000ULL - 100)
>> __builtin_abort ();
>>   return 0;
>> }
>> 
>> on SVE.  We then use SDOT even though the result of the multiplication is
>> zero- rather than sign-extended to 64 bits.  Does something el
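
The difference between the two extensions can be sketched in standalone C.
This is a simplified model, not the vectorizer's output: it assumes a
32-bit int, uses fixed-width types in place of the test's short/unsigned
long, and the function names are hypothetical.  With x[i] = -1 and
y[i] = 1, each 32-bit product is 0xffffffff, so the zero-extended sum is
100 * 0xffffffff, while a signed accumulation gives -100 modulo 2^64.

```c
#include <assert.h>
#include <stdint.h>

/* What the source asks for: the unsigned 32-bit product is
   zero-extended when added to the 64-bit accumulator.  */
uint64_t
sum_zero_extended (const int16_t *x, const int16_t *y, int n)
{
  assert (n >= 0);
  uint64_t res = 0;
  for (int i = 0; i < n; ++i)
    res += (uint32_t) x[i] * (uint32_t) y[i];
  return res;
}

/* Model of a signed dot product: the product is treated as signed
   and sign-extended into the accumulator.  */
uint64_t
sum_sign_extended (const int16_t *x, const int16_t *y, int n)
{
  assert (n >= 0);
  uint64_t res = 0;
  for (int i = 0; i < n; ++i)
    res += (uint64_t) (int64_t) ((int32_t) x[i] * (int32_t) y[i]);
  return res;
}

/* Inputs from the test case in the thread: x[i] = -1, y[i] = 1.  */
uint64_t
thread_case (int sign_extend)
{
  int16_t x[100], y[100];
  for (int i = 0; i < 100; ++i)
    {
      x[i] = -1;
      y[i] = 1;
    }
  return sign_extend ? sum_sign_extended (x, y, 100)
                     : sum_zero_extended (x, y, 100);
}
```

The two results differ in the upper 32 bits, which is why substituting a
signed dot product for this loop would be a wrong-code change.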

Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Jonathan Wakely via Gcc-patches
On Mon, 12 Jul 2021 at 15:52, Jonathan Wakely  wrote:
>
> On Mon, 12 Jul 2021 at 15:13, Eli Zaretskii  wrote:
> > I get it that you dislike the HTML produced by Texinfo, but without
> > some examples of such bad HTML it is impossible to know what exactly
> > do you dislike and why.
> >
> > > You can't find out the anchors without inspecting (and searching)
> > > the HTML source. That's utterly stupid.
> >
> > I don't think I follow: find out the anchors with which means and for
> > what purposes?
>
> I want to point a user at the documentation for the -c option. I can't
> do that without examining the HTML source to find the anchor, then
> manually editing the URL to append the anchor. It's a tedious process,
> and the result is an anchor that doesn't even point to the option but
> to the text following it. The process is unnecessarily difficult and
> the results are bad.
>
> You participated in a discussion about this very topic previously:
> https://lists.gnu.org/archive/html/help-texinfo/2019-02/msg0.html
>
> >
> > > And even after you do that, the anchor
> > > is at the wrong place:
> > > https://gcc.gnu.org/onlinedocs/gcc/Overall-Options.html#index-c
> >
> > IME, the anchor is where you put it.  If you show me the source of
> > that HTML, maybe we can have a more useful discussion of the issue.
>
> @item -c
> @opindex c
> Compile or assemble the source files, but do not link.  The linking
> stage simply is not done.  The ultimate output is in the form of an
> object file for each source file.
>
> Putting the @opindex before the @item causes the anchor to be placed
> on the previous item, which is not desirable.

GNU Hello has the same problem with its docs:
https://www.gnu.org/software/hello/manual/hello.html#index-_002dg
That URL is garbage because of the URL-encoded %2d character, and the
fact it links to the wrong place (the description of the option, not
the option itself). The former is no longer an issue for GCC (it was
for many years) but the latter is still a problem.

If you don't know where to find it yourself, the source is visible here:
https://github.com/yugui/example/blob/master/doc/hello.texi#L208

If GNU Hello and GCC can't get this right using texinfo, maybe texinfo
is not fit for purpose?


Re: [PATCH] libgomp: Include early to avoid link failure with glibc 2.34

2021-07-12 Thread Florian Weimer via Gcc-patches
* Florian Weimer:

> * Florian Weimer:
>
>> I tested this on csky-linux-gnuabiv2 with the glibc version that failed
>> before, and it works.  So I guess your version is fine, too.
>
> Build on powerpc64-linux-gnu and other targets now fails with:
>
> /home/bmg/src/gcc/libgomp/config/linux/affinity.c: In function ‘gomp_init_affinity’:
> /home/bmg/src/gcc/libgomp/config/linux/affinity.c:53:41: error: ‘ULONG_MAX’ undeclared (first use in this function)
>53 |   if (!gomp_affinity_init_level (1, ULONG_MAX, true))
>   | ^
>
> So affinity.c will need to include <limits.h>.

I verified that this change on top successfully builds GCC for all glibc
targets:

diff --git a/libgomp/config/linux/affinity.c b/libgomp/config/linux/affinity.c
index c5abdce23..1b636c613 100644
--- a/libgomp/config/linux/affinity.c
+++ b/libgomp/config/linux/affinity.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef HAVE_PTHREAD_AFFINITY_NP
 

Thanks,
Florian
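
For reference, in standard C the ULONG_MAX macro is provided by
<limits.h>, so a translation unit that uses it must include that header
directly rather than rely on another header pulling it in.  A minimal
sketch follows; clamp_count is a hypothetical helper for illustration and
is not taken from libgomp.

```c
#include <assert.h>
#include <limits.h>   /* ULONG_MAX */

/* Hypothetical helper: clamp REQUESTED to LIMIT, where a LIMIT of
   ULONG_MAX means "no limit".  */
unsigned long
clamp_count (unsigned long requested, unsigned long limit)
{
  /* A zero limit would be meaningless in this sketch.  */
  assert (limit > 0);
  if (limit == ULONG_MAX)
    return requested;
  return requested < limit ? requested : limit;
}
```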



Re: [PATCH V2] gcc: Add vec_select -> subreg RTL simplification

2021-07-12 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> Version 2 of this patch adds more code generation tests to show the
> benefit of this RTL simplification as well as adding a new helper function
> 'rtx_vec_series_p' to reduce code duplication.
>
> Patch tested as version 1 - ok for master?

Sorry for the slow reply.

> Regression tested and bootstrapped on aarch64-none-linux-gnu,
> x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf and
> aarch64_be-none-linux-gnu - no issues.

I've also tested this on powerpc64le-unknown-linux-gnu, no issues again.

> diff --git a/gcc/combine.c b/gcc/combine.c
> index 
> 6476812a21268e28219d1e302ee1c979d528a6ca..0ff6ca87e4432cfeff1cae1dd219ea81ea0b73e4
>  100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -6276,6 +6276,26 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, 
> int in_dest,
> - 1,
> 0));
>break;
> +case VEC_SELECT:
> +  {
> + rtx trueop0 = XEXP (x, 0);
> + mode = GET_MODE (trueop0);
> + rtx trueop1 = XEXP (x, 1);
> + int nunits;
> + /* If we select a low-part subreg, return that.  */
> + if (GET_MODE_NUNITS (mode).is_constant (&nunits)
> + && targetm.can_change_mode_class (mode, GET_MODE (x), ALL_REGS))
> +   {
> + int offset = BYTES_BIG_ENDIAN ? nunits - XVECLEN (trueop1, 0) : 0;
> +
> + if (rtx_vec_series_p (trueop1, offset))
> +   {
> + rtx new_rtx = lowpart_subreg (GET_MODE (x), trueop0, mode);
> + if (new_rtx != NULL_RTX)
> +   return new_rtx;
> +   }
> +   }
> +  }

Since this occurs three times, I think it would be worth having
a new predicate:

/* Return true if, for all OP of mode OP_MODE:

 (vec_select:RESULT_MODE OP SEL)

   is equivalent to the lowpart RESULT_MODE of OP.  */

bool
vec_series_lowpart_p (machine_mode result_mode, machine_mode op_mode, rtx sel)

containing the GET_MODE_NUNITS (…).is_constant, can_change_mode_class
and rtx_vec_series_p tests.

I think the function belongs in rtlanal.[hc], even though subreg_lowpart_p
is in emit-rtl.c.
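
A standalone model of what such a predicate would check, as a rough
sketch only: plain int arrays stand in for the selector rtvec, the names
are hypothetical, and the can_change_mode_class test is left out because
it has no equivalent outside GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the series check: SEL[0..NSEL-1] must equal
   { START, START+1, START+2, ... }.  */
bool
index_series_p (const int *sel, int nsel, int start)
{
  for (int i = 0; i < nsel; i++)
    if (sel[i] != start + i)
      return false;
  return true;
}

/* Model of the low-part test: selecting NSEL of NUNITS elements picks
   the low part when the indices form a series starting at 0
   (little-endian lane order) or at NUNITS - NSEL (big-endian lane
   order).  */
bool
series_lowpart_p (const int *sel, int nsel, int nunits, bool big_endian)
{
  assert (nsel >= 0 && nsel <= nunits);
  int offset = big_endian ? nunits - nsel : 0;
  return index_series_p (sel, nsel, offset);
}

/* Example selectors for a 4-element vector.  */
const int low_half[] = { 0, 1 };
const int high_half[] = { 2, 3 };
```

For a 4-element vector, { 0, 1 } is the low part in little-endian lane
order and { 2, 3 } is the low part in big-endian lane order, matching the
BYTES_BIG_ENDIAN offset computation in the quoted hunk.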

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 
> aef6da9732d45b3586bad5ba57dafa438374ac3c..f12a0bebd3d6dd3381ac8248cd3fa3f519115105
>  100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1884,15 +1884,16 @@
>  )
>  
>  (define_insn "*zero_extend2_aarch64"
> -  [(set (match_operand:GPI 0 "register_operand" "=r,r,w")
> -(zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" 
> "r,m,m")))]
> +  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r")
> +(zero_extend:GPI (match_operand:SHORT 1 "nonimmediate_operand" 
> "r,m,m,w")))]
>""
>"@
> and\t%0, %1, 
> ldr\t%w0, %1
> -   ldr\t%0, %1"
> -  [(set_attr "type" "logic_imm,load_4,f_loads")
> -   (set_attr "arch" "*,*,fp")]
> +   ldr\t%0, %1
> +   umov\t%w0, %1.[0]"
> +  [(set_attr "type" "logic_imm,load_4,f_loads,neon_to_gp")
> +   (set_attr "arch" "*,*,fp,fp")]

FTR (just to show I thought about it): I don't know whether the umov
can really be considered an fp operation rather than a simd operation,
but since we don't support fp without simd, this is already a distinction
without a difference.  So the pattern is IMO OK as-is.

> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
> index 
> 55b6c1ac585a4cae0789c3afc0fccfc05a6d3653..93e963696dad30f29a76025696670f8b31bf2c35
>  100644
> --- a/gcc/config/arm/vfp.md
> +++ b/gcc/config/arm/vfp.md
> @@ -224,7 +224,7 @@
>  ;; problems because small constants get converted into adds.
>  (define_insn "*arm_movsi_vfp"
>[(set (match_operand:SI 0 "nonimmediate_operand" "=rk,r,r,r,rk,m 
> ,*t,r,*t,*t, *Uv")
> -  (match_operand:SI 1 "general_operand" "rk, 
> I,K,j,mi,rk,r,*t,*t,*Uvi,*t"))]
> +  (match_operand:SI 1 "general_operand" "rk, 
> I,K,j,mi,rk,r,t,*t,*Uvi,*t"))]
>"TARGET_ARM && TARGET_HARD_FLOAT
> && (   s_register_operand (operands[0], SImode)
> || s_register_operand (operands[1], SImode))"

I'll assume that an Arm maintainer would have spoken up by now if
they didn't want this for some reason.

> diff --git a/gcc/rtl.c b/gcc/rtl.c
> index 
> aaee882f5ca3e37b59c9829e41d0864070c170eb..3e8b3628b0b76b41889b77bb0019f582ee6f5aaa
>  100644
> --- a/gcc/rtl.c
> +++ b/gcc/rtl.c
> @@ -736,6 +736,19 @@ rtvec_all_equal_p (const_rtvec vec)
>  }
>  }
>  
> +/* Return true if element-selection indices in VEC are in series.  */
> +
> +bool
> +rtx_vec_series_p (const_rtx vec, int start)

I think rtvec_series_p would be better, for consistency with
rtvec_all_equal_p.  Also, let's generalise it to:

/* Return true if VEC contains a linear series of integers
   { START, START+1, START+2, ... }.  */

bool
rtvec_series_p (rtvec vec, int start)
{
}

> +{
> +  for (int i = 0; i < XVECLEN (vec, 0); i++)
> +{
> +  if (i + start != INTVAL (XVECEXP (vec, 0, i)))
> + return false;
> +}

Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-12 Thread Qing Zhao via Gcc-patches


> On Jul 12, 2021, at 2:51 AM, Richard Sandiford  
> wrote:
> 
> Martin Jambor  writes:
>> On Thu, Jul 08 2021, Qing Zhao wrote:
>>> (Resend this email since the previous one didn’t quote, I changed one
>>> setting in my mail client, hopefully that can fix this issue).
>>> 
>>> Hi, Martin,
>>> 
>>> Thank you for the review and comment.
>>> 
 On Jul 8, 2021, at 8:29 AM, Martin Jambor  wrote:
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index c05d22f3e8f1..35051d7c6b96 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -384,6 +384,13 @@ static struct
> 
>  /* Numbber of components created when splitting aggregate parameters.  */
>  int param_reductions_created;
> +
> +  /* Number of deferred_init calls that are modified.  */
> +  int deferred_init;
> +
> +  /* Number of deferred_init calls that are created by
> + generate_subtree_deferred_init.  */
> +  int subtree_deferred_init;
> } sra_stats;
> 
> static void
> @@ -4096,6 +4103,110 @@ get_repl_default_def_ssa_name (struct access 
> *racc, tree reg_type)
>  return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
> }
> 
> +
> +/* Generate statements to call .DEFERRED_INIT to initialize scalar 
> replacements
> +   of accesses within a subtree ACCESS; all its children, siblings and 
> their
> +   children are to be processed.
> +   GSI is a statement iterator used to place the new statements.  */
> +static void
> +generate_subtree_deferred_init (struct access *access,
> + tree init_type,
> + tree is_vla,
> + gimple_stmt_iterator *gsi,
> + location_t loc)
> +{
> +  do
> +{
> +  if (access->grp_to_be_replaced)
> + {
> +   tree repl = get_access_replacement (access);
> +   gimple *call
> + = gimple_build_call_internal (IFN_DEFERRED_INIT, 3,
> +   TYPE_SIZE_UNIT (TREE_TYPE (repl)),
> +   init_type, is_vla);
> +   gimple_call_set_lhs (call, repl);
> +   gsi_insert_before (gsi, call, GSI_SAME_STMT);
> +   update_stmt (call);
> +   gimple_set_location (call, loc);
> +   sra_stats.subtree_deferred_init++;
> + }
> +  else if (access->grp_to_be_debug_replaced)
> + {
> +   tree drepl = get_access_replacement (access);
> +   tree call = build_call_expr_internal_loc
> +  (UNKNOWN_LOCATION, IFN_DEFERRED_INIT,
> +   TREE_TYPE (drepl), 3,
> +   TYPE_SIZE_UNIT (TREE_TYPE (drepl)),
> +   init_type, is_vla);
> +   gdebug *ds = gimple_build_debug_bind (drepl, call,
> + gsi_stmt (*gsi));
> +   gsi_insert_before (gsi, ds, GSI_SAME_STMT);
 
 Is handling of grp_to_be_debug_replaced accesses necessary here?  If so,
 why?  grp_to_be_debug_replaced accesses are there only to facilitate
 debug information about a part of an aggregate decl that is likely
 going to be entirely removed - so that debuggers can sometimes show
 users information about what they would contain had they not been removed.
 It seems strange you need to mark them as uninitialized because they
 should not have any consumers.  (But perhaps it is also harmless.)
>>> 
>>> This part has been discussed during the 2nd version of the patch, but
>>> I think that more discussion might be necessary.
>>> 
>>> In the previous discussion, Richard Sandiford mentioned:
>>> (https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568620.html):
>>> 
>>> =
>>> 
>>> I guess the thing we need to decide here is whether -ftrivial-auto-var-init
>>> should affect debug-only constructs too.  If it doesn't, examining removed
>>> components in a debugger might show uninitialised values in cases where
>>> the user was expecting initialised ones.  There would be no security
>>> concern, but it might be surprising.
>>> 
>>> I think in principle the DRHS can contain a call to DEFERRED_INIT.
>>> Doing that would probably require further handling elsewhere though.
>>> 
>>> =
>>> 
>>> I am still not very confident now for this part of the change.
>> 
>> I see.  I still tend to think that with or without the generation of
>> gimple_build_debug_binds, the debugger would still not display any value
>> for the component in question.  Without it there would be no information
>> about the component at any place in code affected by this, with it the
>> component would be explicitly uninitialized.  But OK.
> 
> FTR, I don't have a strong opinion here.  You know the code better
> than I do, so if you think not generating debug binds is better then
> let's do that.

I am okay with not generating debug binds here. 

Then I will just delete the part of the code that is guarded with  if 
(

Re: Repost: [PATCH] Deal with prefixed loads/stores in tests, PR testsuite/100166

2021-07-12 Thread Bill Schmidt via Gcc-patches

Hi Mike,

On 7/7/21 3:03 PM, Michael Meissner wrote:

[PATCH] Deal with prefixed loads/stores in tests, PR testsuite/100166

This patch updates the various tests in the testsuite to treat plxv
and pstxv as being vector loads/stores.  This shows up if you run the
testsuite with a compiler configured with the option: --with-cpu=power10.

I have verified that these tests now all pass when I build and test a compiler
on a power10 system using --with-cpu=power10.  I have verified that they
continue to run on power9 little endian and power8 big endian systems.

Can I check this into the master branch?

2021-07-07  Michael Meissner  

gcc/testsuite/
PR testsuite/100166
* gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c:

Please drop the gcc/testsuite/ part from this line.

* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c:
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c:
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c:
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c:
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c:
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-short.c:
* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-char.c:
* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-double.c:
* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-float.c:
* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-int.c:
* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-longlong.c:
* gcc.target/powerpc/fold-vec-load-vec_vsx_ld-short.c:
* gcc.target/powerpc/fold-vec-load-vec_xl-char.c:
* gcc.target/powerpc/fold-vec-load-vec_xl-double.c:
* gcc.target/powerpc/fold-vec-load-vec_xl-float.c:
* gcc.target/powerpc/fold-vec-load-vec_xl-int.c:
* gcc.target/powerpc/fold-vec-load-vec_xl-longlong.c:
* gcc.target/powerpc/fold-vec-load-vec_xl-short.c:
* gcc.target/powerpc/fold-vec-splat-floatdouble.c:
* gcc.target/powerpc/fold-vec-splat-longlong.c:
* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-char.c:
* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-double.c:
* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-float.c:
* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-int.c:
* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-longlong.c:
* gcc.target/powerpc/fold-vec-store-builtin_vec_xst-short.c:
* gcc.target/powerpc/fold-vec-store-vec_vsx_st-char.c:
* gcc.target/powerpc/fold-vec-store-vec_vsx_st-double.c:
* gcc.target/powerpc/fold-vec-store-vec_vsx_st-float.c:
* gcc.target/powerpc/fold-vec-store-vec_vsx_st-int.c:
* gcc.target/powerpc/fold-vec-store-vec_vsx_st-longlong.c:
* gcc.target/powerpc/fold-vec-store-vec_vsx_st-short.c:
* gcc.target/powerpc/fold-vec-store-vec_xst-char.c:
* gcc.target/powerpc/fold-vec-store-vec_xst-double.c:
* gcc.target/powerpc/fold-vec-store-vec_xst-float.c:
* gcc.target/powerpc/fold-vec-store-vec_xst-int.c:
* gcc.target/powerpc/fold-vec-store-vec_xst-longlong.c:
* gcc.target/powerpc/fold-vec-store-vec_xst-short.c:
* gcc.target/powerpc/lvsl-lvsr.c:
* gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c:
Update insn counts to account for power10 prefixed loads and
stores.

Also here.

---
  .../vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c   | 2 +-
  .../gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c | 2 +-
  .../powerpc/fold-vec-load-builtin_vec_xl-double.c  | 2 +-
  .../powerpc/fold-vec-load-builtin_vec_xl-float.c   | 2 +-
  .../gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c  | 2 +-
  .../powerpc/fold-vec-load-builtin_vec_xl-longlong.c| 2 +-
  .../powerpc/fold-vec-load-builtin_vec_xl-short.c   | 2 +-
  .../gcc.target/powerpc/fold-vec-load-vec_vsx_ld-char.c | 2 +-
  .../gcc.target/powerpc/fold-vec-load-vec_vsx_ld-double.c   | 2 +-
  .../gcc.target/powerpc/fold-vec-load-vec_vsx_ld-float.c| 2 +-
  .../gcc.target/powerpc/fold-vec-load-vec_vsx_ld-int.c  | 2 +-
  .../gcc.target/powerpc/fold-vec-load-vec_vsx_ld-longlong.c | 2 +-
  .../gcc.target/powerpc/fold-vec-load-vec_vsx_ld-short.c| 2 +-
  .../gcc.target/powerpc/fold-vec-load-vec_xl-char.c | 2 +-
  .../gcc.target/powerpc/fold-vec-load-vec_xl-double.c   | 2 +-
  .../gcc.target/powerpc/fold-vec-load-vec_xl-float.c| 2 +-
  .../gcc.target/powerpc/fold-vec-load-vec_xl-int.c  | 2 +-
  .../gcc.target/powerpc/fold-vec-load-vec_xl-longlong.c | 2 +-
  .../gcc.target/powerpc/fold-vec-load-vec_xl-short.c| 2 +-
  .../gcc.target/powerpc/fold-vec-splat-floatdouble.c| 7 ---
  gcc/testsuite/gcc.target/powerpc/fold-vec-splat-longlong.c | 2 +-
  .../powerpc/fold-vec-store-builtin_vec_xst-char.c  | 2 +-
  .../powerpc/fold-vec-store-builtin_vec_xst-doubl

Re: Repost: [PATCH] Fix vec-splati-runnable.c test.

2021-07-12 Thread Bill Schmidt via Gcc-patches

Hi Mike,

On 7/7/21 3:00 PM, Michael Meissner wrote:

[PATCH] Fix vec-splati-runnable.c test.

I noticed that the vec-splati-runnable.c did not have an abort after one
of the tests.  If the test was run with optimization, the optimizer could
delete some of the tests and throw off the count.  However, due to the
fact that the value being loaded in that test is undefined, I did not
check what value was loaded, but I just stored it into a volatile global
variable.
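
The volatile-store technique described above can be sketched in isolation
as follows.  This is an illustrative stand-in, not the test itself: scale
replaces the vec_splatid call, the value used here is well defined, and
all names are hypothetical.

```c
#include <assert.h>

/* Stores to a volatile object are observable side effects, so the
   compiler has to keep the store and the computation feeding it,
   even at -O2.  */
volatile double sink;

double
scale (double v)
{
  /* Reject NaN in this sketch.  */
  assert (v == v);
  return v * 2.0;
}

/* Run the operation without comparing its result against an expected
   value; the store to SINK keeps it from being deleted.  */
int
exercise (void)
{
  sink = scale (1.5);
  return 1;
}
```

The point of the pattern is that the operation still gets emitted while
the test makes no claim about what value it produces.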

2021-07-07  Michael Meissner  

gcc/testsuite/
* gcc.target/powerpc/vec-splati-runnable.c: Run test with -O2
optimization.  Do not check what XXSPLTIDP generates if the value
is undefined.
---
  .../gcc.target/powerpc/vec-splati-runnable.c  | 29 ++-
  1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index e84ce77a21d..a135279b1d7 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -1,7 +1,7 @@
  /* { dg-do run { target { power10_hw } } } */
  /* { dg-do link { target { ! power10_hw } } } */
  /* { dg-require-effective-target power10_ok } */
-/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps -O2" } */


Why did you restrict optimization here?  The tests should be run at 
various opt levels if you don't specify this, right?


The test changes are otherwise okay, but I'd like to understand this first.

Thanks,
Bill


  #include 
  
  #define DEBUG 0

@@ -12,6 +12,8 @@
  
  extern void abort (void);
  
+volatile vector double vresult_d_undefined;

+
  int
  main (int argc, char *argv [])
  {
@@ -85,25 +87,12 @@ main (int argc, char *argv [])
  #endif
}
  
-  /* This test will generate a "note" to the user that the argument

- is subnormal.  It is not an error, but results are not defined.  */
-  vresult_d = (vector double) { 2.0, 3.0 };
-  expected_vresult_d = (vector double) { 6.6E-42f, 6.6E-42f };
-
-  vresult_d = vec_splatid (6.6E-42f);
-
-  /* Although the instruction says the results are not defined, it does seem
- to work, at least on Mambo.  But no guarentees!  */
-  if (!vec_all_eq (vresult_d,  expected_vresult_d)) {
-#if DEBUG
-printf("ERROR, vec_splati (6.6E-42f)\n");
-for(i = 0; i < 2; i++)
-  printf(" vresult_d[%i] = %e, expected_vresult_d[%i] = %e\n",
-i, vresult_d[i], i, expected_vresult_d[i]);
-#else
-;
-#endif
-  }
+  /* This test will generate a "note" to the user that the argument is
+ subnormal.  It is not an error, but results are not defined.  Because this
+ is undefined, we cannot check that any value is correct.  Just store it in
+ a volatile variable so the XXSPLTIDP instruction gets generated and the
+ warning message printed. */
+  vresult_d_undefined = vec_splatid (6.6E-42f);
  
/* Vector splat immediate */

vsrc_a_int = (vector int) { 2, 3, 4, 5 };
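
For readers without Power10 hardware, the idea behind the new
vresult_d_undefined variable can be sketched in portable C.  This is only
an illustration: sink, scale and keep_result_alive are made-up names, and
scale stands in for vec_splatid.  The point is that a store to a volatile
object is an observable side effect, so the compiler has to compute the
value even at -O2, although nothing ever checks it.

```c
/* Hypothetical stand-in for vresult_d_undefined: a volatile object that
   receives a result nobody checks.  */
volatile double sink;

/* Hypothetical stand-in for the operation under test.  */
double
scale (double x)
{
  return x * 3.0;
}

void
keep_result_alive (double x)
{
  /* The volatile store cannot be removed by the optimizer, so the call
     (or its inlined body) has to be evaluated as well.  */
  sink = scale (x);
}
```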


Re: [PATCH] Check type size for doloop iv on BITS_PER_WORD [PR61837]

2021-07-12 Thread guojiufu via Gcc-patches

On 2021-07-12 22:46, Richard Biener wrote:

On Mon, 12 Jul 2021, guojiufu wrote:


On 2021-07-12 18:02, Richard Biener wrote:
> On Mon, 12 Jul 2021, guojiufu wrote:
>
>> On 2021-07-12 16:57, Richard Biener wrote:
>> > On Mon, 12 Jul 2021, guojiufu wrote:
>> >
>> >> On 2021-07-12 14:20, Richard Biener wrote:
>> >> > On Fri, 9 Jul 2021, Segher Boessenkool wrote:
>> >> >
>> >> >> On Fri, Jul 09, 2021 at 08:43:59AM +0200, Richard Biener wrote:
>> >> >> > I wonder if there's a way to query the target what modes the doloop
>> >> >> > pattern can handle (not being too familiar with the doloop code).
>> >> >>
>> >> >> You can look what modes are allowed for operand 0 of doloop_end,
>> >> >> perhaps?  Although that is a define_expand, not a define_insn, so it
>> >> >> is
>> >> >> hard to introspect.
>> >> >>
>> >> >> > Why do you need to do any checks besides the new type being able to
>> >> >> > represent all IV values?  The original doloop IV will never wrap
>> >> >> > (OTOH if niter is U*_MAX then we compute niter + 1 which will
>> >> >> > become
>> >> >> > zero ... I suppose the doloop might still do the correct thing here
>> >> >> > but it also still will with a IV with larger type).
>> >>
>> >> The issue comes from U*_MAX (the original short MAX), as you said, for
>> >> which niter + 1 becomes zero.  Because the step for doloop is -1,
>> >> 'zero - 1' is a very large number in the larger type (e.g. 0xff...ff),
>> >> but a small value in the original short type (e.g. "0xff").
>> >
>> > But for the larger type the small type MAX + 1 fits and does not yield
>> > zero so it should still work exactly as before, no?  Of course you
>> > have to compute the + 1 in the larger type.
>> >
>> You are right, if we compute the "+ 1" in the larger type it is OK, as in
>> the code below:
>> ```
>>/* Using a word-sized type may be faster.  */
>> if (TYPE_PRECISION (ntype) < BITS_PER_WORD)
>>   {
>> ntype = lang_hooks.types.type_for_size (BITS_PER_WORD, 1);
>> niter = fold_convert (ntype, niter);
>>   }
>>
>> tree base = fold_build2 (PLUS_EXPR, ntype, unshare_expr (niter),
>>  build_int_cst (ntype, 1));
>>
>>
>> add_candidate (data, base, build_int_cst (ntype, -1), true, NULL, NULL,
>> true);
>> ```
>> The issue with this is that the code generates more stmts for doloop.xxx:
>>   _12 = (unsigned int) xx(D);
>>   _10 = _12 + 4294967295;
>>   _24 = (long unsigned int) _10;
>>   doloop.6_8 = _24 + 1;
>>
>> If we use the previous patch, with the "+ 1" in the original type, the
>> stmts will look like:
>>   _12 = (unsigned int) xx(D);
>>   doloop.6_8 = (long unsigned int) _12;
>>
>> This is the reason for checking
>>wi::ltu_p (niter_desc->max, wi::to_widest (TYPE_MAX_VALUE (ntype)))
>
> But this then only works when there's an upper bound on the number
> of iterations.  Note you should not use TYPE_MAX_VALUE here but
> you can instead use
>
>  wi::ltu_p (niter_desc->max, wi::to_widest (wi::max_value
> (TYPE_PRECISION (ntype), TYPE_SIGN (ntype))));

Ok, Thanks!
I remember you mentioned that:
widest_int::from (wi::max_value (TYPE_PRECISION (ntype), TYPE_SIGN (ntype)),
                  TYPE_SIGN (ntype))
would be better than
wi::to_widest (TYPE_MAX_VALUE (ntype)).

It seems that "TYPE_MAX_VALUE (ntype)" is
"NUMERICAL_TYPE_CHECK (NODE)->type_non_common.maxval",
which does a numerical check and returns the maxval field; the result is
then passed to wi::to_widest.

The other code, "widest_int::from (wi::max_value (..,..),..)", calls
wi::max_value and widest_int::from.

I'm wondering if wi::to_widest (TYPE_MAX_VALUE (ntype)) is cheaper?


TYPE_MAX_VALUE can be "surprising", it does not necessarily match the
underlying mode's precision.  At some point we've tried to eliminate
most of its uses, not sure what the situation/position is right now.

OK, got it, thanks.
I will use "widest_int::from (wi::max_value (..,..),..)".
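
As a rough illustration of why the precision-based bound is the predictable
one: it depends only on the precision and the signedness, not on whatever
maximum happens to be recorded for the type.  The sketch below is a plain-C
analogue and not the wide-int API; max_for_precision is a made-up name, and
it assumes a precision between 1 and 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest value representable with PRECISION bits (1..64), derived from
   the precision and signedness only.  Hypothetical helper.  */
uint64_t
max_for_precision (unsigned precision, bool is_signed)
{
  /* A signed type spends one bit on the sign.  */
  unsigned value_bits = is_signed ? precision - 1 : precision;

  /* Avoid shifting by the full width, which is undefined in C.  */
  if (value_bits >= 64)
    return UINT64_MAX;
  return (UINT64_C (1) << value_bits) - 1;
}
```

A type whose recorded maximum is smaller than this value would presumably
be the "surprising" case mentioned above.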




> I think the -1 above comes from number of latch iterations vs. header
> entries - it's a common source for this kind of issues.  range analysis
> might be able to prove that we can still merge the two adds even with
> the intermediate extension.
Yes, as you mentioned, it relates to the number of latch iterations.
For loops that look like: while (l < n) or for (i = 0; i < n; i++),
niter is usually 'n - 1' after the loop is transformed into 'do-while'
form.

For this kind of loop, the max value for the number of iterations "n - 1"
would be "max_value_type(n) - 1", which is wi::ltu than max_value_type.
This kind of loop is common, and we could use
wi::ltu (max, max_value_type) to check.

For loops that look like:
  do ;
  while (n-- > 0); /* while  (n-- > low); */

niter_desc->max will be wi::eq to max_value_type, niter would be "n",
and then doloop.xx is 'n+1'.

I will see how to merge these two adds safely at this point,
when generating the doloop iv (maybe with range info).  Thanks!
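
To make the wrap-around case concrete, here is a small standalone sketch
in plain C (not GCC internals).  It assumes a 16-bit original type and a
64-bit word, and all function names are made up for illustration.  It
shows that "+ 1" wraps to zero in the narrow type for U*_MAX, that it does
not wrap when the add is done after widening, and that a strict upper
bound below the type maximum is what makes the cheaper narrow add safe.

```c
#include <stdbool.h>
#include <stdint.h>

/* "niter + 1" computed in the original 16-bit type: wraps to 0 when
   niter is 0xffff.  */
uint16_t
count_in_narrow_type (uint16_t niter)
{
  return (uint16_t) (niter + 1u);
}

/* Widen first, then add: 0xffff + 1 is 0x10000, no wrap.  */
uint64_t
count_in_wide_type (uint16_t niter)
{
  return (uint64_t) niter + 1;
}

/* If the known upper bound on niter is strictly below the maximum of the
   narrow type, the narrow "+ 1" cannot wrap, so adding before widening
   gives the same value as adding after widening.  */
bool
narrow_add_is_safe (uint16_t niter_max)
{
  return niter_max < UINT16_MAX;
}
```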

>
> Is this pre-loop extra add really offsetting the in-loop doloop

[RFA] Some libgcc headers are missing the runtime exception

2021-07-12 Thread Richard Sandiford via Gcc-patches
David Edelsohn  writes:
> On Fri, Jul 9, 2021 at 1:31 PM Richard Sandiford
>  wrote:
>>
>> David Edelsohn  writes:
>> > On Fri, Jul 9, 2021 at 12:53 PM Richard Sandiford via Gcc
>> >  wrote:
>> >>
>> >> Hi,
>> >>
>> >> It was pointed out to me off-list that config/aarch64/value-unwind.h
>> >> is missing the runtime exception.  It looks like a few other files
>> >> are too; a fuller list is:
>> >>
>> >> libgcc/config/aarch64/value-unwind.h
>> >> libgcc/config/frv/frv-abi.h
>> >> libgcc/config/i386/value-unwind.h
>> >> libgcc/config/pa/pa64-hpux-lib.h
>> >>
>> >> Certainly for the aarch64 file this was simply a mistake;
>> >> it seems to have been copied from the i386 version, both of which
>> >> reference the runtime exception but don't actually include it.
>> >>
>> >> What's the procedure for fixing this?  Can we treat it as a textual
>> >> error or do the files need to be formally relicensed?
>> >
>> > I'm unsure what you mean by "formally relicensed".
>>
>> It seemed like there were two possibilities: the licence of the files
>> is actually GPL + exception despite what the text says (the textual
>> error case), or the licence of the files is plain GPL because the text
>> has said so since the introduction of the files.  In the latter case
>> I'd have imagined that someone would need to relicense the code so
>> that it is GPL + exception.
>>
>> > It generally is considered a textual omission.  The runtime library
>> > components of GCC are intended to be licensed under the runtime
>> > exception, which was granted and approved at the time of introduction.
>>
>> OK, thanks.  So would a patch to fix at least the i386 and aarch64 header
>> files be acceptable?  (I'm happy to fix the other two as well if that's
>> definitely the right thing to do.  It's just that there's more history
>> involved there…)
>
> Please correct the text in the files. The files in libgcc used in the
> GCC runtime are intended to be licensed with the runtime exception and
> GCC previously was granted approval for that licensing and purpose.
>
> As you are asking the question, I sincerely doubt that ARM and Cavium
> intended to apply a license without the exception to those files.  And
> similarly for Intel and FRV.

FTR, I think only Linaro (rather than Arm) touched the aarch64 file.

> The runtime exception explicitly was intended for this purpose and
> usage at the time that GCC received approval to apply the exception.

Ack.  Is the patch below OK for trunk and branches?

Thanks,
Richard

>From a601ac8ea9be14a898215456c22cd826e8fd92d9 Mon Sep 17 00:00:00 2001
From: Richard Sandiford 
Date: Mon, 12 Jul 2021 13:04:56 +0100
Subject: [PATCH] libgcc: Add missing runtime exception notices
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Quoting from https://gcc.gnu.org/pipermail/gcc/2021-July/236716.html:


It was pointed out to me off-list that config/aarch64/value-unwind.h
is missing the runtime exception.  It looks like a few other files
are too; a fuller list is:

libgcc/config/aarch64/value-unwind.h
libgcc/config/frv/frv-abi.h
libgcc/config/i386/value-unwind.h
libgcc/config/pa/pa64-hpux-lib.h

Certainly for the aarch64 file this was simply a mistake;
it seems to have been copied from the i386 version, both of which
reference the runtime exception but don't actually include it.


Similarly, frv-abi.h referenced the exception but didn't include it.
pa64-hpux-lib.h was missing any reference to the exception.

The decision was that this was simply a mistake
[https://gcc.gnu.org/pipermail/gcc/2021-July/236717.html]:


[…] It generally is
considered a textual omission.  The runtime library components of GCC
are intended to be licensed under the runtime exception, which was
granted and approved at the time of introduction.


and that we should simply change all of the files above
[https://gcc.gnu.org/pipermail/gcc/2021-July/236719.html]:


Please correct the text in the files. The files in libgcc used in the
GCC runtime are intended to be licensed with the runtime exception and
GCC previously was granted approval for that licensing and purpose.

[…]

The runtime exception explicitly was intended for this purpose and
usage at the time that GCC received approval to apply the exception.


libgcc/
	* config/aarch64/value-unwind.h: Add missing runtime exception
	paragraph.
	* config/frv/frv-abi.h: Likewise.
	* config/i386/value-unwind.h: Likewise.
	* config/pa/pa64-hpux-lib.h: Likewise.
---
 libgcc/config/aarch64/value-unwind.h | 4 
 libgcc/config/frv/frv-abi.h  | 4 
 libgcc/config/i386

Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Gavin Smith via Gcc-patches
On Mon, Jul 12, 2021 at 4:04 PM Jonathan Wakely via Gcc  
wrote:
> GNU Hello has the same problem with its docs:
> https://www.gnu.org/software/hello/manual/hello.html#index-_002dg
> That URL is garbage because of the URL-encoded %2d character, and the
> fact it links to the wrong place (the description of the option, not
> the option itself). The former is no longer an issue for GCC (it was
> for many years) but the latter is still a problem.
>
> If you don't know where to find it yourself, the source is visible here:
> https://github.com/yugui/example/blob/master/doc/hello.texi#L208

I downloaded the source for the "hello" manual and recreated it with
Texinfo 6.8 (running " texi2any --html hello.texi --no-split"). I've
attached the results. The current output doesn't exhibit the problem
with the scrolling being at the wrong place - this problem has
evidently resolved itself since the time when the online "hello"
manual was generated. (I don't remember many complaints about it on
the mailing list, though: if we don't know about problems, we can't
fix them.)

The URL is mangled because index entries can have more characters in
them than what is suitable for a URL. A space character becomes a "-",
so a "-" has to become something else. They have to be distinguished
because there may be two separate index entries in different places
which wouldn't be distinguishable otherwise.

However, I find that adding an extra index entry means you can use
hello.html#index-greeting instead:

@item --greeting=@var{text}
@itemx -g @var{text}
@opindex greeting
@opindex --greeting
@opindex -g
Output @var{text} instead of the default greeting.
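
The mangling described above can be sketched in C as follows.  This is an
approximation inferred from the "index-_002dg" anchor, not the actual
texi2any code: it assumes that a space becomes "-" and that every other
character outside ASCII letters and digits becomes "_" followed by four
lowercase hex digits.  The function names are made up, and the real rules
may have further cases.

```c
#include <stdio.h>
#include <string.h>

/* ASCII-only test, independent of the current locale.  */
static int
is_ascii_alnum (unsigned char c)
{
  return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z')
	 || (c >= 'a' && c <= 'z');
}

/* Write the mangled form of ENTRY to OUT (OUT_SIZE bytes).
   Return 0 on success, -1 if OUT is too small.  */
int
mangle_index_entry (const char *entry, char *out, size_t out_size)
{
  size_t used = 0;

  if (out_size == 0)
    return -1;
  for (const unsigned char *p = (const unsigned char *) entry; *p; p++)
    {
      char piece[8];

      if (is_ascii_alnum (*p))
	snprintf (piece, sizeof piece, "%c", *p);
      else if (*p == ' ')
	snprintf (piece, sizeof piece, "-");
      else
	snprintf (piece, sizeof piece, "_%04x", (unsigned) *p);

      size_t len = strlen (piece);
      if (used + len + 1 > out_size)
	return -1;
      memcpy (out + used, piece, len);
      used += len;
    }
  out[used] = '\0';
  return 0;
}

/* Convenience wrapper using a static buffer; returns "" on overflow.  */
const char *
mangled (const char *entry)
{
  static char buf[256];

  if (mangle_index_entry (entry, buf, sizeof buf) != 0)
    buf[0] = '\0';
  return buf;
}
```

With this scheme "-g" and "g" get different anchors, while the plain
"greeting" entry keeps a readable one.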


Re: Repost: [PATCH] Change rs6000_const_f32_to_i32 return type.

2021-07-12 Thread Bill Schmidt via Gcc-patches

Hi Mike,

On 7/7/21 2:59 PM, Michael Meissner wrote:

[PATCH] Change rs6000_const_f32_to_i32 return type.

The function rs6000_const_f32_to_i32 calls REAL_VALUE_TO_TARGET_SINGLE
with a long long type and returns it.  This patch changes the type to long
which is the proper type for REAL_VALUE_TO_TARGET_SINGLE.

2021-07-07  Michael Meissner  

gcc/
* config/rs6000/rs6000-protos.h (rs6000_const_f32_to_i32): Change
return type to long.
* config/rs6000/rs6000.c (rs6000_const_f32_to_i32): Change return
type to long.
---
  gcc/config/rs6000/rs6000-protos.h | 2 +-
  gcc/config/rs6000/rs6000.c| 6 --
  2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 9de294d3b28..94bf961c6b7 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -281,7 +281,7 @@ extern void rs6000_asm_output_dwarf_pcrel (FILE *file, int 
size,
   const char *label);
  extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
 const char *label);
-extern long long rs6000_const_f32_to_i32 (rtx operand);
+extern long rs6000_const_f32_to_i32 (rtx operand);
  
  /* Declare functions in rs6000-c.c */
  
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c

index 9a5db63d0ef..de11de5e079 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -27936,10 +27936,12 @@ rs6000_invalid_conversion (const_tree fromtype, 
const_tree totype)
return NULL;
  }
  
-long long

+/* Convert a SFmode constant to the integer bit pattern.  */
+
+long
  rs6000_const_f32_to_i32 (rtx operand)
  {
-  long long value;
+  long value;
const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (operand);
  
gcc_assert (GET_MODE (operand) == SFmode);


These changes look OK.  Can you please also fix the expander for 
xxspltiw_v4sf, which incorrectly expects a long long?


I can't approve, but recommend approval with that also fixed.

Thanks!
Bill
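
For anyone unfamiliar with what "the integer bit pattern" of an SFmode
constant means, a host-side analogue in plain C looks like the sketch
below.  It is not the GCC macro, which works on the target format inside
the compiler.  f32_bit_pattern is a made-up name, and the sketch assumes
that float is a 32-bit IEEE single on the host.

```c
#include <stdint.h>
#include <string.h>

/* Return the 32-bit pattern of VALUE in a long, mirroring the return
   type chosen in the patch.  Hypothetical helper.  */
long
f32_bit_pattern (float value)
{
  uint32_t bits;

  /* memcpy is the well-defined way to reinterpret the object
     representation of a float.  */
  memcpy (&bits, &value, sizeof bits);
  return (long) bits;
}
```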



Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Jonathan Wakely via Gcc-patches
On Mon, 12 Jul 2021 at 17:01, Gavin Smith wrote:
>
> On Mon, Jul 12, 2021 at 4:04 PM Jonathan Wakely via Gcc  
> wrote:
> > GNU Hello has the same problem with its docs:
> > https://www.gnu.org/software/hello/manual/hello.html#index-_002dg
> > That URL is garbage because of the URL-encoded %2d character, and the
> > fact it links to the wrong place (the description of the option, not
> > the option itself). The former is no longer an issue for GCC (it was
> > for many years) but the latter is still a problem.
> >
> > If you don't know where to find it yourself, the source is visible here:
> > https://github.com/yugui/example/blob/master/doc/hello.texi#L208
>
> I downloaded the source for the "hello" manual and recreated it with
> Texinfo 6.8 (running " texi2any --html hello.texi --no-split"). I've
> attached the results. The current output doesn't exhibit the problem
> with the scrolling being at the wrong place - this problem has
> evidently resolved itself since the time when the online "hello"
> manual was generated. (I don't remember many complaints about it on
> the mailing list, though: if we don't know about problems, we can't
> fix them.)

The "copyable link" does work as I would expect. The #index-_002dg
anchor still seems to be in the "wrong" place, i.e. in the 
element not the  element. But the addition of the copyable link
nicely solves the problem of needing to easily obtain a link to the
right position.

> The URL is mangled because index entries can have more characters in
> them than what is suitable for a URL. A space character becomes a "-",
> so a "-" has to become something else.

Yes, I understand the reason.


Re: [RFA] Some libgcc headers are missing the runtime exception

2021-07-12 Thread David Edelsohn via Gcc-patches
On Mon, Jul 12, 2021 at 11:58 AM Richard Sandiford
 wrote:
>
> David Edelsohn  writes:
> > On Fri, Jul 9, 2021 at 1:31 PM Richard Sandiford
> >  wrote:
> >>
> >> David Edelsohn  writes:
> >> > On Fri, Jul 9, 2021 at 12:53 PM Richard Sandiford via Gcc
> >> >  wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> It was pointed out to me off-list that config/aarch64/value-unwind.h
> >> >> is missing the runtime exception.  It looks like a few other files
> >> >> are too; a fuller list is:
> >> >>
> >> >> libgcc/config/aarch64/value-unwind.h
> >> >> libgcc/config/frv/frv-abi.h
> >> >> libgcc/config/i386/value-unwind.h
> >> >> libgcc/config/pa/pa64-hpux-lib.h
> >> >>
> >> >> Certainly for the aarch64 file this was simply a mistake;
> >> >> it seems to have been copied from the i386 version, both of which
> >> >> reference the runtime exception but don't actually include it.
> >> >>
> >> >> What's the procedure for fixing this?  Can we treat it as a textual
> >> >> error or do the files need to be formally relicensed?
> >> >
> >> > I'm unsure what you mean by "formally relicensed".
> >>
> >> It seemed like there were two possibilities: the licence of the files
> >> is actually GPL + exception despite what the text says (the textual
> >> error case), or the licence of the files is plain GPL because the text
> >> has said so since the introduction of the files.  In the latter case
> >> I'd have imagined that someone would need to relicense the code so
> >> that it is GPL + exception.
> >>
> >> > It generally is considered a textual omission.  The runtime library
> >> > components of GCC are intended to be licensed under the runtime
> >> > exception, which was granted and approved at the time of introduction.
> >>
> >> OK, thanks.  So would a patch to fix at least the i386 and aarch64 header
> >> files be acceptable?  (I'm happy to fix the other two as well if that's
> >> definitely the right thing to do.  It's just that there's more history
> >> involved there…)
> >
> > Please correct the text in the files. The files in libgcc used in the
> > GCC runtime are intended to be licensed with the runtime exception and
> > GCC previously was granted approval for that licensing and purpose.
> >
> > As you are asking the question, I sincerely doubt that ARM and Cavium
> > intended to apply a license without the exception to those files.  And
> > similarly for Intel and FRV.
>
> FTR, I think only Linaro (rather than Arm) touched the aarch64 file.
>
> > The runtime exception explicitly was intended for this purpose and
> > usage at the time that GCC received approval to apply the exception.
>
> Ack.  Is the patch below OK for trunk and branches?

I'm not certain whom you are asking for approval, but it looks good to me.

It would be nice to add SPDX License Identifier at the top of the
files as well, but that's not required.

Thanks, David
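
For reference, an SPDX tag for these headers might look like the line
below.  This is only a sketch: it assumes the files are GPL version 3 or
later with version 3.1 of the GCC Runtime Library Exception, so the
identifiers should be checked against the notice actually used in each
file.

```c
/* SPDX-License-Identifier: GPL-3.0-or-later WITH GCC-exception-3.1 */
```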


Re: disable -Warray-bounds in libgo (PR 101374)

2021-07-12 Thread Martin Sebor via Gcc-patches

On 7/9/21 7:26 AM, Rainer Orth wrote:

Hi Martin,


Yesterday's enhancement to -Warray-bounds has exposed a couple of
issues in libgo where the code writes into an invalid constant
address that the warning is designed to flag.

On the assumption that those invalid addresses are deliberate,
the attached patch suppresses these instances by using #pragma
GCC diagnostic but I don't think I'm supposed to commit it (at
least Git won't let me).  To avoid Go bootstrap failures please
either apply the patch or otherwise suppress the warning (e.g.,
by using a volatile pointer temporary).


while this patch does fix the libgo bootstrap failure, Go is completely
broken: almost 1000 go.test failures and all libgo tests FAIL as well.
Seen on both i386-pc-solaris2.11 and sparc-sun-solaris2.11.


FWIW, I see exactly the same failures on x86_64-pc-linux-gnu, so nothing
Solaris-specific here.


I don't normally test Go because of PR 91992, but I see just
the three test failures below on x86_64-linux with the latest trunk:

FAIL: go.test/test/fixedbugs/issue10441.go   -O  (test for excess errors)
FAIL: ./index0-out.go execution,  -O0 -g -fno-var-tracking-assignments
FAIL: runtime/pprof

The excess errors don't look related to my changes:

FAIL: go.test/test/fixedbugs/issue10441.go   -O  (test for excess errors)
Excess errors:
/usr/bin/ld: /ssd/test/build/gcc-trunk/x86_64-pc-linux-gnu/./libgo/.libs/libgo.so: undefined reference to `__go_init_main'
/usr/bin/ld: /ssd/test/build/gcc-trunk/x86_64-pc-linux-gnu/./libgo/.libs/libgo.so: undefined reference to `main.main'


My libgo.log shows the FAILs below.  I don't know how to interpret
them, but nothing in the file suggests that my change is the cause
of these failures.

--- FAIL: ExampleFrames (0.00s)
FAIL
FAIL: runtime
--- FAIL: TestConvertCPUProfile (0.00s)
--- FAIL: TestConvertMemProfile (0.00s)
--- FAIL: TestConvertMemProfile/heap (0.00s)
--- FAIL: TestConvertMemProfile/allocs (0.00s)
FAIL
FAIL: runtime/pprof
--- FAIL: ExampleFrames (0.00s)
FAIL
FAIL: runtime
--- FAIL: TestDurationSeconds (0.00s)
--- FAIL: ExampleParseDuration (0.00s)
FAIL
FAIL: time

If you see different failures in your build that look like they
might be caused by my changes then please show what those are.

Martin
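
For context, the two suppression approaches mentioned at the top of this
message (#pragma GCC diagnostic and a volatile pointer temporary) look
roughly like the sketch below.  It is not the libgo code: the address and
the function names are made up, and the stores are guarded by a flag so
that the sketch can be compiled and called without actually faulting.

```c
#include <stdint.h>

/* Hypothetical invalid address, chosen only for illustration.  */
#define FAULT_ADDRESS ((uintptr_t) 0xf1)

/* Variant 1: silence -Warray-bounds around the deliberate store.
   Returns 0 if no store was attempted.  */
int
fault_with_pragma (int really)
{
  if (!really)
    return 0;
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
  *(int *) FAULT_ADDRESS = 0;
#pragma GCC diagnostic pop
  return 1;
}

/* Variant 2: go through a volatile pointer temporary, so the compiler
   can no longer see a constant address at the point of the store.  */
int
fault_with_volatile_pointer (int really)
{
  if (!really)
    return 0;
  int *volatile p = (int *) FAULT_ADDRESS;
  *p = 0;
  return 1;
}
```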


Re: [RFA] Some libgcc headers are missing the runtime exception

2021-07-12 Thread Richard Sandiford via Gcc-patches
David Edelsohn  writes:
> On Mon, Jul 12, 2021 at 11:58 AM Richard Sandiford
>  wrote:
>>
>> David Edelsohn  writes:
>> > On Fri, Jul 9, 2021 at 1:31 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> David Edelsohn  writes:
>> >> > On Fri, Jul 9, 2021 at 12:53 PM Richard Sandiford via Gcc
>> >> >  wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> It was pointed out to me off-list that config/aarch64/value-unwind.h
>> >> >> is missing the runtime exception.  It looks like a few other files
>> >> >> are too; a fuller list is:
>> >> >>
>> >> >> libgcc/config/aarch64/value-unwind.h
>> >> >> libgcc/config/frv/frv-abi.h
>> >> >> libgcc/config/i386/value-unwind.h
>> >> >> libgcc/config/pa/pa64-hpux-lib.h
>> >> >>
>> >> >> Certainly for the aarch64 file this was simply a mistake;
>> >> >> it seems to have been copied from the i386 version, both of which
>> >> >> reference the runtime exception but don't actually include it.
>> >> >>
>> >> >> What's the procedure for fixing this?  Can we treat it as a textual
>> >> >> error or do the files need to be formally relicensed?
>> >> >
>> >> > I'm unsure what you mean by "formally relicensed".
>> >>
>> >> It seemed like there were two possibilities: the licence of the files
>> >> is actually GPL + exception despite what the text says (the textual
>> >> error case), or the licence of the files is plain GPL because the text
>> >> has said so since the introduction of the files.  In the latter case
>> >> I'd have imagined that someone would need to relicense the code so
>> >> that it is GPL + exception.
>> >>
>> >> > It generally is considered a textual omission.  The runtime library
>> >> > components of GCC are intended to be licensed under the runtime
>> >> > exception, which was granted and approved at the time of introduction.
>> >>
>> >> OK, thanks.  So would a patch to fix at least the i386 and aarch64 header
>> >> files be acceptable?  (I'm happy to fix the other two as well if that's
>> >> definitely the right thing to do.  It's just that there's more history
>> >> involved there…)
>> >
>> > Please correct the text in the files. The files in libgcc used in the
>> > GCC runtime are intended to be licensed with the runtime exception and
>> > GCC previously was granted approval for that licensing and purpose.
>> >
>> > As you are asking the question, I sincerely doubt that ARM and Cavium
>> > intended to apply a license without the exception to those files.  And
>> > similarly for Intel and FRV.
>>
>> FTR, I think only Linaro (rather than Arm) touched the aarch64 file.
>>
>> > The runtime exception explicitly was intended for this purpose and
>> > usage at the time that GCC received approval to apply the exception.
>>
>> Ack.  Is the patch below OK for trunk and branches?
>
> I'm not certain whom you are asking for approval,

I was assuming it would need a global reviewer.

> but it looks good to me.

Thanks.

> It would be nice to add SPDX License Identifier at the top of the
> files as well, but that's not required.

Yeah, I agree that might be a good thing to have, but TBH I try to keep
my involvement with licensing stuff to the bare minimum :-)

Richard


Re: Benefits of using Sphinx documentation format

2021-07-12 Thread David Malcolm via Gcc-patches
On Mon, 2021-07-12 at 15:25 +0200, Martin Liška wrote:
> Hello.
> 
> Let's make it a separate sub-thread where we can discuss the motivation
> for moving to the Sphinx format.
> 
> Benefits:
> 1) modern looking HTML output (before: [1], after: [2]):

"modern looking" is rather subjective; I'd rate Sphinx's output as
looking like it's from the 2010s (last decade), whereas Texinfo's looks
like it's from the 1990s.  In theory this ought not to matter, but
every time I look at our documentation it gives me a depressing
feeling, reminiscent of a graveyard, that discourages me from fixing
things.

>     a) syntax highlighting for examples (code, shell commands, etc.)

...with support for multiple programming languages, potentially on the
same page.  For example, in the libgccjit docs:
  https://gcc.gnu.org/onlinedocs/jit/intro/tutorial02.html
we can have a mixture of C, assembler and shell on one page, and each
example is syntax-highlighted accordingly.  It's not clear to me how to
do that in texinfo, since there needs to be a way to express what
language an example is in.

>     b) precise anchors, the current Texinfo anchors are not displayed
> (start with first line of an option)

...and the URLs are sane and stable (so e.g. there is a reliable,
guessable, readable URL for the docs for say, "-Wall").

>     c) one can easily copy a link to an anchor (displayed as ¶)
>     d) internal links are working, e.g. one can easily jump from
> listing of options
>     e) left menu navigation provides better orientation in the manual
>     f) Sphinx provides internal search capability: [3]

...also (quoting myself in places here from 2015
  https://gcc.gnu.org/pipermail/gcc-patches/2015-November/434055.html 
):

* the ability to include fragments of files: libgccjit's documentation
uses directives to include code from the test suite (so that all of the
code examples are also part of the test suite, and are thus known to
compile), allowing for (almost) literate programming.  [That said, the
build of libgccjit's docs on gcc.gnu.org seems to be missing those
fragments; I wonder if there's a path or version issue?]

* a page-splitting structure that makes sense, to me, at least (I have
never fathomed the way texinfo's navigation works, for HTML, at least,
and I believe I'm not the only one; I generally pick the all-in-one-
HTML-page option when viewing texinfo-html docs and do textual
searches, since otherwise I usually can't find the thing I'm looking
for (or have to resort to a brute-force depth-first search of clicking
through the links).)

* much more use of markup, with restrained and well-chosen CSS
(texinfo's HTML seems to ignore much of the inline markup in
the .texinfo file)

> 2) internal links are also provided in PDF version of the manual
> 3) some existing GCC manuals are already written in Sphinx (GNAT
> manuals and libgccjit)
> 4) support for various output formats, some people are interested in
> ePUB format
> 5) Sphinx is using RST which is quite minimal semantic markup language

Sphinx is also used by many high-profile FLOSS projects (e.g. the Linux
kernel, LLVM, and the Python community), so it reduces the barrier to
entry for new contributors, relative to texinfo.


> 6) TOC is automatically generated - no need for manual navigation
> like seen here: [5]
> 
> Disadvantages:
> 
> 1) info pages are currently missing Page description in TOC
> 2) rich formatting is leading to extra wrapping in info output -
> being partially addressed in [4]
> 3) one needs e.g. Emacs support for inline links (rendered as notes)
> 
> I'm willing to address issue 1) in next weeks and I tend to skip
> emission of links as mentioned in 3).
> Generally speaking, I'm aware that some people still use Info, but I
> think we should focus more
> on more modern documentation formats. That's HTML (and partially
> PDF).

I think the output formats we need to support are:
- HTML
- PDF
- man page (hardly "modern", but still used)

I regard "info" as merely "nice to have" - I don't know anyone who
uses it other than some core GNU contributors.

Dave

> 
> Martin
> 
> [1]
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fstrict-aliasing
> [2]
> https://splichal.eu/gccsphinx-final/html/gcc/gcc-command-options/options-that-control-optimization.html#cmdoption-fstrict-aliasing
> [3]
> https://splichal.eu/gccsphinx-final/html/gcc/search.html?q=-fipa-icf&check_keywords=yes&area=default#
> [4] https://github.com/sphinx-doc/sphinx/pull/9391
> [5] @comment node-name, next,  previous, up
>  @node    Installing GCC, Binaries, , Top
> 




Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

2021-07-12 Thread Segher Boessenkool
On Mon, Jul 12, 2021 at 06:19:28AM +0200, Martin Liška wrote:
> PING^1

I did not notice you attached a new patch.  It works a lot better if
every patch series is a new thread.


Segher


Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-12 Thread Martin Jambor
Hi,

On Mon, Jul 12 2021, Qing Zhao wrote:
>> On Jul 12, 2021, at 2:51 AM, Richard Sandiford  
>> wrote:
>>
>> Martin Jambor  writes:
>>> On Thu, Jul 08 2021, Qing Zhao wrote:
 (Resend this email since the previous one didn’t quote, I changed one
 setting in my mail client, hopefully that can fix this issue).

 Hi, Martin,

 Thank you for the review and comment.

> On Jul 8, 2021, at 8:29 AM, Martin Jambor  wrote:
>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>> index c05d22f3e8f1..35051d7c6b96 100644
>> --- a/gcc/tree-sra.c
>> +++ b/gcc/tree-sra.c
>> @@ -384,6 +384,13 @@ static struct
>>
>>  /* Numbber of components created when splitting aggregate parameters.  
>> */
>>  int param_reductions_created;
>> +
>> +  /* Number of deferred_init calls that are modified.  */
>> +  int deferred_init;
>> +
>> +  /* Number of deferred_init calls that are created by
>> + generate_subtree_deferred_init.  */
>> +  int subtree_deferred_init;
>> } sra_stats;
>>
>> static void
>> @@ -4096,6 +4103,110 @@ get_repl_default_def_ssa_name (struct access 
>> *racc, tree reg_type)
>>  return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
>> }
>>
>> +
>> +/* Generate statements to call .DEFERRED_INIT to initialize scalar 
>> replacements
>> +   of accesses within a subtree ACCESS; all its children, siblings and 
>> their
>> +   children are to be processed.
>> +   GSI is a statement iterator used to place the new statements.  */
>> +static void
>> +generate_subtree_deferred_init (struct access *access,
>> +tree init_type,
>> +tree is_vla,
>> +gimple_stmt_iterator *gsi,
>> +location_t loc)
>> +{
>> +  do
>> +{
>> +  if (access->grp_to_be_replaced)
>> +{
>> +  tree repl = get_access_replacement (access);
>> +  gimple *call
>> += gimple_build_call_internal (IFN_DEFERRED_INIT, 3,
>> +  TYPE_SIZE_UNIT (TREE_TYPE 
>> (repl)),
>> +  init_type, is_vla);
>> +  gimple_call_set_lhs (call, repl);
>> +  gsi_insert_before (gsi, call, GSI_SAME_STMT);
>> +  update_stmt (call);
>> +  gimple_set_location (call, loc);
>> +  sra_stats.subtree_deferred_init++;
>> +}
>> +  else if (access->grp_to_be_debug_replaced)
>> +{
>> +  tree drepl = get_access_replacement (access);
>> +  tree call = build_call_expr_internal_loc
>> + (UNKNOWN_LOCATION, IFN_DEFERRED_INIT,
>> +  TREE_TYPE (drepl), 3,
>> +  TYPE_SIZE_UNIT (TREE_TYPE (drepl)),
>> +  init_type, is_vla);
>> +  gdebug *ds = gimple_build_debug_bind (drepl, call,
>> +gsi_stmt (*gsi));
>> +  gsi_insert_before (gsi, ds, GSI_SAME_STMT);
>
> Is handling of grp_to_be_debug_replaced accesses necessary here?  If so,
> why?  grp_to_be_debug_replaced accesses are there only to facilitate
> debug information about a part of an aggregate decl that is likely
> going to be entirely removed - so that debuggers can sometimes show to
> users information about what they would contain had they not been removed.
> It seems strange you need to mark them as uninitialized because they
> should not have any consumers.  (But perhaps it is also harmless.)

 This part has been discussed during the 2nd version of the patch, but
 I think that more discussion might be necessary.

 In the previous discussion, Richard Sandiford mentioned:
 (https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568620.html):

 =

 I guess the thing we need to decide here is whether -ftrivial-auto-var-init
 should affect debug-only constructs too.  If it doesn't, examining removed
 components in a debugger might show uninitialised values in cases where
 the user was expecting initialised ones.  There would be no security
 concern, but it might be surprising.

 I think in principle the DRHS can contain a call to DEFERRED_INIT.
 Doing that would probably require further handling elsewhere though.

 =

 I am still not very confident now for this part of the change.
>>>
>>> I see.  I still tend to think that with or without the generation of
>>> gimple_build_debug_binds, the debugger would still not display any value
>>> for the component in question.  Without it there would be no information
>>> about the component at any place in code affected by this, with it the
>

Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Eli Zaretskii via Gcc-patches
> Cc: g...@gcc.gnu.org, gcc-patches@gcc.gnu.org, jos...@codesourcery.com
> From: Martin Liška 
> Date: Mon, 12 Jul 2021 16:34:11 +0200
> 
> > "Texinfo must go" is one possible conclusion from your description.
> > But it isn't the only one.  An alternative is "the Texinfo source of
> > the GCC manual must be improved to fix this problem."  And yes, this
> > problem does have a solution in Texinfo.
> 
> No, the alternative is more powerful output given by Texinfo, in particular
> more modern HTML pages.

Please see the response by Gavin: it sounds like at least some of that
was resolved in Texinfo, sometimes long ago.


Re: Repost: [PATCH] Fix vec-splati-runnable.c test.

2021-07-12 Thread Michael Meissner via Gcc-patches
On Mon, Jul 12, 2021 at 10:49:26AM -0500, Bill Schmidt wrote:
> Hi Mike,
> 
> On 7/7/21 3:00 PM, Michael Meissner wrote:
> >[PATCH] Fix vec-splati-runnable.c test.
> >
> >I noticed that the vec-splati-runnable.c did not have an abort after one
> >of the tests.  If the test was run with optimization, the optimizer could
> >delete some of the tests and throw off the count.  However, due to the
> >fact that the value being loaded in that test is undefined, I did not
> >check what value was loaded, but I just stored it into a volatile global
> >variable.
> >
> >2021-07-07  Michael Meissner  
> >
> >gcc/testsuite/
> > * gcc.target/powerpc/vec-splati-runnable.c: Run test with -O2
> > optimization.  Do not check what XXSPLTIDP generates if the value
> > is undefined.
> >---
> >  .../gcc.target/powerpc/vec-splati-runnable.c  | 29 ++-
> >  1 file changed, 9 insertions(+), 20 deletions(-)
> >
> >diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c 
> >b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
> >index e84ce77a21d..a135279b1d7 100644
> >--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
> >+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
> >@@ -1,7 +1,7 @@
> >  /* { dg-do run { target { power10_hw } } } */
> >  /* { dg-do link { target { ! power10_hw } } } */
> >  /* { dg-require-effective-target power10_ok } */
> >-/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> >+/* { dg-options "-mdejagnu-cpu=power10 -save-temps -O2" } */
> 
> Why did you restrict optimization here?  The tests should be run at
> various opt levels if you don't specify this, right?
> 
> The test changes are otherwise okay, but I'd like to understand this first.

When doing tests with instruction counts, you always want to specify the
optimization level (with -O2 being the standard).  Otherwise, depending on the
other optimizations, the instruction counts might not line up.

In this particular case, because it didn't have an optimization flag, the
original code was compiled with -O0.  If you compile it with -O1, it deletes
the if statement with the empty then statement, and deletes the corresponding
XXSPLTIDP instruction.
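
As a rough illustration of that point (a minimal sketch, not the contents of
vec-splati-runnable.c; the names sink, compute and run_checks are made up for
the example), compare a result that is only tested in an if with an empty body
against one that is stored into a volatile global or checked with abort:

```c
#include <stdlib.h>

/* Hypothetical volatile global: a store to it is an observable side
   effect, so the compiler has to emit it even when optimizing.  */
volatile double sink;

/* Hypothetical stand-in for the operation under test.  */
static double
compute (double x)
{
  return x * 2.0;
}

int
run_checks (void)
{
  double r = compute (3.0);

  /* An if with an empty body has no observable effect, so with
     optimization enabled the comparison and the computation feeding
     it may be removed.  */
  if (r != 6.0)
    ;

  /* The value is not checked, but the volatile store keeps the
     computation of the stored value alive.  */
  sink = compute (4.0);

  /* A checked result: abort on mismatch.  */
  if (compute (5.0) != 10.0)
    abort ();

  return 0;
}
```

With optimization enabled, the first comparison can be dropped entirely,
while the volatile store and the abort check have to stay.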

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Eli Zaretskii via Gcc-patches
> Cc: h...@bitrange.com, g...@gcc.gnu.org, gcc-patches@gcc.gnu.org,
>  jos...@codesourcery.com
> From: Martin Liška 
> Date: Mon, 12 Jul 2021 16:37:00 +0200
> 
> >   4) The need to learn yet another markup language.
> >  While this is not a problem for simple text, it does require a
> >  serious study of RST and Sphinx to use the more advanced features.
> 
> No, majority of the documentation is pretty simple: basic formatting, links,
> tables and code examples.

We also have documentation of APIs (a.k.a. "functions").  I actually
tried to find in the Sphinx docs how to do that and got lost.  So, not
really "very simple".


Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Eli Zaretskii via Gcc-patches
> From: Jonathan Wakely 
> Date: Mon, 12 Jul 2021 15:54:49 +0100
> Cc: Martin Liška , 
>   "g...@gcc.gnu.org" , gcc-patches 
> , 
>   "Joseph S. Myers" 
> 
> You like texinfo. We get it.

Would you please drop the attitude?


Re: Repost: [PATCH] PR 100167: Fix vector long long multiply/divide tests on power10

2021-07-12 Thread Michael Meissner via Gcc-patches
On Sun, Jul 11, 2021 at 02:55:04PM -0500, Bill Schmidt wrote:
> Hi Mike,
> 
> On 7/7/21 3:04 PM, Michael Meissner wrote:
> >[PATCH] PR 100167: Fix vector long long multiply/divide tests on power10.
> >
> >This patch updates the vector long long multiply and divide tests to
> >supply the correct code information if power10 code generation is used.
> >
> >2021-07-07  Michael Meissner  
> >
> >gcc/testsuite/
> > PR testsuite/100167
> > * gcc.target/powerpc/fold-vec-div-longlong.c:
> Missing information after colon.

Because all of the changes were the same thing, and the line is long enough.  I
just grouped all of the files together, and put the change line as the last
entry.

> > * gcc.target/powerpc/fold-vec-mult-longlong.c: Fix expected code
> > generation on power10.
> >---
> >  gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c  | 7 +--
> >  gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c | 3 ++-
> >  2 files changed, 7 insertions(+), 3 deletions(-)
> >
> >diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c 
> >b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
> >index 312e984d3cc..f6a9b290ae5 100644
> >--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
> >+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
> >@@ -19,5 +19,8 @@ test6 (vector unsigned long long x, vector unsigned long 
> >long y)
> >  {
> >return vec_div (x, y);
> >  }
> >-/* { dg-final { scan-assembler-times {\mdivd\M} 2 } } */
> >-/* { dg-final { scan-assembler-times {\mdivdu\M} 2 } } */
> >+
> >+/* { dg-final { scan-assembler-times {\mdivd\M}   2 { target { ! 
> >has_arch_pwr10 } } } } */
> >+/* { dg-final { scan-assembler-times {\mdivdu\M}  2 { target { ! 
> >has_arch_pwr10 } } } } */
> >+/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 { target {   
> >has_arch_pwr10 } } } } */
> >+/* { dg-final { scan-assembler-times {\mvdivud\M} 1 { target {   
> >has_arch_pwr10 } } } } */
> >diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c 
> >b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
> >index 38dba9f5023..bd210e34801 100644
> >--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
> >+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
> >@@ -20,5 +20,6 @@ test6 (vector unsigned long long x, vector unsigned long 
> >long y)
> >return vec_mul (x, y);
> >  }
> >-/* { dg-final { scan-assembler-times "\[ \t\]mulld " 4 { target lp64 } } } 
> >*/
> >+/* { dg-final { scan-assembler-times {\mmulld\M}  4 { target { lp64 && { ! 
> >has_arch_pwr10 } } } } } */
> >+/* { dg-final { scan-assembler-times {\mvmulld\M} 2 { target { 
> >has_arch_pwr10 } } } } */
> 
> Shouldn't this last be { lp64 && has_arch_pwr10 } ?

Nope.  Because the power10 vector multiply is done in the vector unit, it can
generate the vmulld instruction.

> Otherwise LGTM.  I can't approve, but recommend approval with those changes.
> 
> Thanks,
> Bill
> 

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] Port GCC documentation to Sphinx

2021-07-12 Thread Martin Sebor via Gcc-patches

On 6/29/21 4:09 AM, Martin Liška wrote:

On 6/28/21 5:33 PM, Joseph Myers wrote:

Are formatted manuals (HTML, PDF, man, info) corresponding to this patch
version also available for review?


I've just uploaded them here:
https://splichal.eu/gccsphinx-final/

Martin



I think listing the -Wfoo and -Wno-foo (and analogously the -fbar
and -fno-bar) options is an improvement over prior revisions but when
the positive form is the default the text reads funny.  For example:

  -fno-inline

  Do not expand any functions inline apart from those marked
  with the always_inline attribute. This is the default when
  not optimizing.

  Single functions can be exempted from inlining by marking
  them with the noinline attribute.

  -finline

  Default option value for -fno-inline.


I.e., -finline is not what I would describe as a default value for
-fno-inline.

I would suggest dropping the option name from the text describing
the default, and also replacing "value" with "setting" (it's really not
a value).  It could be as simple as:

  -finline

  Default setting.

Alternatively, to preserve the connection to the alternate setting:

  -finline

  Default setting; overrides -fno-inline.

At some point we talked about also making attributes hyperlinks
(like always_inline and noinline above) but I don't remember
the conclusion.  Are you planning to do that?  (Would handling
it as part of the transition be easier than doing it later?)

Martin


Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

2021-07-12 Thread Segher Boessenkool
On Mon, Jun 28, 2021 at 02:19:03PM +0200, Martin Liška wrote:
> On 6/24/21 12:46 AM, Segher Boessenkool wrote:
> >>As mentioned in the "Fallout: save/restore target options in
> >>handle_optimize_attribute"
> >>thread, we need to support target option restore of
> >>rs6000_long_double_type_size == FLOAT_PRECISION_TFmode.
> >
> >I have no idea?  Could you explain please?
> 
> Sure. Few weeks ago, we started using cl_target_option_{save,restore} calls
> even for optimize attributes (and pragma). Motivation was that optimize 
> options
> can influence target options (and vice versa).
> 
> Doing that, FLOAT_PRECISION_TFmode must be accepted as a valid option value
> for rs6000_long_double_type_size.


> >>--- /dev/null
> >>+++ b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
> >>@@ -0,0 +1,14 @@
> >>+/* { dg-do compile { target { powerpc*-*-linux* } } } */
> >
> >Why on Linux only?  That doesn't sound right.  Do you need some other
> >selector(s)?
> 
> Sorry, I copied the test-case.

Ugh.  Yes, the status quo is no good either :-(

> >>+/* { dg-options "-O2 -mlong-double-128 -mabi=ibmlongdouble" } */
> >>+
> >>+extern unsigned long int x;
> >>+extern float f (float);
> >>+extern __typeof (f) f_power8;
> >>+extern __typeof (f) f_power9;
> >>+extern __typeof (f) f __attribute__ ((ifunc ("f_ifunc")));
> >>+static __attribute__ ((optimize ("-fno-stack-protector"))) __typeof (f) *
> >
> >-fno-stack-protector is default.
> 
> Yes, but one needs an optimize attribute in order to trigger 
> cl_target_option_save/restore
> mechanism.

So it behaves differently if you select the default than if you do not
select anything?  That is wrong, no?

> >From 1632939853fbf193f72ace3d1024a137d549fef4 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Tue, 1 Jun 2021 15:39:14 +0200
> Subject: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

(No full stop at end of subject please)

Missing patch description here.  This should be suitable as commit
message when you eventually commit the patch.

Please send with that, as a separate mail, not as attachment to another
thread.

> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): When
>   a target option is restored, it can have
>   rs6000_long_double_type_size set to FLOAT_PRECISION_TFmode.

That does not say what changed?

> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -4185,6 +4185,8 @@ rs6000_option_override_internal (bool global_init_p)
>else
>   rs6000_long_double_type_size = default_long_double_size;
>  }
> +  else if (rs6000_long_double_type_size == FLOAT_PRECISION_TFmode)
> +; /* The option can be restored with cl_target_option_restore.  */
>else if (rs6000_long_double_type_size == 128)
>  rs6000_long_double_type_size = FLOAT_PRECISION_TFmode;
>else if (global_options_set.x_rs6000_ieeequad)

"The option can be restored" is more confusing than helpful.  *Will* be
restored by it, maybe?  Not that I understand what that means :-/

Does it make more sense to merge the 128 and FLOAT_PRECISION_TFmode
cases?
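
For what it's worth, merging the two cases could look roughly like the sketch
below.  This is a stand-alone illustration, not the rs6000.c code:
normalize_long_double_size is a hypothetical helper, and
SKETCH_TFMODE_PRECISION is a made-up stand-in for FLOAT_PRECISION_TFmode whose
value is chosen only for the example.  The point is that the normalisation
becomes idempotent, so a value that was already converted before being saved
and restored passes through unchanged.

```c
#include <assert.h>

/* Made-up stand-in for FLOAT_PRECISION_TFmode; the value is
   illustrative only.  */
#define SKETCH_TFMODE_PRECISION 127

/* Hypothetical helper: map the user-visible size 128, or an already
   normalised value, to the internal marker; leave 64 alone.  */
int
normalize_long_double_size (int size)
{
  /* This sketch only expects these three inputs.  */
  assert (size == 64 || size == 128 || size == SKETCH_TFMODE_PRECISION);

  if (size == 128 || size == SKETCH_TFMODE_PRECISION)
    return SKETCH_TFMODE_PRECISION;
  return size;
}
```

Applying the helper twice gives the same result as applying it once, which is
the property a save/restore round trip needs.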

> diff --git a/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c 
> b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
> new file mode 100644
> index 000..2455fb57138
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */

No target powerpc*-*-* in gcc.target/powerpc please.  This is enforced
for everything in there by powerpc.exp already.

Thanks,


Segher


Re: Repost: [PATCH] Fix vec-splati-runnable.c test.

2021-07-12 Thread Bill Schmidt via Gcc-patches

On 7/12/21 12:11 PM, Michael Meissner wrote:

On Mon, Jul 12, 2021 at 10:49:26AM -0500, Bill Schmidt wrote:

Hi Mike,

On 7/7/21 3:00 PM, Michael Meissner wrote:

[PATCH] Fix vec-splati-runnable.c test.

I noticed that the vec-splati-runnable.c did not have an abort after one
of the tests.  If the test was run with optimization, the optimizer could
delete some of the tests and throw off the count.  However, due to the
fact that the value being loaded in that test is undefined, I did not
check what value was loaded, but I just stored it into a volatile global
variable.

2021-07-07  Michael Meissner  

gcc/testsuite/
* gcc.target/powerpc/vec-splati-runnable.c: Run test with -O2
optimization.  Do not check what XXSPLTIDP generates if the value
is undefined.
---
  .../gcc.target/powerpc/vec-splati-runnable.c  | 29 ++-
  1 file changed, 9 insertions(+), 20 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index e84ce77a21d..a135279b1d7 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -1,7 +1,7 @@
  /* { dg-do run { target { power10_hw } } } */
  /* { dg-do link { target { ! power10_hw } } } */
  /* { dg-require-effective-target power10_ok } */
-/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps -O2" } */

Why did you restrict optimization here?  The tests should be run at
various opt levels if you don't specify this, right?

The test changes are otherwise okay, but I'd like to understand this first.

When doing tests with instruction counts, you always want to specify the
optimization level (with -O2 being the standard).  Otherwise, depending on the
other optimizations, the instruction counts might not line up.

In this particular case, because it didn't have an optimization flag, the
original code was compiled with -O0.  If you compile it with -O1, it deletes
the if statement with the empty then statement, and deletes the corresponding
XXSPLTIDP instruction.

Sorry, yes.  I thought this set of tests was run with all optimization 
flags, but it is not.  Objection withdrawn. :-)


I can't approve, but recommend approval as is.

Thanks,
Bill



[committed] libstdc++: Constrain std::as_writable_bytes [PR101411]

2021-07-12 Thread Jonathan Wakely via Gcc-patches
The std::as_writable_bytes function should be constrained to only accept
writable spans. Currently it can be called but then gives an error in
the function body.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101411
* include/std/span (as_writable_bytes): Add requires-clause.
* testsuite/23_containers/span/101411.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

I'll backport this too.

commit 9d4393af9d2b37b78eb5b1f84f5d4da3a6f7fba6
Author: Jonathan Wakely 
Date:   Mon Jul 12 16:09:34 2021

libstdc++: Constrain std::as_writable_bytes [PR101411]

The std::as_writable_bytes function should be constrained to only accept
writable spans. Currently it can be called but then gives an error in
the function body.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101411
* include/std/span (as_writable_bytes): Add requires-clause.
* testsuite/23_containers/span/101411.cc: New test.

diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
index 63f0a8f6279..21d8f6a43a6 100644
--- a/libstdc++-v3/include/std/span
+++ b/libstdc++-v3/include/std/span
@@ -425,6 +425,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template
+requires (!is_const_v<_Type>)
 inline
 span
diff --git a/libstdc++-v3/testsuite/23_containers/span/101411.cc 
b/libstdc++-v3/testsuite/23_containers/span/101411.cc
new file mode 100644
index 000..05bdd3badbd
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/span/101411.cc
@@ -0,0 +1,15 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { xfail c++20 } }
+#include <span>
+
+// PR libstdc++/101411
+
+void f(std::span s)
+{
+  std::as_writable_bytes(s); // { dg-error "no matching function" }
+}
+
+void f1(std::span s)
+{
+  std::as_writable_bytes(s); // { dg-error "no matching function" }
+}


Re: Repost: [PATCH] PR 100167: Fix vector long long multiply/divide tests on power10

2021-07-12 Thread Bill Schmidt via Gcc-patches



On 7/12/21 12:16 PM, Michael Meissner wrote:

On Sun, Jul 11, 2021 at 02:55:04PM -0500, Bill Schmidt wrote:

Hi Mike,

On 7/7/21 3:04 PM, Michael Meissner wrote:

[PATCH] PR 100167: Fix vector long long multiply/divide tests on power10.

This patch updates the vector long long multiply and divide tests to
supply the correct code information if power10 code generation is used.

2021-07-07  Michael Meissner  

gcc/testsuite/
PR testsuite/100167
* gcc.target/powerpc/fold-vec-div-longlong.c:

Missing information after colon.

Because all of the changes were the same thing, and the line is long enough.  I
just grouped all of the files together, and put the change line as the last
entry.

But that's not accepted style.  Putting it after the first one and using
"Likewise" is the usual thing.  This looks like an omission.



* gcc.target/powerpc/fold-vec-mult-longlong.c: Fix expected code
generation on power10.
---
  gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c  | 7 +--
  gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c | 3 ++-
  2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
index 312e984d3cc..f6a9b290ae5 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
@@ -19,5 +19,8 @@ test6 (vector unsigned long long x, vector unsigned long long 
y)
  {
return vec_div (x, y);
  }
-/* { dg-final { scan-assembler-times {\mdivd\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mdivdu\M} 2 } } */
+
+/* { dg-final { scan-assembler-times {\mdivd\M}   2 { target { ! 
has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mdivdu\M}  2 { target { ! 
has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 { target {   
has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mvdivud\M} 1 { target {   
has_arch_pwr10 } } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
index 38dba9f5023..bd210e34801 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
@@ -20,5 +20,6 @@ test6 (vector unsigned long long x, vector unsigned long long 
y)
return vec_mul (x, y);
  }
-/* { dg-final { scan-assembler-times "\[ \t\]mulld " 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mmulld\M}  4 { target { lp64 && { ! 
has_arch_pwr10 } } } } } */
+/* { dg-final { scan-assembler-times {\mvmulld\M} 2 { target { has_arch_pwr10  
   } } } } */

Shouldn't this last be { lp64 && has_arch_pwr10 } ?

Nope.  Because the power10 vector multiply is done in the vector unit, it can
generate the vmulld instruction.


Please document this, then.

Thanks,
Bill




Otherwise LGTM.  I can't approve, but recommend approval with those changes.

Thanks,
Bill



[r12-2245 Regression] FAIL: g++.dg/vect/slp-pr87105.cc -std=c++2a scan-tree-dump-times slp2 "optimized: basic block part" 1 on Linux/x86_64

2021-07-12 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

c03cae4e066066278c8435c409829a9bf851e49f is the first bad commit
commit c03cae4e066066278c8435c409829a9bf851e49f
Author: Richard Biener 
Date:   Wed Jul 7 11:45:43 2021 +0200

Display the number of components BB vectorized

caused

FAIL: g++.dg/vect/slp-pr87105.cc  -std=c++14  scan-tree-dump-times slp2 
"optimized: basic block part" 1
FAIL: g++.dg/vect/slp-pr87105.cc  -std=c++17  scan-tree-dump-times slp2 
"optimized: basic block part" 1
FAIL: g++.dg/vect/slp-pr87105.cc  -std=c++2a  scan-tree-dump-times slp2 
"optimized: basic block part" 1

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-2245/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=g++.dg/vect/slp-pr87105.cc --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=g++.dg/vect/slp-pr87105.cc --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)


Re: [PATCH 10/10] vect: Reuse reduction accumulators between loops

2021-07-12 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Fri, Jul 9, 2021 at 3:12 PM Richard Sandiford
>  wrote:
>>
>> Thanks for the review.
>>
>> Richard Biener  writes:
>> >> @@ -588,6 +600,23 @@ public:
>> >>/* Unrolling factor  */
>> >>poly_uint64 vectorization_factor;
>> >>
>> >> +  /* If this loop is an epilogue loop whose main loop can be skipped,
>> >> + MAIN_LOOP_EDGE is the edge from the main loop to this loop's
>> >> + preheader.  SKIP_MAIN_LOOP_EDGE is then the edge that skips the
>> >> + main loop and goes straight to this loop's preheader.
>> >> +
>> >> + Both fields are null otherwise.  */
>> >> +  edge main_loop_edge;
>> >> +  edge skip_main_loop_edge;
>> >> +
>> >> +  /* If this loop is an epilogue loop that might be skipped after 
>> >> executing
>> >> + the main loop, this edge is the one that skips the epilogue.  */
>> >> +  edge skip_this_loop_edge;
>> >> +
>> >> +  /* After vectorization, maps live-out SSA names to information about
>> >> + the reductions that generated them.  */
>> >> +  hash_map reusable_accumulators;
>> >
>> > Is that the LC PHI node defs or the definition inside of the loop?
>> > If the latter we could attach the info directly to its stmt-info?
>>
>> Ah, yeah, I should improve the comment there.  It's the vectoriser's
>> replacement for the original LC PHI node, i.e. the final scalar result
>> after the reduction has taken place.
>
> OK
>
>> >> @@ -1186,6 +1215,21 @@ public:
>> >>/* The vector type for performing the actual reduction.  */
>> >>tree reduc_vectype;
>> >>
>> >> +  /* If IS_REDUC_INFO is true and if the reduction is operating on N
>> >> + elements in parallel, this vector gives the initial values of these
>> >> + N elements.  */
>> >
>> > That's N scalar elements or N vector elements?  I suppose it's for
>> > SLP reductions (rather than SLP reduction chains) and never non-SLP
>> > reductions?
>>
>> Yeah, poor wording again, sorry.  I meant something closer to:
>>
>>   /* If IS_REDUC_INFO is true and if the vector code is performing
>>  N scalar reductions in parallel, this vector gives the initial
>>  scalar values of those N reductions.  */
>>
>> >> +  vec reduc_initial_values;
>> >> +
>> >> +  /* If IS_REDUC_INFO is true and if the reduction is operating on N
>> >> + elements in parallel, this vector gives the scalar result of each
>> >> + reduction.  */
>> >> +  vec reduc_scalar_results;
>>
>> Same change here.
>>
>> >> […]
>> >> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
>> >> index 2909e8a0fc3..b7b0523e3c8 100644
>> >> --- a/gcc/tree-vect-loop-manip.c
>> >> +++ b/gcc/tree-vect-loop-manip.c
>> >> @@ -2457,6 +2457,31 @@ vect_update_epilogue_niters (loop_vec_info 
>> >> epilogue_vinfo,
>> >>return vect_determine_partial_vectors_and_peeling (epilogue_vinfo, 
>> >> true);
>> >>  }
>> >>
>> >> +/* LOOP_VINFO is an epilogue loop and MAIN_LOOP_VALUE is available on 
>> >> exit
>> >> +   from the corresponding main loop.  Return a value that is available in
>> >> +   LOOP_VINFO's preheader, using SKIP_VALUE if the main loop is skipped.
>> >> +   Passing a null SKIP_VALUE is equivalent to passing zero.  */
>> >> +
>> >> +tree
>> >> +vect_get_main_loop_result (loop_vec_info loop_vinfo, tree 
>> >> main_loop_value,
>> >> +  tree skip_value)
>> >> +{
>> >> +  if (!loop_vinfo->main_loop_edge)
>> >> +return main_loop_value;
>> >> +
>> >> +  if (!skip_value)
>> >> +skip_value = build_zero_cst (TREE_TYPE (main_loop_value));
>> >
>> > shouldn't that be the initial value?
>>
>> For the current use case, the above two conditions are never true.
>> I wrote it like this because I had a follow-on patch (which might
>> not go anywhere) that needed this function for 0-based IVs.
>>
>> Maybe that's a bad risk/reward trade-off though.  Not having to pass
>> zero makes things only slightly simpler for the follow-on patch,
>> and I guess could be dangerous in other cases.
>>
>> Perhaps in that case though I should change loop_vinfo->main_loop_edge
>> into a gcc_assert as well.
>
> Yeah, I think asserts (and comments in case it's because we don't handle
> some specific cases yet) are better than possibly wrong behavior.

OK.

>> >> +  tree phi_result = make_ssa_name (TREE_TYPE (main_loop_value));
>> >> +  basic_block bb = loop_vinfo->main_loop_edge->dest;
>> >> +  gphi *new_phi = create_phi_node (phi_result, bb);
>> >> +  add_phi_arg (new_phi, main_loop_value, loop_vinfo->main_loop_edge,
>> >> +  UNKNOWN_LOCATION);
>> >> +  add_phi_arg (new_phi, skip_value,
>> >> +  loop_vinfo->skip_main_loop_edge, UNKNOWN_LOCATION);
>> >> +  return phi_result;
>> >> +}
>> >> +
>> >>  /* Function vect_do_peeling.
>> >>
>> >> Input:
>> >> […]
>> >> @@ -4823,6 +4842,100 @@ info_for_reduction (vec_info *vinfo, 
>> >> stmt_vec_info stmt_info)
>> >>return stmt_info;
>> >>  }
>> >>
>> >> +/* PHI is a reduction in LOOP_VINFO that we are going to vectorize

Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-12 Thread Kees Cook via Gcc-patches
On Wed, Jul 07, 2021 at 05:38:02PM +, Qing Zhao wrote:
> Hi, 
> 
> This is the 4th version of the patch for the new security feature for GCC.
> 
> I have tested it with bootstrap on both x86 and aarch64, regression testing 
> on both x86 and aarch64.
> Also compile and run CPU2017, without any issue.
> 
> Please take a look and let me know your comments and suggestions.

Thanks for the update!

It looks like padding initialization has regressed to where things were
in version 1[1] (it was, however, working in version 2[2]). I'm seeing
these failures again in the kernel self-test:

test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)

In looking at the gcc test cases, I think the wrong thing is
being checked: we want to verify the padding itself. For example,
in auto-init-17.c, the actual bytes after "four" need to be checked,
rather than "four" itself. Something like this:

struct test_trailing_hole {
int one;
int two;
int three;
char four;
/* "sizeof(unsigned long) - 1" byte padding hole here. */
};

#define offsetofend(STRUCT, MEMBER) \
  (__builtin_offsetof(STRUCT, MEMBER) + sizeof((((STRUCT *)0)->MEMBER)))

int foo ()
{ 
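  struct test_trailing_hole var[10];
  /* Start at the first byte after "four", i.e. at the padding itself. */
  unsigned char *ptr = (unsigned char *)&var[2]
                       + offsetofend(typeof(var[2]), four);
  int i;

  for (i = 0; i < sizeof(var[2]) - offsetofend(typeof(var[2]), four); i++) {
    if (ptr[i] != 0)
      return 1;
  }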
  return 0;
}

But this isn't actually sufficient because they may _accidentally_
be zero already. The kernel tests specifically make sure to fill the
about-to-be-used stack with 0xff before calling a function like foo()
above.
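
As a rough sketch of the check described above (not the kernel's
test_stackinit code, and not one of the GCC test cases), the padding
check can be written against a byte image of the object, so that
"poison with 0xff, then count nonzero padding bytes" is deterministic.
The helper names poison_image, count_nonzero_tail and
trailing_hole_nonzero_bytes are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct test_trailing_hole {
  int one;
  int two;
  int three;
  char four;
  /* Trailing padding, if any, follows "four".  */
};

/* Offset of the first byte after MEMBER within STRUCT.  */
#define offsetofend(STRUCT, MEMBER) \
  (offsetof (STRUCT, MEMBER) + sizeof (((STRUCT *) 0)->MEMBER))

/* Hypothetical helper: fill an object image with 0xff, mimicking the
   "poison the stack first" step so stale zeros cannot hide a failure.  */
static void
poison_image (unsigned char *image, size_t size)
{
  memset (image, 0xff, size);
}

/* Hypothetical helper: count nonzero bytes in image[start .. size).  */
static size_t
count_nonzero_tail (const unsigned char *image, size_t size, size_t start)
{
  size_t n = 0;
  assert (start <= size);
  for (size_t i = start; i < size; i++)
    if (image[i] != 0)
      n++;
  return n;
}

/* Hypothetical helper: number of nonzero bytes in the trailing padding
   of a struct test_trailing_hole whose bytes are given by IMAGE.  */
static size_t
trailing_hole_nonzero_bytes (const unsigned char *image)
{
  return count_nonzero_tail (image, sizeof (struct test_trailing_hole),
                             offsetofend (struct test_trailing_hole, four));
}
```

A poisoned image reports every padding byte as nonzero, and a fully
zeroed image reports none; a real test would obtain the image from an
auto variable compiled with -ftrivial-auto-var-init=zero.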

(And as an aside, it seems like naming the test cases with some details
about what is being tested in the filename would be nice -- it was
a little weird having to dig through their numeric names to find the
padding tests.)

Otherwise, this looks like it's coming along; I remain very excited!
Thank you for sticking with it. :)

-Kees

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565840.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2021-April/567754.html

-- 
Kees Cook


PING [PATCH] handle sanitizer built-ins in -Wuninitialized (PR 101300)

2021-07-12 Thread Martin Sebor via Gcc-patches

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574385.html

On 7/2/21 1:21 PM, Martin Sebor wrote:

To avoid a class of false negatives for sanitized code,
-Wuninitialized recognizes that the ASAN_MARK internal function
doesn't modify its argument.  But the warning code doesn't do
the same for any sanitizer built-ins even though they don't
modify user-supplied arguments either.  This leaves another
class of false negatives unresolved.

The attached fix enhances the warning logic to recognize all
sanitizer built-ins as well and treat them as non-modifying.

Tested on x86_64-linux.

Martin




Re: Repost: [PATCH] PR 100167: Fix vector long long multiply/divide tests on power10

2021-07-12 Thread Bill Schmidt via Gcc-patches

On 7/12/21 12:47 PM, Bill Schmidt via Gcc-patches wrote:


On 7/12/21 12:16 PM, Michael Meissner wrote:

On Sun, Jul 11, 2021 at 02:55:04PM -0500, Bill Schmidt wrote:

Hi Mike,

On 7/7/21 3:04 PM, Michael Meissner wrote:
[PATCH] PR 100167: Fix vector long long multiply/divide tests on 
power10.


This patch updates the vector long long multiply and divide tests to
supply the correct code information if power10 code generation is 
used.


2021-07-07  Michael Meissner  

gcc/testsuite/
PR testsuite/100167
* gcc.target/powerpc/fold-vec-div-longlong.c:

Missing information after colon.
Because all of the changes were the same thing, and the line is long 
enough.  I
just grouped all of the files together, and put the change line as 
the last

entry.
But that's not accepted style.  Putting it after the first one and using 
"Likewise" for the rest is the usual thing.  This looks like an omission.



* gcc.target/powerpc/fold-vec-mult-longlong.c: Fix expected code
generation on power10.
---
  gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c | 7 +--
  gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c | 3 ++-
  2 files changed, 7 insertions(+), 3 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c

index 312e984d3cc..f6a9b290ae5 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-div-longlong.c
@@ -19,5 +19,8 @@ test6 (vector unsigned long long x, vector 
unsigned long long y)

  {
    return vec_div (x, y);
  }
-/* { dg-final { scan-assembler-times {\mdivd\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mdivdu\M} 2 } } */
+
+/* { dg-final { scan-assembler-times {\mdivd\M}   2 { target { ! has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mdivdu\M}  2 { target { ! has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mvdivsd\M} 1 { target {   has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mvdivud\M} 1 { target {   has_arch_pwr10 } } } } */
diff --git 
a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c

index 38dba9f5023..bd210e34801 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-mult-longlong.c
@@ -20,5 +20,6 @@ test6 (vector unsigned long long x, vector 
unsigned long long y)

    return vec_mul (x, y);
  }
-/* { dg-final { scan-assembler-times "\[ \t\]mulld " 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mmulld\M}  4 { target { lp64 && { ! has_arch_pwr10 } } } } } */
+/* { dg-final { scan-assembler-times {\mvmulld\M} 2 { target { has_arch_pwr10 } } } } */

Shouldn't this last be { lp64 && has_arch_pwr10 } ?
Nope.  Because the power10 vector multiply is done in the vector unit, it can
generate the vmulld instruction.


Please document this, then.

Well, never mind, that's relatively obvious, sorry. :-)
Bill


Thanks,
Bill



Otherwise LGTM.  I can't approve, but recommend approval with those 
changes.


Thanks,
Bill



Re: [patch][version 4]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-07-12 Thread Qing Zhao via Gcc-patches
Hi, Martin,

Thanks a lot for your experiments and examples, they are really helpful.

So, based on your study, I will delete the code that handles 
grp_to_be_debug_replaced accesses in generate_subtree_deferred_init.

Let me know if you have further comments on this.

Qing


> On Jul 12, 2021, at 12:06 PM, Martin Jambor  wrote:
> 
> Hi,
> 
> On Mon, Jul 12 2021, Qing Zhao wrote:
>>> On Jul 12, 2021, at 2:51 AM, Richard Sandiford  
>>> wrote:
>>> 
>>> Martin Jambor  writes:
 On Thu, Jul 08 2021, Qing Zhao wrote:
> (Resend this email since the previous one didn’t quote, I changed one
> setting in my mail client, hopefully that can fix this issue).
> 
> Hi, Martin,
> 
> Thank you for the review and comment.
> 
>> On Jul 8, 2021, at 8:29 AM, Martin Jambor  wrote:
>>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>>> index c05d22f3e8f1..35051d7c6b96 100644
>>> --- a/gcc/tree-sra.c
>>> +++ b/gcc/tree-sra.c
>>> @@ -384,6 +384,13 @@ static struct
>>> 
>>> /* Numbber of components created when splitting aggregate parameters.  
>>> */
>>> int param_reductions_created;
>>> +
>>> +  /* Number of deferred_init calls that are modified.  */
>>> +  int deferred_init;
>>> +
>>> +  /* Number of deferred_init calls that are created by
>>> + generate_subtree_deferred_init.  */
>>> +  int subtree_deferred_init;
>>> } sra_stats;
>>> 
>>> static void
>>> @@ -4096,6 +4103,110 @@ get_repl_default_def_ssa_name (struct access 
>>> *racc, tree reg_type)
>>> return get_or_create_ssa_default_def (cfun, racc->replacement_decl);
>>> }
>>> 
>>> +
>>> +/* Generate statements to call .DEFERRED_INIT to initialize scalar 
>>> replacements
>>> +   of accesses within a subtree ACCESS; all its children, siblings and 
>>> their
>>> +   children are to be processed.
>>> +   GSI is a statement iterator used to place the new statements.  */
>>> +static void
>>> +generate_subtree_deferred_init (struct access *access,
>>> +   tree init_type,
>>> +   tree is_vla,
>>> +   gimple_stmt_iterator *gsi,
>>> +   location_t loc)
>>> +{
>>> +  do
>>> +{
>>> +  if (access->grp_to_be_replaced)
>>> +   {
>>> + tree repl = get_access_replacement (access);
>>> + gimple *call
>>> +   = gimple_build_call_internal (IFN_DEFERRED_INIT, 3,
>>> + TYPE_SIZE_UNIT (TREE_TYPE 
>>> (repl)),
>>> + init_type, is_vla);
>>> + gimple_call_set_lhs (call, repl);
>>> + gsi_insert_before (gsi, call, GSI_SAME_STMT);
>>> + update_stmt (call);
>>> + gimple_set_location (call, loc);
>>> + sra_stats.subtree_deferred_init++;
>>> +   }
>>> +  else if (access->grp_to_be_debug_replaced)
>>> +   {
>>> + tree drepl = get_access_replacement (access);
>>> + tree call = build_call_expr_internal_loc
>>> +(UNKNOWN_LOCATION, IFN_DEFERRED_INIT,
>>> + TREE_TYPE (drepl), 3,
>>> + TYPE_SIZE_UNIT (TREE_TYPE (drepl)),
>>> + init_type, is_vla);
>>> + gdebug *ds = gimple_build_debug_bind (drepl, call,
>>> +   gsi_stmt (*gsi));
>>> + gsi_insert_before (gsi, ds, GSI_SAME_STMT);
>> 
>> Is handling of grp_to_be_debug_replaced accesses necessary here?  If so,
>> why?  grp_to_be_debug_replaced accesses are there only to facilitate
>> debug information about a part of an aggregate decl that is likely
>> going to be entirely removed, so that debuggers can sometimes show
>> users information about what they would contain had they not been removed.
>> It seems strange you need to mark them as uninitialized because they
>> should not have any consumers.  (But perhaps it is also harmless.)
> 
> This part has been discussed during the 2nd version of the patch, but
> I think that more discussion might be necessary.
> 
> In the previous discussion, Richard Sandiford mentioned:
> (https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568620.html):
> 
> =
> 
> I guess the thing we need to decide here is whether 
> -ftrivial-auto-var-init
> should affect debug-only constructs too.  If it doesn't, examining removed
> components in a debugger might show uninitialised values in cases where
> the user was expecting initialised ones.  There would be no security
> concern, but it might be surprising.
> 
> I think in principle the DRHS can contain a call to DEFERRED_INIT.
> Doing that would probably require 

Re: Benefits of using Sphinx documentation format

2021-07-12 Thread Koning, Paul via Gcc-patches


> On Jul 12, 2021, at 12:36 PM, David Malcolm via Gcc-patches 
>  wrote:
> 
> On Mon, 2021-07-12 at 15:25 +0200, Martin Liška wrote:
>> ...
> 
> I think the output formats we need to support are:
> - HTML
> - PDF
> - man page (hardly "modern", but still used)

Also info format (for the Emacs info reader).  And ebook formats (epub and/or 
mobi).  Having good quality ebook output is a major benefit in my view; it 
would be very good for the standard makefiles to offer make targets for these 
formats.

paul



Re: [PATCH] [wwwdocs] Update description of GM2 and document branch

2021-07-12 Thread Gerald Pfeifer
On Mon, 12 Jul 2021, Gaius Mulley wrote:
>> Usually I'd just say "subject", which is a header in our mail systems;
>> the term "subject line" isn't widely used.
> feel free to overrule and use "subject".  I copied the text from other
> branch descriptions :-) (there are 38 uses).  I guess there should be
> consistency on the web page - perhaps they could all be changed though -
> what do you think?

Ah, in that case suggestion withdrawn, and I'll separately see whether
we can simplify/unify anything there.

> thanks for the suggestions and maintaining the pages.  Below are the
> proposed updated patches

As maintainer, you don't need to seek approval for doc or web patches 
related to your area of maintainership, though of course I'm always 
happy to have a look.

> +is fully operational with the GCC 10 and GCC 11 (on

Here I'd omit "the", though I cannot (linguistically) explain why
and have to refer to established practice.

Cheers,
Gerald


[COMMITTED] tree-optimization/101335 - Do not register a cast as an equivalence.

2021-07-12 Thread Andrew MacLeod via Gcc-patches
Registering an equivalence between objects of the same size in a cast 
can cause other registered relations to be incorrect. Detailed in the 
PR.  This was an older attempt to solve a problem which has since been 
resolved by recomputation in the GORI engine.
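
As a loose illustration only (this is not the PR's testcase, and the
function names are made up), the sketch below shows why a cast between
same-sized types is not a true equivalence: a relation that holds for
the unsigned value need not hold for the signed one, and widening
through the signed type sign-extends where widening directly does not.
It assumes GCC's documented modulo behaviour for the
implementation-defined unsigned-to-signed conversion:

```c
#include <stdint.h>

/* The relation "value > 10" as seen on the unsigned operand.  */
static int
unsigned_gt_10 (uint32_t u)
{
  return u > 10;
}

/* The same bits after a same-precision cast to a signed type.  The
   conversion is implementation-defined; GCC reduces modulo 2^32.  */
static int
signed_gt_10 (uint32_t u)
{
  int32_t s = (int32_t) u;
  return s > 10;
}

/* Widening through the signed type sign-extends...  */
static int64_t
widen_via_signed (uint32_t u)
{
  int32_t s = (int32_t) u;
  return (int64_t) s;
}

/* ...while widening the unsigned value directly zero-extends.  */
static int64_t
widen_direct (uint32_t u)
{
  return (int64_t) u;
}
```

For an all-ones 32-bit input, the unsigned comparison is true while
the signed one is false, and the two widening paths give -1 and a
large positive value respectively.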


Bootstrapped on  x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew

commit a1539b797a06e03b08e1f1de28ad0d19a3956616
Author: Andrew MacLeod 
Date:   Mon Jul 12 11:38:17 2021 -0400

Do not register a cast as an equivalence.

Registering an equivalence between objects of the same size in a cast can
cause other relations to be incorrect.

gcc/
PR tree-optimization/101335
* range-op.cc (operator_cast::lhs_op1_relation): Delete.

gcc/testsuite/
* gcc.dg/tree-ssa/pr101335.c: New.

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index f8e4c6d4e49..08000465fd9 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2159,10 +2159,6 @@ public:
 			  const irange &lhs,
 			  const irange &op2,
 			  relation_kind rel = VREL_NONE) const;
-  virtual enum tree_code lhs_op1_relation (const irange &lhs,
-	   const irange &op1,
-	   const irange &op2) const;
-
 private:
   bool truncating_cast_p (const irange &inner, const irange &outer) const;
   bool inside_domain_p (const wide_int &min, const wide_int &max,
@@ -2171,27 +2167,6 @@ private:
 			   const irange &outer) const;
 } op_convert;
 
-// Determine if there is a relationship between LHS and OP1.
-
-enum tree_code
-operator_cast::lhs_op1_relation (const irange &lhs,
- const irange &op1,
- const irange &op2 ATTRIBUTE_UNUSED) const
-{
-  if (op1.undefined_p ())
-return VREL_NONE;
-  // We can't make larger types equivalent to smaller types because we can
-  // miss sign extensions in a chain of casts.
-  // u32 = 0xf
-  // s32 = (s32) u32
-  // s64 = (s64) s32
-  // we cant simply "convert" s64 = (s64)u32  or we get positive 0x
-  // value instead of sign extended negative value.
-  if (TYPE_PRECISION (lhs.type ()) == TYPE_PRECISION (op1.type ()))
-return EQ_EXPR;
-  return VREL_NONE;
-}
-
 // Return TRUE if casting from INNER to OUTER is a truncating cast.
 
 inline bool
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr101335.c b/gcc/testsuite/gcc.dg/tree-ssa/pr101335.c
new file mode 100644
index 000..921362c2954
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr101335.c
@@ -0,0 +1,17 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+unsigned a = 0x;
+int b;
+int main()
+{
+  int c = ~a;
+  unsigned d = c - 10;
+  if (d > c)
+c = 20;
+  b = -(c | 0);
+  if (b > -8)
+__builtin_abort ();
+  return 0;
+}
+


[PATCH] i386: Fix vec_set expanders [PR101424]

2021-07-12 Thread Uros Bizjak via Gcc-patches
AVX does not support 32-byte integer compares, required by
ix86_expand_vector_set_var.  The following patch fixes vec_set
expanders by introducing new vec_setm_avx2_operand predicate for AVX
vector modes.

gcc/

2021-07-12  Uroš Bizjak  

PR target/101424
* config/i386/predicates.md (vec_setm_sse41_operand):
Rename from vec_setm_operand.
(vec_setm_avx2_operand): New predicate.
* config/i386/sse.md (vec_set): Use V_128 mode iterator.
Use vec_setm_sse41_operand as operand 2 predicate.
(vec_set

PR target/101424
* gcc.target/i386/pr101424.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 986b758396a..0984f7cc44d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -3604,7 +3604,7 @@ (define_insn "*pextrb_zext"
 (define_expand "vec_setv2hi"
   [(match_operand:V2HI 0 "register_operand")
(match_operand:HI 1 "register_operand")
-   (match_operand 2 "vec_setm_operand")]
+   (match_operand 2 "vec_setm_sse41_operand")]
   "TARGET_SSE2"
 {
   if (CONST_INT_P (operands[2]))
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 9488632ce24..6aa1ea32627 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -1021,11 +1021,16 @@ (define_predicate "incdec_operand"
 })
 
 ;; True for registers, or const_int_operand, used to vec_setm expander.
-(define_predicate "vec_setm_operand"
+(define_predicate "vec_setm_sse41_operand"
   (ior (and (match_operand 0 "register_operand")
(match_test "TARGET_SSE4_1"))
(match_code "const_int")))
 
+(define_predicate "vec_setm_avx2_operand"
+  (ior (and (match_operand 0 "register_operand")
+   (match_test "TARGET_AVX2"))
+   (match_code "const_int")))
+
 (define_predicate "vec_setm_mmx_operand"
   (ior (and (match_operand 0 "register_operand")
(match_test "TARGET_SSE4_1")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 17c9e571d5d..ab2023d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8486,9 +8486,9 @@ (define_insn "vec_setv2df_0"
(set_attr "mode" "DF")])
 
 (define_expand "vec_set"
-  [(match_operand:V 0 "register_operand")
+  [(match_operand:V_128 0 "register_operand")
(match_operand: 1 "register_operand")
-   (match_operand 2 "vec_setm_operand")]
+   (match_operand 2 "vec_setm_sse41_operand")]
   "TARGET_SSE"
 {
   if (CONST_INT_P (operands[2]))
@@ -8499,6 +8499,20 @@ (define_expand "vec_set"
   DONE;
 })
 
+(define_expand "vec_set"
+  [(match_operand:V_256_512 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand 2 "vec_setm_avx2_operand")]
+  "TARGET_AVX"
+{
+  if (CONST_INT_P (operands[2]))
+ix86_expand_vector_set (false, operands[0], operands[1],
+   INTVAL (operands[2]));
+  else
+ix86_expand_vector_set_var (operands[0], operands[1], operands[2]);
+  DONE;
+})
+
 (define_insn_and_split "*vec_extractv4sf_0"
   [(set (match_operand:SF 0 "nonimmediate_operand" "=v,m,f,r")
(vec_select:SF
diff --git a/gcc/testsuite/gcc.target/i386/pr101424.c 
b/gcc/testsuite/gcc.target/i386/pr101424.c
new file mode 100644
index 000..28bb7230e47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101424.c
@@ -0,0 +1,15 @@
+/* PR target/101424 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+typedef int v4df __attribute__((vector_size(32)));
+
+int foo_v4df_b, foo_v4df_c;
+
+v4df
+__attribute__foo_v4df ()
+{
+  v4df a;
+  a[foo_v4df_c] = foo_v4df_b;
+  return a;
+}


[PATCH v3] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-07-12 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html
v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to
  UNSPEC_PLT31 (Ulrich, Andreas).  Do not append @PLT only to
  weak symbols in non-PIC code (Ulrich).  Add TLS tests.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574646.html
v2 -> v3: Use %K in function_profiler() and s390_output_mi_thunk(),
  add tests for these cases.



This helps with generating code for kernel hotpatches, which contain
individual functions and are loaded more than 2G away from vmlinux.
This should not create performance regressions for the normal use
cases, because for local functions ld replaces @PLT calls with direct
calls.

gcc/ChangeLog:

* config/s390/predicates.md (bras_sym_operand): Accept all
functions in 64-bit mode, use UNSPEC_PLT31.
(larl_operand): Use UNSPEC_PLT31.
* config/s390/s390.c (s390_loadrelative_operand_p): Likewise.
(legitimize_pic_address): Likewise.
(s390_emit_tls_call_insn): Mark __tls_get_offset as function,
use UNSPEC_PLT31.
(s390_delegitimize_address): Use UNSPEC_PLT31.
(s390_output_addr_const_extra): Likewise.
(print_operand): Add @PLT to TLS calls, handle %K.
(s390_function_profiler): Mark __fentry__/_mcount as function,
use %K, use UNSPEC_PLT31.
(s390_output_mi_thunk): Use only UNSPEC_GOT, use %K.
(s390_emit_call): Use UNSPEC_PLT31.
(s390_emit_tpf_eh_return): Mark __tpf_eh_return as function.
* config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT.
(*movdi_64): Use %K.
(reload_base_64): Likewise.
(*sibcall_brc): Likewise.
(*sibcall_brcl): Likewise.
(*sibcall_value_brc): Likewise.
(*sibcall_value_brcl): Likewise.
(*bras): Likewise.
(*brasl): Likewise.
(*bras_r): Likewise.
(*brasl_r): Likewise.
(*bras_tls): Likewise.
(*brasl_tls): Likewise.
(main_base_64): Likewise.
(reload_base_64): Likewise.
(@split_stack_call): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/noPLT.C: Skip on s390x.
* g++.target/s390/mi-thunk.C: New test.
* gcc.target/s390/nodatarel-1.c: Move foostatic to the new
tests.
* gcc.target/s390/pr80080-4.c: Allow @PLT suffix.
* gcc.target/s390/risbg-ll-3.c: Likewise.
* gcc.target/s390/call.h: Common code for the new tests.
* gcc.target/s390/call-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call-z10-pic.c: New test.
* gcc.target/s390/call-z10.c: New test.
* gcc.target/s390/call-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call-z9-pic.c: New test.
* gcc.target/s390/call-z9.c: New test.
* gcc.target/s390/mfentry-m64-pic.c: New test.
* gcc.target/s390/tls.h: Common code for the new TLS tests.
* gcc.target/s390/tls-pic.c: New test.
* gcc.target/s390/tls.c: New test.
---
 gcc/config/s390/predicates.md |  9 ++-
 gcc/config/s390/s390.c| 81 +--
 gcc/config/s390/s390.md   | 32 
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C   |  2 +-
 gcc/testsuite/g++.target/s390/mi-thunk.C  | 23 ++
 .../gcc.target/s390/call-z10-pic-nodatarel.c  | 20 +
 gcc/testsuite/gcc.target/s390/call-z10-pic.c  | 20 +
 gcc/testsuite/gcc.target/s390/call-z10.c  | 20 +
 .../gcc.target/s390/call-z9-pic-nodatarel.c   | 18 +
 gcc/testsuite/gcc.target/s390/call-z9-pic.c   | 18 +
 gcc/testsuite/gcc.target/s390/call-z9.c   | 20 +
 gcc/testsuite/gcc.target/s390/call.h  | 40 +
 .../gcc.target/s390/mfentry-m64-pic.c |  9 +++
 gcc/testsuite/gcc.target/s390/nodatarel-1.c   | 26 +-
 gcc/testsuite/gcc.target/s390/pr80080-4.c |  2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c|  6 +-
 gcc/testsuite/gcc.target/s390/tls-pic.c   | 14 
 gcc/testsuite/gcc.target/s390/tls.c   | 10 +++
 gcc/testsuite/gcc.target/s390/tls.h   | 23 ++
 19 files changed, 320 insertions(+), 73 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.c
 create mode 1006

*Ping* [PATCH] PR fortran/101084 - [10/11/12 Regression] ICE in gfc_typenode_for_spec, at fortran/trans-types.c:1124

2021-07-12 Thread Harald Anlauf via Gcc-patches
*Ping*

> Sent: Tuesday, June 15, 2021 at 21:31
> From: "Harald Anlauf" 
> To: "fortran" , "gcc-patches" 
> Subject: [PATCH] PR fortran/101084 - [10/11/12 Regression] ICE in 
> gfc_typenode_for_spec, at fortran/trans-types.c:1124
>
> A recent change to the checking of legacy FORMAT tags did not handle
> cases where the type is not set.  Adjust the check.
>
> Regtested on x86_64-pc-linux-gnu.
>
> OK for mainline / 11- / 10-branch?
>
> Thanks,
> Harald
>
>
> Fortran: reject FORMAT tag of unknown type.
>
> gcc/fortran/ChangeLog:
>
>   PR fortran/101084
>   * io.c (resolve_tag_format): Extend FORMAT check to unknown type.
>
> gcc/testsuite/ChangeLog:
>
>   PR fortran/101084
>   * gfortran.dg/fmt_nonchar_3.f90: New test.
>
>


Re: Repost: [PATCH] Deal with prefixed loads/stores in tests, PR testsuite/100166

2021-07-12 Thread David Edelsohn via Gcc-patches
On Wed, Jul 7, 2021 at 4:03 PM Michael Meissner  wrote:
>
> [PATCH] Deal with prefixed loads/stores in tests, PR testsuite/100166
>
> This patch updates the various tests in the testsuite to treat plxv
> and pstxv as being vector loads/stores.  This shows up if you run the
> testsuite with a compiler configured with the option: --with-cpu=power10.
>
> I have verified that these tests now all pass when I build and test a compiler
> on a power10 system using --with-cpu=power10.  I have verified that they
> continue to run on power9 little endian and power8 big endian systems.
>
> Can I check this into the master branch?
>
> 2021-07-07  Michael Meissner  
>
> gcc/testsuite/
> PR testsuite/100166
> * 
> gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-bb-slp-9a-pr63175.c:
> * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c:
> * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c:
> * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c:
> * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c:
> * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c:
> * gcc.target/powerpc/fold-vec-load-builtin_vec_xl-short.c:
> * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-char.c:
> * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-double.c:
> * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-float.c:
> * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-int.c:
> * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-longlong.c:
> * gcc.target/powerpc/fold-vec-load-vec_vsx_ld-short.c:
> * gcc.target/powerpc/fold-vec-load-vec_xl-char.c:
> * gcc.target/powerpc/fold-vec-load-vec_xl-double.c:
> * gcc.target/powerpc/fold-vec-load-vec_xl-float.c:
> * gcc.target/powerpc/fold-vec-load-vec_xl-int.c:
> * gcc.target/powerpc/fold-vec-load-vec_xl-longlong.c:
> * gcc.target/powerpc/fold-vec-load-vec_xl-short.c:
> * gcc.target/powerpc/fold-vec-splat-floatdouble.c:
> * gcc.target/powerpc/fold-vec-splat-longlong.c:
> * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-char.c:
> * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-double.c:
> * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-float.c:
> * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-int.c:
> * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-longlong.c:
> * gcc.target/powerpc/fold-vec-store-builtin_vec_xst-short.c:
> * gcc.target/powerpc/fold-vec-store-vec_vsx_st-char.c:
> * gcc.target/powerpc/fold-vec-store-vec_vsx_st-double.c:
> * gcc.target/powerpc/fold-vec-store-vec_vsx_st-float.c:
> * gcc.target/powerpc/fold-vec-store-vec_vsx_st-int.c:
> * gcc.target/powerpc/fold-vec-store-vec_vsx_st-longlong.c:
> * gcc.target/powerpc/fold-vec-store-vec_vsx_st-short.c:
> * gcc.target/powerpc/fold-vec-store-vec_xst-char.c:
> * gcc.target/powerpc/fold-vec-store-vec_xst-double.c:
> * gcc.target/powerpc/fold-vec-store-vec_xst-float.c:
> * gcc.target/powerpc/fold-vec-store-vec_xst-int.c:
> * gcc.target/powerpc/fold-vec-store-vec_xst-longlong.c:
> * gcc.target/powerpc/fold-vec-store-vec_xst-short.c:
> * gcc.target/powerpc/lvsl-lvsr.c:
> * gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c:
> Update insn counts to account for power10 prefixed loads and
> stores.

This is okay, modulo the one plvx -> plxv typo mentioned by Bill.

Thanks, David


Re: Repost: [PATCH] Change rs6000_const_f32_to_i32 return type.

2021-07-12 Thread David Edelsohn via Gcc-patches
On Mon, Jul 12, 2021 at 12:07 PM Bill Schmidt  wrote:
>
> Hi Mike,
>
> On 7/7/21 2:59 PM, Michael Meissner wrote:
> > [PATCH] Change rs6000_const_f32_to_i32 return type.
> >
> > The function rs6000_const_f32_to_i32 called REAL_VALUE_TO_TARGET_SINGLE
> > with a long long type and returns it.  This patch changes the type to long
> > which is the proper type for REAL_VALUE_TO_TARGET_SINGLE.
> >
> > 2021-07-07  Michael Meissner  
> >
> > gcc/
> >   * config/rs6000/rs6000-protos.h (rs6000_const_f32_to_i32): Change
> >   return type to long.
> >   * config/rs6000/rs6000.c (rs6000_const_f32_to_i32): Change return
> >   type to long.
> > ---
> >   gcc/config/rs6000/rs6000-protos.h | 2 +-
> >   gcc/config/rs6000/rs6000.c| 6 --
> >   2 files changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/config/rs6000/rs6000-protos.h 
> > b/gcc/config/rs6000/rs6000-protos.h
> > index 9de294d3b28..94bf961c6b7 100644
> > --- a/gcc/config/rs6000/rs6000-protos.h
> > +++ b/gcc/config/rs6000/rs6000-protos.h
> > @@ -281,7 +281,7 @@ extern void rs6000_asm_output_dwarf_pcrel (FILE *file, 
> > int size,
> >  const char *label);
> >   extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
> >const char *label);
> > -extern long long rs6000_const_f32_to_i32 (rtx operand);
> > +extern long rs6000_const_f32_to_i32 (rtx operand);
> >
> >   /* Declare functions in rs6000-c.c */
> >
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 9a5db63d0ef..de11de5e079 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -27936,10 +27936,12 @@ rs6000_invalid_conversion (const_tree fromtype, 
> > const_tree totype)
> > return NULL;
> >   }
> >
> > -long long
> > +/* Convert a SFmode constant to the integer bit pattern.  */
> > +
> > +long
> >   rs6000_const_f32_to_i32 (rtx operand)
> >   {
> > -  long long value;
> > +  long value;
> > const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (operand);
> >
> > gcc_assert (GET_MODE (operand) == SFmode);
>
> These changes look OK.  Can you please also fix the expander for
> xxspltiw_v4sf, which incorrectly expects a long long?
>
> I can't approve, but recommend approval with that also fixed.

This is okay with the fix to xxspltiw_v4sf in altivec.md.  And please
update the ChangeLog appropriately.

Thanks, David

