RE: [PATCH][x86] Knights Mill -march/-mtune options

2017-09-19 Thread Peryt, Sebastian
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> Sent: Monday, September 18, 2017 9:10 PM
> To: Peryt, Sebastian 
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin 
> Subject: Re: [PATCH][x86] Knights Mill -march/-mtune options
> 
> On Mon, Sep 18, 2017 at 12:42 PM, Peryt, Sebastian
>  wrote:
> >> -Original Message-
> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> >> Sent: Monday, September 18, 2017 12:23 PM
> >> To: Peryt, Sebastian 
> >> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin 
> >> Subject: Re: [PATCH][x86] Knights Mill -march/-mtune options
> >>
> >> On Mon, Sep 18, 2017 at 12:17 PM, Peryt, Sebastian
> >>  wrote:
> >> >> -Original Message-
> >> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> >> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> >> >> Sent: Sunday, September 17, 2017 6:14 PM
> >> >> To: Peryt, Sebastian 
> >> >> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin
> >> >> 
> >> >> Subject: Re: [PATCH][x86] Knights Mill -march/-mtune options
> >> >>
> >> >> On Thu, Sep 14, 2017 at 1:47 PM, Peryt, Sebastian
> >> >> 
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > This patch adds options -march=/-mtune=knm for Knights Mill.
> >> >> >
> >> >> > 2017-09-14  Sebastian Peryt  
> >> >> >
> >> >> > gcc/
> >> >> >
> >> >> > * config.gcc: Support "knm".
> >> >> > * config/i386/driver-i386.c (host_detect_local_cpu): Detect 
> >> >> > "knm".
> >> >> > * config/i386/i386-c.c (ix86_target_macros_internal): Handle
> >> >> > PROCESSOR_KNM.
> >> >> > * config/i386/i386.c (m_KNM): Define.
> >> >> > (processor_target_table): Add "knm".
> >> >> > (PTA_KNM): Define.
> >> >> > (ix86_option_override_internal): Add "knm".
> >> >> > (ix86_issue_rate): Add PROCESSOR_KNM.
> >> >> > (ix86_adjust_cost): Ditto.
> >> >> > (ia32_multipass_dfa_lookahead): Ditto.
> >> >> > (get_builtin_code_for_version): Handle PROCESSOR_KNM.
> >> >> > (fold_builtin_cpu): Define M_INTEL_KNM.
> >> >> > * config/i386/i386.h (TARGET_KNM): Define.
> >> >> > (processor_type): Add PROCESSOR_KNM.
> >> >> > * config/i386/x86-tune.def: Add m_KNM.
> >> >> > * doc/invoke.texi: Add knm as x86 -march=/-mtune= CPU type.
> >> >> >
> >> >> >
> >> >> > gcc/testsuite/
> >> >> >
> >> >> > * gcc.target/i386/funcspec-5.c: Test knm.
> >> >> >
> >> >> > Is it ok for trunk?
> >> >>
> >> >> You also have to update libgcc/cpuinfo.h together with
> >> >> fold_builtin_cpu from i386.c. Please note that all new processor
> >> >> types and subtypes have to be added at the end of the enum.
> >> >>
> >> >
> >> > Uros,
> >> >
> >> > I have updated libgcc/cpuinfo.h and libgcc/cpuinfo.c. I understood
> >> > that CPU_TYPE_MAX in libgcc/cpuinfo.h processor_types is some kind
> >> > of barrier, this is why I put KNM before that. Is that correct thinking?
> >> > As for fold_builtin_cpu in i386.c I already have something like this:
> >> >
> >> > @@ -34217,6 +34229,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
> >> >  M_AMDFAM15H,
> >> >  M_INTEL_SILVERMONT,
> >> >  M_INTEL_KNL,
> >> > +M_INTEL_KNM,
> >> >  M_AMD_BTVER1,
> >> >  M_AMD_BTVER2,
> >> >  M_CPU_SUBTYPE_START,
> >> > @@ -34262,6 +34275,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
> >> >{"bonnell", M_INTEL_BONNELL},
> >> >{"silvermont", M_INTEL_SILVERMONT},
> >> >{"knl", M_INTEL_KNL},
> >> > +  {"knm", M_INTEL_KNM},
> >> >{"amdfam10h", M_AMDFAM10H},
> >> >{"barcelona", M_AMDFAM10H_BARCELONA},
> >> >{"shanghai", M_AMDFAM10H_SHANGHAI},
> >> >
> >> > I couldn't find any other place where I'm supposed to add anything extra.
> >>
> >> Please look at libgcc/config/i386/cpuinfo.h. The comment here says that:
> >>
> >> /* Any new types or subtypes have to be inserted at the end. */
> >>
> >> The above patch should then add M_INTEL_KNM as the last entry
> >> *before* M_CPU_SUBTYPE_START.
> >>
> >
> > Sorry, I didn't notice this value at first. I believe now it's correct.
> 
> OK for mainline SVN (with updated ChangeLog).
> 

Can you please commit for me?

Thanks,
Sebastian

> Thanks,
> Uros.
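
For readers unfamiliar with the constraint Uros describes: the enumerator
values in libgcc/config/i386/cpuinfo.h are part of the __builtin_cpu_is ABI,
so a new processor must be appended before the MAX marker rather than
inserted in the middle.  A simplified, illustrative sketch (not the verbatim
header):

  enum processor_types
  {
    AMD = 1,
    INTEL_CORE2,
    INTEL_COREI7,
    /* ... existing entries, values frozen by the __builtin_cpu_is ABI ... */
    INTEL_KNL,
    INTEL_KNM,      /* new entry appended here */
    CPU_TYPE_MAX    /* marker, always last */
  };

Once fold_builtin_cpu maps the "knm" string to the new value, user code can
test the CPU with __builtin_cpu_is ("knm").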


Re: [PATCH] Fix PR82220

2017-09-19 Thread Richard Biener
On Mon, 18 Sep 2017, Richard Sandiford wrote:

> Richard Biener  writes:
> > The following is said to fix a 482.sphinx3 regression.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >
> > Richard.
> >
> > 2017-09-18  Richard Biener  
> >
> > PR tree-optimization/82220
> > * tree-vect-loop.c (vect_estimate_min_profitable_iters): Exclude
> > epilogue niters from the min_profitable_iters compute.
> >
> > Index: gcc/tree-vect-loop.c
> > ===
> > --- gcc/tree-vect-loop.c(revision 252907)
> > +++ gcc/tree-vect-loop.c(working copy)
> > @@ -3663,8 +3663,8 @@ vect_estimate_min_profitable_iters (loop
> >min_profitable_iters);
> >  
> >/* We want the vectorized loop to execute at least once.  */
> > -  if (min_profitable_iters < (vf + peel_iters_prologue + 
> > peel_iters_epilogue))
> > -min_profitable_iters = vf + peel_iters_prologue + peel_iters_epilogue;
> > +  if (min_profitable_iters < (vf + peel_iters_prologue))
> > +min_profitable_iters = vf + peel_iters_prologue;
> 
> Maybe we should still add 1 when peeling for gaps?

You mean add vf?  Yes, I guess so.

> Even adding the prologue count seems a bit weird if we've guessed it to
> be vf/2.  Wouldn't it be more profitable to vectorise with an iteration
> count of 1 vector iteration + 1 peeled iteration than 1 vector iteration
> + vf-1 peeled iterations, at least in percentage terms?  Was just wondering
> if we should only add peel_iters_prologue when npeels > 0.

Well, I specifically added the code to compensate for the case where
we only guess the count to vf/2.  I tried to make us reliably avoid
entering the vector path for bwaves -- which of course only works
with versioning -- where we have a runtime iteration count of 5 but
still end up peeling for alignment.

If we exactly know the iterations of the prologue we avoid peeling
for alignment already.

So yes, I made the guessed case err on the scalar side (which regressed
sphinx).  Before my patch it erred on the vector side by estimating
the prologue and epilogue iters as 0.  I suppose we should split the
handling to correctly handle the non-guessed case and have some
explicit code doing sth for the guessed one.

OTOH we might want to make the cost model check combined with the
prologue peeling properly handle the prologue/epilogue iterations.
I didn't check whether we do that (but IIRC the execution flow is
slightly sub-optimal here).

Richard.


Re: Enable no-exec stacks for more targets using the Linux kernel

2017-09-19 Thread Andreas Schwab
On Sep 18 2017, Joseph Myers  wrote:

> Building glibc for many different configurations and running the
> compilation parts of the testsuite runs into failures of the
> elf/check-execstack test for hppa, ia64 and microblaze.

ia64 is non-execstack by default, so it doesn't need any marking.  The
same is true for every architecture that doesn't override
elf_read_implies_exec, which includes microblaze and hppa.

> This fails because those configurations are not generating
> .note.GNU-stack sections to indicate that programs do not need an
> executable stack.

This needs to be fixed in glibc.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH][aarch64] Fix error calls in aarch64 code so they can be translated

2017-09-19 Thread Frédéric Marchal
On Monday, 18 September 2017 at 13:20:25 CEST, Martin Sebor wrote:
> On 09/18/2017 12:26 PM, Steve Ellcey wrote:
> > This patch is for PR target/79868, where some aarch64 diagnostics are
> > said to be not translatable due to how they are implemented.  See the
> > bug report for more details on why the current setup of passing
> > the string 'pragma' or 'attribute' doesn't work.
> > 
> > This patch fixes it, unfortunately by increasing the number of calls we
> > have to 'error' (16 calls become 32 calls), but that seems to be the
> > most straight forward way to get translatable strings.
> 
> I haven't looked at all of them but from the few I have seen it
> seems that rephrasing the messages along the following lines would
> be a way to get around the translation issue and without increasing
> the number of calls (though not without the conditional):
> 
>error (is_pragma
>   ? G_("missing name in %<#pragma target\(\"%s=\")%>")
>   : G_("missing name in % attribute"),
>   "arch");
> 
> The additional benefit of this approach is that it would also make
> the quoting consistent with what seems to be the prevailing style
> of these sorts of messages.  (It would be nice to eventually
> converge on the same style/quoting and phrasing across all back
> and front ends.)

Indeed! That's even better as the message uses words the user sees in the 
source code whatever his/her locale language is.

With your proposal, I know I must not translate "target" because it clearly is 
part of the programming language. With the former message, I would have 
translated "target" as part of the human language message. Your approach is 
clearly better.

Frederic




Re: Backport fix: [PATCH] Fix target attribute handling (PR c++/81355).

2017-09-19 Thread Martin Liška
On 09/18/2017 03:05 PM, Jakub Jelinek wrote:
> On Mon, Sep 18, 2017 at 03:01:53PM +0200, Martin Liška wrote:
>> As discussed here:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81224
>>
>> We have fallout caused by the patch and its backport to active branches.
>> I'm planning to revert the patch and install a patch that will ignore empty
>> string values.  I'm testing the patch.
>>
>> Jakub, do we really want it also for GCC 7? Note that the problematic 
>> test-case is OK on GCC 7 branch
>> as it contains your patch mentioned in discussion.
> 
> The question is, has the GCC 7 patch changed solely testcases where we'd
> ICE on into ones where we warn, or are there cases where we used to accept
> it and now warn?
> Generally we don't want to introduce new warnings/errors on release branches
> on something that used to be accepted, unless really necessary.
> 
>   Jakub
> 

I've just installed the patch to GCC 5.x and 6.x.  I'll do the same for 7.x in
order to preserve the behavior.

Martin


Re: [PATCH] Fix PR82220

2017-09-19 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, 18 Sep 2017, Richard Sandiford wrote:
>> Richard Biener  writes:
>> > The following is said to fix a 482.sphinx3 regression.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>> >
>> > Richard.
>> >
>> > 2017-09-18  Richard Biener  
>> >
>> >PR tree-optimization/82220
>> >* tree-vect-loop.c (vect_estimate_min_profitable_iters): Exclude
>> >epilogue niters from the min_profitable_iters compute.
>> >
>> > Index: gcc/tree-vect-loop.c
>> > ===
>> > --- gcc/tree-vect-loop.c   (revision 252907)
>> > +++ gcc/tree-vect-loop.c   (working copy)
>> > @@ -3663,8 +3663,8 @@ vect_estimate_min_profitable_iters (loop
>> >   min_profitable_iters);
>> >  
>> >/* We want the vectorized loop to execute at least once.  */
>> > -  if (min_profitable_iters < (vf + peel_iters_prologue + 
>> > peel_iters_epilogue))
>> > -min_profitable_iters = vf + peel_iters_prologue + peel_iters_epilogue;
>> > +  if (min_profitable_iters < (vf + peel_iters_prologue))
>> > +min_profitable_iters = vf + peel_iters_prologue;
>> 
>> Maybe we should still add 1 when peeling for gaps?
>
> You mean add vf?  Yes, I guess so.

I think 1 is enough, since peeling for gaps forces one final scalar
iteration to be done by the epilogue.  You only get vf iterations being
done by the epilogue if no iterations would have been done otherwise.

E.g. for vf=4 and no prologue peeling, a scalar iteration count of 5
would be 1 vector iteration and 1 epilogue iteration even with peeling
for gaps.  A scalar iteration count of 8 would be 1 vector iteration
and 4 epilogue iterations.  But a scalar iteration count of 4 would
not use the vector loop at all.  In that case 5 seems like the right
minimum.
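
To make the arithmetic above concrete, here is a small illustrative helper
(not the GCC code), assuming peeling for gaps costs exactly one extra scalar
iteration:

  /* With vf = 4, no prologue peeling and peeling for gaps: niters = 5 gives
     1 vector + 1 epilogue iteration, niters = 8 gives 1 vector + 4 epilogue
     iterations, and niters = 4 never enters the vector loop, so vf + 1 = 5
     is the smallest usable count.  */
  static int
  min_iters_to_enter_vector_loop (int vf, int peel_iters_prologue,
                                  bool peel_for_gaps)
  {
    int min = vf + peel_iters_prologue;  /* at least one vector iteration */
    if (peel_for_gaps)
      min += 1;                          /* one iteration is always peeled */
    return min;
  }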

>> Even adding the prologue count seems a bit weird if we've guessed it to
>> be vf/2.  Wouldn't it be more profitable to vectorise with an iteration
>> count of 1 vector iteration + 1 peeled iteration than 1 vector iteration
>> + vf-1 peeled iterations, at least in percentage terms?  Was just wondering
>> if we should only add peel_iters_prologue when npeels > 0.
>
> Well, I specifically added the code to compensate for the case where
> we only guess the count to vf/2.  I tried to make us reliably avoid
> entering the vector path for bwaves -- which of course only works
> with versioning -- where we have a runtime iteration count of 5 but
> still end up peeling for alignment.
>
> If we exactly know the iterations of the prologue we avoid peeling
> for alignment already.

Ah, OK.  For bwaves, did we peel more than one iteration for
alignment and so skip the vector loop entirely?  Or did we still
use the vector loop, but end up with something that was still
slower because of the extra overhead?

> So yes, I made the guessed case err on the scalar side (which regressed
> sphinx).  Before my patch it erred on the vector side by estimating
> the prologue and epilogue iters as 0.  I suppose we should split the
> handling to correctly handle the non-guessed case and have some
> explicit code doing sth for the guessed one.
>
> OTOH we might want to make the cost model check combined with the
> prologue peeling properly handle the prologue/epilogue iterations.
> I didn't check whether we do that (but IIRC the execution flow is
> slightly sub-optimal here).

Something like adding the number of peels to the threshold at runtime?
Yeah, that sounds like it might be better.

One problem I remember hitting was that, even if we do calculate
a reasonable profitability threshold, we'd do the check in the same
block as the versioning check, and so pay the full penalty of the
versioning check regardless.  I.e. it's "unprofitable_p | alias_p"
rather than "unprofitable_p || alias_p".  Not sure whether that's
a different problem from bwaves though.

Thanks,
Richard


[PATCH][GRAPHITE] Fix affine constraint generation

2017-09-19 Thread Richard Biener

The following plugs some holes in extract_affine which failed
to modulo-reduce signed values in conversions, failed to
interpret pointer-plus offsets as signed (yeah, stupid GIMPLE IL...)
and mishandled BIT_NOT_EXPR.
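
The BIT_NOT_EXPR and NEGATE_EXPR handling below relies on the usual
two's-complement identities; a tiny self-contained check of them
(illustrative only, not GCC code):

  #include <cassert>

  int
  main ()
  {
    /* BIT_NOT_EXPR is lowered as subtraction from -1, NEGATE_EXPR as
       multiplication by -1 -- exactly the lhs/rhs pairs built below.  */
    for (int x = -8; x <= 8; ++x)
      {
        assert (~x == -1 - x);
        assert (-x == -1 * x);
      }
    return 0;
  }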

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Ok?

Thanks,
Richard.

2017-09-19  Richard Biener  

* graphite-sese-to-poly.c (extract_affine): Properly handle
POINTER_PLUS_EXPR, BIT_NOT_EXPR and conversion to signed.

Index: gcc/graphite-sese-to-poly.c
===
--- gcc/graphite-sese-to-poly.c (revision 252968)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -237,6 +237,7 @@ extract_affine (scop_p s, tree e, __isl_
 return NULL;
   }
 
+  tree type = TREE_TYPE (e);
   switch (TREE_CODE (e))
 {
 case POLYNOMIAL_CHREC:
@@ -247,8 +248,22 @@ extract_affine (scop_p s, tree e, __isl_
   res = extract_affine_mul (s, e, space);
   break;
 
-case PLUS_EXPR:
 case POINTER_PLUS_EXPR:
+  {
+   lhs = extract_affine (s, TREE_OPERAND (e, 0), isl_space_copy (space));
+   /* The RHS of a pointer-plus expression is to be interpreted
+  as signed value.  Try to look through a sign-changing conversion
+  first.  */
+   tree tem = TREE_OPERAND (e, 1);
+   STRIP_NOPS (tem);
+   rhs = extract_affine (s, tem, space);
+   if (TYPE_UNSIGNED (TREE_TYPE (tem)))
+ rhs = wrap (rhs, TYPE_PRECISION (type) - 1);
+   res = isl_pw_aff_add (lhs, rhs);
+   break;
+  }
+
+case PLUS_EXPR:
   lhs = extract_affine (s, TREE_OPERAND (e, 0), isl_space_copy (space));
   rhs = extract_affine (s, TREE_OPERAND (e, 1), space);
   res = isl_pw_aff_add (lhs, rhs);
@@ -260,8 +275,13 @@ extract_affine (scop_p s, tree e, __isl_
   res = isl_pw_aff_sub (lhs, rhs);
   break;
 
-case NEGATE_EXPR:
 case BIT_NOT_EXPR:
+  lhs = extract_affine (s, integer_minus_one_node, isl_space_copy (space));
+  rhs = extract_affine (s, TREE_OPERAND (e, 0), space);
+  res = isl_pw_aff_sub (lhs, rhs);
+  break;
+
+case NEGATE_EXPR:
   lhs = extract_affine (s, TREE_OPERAND (e, 0), isl_space_copy (space));
   rhs = extract_affine (s, integer_minus_one_node, space);
   res = isl_pw_aff_mul (lhs, rhs);
@@ -279,6 +299,12 @@ extract_affine (scop_p s, tree e, __isl_
   return res;
 
 CASE_CONVERT:
+  res = extract_affine (s, TREE_OPERAND (e, 0), space);
+  /* signed values, even if overflow is undefined, get modulo-reduced.  */
+  if (! TYPE_UNSIGNED (type))
+   res = wrap (res, TYPE_PRECISION (type) - 1);
+  break;
+
 case NON_LVALUE_EXPR:
   res = extract_affine (s, TREE_OPERAND (e, 0), space);
   break;
@@ -288,7 +314,6 @@ extract_affine (scop_p s, tree e, __isl_
   break;
 }
 
-  tree type = TREE_TYPE (e);
   if (TYPE_UNSIGNED (type))
 res = wrap (res, TYPE_PRECISION (type));
 


Re: [Ada] Validity check failure with packed array and pragma

2017-09-19 Thread Pierre-Marie de Rodat

On 09/18/2017 10:09 PM, Eric Botcazou wrote:

I'm not sure anyone really cares so it's up to you I'd say.


Ok, thanks. Done! Committed as r252971.

--
Pierre-Marie de Rodat


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 02/08 - V3

2017-09-19 Thread Andreas Schwab
On Sep 18 2017, Jeff Law  wrote:

> On 09/18/2017 10:09 AM, Andreas Schwab wrote:
>> On Sep 18 2017, Jeff Law  wrote:
>> 
>>> Can you confirm if the probe was in the red zone vs the live areas on
>>> the stack?
>> 
>> It overwrites a nearby variable.  sp + 8 happens to be the address of
>> file_entries_new_size.
>> 
>>0x000140e8 <+1172>:  mov r6, sp
>>0x000140ec <+1176>:  add r3, r3, #7
>>0x000140f0 <+1180>:  bic r3, r3, #7
>>0x000140f4 <+1184>:  cmp r3, #4096   ; 0x1000
>>0x000140f8 <+1188>:  bcc 0x14110 
>>0x000140fc <+1192>:  sub r3, r3, #4096   ; 0x1000
>>0x00014100 <+1196>:  sub sp, sp, #4096   ; 0x1000
>>0x00014104 <+1200>:  cmp r3, #4096   ; 0x1000
>>0x00014108 <+1204>:  str r0, [sp, #8]
>>0x0001410c <+1208>:  bcs 0x140fc 
>>0x00014110 <+1212>:  ldr r7, [r11, #-56] ; 0xffc8
>>0x00014114 <+1216>:  sub sp, sp, r3
>>0x00014118 <+1220>:  mov r1, #0
>>0x0001411c <+1224>:  add r3, sp, #8
>>0x00014120 <+1228>:  mov r0, r3
>> => 0x00014124 <+1232>:  str r0, [sp, #8]
>> 
>> Andreas.
>> 
> Or better yet, include your .i and .s files in their entirety and the
> gcc -v output

This was generated with a very simple and incomplete backport of
-fstack-clash-protection to gcc7.  What I get with your full patches
with gcc8 looks correct.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH] cp: fix location comparison in member_name_cmp

2017-09-19 Thread Alexander Monakov
Hi,

After recent changes, the member_name_cmp qsort comparator can indicate
A < B < A (i.e. lacks anti-commutativity) for distinct TYPE_DECL nodes
that have the same source location.  If their order doesn't matter, the
comparator should return 0.

Invoking qsort with an improper comparator at best makes the output unpredictable
(pedantically it invokes undefined behavior); this trips qsort checking I'm
preparing to resend.

(it also seems that this and the following comparators use less/greater
pointer comparison, but unless those pointers point into the same array
that invokes undefined behavior too, although benign in practice)
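
As a generic illustration of the contract being enforced (not GCC code): a
valid three-way comparator is antisymmetric, transitive, and returns 0 for
keys it considers equivalent instead of inventing an order for them.

  #include <cstdlib>

  struct item { unsigned location; };

  /* Usage: qsort (items, n, sizeof (item), item_cmp);  */
  static int
  item_cmp (const void *a_p, const void *b_p)
  {
    const item *a = static_cast<const item *> (a_p);
    const item *b = static_cast<const item *> (b_p);
    if (a->location != b->location)
      return a->location < b->location ? -1 : +1;
    return 0;  /* equal keys: do not fabricate an ordering */
  }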

Bootstrapped and regtested on x86-64, OK to apply?

Thanks.
Alexander

* name-lookup.c (member_name_cmp): Return 0 if locations compare equal.

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index d0aaf2b1d16..046cd109533 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -1437,7 +1437,7 @@ member_name_cmp (const void *a_p, const void *b_p)
   if (TREE_CODE (a) == TREE_CODE (b))
 /* We can get two TYPE_DECLs or two USING_DECLs.  Place in source
order.  */
-return DECL_SOURCE_LOCATION (a) < DECL_SOURCE_LOCATION (b) ? -1 : +1;
+goto compare_locations;
 
   /* If one of them is a TYPE_DECL, it loses.  */
   if (TREE_CODE (a) == TYPE_DECL)
@@ -1456,7 +1456,10 @@ member_name_cmp (const void *a_p, const void *b_p)
  Order by source location.  We should really prevent this
  happening.  */
   gcc_assert (errorcount);
-  return DECL_SOURCE_LOCATION (a) < DECL_SOURCE_LOCATION (b) ? -1 : +1;
+compare_locations:
+  if (DECL_SOURCE_LOCATION (a) != DECL_SOURCE_LOCATION (b))
+return DECL_SOURCE_LOCATION (a) < DECL_SOURCE_LOCATION (b) ? -1 : +1;
+  return 0;
 }
 
 static struct {


Re: [PATCH] cp: fix location comparison in member_name_cmp

2017-09-19 Thread Nathan Sidwell

On 09/19/2017 07:06 AM, Alexander Monakov wrote:

Hi,

After recent changes, the member_name_cmp qsort comparator can indicate
A < B < A (i.e. lacks anti-commutativity) for distinct TYPE_DECL nodes
that have the same source location.  If their order doesn't matter, the
comparator should return 0.


Where do we get such type decls?

nathan

--
Nathan Sidwell


[PATCH] haifa-sched: fix autopref_rank_for_schedule qsort comparator

2017-09-19 Thread Alexander Monakov
Hello,

The autopref_rank_for_schedule qsort sub-comparator is not actually a proper
comparator.  For instance, it lacks transitivity: if there are insns A, B, C
such that B has AUTOPREF_MULTIPASS_DATA_IRRELEVANT status, but A and C
compare such that C < A, we can have A == B == C < A according to this
sub-comparator, and A < B < C < A if the following tie-breakers order
A before B and B before C.

I'm not sure this heuristic fits well into the overall scheduler sorting,
but if we want to just fix this comparison step, we should never skip
calls to autopref_rank_data.

Bootstrapped and regtested on x86-64 with PARAM_SCHED_AUTOPREF_QUEUE_DEPTH
set to '0' (otherwise this sub-comparator is unused).  OK to apply?

Thanks.
Alexander

* haifa-sched.c (autopref_rank_for_schedule): Always invoke
autopref_rank_data.

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index af0ed27b18f..031a0af96db 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -5707,7 +5707,8 @@ autopref_rank_data (autopref_multipass_data_t data1,
 static int
 autopref_rank_for_schedule (const rtx_insn *insn1, const rtx_insn *insn2)
 {
-  for (int write = 0; write < 2; ++write)
+  int r = 0;
+  for (int write = 0; write < 2 && !r; ++write)
 {
   autopref_multipass_data_t data1
= &INSN_AUTOPREF_MULTIPASS_DATA (insn1)[write];
@@ -5716,21 +5717,14 @@ autopref_rank_for_schedule (const rtx_insn *insn1, 
const rtx_insn *insn2)
 
   if (data1->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
autopref_multipass_init (insn1, write);
-  if (data1->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
-   continue;
 
   if (data2->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
autopref_multipass_init (insn2, write);
-  if (data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
-   continue;
 
-  if (!rtx_equal_p (data1->base, data2->base))
-   continue;
-
-  return autopref_rank_data (data1, data2);
+  r = autopref_rank_data (data1, data2);
 }
 
-  return 0;
+  return r;
 }
 
 /* True if header of debug dump was printed.  */


Re: [PATCH] cp: fix location comparison in member_name_cmp

2017-09-19 Thread Alexander Monakov
On Tue, 19 Sep 2017, Nathan Sidwell wrote:

> On 09/19/2017 07:06 AM, Alexander Monakov wrote:
> > Hi,
> > 
> > After recent changes, the member_name_cmp qsort comparator can indicate
> > A < B < A (i.e. lacks anti-commutativity) for distinct TYPE_DECL nodes
> > that have the same source location.  If their order doesn't matter, the
> > comparator should return 0.
> 
> Where do we get such type decls?

The first instance that tripped qsort checking for me was in PCH generation.
I think by adding

  gcc_assert (DECL_SOURCE_LOCATION (a) != DECL_SOURCE_LOCATION (b));

at the appropriate point in this comparator you can see for yourself.

Thanks.
Alexander


Re: [RFC][PATCH 1/5] Add separate parms for rtl unroller

2017-09-19 Thread Richard Biener
On Tue, Sep 19, 2017 at 8:25 AM, Kugan Vivekanandarajah
 wrote:
> Hi Richard,
>
> On 18 September 2017 at 17:50, Richard Biener
>  wrote:
>> On Mon, Sep 18, 2017 at 3:36 AM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi Richard,
>>>
>>> On 15 September 2017 at 19:31, Richard Biener
>>>  wrote:
 On Fri, Sep 15, 2017 at 3:27 AM, Kugan Vivekanandarajah
  wrote:
> This patch adds separate params for rtl unroller so that they can be
> tunned accordingly. Default values I have are based on some testing on
> aarch64. I am happy to leave it as the current value and set them in
> the back-end.

 PARAM_MAX_AVERAGE_UNROLLED_INSNS is only used by the RTL
 unroller.  Why should we separate PARAM_MAX_UNROLL_TIMES?

 PARAM_MAX_UNROLLED_INSNS is only used by gimple passes
 that perform unrolling.  Since GIMPLE is three-address it should
 match RTL reasonably well -- but I'd be ok in having a separate param
 for those.  But I wouldn't name those 'partial'.

 That said, those are magic numbers and I expect we can find some
 that work well on RTL and GIMPLE.
>>>
>>> Thanks for the review. I am mostly interested in having separate
>>> params for RTL runtime unrolling as this is different to what GIMPLE
>>> unroller does.
>>
>> Why?  Do you just want to have more magic knobs to machine-auto-tune?
>>
>>> May be I should have separate params only for the
>>> runtime unrolling (or the partial unroller)  and let RTL/GIMPLE share
>>> the other. Any preference here ? Any preference on the name ?
>>>
>>> I am suspecting that RTL unroller which does the same as GIMPLE is
>>> kind of obsolete now?
>>
>> We do not have a GIMPLE loop unroller pass.  On GIMPLE we only do
>> complete peeling as a separate pass and several passes perform
>> unrolling as part of their transform.
>
> Sorry for my terminology. I was referring to complete unrolling of
> loops in tree-sea-loop-ivcanon.c. In any case, what I am interested in
> is partial unrolling of loops in loop-unroll.c. For Falkor at least,
> partially unrolling of small loops that satisfies certain condition
> seems to improve performance. Since this is not enabled by default and
> some of the params are also shared by other optimisations (as you also
> have pointed out), I thought that having separate params for
> loop-unroll.c will limit the interference of changing this params on
> other optimisations. My initial patch was wrong and this should only
> separate params that are shared.

Well, as the complete peeling doesn't use those params but only
passes that perform "partial" unrolling I see nothing wrong with sharing
the parameters.
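
For readers not familiar with how such knobs are declared, a hedged sketch of
a params.def entry; the name and values here are purely illustrative, not the
ones in Kugan's patch:

  /* Stub so the sketch is self-contained; in GCC, params.def is included
     with the real DEFPARAM definition.  */
  #define DEFPARAM(enumid, option, help, def, min, max)

  DEFPARAM (PARAM_MAX_RTL_UNROLL_TIMES,            /* hypothetical name */
            "max-rtl-unroll-times",                /* hypothetical --param */
            "Maximum unroll factor used by the RTL unroller.",
            8, 0, 0)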

> And also, is anyone working on a partial unroller for Gimple? I would
> like to give it a try if no one is working on this. Is there any
> preferences here.

People have talked about it but I'm not aware of actual implementations.
Though an implementation is quite trivial with the interesting piece being
the costing (as usual).

Richard.

> Thanks,
> Kugan
>
>
>
>
>> Richard.
>>
>>> Thanks,
>>> Kugan
>>>
>>>
>>>
>>>
>>>
 Richard.

>
> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2017-09-12  Kugan Vivekanandarajah  
>
> * loop-unroll.c (decide_unroll_constant_iterations): Use new params.
> (decide_unroll_runtime_iterations): Likewise.
> (decide_unroll_stupid): Likewise.
> * params.def (DEFPARAM): Separate and add new params for rtl unroller.


[PATCH] Fix PR82244

2017-09-19 Thread Richard Biener

The following fixes a bug in VRP where on removal of asserts it ended
up propagating a constant in place of abnormal SSA uses.  That doesn't
work when the use is in an abnormal PHI thus instead replace the
assert with a copy in that case.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-09-19  Richard Biener  

PR tree-optimization/82244
* tree-vrp.c (remove_range_assertions): Do not propagate
a constant to abnormals but replace the assert with a copy.

* gcc.dg/torture/pr82244.c: New testcase.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 252968)
+++ gcc/tree-vrp.c  (working copy)
@@ -7039,6 +7039,14 @@ remove_range_assertions (void)
  FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
SET_USE (use_p, var);
  }
+   /* But do not propagate constants as that is invalid.  */
+   else if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+ {
+   gassign *ass = gimple_build_assign (lhs, var);
+   gsi_replace (&si, ass, true);
+   gsi_next (&si);
+   continue;
+ }
else
  replace_uses_by (lhs, var);
 
Index: gcc/testsuite/gcc.dg/torture/pr82244.c
===
--- gcc/testsuite/gcc.dg/torture/pr82244.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr82244.c  (working copy)
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+
+typedef struct a {
+  struct a *b;
+} a;
+
+extern int d(void);
+extern int g(void);
+extern int h(void);
+extern int _setjmp(void *);
+
+int c(void)
+{
+  1 ? d() : 0;
+
+  a *e;
+  while (e) {
+  e = (e == (a *) c) ? 0 : e->b;
+  while (e) {
+ int f = 0;
+ g();
+ if (_setjmp(0)) {
+ if (f & 6) {
+ ;
+ } else if (f & 2) {
+ h();
+ }
+ }
+  }
+  }
+}


Re: [PATCH] cp: fix location comparison in member_name_cmp

2017-09-19 Thread Nathan Sidwell

On 09/19/2017 07:25 AM, Alexander Monakov wrote:

On Tue, 19 Sep 2017, Nathan Sidwell wrote:


On 09/19/2017 07:06 AM, Alexander Monakov wrote:

Hi,

After recent changes, the member_name_cmp qsort comparator can indicate
A < B < A (i.e. lacks anti-commutativity) for distinct TYPE_DECL nodes
that have the same source location.  If their order doesn't matter, the
comparator should return 0.


Where do we get such type decls?


The first instance that tripped qsort checking for me was in PCH generation.
I think by adding


What source example generates such type decls?  That you've encountered 
this suggests one of the invariants I was presuming is not true.


nathan

--
Nathan Sidwell


Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV

2017-09-19 Thread Rainer Orth
Daniel Santos  writes:

> On 09/17/2017 10:53 AM, Uros Bizjak wrote:
>> OK.
>>
>> Thanks,
>> Uros.
>
> Thanks. I should have posted this Friday when my tests finished, but
> I'll be committing with one minor change so tests don't run on m32 or mx32:
>
> --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */
> +/* { dg-do compile { target lp64 } } */
>  /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */
>  /* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */
>  /* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr82196-2.c
> b/gcc/testsuite/gcc.target/i386/pr82196-2.c
> index 31705bee29b..8fe58411d5e 100644
> --- a/gcc/testsuite/gcc.target/i386/pr82196-2.c
> +++ b/gcc/testsuite/gcc.target/i386/pr82196-2.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */
> +/* { dg-do compile { target lp64 } } */
>  /* { dg-options "-mavx -mcall-ms2sysv-xlogues -O2" } */
>  /* { dg-final { scan-assembler "call.*__avx_savms64_18" } } */
>  /* { dg-final { scan-assembler "jmp.*__avx_resms64x_18" } } */
>
> Other than that, full regression tests pass.

However, they do FAIL on 64-bit Solaris/x86:

+FAIL: gcc.target/i386/pr82196-1.c (test for excess errors)

Excess errors:
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/pr82196-1.c:14:1: 
error: bp cannot be used in asm here

+UNRESOLVED: gcc.target/i386/pr82196-1.c scan-assembler call.*__sse_savms64_18
+UNRESOLVED: gcc.target/i386/pr82196-1.c scan-assembler jmp.*__sse_resms64x_18
+FAIL: gcc.target/i386/pr82196-2.c (test for excess errors)
+UNRESOLVED: gcc.target/i386/pr82196-2.c scan-assembler call.*__avx_savms64_18
+UNRESOLVED: gcc.target/i386/pr82196-2.c scan-assembler jmp.*__avx_resms64x_18

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[C++ PATCH] C++2A P0386R1 - default member initializers for bit-fields

2017-09-19 Thread Jakub Jelinek
Hi!

The following patch implements P0386R1 - NSDMIs for bit-fields.
While working on that, I've discovered our parser mishandles attributes
on bitfields, already C++11 says:
identifier[opt] attribute-specifier-seq[opt] : constant-expression
in the grammar, but we actually parsed
identifier[opt] : constant-expression attribute-specifier-seq[opt]

I had to resolve a few issues:
1) cp_parser_constant_expression doesn't actually parse a
constant-expression, but a constant assignment-expression; the
rationale is that it gives better diagnostics etc.  For
the bitfield parsing, we really need a constant expression.
The patch has the parsing code inline, another option is to
adjust cp_parser_constant_expression so that it has (a defaulted)
bool argument strict or something similar and if that argument is true,
don't call cp_parser_assignment_expression in it, but instead do inline
the 
  cp_expr expr
= cp_parser_binary_expression (parser, false, false, false,
   PREC_NOT_OPERATOR, NULL);
  /* If the next token is a `?' then we're actually looking at
 a conditional-expression; otherwise we're done.  */
  if (cp_lexer_next_token_is (parser->lexer, CPP_QUERY))
expr = cp_parser_question_colon_clause (parser, expr);
part.  Another option is to add another argument to
cp_parser_assignment_expression too and do it in there, or add
cp_parser_conditional_expression and call it conditionally from
cp_parser_constant_expression (based on this strict or force_conditional
or whatever arg) and unconditionally from cp_parser_assignment_expression.
Preferences?

2) the above attribute issue; in the patch I parse attributes in both
spots unless NSDMI is present (in that case only at the right spot,
since there is no need for backwards compatibility).  Should there be
some pedwarn or something similar if attributes appear in the old
(incorrect) spot?  Only for C++11 attributes or for both C++11 and GNU?

3) we were temporarily (until check_bitfield_decl) stashing the bit width
expression in DECL_INITIAL, which obviously doesn't work too well when we
need to stash the NSDMI there temporarily.  I've tried to put it temporarily
into DECL_SIZE, but that doesn't work too well, because e.g.
variably_modified_type_p is called on still incomplete aggregate types and
it then thinks that such incomplete type is variable length.
So in the end I'm temporarily abusing DECL_BIT_FIELD_REPRESENTATIVE,
which seems to work fine (the middle-end only uses it after the type is
completed).

4) I hope default member initializers aren't valid for unnamed bit-fields,
but am not 100% sure if the current standard wording really disallows it.
There is:
"A brace-or-equal-initializer shall appear only in the declaration of a data 
member."
so, are unnamed bit-fields not data members?
There is additionally
"A data member is a non-function member introduced by a member-declarator."
and a paragraph before says that unnamed bit-field is not a member of the
class.

Bootstrapped/regtested on x86_64-linux and i686-linux (including make
check-c++-all), ok for trunk?  Or any preferences on 1) or 2) above?
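
For readers who have not seen the feature, a minimal sketch of what it
permits, assuming -std=c++2a and a compiler with this patch:

  struct S
  {
    int x : 6 = 42;        // equal-initializer on a bit-field
    unsigned y : 2 { 1 };  // brace-initializer on a bit-field
    int : 4;               // unnamed bit-field: no initializer allowed
  };

  // Members not covered by an explicit initializer pick up their default
  // member initializers.
  static_assert (S{}.x == 42 && S{}.y == 1);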

2017-09-19  Jakub Jelinek  

P0386R1 - default member initializers for bit-fields
c-family/
* c-attribs.c (handle_packed_attribute): Test DECL_C_BIT_FIELD
rather than DECL_INITIAL.
(common_handle_aligned_attribute): Likewise.
c/
* c-decl.c (grokfield): Use SET_DECL_C_BIT_FIELD here if
width is non-NULL.
(finish_struct): Test DECL_C_BIT_FIELD instead of DECL_INITIAL,
don't SET_DECL_C_BIT_FIELD here.
cp/
* cp-tree.h (grokbitfield): Add INIT parameter.
* parser.c (cp_parser_member_declaration): Parse attributes before
colon of a bitfield in addition to after colon.  Parse C++2A bitfield
NSDMIs.  Adjust grokbitfield caller.  Handle DECL_INITIAL also for
DECL_C_BIT_FIELDs.
(cp_parser_objc_class_ivars): Adjust grokbitfield caller.
* class.c (check_bitfield_decl): Retrieve and clear width from
DECL_BIT_FIELD_REPRESENTATIVE rather than DECL_INITIAL.
(check_field_decl): Recurse even for DECL_C_BIT_FIELDs.
(check_field_decls): Test DECL_BIT_FIELD_REPRESENTATIVE rather than
DECL_INITIAL.
(remove_zero_width_bit_fields): Adjust comment.
* decl2.c (grokbitfield): Add INIT parameter, pass it to
cp_finish_decl.  Stash width into DECL_BIT_FIELD_REPRESENTATIVE rather
than DECL_INITIAL.
* pt.c (tsubst_decl): For DECL_C_BIT_FIELD, tsubst_expr
DECL_BIT_FIELD_REPRESENTATIVE rather than DECL_INITIAL for width,
and handle DECL_INITIAL for all FIELD_DECLs, not just non-bitfields.

* Make-lang.in (check-c++-all): Test also c++2a.
objc/
* objc-act.c (check_ivars, gen_declaration): For OBJCPLUS look at
DECL_BIT_FIEL

Re: [C++ PATCH] C++2A P0386R1 - default member initializers for bit-fields

2017-09-19 Thread Tim Song
On Tue, Sep 19, 2017 at 8:54 AM, Jakub Jelinek  wrote:
> Hi!
>
> The following patch implements P0386R1 - NSDMIs for bit-fields.

It's P0683R1.


[C++ PATCH] C++2A P0704R1 - fixing const-qualified pointers to members

2017-09-19 Thread Jakub Jelinek
Hi!

(Sorry for the resent, forgot to CC gcc-patches).

Another easy C++2A patch.  Here I wonder if we don't want
a -Wc++2a-compat warning and whether & (const | volatile)) == const
is best (for the case if there are other quals in the future),
or if we just want == const.

For the compat warning, the behavior will be that for c++17 and below
we'll error out or fail SFINAE, for c++2a we'll accept it, so not sure
if it falls into what we want to warn or not.

Regtested on x86_64-linux and i686-linux with make check-c++-all.

2017-09-19  Jakub Jelinek  

P0704R1 - fixing const-qualified pointers to members
* typeck2.c (build_m_component_ref): For -std=c++2a allow
pointer to const & qualified method on rvalue.

* g++.dg/cpp2a/ptrmem1.C: New test.

--- gcc/cp/typeck2.c.jj 2017-09-19 12:06:30.635052397 +0200
+++ gcc/cp/typeck2.c2017-09-19 12:12:50.725235890 +0200
@@ -1908,9 +1908,10 @@ build_m_component_ref (tree datum, tree
 {
   /* 5.5/6: In a .* expression whose object expression is an rvalue, the
 program is ill-formed if the second operand is a pointer to member
-function with ref-qualifier &. In a .* expression whose object
-expression is an lvalue, the program is ill-formed if the second
-operand is a pointer to member function with ref-qualifier &&.  */
+function with ref-qualifier & (for C++2A: unless its cv-qualifier-seq
+is const). In a .* expression whose object expression is an lvalue,
+the program is ill-formed if the second operand is a pointer to member
+function with ref-qualifier &&.  */
   if (FUNCTION_REF_QUALIFIED (type))
{
  bool lval = lvalue_p (datum);
@@ -1921,7 +1922,12 @@ build_m_component_ref (tree datum, tree
   ptrmem_type);
  return error_mark_node;
}
- else if (!lval && !FUNCTION_RVALUE_QUALIFIED (type))
+ else if (!lval
+  && !FUNCTION_RVALUE_QUALIFIED (type)
+  && (cxx_dialect < cxx2a
+  || ((type_memfn_quals (type)
+   & (TYPE_QUAL_CONST | TYPE_QUAL_VOLATILE))
+  != TYPE_QUAL_CONST)))
{
  if (complain & tf_error)
error ("pointer-to-member-function type %qT requires an lvalue",
--- gcc/testsuite/g++.dg/cpp2a/ptrmem1.C.jj 2017-09-19 12:30:49.344545579 
+0200
+++ gcc/testsuite/g++.dg/cpp2a/ptrmem1.C2017-09-19 12:33:40.831363371 
+0200
@@ -0,0 +1,23 @@
+// P0704R1
+// { dg-do compile { target c++11 } }
+
+struct S {
+  void ref() & {}
+  void cref() const& {}
+  void vref() volatile & {}
+  void cvref() const volatile & {}
+};
+
+void
+foo ()
+{
+  S{}.ref();   // { dg-error "argument discards qualifiers" }
+  S{}.cref();
+  S{}.vref();  // { dg-error "argument discards qualifiers" }
+  S{}.cvref(); // { dg-error "argument discards qualifiers" }
+
+  (S{}.*&S::ref)();// { dg-error "pointer-to-member-function type 'void 
\\(S::\\*\\)\\(\\) &' requires an lvalue" }
+  (S{}.*&S::cref)();   // { dg-error "pointer-to-member-function type 'void 
\\(S::\\*\\)\\(\\) const &' requires an lvalue" "" { target c++17_down } }
+  (S{}.*&S::vref)();   // { dg-error "pointer-to-member-function type 'void 
\\(S::\\*\\)\\(\\) volatile &' requires an lvalue" }
+  (S{}.*&S::cvref)();  // { dg-error "pointer-to-member-function type 'void 
\\(S::\\*\\)\\(\\) const volatile &' requires an lvalue" }
+}

Jakub


Re: [C++ PATCH] C++2A P0386R1 - default member initializers for bit-fields

2017-09-19 Thread Jakub Jelinek
On Tue, Sep 19, 2017 at 08:59:23AM -0400, Tim Song wrote:
> On Tue, Sep 19, 2017 at 8:54 AM, Jakub Jelinek  wrote:
> > Hi!
> >
> > The following patch implements P0386R1 - NSDMIs for bit-fields.
> 
> It's P0683R1.

Oops, got it right in the testcases, just not in the ChangeLog line
from which I've pasted the subject.  Fixed in my copy.

Jakub


[C++ PATCH] Some check-c++-all fixes

2017-09-19 Thread Jakub Jelinek
Hi!

In make check-c++-all I've noticed some UNSUPPORTED tests and some
failures, the following patch attempts to fix those.
The failures are due to new not emitting operator new (...) != NULL
comparison with -std=c++17 anymore, so there is nothing to fold away
(first 2 hunks).  The rest is about tests which are dg-do run, but
for -std=c++17 and above contain dg-error and thus the execution test
part is UNSUPPORTED.

Regtested with make check-c++-all on x86_64-linux and i686-linux, ok for
trunk?

2017-09-19  Jakub Jelinek  

* g++.dg/tree-ssa/pr31146-2.C: Only do scan-tree-dump for c++14_down.
* g++.dg/tree-ssa/pr41428.C: Likewise.
* g++.dg/expr/bool1.C: Only do dg-do compile instead of dg-do run for
c++17 and up.
* g++.dg/expr/bool3.C: Likewise.
* g++.dg/expr/bitfield5.C: Likewise.
* g++.old-deja/g++.jason/bool5.C: Likewise.

--- gcc/testsuite/g++.dg/tree-ssa/pr41428.C.jj  2015-05-29 15:04:33.0 
+0200
+++ gcc/testsuite/g++.dg/tree-ssa/pr41428.C 2017-09-19 14:23:13.366127516 
+0200
@@ -11,4 +11,6 @@ int foo(void)
   return *(int *)&f;
 }
 
-/* { dg-final { scan-tree-dump "Folded into: if \\\(1 != 0\\\)" "ccp1" } } */
+/* -std=c++17 and above doesn't emit operator new () != NULL, so there is
+   nothing to fold anymore.  */
+/* { dg-final { scan-tree-dump "Folded into: if \\\(1 != 0\\\)" "ccp1" { 
target c++14_down } } } */
--- gcc/testsuite/g++.dg/tree-ssa/pr31146-2.C.jj2016-09-21 
08:54:09.0 +0200
+++ gcc/testsuite/g++.dg/tree-ssa/pr31146-2.C   2017-09-19 14:23:41.675773292 
+0200
@@ -20,4 +20,6 @@ double foo (void)
   return v.a[2];
 }
 
-/* { dg-final { scan-tree-dump "Replaced .* != 0B. with .1" "forwprop1" } } */
+/* -std=c++17 and above doesn't emit operator new () != NULL, so there is
+   nothing to fold anymore.  */
+/* { dg-final { scan-tree-dump "Replaced .* != 0B. with .1" "forwprop1" { 
target c++14_down } } } */
--- gcc/testsuite/g++.dg/expr/bool1.C.jj2017-09-15 18:11:05.0 
+0200
+++ gcc/testsuite/g++.dg/expr/bool1.C   2017-09-19 14:16:46.139972669 +0200
@@ -1,5 +1,6 @@
-// { dg-do run }
-// PR C++/29295
+// PR c++/29295
+// { dg-do run { target c++14_down } }
+// { dg-do compile { target c++17 } }
 // make sure that a typedef for a bool will have the
 //  the same results as a bool itself.
 
--- gcc/testsuite/g++.dg/expr/bool3.C.jj2017-09-15 18:11:05.0 
+0200
+++ gcc/testsuite/g++.dg/expr/bool3.C   2017-09-19 14:16:40.439044001 +0200
@@ -1,5 +1,6 @@
-// { dg-do run }
-// PR C++/29295
+// PR c++/29295
+// { dg-do run { target c++14_down } }
+// { dg-do compile { target c++17 } }
 // make sure that a typedef for a bool will have the
 //  the same results as a bool itself.
 
--- gcc/testsuite/g++.dg/expr/bitfield5.C.jj2017-09-15 18:11:05.0 
+0200
+++ gcc/testsuite/g++.dg/expr/bitfield5.C   2017-09-19 14:12:40.087051397 
+0200
@@ -1,5 +1,6 @@
 // PR c++/30274
-// { dg-do run }
+// { dg-do run { target c++14_down } }
+// { dg-do compile { target c++17 } }
 
 struct S {
   bool x : 4;
--- gcc/testsuite/g++.old-deja/g++.jason/bool5.C.jj 2017-09-15 
18:11:08.0 +0200
+++ gcc/testsuite/g++.old-deja/g++.jason/bool5.C2017-09-19 
14:17:19.708552643 +0200
@@ -1,4 +1,5 @@
-// { dg-do run  }
+// { dg-do run { target c++14_down } }
+// { dg-do compile { target c++17 } }
 int main ()
 {
   bool b = false;

Jakub


Re: [PATCH] cp: fix location comparison in member_name_cmp

2017-09-19 Thread Alexander Monakov
On Tue, 19 Sep 2017, Nathan Sidwell wrote:
> > > > After recent changes, the member_name_cmp qsort comparator can indicate
> > > > A < B < A (i.e. lacks anti-commutativity) for distinct TYPE_DECL nodes
> > > > that have the same source location.  If their order doesn't matter, the
> > > > comparator should return 0.
> > > 
> > > Where do we get such type decls?
> > 
> > The first instance that tripped qsort checking for me was in PCH generation.
> > I think by adding
> 
> What source example generates such type decls?  That you've encountered this
> suggests one of the invariants I was presuming is not true.

You can use the gcc_assert mentioned in the previous email on GCC
bootstrap/regtest to find examples.  For me, the following example breaks (no
command line flags needed, just bare 'cc1plus t.i'):

struct
{
  int a, b, c, d;
  union
{
  int e;
};
} s;

1436  gcc_assert (DECL_SOURCE_LOCATION (a) != DECL_SOURCE_LOCATION (b));
(gdb) call debug_tree(a)
 <type_decl for the anonymous union ("union::") inside 's',
  declared at t.i:5:5, with field 'e' at t.i:6:11>
$1 = 10
(gdb) call debug_tree(b)
 <type_decl for the same anonymous union ("union::"),
  also declared at t.i:5:5 -- a distinct DECL node, same source location>
$2 = 10



Re: Add support to trace comparison instructions and switch statements

2017-09-19 Thread Tamar Christina
Adding the list back in.

From: Tamar Christina
Sent: Tuesday, September 19, 2017 2:11 PM
To: Dmitry Vyukov; Kostya Serebryany
Cc: 吴潍浠(此彼); Jakub Jelinek; Jeff Law; wishwu007; Alexander Potapenko; 
andreyknvl; nd
Subject: Re: Add support to trace comparison instructions and switch statements

Hi Jakub,

The testcase seems to be broken on AArch64 (aarch64-none-elf) and -O3:

PASS: gcc.dg/sancov/cmp0.c   -O0 -g  (test for excess errors)
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_const_cmp1 \\(0, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_const_cmp2 \\(0, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_const_cmp4 \\(0, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_const_cmp8 \\(0, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_const_cmp4 \\(27, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_const_cmp4 \\(37, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_const_cmp4 \\(48, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_cmp8 \\(" 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_cmpf \\(" 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_cmpd \\(" 1
FAIL: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_const_cmp" 7
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_cmp" 3
FAIL: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized 
"__builtin___sanitizer_cov_trace_switch \\(" 2

it's fine at O1, O2 and O3 though. Should the test be running for O0?

Thanks,
Tamar

From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Kostya Serebryany via gcc-patches 
Sent: Tuesday, September 12, 2017 5:35 PM
To: Dmitry Vyukov
Cc: 吴潍浠(此彼); Jakub Jelinek; gcc-patches; Jeff Law; wishwu007; Alexander 
Potapenko; andreyknvl
Subject: Re: Add support to trace comparison instructions and switch statements

On Tue, Sep 12, 2017 at 7:32 AM, Dmitry Vyukov  wrote:
> On Thu, Sep 7, 2017 at 9:02 AM, 吴潍浠(此彼)  wrote:
>> Hi
>> The trace-div and trace-gep options seem to be used to evaluate the corpus
>> so as to trigger specific kinds of bugs.  And they don't have a strong effect on
>> coverage.

These are used for what I call data-flow-driven mutations.
If we see x/y we tried to drive the inputs toward y==1.
Similarly, when we see a[idx], we try to drive towards idx being large
or negative.
When combined with value profiling, these do affect "generalized"
coverage and thus the corpus size.
They don't directly affect the regular edge coverage.

>>
>> The trace-pc-guard is useful, but it may be much more complex than trace-pc.
>> I think the best benefit of trace-pc-guard is avoiding dealing with ASLR of
>> programs, especially programs with dlopen().  It seems hard to implement in the
>> Linux kernel.

One of the benefits of trace-pc-guard is that you have to deal with
consecutive integers, not PCs (that are indeed affected by ASLR).

>>
>> The inline-8bit-counters and pc-table options are too close to the implementation
>> of a fuzz tool and the trace-pc-guard option.
>>
>> I am interesting in "stack-depth" and "func".

stack-depth is fully independent of all others.

"func" is a modifier that tells to insert callbacks only at the function entry.
It applies to trace-pc, trace-pc-guard, inline-8bit-counters, and pc-table
Other modifiers are bb (basic blocks) and edge (split critical edges,
then apply instrumentation to all BBs)

>> If we trace user-defined function entry and exit,
>> we can calculate stack-depth and the stack level's difference from the
>> previously existing stack.
>> Adding __sanitizer_cov_trace_pc_{enter,exit} is easy, but it is not
>> standard in llvm.

__sanitizer_cov_trace_pc_enter would be equivalent to trace-pc,func

we don't have __sanitizer_cov_trace_pc_exit. It's not very useful for
fuzzing (afaict).
I had one or two requests to implement __sanitizer_cov_trace_pc_exit
but at the end I was able to convince the folks that they don't need
it.

With pc-table and trace-pc[-guard] we already can distinguish func
entry from other blocks.
Can probably add special handling for exit, if someone explains why
this is interesting (in a separate thread, perhaps).

Also, gcc already has -finstrument-functions
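
For context, the instrumentation only inserts the calls; the fuzzing runtime
supplies the callbacks.  A hedged sketch of the hooks discussed in this
thread, with signatures as documented for the SanitizerCoverage interface
(the exact set can vary across compiler versions):

  #include <cstdint>
  #include <cstdio>

  extern "C"
  {
    /* Comparison of 4-byte operands; the *_const_* variant passes the
       compile-time-constant operand first.  */
    void __sanitizer_cov_trace_cmp4 (uint32_t a, uint32_t b) { /* record */ }
    void __sanitizer_cov_trace_const_cmp4 (uint32_t a, uint32_t b) { /* record */ }

    /* Switch statements: cases[0] is the number of case values, cases[1]
       the operand bit width, cases[2..] the case values themselves.  */
    void __sanitizer_cov_trace_switch (uint64_t val, uint64_t *cases)
    {
      std::printf ("switch on %llu with %llu cases\n",
                   (unsigned long long) val, (unsigned long long) cases[0]);
    }

    /* trace-pc-guard: one counter slot per instrumented edge, so the
       runtime deals in consecutive small integers rather than
       ASLR-shifted PCs.  */
    void __sanitizer_cov_trace_pc_guard (uint32_t *guard) { ++*guard; }
  }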


>>
>> How do you think Dmitry ?
>>
>> Wish Wu
>>
>> --
>> From:Jakub Jelinek 
>> Time:2017 Sep 6 (Wed) 22:37
>> To:Wish Wu 
>> Cc:Dmitry Vy

Re: Add support to trace comparison instructions and switch statements

2017-09-19 Thread Martin Liška
On 09/19/2017 03:14 PM, Tamar Christina wrote:
> it's fine at O1, O2 and O3 though. Should the test be running for O0?

It's a known issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82183

Martin


Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV

2017-09-19 Thread Uros Bizjak
On Tue, Sep 19, 2017 at 2:13 PM, Rainer Orth
 wrote:
> Daniel Santos  writes:
>
>> On 09/17/2017 10:53 AM, Uros Bizjak wrote:
>>> OK.
>>>
>>> Thanks,
>>> Uros.
>>
>> Thanks. I should have posted this Friday when my tests finished, but
>> I'll be committing with one minor change so tests don't run on m32 or mx32:
>>
>> --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile } */
>> +/* { dg-do compile { target lp64 } } */
>>  /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */
>>  /* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */
>>  /* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/pr82196-2.c
>> b/gcc/testsuite/gcc.target/i386/pr82196-2.c
>> index 31705bee29b..8fe58411d5e 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr82196-2.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr82196-2.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile } */
>> +/* { dg-do compile { target lp64 } } */
>>  /* { dg-options "-mavx -mcall-ms2sysv-xlogues -O2" } */
>>  /* { dg-final { scan-assembler "call.*__avx_savms64_18" } } */
>>  /* { dg-final { scan-assembler "jmp.*__avx_resms64x_18" } } */
>>
>> Other than that, full regression tests pass.
>
> However, they do FAIL on 64-bit Solaris/x86:
>
> +FAIL: gcc.target/i386/pr82196-1.c (test for excess errors)
>
> Excess errors:
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/pr82196-1.c:14:1: 
> error: bp cannot be used in asm here
>
> +UNRESOLVED: gcc.target/i386/pr82196-1.c scan-assembler call.*__sse_savms64_18
> +UNRESOLVED: gcc.target/i386/pr82196-1.c scan-assembler jmp.*__sse_resms64x_18
> +FAIL: gcc.target/i386/pr82196-2.c (test for excess errors)
> +UNRESOLVED: gcc.target/i386/pr82196-2.c scan-assembler call.*__avx_savms64_18
> +UNRESOLVED: gcc.target/i386/pr82196-2.c scan-assembler jmp.*__avx_resms64x_18

Does the test compile with -fomit-frame-pointer?

Uros.


Re: [PATCH] haifa-sched: fix autopref_rank_for_schedule qsort comparator

2017-09-19 Thread Maxim Kuvyrkov
> On Sep 19, 2017, at 2:21 PM, Alexander Monakov  wrote:
> 
> Hello,
> 
> The autopref_rank_for_schedule qsort sub-comparator is not actually a proper
> comparator.  For instance, it lacks transitivity: if there are insns A, B, C
> such that B has AUTOPREF_MULTIPASS_DATA_IRRELEVANT status, but A and C
> compare such that C < A, we can have A == B == C < A according to this
> sub-comparator, and A < B < C < A if the following tie-breakers order
> A before B and B before C.
> 
> I'm not sure this heuristic fits well into the overall scheduler sorting,
> but if we want to just fix this comparison step, we should never skip
> calls to autopref_rank_data.
> 
> Bootstrapped and regtested on x86-64 with PARAM_SCHED_AUTOPREF_QUEUE_DEPTH
> set to '0' (otherwise this sub-comparator is unused).  OK to apply?

Thanks for investigating this.  Indeed there is a problem, but your patch looks 
wrong.  The "irrelevant" instructions can't be meaningfully compared with 
"relevant" ones, so calling autopref_rank_data on "irrelevant" instruction 
doesn't look right.

How about the following:
1. if both instructions are "irrelevant", then return "0".
2. if one instruction is "relevant" and another is "irrelevant", then 
"relevant" instruction is always greater (or lesser) than the non-relevant. 
3. if both instructions are "relevant", then call autopref_rank_data.

I don't have an immediate answer on whether "relevant" or "irrelevant" 
instructions should be pushed towards beginning of the ready list.  Possibly, 
we want to delay "relevant" instructions and push "irrelevant" towards 
beginning of ready list so that more "relevant" instructions can enter ready 
list as their dependencies are resolved from scheduling "irrelevant" 
instructions.

WDYT?
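
A minimal sketch of the three-case ordering above (illustrative only; whether
"relevant" insns should sort before or after "irrelevant" ones is exactly the
open question):

  /* rank_data stands in for autopref_rank_data (data1, data2).  */
  static int
  autopref_cmp_sketch (bool relevant1, bool relevant2, int rank_data)
  {
    if (!relevant1 && !relevant2)
      return 0;                    /* case 1: both irrelevant, a genuine tie */
    if (relevant1 != relevant2)
      return relevant1 ? -1 : +1;  /* case 2: mixed, fixed relative order */
    return rank_data;              /* case 3: both relevant, defer to data rank */
  }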

--
Maxim Kuvyrkov
www.linaro.org


> Thanks.
> Alexander
> 
>   * haifa-sched.c (autopref_rank_for_schedule): Always invoke
>   autopref_rank_data.
> 
> diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
> index af0ed27b18f..031a0af96db 100644
> --- a/gcc/haifa-sched.c
> +++ b/gcc/haifa-sched.c
> @@ -5707,7 +5707,8 @@ autopref_rank_data (autopref_multipass_data_t data1,
> static int
> autopref_rank_for_schedule (const rtx_insn *insn1, const rtx_insn *insn2)
> {
> -  for (int write = 0; write < 2; ++write)
> +  int r = 0;
> +  for (int write = 0; write < 2 && !r; ++write)
> {
>   autopref_multipass_data_t data1
>   = &INSN_AUTOPREF_MULTIPASS_DATA (insn1)[write];
> @@ -5716,21 +5717,14 @@ autopref_rank_for_schedule (const rtx_insn *insn1, 
> const rtx_insn *insn2)
> 
>   if (data1->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
>   autopref_multipass_init (insn1, write);
> -  if (data1->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
> - continue;
> 
>   if (data2->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
>   autopref_multipass_init (insn2, write);
> -  if (data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
> - continue;
> 
> -  if (!rtx_equal_p (data1->base, data2->base))
> - continue;
> -
> -  return autopref_rank_data (data1, data2);
> +  r = autopref_rank_data (data1, data2);
> }
> 
> -  return 0;
> +  return r;
> }
> 
> /* True if header of debug dump was printed.  */



Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV

2017-09-19 Thread Rainer Orth
Uros Bizjak  writes:

>> However, they do FAIL on 64-bit Solaris/x86:
>>
>> +FAIL: gcc.target/i386/pr82196-1.c (test for excess errors)
>>
>> Excess errors:
>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/pr82196-1.c:14:1:
>> error: bp cannot be used in asm here
>>
>> +UNRESOLVED: gcc.target/i386/pr82196-1.c scan-assembler 
>> call.*__sse_savms64_18
>> +UNRESOLVED: gcc.target/i386/pr82196-1.c scan-assembler 
>> jmp.*__sse_resms64x_18
>> +FAIL: gcc.target/i386/pr82196-2.c (test for excess errors)
>> +UNRESOLVED: gcc.target/i386/pr82196-2.c scan-assembler 
>> call.*__avx_savms64_18
>> +UNRESOLVED: gcc.target/i386/pr82196-2.c scan-assembler 
>> jmp.*__avx_resms64x_18
>
> Does the test compile with -fomit-frame-pointer?

It does indeed, and the patterns are present as expected.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling

2017-09-19 Thread Tsimbalist, Igor V
Here is an updated patch (version #2). The main differences are:

- Change attribute and option names;
- Add an additional type parameter to gimple_build_call_from_tree and use it
  for 'nocf_check' attribute propagation;
- Reimplement fixes in expand_call_stmt to propagate 'nocf_check' attribute;
- Consider 'nocf_check' attribute in Identical Code Folding (ICF) optimization;
- Add warning for type inconsistency regarding 'nocf_check' attribute;
- Many small fixes;

gcc/c-family/
* c-attribs.c (handle_nocf_check_attribute): New function.
(c_common_attribute_table): Add 'nocf_check' handling.
* c-common.c (check_missing_format_attribute): New function.
* c-common.h: Likewise.

gcc/c/
* c-typeck.c (convert_for_assignment): Add check for nocf_check
attribute.
* gimple-parser.c: Add second argument NULL to
gimple_build_call_from_tree.

gcc/cp/
* typeck.c (convert_for_assignment): Add check for nocf_check
attribute.

gcc/
* cfgexpand.c (expand_call_stmt): Set REG_CALL_NOCF_CHECK for
call insn.
* combine.c (distribute_notes): Add REG_CALL_NOCF_CHECK handling.
* common.opt: Add fcf-protection flag.
* emit-rtl.c (try_split): Add REG_CALL_NOCF_CHECK handling.
* flag-types.h: Add enum cf_protection_level.
* gimple.c (gimple_build_call_from_tree): Add second parameter.
Add 'nocf_check' attribute propagation to gimple call.
* gimple.h (gf_mask): Add GF_CALL_NOCF_CHECK.
(gimple_call_nocf_check_p): New function.
(gimple_call_set_nocf_check): Likewise.
* gimplify.c: Add second argument to gimple_build_call_from_tree.
* ipa-icf.c: Add nocf_check attribute in statement hash.
* recog.c (peep2_attempt): Add REG_CALL_NOCF_CHECK handling.
* reg-notes.def: Add REG_NOTE (CALL_NOCF_CHECK).
* toplev.c (process_options): Add flag_cf_protection handling.

Is it ok for trunk?

Thanks,
Igor


> -Original Message-
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Friday, September 15, 2017 2:14 PM
> To: Tsimbalist, Igor V 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> 
> On Fri, Sep 15, 2017 at 1:12 PM, Tsimbalist, Igor V
>  wrote:
> >> -Original Message-
> >> From: Tsimbalist, Igor V
> >> Sent: Tuesday, September 12, 2017 5:35 PM
> >> To: 'Richard Biener' 
> >> Cc: 'gcc-patches@gcc.gnu.org' ; Tsimbalist,
> >> Igor V 
> >> Subject: RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> >>
> >> > -Original Message-
> >> > From: Tsimbalist, Igor V
> >> > Sent: Friday, August 18, 2017 4:43 PM
> >> > To: 'Richard Biener' 
> >> > Cc: gcc-patches@gcc.gnu.org; Tsimbalist, Igor V
> >> > 
> >> > Subject: RE: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> >> >
> >> > > -Original Message-
> >> > > From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> > > Sent: Friday, August 18, 2017 3:53 PM
> >> > > To: Tsimbalist, Igor V 
> >> > > Cc: gcc-patches@gcc.gnu.org
> >> > > Subject: Re: 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> >> > >
> >> > > On Fri, Aug 18, 2017 at 3:11 PM, Tsimbalist, Igor V
> >> > >  wrote:
> >> > > >> -Original Message-
> >> > > >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> > > >> Sent: Tuesday, August 15, 2017 3:43 PM
> >> > > >> To: Tsimbalist, Igor V 
> >> > > >> Cc: gcc-patches@gcc.gnu.org
> >> > > >> Subject: Re:
> >> > > >> 0001-Part-1.-Add-generic-part-for-Intel-CET-enabling
> >> > > >>
> >> > > >> On Tue, Aug 1, 2017 at 10:56 AM, Tsimbalist, Igor V
> >> > > >>  wrote:
> >> > > >> > Part#1. Add generic part for Intel CET enabling.
> >> > > >> >
> >> > > >> > The spec is available at
> >> > > >> >
> >> > > >> > https://software.intel.com/sites/default/files/managed/4d/2a
> >> > > >> > /co nt ro l-f low-enforcement-technology-preview.pdf
> >
> > <..skipped..>
> >
> >> > > >> I think 'notrack' is somewhat unspecific of a name, what
> >> > > >> prevented you to use 'nocet'?
> >> > > >
> >> > > > Actually it's specific. The HW will have a prefix with exactly
> >> > > > this name and
> >> > > the same meaning. And I think, what is more important,
> 'track/notrack'
> >> > > gives better semantic for a user. CET is a name bound with Intel
> >> > > specific technology.
> >> > >
> >> > > But 'tracking' something is quite unspecific.  Tracking for what?
> >> > > 'no_verify_cf' (aka do not verify control flow) maybe?
> >> >
>> > The name just has to suggest the right semantics. 'no_verify_cf' is
>> > good; let's use it unless a different name appears.
>> I have renamed all newly introduced function and macro names to use
>> 'noverify_cf'. But I still keep the attribute name as 'notrack'.
>> Historically the attribute name follows the public CET specification,
>> which uses 'no-track prefix' wording. Is it ok to keep such an attribute name?
> >
> > 

Re: Add support to trace comparison instructions and switch statements

2017-09-19 Thread Tamar Christina
Ah thanks!

From: Martin Liška 
Sent: Tuesday, September 19, 2017 2:31 PM
To: Tamar Christina; Dmitry Vyukov; Kostya Serebryany
Cc: 吴潍浠(此彼); Jakub Jelinek; Jeff Law; wishwu007; Alexander Potapenko; 
andreyknvl; nd; GCC Patches
Subject: Re: Add support to trace comparison instructions and switch statements

On 09/19/2017 03:14 PM, Tamar Christina wrote:
> it's fine at O1, O2 and O3 though. Should the test be running for O0?

It's a known issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82183

Martin


RE: 0002-Part-2.-Document-finstrument-control-flow-and-notrack attribute

2017-09-19 Thread Tsimbalist, Igor V
Here is an updated patch (version #2). Mainly attribute and option  names were 
changed.

gcc/doc/
* extend.texi: Add 'nocf_check' documentation.
* gimple.texi: Add second parameter to gimple_build_call_from_tree.
* invoke.texi: Add -fcf-protection documentation.
* rtl.texi: Add REG_CALL_NOTRACK documentation.

Is it ok for trunk?

Thanks,
Igor


> -Original Message-
> From: Tsimbalist, Igor V
> Sent: Friday, September 15, 2017 5:14 PM
> To: 'Jeff Law' ; 'gcc-patches@gcc.gnu.org'  patc...@gcc.gnu.org>
> Cc: Tsimbalist, Igor V 
> Subject: RE: 0002-Part-2.-Document-finstrument-control-flow-and-notrack
> attribute
> 
> > -Original Message-
> > From: Jeff Law [mailto:l...@redhat.com]
> > Sent: Friday, August 25, 2017 10:59 PM
> > To: Tsimbalist, Igor V ; 'gcc-
> > patc...@gcc.gnu.org' 
> > Subject: Re:
> > 0002-Part-2.-Document-finstrument-control-flow-and-notrack
> > attribute
> >
> > On 08/01/2017 02:56 AM, Tsimbalist, Igor V wrote:
> > > Part#2. Document -finstrument-control-flow and notrack attribute.
> > >
> > >
> > > 0002-Part-2.-Document-finstrument-control-flow-and-notrac.patch
> > >
> > >
> > > From c3e45c80731672e74d638f787e80ba975279b9b9 Mon Sep 17 00:00:00
> > 2001
> > > From: Igor Tsimbalist 
> > > Date: Mon, 3 Jul 2017 17:12:49 +0300
> > > Subject: [PATCH 2/9] Part#2. Document -finstrument-control-flow and
> > > notrack  attribute.
> > >
> > > gcc/
> > >   * doc/extend.texi: Add 'notrack' documentation.
> > >   * doc/invoke.texi: Add -finstrument-control-flow documentation.
> > >   * doc/rtl.texi: Add REG_CALL_NOTRACK documentation.
> > > ---
> > >  gcc/doc/extend.texi | 52 
> > >  gcc/doc/invoke.texi | 22 ++
> > >  gcc/doc/rtl.texi    | 15 +++
> > >  3 files changed, 89 insertions(+)
> > >
> > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index
> > > 6934b4c..80de8a7 100644
> > > --- a/gcc/doc/extend.texi
> > > +++ b/gcc/doc/extend.texi
> > > @@ -5632,6 +5632,58 @@ Specify which floating-point unit to use.
> > > You must specify the  @code{target("fpmath=sse,387")} option as
> > > @code{target("fpmath=sse+387")} because the comma would separate
> > > different options.
> > > +
> > > +@item notrack
> > > +@cindex @code{notrack} function attribute The @code{notrack}
> > > +attribute on a function is used to inform the compiler that the
> > > +function's prolog should not be instrumented when compiled with the
> > > +@option{-finstrument-control-flow} option.  The compiler assumes
> > > +that the function's address is a valid target for a control-flow 
> > > transfer.
> > Is the default to instrument everything when -finstrument-control-flow
> > is enabled?  Or can we avoid instrumentation on a function that never
> > has its address taken (ie, it is only called via a call instruction?)
> The instrumentation is on by default, but for all platforms except x86 it
> does nothing, as the implementation is not supported.  For x86 the
> implementation is lightweight and just increases code size a bit due to the
> 'endbranch' instruction.
> 
> Given a function decl, is there information already available on whether the
> function's address was taken?  I plan to do what you suggested later as an
> optimization, especially for global functions where IPA is required.
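
For reference, a small example of the distinction discussed above (my own
illustration, not taken from the patch): only a function whose address is
taken can be reached through an indirect branch, so only that one strictly
needs to be a valid control-flow target.

static void f (void) { }                /* address never taken */
static void g (void) { }                /* address taken below */

void (*fp) (void) = g;

void
use (void)
{
  f ();    /* direct call; fine without instrumentation */
  fp ();   /* indirect call; the target must be a valid control-flow target */
}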
> 
> > > +
> > > +The @code{notrack} attribute on a type of pointer to function is
> > > +used to inform the compiler that a call through the pointer should
> > > +not be instrumented when compiled with the
> > > +@option{-finstrument-control-flow} option.  The compiler assumes
> > > +that the function's address from the pointer is a valid target for
> > > +a control-flow transfer.  A direct function call through a function
> > > +name is assumed as a save call thus direct calls will not be
> > > +instrumented by the compiler.
> > s/save/safe/
> >
> > FWIW, I think putting the attribute into in the type system is a good
> > thing :-)
> >
> > > +
> > > +The @code{notrack} attribute is applied to an object's type.  The
> > > +@code{notrack} attribute is transferred to a call instruction at the
> > > +GIMPLE and RTL translation phases.  The attribute is not propagated
> > > +through assignment, store and load.
> > > +
> > > +@smallexample
> > > +@{
> > > +void (*foo)(void) __attribute__(notrack);
> > > +void (*foo1)(void) __attribute__(notrack);
> > > +void (*foo2)(void);
> > > +
> > > +int
> > > +foo (void) /* The function's address is not tracked.  */
> > > +
> > > +  /* This call site is not tracked for
> > > + control-flow instrumentation.  */
> > > +  (*foo1)();
> > > +  foo1 = foo2;
> > > +  /* This call site is still not tracked for
> > > + control-flow instrumentation.  */
> > > +  (*foo1)();
> > > +
> > > +  /* This call site is tracked for
> > > + control-flow instrumentation.  */
> > > +  (*foo2)();
> > > +  foo2 = foo1;
> > > +  /* This call site is still tracked for
> > > + control-flow instrumentation.  */  (

Re: [PATCH] More early LTO dwarf2out bits

2017-09-19 Thread Jakub Jelinek
On Thu, Oct 13, 2016 at 09:16:07AM +0200, Richard Biener wrote:
> 
> This merges a few more bits guarding stuff with ! early_dwarf, mostly
> to avoid creating locations that involve addresses of decls early
> but also to avoid wasting work for BLOCK_NONLOCALIZED_VARs.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, gdb testsuite
> tested on the same arch (from the gdb 7.12 branch), applied to trunk.
> 
> Richard.
> 
> 2016-10-13  Richard Biener  
> 
>   * dwarf2out.c (tree_add_const_value_attribute): Do not try
>   rtl_for_decl_init during early phase.
>   (gen_variable_die): Do not create locations during early phase.
>   (gen_label_die): Likewise.
>   (decls_for_scope): Do not waste time handling BLOCK_NONLOCALIZED_VARs
>   twice.

As Martin reported, this has very undesirable effect on the size of DWARF
debug info.

The problem is that add_const_value_attribute has lots of smarts to handle
various kinds of constants, which the
  if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
  && initializer_constant_valid_p (init, type))
block doesn't implement; it relies on all the smarts to be done earlier.
One option is to duplicate that all on trees, another option is
to use the rtl_for_decl_init + add_const_value_attribute way
even for early_dwarf for constants, something like:

-  if (! early_dwarf)
+  if (! early_dwarf || CONSTANT_CLASS_P (init))
 {
   rtl = rtl_for_decl_init (init, type);
   if (rtl)
return add_const_value_attribute (die, rtl);
 }
   if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
   && initializer_constant_valid_p (init, type))

For the duplicate on trees, some cases might not be that hard, like
handling of INTEGER_CSTs: just see if they are negative and fit into shwi
(then add_AT_int), or fit into uhwi (then add_AT_unsigned); maybe that
would be enough for now.

> --- gcc/dwarf2out.c   (revision 241022)
> +++ gcc/dwarf2out.c   (working copy)
> @@ -17958,12 +17958,15 @@ tree_add_const_value_attribute (dw_die_r
>init = t;
>gcc_assert (!DECL_P (init));
>  
> -  rtl = rtl_for_decl_init (init, type);
> -  if (rtl)
> -return add_const_value_attribute (die, rtl);
> +  if (! early_dwarf)
> +{
> +  rtl = rtl_for_decl_init (init, type);
> +  if (rtl)
> + return add_const_value_attribute (die, rtl);
> +}
>/* If the host and target are sane, try harder.  */
> -  else if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
> -&& initializer_constant_valid_p (init, type))
> +  if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
> +  && initializer_constant_valid_p (init, type))
>  {
>HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (init));
>if (size > 0 && (int) size == size)

Jakub


RE: 0003-Part-3.-Add-tests-for-finstrument-control-flow-and-notrack attribute

2017-09-19 Thread Tsimbalist, Igor V
Here is an updated patch (version #2). Mainly attribute and option  names were 
changed.
The test for ICF will be introduced in the x86-specific tests (patch 0006-Part-6) 
as the implementation
checks whether the CF instrumentation is on to adjust a hash based on 'nocf_check' 
attribute presence.
In the generic part CF instrumentation is off, as no implementation exists.

The patch for x86 specific tests (patch 0006-Part-6) is being reviewed by Uros.

gcc/testsuite/
* c-c++-common/fcf-protection-1.c: New test.
* c-c++-common/fcf-protection-2.c: Likewise.
* c-c++-common/fcf-protection-3.c: Likewise.
* c-c++-common/fcf-protection-4.c: Likewise.
* c-c++-common/fcf-protection-5.c: Likewise.
* c-c++-common/attr-nocf-check-1.c: Likewise.
* c-c++-common/attr-nocf-check-2.c: Likewise.
* c-c++-common/attr-nocf-check-3.c: Likewise.

Is it ok for trunk?

Thanks,
Igor


> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Friday, August 25, 2017 11:01 PM
> To: Tsimbalist, Igor V ; 'gcc-
> patc...@gcc.gnu.org' 
> Subject: Re: 0003-Part-3.-Add-tests-for-finstrument-control-flow-and-
> notrack attribute
> 
> On 08/01/2017 02:56 AM, Tsimbalist, Igor V wrote:
> > Part#3. Add tests for -finstrument-control-flow and notrack attribute.
> >
> >
> > 0003-Part-3.-Add-tests-for-finstrument-control-flow-and-n.patch
> >
> >
> > From 7869de8a0c0ec55c4e9240c2483fefee97bf34c9 Mon Sep 17 00:00:00
> 2001
> > From: Igor Tsimbalist 
> > Date: Mon, 3 Jul 2017 17:29:08 +0300
> > Subject: [PATCH 3/9] Part#3. Add tests for -finstrument-control-flow
> > and  notrack attribute.
> >
> > gcc/testsuite/
> >
> > * c-c++-common/finstrument-control-flow.c: New test.
> > * c-c++-common/notrack-1.c: Likewise.
> > * c-c++-common/notrack-2.c: Likewise.
> No concerns with the existing tests.
> 
> We should consider an ICF test as I outlined in an earlier message.
> 
> We should also consider tests where we drop/add the notrack attribute as
> ISTM we ought to be getting warnings in those cases.
> 
> Finally, you should consider tests in gcc.target/i386 that verify we generate
> the proper instrumentation for a  few tests.
> 
> jeff



0003-Add-tests-for-fcf-protection-and-nocf_check-attribut.patch
Description: 0003-Add-tests-for-fcf-protection-and-nocf_check-attribut.patch


RE: [PATCH, i386] Enable option -mprefer-avx256 added for Intel AVX512 configuration

2017-09-19 Thread Shalnov, Sergey
Uros
Please find the new patch attached.
Previously, I thought about changing the integer part of the "ix86_preferred_simd_mode"
function in the same way as is done for the floating-point part.  I didn't do it,
so that the patch disturbs less code.
I completely agree with your proposal; the code now looks better.

In the case of "ix86_autovectorize_vector_sizes", I've made the changes in the
proposed way, but I kept the algorithm the same as it was in mainline.  Please note
that the function can return "0" in the original version.
I assume you just proposed the "style of changes" in the previous message.

If you would like to change the "ix86_autovectorize_vector_sizes" function
algorithmically, I would propose doing that in a separate patch.

Sergey

-Original Message-
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Monday, September 18, 2017 11:44 AM
To: Shalnov, Sergey 
Cc: gcc-patches@gcc.gnu.org; kirill.yuk...@gmail.com; Koval, Julia 
; Senkevich, Andrew 
Subject: Re: [PATCH, i386] Enable option -mprefer-avx256 added for Intel AVX512 
configuration

On Thu, Sep 14, 2017 at 2:10 PM, Shalnov, Sergey  
wrote:
> Hi,
> GCC has the option "mprefer-avx128" to use 128-bit AVX registers instead of 
> 256-bit AVX registers in the auto-vectorizer.
> This patch enables the command line option "mprefer-avx256" that reduces
> 512-bit register usage in "march=skylake-avx512" mode.
> This is the initial implementation of the option. Currently, 512-bit
> registers might still appear in some cases. I plan to continue fixing the
> cases where 512-bit registers appear.
> Sergey
>
> 2017-09-14  Sergey Shalnov  sergey.shal...@intel.com
> * config/i386/i386.opt (mprefer-avx256): New flag.
> * config/i386/i386.c (ix86_preferred_simd_mode): Prefer 256-bit AVX modes 
> when the flag -mprefer-avx256 is on.

Please rewrite integer mode handling in ix86_preferred_simd_mode to some 
consistent form, like:

case E_QImode:
  if (TARGET_AVX512BW && !TARGET_PREFER_AVX256)
return V64QImode;
  else if (TARGET_AVX && !TARGET_PREFER_AVX128)
return V32QImode;
  else
return V16QImode;

...

and ix86_autovectorize_vector_sizes to some more readable form, like:

static unsigned int
ix86_autovectorize_vector_sizes (void)
{
  unsigned int bytesizes = 16;

  if (TARGET_AVX && !TARGET_PREFER_AVX128)
bytesizes |= 32;
  if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
bytesizes |= 64;

  return bytesizes;
}

Uros.


0001-Option-mprefer-avx256-added-for-Intel-AVX512-configu.patch
Description: 0001-Option-mprefer-avx256-added-for-Intel-AVX512-configu.patch


Re: [PATCH] More early LTO dwarf2out bits

2017-09-19 Thread Richard Biener
On Tue, 19 Sep 2017, Jakub Jelinek wrote:

> On Thu, Oct 13, 2016 at 09:16:07AM +0200, Richard Biener wrote:
> > 
> > This merges a few more bits guarding stuff with ! early_dwarf, mostly
> > to avoid creating locations that involve addresses of decls early
> > but also to avoid wasting work for BLOCK_NONLOCALIZED_VARs.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, gdb testsuite
> > tested on the same arch (from the gdb 7.12 branch), applied to trunk.
> > 
> > Richard.
> > 
> > 2016-10-13  Richard Biener  
> > 
> > * dwarf2out.c (tree_add_const_value_attribute): Do not try
> > rtl_for_decl_init during early phase.
> > (gen_variable_die): Do not create locations during early phase.
> > (gen_label_die): Likewise.
> > (decls_for_scope): Do not waste time handling BLOCK_NONLOCALIZED_VARs
> > twice.
> 
> As Martin reported, this has very undesirable effect on the size of DWARF
> debug info.
> 
> The problem is that add_const_value_attribute has lots of smarts to handle
> various kinds of constants, which the
>   if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
>   && initializer_constant_valid_p (init, type))
> block doesn't implement; it relies on all the smarts to be done earlier.
> One option is to duplicate that all on trees, another option is
> to use the rtl_for_decl_init + add_const_value_attribute way
> even for early_dwarf for constants, something like:
> 
> -  if (! early_dwarf)
> +  if (! early_dwarf || CONSTANT_CLASS_P (init))
>  {
>rtl = rtl_for_decl_init (init, type);
>if (rtl)
>   return add_const_value_attribute (die, rtl);
>  }
>if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
>&& initializer_constant_valid_p (init, type))

If we are sure that rtl_for_decl_init will never create symbol references
for CONSTANT_CLASS_P (init) that's fine, but yes, in the end it would be
nice to not have overlapping rtl/tree handling and have all constants
handled in the tree path...

Richard.

> For the duplicate on trees, some cases might not be that hard, like
> handling of INTEGER_CSTs: just see if they are negative and fit into shwi
> (then add_AT_int), or fit into uhwi (then add_AT_unsigned); maybe that
> would be enough for now.
> 
> > --- gcc/dwarf2out.c (revision 241022)
> > +++ gcc/dwarf2out.c (working copy)
> > @@ -17958,12 +17958,15 @@ tree_add_const_value_attribute (dw_die_r
> >init = t;
> >gcc_assert (!DECL_P (init));
> >  
> > -  rtl = rtl_for_decl_init (init, type);
> > -  if (rtl)
> > -return add_const_value_attribute (die, rtl);
> > +  if (! early_dwarf)
> > +{
> > +  rtl = rtl_for_decl_init (init, type);
> > +  if (rtl)
> > +   return add_const_value_attribute (die, rtl);
> > +}
> >/* If the host and target are sane, try harder.  */
> > -  else if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
> > -  && initializer_constant_valid_p (init, type))
> > +  if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
> > +  && initializer_constant_valid_p (init, type))
> >  {
> >HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (init));
> >if (size > 0 && (int) size == size)
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] More early LTO dwarf2out bits

2017-09-19 Thread Jakub Jelinek
On Tue, Sep 19, 2017 at 04:09:20PM +0200, Richard Biener wrote:
> > The problem is that add_const_value_attribute has lots of smarts to handle
> > various kinds of constants, which the
> >   if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
> >   && initializer_constant_valid_p (init, type))
> > block doesn't implement; it relies on all the smarts to be done earlier.
> > One option is to duplicate that all on trees, another option is
> > to use the rtl_for_decl_init + add_const_value_attribute way
> > even for early_dwarf for constants, something like:
> > 
> > -  if (! early_dwarf)
> > +  if (! early_dwarf || CONSTANT_CLASS_P (init))
> >  {
> >rtl = rtl_for_decl_init (init, type);
> >if (rtl)
> > return add_const_value_attribute (die, rtl);
> >  }
> >if (CHAR_BIT == 8 && BITS_PER_UNIT == 8
> >&& initializer_constant_valid_p (init, type))
> 
> If we are sure that rtl_for_decl_init will never create symbol references
> for CONSTANT_CLASS_P (init) that's fine, but yes, in the end it would be
> nice to not have overlapping rtl/tree handling and have all constants
> handled in the tree path...

Will now try the following plus a testcase; the rest of the constants, I believe, end up
being DW_FORM_block encoded, which is pretty much what we emit even for the
initializer_constant_valid_p tree fallback case.

--- gcc/dwarf2out.c.jj  2017-09-15 18:11:03.0 +0200
+++ gcc/dwarf2out.c 2017-09-19 16:03:27.678337475 +0200
@@ -19440,6 +19440,19 @@ tree_add_const_value_attribute (dw_die_r
   init = t;
   gcc_assert (!DECL_P (init));
 
+  if (TREE_CODE (init) == INTEGER_CST)
+{
+  if (tree_fits_uhwi_p (init))
+   {
+ add_AT_unsigned (die, DW_AT_const_value, tree_to_uhwi (init));
+  return true;
+   }
+  if (tree_fits_shwi_p (init))
+   {
+ add_AT_int (die, DW_AT_const_value, tree_to_shwi (init));
+ return true;
+   }
+}
   if (! early_dwarf)
 {
   rtl = rtl_for_decl_init (init, type);


Jakub


[PATCH] Fix PR81373

2017-09-19 Thread Richard Biener

The following forces scev analyzable SESE liveouts to be handled as
defs.  Otherwise we end up forgetting to code-generate them.  We
naturally expect SCEV-cprop to handle them but that pass has cost
cutoffs (and one can disable it).

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok?

Thanks,
Richard.

2017-09-19  Richard Biener  

PR tree-optimization/81373
* graphite-scop-detection.c (build_cross_bb_scalars_def):
Force SESE live-out defs to be handled even if they are
scev_analyzable_p.

* gcc.dg/graphite/pr81373.c: New testcase.

Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 252968)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -1717,17 +1717,20 @@ build_cross_bb_scalars_def (scop_p scop,
   if (!def || !is_gimple_reg (def))
 return;
 
-  /* Do not gather scalar variables that can be analyzed by SCEV as they can be
- generated out of the induction variables.  */
-  if (scev_analyzable_p (def, scop->scop_info->region))
-return;
+  bool scev_analyzable = scev_analyzable_p (def, scop->scop_info->region);
 
   gimple *use_stmt;
   imm_use_iterator imm_iter;
   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, def)
-if ((def_bb != gimple_bb (use_stmt) && !is_gimple_debug (use_stmt))
-   /* PHIs have their effect at "BBs" on the edges.  See PR79622.  */
-   || gimple_code (SSA_NAME_DEF_STMT (def)) == GIMPLE_PHI)
+/* Do not gather scalar variables that can be analyzed by SCEV as they can
+   be generated out of the induction variables.  */
+if ((! scev_analyzable
+/* But gather SESE liveouts as we otherwise fail to rewrite their
+   exit PHIs.  */
+|| ! bb_in_sese_p (gimple_bb (use_stmt), scop->scop_info->region))
+   && ((def_bb != gimple_bb (use_stmt) && !is_gimple_debug (use_stmt))
+   /* PHIs have their effect at "BBs" on the edges.  See PR79622.  */
+   || gimple_code (SSA_NAME_DEF_STMT (def)) == GIMPLE_PHI))
   {
writes->safe_push (def);
DEBUG_PRINT (dp << "Adding scalar write: ";
Index: gcc/testsuite/gcc.dg/graphite/pr81373.c
===
--- gcc/testsuite/gcc.dg/graphite/pr81373.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr81373.c (working copy)
@@ -0,0 +1,40 @@
+/* { dg-options "-fno-tree-scev-cprop -fgraphite-identity -O 
-fdump-tree-graphite-all" } */
+
+void bar (void);
+
+int toto()
+{
+  int i, j, k;
+  int a[101][100];
+  int b[100];
+
+  for (i = 1; i < 100; i++)
+{
+  for (j = 1; j < 100; j++)
+   for (k = 1; k < 100; k++)
+ a[j][k] = a[j+1][i-1] + 2;
+
+  b[i] = b[i-1] + 2;
+
+  bar ();
+
+  for (j = 1; j < 100; j++)
+   a[j][i] = a[j+1][i-1] + 2;
+
+  b[i] = b[i-1] + 2;
+
+  bar ();
+
+  for (j = 1; j < 100; j++)
+   a[j][i] = a[j+1][i-1] + 2;
+
+  b[i] = a[i-1][i] + 2;
+
+  for (j = 1; j < 100; j++)
+   a[j][i] = a[j+1][i-1] + 2;
+}
+
+  return a[3][5] + b[1];
+}
+
+/* { dg-final { scan-tree-dump-times "number of SCoPs: 2" 1 "graphite"} } */


Re: [PATCH] haifa-sched: fix autopref_rank_for_schedule qsort comparator

2017-09-19 Thread Alexander Monakov
On Tue, 19 Sep 2017, Maxim Kuvyrkov wrote:
> How about the following:
> 1. if both instructions are "irrelevant", then return "0".
> 2. if one instruction is "relevant" and another is "irrelevant", then
> "relevant" instruction is always greater (or lesser) than the non-relevant. 
> 3. if both instructions are "relevant", then call autopref_rank_data.

Sounds good, the following patch implements this (with 'relevant' greater).

> I don't have immediate answer on whether "relevant" or "irrelevant"
> instructions should be pushed towards beginning of the ready list.  Possibly,
> we want to delay "relevant" instructions and push "irrelevant" towards
> beginning of ready list so that more "relevant" instructions can enter ready
> list as their dependencies are resolved from scheduling "irrelevant"
> instructions.
> 
> WDYT?

*nod*, this makes sense.

(bootstrap/regtest running on x86-64)

Thanks!

* haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns
first, always call autopref_rank_data otherwise.

diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index af0ed27b18f..8f006de2319 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -5707,7 +5707,8 @@ autopref_rank_data (autopref_multipass_data_t data1,
 static int
 autopref_rank_for_schedule (const rtx_insn *insn1, const rtx_insn *insn2)
 {
-  for (int write = 0; write < 2; ++write)
+  int r = 0;
+  for (int write = 0; write < 2 && !r; ++write)
 {
   autopref_multipass_data_t data1
= &INSN_AUTOPREF_MULTIPASS_DATA (insn1)[write];
@@ -5716,21 +5717,20 @@ autopref_rank_for_schedule (const rtx_insn *insn1, 
const rtx_insn *insn2)
 
   if (data1->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
autopref_multipass_init (insn1, write);
-  if (data1->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
-   continue;
 
   if (data2->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
autopref_multipass_init (insn2, write);
-  if (data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
-   continue;
 
-  if (!rtx_equal_p (data1->base, data2->base))
-   continue;
+  int irrel1 = data1->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT;
+  int irrel2 = data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT;
 
-  return autopref_rank_data (data1, data2);
+  if (!irrel1 && !irrel2)
+   r = autopref_rank_data (data1, data2);
+  else
+   r = irrel2 - irrel1;
 }
 
-  return 0;
+  return r;
 }
 
 /* True if header of debug dump was printed.  */


[PATCH] PR libstdc++/82254 fix std::is_nothrow_invocable_r w.r.t throwing conversions

2017-09-19 Thread Jonathan Wakely

I forgot to make is_nothrow_invocable_r check whether the conversion
to the result type can throw. Fixed like so.
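
For illustration, a minimal example of the intended behaviour (my own
standalone sketch, mirroring the types added to the testcase below): T's
converting constructor may throw, so converting the result to T must make
the trait false.

#include <type_traits>

struct T  { T (int) { } };             // conversion from int may throw
struct NT { NT (int) noexcept { } };   // conversion from int cannot throw

int f () noexcept { return 0; }

static_assert(  std::is_nothrow_invocable_r_v<NT, decltype(&f)>, "");
static_assert( !std::is_nothrow_invocable_r_v<T,  decltype(&f)>,
               "conversion to the result type can throw");

int main () { }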

Tested powerpc64le-linux, committed to trunk. gcc-7 backport to
follow.


commit 067cdd4fba31f82316f71954335b622e3b6da58a
Author: Jonathan Wakely 
Date:   Tue Sep 19 14:59:33 2017 +0100

PR libstdc++/82254 fix std::is_nothrow_invocable_r w.r.t throwing 
conversions

PR libstdc++/82254
* include/std/type_traits (__is_invocable): Add partial 
specialization
for INVOKE case and remove is_void check from partial
specialization for INVOKE case.
(__is_nt_invocable_impl): New helper for is_nothrow_invocable_r.
(is_nothrow_invocable_r): Use __is_nt_invocable_impl.
* testsuite/20_util/is_nothrow_invocable/value.cc: Add tests for
conversions that can throw or fail to convert. Use static assert
strings to explain negative results.
* testsuite/20_util/is_nothrow_invocable/value_ext.cc: Use
is_nothrow_constructible in is_nt_invocable_conv.

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 15b0d92bcb6..036f7667bd8 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2592,7 +2592,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 struct __is_invocable_impl<_Result, _Ret, __void_t>
-: __or_, is_convertible>::type
+: is_convertible::type
+{ };
+
+  template
+struct __is_invocable_impl<_Result, void, __void_t>
+: true_type
 { };
 
   template
@@ -2691,10 +2696,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  __call_is_nothrow_<_Fn, _ArgTypes...>>::type
 { };
 
+  template
+struct __is_nt_invocable_impl : false_type { };
+
+  template
+struct __is_nt_invocable_impl<_Result, _Ret,
+ __void_t>
+: __and_,
+is_nothrow_constructible<_Ret, typename _Result::type>>
+{ };
+
+  template
+struct __is_nt_invocable_impl<_Result, void,
+ __void_t>
+: true_type
+{ };
+
   /// std::is_nothrow_invocable_r
   template
 struct is_nothrow_invocable_r
-: __and_<__is_invocable_impl<__invoke_result<_Fn, _ArgTypes...>, _Ret>,
+: __and_<__is_nt_invocable_impl<__invoke_result<_Fn, _ArgTypes...>, _Ret>,
  __call_is_nothrow_<_Fn, _ArgTypes...>>::type
 { };
 
diff --git a/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/value.cc 
b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/value.cc
index 4ccb459e0f5..dfa76aae61c 100644
--- a/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/value.cc
+++ b/libstdc++-v3/testsuite/20_util/is_nothrow_invocable/value.cc
@@ -40,6 +40,10 @@ template
 
 void test01()
 {
+  struct T { T(int) { } };
+  struct NT { NT(int) noexcept { } };
+  struct Ex { explicit Ex(int) noexcept { } };
+
   using func_type = void(*)();
   static_assert( ! is_nt_invocable< func_type>(), "");
 
@@ -55,28 +59,46 @@ void test01()
   static_assert( ! is_nt_invocable< mem_type, int >(),   "");
   static_assert( ! is_nt_invocable< mem_type, int& >(),"");
 
-  static_assert(   is_nt_invocable< mem_type, X& >(),   "");
-  static_assert(   is_nt_invocable_r< int,  mem_type, X& >(), "");
-  static_assert(   is_nt_invocable_r< int&, mem_type, X& >(), "");
-  static_assert(   is_nt_invocable_r< long, mem_type, X& >(), "");
-  static_assert(   is_nt_invocable_r< int&, mem_type, X* >(), "");
+  static_assert(   is_nt_invocable< mem_type, X& >(),  "");
+  static_assert(   is_nt_invocable_r< int,   mem_type, X& >(), "");
+  static_assert(   is_nt_invocable_r< int&,  mem_type, X& >(), "");
+  static_assert(   is_nt_invocable_r< long,  mem_type, X& >(), "");
+  static_assert( ! is_nt_invocable_r< long&, mem_type, X& >(),
+  "conversion fails, cannot bind long& to int");
+  static_assert(   is_nt_invocable_r< int&,  mem_type, X* >(), "");
+
+  static_assert( ! is_nt_invocable_r< T,  mem_type, X& >(),
+  "conversion throws");
+  static_assert(   is_nt_invocable_r< NT, mem_type, X& >(), "");
+  static_assert( ! is_nt_invocable_r< Ex, mem_type, X& >(),
+  "conversion fails, would use explicit constructor");
 
   using memfun_type = int (X::*)();
 
-  static_assert( ! is_nt_invocable< memfun_type >(), "");
-  static_assert( ! is_nt_invocable< memfun_type, int >(),  "");
-  static_assert( ! is_nt_invocable< memfun_type, int& >(), "");
-  static_assert( ! is_nt_invocable< memfun_type, X& >(),   "");
-  static_assert( ! is_nt_invocable< memfun_type, X* >(),   "");
+  static_assert( ! is_nt_invocable< memfun_type >(),   "no object");
+  static_assert( ! is_nt_invocable< memfun_type, int >(),  "no object");
+  static_assert( ! is_nt_invocable< memfun_type, int& >(), "no object");
+  static_assert( ! is_nt_invocable< memfun_type, X& >(),   "call throws");
+  static_assert( !

Re: [libstdc++/71500] make back reference work with icase

2017-09-19 Thread Jonathan Wakely

On 18/09/17 16:54 -0700, Tim Shen wrote:

On Mon, Sep 18, 2017 at 4:01 PM, Jonathan Wakely  wrote:

On 18/09/17 10:58 -0700, Tim Shen via libstdc++ wrote:


On Mon, Sep 18, 2017 at 10:26 AM, Jonathan Wakely 
wrote:


We need to rewrite this to check the lengths are equal first, and then
call the 3-argument version of std::equal.

Alternatively, we could move the implementation of the C++14
std::equal overloads to __equal and make that available for C++11.
I'll try that.




Here's a proof of concept patch for that. It's a bit ugly.



Instead of having iterator tags in the interface, we can probe the
random-access-ness inside __equal4/__equal4_p, can't we? It's similar
to the existing "if (_RAIters()) { ... }".

I'd expect the patches to be renaming the current implementations and
adding wrappers, instead of adding new implementations.



Well I decided to split the existing functions up and use tag
dispatching, which is conceptually cleaner anyway. But as the
RandomAccessIterator version doesn't need any operations that aren't
valid for other categories, it's not strictly necessary. The tag
dispatching version should generate slightly smaller code for
unoptimized builds, but that's not very important.
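
For readers following along, here is a generic sketch of the tag-dispatching
shape being discussed (my own illustration, not the actual libstdc++ code;
the name sketch_equal is made up):

#include <iterator>

template<typename It1, typename It2>
bool sketch_equal (It1 first1, It1 last1, It2 first2, It2 last2,
                   std::random_access_iterator_tag,
                   std::random_access_iterator_tag)
{
  /* Random access: the lengths can be compared up front.  */
  if (last1 - first1 != last2 - first2)
    return false;
  for (; first1 != last1; ++first1, ++first2)
    if (!(*first1 == *first2))
      return false;
  return true;
}

template<typename It1, typename It2>
bool sketch_equal (It1 first1, It1 last1, It2 first2, It2 last2,
                   std::input_iterator_tag, std::input_iterator_tag)
{
  /* Other categories: walk both ranges in lockstep.  */
  for (; first1 != last1 && first2 != last2; ++first1, ++first2)
    if (!(*first1 == *first2))
      return false;
  return first1 == last1 && first2 == last2;
}

template<typename It1, typename It2>
bool sketch_equal (It1 first1, It1 last1, It2 first2, It2 last2)
{
  typename std::iterator_traits<It1>::iterator_category cat1{};
  typename std::iterator_traits<It2>::iterator_category cat2{};
  return sketch_equal (first1, last1, first2, last2, cat1, cat2);
}

The alternative discussed below keeps a single helper and branches on the
category inside it; with optimization the dead branch is removed either way.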


Unoptimized builds don't inline small functions; therefore the first
patch generates two weak symbols, instead of the one generated by the second patch.


Two small functions that only do the necessary work, rather than one
large function that has a branch for RAIters even when it can never be
taken.


It's unclear to me how the number of symbols would penalize
performance/binary size.


People who care about performance or binary size should be optimizing,
and in that case the RAIters branch will be known at compile-time and
the dead code should get removed, and the wrapper functions inlined.


Here's the patch doing it as you suggest. We can't call the new
functions __equal because t hat name is already taken by a helper
struct, hence __equal4.

Do you prefer this version?


Yes, I prefer this version for readability reasons:
1) subjectively, less scattered code; and
2) ideally I want `if constexpr (...)`; the if version is closer to that.


Yes, we could add _GLIBCXX17_CONSTEXPR there, but I'm not sure it's
worth doing.

3) The calls to __equal4 in _Backref_matcher are simpler.


I agree that it's not a big difference. I just wanted to point out the
small difference. I'm fine with either version.


I'll commit the second version.


Thanks for the prototyping!


--
Regards,
Tim Shen


Re: [libstdc++/71500] make back reference work with icase

2017-09-19 Thread Jonathan Wakely
The failures that need to be fixed can be seen at 
https://gcc.gnu.org/ml/gcc-testresults/2017-09/msg01633.html





RE: 0004-Part-4.-Update-x86-backend-to-enable-Intel-CET

2017-09-19 Thread Tsimbalist, Igor V
Uros, thank you for the approval. Based on the approval of the first 3 patches 
(I've submitted them today), I need to adjust the option and attribute names. I 
will resubmit this patch once I have fixed them.

Thanks,
Igor


> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Monday, September 18, 2017 11:58 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Tsimbalist, Igor V ; Tsimbalist, Igor V
> 
> Subject: Re: 0004-Part-4.-Update-x86-backend-to-enable-Intel-CET
> 
> Hello!
> 
> > gcc/
> >
> > * common/config/i386/i386-common.c (OPTION_MASK_ISA_IBT_SET):
> New.
> > (OPTION_MASK_ISA_SHSTK_SET): Likewise.
> > (OPTION_MASK_ISA_IBT_UNSET): Likewise.
> > (OPTION_MASK_ISA_SHSTK_UNSET): Likewise.
> > (ix86_handle_option): Add -mibt, -mshstk, -mcet handling.
> > * config.gcc (extra_headers): Add cetintrin.h for x86 targets.
> > (extra_objs): Add cet.o for Linux/x86 targets.
> > (tmake_file): Add i386/t-cet for Linux/x86 targets.
> > * config/i386/cet.c: New file.
> > * config/i386/cetintrin.h: Likewise.
> > * config/i386/t-cet: Likewise.
> > * config/i386/cpuid.h (bit_SHSTK): New.
> > (bit_IBT): Likewise.
> > * config/i386/driver-i386.c (host_detect_local_cpu): Detect and pass
> > IBT and SHSTK bits.
> > * config/i386/i386-builtin-types.def
> > (VOID_FTYPE_UNSIGNED_PVOID): New.
> > (VOID_FTYPE_UINT64_PVOID): Likewise.
> > * config/i386/i386-builtin.def: Add CET intrinsics.
> > * config/i386/i386-c.c (ix86_target_macros_internal): Add
> > OPTION_MASK_ISA_IBT, OPTION_MASK_ISA_SHSTK handling.
> > * config/i386/i386-passes.def: Add pass_insert_endbranch pass.
> > * config/i386/i386-protos.h (make_pass_insert_endbranch): New
> > prototype.
> > * config/i386/i386.c (rest_of_insert_endbranch): New.
> > (pass_data_insert_endbranch): Likewise.
> > (pass_insert_endbranch): Likewise.
> > (make_pass_insert_endbranch): Likewise.
> > (ix86_notrack_prefixed_insn_p): Likewise.
> > (ix86_target_string): Add -mibt, -mshstk flags.
> > (ix86_option_override_internal): Add flag_instrument_control_flow
> > processing.
> > (ix86_valid_target_attribute_inner_p): Set OPT_mibt, OPT_mshstk.
> > (ix86_print_operand): Add 'notrack' prefix output.
> > (ix86_init_mmx_sse_builtins): Add CET intrinsics.
> > (ix86_expand_builtin): Expand CET intrinsics.
> > (x86_output_mi_thunk): Add 'endbranch' instruction.
> > * config/i386/i386.h (TARGET_IBT): New.
> > (TARGET_IBT_P): Likewise.
> > (TARGET_SHSTK): Likewise.
> > (TARGET_SHSTK_P): Likewise.
> > * config/i386/i386.md (unspecv): Add UNSPECV_NOP_RDSSP,
> > UNSPECV_INCSSP, UNSPECV_SAVEPREVSSP, UNSPECV_RSTORSSP,
> UNSPECV_WRSS,
> > UNSPECV_WRUSS, UNSPECV_SETSSBSY, UNSPECV_CLRSSBSY.
> > (builtin_setjmp_setup): New pattern.
> > (builtin_longjmp): Likewise.
> > (rdssp): Likewise.
> > (incssp): Likewise.
> > (saveprevssp): Likewise.
> > (rstorssp): Likewise.
> > (wrss): Likewise.
> > (wruss): Likewise.
> > (setssbsy): Likewise.
> > (clrssbsy): Likewise.
> > (nop_endbr): Likewise.
> > * config/i386/i386.opt: Add -mcet, -mibt, -mshstk and -mcet-switch
> > options.
> > * config/i386/immintrin.h: Include .
> > * config/i386/linux-common.h
> > (file_end_indicate_exec_stack_and_cet): New prototype.
> > (TARGET_ASM_FILE_END): New.
> 
> LGTM.
> 
> OK for mainline.
> 
> Thanks,
> Uros.


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 02/08 - V3

2017-09-19 Thread Jeff Law
On 09/19/2017 03:08 AM, Andreas Schwab wrote:
> On Sep 18 2017, Jeff Law  wrote:
> 
>> On 09/18/2017 10:09 AM, Andreas Schwab wrote:
>>> On Sep 18 2017, Jeff Law  wrote:
>>>
 Can you confirm if the probe was in the red zone vs the live areas on
 the stack?
>>>
>>> It overwrites a nearby variable.  sp + 8 happens to be the address of
>>> file_entries_new_size.
>>>
>>>0x000140e8 <+1172>:  mov r6, sp
>>>0x000140ec <+1176>:  add r3, r3, #7
>>>0x000140f0 <+1180>:  bic r3, r3, #7
>>>0x000140f4 <+1184>:  cmp r3, #4096   ; 0x1000
>>>0x000140f8 <+1188>:  bcc 0x14110 
>>>0x000140fc <+1192>:  sub r3, r3, #4096   ; 0x1000
>>>0x00014100 <+1196>:  sub sp, sp, #4096   ; 0x1000
>>>0x00014104 <+1200>:  cmp r3, #4096   ; 0x1000
>>>0x00014108 <+1204>:  str r0, [sp, #8]
>>>0x0001410c <+1208>:  bcs 0x140fc 
>>>0x00014110 <+1212>:  ldr r7, [r11, #-56] ; 0xffc8
>>>0x00014114 <+1216>:  sub sp, sp, r3
>>>0x00014118 <+1220>:  mov r1, #0
>>>0x0001411c <+1224>:  add r3, sp, #8
>>>0x00014120 <+1228>:  mov r0, r3
>>> => 0x00014124 <+1232>:  str r0, [sp, #8]
>>>
>>> Andreas.
>>>
>> Or better yet, include your .i and .s files in their entirety and the
>> gcc -v output
> 
> This was generated with a very simple and incomplete backport of
> -fstack-clash-protection to gcc7.  What I get with your full patches
> with gcc8 looks correct.
Ah.  FWIW, it looks almost as if you've got a sign wrong when probing
the residual alloca allocation.  The "8" is also strange as I'd expect
it to be "4" since words are 4 bytes on ARM.

I know some early bits had hardcoded "8" in some places, but I didn't
think those had ever been posted -- I may have given them to Matz for
him to play with.  Are you using those perhaps?

I expect to have a backport to gcc-7 now that things are unblocking on
the trunk.  You'd be welcome to work from those.  It might save you some
headaches.

jeff


Re: [PATCH] haifa-sched: fix autopref_rank_for_schedule qsort comparator

2017-09-19 Thread Maxim Kuvyrkov
> On Sep 19, 2017, at 5:25 PM, Alexander Monakov  wrote:
> 
> On Tue, 19 Sep 2017, Maxim Kuvyrkov wrote:
>> How about the following:
>> 1. if both instructions are "irrelevant", then return "0".
>> 2. if one instruction is "relevant" and another is "irrelevant", then
>> "relevant" instruction is always greater (or lesser) than the non-relevant. 
>> 3. if both instructions are "relevant", then call autopref_rank_data.
> 
> Sounds good, the following patch implements this (with 'relevant' greater).
> 
>> I don't have immediate answer on whether "relevant" or "irrelevant"
>> instructions should be pushed towards beginning of the ready list.  Possibly,
>> we want to delay "relevant" instructions and push "irrelevant" towards
>> beginning of ready list so that more "relevant" instructions can enter ready
>> list as their dependencies are resolved from scheduling "irrelevant"
>> instructions.
>> 
>> WDYT?
> 
> *nod*, this makes sense.
> 
> (bootstrap/regtest running on x86-64)
> 
> Thanks!
> 
>   * haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns
>   first, always call autopref_rank_data otherwise.
> 
> diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
> index af0ed27b18f..8f006de2319 100644
> --- a/gcc/haifa-sched.c
> +++ b/gcc/haifa-sched.c
> @@ -5707,7 +5707,8 @@ autopref_rank_data (autopref_multipass_data_t data1,
> static int
> autopref_rank_for_schedule (const rtx_insn *insn1, const rtx_insn *insn2)
> {
> -  for (int write = 0; write < 2; ++write)
> +  int r = 0;
> +  for (int write = 0; write < 2 && !r; ++write)
> {
>   autopref_multipass_data_t data1
>   = &INSN_AUTOPREF_MULTIPASS_DATA (insn1)[write];
> @@ -5716,21 +5717,20 @@ autopref_rank_for_schedule (const rtx_insn *insn1, 
> const rtx_insn *insn2)
> 
>   if (data1->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
>   autopref_multipass_init (insn1, write);
> -  if (data1->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
> - continue;
> 
>   if (data2->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
>   autopref_multipass_init (insn2, write);
> -  if (data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
> - continue;
> 
> -  if (!rtx_equal_p (data1->base, data2->base))
> - continue;
> +  int irrel1 = data1->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT;
> +  int irrel2 = data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT;
> 
> -  return autopref_rank_data (data1, data2);
> +  if (!irrel1 && !irrel2)
> + r = autopref_rank_data (data1, data2);
> +  else
> + r = irrel2 - irrel1;

I'd like to keep read/write processing balanced.  In the above "read" analysis 
has greater weight than "write" analysis.  Also, autopref_rank_data() should 
not be called if !rtx_equal_p (data1->base, data2->base).

How about the attached patch?

--
Maxim Kuvyrkov
www.linaro.org



autopref_rank-fix.patch
Description: Binary data



Re: [PATCH] detect incompatible aliases (PR c/81854)

2017-09-19 Thread Martin Sebor

On 09/18/2017 03:44 PM, Joseph Myers wrote:

On Mon, 18 Sep 2017, Martin Sebor wrote:


It's meant as an escape hatch.  It allows declaring compatibility
symbols, for example by the libstdc++ _GLIBCXX_3_4_SYMVER macro
defined in libstdc++-v3/src/c++98/compatibility.cc.  The macro is
used to declare compatibility functions of all sorts of incompatible
types.  The originally posted patch had libstdc++ disable the warning
for the file with the symbols but Jonathan preferred this solution.

It could perhaps be tightened up to detect some of the cases on your
list but I'm not sure it's worth the effort and added complexity.
Let me know if you feel differently (or have a different suggestion),
otherwise I will go ahead and commit the patch as is.


Please add a comment explaining this reasoning and commit the patch.


Done in r252976.

Thanks
Martin


RE: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation

2017-09-19 Thread Tsimbalist, Igor V
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> Sent: Monday, September 18, 2017 12:17 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Tsimbalist, Igor V ; Tsimbalist, Igor V
> 
> Subject: Re: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
> 
> Hello!
> 
> > gcc/testsuite/
> >
> > * g++.dg/cet-notrack-1.C: New test.
> > * gcc.target/i386/cet-intrin-1.c: Likewise.
> > * gcc.target/i386/cet-intrin-10.c: Likewise.
> > * gcc.target/i386/cet-intrin-2.c: Likewise.
> > * gcc.target/i386/cet-intrin-3.c: Likewise.
> > * gcc.target/i386/cet-intrin-4.c: Likewise.
> > * gcc.target/i386/cet-intrin-5.c: Likewise.
> > * gcc.target/i386/cet-intrin-6.c: Likewise.
> > * gcc.target/i386/cet-intrin-7.c: Likewise.
> > * gcc.target/i386/cet-intrin-8.c: Likewise.
> > * gcc.target/i386/cet-intrin-9.c: Likewise.
> > * gcc.target/i386/cet-label.c: Likewise.
> > * gcc.target/i386/cet-notrack-1a.c: Likewise.
> > * gcc.target/i386/cet-notrack-1b.c: Likewise.
> > * gcc.target/i386/cet-notrack-2a.c: Likewise.
> > * gcc.target/i386/cet-notrack-2b.c: Likewise.
> > * gcc.target/i386/cet-notrack-3.c: Likewise.
> > * gcc.target/i386/cet-notrack-4a.c: Likewise.
> > * gcc.target/i386/cet-notrack-4b.c: Likewise.
> > * gcc.target/i386/cet-notrack-5a.c: Likewise.
> > * gcc.target/i386/cet-notrack-5b.c: Likewise.
> > * gcc.target/i386/cet-notrack-6a.c: Likewise.
> > * gcc.target/i386/cet-notrack-6b.c: Likewise.
> > * gcc.target/i386/cet-notrack-7.c: Likewise.
> > * gcc.target/i386/cet-property-1.c: Likewise.
> > * gcc.target/i386/cet-property-2.c: Likewise.
> > * gcc.target/i386/cet-rdssp-1.c: Likewise.
> > * gcc.target/i386/cet-sjlj-1.c: Likewise.
> > * gcc.target/i386/cet-sjlj-2.c: Likewise.
> > * gcc.target/i386/cet-sjlj-3.c: Likewise.
> > * gcc.target/i386/cet-switch-1.c: Likewise.
> > * gcc.target/i386/cet-switch-2.c: Likewise.
> > * lib/target-supports.exp (check_effective_target_cet): New proc.
> 
> A couple of questions:
> 
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mcet" } */
> +/* { dg-final { scan-assembler-times "setssbsy" 2 } } */
> +
> +#include 
> +
> +void f1 (void)
> +{
> +  __builtin_ia32_setssbsy ();
> +}
> +
> +void f2 (void)
> +{
> +  _setssbsy ();
> +}
> 
> Is there a reason that both, __builtin and intrinsic versions are tested in a
> couple of places? The intrinsic version is just a wrapper for __builtin, so 
> IMO
> testing intrinsic version should be enough.
No strong reason, just to check that intrinsic names are recognized and 
processed correctly.
The implementation could change and the test will catch the inconsistency.  I would 
also assume
a user will use the intrinsics; that's why I added the intrinsic check.  Should I remove it?

> 
> diff --git a/gcc/testsuite/gcc.target/i386/cet-rdssp-1.c
> b/gcc/testsuite/gcc.target/i386/cet-rdssp-1.c
> new file mode 100644
> index 000..f9223a5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/cet-rdssp-1.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target cet } } */
> +/* { dg-options "-O2 -finstrument-control-flow -mcet" } */
> 
> The "target cet" directive just checks that CET instructions can be compiled.
> The test will (probably?) fail on targets with binutils that can compile CET
> instructions, but the target itself doesn't support CET. If this is the case, 
> then
> check header has to be introduced, so the test can be bypassed on targets
> without runtime support.
The test will not fail even if a target doesn't support CET, as the 'rdssp' 
instruction is a
NOP on such a target and further use of the CET instructions is bypassed.  In this 
case
the code

+  ssp = rdssp (ssp);

will keep ssp as 0.

Thanks,
Igor

> Uros.


Re: [PATCH] haifa-sched: fix autopref_rank_for_schedule qsort comparator

2017-09-19 Thread Alexander Monakov
> I'd like to keep read/write processing balanced.  In the above "read" analysis
> has greater weight than "write" analysis.  Also, autopref_rank_data() should
> not be called if !rtx_equal_p (data1->base, data2->base).

I'm afraid this doesn't work.  Consider insns A, B, C such that all
are relevant (so that irr1, irr2 stay 0), A and C have the same base and
compare such that C < A, and B has a different base; then, according
to this subcomparator, you can have A == B == C < A, and with the following
tiebreakers A < B < C < A.

In other words, if your comparison tries to subdivide the input set into equivalence
classes (here you want to introduce equivalence classes based on 'base'), then
you should introduce an ordering between the classes (i.e. for two items with a
different 'base', say which one is greater than the other).
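
To make the last point concrete, a small self-contained illustration (my own,
not a proposed patch): once elements fall into classes, the comparator has to
order the classes as well, e.g. by comparing the class key itself.

struct elem
{
  int base;    /* equivalence class key; stands in for the rtx base */
  int offset;  /* compared only within a class */
};

/* Total order: order by class first, then by the within-class key.
   Returning 0 for items from different classes (which is what skipping
   the comparison effectively does) is what breaks transitivity.  */
static int
cmp (const elem &a, const elem &b)
{
  if (a.base != b.base)
    return a.base < b.base ? -1 : 1;
  return (a.offset > b.offset) - (a.offset < b.offset);
}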

Alexander


Re: [PATCH] haifa-sched: fix autopref_rank_for_schedule qsort comparator

2017-09-19 Thread Maxim Kuvyrkov
> On Sep 19, 2017, at 6:20 PM, Alexander Monakov  wrote:
> 
>> I'd like to keep read/write processing balanced.  In the above "read" 
>> analysis
>> has greater weight than "write" analysis.  Also, autopref_rank_data() should
>> not be called if !rtx_equal_p (data1->base, data2->base).
> 
> I'm afraid this doesn't work.  Consider you have insns A, B, C such that all
> are relevant (so that irr1, irr2 stay 0), A and C have the same base and
> compare such that C < A, B has a different base and therefore according
> to this subcomparator you can have A == B == C < A and with following
> tiebreakers A < B < C < A.
> 
> In other words, if your comparison tries to subdivide input set into 
> equivalence
> classes (here you want to introduce equivalence classes based on 'base'), then
> you should introduce ordering between classes (i.e. for two items with a 
> different 'base' say if one is greater than the other).

Right.  Let me meditate for couple of days on this :-) .

--
Maxim Kuvyrkov
www.linaro.org



Re: [PATCH][2/2] early LTO debug, main part

2017-09-19 Thread Jakub Jelinek
On Fri, Aug 04, 2017 at 02:21:29PM +0200, Richard Biener wrote:
> ! /* Initialize the various sections and labels for dwarf output.  */
>   
>   static void
> ! init_sections_and_labels (void)

...

These changes broke DWARF-5 support.  E.g. in gcc-7 and before this change
there was:

> !   if (!dwarf_split_debug_info)
>   {
...
> !   debug_loc_section = get_section (dwarf_version >= 5
> !? DEBUG_LOCLISTS_SECTION
> !: DEBUG_LOC_SECTION,
> !SECTION_DEBUG, NULL);

the above which would use .debug_loclists section for debug_loc_section
instead of .debug_loc for -gdwarf-5, because that is what DWARF-5 requires
and the section content is ABI incompatible.  But current trunk does
> !   debug_loc_section = get_section (DEBUG_LOC_SECTION,
> !SECTION_DEBUG, NULL);
only, so the DWARF-5 content is emitted into .debug_loc section.

Or, there used to be:

> !   if (!dwarf_split_debug_info && !DWARF2_ASM_LINE_DEBUG_INFO)
> ! debug_line_str_section = get_section (DEBUG_LINE_STR_SECTION,
> !   DEBUG_STR_SECTION_FLAGS, NULL);

which has no replacement at all, debug_line_str_hash is NULL and
so !DWARF2_ASM_LINE_DEBUG_INFO compiler ICEs on pretty much all
-gdwarf-5, because it can't emit the strings into that section.

Do you have a rough date for when you wrote the changes to these functions (so that
I could diff the init_sections_and_labels changes between that date and
before your LTO debug changes, and find out what needs to be double checked
besides those two)?

I presume we'll need some name for a LTO .debug_loclists variant.

Also, seeing:

#ifndef DEBUG_LINE_SECTION
#define DEBUG_LINE_SECTION  ".debug_line"
#endif
#ifndef DEBUG_DWO_LINE_SECTION
#define DEBUG_DWO_LINE_SECTION ".debug_line.dwo"
#endif
#ifndef DEBUG_LTO_LINE_SECTION
#define DEBUG_LTO_LINE_SECTION ".gnu.debuglto_.debug_line.dwo"
#endif

that looks like a pasto; I'd expect the last one, since it doesn't have
DWO_ in the name, to be just .gnu.debuglto_.debug_line .

Jakub


Re: Enable no-exec stacks for more targets using the Linux kernel

2017-09-19 Thread Michael Eager

On 09/19/2017 12:17 AM, Andreas Schwab wrote:

On Sep 18 2017, Joseph Myers  wrote:


Building glibc for many different configurations and running the
compilation parts of the testsuite runs into failures of the
elf/check-execstack test for hppa, ia64 and microblaze.


ia64 is non-execstack by default, so it doesn't need any marking.  The
same is true for every architecture that doesn't override
elf_read_implies_exec, which includes microblaze and hppa.


This fails because those configurations are not generating
.note.GNU-stack sections to indicate that programs do not need an
executable stack.


This needs to be fixed in glibc.


The requirement that a null .note.GNU-stack section needs to be defined
to indicate that the default stack (i.e., non-executable) is used seems
backward.

I don't have any problem approving the MicroBlaze GCC changes, but, like
Andreas, I think that this is a glibc problem.

--
Michael Eagerea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


[PING 3] [PATCH 3/4] enhance overflow and truncation detection in strncpy and strncat (PR 81117)

2017-09-19 Thread Martin Sebor

Ping #3: https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00912.html

Thanks
Martin

On 08/28/2017 08:34 PM, Martin Sebor wrote:

Ping #2: https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00912.html

On 08/23/2017 01:46 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00912.html

Jeff, is this version good to commit or are there any other
changes you'd like to see?

Martin

On 08/14/2017 04:40 PM, Martin Sebor wrote:

On 08/10/2017 01:29 PM, Martin Sebor wrote:

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 016f68d..1aa9e22 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c

[ ... ]

+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+{
+  /* Return the constant size unless it's zero (that's a
zero-length
+ array likely at the end of a struct).  */
+  tree size = TYPE_SIZE_UNIT (type);
+  if (size && TREE_CODE (size) == INTEGER_CST
+  && !integer_zerop (size))
+return size;
+}

Q. Do we have a canonical test for the trailing array idiom?   In some
contexts isn't it size 1?  ISTM This test needs slight improvement.
Ideally we'd use some canonical test to detect the trailing array
idiom
rather than open-coding it here.  You might look at the array index
warnings in tree-vrp.c to see if it's got a canonical test you can
call
or factor and use.


You're right, there is an API for this (array_at_struct_end_p,
as Richard pointed out).  I didn't want to use it because it
treats any array at the end of a struct as a flexible array
member, but simple tests show that that's what -Wstringop-
overflow does now, and it wasn't my intention to tighten up
the checking under this change.  It surprises me that no tests
exposed this. Let me relax the check and think about proposing
to tighten it up separately.
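
For readers unfamiliar with the idiom in question, a small self-contained
example (my own illustration): a one-element trailing array whose real size
is only fixed at allocation time, which is why a purely size-based bound
check has to treat it specially.

#include <stdlib.h>
#include <string.h>

struct msg
{
  size_t len;
  char data[1];   /* trailing array, used like a flexible array member */
};

struct msg *
make_msg (const char *s)
{
  size_t n = strlen (s);
  struct msg *m = (struct msg *) malloc (sizeof (struct msg) + n);
  if (!m)
    return NULL;
  m->len = n;
  memcpy (m->data, s, n + 1);   /* intentionally writes past data[0] */
  return m;
}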


Done in the attached patch.  (I opened bug 81849 for the enhancement
to have -Wstringop-overflow diagnose overflows when writing to member
arrays bigger than 1 element even if they're last).

(I've left the handling for zero size in place because GCC allows
global arrays to be declared to have zero elements.)




@@ -3883,6 +3920,30 @@ expand_builtin_strncat (tree exp, rtx)
   return NULL_RTX;
 }

+/* Helper to check the sizes of sequences and the destination of calls
+   to __builtin_strncpy (DST, SRC, LEN) and __builtin___strncpy_chk.
+   Returns true on success (no overflow warning), false otherwise.  */
+
+static bool
+check_strncpy_sizes (tree exp, tree dst, tree src, tree len)
+{
+  tree dstsize = compute_objsize (dst, warn_stringop_overflow - 1);
+
+  if (!check_sizes (OPT_Wstringop_overflow_,
+                    exp, len, /*maxlen=*/NULL_TREE, src, dstsize))
+    return false;
+
+  if (!dstsize || TREE_CODE (len) != INTEGER_CST)
+    return true;
+
+  if (tree_int_cst_lt (dstsize, len))
+    warning_at (EXPR_LOCATION (exp), OPT_Wstringop_truncation,
+                "%K%qD specified bound %E exceeds destination size %E",
+                exp, get_callee_fndecl (exp), len, dstsize);
+
+  return true;

So in the case where you issue the warning, what should the return value
be?  According to the comment it should be false.  It looks like you got
the wrong return value for the tree_int_cst_lt (dstsize, len) test.


Corrected.  The return value is unused by the only caller so
there is no test to exercise it.


Done in the attached patch.


+/* A helper of handle_builtin_stxncpy.  Check to see if the specified
+   bound is a) equal to the size of the destination DST and if so, b)
+   if it's immediately followed by DST[LEN - 1] = '\0'.  If a) holds
+   and b) does not, warn.  Otherwise, do nothing.  Return true if
+   diagnostic has been issued.
+
+   The purpose is to diagnose calls to strncpy and stpncpy that do
+   not nul-terminate the copy while allowing for the idiom where
+   such a call is immediately followed by setting the last element
+   to nul, as in:
+     char a[32];
+     strncpy (a, s, sizeof a);
+     a[sizeof a - 1] = '\0';
+*/

So using gsi_next to find the next statement could make the heuristic
fail to find the a[sizeof a - 1] = '\0'; statement when debugging is
enabled.

gsi_next_nondebug would be better as it would skip over any debug
insns.
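
For instance, something along these lines (a sketch using the existing
gsi_* API, not code from the patch):

  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
  gsi_next_nondebug (&gsi);   /* skips any GIMPLE_DEBUG statements */
  gimple *next_stmt = gsi_end_p (gsi) ? NULL : gsi_stmt (gsi);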


Thanks.  I'll have to remember this.


I went with this simple approach for now since it worked for GDB.
If it turns out that there are important instances of this idiom
that rely on intervening statements the warning can be relaxed.


What might be even better would be to use the immediate uses of the
memory tag.  For your case there should be only one immediate use
and it
should point to the statement which NUL terminates the
destination.  Or
maybe that would be worse in that you only want to allow this
exception
when the statements are consecutive.
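
Something like this, say (a sketch only; the predicate is hypothetical):

  /* Walk the immediate uses of the call's virtual definition and
     look for the DST[LEN - 1] = '\0' store.  */
  tree vdef = gimple_vdef (stmt);
  bool terminated = false;
  gimple *use_stmt;
  imm_use_iterator iter;
  FOR_EACH_IMM_USE_STMT (use_stmt, iter, vdef)
    if (is_nul_terminating_store_p (use_stmt))  /* hypothetical helper */
      terminated = true;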



 /* Handle a memcpy-like ({mem{,p}cpy,__mem{,p}cpy_chk}) call.
    If strlen of the second argument is known and length of the third argument
    is that plus one, strlen of the first argument is the same after this
@@ -2512,6 +2789,19 @@ strlen_optimize_stmt (gimple_stmt_ite

Re: [PATCH 0/2] backport c++ header fixes to gcc-5-branch

2017-09-19 Thread Jack Howarth
On Mon, Sep 18, 2017 at 1:48 PM, Mike Stump  wrote:
> I was hoping an RM would approve this as it seems just a hair beyond a normal 
> darwin approval.  I'm fine with this, and it does help darwin.  Other ports 
> should not care.

Mike,
 Unless I am misreading this...

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81037#c12

wasn't that an approval from Richard Biener for applying the backport
way back on 9/14?
 Jack

>
> On Sep 18, 2017, at 10:31 AM, Jack Howarth  wrote:
>>
>> Pinging for the final gcc 5.5 release.
>>
>> On Mon, Aug 7, 2017 at 1:12 AM, Ryan Mounce  wrote:
>>> Fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81037
>>>
>>> Bootstrap now succeeds using Xcode 9 toolchain.
>>>
>>> Tested on macOS 10.13 beta, however same issue reported on macOS 10.12
>>> with Xcode 9.
>>>
>>> Ryan Mounce (2):
>>>  (header usage fix) remove unused system header includes
>>>  (header usage fix) include c++ headers in system.h
>>>
>>> gcc/ChangeLog                    | 29 +
>>> gcc/auto-profile.c               |  6 ++
>>> gcc/c/c-objc-common.c            |  2 --
>>> gcc/config/sh/sh.c               |  2 +-
>>> gcc/config/sh/sh_treg_combine.cc |  7 +++
>>> gcc/cp/error.c                   |  2 --
>>> gcc/diagnostic.c                 |  2 --
>>> gcc/fortran/error.c              |  2 --
>>> gcc/fortran/trans-common.c       |  2 +-
>>> gcc/genmatch.c                   |  1 -
>>> gcc/graphite-isl-ast-to-gimple.c |  2 +-
>>> gcc/ipa-icf-gimple.c             |  2 +-
>>> gcc/ipa-icf.c                    |  2 +-
>>> gcc/pretty-print.c               |  2 --
>>> gcc/system.h                     | 12 
>>> gcc/toplev.c                     |  2 --
>>> 16 files changed, 51 insertions(+), 26 deletions(-)
>>>
>>> --
>>> 2.13.2 (Apple Git-90)
>>>
>


Re: [PATCH] Fix PR81373

2017-09-19 Thread Sebastian Pop
On Tue, Sep 19, 2017 at 9:17 AM, Richard Biener  wrote:

>
> The following forces scev analyzable SESE liveouts to be handled as
> defs.  Otherwise we end up forgetting to code-generate them.  We
> naturally expect SCEV-cprop to handle them but that pass has cost
> cutoffs (and one can disable it).
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok?
>
> Thanks,
> Richard.
>
> 2017-09-19  Richard Biener  
>
> PR tree-optimization/81373
> * graphite-scop-detection.c (build_cross_bb_scalars_def):
> Force SESE live-out defs to be handled even if they are
> scev_analyzable_p.
>
> * gcc.dg/graphite/pr81373.c: New testcase.
>
>
Looks good to me.

Thanks,
Sebastian


[PATCH] Fix typo in simplify_binary_operation_1

2017-09-19 Thread Segher Boessenkool
I noticed this typo while working on PR77729.  Trivial, obvious, and
committed.


Segher


2017-09-17  Segher Boessenkool  

* simplify-rtx.c (simplify_binary_operation_1): Fix typo in comment.

---
 gcc/simplify-rtx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 97a0652..1b960b9 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -2665,7 +2665,7 @@ simplify_binary_operation_1 (enum rtx_code code, 
machine_mode mode,
  HOST_WIDE_INT c1 = INTVAL (XEXP (op0, 1));
  HOST_WIDE_INT c2 = INTVAL (trueop1);
 
- /* If (C1&C2) == C1, then (X&C1)|C2 becomes X.  */
+ /* If (C1&C2) == C1, then (X&C1)|C2 becomes C2.  */
  if ((c1 & c2) == c1
  && !side_effects_p (XEXP (op0, 0)))
return trueop1;
-- 
1.8.3.1



Re: [PATCH][GRAPHITE] Fix affine constraint generation

2017-09-19 Thread Sebastian Pop
On Tue, Sep 19, 2017 at 3:30 AM, Richard Biener  wrote:

>
> The following plugs some holes in extract_affine which failed
> to modulo-reduce signed values in conversions, failed to
> interpret pointer-plus offsets as signed (yeah, stupid GIMPLE IL...)
> and mishandled BIT_NOT_EXPR.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> Ok?
>
> Thanks,
> Richard.
>
> 2017-09-19  Richard Biener  
>
> * graphite-sese-to-poly.c (extract_affine): Properly handle
> POINTER_PLUS_EXPR, BIT_NOT_EXPR and conversion to signed.
>

This patch looks good to me.

Thanks,
Sebastian


Re: [PATCH][aarch64] Fix error calls in aarch64 code so they can be tranlated

2017-09-19 Thread Steve Ellcey
On Tue, 2017-09-19 at 09:50 +0200, Frédéric Marchal wrote:
> On lundi 18 septembre 2017 13 h 20 min 25 s CEST Martin Sebor wrote:
> > 
> > I haven't looked at all of them but from the few I have seen it
> > seems that rephrasing the messages along the following lines would
> > be a way to get around the translation issue and without increasing
> > the number of calls (though not without the conditional):
> > 
> >    error (is_pragma
> >   ? G_("missing name in %<#pragma target\(\"%s=\")%>")
> >   : G_("missing name in % attribute"),
> >   "arch");
> > 
> > The additional benefit of this approach is that it would also make
> > the quoting consistent with what seems to be the prevailing style
> > of these sorts of messages.  (It would be nice to eventually
> > converge on the same style/quoting and phrasing across all back
> > and front ends.)
> Indeed! That's even better as the message uses words the user sees in the 
> source code whatever his/her locale language is.
> 
> With your proposal, I know I must not translate "target" because it clearly 
> is 
> part of the programming language. With the former message, I would have 
> translated "target" as part of the human language message. Your approach is 
> clearly better.
> 
> Frederic

Are '%<...%>' described somewhere?  These aren't normal printf options
are they?  I don't think I have ever used them before.  I am also not
sure why you have '%s=' means vs. '%s'.  Is it even worth breaking the
word 'arch' out of the string vs. having something like:

error (is_pragma
? G_("missing name in %<#pragma target \"arch\"%>)
: G_("missing name in % attribute"));

Steve Ellcey
sell...@cavium.com


Re: [PATCH] Fix PR79622

2017-09-19 Thread Tamar Christina
-- sorry for the duplicate, forgot to post to list as well first time --

Hi Richard,

The testcase seems to fail on aarch64-none-elf with -O1 or -O2;

-O0, -Os and -O3 seem to work fine.

dc[0] ends up being 0 for the cases that fail.

Kind regards,
Tamar

From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Richard Biener 
Sent: Monday, September 18, 2017 8:31 AM
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Fix PR79622

The following patch fixes the other known wrong-code bug in GRAPHITE
which shows we're mishandling PHIs in not properly considering the
edge copies they represent as living outside of the black-box we're
analyzing.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Note the testcase still fails with ISL 0.16.1 but passes with 0.18
for me.  I'll update the version in download_prerequisites to 0.18.

Richard.

2017-09-18  Richard Biener  

PR tree-optimization/79622
* graphite-scop-detection.c (build_cross_bb_scalars_def): Properly
handle PHIs.
(build_cross_bb_scalars_use): Likewise.

* gcc.dg/graphite/pr79622.c: New testcase.

Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 252806)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -1744,7 +1744,9 @@ build_cross_bb_scalars_def (scop_p scop,
   gimple *use_stmt;
   imm_use_iterator imm_iter;
   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, def)
-if (def_bb != gimple_bb (use_stmt) && !is_gimple_debug (use_stmt))
+if ((def_bb != gimple_bb (use_stmt) && !is_gimple_debug (use_stmt))
+   /* PHIs have their effect at "BBs" on the edges.  See PR79622.  */
+   || gimple_code (SSA_NAME_DEF_STMT (def)) == GIMPLE_PHI)
   {
writes->safe_push (def);
DEBUG_PRINT (dp << "Adding scalar write: ";
@@ -1758,7 +1760,8 @@ build_cross_bb_scalars_def (scop_p scop,
   }
 }

-/* Record DEF if it is used in other bbs different than DEF_BB in the SCOP.  */
+/* Record USE if it is defined in other bbs different than USE_STMT
+   in the SCOP.  */

 static void
 build_cross_bb_scalars_use (scop_p scop, tree use, gimple *use_stmt,
@@ -1774,7 +1777,9 @@ build_cross_bb_scalars_use (scop_p scop,
 return;

   gimple *def_stmt = SSA_NAME_DEF_STMT (use);
-  if (gimple_bb (def_stmt) != gimple_bb (use_stmt))
+  if (gimple_bb (def_stmt) != gimple_bb (use_stmt)
+  /* PHIs have their effect at "BBs" on the edges.  See PR79622.  */
+  || gimple_code (def_stmt) == GIMPLE_PHI)
 {
   DEBUG_PRINT (dp << "Adding scalar read: ";
   print_generic_expr (dump_file, use);
Index: gcc/testsuite/gcc.dg/graphite/pr79622.c
===
--- gcc/testsuite/gcc.dg/graphite/pr79622.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr79622.c (working copy)
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -floop-nest-optimize" } */
+
+int bf;
+
+int
+main (void)
+{
+  int dc[5];
+
+  for (bf = 0; bf < 2; ++bf)
+{
+  int l9, g5 = -1;
+
+  for (l9 = 0; l9 < 5; ++l9)
+   {
+ dc[l9] = g5;
+ g5 = (dc[l9] > 0);
+   }
+}
+
+  if (dc[0] != -1)
+__builtin_abort ();
+
+  return 0;
+}


Re: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation

2017-09-19 Thread Uros Bizjak
On Tue, Sep 19, 2017 at 5:18 PM, Tsimbalist, Igor V
 wrote:
>> -Original Message-
>> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
>> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
>> Sent: Monday, September 18, 2017 12:17 PM
>> To: gcc-patches@gcc.gnu.org
>> Cc: Tsimbalist, Igor V ; Tsimbalist, Igor V
>> 
>> Subject: Re: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
>>
>> Hello!
>>
>> > gcc/testsuite/
>> >
>> > * g++.dg/cet-notrack-1.C: New test.
>> > * gcc.target/i386/cet-intrin-1.c: Likewise.
>> > * gcc.target/i386/cet-intrin-10.c: Likewise.
>> > * gcc.target/i386/cet-intrin-2.c: Likewise.
>> > * gcc.target/i386/cet-intrin-3.c: Likewise.
>> > * gcc.target/i386/cet-intrin-4.c: Likewise.
>> > * gcc.target/i386/cet-intrin-5.c: Likewise.
>> > * gcc.target/i386/cet-intrin-6.c: Likewise.
>> > * gcc.target/i386/cet-intrin-7.c: Likewise.
>> > * gcc.target/i386/cet-intrin-8.c: Likewise.
>> > * gcc.target/i386/cet-intrin-9.c: Likewise.
>> > * gcc.target/i386/cet-label.c: Likewise.
>> > * gcc.target/i386/cet-notrack-1a.c: Likewise.
>> > * gcc.target/i386/cet-notrack-1b.c: Likewise.
>> > * gcc.target/i386/cet-notrack-2a.c: Likewise.
>> > * gcc.target/i386/cet-notrack-2b.c: Likewise.
>> > * gcc.target/i386/cet-notrack-3.c: Likewise.
>> > * gcc.target/i386/cet-notrack-4a.c: Likewise.
>> > * gcc.target/i386/cet-notrack-4b.c: Likewise.
>> > * gcc.target/i386/cet-notrack-5a.c: Likewise.
>> > * gcc.target/i386/cet-notrack-5b.c: Likewise.
>> > * gcc.target/i386/cet-notrack-6a.c: Likewise.
>> > * gcc.target/i386/cet-notrack-6b.c: Likewise.
>> > * gcc.target/i386/cet-notrack-7.c: Likewise.
>> > * gcc.target/i386/cet-property-1.c: Likewise.
>> > * gcc.target/i386/cet-property-2.c: Likewise.
>> > * gcc.target/i386/cet-rdssp-1.c: Likewise.
>> > * gcc.target/i386/cet-sjlj-1.c: Likewise.
>> > * gcc.target/i386/cet-sjlj-2.c: Likewise.
>> > * gcc.target/i386/cet-sjlj-3.c: Likewise.
>> > * gcc.target/i386/cet-switch-1.c: Likewise.
>> > * gcc.target/i386/cet-switch-2.c: Likewise.
>> > * lib/target-supports.exp (check_effective_target_cet): New proc.
>>
>> A couple of questions:
>>
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mcet" } */
>> +/* { dg-final { scan-assembler-times "setssbsy" 2 } } */
>> +
>> +#include 
>> +
>> +void f1 (void)
>> +{
>> +  __builtin_ia32_setssbsy ();
>> +}
>> +
>> +void f2 (void)
>> +{
>> +  _setssbsy ();
>> +}
>>
>> Is there a reason that both, __builtin and intrinsic versions are tested in a
>> couple of places? The intrinsic version is just a wrapper for __builtin, so 
>> IMO
>> testing intrinsic version should be enough.
> No strong reason. Just to check that intrinsic names are recognized and 
> processed correctly.
> The implementation could change and the test will catch inconsistency. I 
> would also assume
> a user will use intrinsics that's why I add intrinsic check. Should I remove 
> it?

Actually, these __builtins are considered an implementation detail,
and their use should be discouraged. They are deliberately not
documented, and users should use the intrinsic headers instead. That said,
the builtins won't change without a reason, since Ada needs them.

It can happen that the test fails due to a change in the intrinsics, so I'd
recommend removing them.
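
I.e. something like this for the first example above (a sketch, assuming
the intrinsic is reachable via <immintrin.h>):

/* { dg-do compile } */
/* { dg-options "-O2 -mcet" } */
/* { dg-final { scan-assembler-times "setssbsy" 1 } } */

#include <immintrin.h>

void f2 (void)
{
  _setssbsy ();
}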

>> diff --git a/gcc/testsuite/gcc.target/i386/cet-rdssp-1.c
>> b/gcc/testsuite/gcc.target/i386/cet-rdssp-1.c
>> new file mode 100644
>> index 000..f9223a5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/cet-rdssp-1.c
>> @@ -0,0 +1,39 @@
>> +/* { dg-do run { target cet } } */
>> +/* { dg-options "-O2 -finstrument-control-flow -mcet" } */
>>
>> The "target cet" directive just checks that CET instructions can be compiled.
>> The test will (probably?) fail on targets with binutils that can compile CET
>> instructions, but the target itself doesn't support CET. If this is the 
>> case, then
>> check header has to be introduced, so the test can be bypassed on targets
>> without runtime support.
> The test will not fail even if a target doesn't support CET, as the 'rdssp'
> instruction is a NOP on such targets and further use of CET instructions is
> bypassed. In this case the code
>
> +  ssp = rdssp (ssp);
>
> will keep ssp as 0.

I assume that this is true also for other runtime tests, and this
clears my concern about runtime failures with updated binutils.

Uros.


Re: [PATCH, i386] Enable option -mprefer-avx256 added for Intel AVX512 configuration

2017-09-19 Thread Uros Bizjak
On Tue, Sep 19, 2017 at 4:00 PM, Shalnov, Sergey
 wrote:
> Uros
> Please, find the new patch attached.
> Previously, I thought about changing the integer part of the
> "ix86_preferred_simd_mode" function in the same way
> as is done for the floating point part.  I didn't do it so that the patch
> would disturb less code.
> I completely agree with your proposal; the code looks better now.
>
> In the case of "ix86_autovectorize_vector_sizes", I've made the changes in the
> proposed way, but I kept
> the algorithm the same as it was in mainline.  Please note the function can
> return "0" in the original version.
> I assume you just proposed the "style of changes" in your previous message.
>
> If you would like to change the "ix86_autovectorize_vector_sizes" function
> algorithmically, I would propose doing
> that in a separate patch.

In fact, I was just not careful enough when proposing new style.

The updated patch is OK for mainline.

Thanks,
Uros.


Re: [PATCH][aarch64] Fix error calls in aarch64 code so they can be tranlated

2017-09-19 Thread Martin Sebor

On 09/19/2017 09:54 AM, Steve Ellcey wrote:

On Tue, 2017-09-19 at 09:50 +0200, Frédéric Marchal wrote:

On lundi 18 septembre 2017 13 h 20 min 25 s CEST Martin Sebor wrote:


I haven't looked at all of them but from the few I have seen it
seems that rephrasing the messages along the following lines would
be a way to get around the translation issue and without increasing
the number of calls (though not without the conditional):

   error (is_pragma
  ? G_("missing name in %<#pragma target\(\"%s=\")%>")
  : G_("missing name in % attribute"),
  "arch");

The additional benefit of this approach is that it would also make
the quoting consistent with what seems to be the prevailing style
of these sorts of messages.  (It would be nice to eventually
converge on the same style/quoting and phrasing across all back
and front ends.)

Indeed! That's even better as the message uses words the user sees in the
source code whatever his/her locale language is.

With your proposal, I know I must not translate "target" because it clearly is
part of the programming language. With the former message, I would have
translated "target" as part of the human language message. Your approach is
clearly better.

Frederic


Are '%<...%>' described somewhere?  These aren't normal printf options
are they?  I don't think I have ever used them before.  I am also not
sure why you have '%s=' means vs. '%s'.  Is it even worth breaking the
word 'arch' out of the string vs. having something like:

error (is_pragma
? G_("missing name in %<#pragma target \"arch\"%>)
: G_("missing name in % attribute"));


%< and %> are (briefly) mentioned in pretty-print.c, just above
pp_format.  They should be used in preference to the apostrophe
to introduce quoting and color highlighting in diagnostics.
(There's also logic in the pretty printer to help detect
mismatches between these and other related directives, but none
for the apostrophe.)

I used %s= because it appears with some consistency in other
similar messages in gcc.pot, e.g.:

  % CPU can be used only for % attribute"

(I ran 'grep target gcc.pot | grep attr' to find them, along with
some of the inconsistencies among them.)

It doesn't look to me like parameterizing the message on the name
of the target attribute reduces the number of messages that need
to be translated, so it doesn't gain much (if anything) here.  It
does help in a few other places where I've been working recently
that deal with attributes, and I used it mostly out of habit.

Martin


Re: Enable no-exec stacks for more targets using the Linux kernel

2017-09-19 Thread Joseph Myers
On Tue, 19 Sep 2017, Andreas Schwab wrote:

> On Sep 18 2017, Joseph Myers  wrote:
> 
> > Building glibc for many different configurations and running the
> > compilation parts of the testsuite runs into failures of the
> > elf/check-execstack test for hppa, ia64 and microblaze.
> 
> ia64 is non-execstack by default, so it doesn't need any marking.  The
> same is true for every architecture that doesn't override
> elf_read_implies_exec, which includes microblaze and hppa.

Thanks for the explanation.

I've sent a glibc patch 
.  I think the 
key questions for architecture experts now are: on each of those three 
architectures, do trampolines ever require executable stacks, and, if they 
do, how does this work at present when the kernel defaults to 
non-executable and my understanding at 
 would be that 
glibc would only make thread stacks executable on those architectures, not 
the main process stacks, and GCC will never generate an explicit marker on 
those architectures to request an executable stack?
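
(The classic case needing an executable stack is a GNU C nested function
whose address escapes, which makes GCC materialize a trampoline on the
stack, e.g.:

  void call_it (void (*fn) (void)) { fn (); }

  void outer (int x)
  {
    void nested (void) { __builtin_printf ("%d\n", x); }
    call_it (nested);  /* address of nested function => stack trampoline */
  }

so the question is whether anything equivalent can ever happen on hppa,
ia64 or microblaze.)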

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [libstdc++/71500] make back reference work with icase

2017-09-19 Thread Jonathan Wakely

On 19/09/17 15:38 +0100, Jonathan Wakely wrote:

On 18/09/17 16:54 -0700, Tim Shen wrote:

On Mon, Sep 18, 2017 at 4:01 PM, Jonathan Wakely  wrote:

On 18/09/17 10:58 -0700, Tim Shen via libstdc++ wrote:


On Mon, Sep 18, 2017 at 10:26 AM, Jonathan Wakely 
wrote:


We need to rewrite this to check the lengths are equal first, and then
call the 3-argument version of std::equal.

Alternatively, we could move the implementation of the C++14
std::equal overloads to __equal and make that available for C++11.
I'll try that.
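
For the first approach, a sketch could look like this (hypothetical helper
name, not the committed code):

  template<typename _RAIter1, typename _RAIter2>
    inline bool
    __equal_ra (_RAIter1 __first1, _RAIter1 __last1,
                _RAIter2 __first2, _RAIter2 __last2)
    {
      // Random access iterators: compare lengths first, then reuse
      // the 3-argument std::equal, which is available in C++11.
      if (__last1 - __first1 != __last2 - __first2)
        return false;
      return std::equal (__first1, __last1, __first2);
    }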




Here's a proof of concept patch for that. It's a bit ugly.



Instead of having iterator tags in the interface, we can probe the
random-access-ness inside __equal4/__equal4_p, can't we? It's similar
to the existing "if (_RAIters()) { ... }".

I'd expect the patches to be renaming the current implementations and
adding wrappers, instead of adding new implementations.



Well I decided to split the existing functions up and use tag
dispatching, which is conceptually cleaner anyway. But as the
RandomAccessIterator version doesn't need any operations that aren't
valid for other categories, it's not strictly necessary. The tag
dispatching version should generate slightly smaller code for
unoptimized builds, but that's not very important.


Unoptimized builds don't inline small functions, therefore the first
patch generates two weak symbols, instead of the one from the second patch.


Two small functions that only do the necessary work, rather than one
large function that has a branch for RAIters even when it can never be
taken.


It's unclear to me how the number of symbols would penalize
performance/binary size.


People who care about performance or binary size should be optimizing,
and in that case the RAIters branch will be known at compile-time and
the dead code should get removed, and the wrapper functions inlined.


Here's the patch doing it as you suggest. We can't call the new
functions __equal because that name is already taken by a helper
struct, hence __equal4.

Do you prefer this version?


Yes, I prefer this version for readability reasons:
1) subjectively, less scattered code; and
2) ideally I want `if constexpr (...)`; the if version is closer.


Yes, we could add _GLIBCXX17_CONSTEXPR there, but I'm not sure it's
worth doing.

3) The calls to __equal4 in _Backref_matcher are simpler.


I agree that it's not a big difference. I just wanted to point out the
small difference. I'm fine with either version.


I'll commit the second version.


Here's what I've committed, with a minimal test to catch this
happening in future.

I'll re-run the full set of test variations.


commit 371c5de025c0fc95420d96bf96f3da84e3725c9d
Author: Jonathan Wakely 
Date:   Tue Sep 19 17:36:52 2017 +0100

PR libstdc++/71500 restore C++11 compatibility in <regex>

PR libstdc++/71500
* include/bits/regex_executor.tcc
(_Backref_matcher>::_M_apply): Use
std::__equal4 instead of C++14 4-iterator overloads of std::equal.
* include/bits/stl_algobase.h (__equal4): New functions implementing
4-iterator overloads of std::equal for use in C++11.
(equal(It1, It1, It2, It2), equal(It1, It1, It2, It2, BinaryPred)):
Move function bodies to new __equal4 functions.
* testsuite/28_regex/simple_c++11.cc: New.

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index f6149fecf9d..2ceba35e7b8 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -366,17 +366,17 @@ namespace __detail
 	   _BiIter __actual_end)
   {
 	if (!_M_icase)
-	  return std::equal(__expected_begin, __expected_end,
-			__actual_begin, __actual_end);
+	  return std::__equal4(__expected_begin, __expected_end,
+			   __actual_begin, __actual_end);
 	typedef std::ctype<_CharT> __ctype_type;
 	const auto& __fctyp = use_facet<__ctype_type>(_M_traits.getloc());
-	return std::equal(__expected_begin, __expected_end,
-			  __actual_begin, __actual_end,
-			  [this, &__fctyp](_CharT __lhs, _CharT __rhs)
-			  {
-			return __fctyp.tolower(__lhs)
-== __fctyp.tolower(__rhs);
-			  });
+	return std::__equal4(__expected_begin, __expected_end,
+			 __actual_begin, __actual_end,
+			 [this, &__fctyp](_CharT __lhs, _CharT __rhs)
+			 {
+			   return __fctyp.tolower(__lhs)
+ == __fctyp.tolower(__rhs);
+			 });
   }
 
   bool _M_icase;
diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index f68ecb22b82..a80934c4faa 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1082,6 +1082,60 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   return true;
 }
 
+#if __cplusplus >= 201103L
+  // 4-iterator version of std::equal for use in C++11.
+  template<typename _II1, typename _II2>
+inline bool
+__equal4(_II1 __first1, _II1 __last1, _II2 __first2,

[PATCH] Fix PR82255 (vectorizer cost model overcounts some vector load costs)

2017-09-19 Thread Bill Schmidt
Hi,

https://gcc.gnu.org/PR82255 identifies a problem in the vector cost model
where a vectorized load is treated as having the cost of a strided load
in a case where we will not actually generate a strided load.  This is
simply a mismatch between the conditions tested in the cost model and
those tested in the code that generates vectorized instructions.  This
patch fixes the problem by recognizing when only a single non-strided
load will be generated and reporting the cost accordingly.

I believe this patch is sufficient to catch all such cases, but I admit
that the code in vectorizable_load is complex enough that I could have
missed a trick.

I've added a test in the PowerPC cost model subdirectory.  Even though
this isn't a target-specific issue, the test does rely on a 16-byte 
vector size, so this seems safest.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this ok for trunk?

Thanks!
Bill


[gcc]

2017-09-19  Bill Schmidt  

PR tree-optimization/82255
* tree-vect-stmts.c (vect_model_load_cost): Don't count
vec_construct cost when a true strided load isn't present.

[gcc/testsuite]

2017-09-19  Bill Schmidt  

PR tree-optimization/82255
* gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c: New file.


Index: gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
===
--- gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c (working copy)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+/* PR82255: Ensure we don't require a vec_construct cost when we aren't
+   going to generate a strided load.  */
+
+extern int abs (int __x) __attribute__ ((__nothrow__, __leaf__)) __attribute__ ((__const__));
+
+static int
+foo (unsigned char *w, int i, unsigned char *x, int j)
+{
+  int tot = 0;
+  for (int a = 0; a < 16; a++)
+{
+  for (int b = 0; b < 16; b++)
+   tot += abs (w[b] - x[b]);
+  w += i;
+  x += j;
+}
+  return tot;
+}
+
+void
+bar (unsigned char *w, unsigned char *x, int i, int *result)
+{
+  *result = foo (w, 16, x, i);
+}
+
+/* { dg-final { scan-tree-dump-times "vec_construct required" 0 "vect" } } */
+
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 252760)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -1091,8 +1091,20 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
prologue_cost_vec, body_cost_vec, true);
   if (memory_access_type == VMAT_ELEMENTWISE
   || memory_access_type == VMAT_STRIDED_SLP)
-inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
-stmt_info, 0, vect_body);
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (first_stmt);
+  int group_size = GROUP_SIZE (stmt_info);
+  int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
+  if (group_size < nunits)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"vect_model_load_cost: vec_construct required");
+ inside_cost += record_stmt_cost (body_cost_vec, ncopies,
+  vec_construct, stmt_info, 0,
+  vect_body);
+   }
+}
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,



[PATCH, i386]: Remove special handling of LEA addresses from ix86_split_long_move

2017-09-19 Thread Uros Bizjak
Hello!

"Convert TLS location to DEFAULT_TLS_SEG_REG address space" patch [1]
allows to remove special handling of LEA addresses from
ix86_split_long_move. The address space is the property of MEM RTX,
not the address itself.

2017-09-19  Uros Bizjak  

* config/i386/i386.c (ix86_split_long_move): Do not handle
address used for LEA in a special way.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

[1] https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01789.html

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 252978)
+++ config/i386/i386.c  (working copy)
@@ -26706,7 +26706,7 @@ ix86_split_long_move (rtx operands[])
 Do an lea to the last part and use only one colliding move.  */
   else if (collisions > 1)
{
- rtx base, addr, tls_base = NULL_RTX;
+ rtx base, addr;
 
  collisions = 1;
 
@@ -26723,44 +26723,13 @@ ix86_split_long_move (rtx operands[])
  struct ix86_address parts;
  int ok = ix86_decompose_address (addr, &parts);
  gcc_assert (ok);
- if (parts.seg == DEFAULT_TLS_SEG_REG)
-   {
- /* It is not valid to use %gs: or %fs: in
-lea though, so we need to remove it from the
-address used for lea and add it to each individual
-memory loads instead.  */
- addr = copy_rtx (addr);
- rtx *x = &addr;
- while (GET_CODE (*x) == PLUS)
-   {
- for (i = 0; i < 2; i++)
-   {
- rtx u = XEXP (*x, i);
- if (GET_CODE (u) == ZERO_EXTEND)
-   u = XEXP (u, 0);
- if (GET_CODE (u) == UNSPEC
- && XINT (u, 1) == UNSPEC_TP)
-   {
- tls_base = XEXP (*x, i);
- *x = XEXP (*x, 1 - i);
- break;
-   }
-   }
- if (tls_base)
-   break;
- x = &XEXP (*x, 0);
-   }
- gcc_assert (tls_base);
-   }
+ /* It is not valid to use %gs: or %fs: in lea.  */
+ gcc_assert (parts.seg == ADDR_SPACE_GENERIC);
}
  emit_insn (gen_rtx_SET (base, addr));
- if (tls_base)
-   base = gen_rtx_PLUS (GET_MODE (base), base, tls_base);
  part[1][0] = replace_equiv_address (part[1][0], base);
  for (i = 1; i < nparts; i++)
{
- if (tls_base)
-   base = copy_rtx (base);
  tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i);
  part[1][i] = replace_equiv_address (part[1][i], tmp);
}


[PATCH], Improve moving SFmode to GPR on PowerPC

2017-09-19 Thread Michael Meissner
On the PowerPC starting with ISA 2.07 (power8), moving a single precision value
(SFmode) from a vector register to a GPR involves converting the scalar value
in the register from being in double (DFmode) format to the 32-bit
vector/storage format, doing the move to the GPR, and then doing a shift right
32-bits to get the value into the bottom 32-bits of the GPR for use as a
scalar:

xscvdpspn 0,1
mfvsrd3,0
srdi  3,3,32

It turns out that the current processors starting with ISA 2.06 (power7)
through ISA 3.0 (power9) actually duplicate the 32-bit value produced by the
XSCVDPSPN and XSCVDPSP instructions into both the top 32 bits of the register
and the second 32-bit word.  This allows us to eliminate the shift instruction,
since the value is already in the correct location for a 32-bit scalar.

ISA 3.0 is being updated to include this specification (and other fixes) so
that future processors will also be able to eliminate the shift.

The new code is:

xscvdpspn 0,1
mfvsrwz   3,0

While I was working on the modifications, I noticed that if the user did a
round from DFmode to SFmode and then tried to move it to a GPR, it would
originally do:

frsp  1,2
xscvdpspn 0,1
mfvsrd3,0
srdi  3,3,32

The XSCVDPSP instruction already handles values outside of the SFmode range
(XSCVDPSPN does not), and so I added a combiner pattern to combine the two
instructions:

xscvdpsp  0,1
mfvsrwz   3,0

While I was looking at the code, I noticed that if we have a SImode value
in a vector register, and we want to sign extend it and leave the value in a
GPR register, on power8 the register allocator would decide to do a 32-bit
integer store instruction and a sign-extending load in the GPR to do the sign
extension.  I added a splitter to convert this into a pair of MFVSRWZ and
EXTSW instructions.

I built Spec 2006 with the changes, and I noticed the following changes in the
code:

* Round DF->SF and move to GPR: namd, wrf;
* Eliminate 32-bit shift: gromacs, namd, povray, wrf;
* Use of MFVSRWZ/EXTSW: gromacs, povray, calculix, h264ref.

I have built these changes on the following machines with bootstrap and no
regressions in the regression test:

* Big endian power7 (with both 32/64-bit targets);
* Little endian power8;
* Little endian power9 prototype.

Can I check these changes into GCC 8?  Can I back port these changes into the
GCC 7 branch?

[gcc]
2017-09-19  Michael Meissner  

* config/rs6000/vsx.md (vsx_xscvspdp_scalar2): Move insn so it is
next to vsx_xscvspdp.
(vsx_xscvdpsp_scalar): Use 'ww' constraint instead of 'f' to allow
SFmode values being in Altivec registers.
(vsx_xscvdpspn): Eliminate unneeded alternative.  Use correct
constraint ('ws') for DFmode.
(vsx_xscvspdpn): Likewise.
(vsx_xscvdpspn_scalar): Likewise.
(peephole for optimizing move SF to GPR): Adjust code to eliminate
needing to do the shift right 32-bits operation after XSCVDPSPN.
* config/rs6000/rs6000.md (extendsi2): Add alternative to do
sign extend from vector register to GPR via a split, preventing
the register allocator from doing the move via store/load.
(extendsi2 splitter): Likewise.
(movsi_from_sf): Adjust code to eliminate doing a 32-bit shift
right or vector extract after doing XSCVDPSPN.  Use MFVSRWZ
instead of MFVSRD to move the value to a GPR register.
(movdi_from_sf_zero_ext): Likewise.
(movsi_from_df): Add optimization to merge a convert from DFmode
to SFmode and moving the SFmode to a GPR to use XSCVDPSP instead
of round and XSCVDPSPN.
(reload_gpr_from_vsxsf): Use MFVSRWZ instead of MFVSRD to move the
value to a GPR register.  Rename p8_mfvsrd_4_disf insn to
p8_mfvsrwz_disf.
(p8_mfvsrd_4_disf): Likewise.
(p8_mfvsrwz_disf): Likewise.

[gcc/testsuite]
2017-09-19  Michael Meissner  

* gcc.target/powerpc/pr71977-1.c: Adjust scan-assembler codes to
reflect that we don't generate a 32-bit shift right after
XSCVDPSPN.
* gcc.target/powerpc/direct-move-float1.c: Likewise.
* gcc.target/powerpc/direct-move-float3.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 252844)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -1781,6 +1781,15 @@ (define_insn "vsx_xscvspdp"
   "xscvspdp %x0,%x1"
   [(set_attr "type" "fp")])
 
+;; Same as vsx_xscvspdp, but use SF as the type
+(define_insn "vsx_xscvspdp_scalar2"
+  [(set (match_operand:SF 0 "vsx_register_operand" "=ww")
+   (unspec:SF [(match_operand:V4SF 1 "vsx_register_operand"

[PATCH] Fix PR81422[aarch64] internal compiler error: in update_equiv_regs, at ira.c:3425

2017-09-19 Thread Qing Zhao
Hi,

This patch fixes the aarch64 bug 81422 
https://gcc.gnu.org/PR81422

Before adding REG_EQUIV notes in the TLS symbol handling code,
we should check whether the "dest" is a REG or not (sometimes
it's a SUBREG, as in this bug).  The note is added only when the
"dest" is a REG.

A new small test case is added.

This change has been tested on aarch64-unknown-linux-gnu with no regressions.

Thanks!

Qing


===

gcc/ChangeLog:

   * config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
   check whether the dest is REG before adding REG_EQUIV note.

gcc/testsuite/ChangeLog:

   PR target/81422
   * gcc.target/aarch64/pr81422.C: New test.

---
 gcc/config/aarch64/aarch64.c   | 12 
 gcc/testsuite/gcc.target/aarch64/pr81422.C | 15 +++
 2 files changed, 23 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr81422.C

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a2ecd7a..6c3ef76 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1480,7 +1480,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
  tp = gen_lowpart (mode, tp);
 
emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, tp, x0)));
-   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
+if (REG_P (dest)) 
+ set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
@@ -1514,7 +1515,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
  }
 
emit_insn (gen_rtx_SET (dest, gen_rtx_PLUS (mode, tp, tmp_reg)));
-   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
+if (REG_P (dest)) 
+ set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
@@ -1555,7 +1557,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
gcc_unreachable ();
  }
 
-   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
+if (REG_P (dest)) 
+ set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
@@ -1584,7 +1587,8 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
emit_insn (gen_tlsie_tiny_sidi (dest, imm, tp));
  }
 
-   set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
+if (REG_P (dest))
+ set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr81422.C 
b/gcc/testsuite/gcc.target/aarch64/pr81422.C
new file mode 100644
index 000..5bcc948
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr81422.C
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O0" } */
+
+struct DArray
+{
+__SIZE_TYPE__ length;
+int* ptr;
+};
+
+void foo35(DArray)
+{
+static __thread int x[5];
+foo35({5, (int*)&x});
+}
+
-- 
1.9.1



Re: [PATCH] Fix PR82255 (vectorizer cost model overcounts some vector load costs)

2017-09-19 Thread Bill Schmidt
On 9/19/17 12:38 PM, Bill Schmidt wrote:
> Hi,
>
> https://gcc.gnu.org/PR82255 identifies a problem in the vector cost model
> where a vectorized load is treated as having the cost of a strided load
> in a case where we will not actually generate a strided load.  This is
> simply a mismatch between the conditions tested in the cost model and
> those tested in the code that generates vectorized instructions.  This
> patch fixes the problem by recognizing when only a single non-strided
> load will be generated and reporting the cost accordingly.
>
> I believe this patch is sufficient to catch all such cases, but I admit
> that the code in vectorizable_load is complex enough that I could have
> missed a trick.
>
> I've added a test in the PowerPC cost model subdirectory.  Even though
> this isn't a target-specific issue, the test does rely on a 16-byte 
> vector size, so this seems safest.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
> Is this ok for trunk?

After posting, I realized that I had wrongly recalculated stmt_info
in the patch.  Here's a new version (also passing regstrap) that removes
that flaw.

[gcc]

2017-09-19  Bill Schmidt  

PR tree-optimization/82255
* tree-vect-stmts.c (vect_model_load_cost): Don't count
vec_construct cost when a true strided load isn't present.

[gcc/testsuite]

2017-09-19  Bill Schmidt  

PR tree-optimization/82255
* gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c: New file.


Index: gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c
===
--- gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c (nonexistent)
+++ gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-pr82255.c (working copy)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+/* PR82255: Ensure we don't require a vec_construct cost when we aren't
+   going to generate a strided load.  */
+
+extern int abs (int __x) __attribute__ ((__nothrow__, __leaf__)) __attribute__ ((__const__));
+
+static int
+foo (unsigned char *w, int i, unsigned char *x, int j)
+{
+  int tot = 0;
+  for (int a = 0; a < 16; a++)
+{
+  for (int b = 0; b < 16; b++)
+   tot += abs (w[b] - x[b]);
+  w += i;
+  x += j;
+}
+  return tot;
+}
+
+void
+bar (unsigned char *w, unsigned char *x, int i, int *result)
+{
+  *result = foo (w, 16, x, i);
+}
+
+/* { dg-final { scan-tree-dump-times "vec_construct required" 0 "vect" } } */
+
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 252760)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -1091,8 +1091,19 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
prologue_cost_vec, body_cost_vec, true);
   if (memory_access_type == VMAT_ELEMENTWISE
   || memory_access_type == VMAT_STRIDED_SLP)
-inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
-stmt_info, 0, vect_body);
+{
+  int group_size = GROUP_SIZE (stmt_info);
+  int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
+  if (group_size < nunits)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"vect_model_load_cost: vec_construct required");
+ inside_cost += record_stmt_cost (body_cost_vec, ncopies,
+  vec_construct, stmt_info, 0,
+  vect_body);
+   }
+}
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,



Re: [C++ PATCH] Some check-c++-all fixes

2017-09-19 Thread Jason Merrill
OK, thanks.

On Tue, Sep 19, 2017 at 9:03 AM, Jakub Jelinek  wrote:
> Hi!
>
> In make check-c++-all I've noticed some UNSUPPORTED tests and some
> failures; the following patch attempts to fix those.
> The failures are due to the new expression no longer emitting the
> operator new (...) != NULL comparison with -std=c++17, so there is
> nothing to fold away
> (first 2 hunks).  The rest is about tests which are dg-do run, but
> for -std=c++17 and above contain dg-error and thus the execution test
> part is UNSUPPORTED.
>
> Regtested with make check-c++-all on x86_64-linux and i686-linux, ok for
> trunk?
>
> 2017-09-19  Jakub Jelinek  
>
> * g++.dg/tree-ssa/pr31146-2.C: Only do scan-tree-dump for c++14_down.
> * g++.dg/tree-ssa/pr41428.C: Likewise.
> * g++.dg/expr/bool1.C: Only do dg-do compile instead of dg-do run for
> c++17 and up.
> * g++.dg/expr/bool3.C: Likewise.
> * g++.dg/expr/bitfield5.C: Likewise.
> * g++.old-deja/g++.jason/bool5.C: Likewise.
>
> --- gcc/testsuite/g++.dg/tree-ssa/pr41428.C.jj  2015-05-29 15:04:33.0 
> +0200
> +++ gcc/testsuite/g++.dg/tree-ssa/pr41428.C 2017-09-19 14:23:13.366127516 
> +0200
> @@ -11,4 +11,6 @@ int foo(void)
>return *(int *)&f;
>  }
>
> -/* { dg-final { scan-tree-dump "Folded into: if \\\(1 != 0\\\)" "ccp1" } } */
> +/* -std=c++17 and above doesn't emit operator new () != NULL, so there is
> +   nothing to fold anymore.  */
> +/* { dg-final { scan-tree-dump "Folded into: if \\\(1 != 0\\\)" "ccp1" { 
> target c++14_down } } } */
> --- gcc/testsuite/g++.dg/tree-ssa/pr31146-2.C.jj2016-09-21 
> 08:54:09.0 +0200
> +++ gcc/testsuite/g++.dg/tree-ssa/pr31146-2.C   2017-09-19 14:23:41.675773292 
> +0200
> @@ -20,4 +20,6 @@ double foo (void)
>return v.a[2];
>  }
>
> -/* { dg-final { scan-tree-dump "Replaced .* != 0B. with .1" "forwprop1" } } 
> */
> +/* -std=c++17 and above doesn't emit operator new () != NULL, so there is
> +   nothing to fold anymore.  */
> +/* { dg-final { scan-tree-dump "Replaced .* != 0B. with .1" "forwprop1" { 
> target c++14_down } } } */
> --- gcc/testsuite/g++.dg/expr/bool1.C.jj2017-09-15 18:11:05.0 
> +0200
> +++ gcc/testsuite/g++.dg/expr/bool1.C   2017-09-19 14:16:46.139972669 +0200
> @@ -1,5 +1,6 @@
> -// { dg-do run }
> -// PR C++/29295
> +// PR c++/29295
> +// { dg-do run { target c++14_down } }
> +// { dg-do compile { target c++17 } }
>  // make sure that a typedef for a bool will have the
>  //  the same results as a bool itself.
>
> --- gcc/testsuite/g++.dg/expr/bool3.C.jj2017-09-15 18:11:05.0 
> +0200
> +++ gcc/testsuite/g++.dg/expr/bool3.C   2017-09-19 14:16:40.439044001 +0200
> @@ -1,5 +1,6 @@
> -// { dg-do run }
> -// PR C++/29295
> +// PR c++/29295
> +// { dg-do run { target c++14_down } }
> +// { dg-do compile { target c++17 } }
>  // make sure that a typedef for a bool will have the
>  //  the same results as a bool itself.
>
> --- gcc/testsuite/g++.dg/expr/bitfield5.C.jj2017-09-15 18:11:05.0 
> +0200
> +++ gcc/testsuite/g++.dg/expr/bitfield5.C   2017-09-19 14:12:40.087051397 
> +0200
> @@ -1,5 +1,6 @@
>  // PR c++/30274
> -// { dg-do run }
> +// { dg-do run { target c++14_down } }
> +// { dg-do compile { target c++17 } }
>
>  struct S {
>bool x : 4;
> --- gcc/testsuite/g++.old-deja/g++.jason/bool5.C.jj 2017-09-15 
> 18:11:08.0 +0200
> +++ gcc/testsuite/g++.old-deja/g++.jason/bool5.C2017-09-19 
> 14:17:19.708552643 +0200
> @@ -1,4 +1,5 @@
> -// { dg-do run  }
> +// { dg-do run { target c++14_down } }
> +// { dg-do compile { target c++17 } }
>  int main ()
>  {
>bool b = false;
>
> Jakub


Re: [PATCH 1/3] improve detection of attribute conflicts (PR 81544)

2017-09-19 Thread Martin Sebor

On 09/06/2017 05:30 PM, Joseph Myers wrote:

On Thu, 17 Aug 2017, Martin Sebor wrote:


+/* Check LAST_DECL and NODE of the same symbol for attributes that are
+   recorded in EXCL to be mutually exclusive with ATTRNAME, diagnose
+   them, and return true if any have been found.  NODE can be a DECL
+   or a TYPE.  */
+
+static bool
+diag_attr_exclusions (tree last_decl, tree node, tree attrname,
+ const attribute_spec *spec)


EXCL is not an argument to this function, so the comment above it should
not refer to EXCL (presumably it should refer to SPEC instead).


+   note &= warning (OPT_Wattributes,
+"ignoring attribute %qE in declaration of "
+"a built-in function qD because it conflicts "
+"with attribute %qs",
+attrname, node, excl->name);


%qD not qD, presumably.


Yes, thanks.  Fixed.



(Generically, warning_at would be preferred to warning, but that may best
be kept separate if you don't already have a location available here.)


I agree (on both counts).




+static const struct attribute_spec::exclusions attr_gnu_inline_exclusions[] =
+{
+  ATTR_EXCL ("gnu_inline", true, true, true),
+  ATTR_EXCL ("noinline", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};


This says gnu_inline is incompatible with noinline, and is listed as the
EXCL field for the gnu_inline attribute.


+static const struct attribute_spec::exclusions attr_inline_exclusions[] =
+{
+  ATTR_EXCL ("always_inline", true, true, true),
+  ATTR_EXCL ("noinline", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};


This is listed as the EXCL field for the noinline attribute, but does not
mention gnu_inline.  Does this mean some asymmetry in when that pair is
diagnosed?  I don't see tests for that pair added by the patch.


Ah, good catch!  always_inline and gnu_inline are compatible with
one another so what's missing is noinline_exclusions.  I've added
it, simplified the others, and included a test case.
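
Presumably something along these lines (a sketch of the added table, not
necessarily the committed version):

static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
{
  ATTR_EXCL ("always_inline", true, true, true),
  ATTR_EXCL ("gnu_inline", true, true, true),
  ATTR_EXCL (NULL, false, false, false),
};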


(Of course, gnu_inline + always_inline is OK, and attr_inline_exclusions
is also used for the always_inline attribute in this patch.)

In general, the data structures where you need to ensure manually that if
attribute A is listed in EXCL for B, then attribute B is also listed in
EXCL for A, seem concerning.  I'd expect either data structures that make
such asymmetry impossible, or a self-test that verifies that the tables in
use are in fact symmetric (unless there is some reason the symmetry is not
in fact required and symmetric diagnostics still result from asymmetric
tables - in which case the various combinations and orderings of
gnu_inline and noinline definitely need tests to show that the diagnostics
work).


If I understand correctly what you're concerned about then I don't
think there are any such cases in the updated version of the patch.


+both the @code{const} and the @code{pure} attribute is diagnnosed.


s/diagnnosed/diagnosed/


Thanks. In the attached update I've corrected this and the other
issues you and Jeff pointed out.

Martin

PR c/81544 - attribute noreturn and warn_unused_result on the same function accepted

gcc/c/ChangeLog:

	PR c/81544
	* c-decl.c (c_decl_attributes): Look up existing declaration and
	pass it to decl_attributes.

gcc/c-family/ChangeLog:

	PR c/81544
	* c-attribs.c (attr_aligned_exclusions): New array.
	(attr_alloc_exclusions, attr_cold_hot_exclusions): Same.
	(attr_common_exclusions, attr_const_pure_exclusions): Same.
	(attr_gnu_inline_exclusions, attr_inline_exclusions): Same.
	(attr_noreturn_exclusions, attr_returns_twice_exclusions): Same.
	(attr_warn_unused_result_exclusions): Same.
	(handle_hot_attribute, handle_cold_attribute): Simplify.
	(handle_const_attribute): Warn on function returning void.
	(handle_pure_attribute): Same.
	* c-warn.c (diagnose_mismatched_attributes): Simplify.

gcc/ChangeLog:

	PR c/81544
	* attribs.c (empty_attribute_table): Initialize new member of
	struct attribute_spec.
	(decl_attributes): Add argument.  Handle mutually exclusive
	combinations of attributes.
	* attribs.h (decl_attributes): Add default argument.
	* tree-core.h (attribute_spec::exclusions, exclude): New type and
	member.
	* doc/extend.texi (Common Function Attributes): Update const and pure.

gcc/testsuite/ChangeLog:

	PR c/81544
	* c-c++-common/Wattributes-2.c: New test.
	* c-c++-common/Wattributes.c: New test.
	* c-c++-common/attributes-3.c: Adjust.
	* gcc.dg/attr-noinline.c: Adjust.
	* gcc.dg/pr44964.c: Same.
	* gcc.dg/torture/pr42363.c: Same.
	* gcc.dg/tree-ssa/ssa-ccp-2.c: Same.

diff --git a/gcc/attribs.c b/gcc/attribs.c
index 05fa8ef..3f9ae37 100644
--- a/gcc/attribs.c
+++ b/gcc/attribs.c
@@ -94,7 +94,7 @@ static bool attributes_initialized = false;
 
 static const struct attribute_spec empty_attribute_table[] =
 {
-  { NULL, 0, 0, false, false, false, NULL, false }
+  { NULL, 0, 0, false, false, false, NULL, false, 

Re: [Patch, Fortran] PR 82018: -Wextra should imply -Wconversion-extra

2017-09-19 Thread Thomas Koenig

Am 18.09.2017 um 11:50 schrieb Dominique d'Humières:

Warning: Conversion from 'REAL(4)' to 'REAL(8)' at (1) [-Wconversion-extra]


Not me (not in the general case)


even if it may allow detecting things such as ‘pi8=acos(-1.0)’?


This one would be interesting to catch (even with -Wall), although
it would be hard to avoid false positives.

I agree with Dominique that enabling -Wconversion-extra
the way it is now would be too noisy.

Are there warnings currently enabled with -Wconversion-extra
which would be useful in -Wextra (or even ones currently not in
-Wconversion-extra) without having too many false positives?

r8 = 1./3. could be one example, r8=acos(-1.0) another.  Something
like "it must have been calculated with a formula and, if evaluated
in the precision of the lhs, gives a different result from the one
if the user had specified it with the correct precision".

We don't do that yet, but I think it could be useful even with -Wall,
and certainly with -Wextra.

What do you think?


Re: [PATCH 1/3] improve detection of attribute conflicts (PR 81544)

2017-09-19 Thread Martin Sebor

On 09/12/2017 11:06 AM, Jeff Law wrote:

On 08/17/2017 10:03 AM, Martin Sebor wrote:


First, there is the risk that someone will write code based
on the pure declaration even if there's no intervening call
to the function between it and the const one.  Code tends to
be sloppy, and it's also not uncommon to declare the same
function more than once, for whatever reason.  (The ssa-ccp-2.c
test is an example of the latter.)

True.  But I think if we get into a state where the semantics of the
implementation are incompatible with the listed attributes (and we can
reliably detect that) then we should issue a separate and distinct
warning.   For example a pure function that reads global memory should
issue a warning/error.

ISTM such a warning/error would be a separate and distinct patch than
detection of attribute conflicts.


I agree.  This warning is for the case when the implementation
is not visible and the mismatch between the declaration and its
definition cannot be detected.  E.g., when the user of a function
defined in another translation unit declares it based on
a different (e.g., earlier) version of it, or based on
a misunderstanding of its guarantees.
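
For instance (hypothetical function, just to illustrate the kind of
mismatch the warning targets):

  /* Declared pure in an older header...  */
  extern int get_limit (void) __attribute__ ((pure));

  /* ...but redeclared const elsewhere.  The two attributes make
     different guarantees, so the combination is now diagnosed.  */
  extern int get_limit (void) __attribute__ ((const));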


Second, there are cases of attribute conflicts that GCC already
points out that are much more benign in their effects (the ones
I know about are always_inline/noinline and cold/hot).  In light
of the risk above it seems only helpful to include const/pure in
the same set.

always_inline/noinline conflicts can cause build failures in codebases
that require certain functions to be inlined.  We can argue about
whether or not that's a good policy for those codebases, but they
certainly exist.  I could envision the hot/cold case generating a link
error if the linker script shoved them into different memory segments
and getting them wrong caused an overflow.

But these IMHO are corner cases and should not drive major decisions here.


The consequence of a const/pure conflict can be "wrong" code (i.e.,
generating code for a call to a function declared const when its
definition is, in fact, only pure, will have unexpected effects).
When both the definition and the mismatched declaration are visible
the mismatch between it and its declaration could be detected (as
a future enhancement).  But without it, the second best and, IMO,
second most helpful thing GCC can do for two declarations of the
same function, one const and the other pure, is to issue a warning
about the mismatched guarantees the declarations provide to the
caller.  (The ideal would be to use the pure attribute instead
of const and also emit a warning.)


Third, I couldn't find another pair of attributes that GCC would
deliberately handle this way (silently accept both but prefer one
over the other), and introducing a special case just for these
two seemed like a wart.

Seems reasonable.


I do still have the question whether diagnosing attribute
conflicts under -Wattributes is right.  The manual implies
that -Wattributes is for attributes in the wrong places or
on the wrong entities, and that -Wignored-attributes should
be expected instead when GCC decides to drop one for some
reason.

This seems like it ought to be a distinct option given the way
-Wattributes is documented.  Alternately we can tweak the meaning of
-Wattributes.  I don't have strong opinions here.



Okay.





It is a little unfortunate that many -Wattributes warnings
print just "attribute ignored" (and have done so for years).
I think they should all be enhanced to also print why the
attribute is ignored (e.g., "'packed' attribute on function
declarations ignored/not valid/not supported" or something
like that).  Those that ignore attributes that would
otherwise be valid e.g., because they conflict with other
specifiers of the same attribute but with a different
operand might then be candidate for changing to
-Wignored-attributes.

Seems like a reasonable follow-up if you're interested.


Okay.






Thanks
Martin


gcc-81544-1.diff


PR c/81544 - attribute noreturn and warn_unused_result on the same function 
accepted

gcc/c/ChangeLog:

PR c/81544
* c-decl.c (c_decl_attributes): Look up existing declaration and
pass it to decl_attributes.

gcc/c-family/ChangeLog:

PR c/81544
* c-attribs.c (attr_aligned_exclusions): New array.
(attr_alloc_exclusions, attr_cold_hot_exclusions): Same.
(attr_common_exclusions, attr_const_pure_exclusions): Same.
(attr_gnu_inline_exclusions, attr_inline_exclusions): Same.
(attr_noreturn_exclusions, attr_returns_twice_exclusions): Same.
(attr_warn_unused_result_exclusions): Same.
(handle_hot_attribute, handle_cold_attribute): Simplify.
(handle_const_attribute): Warn on function returning void.
(handle_pure_attribute): Same.
* c-warn.c (diagnose_mismatched_attributes): Simplify.

gcc/testsuite/ChangeLog:

PR c/81544
* c-c++-common/Wattributes-2.c: New test.
* c-c++-

Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV

2017-09-19 Thread Daniel Santos
On 09/19/2017 01:58 AM, Jakub Jelinek wrote:
> On Mon, Sep 18, 2017 at 06:10:29PM -0500, Daniel Santos wrote:
>> Mike, can you take a look at this please?
>>
>> On 09/18/2017 10:17 AM, Dominique d'Humières wrote:
>>> This patch (r252896) breaks bootstrap on x86_64-apple-darwin10 configured 
>>> with
>>>
>>> ../work/configure --prefix=/opt/gcc/gcc8w 
>>> --enable-languages=c,c++,fortran,objc,obj-c++,ada,lto 
>>> --with-gmp=/opt/mp-new --with-system-zlib --with-isl=/opt/mp-new 
>>> --enable-lto --enable-plugin
>>>
>>> /opt/gcc/build_w/./gcc/xgcc -B/opt/gcc/build_w/./gcc/ 
>>> -B/opt/gcc/gcc8w/x86_64-apple-darwin10.8.0/bin/ 
>>> -B/opt/gcc/gcc8w/x86_64-apple-darwin10.8.0/lib/ -isystem 
>>> /opt/gcc/gcc8w/x86_64-apple-darwin10.8.0/include -isystem 
>>> /opt/gcc/gcc8w/x86_64-apple-darwin10.8.0/sys-include -g -O2 -O2  -g -O2 
>>> -DIN_GCC -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format 
>>> -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition  -isystem 
>>> ./include   -mmacosx-version-min=10.5 -pipe -fno-common -g -DIN_LIBGCC2 
>>> -fbuilding-libgcc -fno-stack-protector   -mmacosx-version-min=10.5 -pipe 
>>> -fno-common -I. -I. -I../.././gcc -I../../../work/libgcc 
>>> -I../../../work/libgcc/. -I../../../work/libgcc/../gcc 
>>> -I../../../work/libgcc/../include  -DHAVE_CC_TLS -DUSE_EMUTLS -o 
>>> avx_savms64_s.o -MT avx_savms64_s.o -MD -MP -MF avx_savms64_s.dep -DSHARED 
>>> -c -xassembler-with-cpp ../../../work/libgcc/config/i386/avx_savms64.S
>>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps 
>>> %xmm15,-0x30(%rax)'
>>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps 
>>> %xmm14,-0x20(%rax)'
>>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps 
>>> %xmm13,-0x10(%rax)'
>>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps 
>>> %xmm12, (%rax)'
>>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps 
>>> %xmm11, 0x10(%rax)'
>>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps 
>>> %xmm10, 0x20(%rax)'
>>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps 
>>> %xmm9, 0x30(%rax)'
>>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps 
>>> %xmm8, 0x40(%rax)'
>>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps 
>>> %xmm7, 0x50(%rax)'
>>> ../../../work/libgcc/config/i386/savms64.h:47:no such instruction: `vmovaps 
>>> %xmm6, 0x60(%rax)'
>>> make[3]: *** [avx_savms64_s.o] Error 1
>>>
>>> Dominique
>> Thanks for the report.  AVX has been out since early 2011 and Wikipedia
>> claims that AVX support was added to OSX in version 10.6.8 in June 2011
>> and you seem to be using 10.8.0.  I would presume that also means that
>> the assembler supports it.  So I'm going to guess that it's the
>> "-mmacosx-version-min=10.5" parameter.  Can you please try setting that
>> to 10.6.8 and let me know the result?  I don't know what the minimum
>> system requirements for GCC 8 are going to be, but if it includes these
>> older versions of OSX then I'll have to figure out how to cope with it
>> in the libgcc build.
> What can be done in libgcc is detect in configure whether the assembler
> supports AVX, and if not, provide some alternative (e.g. because the insns
> are always the same, you could just code them as .byte or something similar).
>
> Say like:
> --- i386-asm.h	2017-09-18 18:34:30.917126996 +0200
> +++ i386-asm.h	2017-09-19 08:56:58.829559038 +0200
> @@ -70,6 +70,7 @@ ASMNAME(fn):
>  #ifdef MS2SYSV_STUB_AVX
>  # define MS2SYSV_STUB_PREFIX __avx_
>  # define MOVAPS vmovaps
> +# define BYTE .byte
>  #elif defined(MS2SYSV_STUB_SSE)
>  # define MS2SYSV_STUB_PREFIX __sse_
>  # define MOVAPS movaps
> @@ -84,7 +85,8 @@ ASMNAME(fn):
>   FUNC_END(PASTE2(MS2SYSV_STUB_PREFIX, base_name))
>  
>  /* Save SSE registers 6-15. off is the offset of rax to get to xmm6.  */
> -# define SSE_SAVE   \
> +# ifdef HAVE_AS_AVX
> +#  define SSE_SAVE  \
>   MOVAPS %xmm15,-0x30(%rax); \
>   MOVAPS %xmm14,-0x20(%rax); \
>   MOVAPS %xmm13,-0x10(%rax); \
> @@ -95,6 +97,21 @@ ASMNAME(fn):
>   MOVAPS %xmm8,  0x40(%rax); \
>   MOVAPS %xmm7,  0x50(%rax); \
>   MOVAPS %xmm6,  0x60(%rax)
> +# else
> +/* If the assembler doesn't have AVX support, emit machine code for
> +   the above directly.  */
> +#  define SSE_SAVE  \
> + BYTE 0x44, 0x0f, 0x29, 0x78, 0xd0; \
> + BYTE 0x44, 0x0f, 0x29, 0x70, 0xe0; \
> + BYTE 0x44, 0x0f, 0x29, 0x68, 0xf0; \
> + BYTE 0x44, 0x0f, 0x29, 0x20;   \
> + BYTE 0x44, 0x0f, 0x29, 0x58, 0x10; \
> + BYTE 0x44, 0x0f, 0x29, 0x50, 0x20; \
> + BYTE 0x44, 0x0f, 0x29, 0x48, 0x30; \
> + BYTE 0x44, 0x0f, 0x29, 0x40, 0x40; \
> + BYTE 0x0f, 0x29, 0x78, 0x50;   \
> + BYTE 0x0f, 0x29, 0x70, 0x60
> +# endif
>  
>  /* Restore SSE registers 6-15. off is the offse

Re: [Patch, Fortran] PR 82018: -Wextra should imply -Wconversion-extra

2017-09-19 Thread Dominique d'Humières

> On 19 Sept 2017, at 21:59, Thomas Koenig  wrote:
> 
> On 18.09.2017 at 11:50, Dominique d'Humières wrote:
>> Warning: Conversion from 'REAL(4)' to 'REAL(8)' at (1) [-Wconversion-extra]
> 
> Not me (not in the general case)
> 
>> even if it may allow detecting things such as ‘pi8=acos(-1.0)’?
> 
> This one would be interesting to catch (even with -Wall), although
> it would be hard to avoid false positives.
> 
> I agree with Dominique that enabling -Wconversion-extra
> the way it is now would be too noisy.
> 
> Are there warnings currently enabled with -Wconversion-extra
> which would be useful in -Wextra (or even ones currently not in
> -Wconversion-extra) without having too many false positives?
> 
> r8 = 1./3. could be one example, r8=acos(-1.0) another.  Something
> like "it must have been calculated with a formula and, if evaluated
> in the precision of the lhs, gives a different result from the one
> if the user had specified it with the correct precision".
> 
> We don't do that yet, but I think it could be useful even with -Wall,
> and certainly with -Wextra.
> 
> What do you think?

Is it really worth the trouble?

I am really upset by the time spent on warnings at the expense of more
serious problems.

Dominique



Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV

2017-09-19 Thread Daniel Santos
On 09/19/2017 07:13 AM, Rainer Orth wrote:
> Daniel Santos  writes:
>
>> On 09/17/2017 10:53 AM, Uros Bizjak wrote:
>>> OK.
>>>
>>> Thanks,
>>> Uros.
>> Thanks. I should have posted this Friday when my tests finished, but
>> I'll be committing with one minor change so tests don't run on m32 or mx32:
>>
>> --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile } */
>> +/* { dg-do compile { target lp64 } } */
>>  /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */
>>  /* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */
>>  /* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/pr82196-2.c
>> b/gcc/testsuite/gcc.target/i386/pr82196-2.c
>> index 31705bee29b..8fe58411d5e 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr82196-2.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr82196-2.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile } */
>> +/* { dg-do compile { target lp64 } } */
>>  /* { dg-options "-mavx -mcall-ms2sysv-xlogues -O2" } */
>>  /* { dg-final { scan-assembler "call.*__avx_savms64_18" } } */
>>  /* { dg-final { scan-assembler "jmp.*__avx_resms64x_18" } } */
>>
>> Other than that, full regression tests pass.
> However, they do FAIL on 64-bit Solaris/x86:
>
> +FAIL: gcc.target/i386/pr82196-1.c (test for excess errors)
>
> Excess errors:
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.target/i386/pr82196-1.c:14:1: 
> error: bp cannot be used in asm here
>
> +UNRESOLVED: gcc.target/i386/pr82196-1.c scan-assembler call.*__sse_savms64_18
> +UNRESOLVED: gcc.target/i386/pr82196-1.c scan-assembler jmp.*__sse_resms64x_18
> +FAIL: gcc.target/i386/pr82196-2.c (test for excess errors)
> +UNRESOLVED: gcc.target/i386/pr82196-2.c scan-assembler call.*__avx_savms64_18
> +UNRESOLVED: gcc.target/i386/pr82196-2.c scan-assembler jmp.*__avx_resms64x_18
>
>   Rainer

Sorry about that, I forgot about Solaris' default-enabled frame
pointers.  I don't even need a test this complicated; I'll make it much
simpler.

Thanks,
Daniel


Re: [PATCH] Fix PR82255 (vectorizer cost model overcounts some vector load costs)

2017-09-19 Thread Richard Sandiford
Bill Schmidt  writes:
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c (revision 252760)
> +++ gcc/tree-vect-stmts.c (working copy)
> @@ -1091,8 +1091,19 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
>   prologue_cost_vec, body_cost_vec, true);
>if (memory_access_type == VMAT_ELEMENTWISE
>|| memory_access_type == VMAT_STRIDED_SLP)
> -inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
> -  stmt_info, 0, vect_body);
> +{
> +  int group_size = GROUP_SIZE (stmt_info);
> +  int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
> +  if (group_size < nunits)
> + {
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> +  "vect_model_load_cost: vec_construct required");
> +   inside_cost += record_stmt_cost (body_cost_vec, ncopies,
> +vec_construct, stmt_info, 0,
> +vect_body);
> + }
> +}
>  
>if (dump_enabled_p ())
>  dump_printf_loc (MSG_NOTE, vect_location,

This feels like we've probably got the wrong memory_access_type.
If it's just a contiguous load then it should be VMAT_CONTIGUOUS
instead.

Thanks,
Richard




Re: [PATCH 1/3] improve detection of attribute conflicts (PR 81544)

2017-09-19 Thread Joseph Myers
On Tue, 19 Sep 2017, Martin Sebor wrote:

> > In general, the data structures where you need to ensure manually that if
> > attribute A is listed in EXCL for B, then attribute B is also listed in
> > EXCL for A, seem concerning.  I'd expect either data structures that make
> > such asymmetry impossible, or a self-test that verifies that the tables in
> > use are in fact symmetric (unless there is some reason the symmetry is not
> > in fact required and symmetric diagnostics still result from asymmetric
> > tables - in which case the various combinations and orderings of
> > gnu_inline and noinline definitely need tests to show that the diagnostics
> > work).
> 
> If I understand correctly what you're concerned about then I don't
> think there are any such cases in the updated version of the patch.

I don't see how you ensure that it's not possible to have such asymmetry.  
My point wasn't so much "there was a bug in the previous patch version" as 
"the choice of data structures for defining such exclusions is prone to 
such bugs".  Which can be addressed either by using different data 
structures (e.g. listing incompatible pairs in a single array) or by a 
self-test to verify symmetry so a compiler with asymmetry doesn't build.
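
For illustration, a minimal sketch of the "incompatible pairs in a
single array" idea (hypothetical code, not the patch's actual data
structures; 'attr_exclusion', 'excl_table' and 'attrs_conflict_p' are
made up):

  #include <stddef.h>
  #include <string.h>

  struct attr_exclusion
  {
    const char *attr1;
    const char *attr2;
  };

  /* Each exclusion is stated exactly once, so the relation is
     symmetric by construction and cannot get out of sync.  */
  static const struct attr_exclusion excl_table[] = {
    { "const", "pure" },
    { "hot", "cold" },
    { "always_inline", "noinline" },
  };

  static int
  attrs_conflict_p (const char *a, const char *b)
  {
    for (size_t i = 0; i < sizeof excl_table / sizeof excl_table[0]; i++)
      if ((!strcmp (a, excl_table[i].attr1)
           && !strcmp (b, excl_table[i].attr2))
          || (!strcmp (a, excl_table[i].attr2)
              && !strcmp (b, excl_table[i].attr1)))
        return 1;
    return 0;
  }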

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH,rs6000] Replace swap of a loaded vector constant with load of a swapped vector constant

2017-09-19 Thread Bill Schmidt
Hi Kelvin,

This is in quite good shape.  In addition to Segher's comments, I have a few
questions/requests below.

> On Sep 15, 2017, at 4:04 PM, Kelvin Nilsen  
> wrote:
> 
> @@ -385,6 +396,25 @@ const_load_sequence_p (swap_web_entry *insn_entry,
> split_const (XVECEXP (tocrel_base, 0, 0), &base, &offset);
> if (GET_CODE (base) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P (base))
>   return false;
> +   else
> + {
> +   /* FIXME: The conditions under which
> +*  ((GET_CODE (const_vector) == SYMBOL_REF) &&
> +*   !CONSTANT_POOL_ADDRESS_P (const_vector))
> +* are not well understood.  This code prevents
> +* an internal compiler error which will occur in
> +* replace_swapped_load_constant () if we were to return
> +* true.  Some day, we should figure out how to properly
> +* handle this condition in
> +* replace_swapped_load_constant () and then we can
> +* remove this special test.  */
> +   rtx const_vector = get_pool_constant (base);
> +   if (GET_CODE (const_vector) == SYMBOL_REF)
> + {
> +   if (!CONSTANT_POOL_ADDRESS_P (const_vector))
> + return false;
> + }

Rewrite as

if (GET_CODE (const_vector) == SYMBOL_REF
&& !CONSTANT_POOL_ADDRESS_P (const_vector))
  return false;

> + }
>   }
> }
>   return true;
> @@ -1281,6 +1311,189 @@ replace_swap_with_copy (swap_web_entry *insn_entry
>   insn->set_deleted ();
> }
> 
> +/* Given that swap_insn represents a swap of a load of a constant
> +   vector value, replace with a single instruction that loads a
> +   swapped variant of the original constant. 
> +
> +   The "natural" representation of a byte array in memory is the same
> +   for big endian and little endian.
> +
> +   unsigned char byte_array[] =
> + { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f };
> +
> +   However, when loaded into a vector register, the representation
> +   depends on endian conventions.
> +
> +   In big-endian mode, the register holds:
> +
> + MSB                                          LSB
> + [ f, e, d, c, b, a, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 ]
> +
> +   In little-endian mode, the register holds:
> +
> + MSB                                          LSB
> + [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f ]
> +
> +
> +   Word arrays require different handling.  Consider the word array:
> +
> +   unsigned int word_array[] =
> + { 0x00010203, 0x04050607, 0x08090a0b, 0x0c0d0e0f };
> +
> +   The in-memory representation depends on endian configuration.  The
> +   equivalent array, declared as a byte array, in memory would be:
> +
> +   unsigned char big_endian_word_array_data[] =
> + { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f }
> +
> +   unsigned char little_endian_word_array_data[] =
> + { 3, 2, 1, 0, 7, 6, 5, 4, b, a, 9, 8, f, e, d, c }
> +
> +   In big-endian mode, the register holds:
> +
> + MSB                                          LSB
> + [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f ]
> +
> +   In little-endian mode, the register holds:
> +
> + MSB                                          LSB
> + [ c, d, e, f, 8, 9, a, b, 4, 5, 6, 7, 0, 1, 2, 3 ]
> +
> +
> +  Similar transformations apply to the vector of half-word and vector
> +  of double-word representations.

Thanks for providing these examples.

> +
> +  For now, don't handle vectors of quad-precision values.  Just return.
> +  A better solution is to fix the code generator to emit lvx/stvx for
> +  those.  */
> +static void
> +replace_swapped_load_constant (swap_web_entry *insn_entry, rtx swap_insn)
> +{
> +  /* Find the load.  */
> +  struct df_insn_info *insn_info = DF_INSN_INFO_GET (swap_insn);
> +  rtx_insn *load_insn = 0;
> +  df_ref use;
> +
> +  FOR_EACH_INSN_INFO_USE (use, insn_info)
> +{
> +  struct df_link *def_link = DF_REF_CHAIN (use);
> +  gcc_assert (def_link && !def_link->next);
> +  load_insn = DF_REF_INSN (def_link->ref);
> +  break;
> +}
> +  gcc_assert (load_insn);

I agree with Segher, please rewrite the above to not use a loop.  The
code here is correct, just awkward.  You are assuming that the first
use in a swap instruction will be the register you're looking for, and
that is correct.

> +
> +  /* Find the TOC-relative symbol access.  */
> +  insn_info = DF_INSN_INFO_GET (load_insn);
> +  rtx_insn *tocrel_insn = 0;
> +  FOR_EACH_INSN_INFO_USE (use, insn_info)
> +{
> +  struct df_link *def_link = DF_REF_CHAIN (use);
> +  gcc_assert (def_link && !def_link->next);
> +  tocrel_insn = DF_REF_INSN (def_link->ref);
> +  break;
> +}
> +  gcc_assert (tocrel_insn);

Please check all the uses here.  I am a little nervous about making
a similar assumption about a base register being the first use in
the load.  While this is likely provab

[PATCH, i386]: Add BT setcc patterns

2017-09-19 Thread Uros Bizjak
Hello!

As mentioned in PR 82259. These are similar to existing BT jcc patterns.

2017-09-19  Uros Bizjak  

* config/i386/i386.md (*scc_bt<mode>): New insn_and_split pattern.
(*scc_bt<mode>_1): Ditto.
(*scc_bt<mode>_mask): Ditto.

testsuite/ChangeLog:

2017-09-19  Uros Bizjak  

* gcc.target/i386/bt-5.c: New test.
* gcc.target/i386/bt-6.c: Ditto.
* gcc.target/i386/bt-mask-3.c: Ditto.
* gcc.target/i386/bt-mask-4.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 252984)
+++ config/i386/i386.md (working copy)
@@ -11246,6 +11246,98 @@
  (const_string "SI")
  (const_string "")))])
 
+(define_insn_and_split "*scc_bt"
+  [(set (match_operand:QI 0 "nonimmediate_operand")
+   (match_operator 1 "bt_comparison_operator"
+ [(zero_extract:SWI48
+(match_operand:SWI48 2 "nonimmediate_operand")
+(const_int 1)
+(match_operand:SI 3 "nonmemory_operand"))
+  (const_int 0)]))
+   (clobber (reg:CC FLAGS_REG))]
+  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
+   && (CONST_INT_P (operands[3])
+   ? (INTVAL (operands[3]) < GET_MODE_BITSIZE (<MODE>mode)
+ && INTVAL (operands[3])
+  >= (optimize_function_for_size_p (cfun) ? 8 : 32))
+   : !memory_operand (operands[2], <MODE>mode))
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+   (compare:CCC
+ (zero_extract:SWI48
+   (match_dup 2)
+   (const_int 1)
+   (match_dup 3))
+ (const_int 0)))
+   (set (match_dup 0)
+   (match_op_dup 1 [(reg:CCC FLAGS_REG) (const_int 0)]))]
+{
+  operands[1] = shallow_copy_rtx (operands[1]);
+  PUT_CODE (operands[1], reverse_condition (GET_CODE (operands[1])));
+})
+
+(define_insn_and_split "*scc_bt_1"
+  [(set (match_operand:QI 0 "nonimmediate_operand")
+   (match_operator 1 "bt_comparison_operator"
+ [(zero_extract:SWI48
+(match_operand:SWI48 2 "register_operand")
+(const_int 1)
+(zero_extend:SI
+  (match_operand:QI 3 "register_operand")))
+  (const_int 0)]))
+   (clobber (reg:CC FLAGS_REG))]
+  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+   (compare:CCC
+ (zero_extract:SWI48
+   (match_dup 2)
+   (const_int 1)
+   (match_dup 3))
+ (const_int 0)))
+   (set (match_dup 0)
+   (match_op_dup 1 [(reg:CCC FLAGS_REG) (const_int 0)]))]
+{
+  operands[3] = lowpart_subreg (SImode, operands[3], QImode);
+  operands[1] = shallow_copy_rtx (operands[1]);
+  PUT_CODE (operands[1], reverse_condition (GET_CODE (operands[1])));
+})
+
+;; Avoid useless masking of bit offset operand.
+(define_insn_and_split "*scc_bt_mask"
+  [(set (match_operand:QI 0 "nonimmediate_operand")
+   (match_operator 1 "bt_comparison_operator"
+ [(zero_extract:SWI48
+(match_operand:SWI48 2 "register_operand")
+(const_int 1)
+(and:SI
+  (match_operand:SI 3 "register_operand")
+  (match_operand 4 "const_int_operand")))]))
+   (clobber (reg:CC FLAGS_REG))]
+  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
+   && (INTVAL (operands[4]) & (GET_MODE_BITSIZE (<MODE>mode)-1))
+  == GET_MODE_BITSIZE (<MODE>mode)-1
+   && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+   (compare:CCC
+ (zero_extract:SWI48
+   (match_dup 2)
+   (const_int 1)
+   (match_dup 3))
+ (const_int 0)))
+   (set (match_dup 0)
+   (match_op_dup 1 [(reg:CCC FLAGS_REG) (const_int 0)]))]
+{
+  operands[1] = shallow_copy_rtx (operands[1]);
+  PUT_CODE (operands[1], reverse_condition (GET_CODE (operands[1])));
+})
+
 (define_insn_and_split "*jcc_bt"
   [(set (pc)
(if_then_else (match_operator 0 "bt_comparison_operator"
Index: testsuite/gcc.target/i386/bt-5.c
===
--- testsuite/gcc.target/i386/bt-5.c(nonexistent)
+++ testsuite/gcc.target/i386/bt-5.c(working copy)
@@ -0,0 +1,11 @@
+/* PR target/36473 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=core2" } */
+/* { dg-additional-options "-mregparm=2" { target ia32 } } */
+
+int test(unsigned x, unsigned n)
+{
+  return !(x & ( 0x01 << n ));
+}
+
+/* { dg-final { scan-assembler "btl\[ \t\]" } } */
Index: testsuite/gcc.target/i386/bt-6.c
===
--- testsuite/gcc.target/i386/bt-6.c(nonexistent)
+++ testsuite/gcc.target/i386/bt-6.c(working copy)
@@ -0,0 +1,12 @@
+/* PR target/36473 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=core2" } */
+/* { dg-additional-options "-mregparm=2" { target ia32 } } */
+
+int test(unsig

[PATCH] Improve DW_AT_const_value with early_dwarf

2017-09-19 Thread Jakub Jelinek
On Tue, Sep 19, 2017 at 04:11:52PM +0200, Jakub Jelinek wrote:
> Will try now following plus testcase, the rest of constants I believe end up
> being DW_FORM_block encoded and so is pretty much what we emit even for the
> initializer_constant_valid_p tree fallback case.

I've bootstrapped/regtested this on x86_64-linux and i686-linux, ok for
trunk?  Most of the testcase changes are then to make sure newlines don't
match where they shouldn't, because this regression would be actually caught
by the testsuite otherwise.

.debug_info is ~ 28KB smaller.
On previous bootstrap from earlier today I saw:
readelf -wa obj96/gcc/cc1plus | grep DW_AT_const_value | sort | uniq -c | sort 
-n
  1 DW_AT_const_value  DW_FORM_block2
  3 DW_AT_const_value  DW_FORM_data8
 71 DW_AT_const_value  DW_FORM_string
311 DW_AT_const_value  DW_FORM_strp
472 DW_AT_const_value  DW_FORM_sdata
656 DW_AT_const_value  DW_FORM_data4
787 DW_AT_const_value  DW_FORM_data2
   2182 DW_AT_const_value  DW_FORM_block1
   2598 DW_AT_const_value  DW_FORM_data1
and with this patch:
readelf -wa obj98/gcc/cc1plus | grep DW_AT_const_value | sort | uniq -c | sort 
-n
  1 DW_AT_const_value  DW_FORM_block2
 71 DW_AT_const_value  DW_FORM_string
311 DW_AT_const_value  DW_FORM_strp
346 DW_AT_const_value  DW_FORM_data8
478 DW_AT_const_value  DW_FORM_block1
495 DW_AT_const_value  DW_FORM_sdata
   1123 DW_AT_const_value  DW_FORM_data2
   1362 DW_AT_const_value  DW_FORM_data4
   4089 DW_AT_const_value  DW_FORM_data1
so the number of these attributes with constant class significantly increased
over block class and the number of attributes increased too.

2017-09-19  Jakub Jelinek  

* dwarf2out.c (tree_add_const_value_attribute): For INTEGER_CST
that fits into uhwi or shwi, add DW_AT_const_value regardless
of early_dwarf without going through RTL, using add_AT_unsigned
or add_AT_int.

* g++.dg/debug/dwarf2/template-params-13.C: New test.
* g++.dg/debug/dwarf2/template-params-5.C: Adjust regexps so that
it doesn't match newlines.
* g++.dg/debug/dwarf2/template-params-3.C: Likewise.
* g++.dg/debug/dwarf2/template-func-params-3.C: Likewise.
* g++.dg/debug/dwarf2/lang-cpp98.C: Likewise.
* g++.dg/debug/dwarf2/template-func-params-2.C: Likewise.
* g++.dg/debug/dwarf2/template-func-params-1.C: Likewise.
* g++.dg/debug/dwarf2/template-func-params-5.C: Likewise.
* g++.dg/debug/dwarf2/template-params-1.C: Likewise.
* g++.dg/debug/dwarf2/template-params-2.C: Likewise.
* g++.dg/debug/dwarf2/lang-cpp14.C: Likewise.
* g++.dg/debug/dwarf2/lang-cpp11.C: Likewise.

--- gcc/dwarf2out.c.jj  2017-09-15 18:11:03.0 +0200
+++ gcc/dwarf2out.c 2017-09-19 16:03:27.678337475 +0200
@@ -19440,6 +19440,19 @@ tree_add_const_value_attribute (dw_die_r
   init = t;
   gcc_assert (!DECL_P (init));
 
+  if (TREE_CODE (init) == INTEGER_CST)
+{
+  if (tree_fits_uhwi_p (init))
+   {
+ add_AT_unsigned (die, DW_AT_const_value, tree_to_uhwi (init));
+ return true;
+   }
+  if (tree_fits_shwi_p (init))
+   {
+ add_AT_int (die, DW_AT_const_value, tree_to_shwi (init));
+ return true;
+   }
+}
   if (! early_dwarf)
 {
   rtl = rtl_for_decl_init (init, type);
--- gcc/testsuite/g++.dg/debug/dwarf2/template-params-5.C.jj	2015-07-09 
11:07:21.0 +0200
+++ gcc/testsuite/g++.dg/debug/dwarf2/template-params-5.C   2017-09-19 
16:29:48.828668182 +0200
@@ -2,7 +2,7 @@
 // Origin PR debug/30161
 // { dg-options "-gdwarf-2 -dA" }
 // { dg-final { scan-assembler "DW_TAG_template_type_param" } }
-// { dg-final { scan-assembler "T.*DW_AT_name" } }
+// { dg-final { scan-assembler "T\[^\n\r]* DW_AT_name" } }
 
 template 
 struct vector
--- gcc/testsuite/g++.dg/debug/dwarf2/template-params-3.C.jj	2015-07-09 
11:07:21.0 +0200
+++ gcc/testsuite/g++.dg/debug/dwarf2/template-params-3.C   2017-09-19 
16:29:33.812855410 +0200
@@ -2,7 +2,7 @@
 // Origin PR debug/30161
 // { dg-options "-gdwarf-2 -dA -gno-strict-dwarf -fno-merge-debug-strings" }
 // { dg-final { scan-assembler "DW_TAG_template_value_param" } }
-// { dg-final { scan-assembler "f.*DW_AT_name" } }
+// { dg-final { scan-assembler "f\[^\n\r]* DW_AT_name" } }
 // { dg-final { scan-assembler 
"DW_AT_location\[^\\r\\n\]*\[\\r\\n\]*\[^\\r\\n\]*DW_OP_addr\[^\\r\\n\]*\[\\r\\n\]*\[^\\r\\n\]*_Z4blehv\[^\\r\\n\]*\[\\r\\n\]*\[^\\r\\n\]*DW_OP_stack_value"
 } } */
 
 typedef void (*func_ptr) ();
--- gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-3.C.jj   
2015-07-09 11:07:21.0 +0200
+++ gcc/testsuite/g++.dg/debug/dwarf2/template-func-params-3.C  2017-09-19 
16:27:59.886026562 +0200
@@ -2,7 +2,7 @@
 // Origin PR debug/30161
 // { dg-options "-gdwarf-2 -dA -gno-strict-dwarf -fno-merge-debug-strings" }
 //

[PATCH] rs6000: Don't touch below the stack pointer (PR77687)

2017-09-19 Thread Segher Boessenkool
With the 32-bit SVR4 ABI we don't have a red zone, so we have to restore
the callee-saved registers before we restore the stack pointer.

The previous fix for this PR failed in two ways, for huge frames: first,
we use a negative offset from r11 in that case, so the (mem:BLK 11) access
does no good; second, sched does not handle accesses to mem:BLK correctly
in this case (does not make dependencies).

This patch fixes it by doing a store to (mem:BLK (scratch)) instead.
This means no unrelated (not to stack) loads/stores can be moved over the
stack restore either, but so be it.

I intend to commit this tomorrow if there are no objections.


Segher


2017-09-19  Segher Boessenkool  

PR target/77687
* config/rs6000/rs6000.md (stack_restore_tie): Store to a scratch
address instead of to r1 and r11.

gcc/testsuite/
PR target/77687
* gcc.target/powerpc/pr77687.c: New testcase.


---
 gcc/config/rs6000/rs6000.md|  6 ++
 gcc/testsuite/gcc.target/powerpc/pr77687.c | 20 
 2 files changed, 22 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr77687.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 7f17628..13ba743 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -13161,14 +13161,12 @@ (define_insn "stack_tie"
 
 ; Some 32-bit ABIs do not have a red zone, so the stack deallocation has to
 ; stay behind all restores from the stack, it cannot be reordered to before
-; one.  See PR77687.  This insn is an add or mr, and a stack_tie on the
-; operands of that.
+; one.  See PR77687.  This insn is an add or mr, and a memory clobber.
 (define_insn "stack_restore_tie"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
(plus:SI (match_operand:SI 1 "gpc_reg_operand" "r,r")
 (match_operand:SI 2 "reg_or_cint_operand" "O,rI")))
-   (set (mem:BLK (match_dup 0)) (const_int 0))
-   (set (mem:BLK (match_dup 1)) (const_int 0))]
+   (set (mem:BLK (scratch)) (const_int 0))]
   "TARGET_32BIT"
   "@
mr %0,%1
diff --git a/gcc/testsuite/gcc.target/powerpc/pr77687.c 
b/gcc/testsuite/gcc.target/powerpc/pr77687.c
new file mode 100644
index 000..95b27ae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr77687.c
@@ -0,0 +1,20 @@
+/* { dg-options "-std=gnu99 -O2" } */
+/* { dg-final { scan-assembler-not {\mmr r?1,r?11\M.*11.*\mblr\M} } } */
+
+/* PR77687: We used to do stack accesses (via r11) after restoring r1.  */
+
+void g(int, char *);
+const char * dum = "hello";
+
+void f(int x)
+{
+   char big[20];
+ start:
+   g(x, big);
+   g(x, big);
+   register void *p asm("r11") = &&start;
+   asm("" : : "r"(p));
+   asm("" : : :"r28");
+   asm("" : : :"r29");
+   asm("" : : :"r30");
+}
-- 
1.8.3.1



Re: [PATCH] Fix PR82255 (vectorizer cost model overcounts some vector load costs)

2017-09-19 Thread Bill Schmidt
On Sep 19, 2017, at 3:58 PM, Richard Sandiford  
wrote:
> 
> Bill Schmidt  writes:
>> Index: gcc/tree-vect-stmts.c
>> ===
>> --- gcc/tree-vect-stmts.c(revision 252760)
>> +++ gcc/tree-vect-stmts.c(working copy)
>> @@ -1091,8 +1091,19 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
>>  prologue_cost_vec, body_cost_vec, true);
>>   if (memory_access_type == VMAT_ELEMENTWISE
>>   || memory_access_type == VMAT_STRIDED_SLP)
>> -inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
>> - stmt_info, 0, vect_body);
>> +{
>> +  int group_size = GROUP_SIZE (stmt_info);
>> +  int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
>> +  if (group_size < nunits)
>> +{
>> +  if (dump_enabled_p ())
>> +dump_printf_loc (MSG_NOTE, vect_location,
>> + "vect_model_load_cost: vec_construct required");
>> +  inside_cost += record_stmt_cost (body_cost_vec, ncopies,
>> +   vec_construct, stmt_info, 0,
>> +   vect_body);
>> +}
>> +}
>> 
>>   if (dump_enabled_p ())
>> dump_printf_loc (MSG_NOTE, vect_location,
> 
> This feels like we've probably got the wrong memory_access_type.
> If it's just a contiguous load then it should be VMAT_CONTIGUOUS
> instead.

I tend to agree -- that will take more surgery to the code that detects strided 
loads in
both the cost model analysis and in vectorizable_load.

Bill

> 
> Thanks,
> Richard
> 
> 



Re: [PATCH][2/2] early LTO debug, main part

2017-09-19 Thread Jakub Jelinek
On Tue, Sep 19, 2017 at 05:24:14PM +0200, Jakub Jelinek wrote:
> These changes broke DWARF-5 support.  E.g. in gcc-7 and before this change
> there was:

Here is a fix, make check-g++ RUNTESTFLAGS=dwarf2.exp now passes again
even with !DWARF2_ASM_LINE_DEBUG_INFO compiler.  Haven't tried LTO (except
for what is in make check).  Bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?

2017-09-19  Jakub Jelinek  

* dwarf2out.c (DEBUG_LTO_DWO_INFO_SECTION): Reorder defines.
(DEBUG_LTO_ABBREV_SECTION): Likewise.
(DEBUG_LTO_MACINFO_SECTION): Likewise.
(DEBUG_MACRO_SECTION): Likewise.
(DEBUG_LTO_MACRO_SECTION): Likewise.
(DEBUG_STR_DWO_SECTION): Likewise.
(DEBUG_LTO_STR_DWO_SECTION): Likewise.
(DEBUG_LTO_LINE_SECTION): Drop .dwo suffix from the name.
(DEBUG_LTO_DWO_LINE_SECTION): Define.
(DEBUG_LTO_LINE_STR_SECTION): Define.
(init_sections_and_labels): Initialize debug_line_str_section
variable.  Initialize debug_loc_section for -gdwarf-5 to
DEBUG_LOCLISTS_SECTION.  Formatting fixes.

--- gcc/dwarf2out.c.jj  2017-09-19 16:51:16.0 +0200
+++ gcc/dwarf2out.c 2017-09-19 18:07:51.348919965 +0200
@@ -3702,24 +3702,24 @@ new_addr_loc_descr (rtx addr, enum dtpre
 #ifndef DEBUG_DWO_INFO_SECTION
 #define DEBUG_DWO_INFO_SECTION ".debug_info.dwo"
 #endif
-#ifndef DEBUG_LTO_DWO_INFO_SECTION
-#define DEBUG_LTO_DWO_INFO_SECTION ".gnu.debuglto_.debug_info.dwo"
-#endif
 #ifndef DEBUG_LTO_INFO_SECTION
 #define DEBUG_LTO_INFO_SECTION ".gnu.debuglto_.debug_info"
 #endif
+#ifndef DEBUG_LTO_DWO_INFO_SECTION
+#define DEBUG_LTO_DWO_INFO_SECTION ".gnu.debuglto_.debug_info.dwo"
+#endif
 #ifndef DEBUG_ABBREV_SECTION
 #define DEBUG_ABBREV_SECTION   ".debug_abbrev"
 #endif
+#ifndef DEBUG_LTO_ABBREV_SECTION
+#define DEBUG_LTO_ABBREV_SECTION ".gnu.debuglto_.debug_abbrev"
+#endif
 #ifndef DEBUG_DWO_ABBREV_SECTION
 #define DEBUG_DWO_ABBREV_SECTION ".debug_abbrev.dwo"
 #endif
 #ifndef DEBUG_LTO_DWO_ABBREV_SECTION
 #define DEBUG_LTO_DWO_ABBREV_SECTION ".gnu.debuglto_.debug_abbrev.dwo"
 #endif
-#ifndef DEBUG_LTO_ABBREV_SECTION
-#define DEBUG_LTO_ABBREV_SECTION ".gnu.debuglto_.debug_abbrev"
-#endif
 #ifndef DEBUG_ARANGES_SECTION
 #define DEBUG_ARANGES_SECTION  ".debug_aranges"
 #endif
@@ -3729,35 +3729,38 @@ new_addr_loc_descr (rtx addr, enum dtpre
 #ifndef DEBUG_MACINFO_SECTION
 #define DEBUG_MACINFO_SECTION ".debug_macinfo"
 #endif
+#ifndef DEBUG_LTO_MACINFO_SECTION
+#define DEBUG_LTO_MACINFO_SECTION  ".gnu.debuglto_.debug_macinfo"
+#endif
 #ifndef DEBUG_DWO_MACINFO_SECTION
 #define DEBUG_DWO_MACINFO_SECTION  ".debug_macinfo.dwo"
 #endif
 #ifndef DEBUG_LTO_DWO_MACINFO_SECTION
 #define DEBUG_LTO_DWO_MACINFO_SECTION  ".gnu.debuglto_.debug_macinfo.dwo"
 #endif
-#ifndef DEBUG_LTO_MACINFO_SECTION
-#define DEBUG_LTO_MACINFO_SECTION  ".gnu.debuglto_.debug_macinfo"
+#ifndef DEBUG_MACRO_SECTION
+#define DEBUG_MACRO_SECTION".debug_macro"
+#endif
+#ifndef DEBUG_LTO_MACRO_SECTION
+#define DEBUG_LTO_MACRO_SECTION ".gnu.debuglto_.debug_macro"
 #endif
 #ifndef DEBUG_DWO_MACRO_SECTION
 #define DEBUG_DWO_MACRO_SECTION".debug_macro.dwo"
 #endif
-#ifndef DEBUG_MACRO_SECTION
-#define DEBUG_MACRO_SECTION".debug_macro"
-#endif
 #ifndef DEBUG_LTO_DWO_MACRO_SECTION
 #define DEBUG_LTO_DWO_MACRO_SECTION".gnu.debuglto_.debug_macro.dwo"
 #endif
-#ifndef DEBUG_LTO_MACRO_SECTION
-#define DEBUG_LTO_MACRO_SECTION ".gnu.debuglto_.debug_macro"
-#endif
 #ifndef DEBUG_LINE_SECTION
 #define DEBUG_LINE_SECTION ".debug_line"
 #endif
+#ifndef DEBUG_LTO_LINE_SECTION
+#define DEBUG_LTO_LINE_SECTION ".gnu.debuglto_.debug_line"
+#endif
 #ifndef DEBUG_DWO_LINE_SECTION
 #define DEBUG_DWO_LINE_SECTION ".debug_line.dwo"
 #endif
-#ifndef DEBUG_LTO_LINE_SECTION
-#define DEBUG_LTO_LINE_SECTION ".gnu.debuglto_.debug_line.dwo"
+#ifndef DEBUG_LTO_DWO_LINE_SECTION
+#define DEBUG_LTO_DWO_LINE_SECTION ".gnu.debuglto_.debug_line.dwo"
 #endif
 #ifndef DEBUG_LOC_SECTION
 #define DEBUG_LOC_SECTION  ".debug_loc"
@@ -3790,18 +3793,18 @@ new_addr_loc_descr (rtx addr, enum dtpre
 #ifndef DEBUG_LTO_DWO_STR_OFFSETS_SECTION
 #define DEBUG_LTO_DWO_STR_OFFSETS_SECTION 
".gnu.debuglto_.debug_str_offsets.dwo"
 #endif
-#ifndef DEBUG_STR_DWO_SECTION
-#define DEBUG_STR_DWO_SECTION   ".debug_str.dwo"
-#endif
-#ifndef DEBUG_LTO_STR_DWO_SECTION
-#define DEBUG_LTO_STR_DWO_SECTION ".gnu.debuglto_.debug_str.dwo"
-#endif
 #ifndef DEBUG_STR_SECTION
 #define DEBUG_STR_SECTION  ".debug_str"
 #endif
 #ifndef DEBUG_LTO_STR_SECTION
 #define DEBUG_LTO_STR_SECTION ".gnu.debuglto_.debug_str"
 #endif
+#ifndef DEBUG_STR_DWO_SECTION
+#define DEBUG_STR_DWO_SECTION   ".debug_str.dwo"
+#endif
+#ifndef DEBUG_LTO_STR_DWO_SECTION
+#define DEBUG_LTO_STR_DWO_SECTION ".gnu.debuglto_.debug_str.dwo"
+#endif
 #ifndef DEBUG_RANGES_SECTION
 #define DEBUG_RANGES_SECTION   ".debug_ranges"
 #endif
@@ -3811,6 +3814,9 @@ new_addr_loc_descr (rtx addr, enum dtpre
 #ifndef DEBUG_LINE_STR_SE

Re: [PATCH][x86] Knights Mill -march/-mtune options

2017-09-19 Thread Uros Bizjak
On Tue, Sep 19, 2017 at 9:01 AM, Peryt, Sebastian
 wrote:

>> >> >> > This patch adds  options -march=/-mtune=knm for Knights Mill.
>> >> >> >
>> >> >> > 2017-09-14  Sebastian Peryt   gcc/
>> >> >> >
>> >> >> > * config.gcc: Support "knm".
>> >> >> > * config/i386/driver-i386.c (host_detect_local_cpu): Detect 
>> >> >> > "knm".
>> >> >> > * config/i386/i386-c.c (ix86_target_macros_internal): Handle
>> >> >> > PROCESSOR_KNM.
>> >> >> > * config/i386/i386.c (m_KNM): Define.
>> >> >> > (processor_target_table): Add "knm".
>> >> >> > (PTA_KNM): Define.
>> >> >> > (ix86_option_override_internal): Add "knm".
>> >> >> > (ix86_issue_rate): Add PROCESSOR_KNM.
>> >> >> > (ix86_adjust_cost): Ditto.
>> >> >> > (ia32_multipass_dfa_lookahead): Ditto.
>> >> >> > (get_builtin_code_for_version): Handle PROCESSOR_KNM.
>> >> >> > (fold_builtin_cpu): Define M_INTEL_KNM.
>> >> >> > * config/i386/i386.h (TARGET_KNM): Define.
>> >> >> > (processor_type): Add PROCESSOR_KNM.
>> >> >> > * config/i386/x86-tune.def: Add m_KNM.
>> >> >> > * doc/invoke.texi: Add knm as x86 -march=/-mtune= CPU type.
>> >> >> >
>> >> >> >
>> >> >> > gcc/testsuite/
>> >> >> >
>> >> >> > * gcc.target/i386/funcspec-5.c: Test knm.
>> >> >> >
>> >> >> > Is it ok for trunk?
>> >> >>
>> >> >> You also have to update libgcc/cpuinfo.h together with
>> >> >> fold_builtin_cpu from i386.c. Please note that all new processor
>> >> >> types and subtypes have to be added at the end of the enum.
>> >> >>
>> >> >
>> >> > Uros,
>> >> >
>> >> > I have updated libgcc/cpuinfo.h and libgcc/cpuinfo.c. I understood
>> >> > that CPU_TYPE_MAX in libgcc/cpuinfo.h processor_types is some kind
>> >> > of barrier, this is why I put KNM before that. Is that correct thinking?
>> >> > As for fold_builtin_cpu in i386.c I already have something like this:
>> >> >
>> >> > @@ -34217,6 +34229,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
>> >> >  M_AMDFAM15H,
>> >> >  M_INTEL_SILVERMONT,
>> >> >  M_INTEL_KNL,
>> >> > +M_INTEL_KNM,
>> >> >  M_AMD_BTVER1,
>> >> >  M_AMD_BTVER2,
>> >> >  M_CPU_SUBTYPE_START,
>> >> > @@ -34262,6 +34275,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
>> >> >{"bonnell", M_INTEL_BONNELL},
>> >> >{"silvermont", M_INTEL_SILVERMONT},
>> >> >{"knl", M_INTEL_KNL},
>> >> > +  {"knm", M_INTEL_KNM},
>> >> >{"amdfam10h", M_AMDFAM10H},
>> >> >{"barcelona", M_AMDFAM10H_BARCELONA},
>> >> >{"shanghai", M_AMDFAM10H_SHANGHAI},
>> >> >
>> >> > I couldn't find any other place where I'm supposed to add anything 
>> >> > extra.
>> >>
>> >> Please look at libgcc/config/i386/cpuinfo.h. The comment here says that:
>> >>
>> >> /* Any new types or subtypes have to be inserted at the end. */
>> >>
>> >> The above patch should then add M_INTEL_KNM as the last entry
>> >> *before* M_CPU_SUBTYPE_START.
>> >>
>> >
>> > Sorry, I didn't notice this value at first. I believe now it's correct.
>>
>> OK for mainline SVN (with updated ChangeLog).
>>
>
> Can you please commit for me?

Please send an updated ChangeLog.

Uros.


RE: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation

2017-09-19 Thread Tsimbalist, Igor V
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> Sent: Tuesday, September 19, 2017 6:13 PM
> To: Tsimbalist, Igor V 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
> 
> On Tue, Sep 19, 2017 at 5:18 PM, Tsimbalist, Igor V
>  wrote:
> >> -Original Message-
> >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> >> ow...@gcc.gnu.org] On Behalf Of Uros Bizjak
> >> Sent: Monday, September 18, 2017 12:17 PM
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: Tsimbalist, Igor V ; Tsimbalist,
> >> Igor V 
> >> Subject: Re: 0006-Part-6.-Add-x86-tests-for-Intel-CET-implementation
> >>
> >> Hello!
> >>
> >> > gcc/testsuite/
> >> >
> >> > * g++.dg/cet-notrack-1.C: New test.
> >> > * gcc.target/i386/cet-intrin-1.c: Likewise.
> >> > * gcc.target/i386/cet-intrin-10.c: Likewise.
> >> > * gcc.target/i386/cet-intrin-2.c: Likewise.
> >> > * gcc.target/i386/cet-intrin-3.c: Likewise.
> >> > * gcc.target/i386/cet-intrin-4.c: Likewise.
> >> > * gcc.target/i386/cet-intrin-5.c: Likewise.
> >> > * gcc.target/i386/cet-intrin-6.c: Likewise.
> >> > * gcc.target/i386/cet-intrin-7.c: Likewise.
> >> > * gcc.target/i386/cet-intrin-8.c: Likewise.
> >> > * gcc.target/i386/cet-intrin-9.c: Likewise.
> >> > * gcc.target/i386/cet-label.c: Likewise.
> >> > * gcc.target/i386/cet-notrack-1a.c: Likewise.
> >> > * gcc.target/i386/cet-notrack-1b.c: Likewise.
> >> > * gcc.target/i386/cet-notrack-2a.c: Likewise.
> >> > * gcc.target/i386/cet-notrack-2b.c: Likewise.
> >> > * gcc.target/i386/cet-notrack-3.c: Likewise.
> >> > * gcc.target/i386/cet-notrack-4a.c: Likewise.
> >> > * gcc.target/i386/cet-notrack-4b.c: Likewise.
> >> > * gcc.target/i386/cet-notrack-5a.c: Likewise.
> >> > * gcc.target/i386/cet-notrack-5b.c: Likewise.
> >> > * gcc.target/i386/cet-notrack-6a.c: Likewise.
> >> > * gcc.target/i386/cet-notrack-6b.c: Likewise.
> >> > * gcc.target/i386/cet-notrack-7.c: Likewise.
> >> > * gcc.target/i386/cet-property-1.c: Likewise.
> >> > * gcc.target/i386/cet-property-2.c: Likewise.
> >> > * gcc.target/i386/cet-rdssp-1.c: Likewise.
> >> > * gcc.target/i386/cet-sjlj-1.c: Likewise.
> >> > * gcc.target/i386/cet-sjlj-2.c: Likewise.
> >> > * gcc.target/i386/cet-sjlj-3.c: Likewise.
> >> > * gcc.target/i386/cet-switch-1.c: Likewise.
> >> > * gcc.target/i386/cet-switch-2.c: Likewise.
> >> > * lib/target-supports.exp (check_effective_target_cet): New proc.
> >>
> >> A couple of questions:
> >>
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -mcet" } */
> >> +/* { dg-final { scan-assembler-times "setssbsy" 2 } } */
> >> +
> >> +#include 
> >> +
> >> +void f1 (void)
> >> +{
> >> +  __builtin_ia32_setssbsy ();
> >> +}
> >> +
> >> +void f2 (void)
> >> +{
> >> +  _setssbsy ();
> >> +}
> >>
> >> Is there a reason that both, __builtin and intrinsic versions are
> >> tested in a couple of places? The intrinsic version is just a wrapper
> >> for __builtin, so IMO testing intrinsic version should be enough.
> > No strong reason. Just to check that intrinsic names are recognized and
> processed correctly.
> > The implementation could change and the test will catch inconsistency.
> > I would also assume a user will use intrinsics that's why I add intrinsic 
> > check.
> Should I remove it?
> 
> Actually, these __builtins are considered as implementation detail, and their
> use should be discouraged. They are deliberately not documented, and users
> should use intrinsic headers instead. That said, builtins won't change without
> a reason, since Ada needs them.
> 
> It can happen that the test fails due to change of intrinsics, so I'd 
> recommend
> to remove them.
Ok, I will remove intrinsic.

> >> diff --git a/gcc/testsuite/gcc.target/i386/cet-rdssp-1.c
> >> b/gcc/testsuite/gcc.target/i386/cet-rdssp-1.c
> >> new file mode 100644
> >> index 000..f9223a5
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/i386/cet-rdssp-1.c
> >> @@ -0,0 +1,39 @@
> >> +/* { dg-do run { target cet } } */
> >> +/* { dg-options "-O2 -finstrument-control-flow -mcet" } */
> >>
> >> The "target cet" directive just checks that CET instructions can be
> compiled.
> >> The test will (probably?) fail on targets with binutils that can
> >> compile CET instructions, but the target itself doesn't support CET.
> >> If this is the case, then check header has to be introduced, so the
> >> test can be bypassed on targets without runtime support.
> > The test will not fail even if a target doesn't support CET as 'rdssp'
> > instruction is a NOP on such target and further usage of CET
> > instruction is bypassed. In this case the code
> >
> > +  ssp = rdssp (ssp);
> >
> > Will keep ssp as 0.
> 
> I assume that this is true also for other runtime tests, and this clears my
> concern about runtime failures with updated binutils.
Yes, that's true for other runtime tests.

Igor

> 
> Uros.


Re: [Patch, Fortran] PR 82018: -Wextra should imply -Wconversion-extra

2017-09-19 Thread Steve Kargl
On Tue, Sep 19, 2017 at 10:26:52PM +0200, Dominique d'Humières wrote:
> 
> I am really upset by the time spent on warning at the expense
> of more serious problems.
> 

https://tinyurl.com/y6wunnk9

https://gcc.gnu.org/ml/fortran/2017-09/msg00065.html

or this is a consequence

https://gcc.gnu.org/ml/gcc-bugs/2017-09/msg01987.html
https://gcc.gnu.org/ml/gcc-bugs/2017-09/msg01866.html
https://gcc.gnu.org/ml/gcc-bugs/2017-09/msg01865.html
https://gcc.gnu.org/ml/gcc-bugs/2017-09/msg01862.html
https://gcc.gnu.org/ml/gcc-bugs/2017-09/msg01854.html
https://gcc.gnu.org/ml/gcc-bugs/2017-09/msg01850.html
...
https://gcc.gnu.org/ml/gcc-bugs/2017-09/msg01842.html

-- 
Steve


Re: [PATCH] Fix PR82255 (vectorizer cost model overcounts some vector load costs)

2017-09-19 Thread Bill Schmidt
On Sep 19, 2017, at 4:21 PM, Bill Schmidt  wrote:
> 
> On Sep 19, 2017, at 3:58 PM, Richard Sandiford  
> wrote:
>> 
>> Bill Schmidt  writes:
>>> Index: gcc/tree-vect-stmts.c
>>> ===
>>> --- gcc/tree-vect-stmts.c   (revision 252760)
>>> +++ gcc/tree-vect-stmts.c   (working copy)
>>> @@ -1091,8 +1091,19 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
>>> prologue_cost_vec, body_cost_vec, true);
>>>  if (memory_access_type == VMAT_ELEMENTWISE
>>>  || memory_access_type == VMAT_STRIDED_SLP)
>>> -inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_construct,
>>> -stmt_info, 0, vect_body);
>>> +{
>>> +  int group_size = GROUP_SIZE (stmt_info);
>>> +  int nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
>>> +  if (group_size < nunits)
>>> +   {
>>> + if (dump_enabled_p ())
>>> +   dump_printf_loc (MSG_NOTE, vect_location,
>>> +"vect_model_load_cost: vec_construct required");
>>> + inside_cost += record_stmt_cost (body_cost_vec, ncopies,
>>> +  vec_construct, stmt_info, 0,
>>> +  vect_body);
>>> +   }
>>> +}
>>> 
>>>  if (dump_enabled_p ())
>>>dump_printf_loc (MSG_NOTE, vect_location,
>> 
>> This feels like we've probably got the wrong memory_access_type.
>> If it's just a contiguous load then it should be VMAT_CONTIGUOUS
>> instead.
> 
> I tend to agree -- that will take more surgery to the code that detects 
> strided loads in
> both the cost model analysis and in vectorizable_load.
> 
It's just a little less clear how to clean this up in vect_analyze_data_refs.  
The
too-simplistic test there now is:

   else if (is_a <loop_vec_info> (vinfo)
   && TREE_CODE (DR_STEP (dr)) != INTEGER_CST)
{
  if (nested_in_vect_loop_p (loop, stmt))
{
  if (dump_enabled_p ())
{
  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
   "not vectorized: not suitable for strided "
   "load ");
  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
}
  return false;
}
  STMT_VINFO_STRIDED_P (stmt_info) = true;
}

Furthermore, this is being set while processing each data reference, not after
all information has been gathered and analyzed.  So this is too early.

Next place to fix this would be in vect_analyze_slp_cost_1, where

  vect_memory_access_type memory_access_type
= (STMT_VINFO_STRIDED_P (stmt_info)
   ? VMAT_STRIDED_SLP
   : VMAT_CONTIGUOUS);

is too simplistic.  I guess we could do something here, but it would require 
unpacking
all the grouped accesses and duplicating all the logic from scratch to 
determine 
whether it's a single load or not.

So it's going to be a little painful to do it "right."  I submitted this patch 
because I felt
it would be the simplest solution.

Thanks,
Bill


> Bill
> 
>> 
>> Thanks,
>> Richard



Re: PING^2: Fwd: SSA range class and removal of VR_ANTI_RANGEs

2017-09-19 Thread Jeff Law
On 07/26/2017 05:20 AM, Richard Biener wrote:
> On Tue, Jul 25, 2017 at 4:50 PM, Andrew MacLeod  wrote:
>> On 07/25/2017 03:12 AM, Richard Biener wrote:
>>>
>>> On Fri, Jul 21, 2017 at 9:30 PM, Aldy Hernandez  wrote:

 On Mon, Jul 17, 2017 at 6:23 AM, Richard Biener
  wrote:
>
> On Mon, Jul 17, 2017 at 8:51 AM, Aldy Hernandez 
> wrote:
>>
>> How does this look?
>
> It's a change that on its own doesn't look worthwhile to me.
>
> So please post the changes that will build ontop of this.  Like removing
> anti-ranges from VRP or your on-demand value-range stuff.
>
> Thanks,
> Richard.

  From the looks of it, we can have a variety of VRP ranges that are not
 representable at all with the an integer range class.  For instance, I
 see the following ranges built in set_value_range():

 [INT, (nop_expr SSA)]

 [INT, (plus_expr SSA INT)]

 [(negate_expr SSA), (negate_expr SSA)]

 [(plus_expr (negate_expr SSA INT)),
   (plus_expr (negate_expr SSA) INT)]

 [SSA, SSA]

 So...I guess the first suggestion is out of the question ;-).
>>>
>>> Well.  We do not want range operations implemented twice (really!)
>>> so range operations need to work on both representations.  So we
>>> should think of a way to get rid of anti-ranges in VRP which, frankly,
>>> means that for the sake of symbolic ranges we have to trade them
>>> with handling open/closed ranges which I'm not sure will be less of
>>> a hassle to handle?
>>
>>
>> I originally looked at ranges with symbolic expressions, but as soon as
>> there are ranges comprised of more than 1 range, symbolics became a
>> nightmare.  At best you can do endpoints, and even then you have to start
>> adding complexity..  [0, 100] Union  [5, x_4]  now has to become  [0,
>> max(100, x_4)] , and chained operations were making the expressions more and
>> more complicated.  Trying to maintain these expression across optimizations
>> was also very problematic as the IL changes and these expressions are not in
>> the IL and don't suffer updates easily.
>>
>> At which point one asks themselves whether it actually makes sense to
>> integrate that into the range representation, or try a different tactic.
>>
>> Seems to me it's cleaner to have an integral range, and when appropriate
>> track symbols separately to determine if their values can be refined.  If
>> at some point you can resolve those symbols to an integral value, then you
>> simply update the integral range with the new range you've determined.
>>
>> VRP chooses to do this by creating a completely different structure for
>> ranges, and representing endpoints as expression trees. It then updates the
>> integral ranges at the end of the pass. It uses anti-ranges as a shorthand
>> to handle a special case of a range comprised of 2 ranges.  As it stands
>> right now, VRP's version of union and intersection can never have much in
>> common with a general integral range implementation. They are 2 very
>> different beasts.
>>
>> So we cant do much about VRP internally yet without some significant
>> surgery.
>>
>> This issue seems orthogonal to me though.  irange is not intended to ever do
>> anything with symbols, thus can never replace VRPs internal value_range
>> class.We're simply replacing "struct range_info_def" with a new range
>> structure and API which can support more precise ranges than a pair of
>> integral endpoints.  We'll then build on that by providing routines which
>> can generate more precise range information as well.
>>
>> For the moment we provide an implementation with the same precision to
>>   a) ensure code generation remains the same
>>   b) allow the compiler to use it for a while to make sure no bugs have been
>> introduced
>>   c) and there is nothing that would utilize the additional precision yet
>> anyway.
>>
>> I just think it's important to get it in the code base so it's being used and
>> we can be sure it's correct.
> 
> But nothing uses all the methods.
> 
> And we're ending up with two variants of range + range, etc.
But I think this is inevitable.  Consider for example floating point
ranges.  There's a belief that a limited form of VRP would be useful
(essentially tracking if the FP value could have the value 0 and perhaps
tracking NaNs as well).  We're not likely to be tracking actual values,
but boolean states about what values an object may or may not have.

I could also well see building another set of evaluation routines which
track zero, nonzero and unknown bits for integer values on top of the
basic framework Andrew is working on.
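
As a minimal sketch of what such known-bits tracking could look like
(hypothetical code, not an actual GCC interface; 'known_bits' and
'known_bits_and' are made up):

  /* For each tracked value, remember which bits are proven 0 and which
     are proven 1; all remaining bits are unknown.  */
  struct known_bits
  {
    unsigned long long known_zero;
    unsigned long long known_one;
  };

  /* Bitwise AND: a result bit is known zero if it is known zero in
     either operand, and known one only if it is known one in both.  */
  static struct known_bits
  known_bits_and (struct known_bits a, struct known_bits b)
  {
    struct known_bits r;
    r.known_zero = a.known_zero | b.known_zero;
    r.known_one = a.known_one & b.known_one;
    return r;
  }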

To some degree symbolics feel the same in that I suspect that we most
often are going to want to work with them as expressions.

Which leads me to wonder if we really need to templateize all the new
code so that we can have a different storage for the ranges and
different ways to extract ranges from operations we see.

Additionally, A

configure erroneously detects eh_frame misoptimization

2017-09-19 Thread Steven Taschuk
The behaviour of echo for arguments containing the two-character
substring `\0` varies among implementations: in coreutils echo,
and in the builtins of ash, bash, busybox sh, csh, and fish, the two
characters `\0` are emitted literally; the builtins of tcsh and zsh
emit a null character and continue rendering the rest of the string;
dash's builtin terminates the string early.

Ubuntu since 6.10 uses dash as /bin/sh, so on such systems,
configure misdiagnoses the assembler as unable to optimize eh_frame
sections correctly, yielding ultimately an unnecessary use of
`--traditional-format` in the driver's invocation of the assembler.

diff -ru gcc-4.2.1-orig/gcc/configure gcc-4.2.1/gcc/configure
--- gcc-4.2.1-orig/gcc/configure2007-01-01 22:44:31.0 -0500
+++ gcc-4.2.1/gcc/configure 2017-09-19 06:50:48.546060400 -0400
@@ -14076,7 +14076,8 @@
 .LSCIE1:
.4byte  0x0
.byte   0x1
-   .ascii "z\0"
+   .ascii "z"
+   .byte   0x0
.byte   0x1
.byte   0x78
.byte   0x1a


Re: configure erroneously detects eh_frame misoptimization

2017-09-19 Thread Andrew Pinski
On Tue, Sep 19, 2017 at 4:42 PM, Steven Taschuk  wrote:
> The behaviour of echo for arguments containing the two-character
> substring `\0` varies among implementations: in coreutils echo,
> and in the builtins of ash, bash, busybox sh, csh, and fish, the two
> characters `\0` are emitted literally; the builtins of tcsh and zsh
> emit a null character and continue rendering the rest of the string;
> dash's builtin terminates the string early.
>
> Ubuntu since 6.10 uses dash as /bin/sh, so on such systems,
> configure misdiagnoses the assembler as unable to optimize eh_frame
> sections correctly, yielding ultimately an unnecessary use of
> `--traditional-format` in the driver's invocation of the assembler.


configure is autogenerated, so you should modify configure.ac and
regenerate configure.
I think overall this patch is the correct approach to fix this bug.  You
should provide a ChangeLog and make the patch against the trunk of GCC
rather than against a release (especially a release from a branch that
is no longer supported).

Please see https://gcc.gnu.org/contribute.html for some more information
on all of the requirements when it comes to submitting a patch.  I
provided some of the requirements above.  Note that since this is a small
enough patch, the legal requirement is waived, but if you are thinking
about contributing in the future, you should think about getting a
copyright assignment signed if you don't already have one.

Thanks,
Andrew

>
> diff -ru gcc-4.2.1-orig/gcc/configure gcc-4.2.1/gcc/configure
> --- gcc-4.2.1-orig/gcc/configure2007-01-01 22:44:31.0 -0500
> +++ gcc-4.2.1/gcc/configure 2017-09-19 06:50:48.546060400 -0400
> @@ -14076,7 +14076,8 @@
>  .LSCIE1:
> .4byte  0x0
> .byte   0x1
> -   .ascii "z\0"
> +   .ascii "z"
> +   .byte   0x0
> .byte   0x1
> .byte   0x78
> .byte   0x1a


libgo patch committed: Restore "goroutine in C code" message

2017-09-19 Thread Ian Lance Taylor
In the 1.9 upgrade of libgo I took out the word "goroutine" from a
traceback showing a goroutine running in C code, to let
TestCgoNumGoroutine pass.  However, it turns out that some code is
actually checking for that string; for example,
https://github.com/grpc/grpc-go/blob/master/test/leakcheck/leakcheck.go#L44
So keep the message the same, and change the test.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 252953)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-be69546afcac182cc93c569bc96665f0ef72d66a
+5fb74cd7192123a9ea06dcae0d5d8d0b3538db8f
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/testdata/testprogcgo/numgoroutine.go
===
--- libgo/go/runtime/testdata/testprogcgo/numgoroutine.go   (revision 
252767)
+++ libgo/go/runtime/testdata/testprogcgo/numgoroutine.go   (working copy)
@@ -75,6 +75,7 @@ func checkNumGoroutine(label string, wan
sbuf := make([]byte, 32<<10)
sbuf = sbuf[:runtime.Stack(sbuf, true)]
n = strings.Count(string(sbuf), "goroutine ")
+   n -= strings.Count(string(sbuf), "goroutine in C code")
if n != want {
fmt.Printf("%s Stack: want %d; got %d:\n%s\n", label, want, n, 
string(sbuf))
return "", false
Index: libgo/go/runtime/traceback_gccgo.go
===
--- libgo/go/runtime/traceback_gccgo.go (revision 252767)
+++ libgo/go/runtime/traceback_gccgo.go (working copy)
@@ -216,7 +216,7 @@ func tracebackothers(me *g) {
print("\tgoroutine running on other thread; stack 
unavailable\n")
printcreatedby(gp)
} else if readgstatus(gp)&^_Gscan == _Gsyscall {
-   print("\tin C code; stack unavailable\n")
+   print("\tgoroutine in C code; stack unavailable\n")
printcreatedby(gp)
} else {
gp.traceback = &tb


Re: Merge from trunk to gccgo branch

2017-09-19 Thread Ian Lance Taylor
On Mon, Sep 18, 2017 at 4:13 PM, Ian Lance Taylor  wrote:
> On Mon, Sep 18, 2017 at 3:24 PM, Ian Lance Taylor  wrote:
>> I merged revision 252949 from trunk to the gccgo branch.
>
> Missed a patch.  Merged revision 252954 to the gccgo branch.

And now merged revision 252993 to the gccgo branch.

Ian


[committed][PATCH] Stack clash protection patch 01/08 - V4

2017-09-19 Thread Jeff Law

This patch introduces the new option -fstack-clash-protection and a
couple params that can be used to control its behavior.  I think the
only change of significance since V3 is the params are in powers of 2
rather than byte counts per Richi's request.
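
As a usage sketch (an illustrative invocation only; 'foo.c' is a
made-up file, and 12 means 2^12 = 4096 bytes, matching the documented
default guard size):

  gcc -O2 -fstack-clash-protection \
      --param stack-clash-protection-guard-size=12 \
      --param stack-clash-protection-probe-interval=12 \
      foo.c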

The controlling predicate for the testsuite currently returns false for
all targets.  I'll enable each target in that routine as the backend
code for the target is committed.

I hope I'll get as far as the x86 bits tonight.

Bootstrapped and regression tested on x86_64.  Installing on the trunk.

Jeff
commit 7383fdfe62020c8ce568f07414f1a20f019277f5
Author: Jeff Law 
Date:   Thu Jun 29 16:36:21 2017 -0400

* common.opt (-fstack-clash-protection): New option.
* flag-types.h (enum stack_check_type): Note difference between
-fstack-check= and -fstack-clash-protection.
* params.def (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE): New PARAM.
(PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL): Likewise.
* toplev.c (process_options): Issue warnings/errors for cases
not handled with -fstack-clash-protection.
* doc/invoke.texi (-fstack-clash-protection): Document new option.
(-fstack-check): Note additional problem with -fstack-check=generic.
Note that -fstack-check is primarily for Ada and refer users
to -fstack-clash-protection for stack-clash-protection.
Document new params for stack clash protection.

* gcc.dg/stack-check-2.c: New test.
* lib/target-supports.exp
(check_effective_target_supports_stack_clash_protection): New 
function.
(check_effective_target_frame_pointer_for_non_leaf): Likewise.
(check_effective_target_caller_implicit_probes): Likewise.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9576c8c4f47..a1aa3f6327d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2017-09-19  Jeff Law  
+
+   * common.opt (-fstack-clash-protection): New option.
+   * flag-types.h (enum stack_check_type): Note difference between
+   -fstack-check= and -fstack-clash-protection.
+   * params.def (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE): New PARAM.
+   (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL): Likewise.
+   * toplev.c (process_options): Issue warnings/errors for cases
+   not handled with -fstack-clash-protection.
+   * doc/invoke.texi (-fstack-clash-protection): Document new option.
+   (-fstack-check): Note additional problem with -fstack-check=generic.
+   Note that -fstack-check is primarily for Ada and refer users
+   to -fstack-clash-protection for stack-clash-protection.
+   Document new params for stack clash protection.
+
 2017-09-19  Uros Bizjak  
 
* config/i386/i386.md (*scc_bt): New insn_and_split pattern.
diff --git a/gcc/common.opt b/gcc/common.opt
index fa6dd845d54..f22661cfbba 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2320,13 +2320,18 @@ Common Report Var(flag_variable_expansion_in_unroller) 
Optimization
 Apply variable expansion when loops are unrolled.
 
 fstack-check=
-Common Report RejectNegative Joined
+Common Report RejectNegative Joined Optimization
 -fstack-check=[no|generic|specific]	Insert stack checking code into the program.
 
 fstack-check
 Common Alias(fstack-check=, specific, no)
 Insert stack checking code into the program.  Same as -fstack-check=specific.
 
+fstack-clash-protection
+Common Report Var(flag_stack_clash_protection) Optimization
+Insert code to probe each page of stack space as it is allocated to protect
+from stack-clash style attacks.
+
 fstack-limit
 Common Var(common_deferred_options) Defer
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 204c9b77b61..f0f9559b024 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10187,6 +10187,21 @@ compilation without.  The value for compilation with 
profile feedback
 needs to be more conservative (higher) in order to make tracer
 effective.
 
+@item stack-clash-protection-guard-size
+Specify the size of the operating system provided stack guard as
+2 raised to @var{num} bytes.  The default value is 12 (4096 bytes).
+Acceptable values are between 12 and 30.  Higher values may reduce the
+number of explicit probes, but a value larger than the operating system
+provided guard will leave code vulnerable to stack clash style attacks.
+
+@item stack-clash-protection-probe-interval
+Stack clash protection involves probing stack space as it is allocated.  This
+param controls the maximum distance between probes into the stack as 2 raised
+to @var{num} bytes.  Acceptable values are between 10 and 16 and defaults to
+12.  Higher values may reduce the number of explicit probes, but a value
+larger than the operating system provided guard will leave code vulnerable to
+stack clash style attacks.
+
 @item max-cse-path-length
 
 The maximum number of basic blocks on path that CSE consider

[committed][PATCH] Stack clash protection patch 02/08 - V4

2017-09-19 Thread Jeff Law

The main purpose of this patch is to protect dynamic stack allocations
from stack-clash attacks.  I think the only notable change since V3 was
to use a target hook instead of a target macro to control a bit of
special behavior we're going to want on aarch64.
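
For reference, the kind of code this protects is any dynamic stack allocation;
a minimal sketch (illustrative only, not the new gcc.dg/stack-check-3.c test):

  #include <alloca.h>

  extern void consume (void *);

  void
  dynamic_alloc (unsigned n)
  {
    /* Both the alloca call and the VLA are expanded through
       allocate_dynamic_stack_space, so with -fstack-clash-protection the
       space is probed as it is allocated rather than only afterwards.  */
    void *p = alloca (n);
    char vla[n];
    consume (p);
    consume (vla);
  }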

Bootstrapped and regression tested on x86.  Installing on the trunk.

Jeff
commit feb64697b00b4a52346eaa3b7f349a0636936f7c
Author: Jeff Law 
Date:   Wed Jun 28 14:02:16 2017 -0400

2017-09-18  Jeff Law  

* explow.c: Include "params.h".
(anti_adjust_stack_and_probe_stack_clash): New function.
(get_stack_check_protect): Likewise.
(compute_stack_clash_protection_loop_data): Likewise.
(emit_stack_clash_protection_loop_start): Likewise.
(emit_stack_clash_protection_loop_end): Likewise.
(allocate_dynamic_stack_space): Use get_stack_check_protect.
Use anti_adjust_stack_and_probe_stack_clash.
* explow.h (compute_stack_clash_protection_loop_data): Prototype.
(emit_stack_clash_protection_loop_start): Likewise.
(emit_stack_clash_protection_loop_end): Likewise.
* rtl.h (get_stack_check_protect): Prototype.
* target.def (stack_clash_protection_final_dynamic_probe): New hook.
* targhooks.c (default_stack_clash_protection_final_dynamic_probe): 
New.
* targhooks.h (default_stack_clash_protection_final_dynamic_probe):
Prototype.
* doc/tm.texi.in 
(TARGET_STACK_CLASH_PROTECTION_FINAL_DYNAMIC_PROBE):
Add @hook.
* doc/tm.texi: Rebuilt.
* config/aarch64/aarch64.c (aarch64_expand_prologue): Use
get_stack_check_protect.
* config/alpha/alpha.c (alpha_expand_prologue): Likewise.
* config/arm/arm.c (arm_expand_prologue): Likewise.
(arm_frame_pointer_required): Likewise.
* config/i386/i386.c (ix86_expand_prologue): Likewise.
* config/ia64/ia64.c (ia64_expand_prologue): Likewise.
* config/mips/mips.c (mips_expand_prologue): Likewise.
* config/powerpcspe/powerpcspe.c (rs6000_emit_prologue): Likewise.
* config/rs6000/rs6000.c (rs6000_emit_prologue): Likewise.
* config/sparc/sparc.c (sparc_expand_prologue): Likewise.
(sparc_flat_expand_prologue): Likewise.

* gcc.dg/stack-check-3.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a1aa3f6327d..b8012611405 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,37 @@
 2017-09-19  Jeff Law  
 
+   * explow.c: Include "params.h".
+   (anti_adjust_stack_and_probe_stack_clash): New function.
+   (get_stack_check_protect): Likewise.
+   (compute_stack_clash_protection_loop_data): Likewise.
+   (emit_stack_clash_protection_loop_start): Likewise.
+   (emit_stack_clash_protection_loop_end): Likewise.
+   (allocate_dynamic_stack_space): Use get_stack_check_protect.
+   Use anti_adjust_stack_and_probe_stack_clash.
+   * explow.h (compute_stack_clash_protection_loop_data): Prototype.
+   (emit_stack_clash_protection_loop_start): Likewise.
+   (emit_stack_clash_protection_loop_end): Likewise.
+   * rtl.h (get_stack_check_protect): Prototype.
+   * target.def (stack_clash_protection_final_dynamic_probe): New hook.
+   * targhooks.c (default_stack_clash_protection_final_dynamic_probe): New.
+   * targhooks.h (default_stack_clash_protection_final_dynamic_probe): 
+   Prototype.
+   * doc/tm.texi.in (TARGET_STACK_CLASH_PROTECTION_FINAL_DYNAMIC_PROBE):
+   Add @hook.
+   * doc/tm.texi: Rebuilt.
+   * config/aarch64/aarch64.c (aarch64_expand_prologue): Use
+   get_stack_check_protect.
+   * config/alpha/alpha.c (alpha_expand_prologue): Likewise.
+   * config/arm/arm.c (arm_expand_prologue): Likewise.
+   (arm_frame_pointer_required): Likewise.
+   * config/i386/i386.c (ix86_expand_prologue): Likewise.
+   * config/ia64/ia64.c (ia64_expand_prologue): Likewise.
+   * config/mips/mips.c (mips_expand_prologue): Likewise.
+   * config/powerpcspe/powerpcspe.c (rs6000_emit_prologue): Likewise.
+   * config/rs6000/rs6000.c (rs6000_emit_prologue): Likewise.
+   * config/sparc/sparc.c (sparc_expand_prologue): Likewise.
+   (sparc_flat_expand_prologue): Likewise.
+
* common.opt (-fstack-clash-protection): New option.
* flag-types.h (enum stack_check_type): Note difference between
-fstack-check= and -fstack-clash-protection.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1c1400871d4..5e26cb70da0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3666,12 +3666,14 @@ aarch64_expand_prologue (void)
 {
   if (crtl->is_leaf && !cfun->calls_alloca)
{
- if (frame_size > PROBE_INTERVAL && frame_size > STACK_CHECK_PROTECT)
-   aar

[committed][PATCH] Stack clash protection patch 02b/08 - V4

2017-09-19 Thread Jeff Law

The only change since V3 of this patch was to fix a typo in the sparc.c
changes.

The purpose of this patch is to allow targets that are not going to have
stack-clash protected prologues to use their -fstack-check=specific
prologue support to provide a degree of protection for their prologue
stack allocations.

This affects alpha, arm, ia64, mips, sparc and spu.  I have not
bootstrapped this patch, though I have built a cross compiler for each
target to verify that it builds.

Installing on the trunk.

Jeff
commit 42565dde29b63f74e92194a1cb5aa0edb33916a4
Author: Jeff Law 
Date:   Tue Jul 18 11:10:16 2017 -0400

* config/alpha/alpha.c (alpha_expand_prologue): Also check
flag_stack_clash_protection.
* config/arm/arm.c (arm_compute_static_chain_stack_bytes): Likewise.
(arm_expand_prologue, thumb1_expand_prologue): Likewise.
(arm_frame_pointer_required): Likewise.
* config/ia64/ia64.c (ia64_compute_frame_size): Likewise.
(ia64_expand_prologue): Likewise.
* config/mips/mips.c (mips_expand_prologue): Likewise.
* config/powerpcspe/powerpcspe.c (rs6000_expand_prologue): Likewise.
* config/sparc/sparc.c (sparc_expand_prologue): Likewise.
(sparc_flat_expand_prologue): Likewise.
* config/spu/spu.c (spu_expand_prologue): Likewise.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b8012611405..070692c5053 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,18 @@
 2017-09-19  Jeff Law  
 
+   * config/alpha/alpha.c (alpha_expand_prologue): Also check
+   flag_stack_clash_protection.
+   * config/arm/arm.c (arm_compute_static_chain_stack_bytes): Likewise.
+   (arm_expand_prologue, thumb1_expand_prologue): Likewise.
+   (arm_frame_pointer_required): Likewise.
+   * config/ia64/ia64.c (ia64_compute_frame_size): Likewise.
+   (ia64_expand_prologue): Likewise.
+   * config/mips/mips.c (mips_expand_prologue): Likewise.
+   * config/powerpcspe/powerpcspe.c (rs6000_expand_prologue): Likewise.
+   * config/sparc/sparc.c (sparc_expand_prologue): Likewise.
+   (sparc_flat_expand_prologue): Likewise.
+   * config/spu/spu.c (spu_expand_prologue): Likewise.
+
* explow.c: Include "params.h".
(anti_adjust_stack_and_probe_stack_clash): New function.
(get_stack_check_protect): Likewise.
diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index a4e8b2b6c30..41f3e3a1957 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -7760,7 +7760,7 @@ alpha_expand_prologue (void)
  Note that we are only allowed to adjust sp once in the prologue.  */
 
   probed_size = frame_size;
-  if (flag_stack_check)
+  if (flag_stack_check || flag_stack_clash_protection)
 probed_size += get_stack_check_protect ();
 
   if (probed_size <= 32768)
@@ -7775,7 +7775,7 @@ alpha_expand_prologue (void)
  /* We only have to do this probe if we aren't saving registers or
 if we are probing beyond the frame because of -fstack-check.  */
  if ((sa_size == 0 && probed_size > probed - 4096)
- || flag_stack_check)
+ || flag_stack_check || flag_stack_clash_protection)
emit_insn (gen_probe_stack (GEN_INT (-probed_size)));
}
 
@@ -7805,7 +7805,8 @@ alpha_expand_prologue (void)
 late in the compilation, generate the loop as a single insn.  */
   emit_insn (gen_prologue_stack_probe_loop (count, ptr));
 
-  if ((leftover > 4096 && sa_size == 0) || flag_stack_check)
+  if ((leftover > 4096 && sa_size == 0)
+ || flag_stack_check || flag_stack_clash_protection)
{
  rtx last = gen_rtx_MEM (DImode,
  plus_constant (Pmode, ptr, -leftover));
@@ -7813,7 +7814,7 @@ alpha_expand_prologue (void)
  emit_move_insn (last, const0_rtx);
}
 
-  if (flag_stack_check)
+  if (flag_stack_check || flag_stack_clash_protection)
{
  /* If -fstack-check is specified we have to load the entire
 constant into a register and subtract from the sp in one go,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 679e838b0aa..d97f88719c5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -19164,7 +19164,8 @@ arm_compute_static_chain_stack_bytes (void)
   /* See the defining assertion in arm_expand_prologue.  */
   if (IS_NESTED (arm_current_func_type ())
   && ((TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
- || (flag_stack_check == STATIC_BUILTIN_STACK_CHECK
+ || ((flag_stack_check == STATIC_BUILTIN_STACK_CHECK
+  || flag_stack_clash_protection)
  && !df_regs_ever_live_p (LR_REGNUM)))
   && arm_r3_live_at_start_p ()
   && crtl->args.pretend_args_size == 0)
@@ -21466,7 +21467,8 @@ arm_expand_prologue (void)
  clobbered when creating the frame, we need to save and restore it.  */
  

[committed][PATCH] Stack clash protection 03/08 - V4

2017-09-19 Thread Jeff Law

This patch just provides some generic logging facilities that will be
used by the backends to verify their behavior at a high level.  I don't
think it changed since V3.
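
As a rough sketch of how a backend is expected to use the new interface (the
wrapper function below is invented; the enum values and the signature of
dump_stack_clash_frame_info come from the patch), a prologue expander reports
which probing strategy it chose:

  /* Illustrative only -- not code from any target in this series.  */
  static void
  example_report_probes (HOST_WIDE_INT size,
                         HOST_WIDE_INT guard_size,
                         HOST_WIDE_INT probe_interval)
  {
    if (size == 0)
      dump_stack_clash_frame_info (NO_PROBE_NO_FRAME, false);
    else if (size < guard_size)
      /* Allocation fits within the guard; allocated but not probed.  */
      dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true);
    else if (size <= 4 * probe_interval)
      dump_stack_clash_frame_info (PROBE_INLINE, size % probe_interval != 0);
    else
      dump_stack_clash_frame_info (PROBE_LOOP, size % probe_interval != 0);
  }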

This patch was bootstrapped and regression tested with the next patch
(x86 bits).  It was not bootstrapped and regression tested on its own.

Committing to the trunk.

Jeff
commit 02a09d2cdd5c3e1a6b3017587a28ee5d788705b7
Author: law 
Date:   Wed Sep 20 05:23:51 2017 +

* function.c (dump_stack_clash_frame_info): New function.
* function.h (dump_stack_clash_frame_info): Prototype.
(enum stack_clash_probes): New enum.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@252997 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 070692c5053..97833130937 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,9 @@
 2017-09-19  Jeff Law  
 
+* function.c (dump_stack_clash_frame_info): New function.
+* function.h (dump_stack_clash_frame_info): Prototype.
+(enum stack_clash_probes): New enum.
+
* config/alpha/alpha.c (alpha_expand_prologue): Also check
flag_stack_clash_protection.
* config/arm/arm.c (arm_compute_static_chain_stack_bytes): Likewise.
diff --git a/gcc/function.c b/gcc/function.c
index 3ae5a3afc0b..c03e2ac5142 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5681,6 +5681,58 @@ get_arg_pointer_save_area (void)
   return ret;
 }
 
+
+/* If debugging dumps are requested, dump information about how the
+   target handled -fstack-check=clash for the prologue.
+
+   PROBES describes what if any probes were emitted.
+
+   RESIDUALS indicates if the prologue had any residual allocation
+   (i.e. total allocation was not a multiple of PROBE_INTERVAL).  */
+
+void
+dump_stack_clash_frame_info (enum stack_clash_probes probes, bool residuals)
+{
+  if (!dump_file)
+return;
+
+  switch (probes)
+{
+case NO_PROBE_NO_FRAME:
+  fprintf (dump_file,
+  "Stack clash no probe no stack adjustment in prologue.\n");
+  break;
+case NO_PROBE_SMALL_FRAME:
+  fprintf (dump_file,
+  "Stack clash no probe small stack adjustment in prologue.\n");
+  break;
+case PROBE_INLINE:
+  fprintf (dump_file, "Stack clash inline probes in prologue.\n");
+  break;
+case PROBE_LOOP:
+  fprintf (dump_file, "Stack clash probe loop in prologue.\n");
+  break;
+}
+
+  if (residuals)
+fprintf (dump_file, "Stack clash residual allocation in prologue.\n");
+  else
+fprintf (dump_file, "Stack clash no residual allocation in prologue.\n");
+
+  if (frame_pointer_needed)
+fprintf (dump_file, "Stack clash frame pointer needed.\n");
+  else
+fprintf (dump_file, "Stack clash no frame pointer needed.\n");
+
+  if (TREE_THIS_VOLATILE (cfun->decl))
+fprintf (dump_file,
+"Stack clash noreturn prologue, assuming no implicit"
+" probes in caller.\n");
+  else
+fprintf (dump_file,
+"Stack clash not noreturn prologue.\n");
+}
+
 /* Add a list of INSNS to the hash HASHP, possibly allocating HASHP
for the first time.  */
 
diff --git a/gcc/function.h b/gcc/function.h
index 91e01383252..76434cd59a5 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -551,6 +551,14 @@ do {   
\
   ((TARGET_PTRMEMFUNC_VBIT_LOCATION == ptrmemfunc_vbit_in_pfn)  \
? MAX (FUNCTION_BOUNDARY, 2 * BITS_PER_UNIT) : FUNCTION_BOUNDARY)
 
+enum stack_clash_probes {
+  NO_PROBE_NO_FRAME,
+  NO_PROBE_SMALL_FRAME,
+  PROBE_INLINE,
+  PROBE_LOOP
+};
+
+extern void dump_stack_clash_frame_info (enum stack_clash_probes, bool);
 
 
 extern void push_function_context (void);


[committed][PATCH] Stack clash protection patch 04/08 - V4

2017-09-19 Thread Jeff Law

This patch introduces the x86 stack clash protected prologue support as
well as the tests.  I believe the only change since V3 was the more
aggressive introduction of scheduling barriers.  It also enables the
stack-clash tests for x86 targets.

The scheduling barriers prevent the memory dependency breaking bits in
the scheduler from rewriting and rearranging the allocations and probes.
While I haven't seen problems from that on x86, I did see the scheduler
muck things up on aarch64, and there's a test for that in the aarch64-specific
patches.  Better to be safe here and just emit the scheduling barriers.
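
To make the size thresholds in ix86_adjust_stack_and_probe_stack_clash
concrete, here is a rough sketch (the functions and sizes are illustrative and
assume the default 4096-byte guard and probe interval; the real frame sizes
also include register saves and alignment):

  extern void consume (char *);

  void
  small (void)    /* frame below the guard size: allocated, not probed */
  {
    char buf[2048];
    consume (buf);
  }

  void
  medium (void)   /* up to 4 probe intervals: inline probes */
  {
    char buf[3 * 4096];
    consume (buf);
  }

  void
  large (void)    /* beyond 4 probe intervals: a probe loop */
  {
    char buf[64 * 4096];
    consume (buf);
  }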

Bootstrapped and regression tested on x86.  Installing on the trunk.

Jeff
commit 57e17e31cb358fcd0d7cea8264b5063762ab3971
Author: law 
Date:   Wed Sep 20 05:35:07 2017 +

* config/i386/i386.c (ix86_adjust_stack_and_probe_stack_clash): New.
(ix86_expand_prologue): Dump stack clash info as needed.
Call ix86_adjust_stack_and_probe_stack_clash as needed.

* gcc.dg/stack-check-4.c: New test.
* gcc.dg/stack-check-5.c: New test.
* gcc.dg/stack-check-6.c: New test.
* gcc.dg/stack-check-6a.c: New test.
* gcc.dg/stack-check-7.c: New test.
* gcc.dg/stack-check-8.c: New test.
* gcc.dg/stack-check-9.c: New test.
* gcc.dg/stack-check-10.c: New test.
* lib/target-supports.exp
(check_effective_target_supports_stack_clash_protection): Enable for
x86 and x86_64 targets.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@252998 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 97833130937..f9d3a419f1a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,9 @@
 2017-09-19  Jeff Law  
 
+   * config/i386/i386.c (ix86_adjust_stack_and_probe_stack_clash): New.
+   (ix86_expand_prologue): Dump stack clash info as needed.
+   Call ix86_adjust_stack_and_probe_stack_clash as needed.
+
 * function.c (dump_stack_clash_frame_info): New function.
 * function.h (dump_stack_clash_frame_info): Prototype.
 (enum stack_clash_probes): New enum.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 05b0520759c..fdfe5951ca1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13924,6 +13924,147 @@ release_scratch_register_on_entry (struct scratch_reg 
*sr)
 
 #define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP)
 
+/* Emit code to adjust the stack pointer by SIZE bytes while probing it.
+
+   This differs from the next routine in that it tries hard to prevent
+   attacks that jump the stack guard.  Thus it is never allowed to allocate
+   more than PROBE_INTERVAL bytes of stack space without a suitable
+   probe.  */
+
+static void
+ix86_adjust_stack_and_probe_stack_clash (const HOST_WIDE_INT size)
+{
+  struct machine_function *m = cfun->machine;
+
+  /* If this function does not statically allocate stack space, then
+ no probes are needed.  */
+  if (!size)
+{
+  dump_stack_clash_frame_info (NO_PROBE_NO_FRAME, false);
+  return;
+}
+
+  /* If we are a noreturn function, then we have to consider the
+ possibility that we're called via a jump rather than a call.
+
+ Thus we don't have the implicit probe generated by saving the
+ return address into the stack at the call.  Thus, the stack
+ pointer could be anywhere in the guard page.  The safe thing
+ to do is emit a probe now.
+
+ ?!? This should be revamped to work like aarch64 and s390 where
+ we track the offset from the most recent probe.  Normally that
+ offset would be zero.  For a non-return function we would reset
+ it to PROBE_INTERVAL - (STACK_BOUNDARY / BITS_PER_UNIT).   Then
+ we just probe when we cross PROBE_INTERVAL.  */
+  if (TREE_THIS_VOLATILE (cfun->decl))
+{
+  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
+  -GET_MODE_SIZE (word_mode)));
+  emit_insn (gen_blockage ());
+}
+
+  /* If we allocate less than the size of the guard statically,
+ then no probing is necessary, but we do need to allocate
+ the stack.  */
+  if (size < (1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE)))
+{
+  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+GEN_INT (-size), -1,
+m->fs.cfa_reg == stack_pointer_rtx);
+  dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true);
+  return;
+}
+
+  /* We're allocating a large enough stack frame that we need to
+ emit probes.  Either emit them inline or in a loop depending
+ on the size.  */
+  HOST_WIDE_INT probe_interval
+= 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
+  if (size <= 4 * probe_interval)
+{
+  HOST_WIDE_INT i;
+  for (i = probe_interval; i <= size; i += probe_inte

[committed][PATCH] Stack clash protection 05/08 - V4

2017-09-19 Thread Jeff Law

This patch introduces a new note we can add to insns, REG_STACK_CHECK,
which is attached to a stack allocation to tell the optimizers that the
allocation should not be combined with other allocations or otherwise
modified.  Since V3 the scheduler was also modified to honor
REG_STACK_CHECK.

I've run into two instances of this problem.  The first was in
combine-stack-adj on x86: essentially it merged two stack allocations and
rewrote the actual probes.  The net result was that the code was vulnerable
if an async signal was delivered between the allocation point and the
probing point.
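
A minimal sketch of the combine-stack-adj hazard (this is not the committed
gcc.target/i386/stack-check-11.c; the function and sizes are made up):

  extern void consume (char *);

  void
  two_intervals (void)
  {
    /* Roughly two probe intervals of locals.  The protected prologue emits
       a stack adjustment plus a probe for each interval; merging those
       adjustments into one large one, probed only afterwards, would reopen
       the async-signal window described above.  */
    char buf[2 * 4096];
    consume (buf);
  }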

The second instance, which I saw on aarch64, was the scheduler doing
dependency breaking, which had the same ultimate effect.

This patch addresses both by marking the relevant insn with the new note
and having the passes appropriately ignore insns with that note.

It includes a test for the combine-stack-adj instance of this problem.
The aarch64 patches include a test for the scheduler instance.


Bootstrapped and regression tested on x86_64.  Installing on the trunk.

Note I'm stopping here for the night.  s390 is fully ack'd, but needs
another round of testing which will be running overnight.  ppc and
aarch64 are close, but not fully ack'd yet.

Jeff
commit e5b97fee1ab41413bf44623c397a9569dfc1f39e
Author: Jeff Law 
Date:   Fri Jul 21 01:03:08 2017 -0400

* combine-stack-adj.c (combine_stack_adjustments_for_block): Do
nothing for stack adjustments with REG_STACK_CHECK.
* sched-deps.c (parse_add_or_inc): Reject insns with
REG_STACK_CHECK from dependency breaking.
* config/i386/i386.c (pro_epilogue_adjust_stack): Return insn.
(ix86_adjust_stack_and_probe_stack_clash): Add REG_STACK_CHECK notes.
* reg-notes.def (STACK_CHECK): New note.

* gcc.target/i386/stack-check-11.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f9d3a419f1a..8f3485bfc1d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,13 @@
 2017-09-19  Jeff Law  
 
+   * combine-stack-adj.c (combine_stack_adjustments_for_block): Do
+   nothing for stack adjustments with REG_STACK_CHECK.
+   * sched-deps.c (parse_add_or_inc): Reject insns with
+   REG_STACK_CHECK from dependency breaking.
+   * config/i386/i386.c (pro_epilogue_adjust_stack): Return insn.
+   (ix86_adjust_stack_and_probe_stack_clash): Add REG_STACK_CHECK notes.
+   * reg-notes.def (STACK_CHECK): New note.
+
* config/i386/i386.c (ix86_adjust_stack_and_probe_stack_clash): New.
(ix86_expand_prologue): Dump stack clash info as needed.
Call ix86_adjust_stack_and_probe_stack_clash as needed.
diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c
index 9ec14a3e443..82d6dba856f 100644
--- a/gcc/combine-stack-adj.c
+++ b/gcc/combine-stack-adj.c
@@ -508,6 +508,8 @@ combine_stack_adjustments_for_block (basic_block bb)
continue;
 
   set = single_set_for_csa (insn);
+  if (set && find_reg_note (insn, REG_STACK_CHECK, NULL_RTX))
+   set = NULL_RTX;
   if (set)
{
  rtx dest = SET_DEST (set);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index fdfe5951ca1..d19c770d4cb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13547,7 +13547,7 @@ ix86_add_queued_cfa_restore_notes (rtx insn)
zero if %r11 register is live and cannot be freely used and positive
otherwise.  */
 
-static void
+static rtx
 pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
   int style, bool set_cfa)
 {
@@ -13638,6 +13638,7 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx 
offset,
   m->fs.sp_valid = valid;
   m->fs.sp_realigned = realigned;
 }
+  return insn;
 }
 
 /* Find an available register to be used as dynamic realign argument
@@ -13987,9 +13988,11 @@ ix86_adjust_stack_and_probe_stack_clash (const 
HOST_WIDE_INT size)
   for (i = probe_interval; i <= size; i += probe_interval)
{
  /* Allocate PROBE_INTERVAL bytes.  */
- pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
-GEN_INT (-probe_interval), -1,
-m->fs.cfa_reg == stack_pointer_rtx);
+ rtx insn
+   = pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+GEN_INT (-PROBE_INTERVAL), -1,
+m->fs.cfa_reg == stack_pointer_rtx);
+ add_reg_note (insn, REG_STACK_CHECK, const0_rtx);
 
  /* And probe at *sp.  */
  emit_stack_probe (stack_pointer_rtx);
diff --git a/gcc/reg-notes.def b/gcc/reg-notes.def
index 943eff41d30..a542990cde2 100644
--- a/gcc/reg-notes.def
+++ b/gcc/reg-notes.def
@@ -224,6 +224,10 @@ REG_NOTE (ARGS_SIZE)
pseudo reg.  */
 REG_NOTE (RETURNED)
 
+/* Indicates the instruction is a stack check probe that should not
+   be combined with other stac

RE: [PATCH][x86] Knights Mill -march/-mtune options

2017-09-19 Thread Peryt, Sebastian
> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Tuesday, September 19, 2017 11:23 PM
> To: Peryt, Sebastian 
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin 
> Subject: Re: [PATCH][x86] Knights Mill -march/-mtune options
> 
> On Tue, Sep 19, 2017 at 9:01 AM, Peryt, Sebastian 
> wrote:
> 
> >> >> >> > This patch adds  options -march=/-mtune=knm for Knights Mill.
> >> >> >> >
> >> >> >> > 2017-09-14  Sebastian Peryt   gcc/
> >> >> >> >
> >> >> >> > * config.gcc: Support "knm".
> >> >> >> > * config/i386/driver-i386.c (host_detect_local_cpu): Detect
> "knm".
> >> >> >> > * config/i386/i386-c.c (ix86_target_macros_internal): 
> >> >> >> > Handle
> >> >> >> > PROCESSOR_KNM.
> >> >> >> > * config/i386/i386.c (m_KNM): Define.
> >> >> >> > (processor_target_table): Add "knm".
> >> >> >> > (PTA_KNM): Define.
> >> >> >> > (ix86_option_override_internal): Add "knm".
> >> >> >> > (ix86_issue_rate): Add PROCESSOR_KNM.
> >> >> >> > (ix86_adjust_cost): Ditto.
> >> >> >> > (ia32_multipass_dfa_lookahead): Ditto.
> >> >> >> > (get_builtin_code_for_version): Handle PROCESSOR_KNM.
> >> >> >> > (fold_builtin_cpu): Define M_INTEL_KNM.
> >> >> >> > * config/i386/i386.h (TARGET_KNM): Define.
> >> >> >> > (processor_type): Add PROCESSOR_KNM.
> >> >> >> > * config/i386/x86-tune.def: Add m_KNM.
> >> >> >> > * doc/invoke.texi: Add knm as x86 -march=/-mtune= CPU type.
> >> >> >> >
> >> >> >> >
> >> >> >> > gcc/testsuite/
> >> >> >> >
> >> >> >> > * gcc.target/i386/funcspec-5.c: Test knm.
> >> >> >> >
> >> >> >> > Is it ok for trunk?
> >> >> >>
> >> >> >> You also have to update libgcc/cpuinfo.h together with
> >> >> >> fold_builtin_cpu from i386.c. Please note that all new
> >> >> >> processor types and subtypes have to be added at the end of the enum.
> >> >> >>
> >> >> >
> >> >> > Uros,
> >> >> >
> >> >> > I have updated libgcc/cpuinfo.h and libgcc/cpuinfo.c. I
> >> >> > understood that CPU_TYPE_MAX in libgcc/cpuinfo.h processor_types
> >> >> > is some kind of barrier, this is why I put KNM before that. Is that 
> >> >> > correct
> thinking?
> >> >> > As for fold_builtin_cpu in i386.c I already have something like this:
> >> >> >
> >> >> > @@ -34217,6 +34229,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
> >> >> >  M_AMDFAM15H,
> >> >> >  M_INTEL_SILVERMONT,
> >> >> >  M_INTEL_KNL,
> >> >> > +M_INTEL_KNM,
> >> >> >  M_AMD_BTVER1,
> >> >> >  M_AMD_BTVER2,
> >> >> >  M_CPU_SUBTYPE_START,
> >> >> > @@ -34262,6 +34275,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
> >> >> >{"bonnell", M_INTEL_BONNELL},
> >> >> >{"silvermont", M_INTEL_SILVERMONT},
> >> >> >{"knl", M_INTEL_KNL},
> >> >> > +  {"knm", M_INTEL_KNM},
> >> >> >{"amdfam10h", M_AMDFAM10H},
> >> >> >{"barcelona", M_AMDFAM10H_BARCELONA},
> >> >> >{"shanghai", M_AMDFAM10H_SHANGHAI},
> >> >> >
> >> >> > I couldn't find any other place where I'm supposed to add anything
> extra.
> >> >>
> >> >> Please look at libgcc/config/i386/cpuinfo.h. The comment here says that:
> >> >>
> >> >> /* Any new types or subtypes have to be inserted at the end. */
> >> >>
> >> >> The above patch should then add M_INTEL_KNM as the last entry
> >> >> *before* M_CPU_SUBTYPE_START.
> >> >>
> >> >
> >> > Sorry, I didn't notice this value at first. I believe now it's correct.
> >>
> >> OK for mainline SVN (with updated ChangeLog).
> >>
> >
> > Can you please commit for me?
> 
> Please send an updated ChangeLog.
> 
> Uros.

Uros,
Please find the ChangeLog below.  This is for patch v3 (also attached here
just in case).

2017-09-20  Sebastian Peryt  

gcc/

* config.gcc: Support "knm".
* config/i386/driver-i386.c (host_detect_local_cpu): Detect "knm".
* config/i386/i386-c.c (ix86_target_macros_internal): Handle
PROCESSOR_KNM.
* config/i386/i386.c (m_KNM): Define.
(processor_target_table): Add "knm".
(PTA_KNM): Define.
(ix86_option_override_internal): Add "knm".
(ix86_issue_rate): Add PROCESSOR_KNM.
(ix86_adjust_cost): Ditto.
(ia32_multipass_dfa_lookahead): Ditto.
(get_builtin_code_for_version): Handle PROCESSOR_KNM.
(fold_builtin_cpu): Add M_INTEL_KNM.
* config/i386/i386.h (processor_costs): Define TARGET_KNM.
(processor_type): Add PROCESSOR_KNM.
 * config/i386/x86-tune.def: Add m_KNM.
* doc/invoke.texi: Add knm as x86 -march=/-mtune= CPU type.

libgcc/
* config/i386/cpuinfo.h (processor_types): Add INTEL_KNM.
* config/i386/cpuinfo.c (get_intel_cpu): Detect Knights Mill.

gcc/testsuite/

* gcc.target/i386/builtin_target.c: Test knm.
* gcc.target/i386/funcspec-56.inc: Test arch=knm.

Thanks,
Sebastian


KNM_enabling_v3.patch
Description: KNM_enabling_v3.patch
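
For context, the "knm" / M_INTEL_KNM entries above are what back the CPU
detection builtins; a small usage sketch (assuming the patch as posted is
applied so that "knm" is an accepted argument -- this is not part of the
testsuite changes listed above):

  #include <stdio.h>

  int
  main (void)
  {
    __builtin_cpu_init ();
    if (__builtin_cpu_is ("knm"))
      printf ("Knights Mill\n");
    else if (__builtin_cpu_is ("knl"))
      printf ("Knights Landing\n");
    else
      printf ("some other x86 CPU\n");
    return 0;
  }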


Re: Enable no-exec stacks for more targets using the Linux kernel

2017-09-19 Thread Nagaraju Mekala
>On Tue, 19 Sep 2017, Andreas Schwab wrote:

> >On Sep 18 2017, Joseph Myers  wrote:
> >
> > Building glibc for many different configurations and running the
> > compilation parts of the testsuite runs into failures of the
> > elf/check-execstack test for hppa, ia64 and microblaze.
> >
>> ia64 is non-execstack by default, so it doesn't need any marking.  The
>> same is true for every architecture that doesn't override
>> elf_read_implies_exec, which includes microblaze and hppa.

> Thanks for the explanation.

> I've sent a glibc patch 
> .  I think the 
>key questions for architecture experts now are: on each of those three 
>architectures, do trampolines ever require executable stacks, and, if they 
>do, how does this work at present when the kernel defaults to 
>non-executable and my understanding at 
> would be that 
>glibc would only make thread stacks executable on those architectures, not 
>the main process stacks, and GCC will never generate an explicit marker on 
>those architectures to request an executable stack?

MicroBlaze is a soft processor with many configuration options.
Without the MMU, nothing in the MicroBlaze architecture prevents execution
of code on the stack.  With the MMU, any page, including the stack pages,
can be made executable or non-executable.

It is recommended to prevent execution on the stack by marking those pages
as non-executable in the MMU.  In particular, trampolines would then have
to be implemented without executing code on the stack.
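
For readers unfamiliar with the trampoline question, the classic case is
taking the address of a GNU C nested function; a minimal example (illustrative
only, not tied to any testcase in this thread):

  /* Taking 'add's address makes GCC build a trampoline, which on many
     targets is placed on the stack and therefore needs an executable
     stack unless the target has another mechanism (function descriptors,
     a separate trampoline area, etc.).  */

  static int
  apply (int (*f) (int), int v)
  {
    return f (v);
  }

  int
  call_with_bound_x (int x)
  {
    int add (int y) { return x + y; }   /* nested function, GNU extension */
    return apply (add, 1);
  }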

Thanks,
Nagaraju



Re: Enable no-exec stacks for more targets using the Linux kernel

2017-09-19 Thread Andreas Schwab
On Sep 19 2017, Joseph Myers  wrote:

> I've sent a glibc patch 
> .  I think the 
> key questions for architecture experts now are: on each of those three 
> architectures, do trampolines ever require executable stacks, and, if they 
> do, how does this work at present when the kernel defaults to 
> non-executable and my understanding at 
>  would be that 
> glibc would only make thread stacks executable on those architectures, not 
> the main process stacks, and GCC will never generate an explicit marker on 
> those architectures to request an executable stack?

For ia64 on linux there is EF_IA_64_LINUX_EXECUTABLE_STACK to request
executable heap and stack.  But since ia64 uses function descriptors,
trampolines never need that.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

