[PATCH] Fix Yr constraint uses in various insns

2016-05-24 Thread Jakub Jelinek
Hi!

Similarly to the last patch, this one fixes various misc patterns.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-24  Jakub Jelinek  

* config/i386/sse.md (vec_set_0): Use sse4_noavx isa instead
of sse4 for the first alternative, drop %v from the template
and d operand modifier.  Split second alternative into one sse4_noavx
and one avx alternative, use *x instead of *v in the former and v
instead of *v in the latter.
(*sse4_1_extractps): Use noavx isa instead of * for the first
alternative, drop %v from the template.  Split second alternative into
one noavx and one avx alternative, use *x instead of *v in the
former and v instead of *v in the latter.
(_movntdqa): Guard the first 2 alternatives
with noavx and the last one with avx.
(sse4_1_phminposuw): Guard first alternative with noavx isa,
split the second one into one noavx and one avx alternative,
use *x and Bm in the former and x and m in the latter one.
(_ptest): Use noavx instead of * for the first two
alternatives.

--- gcc/config/i386/sse.md.jj   2016-05-24 10:55:52.0 +0200
+++ gcc/config/i386/sse.md  2016-05-24 14:50:14.566277449 +0200
@@ -6623,18 +6623,19 @@ (define_expand "vec_init"
 ;; see comment above inline_secondary_memory_needed function in i386.c
 (define_insn "vec_set_0"
   [(set (match_operand:VI4F_128 0 "nonimmediate_operand"
- "=Yr,*v,v,Yi,x,x,v,Yr ,*x ,x  ,m ,m   ,m")
+ "=Yr,*x,v,v,Yi,x,x,v,Yr ,*x ,x  ,m ,m   ,m")
(vec_merge:VI4F_128
  (vec_duplicate:VI4F_128
(match_operand: 2 "general_operand"
- " Yr,*v,m,r ,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"))
+ " Yr,*x,v,m,r ,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"))
  (match_operand:VI4F_128 1 "vector_move_operand"
- " C , C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
+ " C , C,C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
  (const_int 1)))]
   "TARGET_SSE"
   "@
-   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
-   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
+   insertps\t{$0xe, %2, %0|%0, %2, 0xe}
+   insertps\t{$0xe, %2, %0|%0, %2, 0xe}
+   vinsertps\t{$0xe, %2, %2, %0|%0, %2, %2, 0xe}
%vmov\t{%2, %0|%0, %2}
%vmovd\t{%2, %0|%0, %2}
movss\t{%2, %0|%0, %2}
@@ -6646,20 +6647,20 @@ (define_insn "vec_set_0"
#
#
#"
-  [(set_attr "isa" 
"sse4,sse4,sse2,sse2,noavx,noavx,avx,sse4_noavx,sse4_noavx,avx,*,*,*")
+  [(set_attr "isa" 
"sse4_noavx,sse4_noavx,avx,sse2,sse2,noavx,noavx,avx,sse4_noavx,sse4_noavx,avx,*,*,*")
(set (attr "type")
- (cond [(eq_attr "alternative" "0,1,7,8,9")
+ (cond [(eq_attr "alternative" "0,1,2,8,9,10")
  (const_string "sselog")
-   (eq_attr "alternative" "11")
- (const_string "imov")
(eq_attr "alternative" "12")
+ (const_string "imov")
+   (eq_attr "alternative" "13")
  (const_string "fmov")
   ]
   (const_string "ssemov")))
-   (set_attr "prefix_extra" "*,*,*,*,*,*,*,1,1,1,*,*,*")
-   (set_attr "length_immediate" "*,*,*,*,*,*,*,1,1,1,*,*,*")
-   (set_attr "prefix" 
"maybe_vex,maybe_vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex,*,*,*")
-   (set_attr "mode" "SF,SF,,SI,SF,SF,SF,TI,TI,TI,*,*,*")])
+   (set_attr "prefix_extra" "*,*,*,*,*,*,*,*,1,1,1,*,*,*")
+   (set_attr "length_immediate" "*,*,*,*,*,*,*,*,1,1,1,*,*,*")
+   (set_attr "prefix" 
"orig,orig,maybe_evex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex,*,*,*")
+   (set_attr "mode" "SF,SF,SF,,SI,SF,SF,SF,TI,TI,TI,*,*,*")])
 
 ;; A subset is vec_setv4sf.
 (define_insn "*vec_setv4sf_sse4_1"
@@ -6761,14 +6762,15 @@ (define_insn_and_split "*vec_extractv4sf
   "operands[1] = gen_lowpart (SFmode, operands[1]);")
 
 (define_insn_and_split "*sse4_1_extractps"
-  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,v,v")
+  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,v,v")
(vec_select:SF
- (match_operand:V4SF 1 "register_operand" "Yr,*v,0,v")
- (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n,n")])))]
+ (match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v")
+ (parallel [(match_operand:SI 2 "const_0_to_3_operand" 
"n,n,n,n,n")])))]
   "TARGET_SSE4_1"
   "@
-   %vextractps\t{%2, %1, %0|%0, %1, %2}
-   %vextractps\t{%2, %1, %0|%0, %1, %2}
+   extractps\t{%2, %1, %0|%0, %1, %2}
+   extractps\t{%2, %1, %0|%0, %1, %2}
+   vextractps\t{%2, %1, %0|%0, %1, %2}
#
#"
   "&& reload_completed && SSE_REG_P (operands[0])"
@@ -6793,13 +6795,13 @@ (define_insn_and_split "*sse4_1_extractp
 }
   DONE;
 }
-  [(set_attr "isa" "*,*,noavx,avx")
-   (set_attr "type" "sselog,sselog,*,*")
-   (set_attr "prefix_data16" "1,1,*,*")
-   (set_attr "prefix_extra" "1,1,*,*")
-   (set_attr "length_immediate" "1,1,*,*")
-   (set_attr "prefix" "maybe_vex,maybe_vex,*,*")
-   (set_attr "mode" "V4SF,V4SF,*,*")])
+  [(set_attr "is

[PATCH] Fix one more Yr use

2016-05-24 Thread Jakub Jelinek
Hi!

Another case (separate patch because I thought I should add an avx512f
alternative here, but later found out it is already handled by having
the vrndscale* patterns defined before these ones and having the same
RTL for them, except allowing 0 to 255 instead of just 0 to 15).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-24  Jakub Jelinek  

* config/i386/sse.md (_round):
Limit 1st alternative to noavx isa, split 2nd alternative into one
noavx and one avx alternative, use *x and Bm in the former and
x and m in the latter.

--- gcc/config/i386/sse.md.jj   2016-05-24 10:55:52.0 +0200
+++ gcc/config/i386/sse.md  2016-05-24 14:50:14.566277449 +0200
@@ -14986,22 +14996,19 @@ (define_insn "_ptest"
(set_attr "mode" "")])
 
 (define_insn "_round"
-  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x")
+  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
(unspec:VF_128_256
- [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm")
-  (match_operand:SI 2 "const_0_to_15_operand" "n,n")]
+ [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
+  (match_operand:SI 2 "const_0_to_15_operand" "n,n,n")]
  UNSPEC_ROUND))]
   "TARGET_ROUND"
   "%vround\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssecvt")
-   (set (attr "prefix_data16")
- (if_then_else
-   (match_test "TARGET_AVX")
- (const_string "*")
- (const_string "1")))
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssecvt")
+   (set_attr "prefix_data16" "1,1,*")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
-   (set_attr "prefix" "maybe_vex")
+   (set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "")])
 
 (define_expand "_round_sfix"

Jakub


Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 6:22 PM, H.J. Lu  wrote:
> On Tue, May 24, 2016 at 8:52 AM, Uros Bizjak  wrote:
>> On Tue, May 24, 2016 at 5:40 PM, H.J. Lu  wrote:
>>
 No, this is a flag, not a variable. Let's figure out how to extend
 target flags to more than 63 flags first.
>>>
>>> Extending target flags to more than 63 bits requires replacing
>>> HOST_WIDE_INT with a bit vector.  Since target flags is used in
>>> TARGET_SUBTARGET_DEFAULT, changing it to a bit vector is a
>>> non-trivial change.  On the other hand, -mgeneral-regs-only is a
>>> command-line option which doesn't require support for
>>> TARGET_SUBTARGET_DEFAULT, similar to other -m options like
>>> -mmitigate-rop.  Using flag_general_regs_only is an option.
>>
>> I have been informed that Intel people are looking into how to extend
>> target flags to accommodate additional ISA flags. There is no point to
>> hurry with an unoptimal solution. Perhaps you can coordinate your
>> patch with their efforts?
>
> ISA flags use ix86_isa_flags, not target_flags.  -mgeneral-regs-only
> shouldn't use ix86_isa_flags.  It was my oversight to use target_flags
> with -mgeneral-regs-only to begin with.  I don't think using
> flag_general_regs_only is a suboptimal solution; it is what I should have
> used in the first place.  The x86 change for interrupt handler depends
> on -mgeneral-regs-only.

Oh, target_flags is only a 32bit integer :(. Is there a reason it
can't be extended to HOST_WIDE_INT, as is the case with
ix86_isa_flags?

Uros.


[PATCH] Fix up Yr constraint

2016-05-24 Thread Jakub Jelinek
Hi!

The Yr constraint, contrary to what was said when it was submitted, is
actually always NO_REX_SSE_REGS or NO_REGS, never ALL_SSE_REGS, so the
RA restriction to only the first 8 regs is applied no matter what we
tune for.

This is because we test X86_TUNE_AVOID_4BYTE_PREFIXES, which is an enum
value (59) and therefore always nonzero, rather than actually checking
whether the tune flag is set.
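A minimal sketch of the bug in plain C (simplified from i386.h and constraints.md; the array contents here are hypothetical): testing the enum constant itself is always true, whereas the intent was to index the `ix86_tune_features` array, which is what the new `TARGET_AVOID_4BYTE_PREFIXES` macro does.

```c
#include <assert.h>

/* X86_TUNE_AVOID_4BYTE_PREFIXES is an enum VALUE (59), so using the
   constant as a condition succeeds unconditionally.  */
enum ix86_tune_indices { X86_TUNE_AVOID_4BYTE_PREFIXES = 59, X86_TUNE_LAST };

unsigned char ix86_tune_features[X86_TUNE_LAST]; /* all zeros: tuning off */

/* Buggy condition: tests the enum constant, which is always nonzero.  */
int buggy_avoid_4byte_prefixes (void)
{
  return X86_TUNE_AVOID_4BYTE_PREFIXES ? 1 : 0;
}

/* Fixed condition, as in the new TARGET_AVOID_4BYTE_PREFIXES macro:
   index the tune-feature array instead.  */
int fixed_avoid_4byte_prefixes (void)
{
  return ix86_tune_features[X86_TUNE_AVOID_4BYTE_PREFIXES] ? 1 : 0;
}
```

With tuning disabled the buggy test still reports the feature as enabled, which is why Yr always ended up as NO_REX_SSE_REGS.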

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-24  Jakub Jelinek  

* config/i386/i386.h (TARGET_AVOID_4BYTE_PREFIXES): Define.
* config/i386/constraints.md (Yr): Test TARGET_AVOID_4BYTE_PREFIXES
rather than X86_TUNE_AVOID_4BYTE_PREFIXES.

--- gcc/config/i386/i386.h.jj   2016-05-24 10:56:02.0 +0200
+++ gcc/config/i386/i386.h  2016-05-24 15:13:05.715906018 +0200
@@ -465,6 +465,8 @@ extern unsigned char ix86_tune_features[
ix86_tune_features[X86_TUNE_SLOW_PSHUFB]
 #define TARGET_VECTOR_PARALLEL_EXECUTION \
ix86_tune_features[X86_TUNE_VECTOR_PARALLEL_EXECUTION]
+#define TARGET_AVOID_4BYTE_PREFIXES \
+   ix86_tune_features[X86_TUNE_AVOID_4BYTE_PREFIXES]
 #define TARGET_FUSE_CMP_AND_BRANCH_32 \
ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
 #define TARGET_FUSE_CMP_AND_BRANCH_64 \
--- gcc/config/i386/constraints.md.jj   2016-05-12 10:29:41.0 +0200
+++ gcc/config/i386/constraints.md  2016-05-24 15:14:21.647914550 +0200
@@ -142,7 +142,7 @@ (define_register_constraint "Yf"
  "@internal Any x87 register when 80387 FP arithmetic is enabled.")
 
 (define_register_constraint "Yr"
- "TARGET_SSE ? (X86_TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : 
ALL_SSE_REGS) : NO_REGS"
+ "TARGET_SSE ? (TARGET_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : ALL_SSE_REGS) 
: NO_REGS"
  "@internal Lower SSE register when avoiding REX prefix and all SSE registers 
otherwise.")
 
 (define_register_constraint "Yv"

Jakub


More backwards/FSM jump thread refactoring and extension

2016-05-24 Thread Jeff Law
Here's the next patch, which does a bit more refactoring in the backwards 
jump threader and extends it to handle simple copies and constant 
initializations.


The extension isn't all that useful right now -- while it does fire 
often during bootstraps, it's doing so for cases that would be caught 
slightly later (within the same pass).  As a result there are no changes 
in the testsuite.


The extension becomes useful in an upcoming patch where the backwards 
threader is disentangled from DOM/VRP entirely.  In that mode the 
threader can't depend on cprop to have eliminated the copies and 
propagated as many constants as possible into PHI arguments.
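A hypothetical C input of the shape the extended threader can now follow (the function name is invented for illustration): the threaded value passes through a PHI and then a simple copy before reaching the conditional, so without the extension the threader would have to wait for cprop to eliminate the copy.

```c
/* A jump-threading candidate: on each incoming edge of the PHI the
   outcome of the final test is already known.  */
int thread_candidate (int c)
{
  int x;
  if (c)
    x = 1;     /* constant initialization */
  else
    x = 2;     /* PHI merges the two constants */
  int y = x;   /* simple copy in the SSA chain */
  if (y == 1)  /* threadable through the copy and the PHI */
    return 10;
  return 20;
}
```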


Bootstrapped and regression tested on x86_64 linux.  Installing on the 
trunk.


Jeff



commit 913a4b1f209105a774789311094e90986db322fb
Author: Jeff Law 
Date:   Tue May 24 11:56:50 2016 -0400

* tree-ssa-threadbackward.c (convert_and_register_jump_thread_path):
New function, extracted from...
(fsm_find_control_statement_thread_paths): Here.  Use the new function.
Allow simple copies and constant initializations in the SSA chain.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2b20cc8..9442109 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2016-05-24  Jeff Law  
+
+   * tree-ssa-threadbackward.c (convert_and_register_jump_thread_path):
+   New function, extracted from...
+   (fsm_find_control_statement_thread_paths): Here.  Use the new function.
+   Allow simple copies and constant initializations in the SSA chain.
+
 2016-05-24  Marek Polacek  
 
PR c/71249
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 73ab4ea..4d0fd9c 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -356,6 +356,44 @@ profitable_jump_thread_path (vec 
*&path,
   return taken_edge;
 }
 
+/* PATH is vector of blocks forming a jump threading path in reverse
+   order.  TAKEN_EDGE is the edge taken from path[0].
+
+   Convert that path into the form used by register_jump_thread and
+   register the path.   */
+
+static void
+convert_and_register_jump_thread_path (vec *&path,
+  edge taken_edge)
+{
+  vec *jump_thread_path = new vec ();
+
+  /* Record the edges between the blocks in PATH.  */
+  for (unsigned int j = 0; j < path->length () - 1; j++)
+{
+  basic_block bb1 = (*path)[path->length () - j - 1];
+  basic_block bb2 = (*path)[path->length () - j - 2];
+  if (bb1 == bb2)
+   continue;
+
+  edge e = find_edge (bb1, bb2);
+  gcc_assert (e);
+  jump_thread_edge *x = new jump_thread_edge (e, EDGE_FSM_THREAD);
+  jump_thread_path->safe_push (x);
+}
+
+  /* Add the edge taken when the control variable has value ARG.  */
+  jump_thread_edge *x
+= new jump_thread_edge (taken_edge, EDGE_NO_COPY_SRC_BLOCK);
+  jump_thread_path->safe_push (x);
+
+  register_jump_thread (jump_thread_path);
+  --max_threaded_paths;
+
+  /* Remove BBI from the path.  */
+  path->pop ();
+}
+
 /* We trace the value of the SSA_NAME NAME back through any phi nodes looking
for places where it gets a constant value and save the path.  Stop after
having recorded MAX_PATHS jump threading paths.  */
@@ -377,24 +415,30 @@ fsm_find_control_statement_thread_paths (tree name,
   if (var_bb == NULL)
 return;
 
-  /* For the moment we assume that an SSA chain only contains phi nodes, and
- eventually one of the phi arguments will be an integer constant.  In the
- future, this could be extended to also handle simple assignments of
- arithmetic operations.  */
+  /* We allow the SSA chain to contain PHIs and simple copies and constant
+ initializations.  */
   if (gimple_code (def_stmt) != GIMPLE_PHI
-  || (gimple_phi_num_args (def_stmt)
+  && gimple_code (def_stmt) != GIMPLE_ASSIGN)
+return;
+
+  if (gimple_code (def_stmt) == GIMPLE_PHI
+  && (gimple_phi_num_args (def_stmt)
  >= (unsigned) PARAM_VALUE (PARAM_FSM_MAXIMUM_PHI_ARGUMENTS)))
 return;
 
+  if (gimple_code (def_stmt) == GIMPLE_ASSIGN
+  && gimple_assign_rhs_code (def_stmt) != INTEGER_CST
+  && gimple_assign_rhs_code (def_stmt) != SSA_NAME)
+return;
+
   /* Avoid infinite recursion.  */
   if (visited_bbs->add (var_bb))
 return;
 
-  gphi *phi = as_a  (def_stmt);
   int next_path_length = 0;
   basic_block last_bb_in_path = path->last ();
 
-  if (loop_containing_stmt (phi)->header == gimple_bb (phi))
+  if (loop_containing_stmt (def_stmt)->header == gimple_bb (def_stmt))
 {
   /* Do not walk through more than one loop PHI node.  */
   if (seen_loop_phi)
@@ -469,9 +513,9 @@ fsm_find_control_statement_thread_paths (tree name,
 
   /* Iterate over the arguments of PHI.  */
   unsigned int i;
-  if (gimple_phi_num_args (phi)
-  < (unsigned) PARAM_VALUE (PARAM_FSM_MAXIMUM_PHI_ARGUMENTS))
+  if (gimple_code (def_stmt) == GIMPLE_PHI)
 {
+  

Re: [PATCH, ARM] Do not set ARM_ARCH_ISA_THUMB for armv5

2016-05-24 Thread Kyrill Tkachov

Hi Thomas,

On 10/05/16 14:26, Thomas Preudhomme wrote:

Hi,

ARM_ARCH_ISA_THUMB is currently set to 1 when compiling for armv5 despite
armv5 not supporting Thumb instructions (armv5t does):

arm-none-eabi-gcc -dM -march=armv5 -E - < /dev/null | grep ISA_THUMB
#define __ARM_ARCH_ISA_THUMB 1

The reason is that TARGET_ARM_ARCH_ISA_THUMB is set to 1 if the target does
not support Thumb-2 and is ARMv4T, ARMv5 or later.  This patch replaces that
logic with a check of whether the given architecture has the Thumb feature bit
(FL_THUMB).
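The before/after macro logic can be modelled in plain C (the globals below are stand-ins for the real backend variables, with values modelling `-march=armv5`): ARMv5 satisfies `TARGET_ARM_ARCH >= 5` but has no FL_THUMB, so the old expression wrongly reports Thumb-1 support.

```c
/* Stand-ins modelling -march=armv5: ARMv5, no Thumb-2, no FL_THUMB.  */
static int arm_arch_thumb2 = 0;
static int arm_arch4t = 0;
static int target_arm_arch = 5;   /* stands in for TARGET_ARM_ARCH */
static int arm_arch_thumb = 0;    /* new flag derived from FL_THUMB */

/* Old TARGET_ARM_ARCH_ISA_THUMB logic: architecture version check.  */
int old_isa_thumb (void)
{
  return arm_arch_thumb2 ? 2 : ((target_arm_arch >= 5 || arm_arch4t) ? 1 : 0);
}

/* New logic from the patch: test the FL_THUMB-derived flag directly.  */
int new_isa_thumb (void)
{
  return arm_arch_thumb2 ? 2 : (arm_arch_thumb ? 1 : 0);
}
```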

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-05-06  Thomas Preud'homme  

 * config/arm/arm-protos.h (arm_arch_thumb): Declare.
 * config/arm/arm.c (arm_arch_thumb): Define.
 (arm_option_override): Initialize arm_arch_thumb.
 * config/arm/arm.h (TARGET_ARM_ARCH_ISA_THUMB): Use arm_arch_thumb to
 determine if the target supports the Thumb-1 ISA.


diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index
d8179c441bb53dced94d2ebf497aad093e4ac600..4d11c91133ff1b875afcbf58abc4491c2c93768e
100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -603,6 +603,9 @@ extern int arm_tune_cortex_a9;
 interworking clean.  */
  extern int arm_cpp_interwork;
  
+/* Nonzero if chip supports Thumb.  */

+extern int arm_arch_thumb;
+


Bit of bikeshedding really, but I think a better name would be
arm_arch_thumb1.
This is because we also have the macros TARGET_THUMB and TARGET_THUMB2,
where TARGET_THUMB means either Thumb-1 or Thumb-2, so a casual reader
might think that arm_arch_thumb means that there is support for either.

Also, please add a simple test that compiles something with -march=armv5 (plus 
-marm)
and checks that __ARM_ARCH_ISA_THUMB is not defined.

Ok with that change and the test.

Thanks,
Kyrill

P.S. I think your mailer sometimes mangles long lines in the patches
(for example the git hash headers). Can you please send your patches as
attachments? That will also make it easier for me to extract and apply
them to my tree without having to manually select the inlined patch
from the message.


  /* Nonzero if chip supports Thumb 2.  */
  extern int arm_arch_thumb2;
  
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h

index
ad123dde991a3e4c4b9563ee6ebb84981767988f..f64e8caa8bc08b7aff9fe385567de9936a964004
100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2191,9 +2191,8 @@ extern int making_const_table;
  #define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
  
  /* The highest Thumb instruction set version supported by the chip.  */

-#define TARGET_ARM_ARCH_ISA_THUMB  \
-  (arm_arch_thumb2 ? 2 \
-  : ((TARGET_ARM_ARCH >= 5 || arm_arch4t) ? 1 : 0))
+#define TARGET_ARM_ARCH_ISA_THUMB  \
+  (arm_arch_thumb2 ? 2 : (arm_arch_thumb ? 1 : 0))
  
  /* Expands to an upper-case char of the target's architectural

 profile.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index
71b51439dc7ba5be67671e9fb4c3f18040cce58f..de1c2d4600529518a92ed44815cff05308baa31c
100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -852,6 +852,9 @@ int arm_tune_cortex_a9 = 0;
 interworking clean.  */
  int arm_cpp_interwork = 0;
  
+/* Nonzero if chip supports Thumb.  */

+int arm_arch_thumb;
+
  /* Nonzero if chip supports Thumb 2.  */
  int arm_arch_thumb2;
  
@@ -3170,6 +3173,7 @@ arm_option_override (void)

arm_arch7em = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH7EM);
arm_arch8 = ARM_FSET_HAS_CPU1 (insn_flags, FL_ARCH8);
arm_arch8_1 = ARM_FSET_HAS_CPU2 (insn_flags, FL2_ARCH8_1);
+  arm_arch_thumb = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB);
arm_arch_thumb2 = ARM_FSET_HAS_CPU1 (insn_flags, FL_THUMB2);
arm_arch_xscale = ARM_FSET_HAS_CPU1 (insn_flags, FL_XSCALE);
  



Before patch:

% arm-none-eabi-gcc -dM -march=armv4 -E - < /dev/null | grep ISA_THUMB
cc1: warning: target CPU does not support THUMB instructions
% arm-none-eabi-gcc -dM -march=armv4t -E - < /dev/null | grep ISA_THUMB
#define __ARM_ARCH_ISA_THUMB 1
% arm-none-eabi-gcc -dM -march=armv5 -E - < /dev/null | grep ISA_THUMB
cc1: warning: target CPU does not support THUMB instructions
#define __ARM_ARCH_ISA_THUMB 1
% arm-none-eabi-gcc -dM -march=armv5t -E - < /dev/null | grep ISA_THUMB
#define __ARM_ARCH_ISA_THUMB 1

After patch:

% arm-none-eabi-gcc -dM -march=armv5 -E - < /dev/null | grep ISA_THUMB
cc1: warning: target CPU does not support THUMB instructions
% arm-none-eabi-gcc -dM -march=armv5t -E - < /dev/null | grep ISA_THUMB
#define __ARM_ARCH_ISA_THUMB 1
% arm-none-eabi-gcc -dM -march=armv4 -E - < /dev/null | grep ISA_THUMB
cc1: warning: target CPU does not support THUMB instructions
% arm-none-eabi-gcc -dM -march=armv4t -E - < /dev/null | grep ISA_THUMB
#define __ARM_ARCH_ISA_THUMB 1





Re: [PATCH] Fix PR70434, change FE IL for vector indexing

2016-05-24 Thread Richard Biener
On May 24, 2016 6:17:19 PM GMT+02:00, Jakub Jelinek  wrote:
>On Mon, May 23, 2016 at 04:22:57PM +0200, Richard Biener wrote:
>> *** /dev/null1970-01-01 00:00:00.0 +
>> --- gcc/testsuite/c-c++-common/vector-subscript-5.c  2016-05-23
>16:17:41.148043066 +0200
>> ***
>> *** 0 
>> --- 1,13 
>> + /* { dg-do compile } */
>> + 
>> + typedef int U __attribute__ ((vector_size (16)));
>> + 
>> + int
>> + foo (int i)
>> + {
>> +   register U u
>> + #if __SSE2__
>> +   asm ("xmm0");
>> + #endif
>> +   return u[i];
>> + }
>
>This test fails on i?86 (and supposedly on all non-x86 arches too).

Oops, sorry.  And thanks for the fix.

Richard.

>I've tested following fix and committed as obvious to trunk:
>
>2016-05-24  Jakub Jelinek  
>
>   PR middle-end/70434
>   PR c/69504
>   * c-c++-common/vector-subscript-5.c (foo): Move ; out of the ifdef.
>
>--- gcc/testsuite/c-c++-common/vector-subscript-5.c.jj 2016-05-24
>10:56:00.0 +0200
>+++ gcc/testsuite/c-c++-common/vector-subscript-5.c2016-05-24
>18:11:51.778520055 +0200
>@@ -7,7 +7,8 @@ foo (int i)
> {
>   register U u
> #if __SSE2__
>-  asm ("xmm0");
>+  asm ("xmm0")
> #endif
>+  ;
>   return u[i];
> }
>
>   Jakub




Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread H.J. Lu
On Tue, May 24, 2016 at 9:53 AM, Uros Bizjak  wrote:
> On Tue, May 24, 2016 at 6:22 PM, H.J. Lu  wrote:
>> On Tue, May 24, 2016 at 8:52 AM, Uros Bizjak  wrote:
>>> On Tue, May 24, 2016 at 5:40 PM, H.J. Lu  wrote:
>>>
> No, this is a flag, not a variable. Let's figure out how to extend
> target flags to more than 63 flags first.

 Extending target flags to more than 63 bits requires replacing
 HOST_WIDE_INT with a bit vector.  Since target flags is used in
 TARGET_SUBTARGET_DEFAULT, changing it to a bit vector is a
 non-trivial change.  On the other hand, -mgeneral-regs-only is a
 command-line option which doesn't require support for
 TARGET_SUBTARGET_DEFAULT, similar to other -m options like
 -mmitigate-rop.  Using flag_general_regs_only is an option.
>>>
>>> I have been informed that Intel people are looking into how to extend
>>> target flags to accommodate additional ISA flags. There is no point to
>>> hurry with an unoptimal solution. Perhaps you can coordinate your
>>> patch with their efforts?
>>
>> ISA flags use ix86_isa_flags, not target_flags.  -mgeneral-regs-only
>> shouldn't use ix86_isa_flags.  It was my oversight to use target_flags
>> with -mgeneral-regs-only to begin with.  I don't think using
>> flag_general_regs_only is a suboptimal solution; it is what I should have
>> used in the first place.  The x86 change for interrupt handler depends
>> on -mgeneral-regs-only.
>
> Oh, target_flags is only a 32bit integer :(. Is there a reason it
> can't be extended to HOST_WIDE_INT, as is the case with
> ix86_isa_flags?

target_flags is generic, not target specific.  I want to limit my
change to the x86 backend, and -mgeneral-regs-only doesn't need
to use target_flags.

-- 
H.J.


Re: [PATCH 2/2][GCC] Add one more pattern to RTL if-conversion

2016-05-24 Thread Mikhail Maltsev
On 05/23/2016 05:15 PM, Kyrill Tkachov wrote:
> 
> expand_simple_binop may fail. I think you should add a check that diff_rtx is
> non-NULL
> and bail out early if it is.
> 
Fixed.

-- 
Regards,
Mikhail Maltsev
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index a9c146b..e1473eb 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1260,6 +1260,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   {
 ST_ADD_FLAG,
 ST_SHIFT_FLAG,
+ST_SHIFT_ADD_FLAG,
 ST_IOR_FLAG
   };
 
@@ -1384,6 +1385,12 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	  normalize = -1;
 	  reversep = true;
 	}
+  else if (exact_log2 (abs_hwi (diff)) >= 0
+	   && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2))
+	{
+	  strategy = ST_SHIFT_ADD_FLAG;
+	  normalize = 1;
+	}
   else
 	return FALSE;
 
@@ -1453,6 +1460,24 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	gen_int_mode (ifalse, mode), if_info->x,
 	0, OPTAB_WIDEN);
 	  break;
+	case ST_SHIFT_ADD_FLAG:
+	  {
+	/* if (test) x = 5; else x = 1;
+	       =>   x = ((test != 0) << 2) + 1;  */
+	HOST_WIDE_INT diff_log = exact_log2 (abs_hwi (diff));
+	rtx diff_rtx
+	  = expand_simple_binop (mode, ASHIFT, target, GEN_INT (diff_log),
+ if_info->x, 0, OPTAB_WIDEN);
+	if (!diff_rtx)
+	  {
+		end_sequence ();
+		return false;
+	  }
+	target = expand_simple_binop (mode, (diff < 0) ? MINUS : PLUS,
+	  gen_int_mode (ifalse, mode), diff_rtx,
+	  if_info->x, 0, OPTAB_WIDEN);
+	break;
+	  }
 	}
 
   if (! target)
diff --git a/gcc/testsuite/gcc.dg/ifcvt-6.c b/gcc/testsuite/gcc.dg/ifcvt-6.c
new file mode 100644
index 000..c2cfb17
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ifcvt-6.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target x86_64-*-* } } */
+/* { dg-options "-fdump-rtl-ce1 -O2" } */
+
+int
+test1 (int a)
+{
+  return a % 2 != 0 ? 7 : 3;
+}
+
+/* { dg-final { scan-rtl-dump "3 true changes made" "ce1" } } */
+/* { dg-final { scan-assembler-not "sbbl" } } */
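The transformation the new ST_SHIFT_ADD_FLAG strategy performs on this testcase can be written out in plain C (the `test1_branchless` name is invented): here diff = 7 - 3 = 4 = 1 << 2, so the result is formed as ifalse plus the condition flag shifted by log2(diff), with no branch.

```c
/* The testcase function, as in gcc.dg/ifcvt-6.c.  */
int test1 (int a)
{
  return a % 2 != 0 ? 7 : 3;
}

/* The branchless form the if-converter aims to emit:
   x = ifalse + ((test != 0) << exact_log2 (diff)).  */
int test1_branchless (int a)
{
  int t = (a % 2 != 0);
  return 3 + (t << 2);
}
```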


Re: [PATCH #3], Add PowerPC ISA 3.0 vpermr/xxpermr support

2016-05-24 Thread Kelvin Nilsen

I have committed gcc.target/powerpc/p9-vpermr.c to trunk (separately
from the other files mentioned in this ChangeLog), revision 236655.
Approved offline.

On 05/23/2016 05:16 PM, Segher Boessenkool wrote:
> On Mon, May 23, 2016 at 06:22:22PM -0400, Michael Meissner wrote:
>> Here are the patches for xxpermr/vpermr support that are broken out from 
>> fixing
>> the xxperm fusion bug.  I have built a compiler with these patches (and the
>> xxperm patches) and it bootstraps and does not cause a regression.  Are they 
>> ok
>> to add to GCC 7 and eventually to GCC 6.2?
>>
>> [gcc]
>> 2016-05-23  Michael Meissner  
>>  Kelvin Nilsen  
>>
>>  * config/rs6000/rs6000.c (rs6000_expand_vector_set): Generate
>>  vpermr/xxpermr on ISA 3.0.
>>  (altivec_expand_vec_perm_le): Likewise.
>>  * config/rs6000/altivec.md (UNSPEC_VPERMR): New unspec.
>>  (altivec_vpermr__internal): Add VPERMR/XXPERMR support for
>>  ISA 3.0.
>>
>> [gcc/testsuite]
>> 2016-05-23  Michael Meissner  
>>  Kelvin Nilsen  
>>
>>  * gcc.target/powerpc/p9-vpermr.c: New test for ISA 3.0 vpermr
>>  support.
> 
> Okay for trunk.  Okay for 6 after a week or so.
> 
> Thanks,
> 
> 
> Segher
> 
> 

-- 
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain



[gomp4.5] Linear clause modifiers

2016-05-24 Thread Jakub Jelinek
Hi!

This patch adds parsing/resolving/translation of linear clause
modifiers, adds support for linear-step that is a uniform dummy argument
and tweaks a couple of further linear clause related things.

Tested on x86_64-linux, committed to gomp-4_5-branch.

2016-05-24  Jakub Jelinek  

* gfortran.h (enum gfc_omp_linear_op): New.
(struct gfc_omp_namelist): Add u.linear_op field.
* openmp.c (gfc_match_omp_clauses): Add support for parsing
linear clause modifiers.
(resolve_omp_clauses): Diagnose linear clause modifiers when not
in declare simd.  Only check for integer type if ref modifier is not
used.  Remove diagnostics for required VALUE attribute.  Diagnose
VALUE attribute with ref or uval modifiers.  Allow non-constant
linear-step, if it is a dummy argument alone and is mentioned in
uniform clause.
* dump-parse-tree.c (show_omp_namelist): Print linear clause
modifiers.
* trans-openmp.c (gfc_trans_omp_clauses): Test declare_simd
instead of block == NULL_TREE.  Translate linear clause modifiers
and clause with uniform dummy argument linear-step.

* gfortran.dg/gomp/declare-simd-2.f90: New test.
* gfortran.dg/gomp/linear-1.f90: New test.
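Although the patch is for the Fortran front end, the clause forms it parses can be illustrated with the C-syntax analogue of OpenMP 4.5 linear modifiers (the `scale`, `i`, and `s` names below are invented): a `val` modifier on the linear dummy, with a linear-step that is itself a uniform dummy argument rather than a constant.

```c
/* Hypothetical declare simd with a linear modifier and a uniform
   dummy argument used as the linear-step; without -fopenmp the
   pragma is simply ignored and the function behaves normally.  */
#pragma omp declare simd uniform(s) linear(val(i):s)
int scale (int i, int s)
{
  return i * 2;
}
```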

--- gcc/fortran/gfortran.h.jj   2016-05-13 12:37:21.0 +0200
+++ gcc/fortran/gfortran.h  2016-05-23 17:20:09.508803607 +0200
@@ -1134,6 +1134,14 @@ enum gfc_omp_map_op
   OMP_MAP_ALWAYS_TOFROM
 };
 
+enum gfc_omp_linear_op
+{
+  OMP_LINEAR_DEFAULT,
+  OMP_LINEAR_REF,
+  OMP_LINEAR_VAL,
+  OMP_LINEAR_UVAL
+};
+
 /* For use in OpenMP clauses in case we need extra information
(aligned clause alignment, linear clause step, etc.).  */
 
@@ -1146,6 +1154,7 @@ typedef struct gfc_omp_namelist
   gfc_omp_reduction_op reduction_op;
   gfc_omp_depend_op depend_op;
   gfc_omp_map_op map_op;
+  gfc_omp_linear_op linear_op;
 } u;
   struct gfc_omp_namelist_udr *udr;
   struct gfc_omp_namelist *next;
--- gcc/fortran/openmp.c.jj 2016-05-16 17:56:25.0 +0200
+++ gcc/fortran/openmp.c2016-05-24 17:40:34.636152910 +0200
@@ -1092,13 +1092,50 @@ gfc_match_omp_clauses (gfc_omp_clauses *
  end_colon = false;
  head = NULL;
  if ((mask & OMP_CLAUSE_LINEAR)
- && gfc_match_omp_variable_list ("linear (",
- &c->lists[OMP_LIST_LINEAR],
- false, &end_colon,
- &head) == MATCH_YES)
+ && gfc_match ("linear (") == MATCH_YES)
{
+ gfc_omp_linear_op linear_op = OMP_LINEAR_DEFAULT;
  gfc_expr *step = NULL;
 
+ if (gfc_match_omp_variable_list (" ref (",
+  &c->lists[OMP_LIST_LINEAR],
+  false, NULL, &head)
+ == MATCH_YES)
+   linear_op = OMP_LINEAR_REF;
+ else if (gfc_match_omp_variable_list (" val (",
+   &c->lists[OMP_LIST_LINEAR],
+   false, NULL, &head)
+ == MATCH_YES)
+   linear_op = OMP_LINEAR_VAL;
+ else if (gfc_match_omp_variable_list (" uval (",
+   &c->lists[OMP_LIST_LINEAR],
+   false, NULL, &head)
+ == MATCH_YES)
+   linear_op = OMP_LINEAR_UVAL;
+ else if (gfc_match_omp_variable_list ("",
+   &c->lists[OMP_LIST_LINEAR],
+   false, &end_colon, &head)
+ == MATCH_YES)
+   linear_op = OMP_LINEAR_DEFAULT;
+ else
+   {
+ gfc_free_omp_namelist (*head);
+ gfc_current_locus = old_loc;
+ *head = NULL;
+ break;
+   }
+ if (linear_op != OMP_LINEAR_DEFAULT)
+   {
+ if (gfc_match (" :") == MATCH_YES)
+   end_colon = true;
+ else if (gfc_match (" )") != MATCH_YES)
+   {
+ gfc_free_omp_namelist (*head);
+ gfc_current_locus = old_loc;
+ *head = NULL;
+ break;
+   }
+   }
  if (end_colon && gfc_match (" %e )", &step) != MATCH_YES)
{
  gfc_free_omp_namelist (*head);
@@ -1114,6 +1151,9 @@ gfc_match_omp_clauses (gfc_omp_clauses *
  mpz_set_si (step->value.integer, 1);
}
  (*head)->expr = step;
+ if (linear_op != OMP_LINEAR_DEFAULT)
+   for (gfc_omp_namelist *n = *head; n; n = n->next)
+

Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 7:18 PM, H.J. Lu  wrote:

>> Oh, target_flags is only a 32bit integer :(. Is there a reason it
>> can't be extended to HOST_WIDE_INT, as is the case with
>> ix86_isa_flags?
>
> target_flags is generic, not target specific.  I want to limit my
> change to x86 backend and -mgeneral-regs-only doesn't need
> to use target_flags .

I have thrown together a quick patch that defines target_flags as HOST_WIDE_INT.

(The patch still needs a small correction so that opth-gen.awk will emit
HOST_WIDE_INT_1 for MASK_* defines -- I have to go now -- but I was able to
compile a functional x86_64-apple-darwin15.5.0 crosscompiler.)
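The width issue can be sketched in a few lines of C (the `HWI` typedef and `MASK_EXAMPLE` name are stand-ins, not GCC's actual definitions): a 32-bit `target_flags` holds only 32 MASK_* bits, and any mask past bit 31 must be built from a 64-bit constant, which is why opth-gen.awk needs to emit a HOST_WIDE_INT-width 1 -- `1 << 35` would overflow a plain int.

```c
#include <stdint.h>

typedef int64_t HWI;                    /* stand-in for HOST_WIDE_INT */

/* A hypothetical mask past bit 31: must start from a 64-bit 1.  */
#define MASK_EXAMPLE (((HWI) 1) << 35)

/* Check whether a given flag bit is set in a widened flags word.  */
int mask_bit_set (HWI flags, int bit)
{
  return (int) ((flags >> bit) & 1);
}
```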

Uros.
Index: common/config/i386/i386-common.c
===
--- common/config/i386/i386-common.c(revision 236644)
+++ common/config/i386/i386-common.c(working copy)
@@ -223,6 +223,11 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA_RDRND_UNSET OPTION_MASK_ISA_RDRND
 #define OPTION_MASK_ISA_F16C_UNSET OPTION_MASK_ISA_F16C
 
+#define OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET \
+  (OPTION_MASK_ISA_MMX_UNSET \
+   | OPTION_MASK_ISA_SSE_UNSET \
+   | OPTION_MASK_ISA_MPX)
+
 /* Implement TARGET_HANDLE_OPTION.  */
 
 bool
@@ -236,6 +241,21 @@ ix86_handle_option (struct gcc_options *opts,
 
   switch (code)
 {
+case OPT_mgeneral_regs_only:
+  if (value)
+   {
+ /* Disable MPX, MMX, SSE and x87 instructions if only the
+general registers are allowed..  */
+ opts->x_ix86_isa_flags
+   &= ~OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET;
+ opts->x_ix86_isa_flags_explicit
+   |= OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET;
+ opts->x_target_flags &= ~MASK_80387;
+   }
+  else
+   gcc_unreachable ();
+  return true;
+
 case OPT_mmmx:
   if (value)
{
Index: common.opt
===
--- common.opt  (revision 236644)
+++ common.opt  (working copy)
@@ -23,7 +23,7 @@
 ; Please try to keep this file in ASCII collating order.
 
 Variable
-int target_flags
+HOST_WIDE_INT target_flags
 
 Variable
 int optimize
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 236645)
+++ config/i386/i386.c  (working copy)
@@ -5337,7 +5337,10 @@ ix86_option_override_internal (bool main_args_p,
&& !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_PKU))
  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PKU;
 
-   if (!(opts_set->x_target_flags & MASK_80387))
+   /* Don't enable x87 instructions if only the general registers
+  are allowed.  */
+   if (!(opts_set->x_target_flags & MASK_GENERAL_REGS_ONLY)
+   && !(opts_set->x_target_flags & MASK_80387))
  {
if (processor_alias_table[i].flags & PTA_NO_80387)
  opts->x_target_flags &= ~MASK_80387;
Index: config/i386/i386.opt
===
--- config/i386/i386.opt(revision 236644)
+++ config/i386/i386.opt(working copy)
@@ -74,7 +74,7 @@ HOST_WIDE_INT x_ix86_isa_flags_explicit
 
 ;; which flags were passed by the user
 Variable
-int ix86_target_flags_explicit
+HOST_WIDE_INT ix86_target_flags_explicit
 
 ;; which flags were passed by the user
 TargetSave
@@ -897,3 +897,7 @@ Enum(stack_protector_guard) String(global) Value(S
 mmitigate-rop
 Target Var(flag_mitigate_rop) Init(0)
 Attempt to avoid generating instruction sequences containing ret bytes.
+
+mgeneral-regs-only
+Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Save
+Generate code which uses only the general registers.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 236644)
+++ doc/invoke.texi (working copy)
@@ -1173,7 +1173,7 @@ See RS/6000 and PowerPC Options.
 -msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol
 -mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol
 -malign-data=@var{type} -mstack-protector-guard=@var{guard} @gol
--mmitigate-rop}
+-mmitigate-rop -mgeneral-regs-only}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
@@ -24298,6 +24298,12 @@ opcodes, to mitigate against certain forms of atta
 this option is limited in what it can do and should not be relied
 on to provide serious protection.
 
+@item -mgeneral-regs-only
+@opindex mgeneral-regs-only
+Generate code that uses only the general-purpose registers.  This
+prevents the compiler from using floating-point, vector, mask and bound
+registers.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
Index: doc/tm.texi
===
--- doc/tm.texi (revision 236644)
+++ doc/tm.texi (working copy)
@@ -652,7 +652,7 @@ macro to define @code{__ELF__}, so you probably do
 it yourself.
@end defmac
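
As a side note on the patch above, here is a hypothetical sketch (plain C++, not GCC code; bit positions are made up) of why the 32-bit `int target_flags` runs out of room for `MASK_*` bits, and why the shifted 1 must itself be widened — the reason opth-gen.awk has to emit `HOST_WIDE_INT_1` for the `MASK_*` defines:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: the old 'int target_flags' holds at most 32 MASK_*
// bits.  Widening the variable to a 64-bit type (GCC's HOST_WIDE_INT)
// doubles the room, but only if the shifted 1 is 64-bit too.
constexpr std::int64_t mask_bit (unsigned n)
{
  return std::int64_t{1} << n;   // 64-bit 1, so n may exceed 31
}

// Returns whether a flag at bit n would survive being stored in the old
// 32-bit representation.
bool flag_survives_32bit_truncation (unsigned n)
{
  std::int64_t flags = mask_bit (n);
  return static_cast<std::uint32_t> (flags) != 0;
}
```

Bits 0–31 round-trip through the old 32-bit word; a hypothetical `MASK_*` at bit 40 would silently vanish, which is the limitation the widening removes.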

Re: [PATCH] Fix up Yr constraint

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 6:55 PM, Jakub Jelinek  wrote:
> Hi!
>
> The Yr constraint contrary to what has been said when it has been submitted
> actually is always NO_REX_SSE_REGS or NO_REGS, never ALL_SSE_REGS, so
> the RA restriction to only the first 8 regs is done no matter what we tune
> for.
>
> This is because we test X86_TUNE_AVOID_4BYTE_PREFIXES, which is an enum
> value (59), rather than actually checking whether the tune flag is set.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-24  Jakub Jelinek  
>
> * config/i386/i386.h (TARGET_AVOID_4BYTE_PREFIXES): Define.
> * config/i386/constraints.md (Yr): Test TARGET_AVOID_4BYTE_PREFIXES
> rather than X86_TUNE_AVOID_4BYTE_PREFIXES.

Uh, another brown-paper bag bug...

OK everywhere.

Thanks,
Uros.

> --- gcc/config/i386/i386.h.jj   2016-05-24 10:56:02.0 +0200
> +++ gcc/config/i386/i386.h  2016-05-24 15:13:05.715906018 +0200
> @@ -465,6 +465,8 @@ extern unsigned char ix86_tune_features[
> ix86_tune_features[X86_TUNE_SLOW_PSHUFB]
>  #define TARGET_VECTOR_PARALLEL_EXECUTION \
> ix86_tune_features[X86_TUNE_VECTOR_PARALLEL_EXECUTION]
> +#define TARGET_AVOID_4BYTE_PREFIXES \
> +   ix86_tune_features[X86_TUNE_AVOID_4BYTE_PREFIXES]
>  #define TARGET_FUSE_CMP_AND_BRANCH_32 \
> ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
>  #define TARGET_FUSE_CMP_AND_BRANCH_64 \
> --- gcc/config/i386/constraints.md.jj   2016-05-12 10:29:41.0 +0200
> +++ gcc/config/i386/constraints.md  2016-05-24 15:14:21.647914550 +0200
> @@ -142,7 +142,7 @@ (define_register_constraint "Yf"
>   "@internal Any x87 register when 80387 FP arithmetic is enabled.")
>
>  (define_register_constraint "Yr"
> - "TARGET_SSE ? (X86_TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : 
> ALL_SSE_REGS) : NO_REGS"
> + "TARGET_SSE ? (TARGET_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : 
> ALL_SSE_REGS) : NO_REGS"
>   "@internal Lower SSE register when avoiding REX prefix and all SSE 
> registers otherwise.")
>
>  (define_register_constraint "Yv"
>
> Jakub
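
For readers outside the backend, this is a hypothetical mock-up (not GCC code; names mirror GCC's, values are illustrative) of the bug class fixed here: `X86_TUNE_AVOID_4BYTE_PREFIXES` is an enum *index* into `ix86_tune_features[]`, so using the enumerator itself in a boolean context is always true:

```cpp
#include <cassert>

// X86_TUNE_AVOID_4BYTE_PREFIXES is an enum index, not a flag bit.
enum ix86_tune_index { X86_TUNE_AVOID_4BYTE_PREFIXES = 59, X86_TUNE_LAST };

// Stand-in for GCC's per-tuning feature array; all features off here.
unsigned char ix86_tune_features[X86_TUNE_LAST] = {};

// Buggy form: tests the enumerator itself (59 != 0, so always true).
bool avoid_4byte_prefixes_buggy ()
{ return X86_TUNE_AVOID_4BYTE_PREFIXES; }

// Fixed form: tests whether the tune flag is actually set for the
// current tuning, as the new TARGET_AVOID_4BYTE_PREFIXES macro does.
bool avoid_4byte_prefixes_fixed ()
{ return ix86_tune_features[X86_TUNE_AVOID_4BYTE_PREFIXES]; }
```

With all tune features off, the buggy form still reports the flag as set, which is exactly why `Yr` always resolved to `NO_REX_SSE_REGS` before the patch.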


Re: [PATCH] Fix Yr constraint uses in vpmov* insns

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 6:49 PM, Jakub Jelinek  wrote:
> Hi!
>
> Looking at the Yr constraint, it seems to me it is really meant to be used
> for noavx, only in that case whether we use xmm0-xmm7 or xmm8+ matters for
> the size of the instruction (number of prefixes).
> In most of the places where Yr is used, we typically have 2 noavx
> alternatives, one with Yr constraint, another one with *x, and then one
> avx alternative with x or v.
>
> But in a couple of spots we do the wrong thing, e.g. use Yr constraint
> always (which (ought to act, see a later patch) acts as first half of x
> for -mtune={silvermont,intel} and otherwise as v, and otherwise
> uses *, which means limiting RA unnecessarily.
>
> The following patch fixes the vpmov* insns.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Uros.

> 2016-05-24  Jakub Jelinek  
>
> * config/i386/sse.md (sse4_1_v8qiv8hi2): Limit
> first two alternatives to noavx, use *x instead of *v in the second
> one, add avx alternative without *.
> (sse4_1_v4qiv4si2, sse4_1_v4hiv4si2,
> sse4_1_v2qiv2di2, sse4_1_v2hiv2di2,
> sse4_1_v2siv2di2): Likewise.
>
> --- gcc/config/i386/sse.md.jj   2016-05-24 10:55:52.0 +0200
> +++ gcc/config/i386/sse.md  2016-05-24 14:50:14.566277449 +0200
> @@ -14748,19 +14752,20 @@ (define_insn "avx512bw_v32qiv32hi2
> (set_attr "mode" "XI")])
>
>  (define_insn "sse4_1_v8qiv8hi2"
> -  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*v")
> +  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
> (any_extend:V8HI
>   (vec_select:V8QI
> -   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
> +   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm")
> (parallel [(const_int 0) (const_int 1)
>(const_int 2) (const_int 3)
>(const_int 4) (const_int 5)
>(const_int 6) (const_int 7)]]
>"TARGET_SSE4_1 &&  && "
>"%vpmovbw\t{%1, %0|%0, %q1}"
> -  [(set_attr "type" "ssemov")
> +  [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "type" "ssemov")
> (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "maybe_vex")
> +   (set_attr "prefix" "orig,orig,maybe_evex")
> (set_attr "mode" "TI")])
>
>  (define_insn "avx512f_v16qiv16si2"
> @@ -14790,17 +14795,18 @@ (define_insn "avx2_v8qiv8si2 (set_attr "mode" "OI")])
>
>  (define_insn "sse4_1_v4qiv4si2"
> -  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*v")
> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
> (any_extend:V4SI
>   (vec_select:V4QI
> -   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
> +   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm")
> (parallel [(const_int 0) (const_int 1)
>(const_int 2) (const_int 3)]]
>"TARGET_SSE4_1 && "
>"%vpmovbd\t{%1, %0|%0, %k1}"
> -  [(set_attr "type" "ssemov")
> +  [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "type" "ssemov")
> (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "maybe_vex")
> +   (set_attr "prefix" "orig,orig,maybe_evex")
> (set_attr "mode" "TI")])
>
>  (define_insn "avx512f_v16hiv16si2"
> @@ -14825,17 +14831,18 @@ (define_insn "avx2_v8hiv8si2 (set_attr "mode" "OI")])
>
>  (define_insn "sse4_1_v4hiv4si2"
> -  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*v")
> +  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
> (any_extend:V4SI
>   (vec_select:V4HI
> -   (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*vm")
> +   (match_operand:V8HI 1 "nonimmediate_operand" "Yrm,*xm,vm")
> (parallel [(const_int 0) (const_int 1)
>(const_int 2) (const_int 3)]]
>"TARGET_SSE4_1 && "
>"%vpmovwd\t{%1, %0|%0, %q1}"
> -  [(set_attr "type" "ssemov")
> +  [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "type" "ssemov")
> (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "maybe_vex")
> +   (set_attr "prefix" "orig,orig,maybe_evex")
> (set_attr "mode" "TI")])
>
>  (define_insn "avx512f_v8qiv8di2"
> @@ -14868,16 +14875,17 @@ (define_insn "avx2_v4qiv4di2 (set_attr "mode" "OI")])
>
>  (define_insn "sse4_1_v2qiv2di2"
> -  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*v")
> +  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
> (any_extend:V2DI
>   (vec_select:V2QI
> -   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*vm")
> +   (match_operand:V16QI 1 "nonimmediate_operand" "Yrm,*xm,vm")
> (parallel [(const_int 0) (const_int 1)]]
>"TARGET_SSE4_1 && "
>"%vpmovbq\t{%1, %0|%0, %w1}"
> -  [(set_attr "type" "ssemov")
> +  [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "type" "ssemov")
> (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "maybe_vex")
> +   (set_attr "prefix" "orig,orig

Re: [PATCH] Fix Yr constraint uses in various insns

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 6:50 PM, Jakub Jelinek  wrote:
> Hi!
>
> Similarly to the last patch, this one fixes various misc patterns.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-24  Jakub Jelinek  
>
> * config/i386/sse.md (vec_set_0): Use sse4_noavx isa instead
> of sse4 for the first alternative, drop %v from the template
> and d operand modifier.  Split second alternative into one sse4_noavx
> and one avx alternative, use *x instead of *v in the former and v
> instead of *v in the latter.
> (*sse4_1_extractps): Use noavx isa instead of * for the first
> alternative, drop %v from the template.  Split second alternative into
> one noavx and one avx alternative, use *x instead of *v in the
> former and v instead of *v in the latter.
> (_movntdqa): Guard the first 2 alternatives
> with noavx and the last one with avx.
> (sse4_1_phminposuw): Guard first alternative with noavx isa,
> split the second one into one noavx and one avx alternative,
> use *x and Bm in the former and x and m in the latter one.
> (_ptest): Use noavx instead of * for the first two
> alternatives.

OK.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2016-05-24 10:55:52.0 +0200
> +++ gcc/config/i386/sse.md  2016-05-24 14:50:14.566277449 +0200
> @@ -6623,18 +6623,19 @@ (define_expand "vec_init"
>  ;; see comment above inline_secondary_memory_needed function in i386.c
>  (define_insn "vec_set_0"
>[(set (match_operand:VI4F_128 0 "nonimmediate_operand"
> - "=Yr,*v,v,Yi,x,x,v,Yr ,*x ,x  ,m ,m   ,m")
> + "=Yr,*x,v,v,Yi,x,x,v,Yr ,*x ,x  ,m ,m   ,m")
> (vec_merge:VI4F_128
>   (vec_duplicate:VI4F_128
> (match_operand: 2 "general_operand"
> - " Yr,*v,m,r ,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"))
> + " Yr,*x,v,m,r ,m,x,v,*rm,*rm,*rm,!x,!*re,!*fF"))
>   (match_operand:VI4F_128 1 "vector_move_operand"
> - " C , C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
> + " C , C,C,C,C ,C,0,v,0  ,0  ,x  ,0 ,0   ,0")
>   (const_int 1)))]
>"TARGET_SSE"
>"@
> -   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
> -   %vinsertps\t{$0xe, %d2, %0|%0, %d2, 0xe}
> +   insertps\t{$0xe, %2, %0|%0, %2, 0xe}
> +   insertps\t{$0xe, %2, %0|%0, %2, 0xe}
> +   vinsertps\t{$0xe, %2, %2, %0|%0, %2, %2, 0xe}
> %vmov\t{%2, %0|%0, %2}
> %vmovd\t{%2, %0|%0, %2}
> movss\t{%2, %0|%0, %2}
> @@ -6646,20 +6647,20 @@ (define_insn "vec_set_0"
> #
> #
> #"
> -  [(set_attr "isa" 
> "sse4,sse4,sse2,sse2,noavx,noavx,avx,sse4_noavx,sse4_noavx,avx,*,*,*")
> +  [(set_attr "isa" 
> "sse4_noavx,sse4_noavx,avx,sse2,sse2,noavx,noavx,avx,sse4_noavx,sse4_noavx,avx,*,*,*")
> (set (attr "type")
> - (cond [(eq_attr "alternative" "0,1,7,8,9")
> + (cond [(eq_attr "alternative" "0,1,2,8,9,10")
>   (const_string "sselog")
> -   (eq_attr "alternative" "11")
> - (const_string "imov")
> (eq_attr "alternative" "12")
> + (const_string "imov")
> +   (eq_attr "alternative" "13")
>   (const_string "fmov")
>]
>(const_string "ssemov")))
> -   (set_attr "prefix_extra" "*,*,*,*,*,*,*,1,1,1,*,*,*")
> -   (set_attr "length_immediate" "*,*,*,*,*,*,*,1,1,1,*,*,*")
> -   (set_attr "prefix" 
> "maybe_vex,maybe_vex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex,*,*,*")
> -   (set_attr "mode" "SF,SF,,SI,SF,SF,SF,TI,TI,TI,*,*,*")])
> +   (set_attr "prefix_extra" "*,*,*,*,*,*,*,*,1,1,1,*,*,*")
> +   (set_attr "length_immediate" "*,*,*,*,*,*,*,*,1,1,1,*,*,*")
> +   (set_attr "prefix" 
> "orig,orig,maybe_evex,maybe_vex,maybe_vex,orig,orig,vex,orig,orig,vex,*,*,*")
> +   (set_attr "mode" "SF,SF,SF,,SI,SF,SF,SF,TI,TI,TI,*,*,*")])
>
>  ;; A subset is vec_setv4sf.
>  (define_insn "*vec_setv4sf_sse4_1"
> @@ -6761,14 +6762,15 @@ (define_insn_and_split "*vec_extractv4sf
>"operands[1] = gen_lowpart (SFmode, operands[1]);")
>
>  (define_insn_and_split "*sse4_1_extractps"
> -  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,v,v")
> +  [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,rm,rm,v,v")
> (vec_select:SF
> - (match_operand:V4SF 1 "register_operand" "Yr,*v,0,v")
> - (parallel [(match_operand:SI 2 "const_0_to_3_operand" 
> "n,n,n,n")])))]
> + (match_operand:V4SF 1 "register_operand" "Yr,*x,v,0,v")
> + (parallel [(match_operand:SI 2 "const_0_to_3_operand" 
> "n,n,n,n,n")])))]
>"TARGET_SSE4_1"
>"@
> -   %vextractps\t{%2, %1, %0|%0, %1, %2}
> -   %vextractps\t{%2, %1, %0|%0, %1, %2}
> +   extractps\t{%2, %1, %0|%0, %1, %2}
> +   extractps\t{%2, %1, %0|%0, %1, %2}
> +   vextractps\t{%2, %1, %0|%0, %1, %2}
> #
> #"
>"&& reload_completed && SSE_REG_P (operands[0])"
> @@ -6793,13 +6795,13 @@ (define_insn_and_split "*sse4_1_extractp
>  }
>DONE;
>  }
> -  [(set_attr "isa"

Re: [PATCH] Fix one more Yr use

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 6:52 PM, Jakub Jelinek  wrote:
> Hi!
>
> Another case (separate patch because I thought I should add an avx512f
> alternative here, but later on found out it is already handled by
> having the vrndscale* patterns defined before these ones
> and having the same RTL for them (except allowing 0 to 255 instead
> of just 0 to 15).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-24  Jakub Jelinek  
>
> * config/i386/sse.md (_round):
> Limit 1st alternative to noavx isa, split 2nd alternative into one
> noavx and one avx alternative, use *x and Bm in the former and
> x and m in the latter.

OK.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2016-05-24 10:55:52.0 +0200
> +++ gcc/config/i386/sse.md  2016-05-24 14:50:14.566277449 +0200
> @@ -14986,22 +14996,19 @@ (define_insn "_ptest"
> (set_attr "mode" "")])
>
>  (define_insn "_round"
> -  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x")
> +  [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
> (unspec:VF_128_256
> - [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm")
> -  (match_operand:SI 2 "const_0_to_15_operand" "n,n")]
> + [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
> +  (match_operand:SI 2 "const_0_to_15_operand" "n,n,n")]
>   UNSPEC_ROUND))]
>"TARGET_ROUND"
>"%vround\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "type" "ssecvt")
> -   (set (attr "prefix_data16")
> - (if_then_else
> -   (match_test "TARGET_AVX")
> - (const_string "*")
> - (const_string "1")))
> +  [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "type" "ssecvt")
> +   (set_attr "prefix_data16" "1,1,*")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> -   (set_attr "prefix" "maybe_vex")
> +   (set_attr "prefix" "orig,orig,vex")
> (set_attr "mode" "")])
>
>  (define_expand "_round_sfix"
>
> Jakub


Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 8:15 PM, Uros Bizjak  wrote:
> On Tue, May 24, 2016 at 7:18 PM, H.J. Lu  wrote:
>
>>> Oh, target_flags is only a 32bit integer :(. Is there a reason it
>>> can't be extended to HOST_WIDE_INT, as is the case with
>>> ix86_isa_flags?
>>
>> target_flags is generic, not target specific.  I want to limit my
>> change to x86 backend and -mgeneral-regs-only doesn't need
> to use target_flags.
>
> I have thrown together a quick patch that defines target_flags as 
> HOST_WIDE_INT.
>
> (Patch still needs a small correction, so opth-gen.awk will emit
> HOST_WIDE_INT_1 for MASK_* defines, have to go now, but I was able to
> compile functional x86_64-apple-darwin15.5.0 crosscompiler.)

And here is attached complete (but untested!!) patch that should "just
work"(TM).

Uros.
Index: common/config/i386/i386-common.c
===
--- common/config/i386/i386-common.c(revision 236644)
+++ common/config/i386/i386-common.c(working copy)
@@ -223,6 +223,11 @@
 #define OPTION_MASK_ISA_RDRND_UNSET OPTION_MASK_ISA_RDRND
 #define OPTION_MASK_ISA_F16C_UNSET OPTION_MASK_ISA_F16C
 
+#define OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET \
+  (OPTION_MASK_ISA_MMX_UNSET \
+   | OPTION_MASK_ISA_SSE_UNSET \
+   | OPTION_MASK_ISA_MPX)
+
 /* Implement TARGET_HANDLE_OPTION.  */
 
 bool
@@ -236,6 +241,21 @@
 
   switch (code)
 {
+case OPT_mgeneral_regs_only:
+  if (value)
+   {
+ /* Disable MPX, MMX, SSE and x87 instructions if only the
+general registers are allowed.  */
+ opts->x_ix86_isa_flags
+   &= ~OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET;
+ opts->x_ix86_isa_flags_explicit
+   |= OPTION_MASK_ISA_GENERAL_REGS_ONLY_UNSET;
+ opts->x_target_flags &= ~MASK_80387;
+   }
+  else
+   gcc_unreachable ();
+  return true;
+
 case OPT_mmmx:
   if (value)
{
Index: common.opt
===
--- common.opt  (revision 236644)
+++ common.opt  (working copy)
@@ -23,7 +23,7 @@
 ; Please try to keep this file in ASCII collating order.
 
 Variable
-int target_flags
+HOST_WIDE_INT target_flags
 
 Variable
 int optimize
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 236645)
+++ config/i386/i386.c  (working copy)
@@ -5337,7 +5337,10 @@
&& !(opts->x_ix86_isa_flags_explicit & OPTION_MASK_ISA_PKU))
  opts->x_ix86_isa_flags |= OPTION_MASK_ISA_PKU;
 
-   if (!(opts_set->x_target_flags & MASK_80387))
+   /* Don't enable x87 instructions if only the general registers
+  are allowed.  */
+   if (!(opts_set->x_target_flags & MASK_GENERAL_REGS_ONLY)
+   && !(opts_set->x_target_flags & MASK_80387))
  {
if (processor_alias_table[i].flags & PTA_NO_80387)
  opts->x_target_flags &= ~MASK_80387;
Index: config/i386/i386.opt
===
--- config/i386/i386.opt(revision 236644)
+++ config/i386/i386.opt(working copy)
@@ -74,7 +74,7 @@
 
 ;; which flags were passed by the user
 Variable
-int ix86_target_flags_explicit
+HOST_WIDE_INT ix86_target_flags_explicit
 
 ;; which flags were passed by the user
 TargetSave
@@ -897,3 +897,7 @@
 mmitigate-rop
 Target Var(flag_mitigate_rop) Init(0)
 Attempt to avoid generating instruction sequences containing ret bytes.
+
+mgeneral-regs-only
+Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Save
+Generate code which uses only the general registers.
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 236644)
+++ doc/invoke.texi (working copy)
@@ -1173,7 +1173,7 @@
 -msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol
 -mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol
 -malign-data=@var{type} -mstack-protector-guard=@var{guard} @gol
--mmitigate-rop}
+-mmitigate-rop -mgeneral-regs-only}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol
@@ -24298,6 +24298,12 @@
 this option is limited in what it can do and should not be relied
 on to provide serious protection.
 
+@item -mgeneral-regs-only
+@opindex mgeneral-regs-only
+Generate code that uses only the general-purpose registers.  This
+prevents the compiler from using floating-point, vector, mask and bound
+registers.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
Index: doc/tm.texi
===
--- doc/tm.texi (revision 236644)
+++ doc/tm.texi (working copy)
@@ -652,7 +652,7 @@
 it yourself.
 @end defmac
 
-@deftypevar {extern int} target_flags
+@deftypevar {extern HOST_WIDE_INT} target_flags
 This variable is declared in @file{options.h}, which is included before
 any target-specific headers.
@end deftypevar

Re: [PATCH] c++/71147 - [6 Regression] Flexible array member wrongly rejected in template

2016-05-24 Thread Jason Merrill

On 05/24/2016 12:15 PM, Martin Sebor wrote:

+  else if (TREE_CODE (type) == ARRAY_TYPE /* && TYPE_DOMAIN (type) */)


Why is this commented out rather than removed in this version of the 
patch?  Let's remove it, as before.  OK with that change.


Jason



Re: [PATCH] Fix up Yr constraint

2016-05-24 Thread Jakub Jelinek
On Tue, May 24, 2016 at 08:35:12PM +0200, Uros Bizjak wrote:
> On Tue, May 24, 2016 at 6:55 PM, Jakub Jelinek  wrote:
> > Hi!
> >
> > The Yr constraint contrary to what has been said when it has been submitted
> > actually is always NO_REX_SSE_REGS or NO_REGS, never ALL_SSE_REGS, so
> > the RA restriction to only the first 8 regs is done no matter what we tune
> > for.
> >
> > This is because we test X86_TUNE_AVOID_4BYTE_PREFIXES, which is an enum
> > value (59), rather than actually checking whether the tune flag is set.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > 2016-05-24  Jakub Jelinek  
> >
> > * config/i386/i386.h (TARGET_AVOID_4BYTE_PREFIXES): Define.
> > * config/i386/constraints.md (Yr): Test TARGET_AVOID_4BYTE_PREFIXES
> > rather than X86_TUNE_AVOID_4BYTE_PREFIXES.
> 
> Uh, another brown-paper bag bug...
> 
> OK everywhere.

I fear it might be too dangerous for -mavx512* for the branches; I went
through all the Yr uses on the trunk, but not on the branches.
Would you be ok with using 
"TARGET_SSE ? (TARGET_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : SSE_REGS) : 
NO_REGS"
on the branches instead?
Or I guess we could use it on the trunk too, it should make no difference there
(because on the trunk it is only used when !TARGET_AVX).
Or maybe even
"TARGET_SSE ? ((TARGET_AVOID_4BYTE_PREFIXES && !TARGET_AVX) ? NO_REX_SSE_REGS : 
SSE_REGS) : NO_REGS"
(again, should make zero difference on the trunk, but might be better for
the branches).

> > --- gcc/config/i386/i386.h.jj   2016-05-24 10:56:02.0 +0200
> > +++ gcc/config/i386/i386.h  2016-05-24 15:13:05.715906018 +0200
> > @@ -465,6 +465,8 @@ extern unsigned char ix86_tune_features[
> > ix86_tune_features[X86_TUNE_SLOW_PSHUFB]
> >  #define TARGET_VECTOR_PARALLEL_EXECUTION \
> > ix86_tune_features[X86_TUNE_VECTOR_PARALLEL_EXECUTION]
> > +#define TARGET_AVOID_4BYTE_PREFIXES \
> > +   ix86_tune_features[X86_TUNE_AVOID_4BYTE_PREFIXES]
> >  #define TARGET_FUSE_CMP_AND_BRANCH_32 \
> > ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_32]
> >  #define TARGET_FUSE_CMP_AND_BRANCH_64 \
> > --- gcc/config/i386/constraints.md.jj   2016-05-12 10:29:41.0 +0200
> > +++ gcc/config/i386/constraints.md  2016-05-24 15:14:21.647914550 +0200
> > @@ -142,7 +142,7 @@ (define_register_constraint "Yf"
> >   "@internal Any x87 register when 80387 FP arithmetic is enabled.")
> >
> >  (define_register_constraint "Yr"
> > - "TARGET_SSE ? (X86_TUNE_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : 
> > ALL_SSE_REGS) : NO_REGS"
> > + "TARGET_SSE ? (TARGET_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : 
> > ALL_SSE_REGS) : NO_REGS"
> >   "@internal Lower SSE register when avoiding REX prefix and all SSE 
> > registers otherwise.")
> >
> >  (define_register_constraint "Yv"

Jakub


Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread H.J. Lu
On Tue, May 24, 2016 at 11:44 AM, Uros Bizjak  wrote:
> On Tue, May 24, 2016 at 8:15 PM, Uros Bizjak  wrote:
>> On Tue, May 24, 2016 at 7:18 PM, H.J. Lu  wrote:
>>
 Oh, target_flags is only a 32bit integer :(. Is there a reason it
 can't be extended to HOST_WIDE_INT, as is the case with
 ix86_isa_flags?
>>>
>>> target_flags is generic, not target specific.  I want to limit my
>>> change to x86 backend and -mgeneral-regs-only doesn't need
>>> to use target_flags.
>>
>> I have thrown together a quick patch that defines target_flags as 
>> HOST_WIDE_INT.
>>
>> (Patch still needs a small correction, so opth-gen.awk will emit
>> HOST_WIDE_INT_1 for MASK_* defines, have to go now, but I was able to
>> compile functional x86_64-apple-darwin15.5.0 crosscompiler.)
>
> And here is attached complete (but untested!!) patch that should "just
> work"(TM).
>

-mgeneral-regs-only doesn't need to use target_flags and it shouldn't
use target_flags.


-- 
H.J.


Re: [patch,openacc] use firstprivate pointers for subarrays in c and c++

2016-05-24 Thread Cesar Philippidis
On 05/23/2016 11:09 PM, Jakub Jelinek wrote:
> On Mon, May 23, 2016 at 07:31:53PM -0700, Cesar Philippidis wrote:
>> @@ -12559,7 +12560,7 @@ c_finish_omp_clauses (tree clauses, enum 
>> c_omp_region_type ort)
>>t = OMP_CLAUSE_DECL (c);
>>if (TREE_CODE (t) == TREE_LIST)
>>  {
>> -  if (handle_omp_array_sections (c, ort & C_ORT_OMP))
>> +  if (handle_omp_array_sections (c, ort & (C_ORT_OMP | C_ORT_ACC)))
>>  {
>>remove = true;
>>break;
> 
> You haven't touched the /c/ handle_omp_array_sections{,_1}.  As I said, I 
> believe
> you can just drop the is_omp argument altogether (unlike C++), or, pass for
> consistency ort itself there as well.  But I bet the argument will be
> unused.

OK, I removed is_omp. I only had to guard one call to
handle_omp_array_sections from c_finish_omp_clauses because OpenACC
doesn't support array reductions.

Is this OK for trunk?

Cesar

2016-05-24  Cesar Philippidis  

	gcc/c
	* c-parser.c (c_parser_oacc_declare): Add support for
	GOMP_MAP_FIRSTPRIVATE_POINTER.
	* c-typeck.c (handle_omp_array_sections_1): Remove is_omp argument.
	(handle_omp_array_sections): Likewise.
	(c_finish_omp_clauses): Add specific errors and warning messages for
	OpenACC.  Use firstprivate pointers for OpenACC subarrays.  Update
	calls to handle_omp_array_sections.

	gcc/cp/
	* parser.c (cp_parser_oacc_declare): Add support for
	GOMP_MAP_FIRSTPRIVATE_POINTER.
	* semantics.c (handle_omp_array_sections_1): Replace bool is_omp
	argument with enum c_omp_region_type ort.  Don't privatize OpenACC
	non-static members.
	(handle_omp_array_sections): Replace bool is_omp argument with enum
	c_omp_region_type ort.  Update call to handle_omp_array_sections_1.
	(finish_omp_clauses): Add specific errors and warning messages for
	OpenACC.  Use firstprivate pointers for OpenACC subarrays.  Update
	call to handle_omp_array_sections.

	gcc/
	* gimplify.c (omp_notice_variable): Use zero-length arrays for data
	pointers inside OACC_DATA regions.
	(gimplify_scan_omp_clauses): Prune firstprivate clause associated
	with OACC_DATA, OACC_ENTER_DATA and OACC_EXIT data regions.
	(gimplify_adjust_omp_clauses): Fix typo in comment.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Adjust test.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-3.c: Likewise.
	* c-c++-common/goacc/kernels-alias-4.c: Likewise.
	* c-c++-common/goacc/kernels-alias-5.c: Likewise.
	* c-c++-common/goacc/kernels-alias-8.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/pcopy.c: Likewise.
	* c-c++-common/goacc/pcopyin.c: Likewise.
	* c-c++-common/goacc/pcopyout.c: Likewise.
	* c-c++-common/goacc/pcreate.c: Likewise.
	* c-c++-common/goacc/pr70688.c: New test.
	* c-c++-common/goacc/present-1.c: Adjust test.
	* c-c++-common/goacc/reduction-5.c: Likewise.
	* g++.dg/goacc/data-1.C: New test.

	libgomp/
	* oacc-mem.c (acc_malloc): Update handling of shared-memory targets.
	(acc_free): Likewise.
	(acc_memcpy_to_device): Likewise.
	(acc_memcpy_from_device): Likewise.
	(acc_deviceptr): Likewise.
	(acc_hostptr): Likewise.
	(acc_is_present): Likewise.
	(acc_map_data): Likewise.
	(acc_unmap_data): Likewise.
	(present_create_copy): Likewise.
	(delete_copyout): Likewise.
	(update_dev_host): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Remove xfail.
	* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/enter_exit-lib.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/lib-13.c: Adjust test so that
	it only runs on nvptx targets.
	* testsuite/libgomp.oacc-c-c++-common/lib-14.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-15.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-16.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-17.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-18.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-20.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-21.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-22.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-23.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-24.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-25.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-28.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-29.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-30.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-34.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-42.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-43.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-44.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-47.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/lib-48.c: Likewise.
	* testsuite/libgomp.oacc-c-c+

Re: [patch,openacc] use firstprivate pointers for subarrays in c and c++

2016-05-24 Thread Jakub Jelinek
On Tue, May 24, 2016 at 12:16:35PM -0700, Cesar Philippidis wrote:
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -11939,8 +11939,7 @@ c_finish_omp_cancellation_point (location_t loc, tree 
> clauses)
>  
>  static tree
>  handle_omp_array_sections_1 (tree c, tree t, vec &types,
> -  bool &maybe_zero_len, unsigned int &first_non_one,
> -  bool is_omp)
> +  bool &maybe_zero_len, unsigned int &first_non_one)
>  {
>tree ret, low_bound, length, type;
>if (TREE_CODE (t) != TREE_LIST)
> @@ -11949,7 +11948,6 @@ handle_omp_array_sections_1 (tree c, tree t, 
> vec &types,
>   return error_mark_node;
>ret = t;
>if (TREE_CODE (t) == COMPONENT_REF
> -   && is_omp

Sorry, I've missed this one.  The patch is ok if you add on top of the
current patch ort argument to c-typeck.c (handle_omp_array_sections{,_1})
and use here && ort == C_ORT_OMP like in the C++ FE.

Jakub


Re: [PATCH] c++/71147 - [6 Regression] Flexible array member wrongly rejected in template

2016-05-24 Thread Martin Sebor

On 05/24/2016 12:51 PM, Jason Merrill wrote:

On 05/24/2016 12:15 PM, Martin Sebor wrote:

+  else if (TREE_CODE (type) == ARRAY_TYPE /* && TYPE_DOMAIN (type) */)


Why is this commented out rather than removed in this version of the
patch?  Let's remove it, as before.  OK with that change.


It was commented out by accident.

Since c++/71147 is a regression, should I also backport the patch
to the 6.x branch?

Martin



Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread H.J. Lu
On Tue, May 24, 2016 at 12:06 PM, H.J. Lu  wrote:
> On Tue, May 24, 2016 at 11:44 AM, Uros Bizjak  wrote:
>> On Tue, May 24, 2016 at 8:15 PM, Uros Bizjak  wrote:
>>> On Tue, May 24, 2016 at 7:18 PM, H.J. Lu  wrote:
>>>
> Oh, target_flags is only a 32bit integer :(. Is there a reason it
> can't be extended to HOST_WIDE_INT, as is the case with
> ix86_isa_flags?

 target_flags is generic, not target specific.  I want to limit my
 change to x86 backend and -mgeneral-regs-only doesn't need
 to use target_flags.
>>>
>>> I have thrown together a quick patch that defines target_flags as 
>>> HOST_WIDE_INT.
>>>
>>> (Patch still needs a small correction, so opth-gen.awk will emit
>>> HOST_WIDE_INT_1 for MASK_* defines, have to go now, but I was able to
>>> compile functional x86_64-apple-darwin15.5.0 crosscompiler.)
>>
>> And here is attached complete (but untested!!) patch that should "just
>> work"(TM).
>>
>
> -mgeneral-regs-only doesn't need to use target_flags and it shouldn't
> use target_flags.
>

Using target_flags won't hurt -mgeneral-regs-only.  I have no problem
with it.

-- 
H.J.


Re: C++ PATCH for c++/70584 (parenthesized argument to x86 builtin)

2016-05-24 Thread Jason Merrill

On 05/23/2016 02:58 PM, Jason Merrill wrote:

The C++14 decltype(auto) obfuscation was confusing the x86 builtin; it's
a simple matter to undo it during delayed folding, thanks to the
maybe_undo_parenthesized_ref function that Patrick recently introduced.


But using cp_fold_maybe_rvalue here is wrong, as it will mean 
unconditionally replacing a variable with its initializer.  Better to 
use plain cp_fold and improve cp_fold_maybe_rvalue to handle getting a 
decl back from cp_fold.


Tested x86_64-pc-linux-gnu, applying to trunk.




Re: New hashtable power 2 rehash policy

2016-05-24 Thread François Dumont

Attached patch applied then.

I had to reorganize things a little now that some pieces have been
integrated into the 71181 patch.


2016-05-24  François Dumont  

* include/bits/c++config (_GLIBCXX14_USE_CONSTEXPR): New.
* include/bits/hashtable_policy.h
(_Prime_rehash_policy::__has_load_factor): New. Mark rehash policy
having load factor management.
(_Mask_range_hashing): New.
(__clp2): New.
(_Power2_rehash_policy): New.
(_Inserts<>): Remove last template parameter, _Unique_keys, so that
partial specializations only depend on whether iterators are constant
or not.
* testsuite/23_containers/unordered_set/hash_policy/26132.cc: Adapt to
test new hash policy.
* testsuite/23_containers/unordered_set/hash_policy/load_factor.cc:
Likewise.
* testsuite/23_containers/unordered_set/hash_policy/rehash.cc:
Likewise.
* testsuite/23_containers/unordered_set/insert/hash_policy.cc:
Likewise.
* testsuite/23_containers/unordered_set/max_load_factor/robustness.cc:
Likewise.
* testsuite/23_containers/unordered_set/hash_policy/power2_rehash.cc:
New.
* testsuite/performance/23_containers/insert/54075.cc: Add benchmark
using the new hash policy.
* testsuite/performance/23_containers/insert_erase/41975.cc: Likewise.

François

On 23/05/2016 13:31, Jonathan Wakely wrote:

On 17/05/16 22:28 +0200, François Dumont wrote:

On 14/05/2016 19:06, Daniel Krügler wrote:

1) The function __clp2 is declared using _GLIBCXX14_CONSTEXPR, which
means that it is an inline function if and *only* if
_GLIBCXX14_CONSTEXPR really expands to constexpr, otherwise it is
*not* inline, which is probably not intended and could easily cause
ODR problems. I suggest to mark it unconditionally as inline,
regardless of _GLIBCXX14_CONSTEXPR.


Maybe _GLIBCXX14_CONSTEXPR should expand to inline prior to C++14
mode.


That's probably a good idea.


For the moment I simply added the inline as done in other situations.


OK, thanks.



2) Furthermore I suggest to declare __clp2 as noexcept - this is
(intentionally) *not* implied by constexpr.

3) Is there any reason, why _Power2_rehash_policy::_M_next_bkt
shouldn't be noexcept?

4) Similar to (3) for _Power2_rehash_policy's member functions
_M_bkt_for_elements, _M_need_rehash, _M_state, _M_reset
For noexcept I thought we were only adding it where necessary. We might
have to go through a lot of code to find all the places where noexcept
could be added. Jonathan will give his feedback.


I'm in favour of adding it anywhere that definitely can't throw.
We don't *need* to do that everywhere, but it doesn't hurt.


For the moment I have added it to all those methods.


Great.


Thanks for feedback, updated and tested patch attached.


OK for trunk - thanks!




Index: include/bits/c++config
===
--- include/bits/c++config	(revision 236662)
+++ include/bits/c++config	(working copy)
@@ -106,8 +106,10 @@
 #ifndef _GLIBCXX14_CONSTEXPR
 # if __cplusplus >= 201402L
 #  define _GLIBCXX14_CONSTEXPR constexpr
+#  define _GLIBCXX14_USE_CONSTEXPR constexpr
 # else
 #  define _GLIBCXX14_CONSTEXPR
+#  define _GLIBCXX14_USE_CONSTEXPR const
 # endif
 #endif
 
Index: include/bits/hashtable_policy.h
===
--- include/bits/hashtable_policy.h	(revision 236662)
+++ include/bits/hashtable_policy.h	(working copy)
@@ -31,6 +31,8 @@
 #ifndef _HASHTABLE_POLICY_H
 #define _HASHTABLE_POLICY_H 1
 
+#include  // for std::min.
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -457,6 +459,8 @@
   /// smallest prime that keeps the load factor small enough.
   struct _Prime_rehash_policy
   {
+using __has_load_factor = std::true_type;
+
 _Prime_rehash_policy(float __z = 1.0) noexcept
 : _M_max_load_factor(__z), _M_next_resize(0) { }
 
@@ -501,6 +505,135 @@
 mutable std::size_t	_M_next_resize;
   };
 
+  /// Range hashing function assuming that second arg is a power of 2.
+  struct _Mask_range_hashing
+  {
+typedef std::size_t first_argument_type;
+typedef std::size_t second_argument_type;
+typedef std::size_t result_type;
+
+result_type
+operator()(first_argument_type __num,
+	   second_argument_type __den) const noexcept
+{ return __num & (__den - 1); }
+  };
+
+  /// Compute closest power of 2.
+  _GLIBCXX14_CONSTEXPR
+  inline std::size_t
+  __clp2(std::size_t n) noexcept
+  {
+#if __SIZEOF_SIZE_T__ >= 8
+std::uint_fast64_t x = n;
+#else
+std::uint_fast32_t x = n;
+#endif
+// Algorithm from Hacker's Delight, Figure 3-3.
+x = x - 1;
+x = x | (x >> 1);
+x = x | (x >> 2);
+x = x | (x >> 4);
+x = x | (x >> 8);
+x = x | (x >>16);
+#if __SIZEOF_SIZE_T__ >= 8
+x = x | (x >>32);
+#endif
+return x + 1;
+  }
+
+  /// Rehash policy providing power of 2 bucket numbers. Avoids modulo
+  /// operations.
+  struct _Powe

[PATCH], Add PowerPC ISA 3.0 vector count trailing zeros and vector parity support

2016-05-24 Thread Michael Meissner
This patch adds support for two sets of new instructions in ISA 3.0, vector
count trailing zeros, and vector parity.  In addition, it defines many of the
support macros that will be used by other built-in functions that will be added
shortly.

I have bootstrapped this and there were no regressions.  Is it ok to apply to
the trunk?  Assuming it is ok to apply to the trunk, is it ok to back port to
the GCC 6.2 branch?

[gcc]
2016-05-24  Michael Meissner  

* config/rs6000/altivec.md (VParity): New mode iterator for vector
parity built-in functions.
(p9v_ctz2): Add support for ISA 3.0 vector count trailing
zeros.
(p9v_parity2): Likewise.
* config/rs6000/vector.md (VEC_IP): New mode iterator for vector
parity.
(ctz2): ISA 3.0 expander for vector count trailing zeros.
(parity2): ISA 3.0 expander for vector parity.
* config/rs6000/rs6000-builtin.def (BU_P9_MISC_1): New macros for
power9 built-ins.
(BU_P9_64BIT_MISC_0): Likewise.
(BU_P9_MISC_0): Likewise.
(BU_P9V_AV_1): Likewise.
(BU_P9V_AV_2): Likewise.
(BU_P9V_AV_3): Likewise.
(BU_P9V_AV_P): Likewise.
(BU_P9V_VSX_1): Likewise.
(BU_P9V_OVERLOAD_1): Likewise.
(BU_P9V_OVERLOAD_2): Likewise.
(BU_P9V_OVERLOAD_3): Likewise.
(VCTZB): Add vector count trailing zeros support.
(VCTZH): Likewise.
(VCTZW): Likewise.
(VCTZD): Likewise.
(VPRTYBD): Add vector parity support.
(VPRTYBQ): Likewise.
(VPRTYBW): Likewise.
(VCTZ): Add overloaded vector count trailing zeros support.
(VPRTYB): Add overloaded vector parity support.
* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
overloaded vector count trailing zeros and parity instructions.
* config/rs6000/rs6000.md (wd mode attribute): Add V1TI and TI for
vector parity support.
* config/rs6000/altivec.h (vec_vctz): Add ISA 3.0 vector count
trailing zeros support.
(vec_cntlz): Likewise.
(vec_vctzb): Likewise.
(vec_vctzd): Likewise.
(vec_vctzh): Likewise.
(vec_vctzw): Likewise.
(vec_vprtyb): Add ISA 3.0 vector parity support.
(vec_vprtybd): Likewise.
(vec_vprtybw): Likewise.
(vec_vprtybq): Likewise.
* doc/extend.texi (PowerPC AltiVec Built-in Functions): Document
the ISA 3.0 vector count trailing zeros and vector parity built-in
functions.

[gcc/testsuite]
2016-05-24  Michael Meissner  

* gcc.target/powerpc/p9-vparity.c: New file to check SIA 3.0
vector parity built-in functions.
* gcc.target/powerpc/ctz-3.c: New file to check ISA 3.0 vector
count trailing zeros automatic vectorization.
* gcc.target/powerpc/ctz-4.c: New file to check ISA 3.0 vector
count trailing zeros built-in functions.



-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 236663)
+++ gcc/config/rs6000/altivec.md(.../gcc/config/rs6000) (working copy)
@@ -193,6 +193,13 @@ (define_mode_iterator VM2 [V4SI
   (KF "FLOAT128_VECTOR_P (KFmode)")
   (TF "FLOAT128_VECTOR_P (TFmode)")])
 
+;; Specific iterator for parity which does not have a byte/half-word form, but
+;; does have a quad word form
+(define_mode_iterator VParity [V4SI
+  V2DI
+  V1TI
+  (TI "TARGET_VSX_TIMODE")])
+
 (define_mode_attr VI_char [(V2DI "d") (V4SI "w") (V8HI "h") (V16QI "b")])
 (define_mode_attr VI_scalar [(V2DI "DI") (V4SI "SI") (V8HI "HI") (V16QI "QI")])
 (define_mode_attr VI_unit [(V16QI "VECTOR_UNIT_ALTIVEC_P (V16QImode)")
@@ -3415,7 +3422,7 @@ (define_expand "vec_unpacku_float_lo_v8h
 }")
 
 
-;; Power8 vector instructions encoded as Altivec instructions
+;; Power8/power9 vector instructions encoded as Altivec instructions
 
 ;; Vector count leading zeros
 (define_insn "*p8v_clz2"
@@ -3426,6 +3433,15 @@ (define_insn "*p8v_clz2"
   [(set_attr "length" "4")
(set_attr "type" "vecsimple")])
 
+;; Vector count trailing zeros
+(define_insn "*p9v_ctz2"
+  [(set (match_operand:VI2 0 "register_operand" "=v")
+   (ctz:VI2 (match_operand:VI2 1 "register_operand" "v")))]
+  "TARGET_P9_VECTOR"
+  "vctz %0,%1"
+  [(set_attr "length" "4")
+   (set_attr "type" "vecsimple")])
+
 ;; Vector population count
 (define_insn "*p8v_popcount2"
   [(set (match_operand:VI2 0 "register_operand" "=v")
@@ -3435,6 +3451,15 @@ (define_insn "*p8v_popcount2"
   [(set_attr "length" "4")
(set_attr "type" "vecsimple")])
 

Re: [PATCH] c++/71147 - [6 Regression] Flexible array member wrongly rejected in template

2016-05-24 Thread Jason Merrill

On 05/24/2016 04:43 PM, Martin Sebor wrote:

On 05/24/2016 12:51 PM, Jason Merrill wrote:

On 05/24/2016 12:15 PM, Martin Sebor wrote:

+  else if (TREE_CODE (type) == ARRAY_TYPE /* && TYPE_DOMAIN (type) */)


Why is this commented out rather than removed in this version of the
patch?  Let's remove it, as before.  OK with that change.


It was commented out by accident.

Since c++/71147 is a regression, should I also backport the patch
to the 6.x branch?


OK.

Jason




Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack)

2016-05-24 Thread Alexander Monakov
On Fri, 20 May 2016, Nathan Sidwell wrote:
> ah,  that's much more understandable,  thanks.  Presumably this doesn't
> support worker-single mode (in OpenACC parlance, I don't know what the OpenMP
> version of that is?)

I don't see why you have concerns.  In OpenMP, what OpenACC calls
'worker-single mode' should correspond to execution of a sequential region
(outside of any 'parallel' region). The region is executed by the initial
thread (warp 0), while other warps, having formed a thread pool, are suspended
on that thread pool's barrier.  When the initial thread reaches the parallel
region, it unblocks the warps in the pool.  The other warps may need data that
is allocated on warp 0's stack, so here it's essential that soft-stacks can
live in global memory and thus be world-readable.

> And neither would it support calls  from vector-partitioned code (I think
> that's SIMD in OpenMP-land?).

Actually it would: the plan is to switch soft-stack pointer to a region of
.local memory when entering OpenMP SIMD region.  This makes soft-stacks use
lane-private storage inside of SIMD regions (but then it's, of course, no
longer world-readable and not modifiable by atomics).

> It seems like we should reject the combination of -msoft-stack -fopenacc?

Possibly; the doc text makes it explicit that the option is exposed only for
the purpose of testing the compiler, anyway.

> why so many changelogs?  The on-branch development history is irrelevant for
> trunk -- the usual single changelog style should be followed.

OK, if branch history is not interesting for review, I can squash it; I'll
have to do that for the final commit anyway.

> > +  else if (need_frameptr || cfun->machine->has_varadic ||
> > cfun->calls_alloca)
> > +{
> > +  /* Maintain 64-bit stack alignment.  */
> 
> This block needs a more descriptive comment -- it appears to be doing a great
> deal more than maintaining 64-bit stack alignment!

The comment is just for the line that follows, not the whole block.

> > +  int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
> > +  sz = ROUND_UP (sz, keep_align);
> > +  int bits = POINTER_SIZE;
> > +  fprintf (file, "\t.reg.u%d %%frame;\n", bits);
> > +  fprintf (file, "\t.reg.u32 %%fstmp0;\n");
> > +  fprintf (file, "\t.reg.u%d %%fstmp1;\n", bits);
> > +  fprintf (file, "\t.reg.u%d %%fstmp2;\n", bits);
> 
> Some of these register names appear to be long lived -- and referenced in
> other functions.  It would be better to give those more descriptive names, or
> even give them hard-regs.

That's just %fstmp2 (pointer into __nvptx_stacks) and %fstmp1 (previous stack
pointer that we need to restore). I can rename them to %ssloc and %ssold
(better names welcome), but I don't see a value in making them hard-regs --
there's no interface with the middle-end that would be interested in those.

> You should  certainly  do so for those that are already hard regs (%frame &
> %stack)

Sorry, do what? They are already hard regs, and have descriptive names.

> -- is it more feasible to augment init_frame to initialize them?

I don't think so. The whole block could be moved to a separate function though.

>   Since ptx is a virtual target, we just define a few
> > hard registers for special purposes and leave pseudos unallocated.
> > @@ -200,6 +205,7 @@ struct GTY(()) machine_function
> >bool is_varadic;  /* This call is varadic  */
> >bool has_varadic;  /* Current function has a varadic call.  */
> >bool has_chain; /* Current function has outgoing static chain.  */
> > +  bool using_softstack; /* Need to restore __nvptx_stacks[tid.y].  */
> 
> Comment should describe what the attribute is, not what it implies.  In this
> case I think it's /*  Current function has   a soft stack frame.  */

Yes; note it's false when current function is leaf, so the description should
be more like "Current function has a soft stack frame that needs restoring".

> > diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> > index 33a4862..e5650b6 100644
> > --- a/gcc/config/nvptx/nvptx.md
> > +++ b/gcc/config/nvptx/nvptx.md
> 
> 
> > +(define_insn "set_softstack_insn"
> > +  [(unspec [(match_operand 0 "nvptx_register_operand" "R")] UNSPEC_ALLOCA)]
> > +  "TARGET_SOFT_STACK"
> > +{
> > +  return (cfun->machine->using_softstack
> > + ? "%.\\tst.shared%t0\\t[%%fstmp2], %0;"
> > + : "");
> > +})
> 
> Is this alloca related (UNSPEC_ALLOCA) or restore related (invoked in
> restore_stack_block), or stack setting (as insn name suggests).  Things seem
> inconsistently named.  Comments would be good.

OK, I'll add some in a respin. This is related to stack setting. I can add a
new UNSPEC for that (UNSPEC_SET_SOFTSTACK).

> >
> >  (define_expand "restore_stack_block"
> >[(match_operand 0 "register_operand" "")
> >(match_operand 1 "register_operand" "")]
> >""
> > {
> > +  if (TARGET_SOFT_STACK)
> > +{
> > +  emit_move_insn (operands[0], operands[1]);
> > +  emi

Re: [PATCH], Add PowerPC ISA 3.0 vector count trailing zeros and vector parity support

2016-05-24 Thread Segher Boessenkool
On Tue, May 24, 2016 at 05:05:14PM -0400, Michael Meissner wrote:
> This patch adds support for two sets of new instructions in ISA 3.0, vector
> count trailing zeros, and vector parity.  In addition, it defines many of the
> support macros that will be used by other built-in functions that will be 
> added
> shortly.
> 
> I have bootstrapped this and there were no regressions.  Is it ok to apply to
> the trunk?  Assuming it is ok to apply to the trunk, is it ok to back port to
> the GCC 6.2 branch?

Okay for trunk.  Okay for 6 after a week or so.  A few typos...

> [gcc/testsuite]
> 2016-05-24  Michael Meissner  
> 
>   * gcc.target/powerpc/p9-vparity.c: New file to check SIA 3.0
>   vector parity built-in functions.

Typo (ISA).

> +/* Miscellaneous builtins for instructions added in ISA 3.0.  These
> +   instructions don't require either the DFP or VSX options, just the basic 

Trailing space (multiple times).

> +If the ISA 3.00 additions to the vector/scalar (power9-vector)
> +instruction set are available:

3.0 (multiple times).

Thanks,


Segher


Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-24 Thread Joseph Myers
On Tue, 24 May 2016, Uros Bizjak wrote:

> > I have thrown together a quick patch that defines target_flags as 
> > HOST_WIDE_INT.
> >
> > (Patch still needs a small correction, so opth-gen.awk will emit
> > HOST_WIDE_INT_1 for MASK_* defines, have to go now, but I was able to
> > compile functional x86_64-apple-darwin15.5.0 crosscompiler.)
> 
> And here is attached complete (but untested!!) patch that should "just
> work"(TM).

Have you made sure that cl_host_wide_int gets set for options in 
target_flags, so that get_option_state, option_enabled etc. work correctly 
with such options?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH], Add support for PowerPC ISA 3.0 VNEGD/VNEGW instructions

2016-05-24 Thread Segher Boessenkool
On Wed, May 18, 2016 at 02:30:31PM -0400, Michael Meissner wrote:
> Unlike some of my patches, this is a fairly simple patch to add support for 
> the
> VNEGW and VNEGD instructions that were added in ISA 3.0.  Note, ISA 3.0 does
> not provide negation for V16QImode/V8HImode, just V4SImode/V2DImode.
> 
> I discovered that when we added ISA 2.07 support for V2DImode, we didn't
> provide an expander for negv2di2, which I added with this patch.
> 
> [gcc]
> 2016-05-18  Michael Meissner  
> 
>   * config/rs6000/altivec.md (VNEG iterator): New iterator for
>   VNEGW/VNEGD instructions.
>   (p9_neg2): New insns for ISA 3.0 VNEGW/VNEGD.
>   (neg2): Add expander for V2DImode added in ISA 2.06, and
>   support for ISA 3.0 VNEGW/VNEGD instructions.
> 
> [gcc/testsuite]
> 2016-05-18  Michael Meissner  
> 
>   * gcc.target/powerpc/p9-vneg.c: New test for ISA 3.0 VNEGW/VNEGD
>   instructions.

I forgot to review this patch, sorry.

> Index: gcc/config/rs6000/altivec.md
> ===
> --- gcc/config/rs6000/altivec.md  
> (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
> (revision 236398)
> +++ gcc/config/rs6000/altivec.md  (.../gcc/config/rs6000) (working copy)
> @@ -203,6 +203,9 @@ (define_mode_attr VP_small [(V2DI "V4SI"
>  (define_mode_attr VP_small_lc [(V2DI "v4si") (V4SI "v8hi") (V8HI "v16qi")])
>  (define_mode_attr VU_char [(V2DI "w") (V4SI "h") (V8HI "b")])
>  
> n+;; Vector negate
> +(define_mode_iterator VNEG [V4SI V2DI])

Your patch is mangled here, but you'll find out (it won't apply this way).

>  (define_expand "neg2"
> -  [(use (match_operand:VI 0 "register_operand" ""))
> -   (use (match_operand:VI 1 "register_operand" ""))]
> -  "TARGET_ALTIVEC"
> +  [(set (match_operand:VI2 0 "register_operand" "")
> + (neg:VI2 (match_operand:VI2 1 "register_operand" "")))]
> +  ""
>"
>  {
> -  rtx vzero;
> +  if (!TARGET_P9_VECTOR || (mode != V4SImode && mode != 
> V2DImode))
> +{
> +  rtx vzero;
>  
> -  vzero = gen_reg_rtx (GET_MODE (operands[0]));
> -  emit_insn (gen_altivec_vspltis (vzero, const0_rtx));
> -  emit_insn (gen_sub3 (operands[0], vzero, operands[1])); 
> -  
> -  DONE;
> +  vzero = gen_reg_rtx (GET_MODE (operands[0]));
> +  emit_move_insn (vzero, CONST0_RTX (mode));
> +  emit_insn (gen_sub3 (operands[0], vzero, operands[1])); 
> +  DONE;
> +}
>  }")

Please remove the quotes around the C block as well, while you're here?
And a trailing space.

Okay for trunk, okay for 6 after a week or so.

Thanks,


Segher


[PATCH, rs6000 testsuite] PR71050, Fix lhs-1.c testcase

2016-05-24 Thread Pat Haugen
The following simplifies the given testcase so it is no longer sensitive to 
subreg (and hopefully other) codegen changes. Tested on powerpc64, ok for trunk?

-Pat


testsuite/ChangeLog:
2016-05-24  Pat Haugen  

PR target/71050
* gcc.target/powerpc/lhs-1.c: Fix testcase to avoid subreg changes.


Index: gcc/testsuite/gcc.target/powerpc/lhs-1.c
===
--- gcc/testsuite/gcc.target/powerpc/lhs-1.c(revision 236325)
+++ gcc/testsuite/gcc.target/powerpc/lhs-1.c(working copy)
@@ -4,19 +4,12 @@
 /* { dg-options "-O2 -mcpu=power5" } */
 /* { dg-final { scan-assembler-times "nop" 3 } } */
 
-/* Test generation of nops in load hit store situation.  */
+/* Test generation of nops in load hit store situation. Make sure enough nop 
insns are
+   generated to move the load to a new dispatch group. With the simple stw/lwz 
pair below,
+   that would be 3 nop insns for Power5.  */
 
-typedef union {
-  double val;
-  struct {
-unsigned int w1;
-unsigned int w2;
-  };
-} words;
-
-unsigned int f (double d, words *u)
+unsigned int f (volatile unsigned int *u, unsigned int u2)
 {
-  u->val = d;
-  return u->w2;
+  *u = u2;
+  return *u;
 }
-



Re: [PATCH, rs6000 testsuite] PR71050, Fix lhs-1.c testcase

2016-05-24 Thread Segher Boessenkool
On Tue, May 24, 2016 at 04:55:45PM -0500, Pat Haugen wrote:
> The following simplifies the given testcase so it is no longer sensitive to 
> subreg (and hopefully other) codegen changes. Tested on powerpc64, ok for 
> trunk?
> 
> -Pat
> 
> 
> testsuite/ChangeLog:
> 2016-05-24  Pat Haugen  
> 
> PR target/71050
> * gcc.target/powerpc/lhs-1.c: Fix testcase to avoid subreg changes.

It is okay for trunk.  One thing (well, two)...

> Index: gcc/testsuite/gcc.target/powerpc/lhs-1.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/lhs-1.c  (revision 236325)
> +++ gcc/testsuite/gcc.target/powerpc/lhs-1.c  (working copy)
> @@ -4,19 +4,12 @@
>  /* { dg-options "-O2 -mcpu=power5" } */
>  /* { dg-final { scan-assembler-times "nop" 3 } } */
>  
> -/* Test generation of nops in load hit store situation.  */
> +/* Test generation of nops in load hit store situation. Make sure enough nop 
> insns are
> +   generated to move the load to a new dispatch group. With the simple 
> stw/lwz pair below,
> +   that would be 3 nop insns for Power5.  */

Long lines; dot space space.

Thanks,


Segher


Re: C++ PATCH for c++/70735 (static locals and generic lambdas)

2016-05-24 Thread Paolo Carlini

Hi,

On 23/05/2016 21:01, Jason Merrill wrote:

+// PR c++/70735
+// { dg-do run { target c++1y } }
+

[...]

@@ -0,0 +1,19 @@
+// PR c++/70735
+// { dg-do run { target c++1y } }

I'm changing these c++1y to c++14.

Paolo.


Re: C++ PATCH for c++/70735 (static locals and generic lambdas)

2016-05-24 Thread Mike Stump
On May 24, 2016, at 3:35 PM, Paolo Carlini  wrote:
> On 23/05/2016 21:01, Jason Merrill wrote:
>> +// PR c++/70735
>> +// { dg-do run { target c++1y } }
>> +
> [...]
>> @@ -0,0 +1,19 @@
>> +// PR c++/70735
>> +// { dg-do run { target c++1y } }
> I'm changing these c++1y to c++14.

Thanks.  :-)  

I think:

  g++.dg/pr65295.C

can be updated to use c++14 as well.  It is the last one that needs updating.


Re: More backwards/FSM jump thread refactoring and extension

2016-05-24 Thread Trevor Saunders
On Tue, May 24, 2016 at 10:58:18AM -0600, Jeff Law wrote:
> --- a/gcc/tree-ssa-threadbackward.c
> +++ b/gcc/tree-ssa-threadbackward.c
> @@ -356,6 +356,44 @@ profitable_jump_thread_path (vec 
> *&path,
>return taken_edge;
>  }
>  
> +/* PATH is vector of blocks forming a jump threading path in reverse
> +   order.  TAKEN_EDGE is the edge taken from path[0].
> +
> +   Convert that path into the form used by register_jump_thread and
> +   register the path.   */
> +
> +static void
> +convert_and_register_jump_thread_path (vec *&path,

is there a reason that isn't vec * instead of
vec *&? It seems like that's just useless indirection, and
allowing this function to be able to change more than it needs.

> +edge taken_edge)
> +{
> +  vec *jump_thread_path = new vec ();

Its not new, but I'm always a little sad to see something that's only
sizeof(void *) big be malloced on its own.

Trev



Re: [PATCH][MIPS] Add -minline-intermix to ignore compression flags when inlining

2016-05-24 Thread Sandra Loosemore

On 05/24/2016 08:23 AM, Robert Suchanek wrote:



[snip]

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 73f1cb6..2f6195e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -837,6 +837,7 @@ Objective-C and Objective-C++ Dialects}.
  -mips16  -mno-mips16  -mflip-mips16 @gol
  -minterlink-compressed -mno-interlink-compressed @gol
  -minterlink-mips16  -mno-interlink-mips16 @gol
+-minline-intermix -mno-inline-intermix @gol


Funky indentation here


  -mabi=@var{abi}  -mabicalls  -mno-abicalls @gol
  -mshared  -mno-shared  -mplt  -mno-plt  -mxgot  -mno-xgot @gol
  -mgp32  -mgp64  -mfp32  -mfpxx  -mfp64  -mhard-float  -msoft-float @gol
@@ -17916,6 +17917,18 @@ Aliases of @option{-minterlink-compressed} and
  @option{-mno-interlink-compressed}.  These options predate the microMIPS ASE
  and are retained for backwards compatibility.

+@item -minline-intermix
+@itemx -mno-inline-intermix
+@opindex minline-intermix
+@opindex mno-inline-intermix
+Enable inlining of functions which have opposing compression flags e.g.
+@code{mips16}/@code{nomips16} attributes.
+This is useful when using the @code{mips16} attribute to balance code size
+and performance so that a function will be compressed when not inlined or
+vice-versa.  When using this option it is necessary to protect functions
+that cannot be compiled as MIPS16 with a @code{noinline} attribute to ensure
+they are not inlined into a MIPS16 function.


This flag applies to microMIPS inlining, too, right?  It's confusing to 
only mention MIPS16.


Maybe you could say something like this instead:

Allow inlining even if the compression flags differ between caller and 
callee.  This is useful in conjunction with the @code{mips16}, 
@code{micromips}, or @code{nocompression} function attributes.  The code 
for the inlined function is compiled using the compression flags for the 
callee, so you may need to use the @code{noinline} attribute on 
functions that must be compiled with particular compression settings.


-Sandra



Re: [PATCH][MIPS] Add support for code_readable function attribute

2016-05-24 Thread Sandra Loosemore

On 05/24/2016 08:25 AM, Robert Suchanek wrote:

[snip]

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e4d6c1c..dd23c70 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -4441,6 +4441,23 @@ On MIPS targets, you can use the @code{nocompression} 
function attribute
  to locally turn off MIPS16 and microMIPS code generation.  This attribute
  overrides the @option{-mips16} and @option{-mmicromips} options on the
  command line (@pxref{MIPS Options}).
+
+@item code_readable
+@cindex @code{code_readable} function attribute, MIPS
+For MIPS targets that support PC-relative addressing modes, this attribute
+can be used to control how an object is addressed.  The attribute takes
+a single optional argument:


The problem here is that we don't tell users that the argument has to be 
a string constant in quotes, and not just a token.


How about changing the above text to end with:

"...a single optional argument, which must be one of the following 
string constants:"


and then changing this to be @table @code and quoting the @item strings:


+
+@table @samp
+@item no
+The function should not read the instruction stream as data.
+@item yes
+The function can read the instruction stream as data.
+@item pcrel
+The function can read the instruction stream in a pc-relative mode.
+@end table
+


Then it'll be consistent with this:


+If there is no argument supplied, the default of @code{"yes"} applies.
  @end table

  @node MSP430 Function Attributes


-Sandra



Re: More backwards/FSM jump thread refactoring and extension

2016-05-24 Thread Jeff Law

On 05/24/2016 06:03 PM, Trevor Saunders wrote:

On Tue, May 24, 2016 at 10:58:18AM -0600, Jeff Law wrote:

--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -356,6 +356,44 @@ profitable_jump_thread_path (vec 
*&path,
   return taken_edge;
 }

+/* PATH is vector of blocks forming a jump threading path in reverse
+   order.  TAKEN_EDGE is the edge taken from path[0].
+
+   Convert that path into the form used by register_jump_thread and
+   register the path.   */
+
+static void
+convert_and_register_jump_thread_path (vec *&path,


is there a reason that isn't vec * instead of
vec *&? It seems like that's just useless indirection, and
allowing this function to be able to change more than it needs.
I didn't try to clean up anything of that nature.  It's a good follow-up 
item though.  Thanks for pointing it out.





+  edge taken_edge)
+{
+  vec *jump_thread_path = new vec ();


It's not new, but I'm always a little sad to see something that's only
sizeof(void *) big being malloced on its own.
I wouldn't be terribly surprised if the backwards/FSM threader drops the 
jump_thread_edge representation after I pull it out of the main threader 
into its own pass.


jeff


RE: [Patch V2] Fix SLP PR58135.

2016-05-24 Thread Kumar, Venkataramanan
Hi Christophe, 

> -Original Message-
> From: Christophe Lyon [mailto:christophe.l...@linaro.org]
> Sent: Tuesday, May 24, 2016 8:45 PM
> To: Kumar, Venkataramanan 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [Patch V2] Fix SLP PR58135.
> 
> Hi Venkat,
> 
> 
> On 23 May 2016 at 11:54, Kumar, Venkataramanan
>  wrote:
> > Hi Richard,
> >
> >> -Original Message-
> >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> Sent: Thursday, May 19, 2016 4:08 PM
> >> To: Kumar, Venkataramanan 
> >> Cc: gcc-patches@gcc.gnu.org
> >> Subject: Re: [Patch V2] Fix SLP PR58135.
> >>
> >> On Wed, May 18, 2016 at 5:29 PM, Kumar, Venkataramanan
> >>  wrote:
> >> > Hi Richard,
> >> >
> >> >> -Original Message-
> >> >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> >> Sent: Tuesday, May 17, 2016 5:40 PM
> >> >> To: Kumar, Venkataramanan 
> >> >> Cc: gcc-patches@gcc.gnu.org
> >> >> Subject: Re: [Patch V2] Fix SLP PR58135.
> >> >>
> >> >> On Tue, May 17, 2016 at 1:56 PM, Kumar, Venkataramanan
> >> >>  wrote:
> >> >> > Hi Richard,
> >> >> >
> >> >> > I created the patch by passing -b option to git. Now the patch
> >> >> > is more
> >> >> readable.
> >> >> >
> >> >> > As per your suggestion I tried to fix the PR by splitting the
> >> >> > SLP store group at
> >> >> vector boundary after the SLP tree is built.
> >> >> >
> >> >> > Boot strap PASSED on x86_64.
> >> >> > Checked the patch with check_GNU_style.sh.
> >> >> >
> >> >> > The gfortran.dg/pr46519-1.f test now does SLP vectorization.
> >> >> > Hence it
> >> >> generated 2 more vzeroupper.
> >> >> > As recommended I adjusted the test case by adding
> >> >> > -fno-tree-slp-vectorize
> >> >> to make it as expected after loop vectorization.
> >> >> >
> >> >> > The following tests are now passing.
> >> >> >
> >> >> > -- Snip-
> >> >> > Tests that now work, but didn't before:
> >> >> >
> >> >> > gcc.dg/vect/bb-slp-19.c -flto -ffat-lto-objects
> >> >> > scan-tree-dump-times
> >> >> > slp2 "basic block vectorized" 1
> >> >> >
> >> >> > gcc.dg/vect/bb-slp-19.c scan-tree-dump-times slp2 "basic block
> >> >> > vectorized" 1
> >> >> >
> >> >> > New tests that PASS:
> >> >> >
> >> >> > gcc.dg/vect/pr58135.c (test for excess errors)
> >> >> > gcc.dg/vect/pr58135.c -flto -ffat-lto-objects (test for excess
> >> >> > errors)
> >> >> >
> >> >> > -- Snip-
> >> >> >
> >> >> > ChangeLog
> >> >> >
> >> >> > 2016-05-14  Venkataramanan Kumar
> >> >> 
> >> >> >  PR tree-optimization/58135
> >> >> > * tree-vect-slp.c:  When group size is not multiple of vector 
> >> >> > size,
> >> >> >  allow splitting of store group at vector boundary.
> >> >> >
> >> >> > Test suite  ChangeLog
> >> >> > 2016-05-14  Venkataramanan Kumar
> >> >> 
> >> >> > * gcc.dg/vect/bb-slp-19.c:  Remove XFAIL.
> >> >> > * gcc.dg/vect/pr58135.c:  Add new.
> >> >> > * gfortran.dg/pr46519-1.f: Adjust test case.
> >> >> >
> >> >> > The attached patch Ok for trunk?
> >> >>
> >> >>
> >> >> Please avoid the excessive vertical space around the
> >> >> vect_build_slp_tree
> >> call.
> >> > Yes fixed in the attached patch.
> >> >>
> >> >> +  /* Calculate the unrolling factor.  */
> >> >> +  unrolling_factor = least_common_multiple
> >> >> + (nunits, group_size) / group_size;
> >> >> ...
> >> >> +  else
> >> >> {
> >> >>   /* Calculate the unrolling factor based on the smallest type. 
> >> >>  */
> >> >>   if (max_nunits > nunits)
> >> >> -unrolling_factor = least_common_multiple (max_nunits,
> group_size)
> >> >> -   / group_size;
> >> >> +   unrolling_factor
> >> >> +   = least_common_multiple (max_nunits,
> >> >> + group_size)/group_size;
> >> >>
> >> >> please compute the "correct" unroll factor immediately and move
> >> >> the "unrolling of BB required" error into the if() case by
> >> >> post-poning the nunits < group_size check (and use max_nunits here).
> >> >>
> >> > Yes fixed in the attached patch.
> >> >
> >> >> +  if (is_a  (vinfo)
> >> >> + && nunits < group_size
> >> >> + && unrolling_factor != 1
> >> >> + && is_a  (vinfo))
> >> >> +   {
> >> >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >> >> +  "Build SLP failed: store group "
> >> >> +  "size not a multiple of the vector size "
> >> >> +  "in basic block SLP\n");
> >> >> + /* Fatal mismatch.  */
> >> >> + matches[nunits] = false;
> >> >>
> >> >> this is too pessimistic - you want to add the extra 'false' at
> >> >> group_size / max_nunits * max_nunits.
> >> > Yes fixed in attached patch.
> >> >
> >> >>
> >> >> It looks like you leak 'node' in the if () path as well.  You need
> >> >>
> >> >>   vect_free_slp_tree (node);
> >> >>   loads.release ();
> >> >>
> >> >> thus treat it as a failure case.
> >> >

Re: [Patch] Implement is_[nothrow_]swappable (p0185r1)

2016-05-24 Thread Daniel Krügler
2016-05-23 13:50 GMT+02:00 Jonathan Wakely :
> On 17/05/16 20:39 +0200, Daniel Krügler wrote:
>>
>> This is an implementation of the Standard is_swappable traits according to
>>
>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0185r1.html
>>
>> During that work it has been found that std::array's member swap's
>> exception
>> specification for zero-size arrays was incorrectly depending on the
>> value_type
>> and that was fixed as well.
>
> This looks good to me, I'll get it committed (with some adjustment to
> the ChangeLog format) - thanks.

Unfortunately I need to withdraw the suggested patch. Besides some
obvious errors, there are issues that require me to get the testsuite
running on my Windows system, which has not yet succeeded.

I would appreciate it if anyone who has succeeded in running the test
suite on a Windows system (preferably MinGW) could contact me off-list.

Thanks,

- Daniel


-- 





Re: libgomp: In OpenACC testing, cycle though $offload_targets, and by default only build for the offload target that we're actually going to test

2016-05-24 Thread Thomas Schwinge
Hi!

Ping...

On Wed, 18 May 2016 13:41:25 +0200, I wrote:
> Ping.
> 
> On Wed, 11 May 2016 15:45:13 +0200, I wrote:
> > Ping.
> > 
> > On Mon, 02 May 2016 11:54:27 +0200, I wrote:
> > > On Fri, 29 Apr 2016 09:43:41 +0200, Jakub Jelinek  
> > > wrote:
> > > > On Thu, Apr 28, 2016 at 12:43:43PM +0200, Thomas Schwinge wrote:
> > > > > commit 3b521f3e35fdb4b320e95b5f6a82b8d89399481a
> > > > > Author: Thomas Schwinge 
> > > > > Date:   Thu Apr 21 11:36:39 2016 +0200
> > > > > 
> > > > > libgomp: Unconfuse offload plugins vs. offload targets
> > > > 
> > > > I don't like this patch at all, rather than unconfusing stuff it
> > > > makes stuff confusing.  Plugins are just a way to support various
> > > > offloading targets.
> > > 
> > > Huh; my patch exactly clarifies that the offload_targets variable does
> > > not actually list offload target names, but does list libgomp offload
> > > plugin names...
> > > 
> > > > Can you please post just a short patch without all those changes
> > > > that does what you want, rather than renaming everything at the same 
> > > > time?
> > > 
> > > I thought incremental, self-contained patches were easier to review.
> > > Anyway, here's the three patches merged into one:
> > > 
> > > commit 8060ae3474072eef685381d80f566d1c0942c603
> > > Author: Thomas Schwinge 
> > > Date:   Thu Apr 21 11:36:39 2016 +0200
> > > 
> > > libgomp: In OpenACC testing, cycle though $offload_targets, and by 
> > > default only build for the offload target that we're actually going to 
> > > test
> > > 
> > >   libgomp/
> > >   * plugin/configfrag.ac (offload_targets): Actually enumerate
> > >   offload targets, and add...
> > >   (offload_plugins): ... this one to enumerate offload plugins.
> > >   (OFFLOAD_PLUGINS): Renamed from OFFLOAD_TARGETS.
> > >   * target.c (gomp_target_init): Adjust to that.
> > >   * testsuite/lib/libgomp.exp: Likewise.
> > >   (offload_targets_s, offload_targets_s_openacc): Remove 
> > > variables.
> > >   (offload_target_to_openacc_device_type): New proc.
> > >   (check_effective_target_openacc_nvidia_accel_selected)
> > >   (check_effective_target_openacc_host_selected): Examine
> > >   $openacc_device_type instead of $offload_target_openacc.
> > >   * Makefile.in: Regenerate.
> > >   * config.h.in: Likewise.
> > >   * configure: Likewise.
> > >   * testsuite/Makefile.in: Likewise.
> > >   * testsuite/libgomp.oacc-c++/c++.exp: Cycle through
> > >   $offload_targets (plus "disable") instead of
> > >   $offload_targets_s_openacc, and add "-foffload=$offload_target" 
> > > to
> > >   tagopt.
> > >   * testsuite/libgomp.oacc-c/c.exp: Likewise.
> > >   * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
> > > ---
> > >  libgomp/Makefile.in|  1 +
> > >  libgomp/config.h.in|  4 +-
> > >  libgomp/configure  | 44 +++--
> > >  libgomp/plugin/configfrag.ac   | 39 +++-
> > >  libgomp/target.c   |  8 +--
> > >  libgomp/testsuite/Makefile.in  |  1 +
> > >  libgomp/testsuite/lib/libgomp.exp  | 72 ++
> > >  libgomp/testsuite/libgomp.oacc-c++/c++.exp | 30 +
> > >  libgomp/testsuite/libgomp.oacc-c/c.exp | 30 +
> > >  libgomp/testsuite/libgomp.oacc-fortran/fortran.exp | 22 ---
> > >  10 files changed, 142 insertions(+), 109 deletions(-)
> > > 
> > > diff --git libgomp/Makefile.in libgomp/Makefile.in
> > > [snipped]
> > > diff --git libgomp/config.h.in libgomp/config.h.in
> > > [snipped]
> > > diff --git libgomp/configure libgomp/configure
> > > [snipped]
> > > diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
> > > index 88b4156..de0a6f6 100644
> > > --- libgomp/plugin/configfrag.ac
> > > +++ libgomp/plugin/configfrag.ac
> > > @@ -26,8 +26,6 @@
> > >  # see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> > >  # .
> > >  
> > > -offload_targets=
> > > -AC_SUBST(offload_targets)
> > >  plugin_support=yes
> > >  AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
> > >  if test x"$plugin_support" = xyes; then
> > > @@ -142,7 +140,13 @@ AC_SUBST(PLUGIN_HSA_LIBS)
> > >  
> > >  
> > >  
> > > -# Get offload targets and path to install tree of offloading compiler.
> > > +# Parse offload targets, and figure out libgomp plugin, and configure the
> > > +# corresponding offload compiler.  offload_plugins and offload_targets 
> > > will be
> > > +# populated in the same order.
> > > +offload_plugins=
> > > +offload_targets=
> > > +AC_SUBST(offload_plugins)
> > > +AC_SUBST(offload_targets)
> > >  offload_additional_options=
> > >  offload_additional_lib_paths=
> > >  AC_SUBST(offload_addit

Re: [PATCH] Fix up Yr constraint

2016-05-24 Thread Uros Bizjak
On Tue, May 24, 2016 at 9:02 PM, Jakub Jelinek  wrote:
> On Tue, May 24, 2016 at 08:35:12PM +0200, Uros Bizjak wrote:
>> On Tue, May 24, 2016 at 6:55 PM, Jakub Jelinek  wrote:
>> > Hi!
>> >
>> > The Yr constraint contrary to what has been said when it has been submitted
>> > actually is always NO_REX_SSE_REGS or NO_REGS, never ALL_SSE_REGS, so
>> > the RA restriction to only the first 8 regs is done no matter what we tune
>> > for.
>> >
>> > This is because we test X86_TUNE_AVOID_4BYTE_PREFIXES, which is an enum
> >> > value (59), rather than actually checking the tune flag.
>> >
>> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>> >
>> > 2016-05-24  Jakub Jelinek  
>> >
>> > * config/i386/i386.h (TARGET_AVOID_4BYTE_PREFIXES): Define.
>> > * config/i386/constraints.md (Yr): Test TARGET_AVOID_4BYTE_PREFIXES
>> > rather than X86_TUNE_AVOID_4BYTE_PREFIXES.
>>
>> Uh, another brown-paper bag bug...
>>
>> OK everywhere.
>
> I fear it might be too dangerous for -mavx512* for the branches; I went
> through all the Yr uses on the trunk, but not on the branches.
> Would you be ok with using
> "TARGET_SSE ? (TARGET_AVOID_4BYTE_PREFIXES ? NO_REX_SSE_REGS : SSE_REGS) : 
> NO_REGS"
> on the branches instead?
> Or I guess we could use it on the trunk too, it should make no difference 
> there
> (because on the trunk it is only used when !TARGET_AVX).
> Or maybe even
> "TARGET_SSE ? ((TARGET_AVOID_4BYTE_PREFIXES && !TARGET_AVX) ? NO_REX_SSE_REGS 
> : SSE_REGS) : NO_REGS"
> (again, should make zero difference on the trunk, but might be better for
> the branches).

Indeed, let's play safe and go with the later version on branches.
Please also add a small comment, to avoid head-scratching in the
future.

Uros.


Re: Splitting up gcc/omp-low.c?

2016-05-24 Thread Thomas Schwinge
Hi!

Ping.

Given that we conceptually agreed on this task, but apparently nobody
is currently interested in reviewing my proposed changes (or telling me
how they'd like me to submit the patch for review), should I maybe just
execute the steps?

On Wed, 18 May 2016 13:42:37 +0200, Thomas Schwinge  
wrote:
> Ping.
> 
> On Wed, 11 May 2016 15:44:14 +0200, I wrote:
> > Ping.
> > 
> > On Tue, 03 May 2016 11:34:39 +0200, I wrote:
> > > On Wed, 13 Apr 2016 18:01:09 +0200, I wrote:
> > > > On Fri, 08 Apr 2016 11:36:03 +0200, I wrote:
> > > > > On Thu, 10 Dec 2015 09:08:35 +0100, Jakub Jelinek  
> > > > > wrote:
> > > > > > On Wed, Dec 09, 2015 at 06:23:22PM +0100, Bernd Schmidt wrote:
> > > > > > > On 12/09/2015 05:24 PM, Thomas Schwinge wrote:
> > > > > > > >how about we split up gcc/omp-low.c into several
> > > > > > > >files?  Would it make sense (I have not yet looked in detail) to 
> > > > > > > >do so
> > > > > > > >along the borders of the several passes defined therein?
> > > > 
> > > > > > > I suspect a split along the ompexp/omplow boundary would be quite 
> > > > > > > easy to
> > > > > > > achieve.
> > > > 
> > > > That was indeed the first one that I tackled, omp-expand.c (spelled out
> > > > "expand" instead of "exp" to avoid confusion as "exp" might also be 
> > > > short
> > > > for "expression"; OK?) [...]
> > > 
> > > That's the one I'd suggest to pursue next, now that GCC 6.1 has been
> > > released.  How would you like me to submit the patch for review?  (It's
> > > huge, obviously.)
> > > 
> > > A few high-level comments, and questions that remain to be answered:
> > > 
> > > > Stuff that does not relate to OMP lowering I did not move out of
> > > > omp-low.c (into a new omp.c, or omp-misc.c, for example) so far, but
> > > > instead just left all of it in omp-low.c.  We'll see how far we get.
> > > > 
> > > > One thing I noticed is that there sometimes is more than one suitable
> > > > place to put stuff: omp-low.c and omp-expand.c categorize by compiler
> > > > passes, and omp-offload.c -- at least in part -- [would be] about the 
> > > > orthogonal
> > > > "offloading" category.  For example, see the OMPTODO "struct oacc_loop
> > > > and enum oacc_loop_flags" in gcc/omp-offload.h.  We'll see how that 
> > > > goes.
> > > 
> > > > Some more comments, to help review:
> > > 
> > > > As I don't know how this is usually done: is it appropriate to remove
> > > > "Contributed by Diego Novillo" from omp-low.c (he does get mentioned for
> > > > his OpenMP work in gcc/doc/contrib.texi; a ton of other people have been
> > > > contributing a ton of other stuff since omp-low.c has been created), or
> > > > does this line stay in omp-low.c, or do I even duplicate it into the new
> > > > files?
> > > > 
> > > > I tried not to re-order stuff when moving.  But: we may actually want to
> > > > reorder stuff, to put it into a more sensible order.  Any suggestions?
> > > 
> > > > I had to export a small number of functions (see the prototypes not 
> > > > moved
> > > > but added to the header files).
> > > > 
> > > > Because it's also used in omp-expand.c, I moved the one-line static
> > > > inline is_reference function from omp-low.c to omp-low.h, and renamed it
> > > > to omp_is_reference because of the very generic name.  Similar functions
> > > > stay in omp-low.c however, so they're no longer defined next to each
> > > > other.  OK, or does this need a different solution?


Grüße
 Thomas


Re: [PATCH] Make basic asm implicitly clobber memory

2016-05-24 Thread Bernd Edlinger
On 05/23/16 23:46, David Wohlferd wrote:
> On 5/23/2016 12:46 AM, Richard Biener wrote:
>  > On Sun, 22 May 2016, Andrew Haley wrote:
>  >> On 05/20/2016 07:50 AM, David Wohlferd wrote:
>  >>> I realize deprecation/removal is drastic.  Especially since basic
>  >>> asm (mostly) works as is.  But fixing memory clobbers while leaving
>  >>> the rest broken feels like half a solution, meaning that some day
>  >>> we're going to have to fiddle with this again.
>  >>
>  >> Yes, we will undoubtedly have to fiddle with basic asm again.  We
>  >> should plan for deprecation.
>  >
>  > I think adding memory clobbers is worth having.  I also think that
>  > deprecating basic asms would be a good thing, so can we please
>  > add a new warning for that?  "warning: basic asms are deprecated"
>
> I've still got the -Wbasic-asm patch where I proposed this for v6. I can
> dust it off again and re-submit it.  A couple questions first:
>
> 1) In this patch the warning was disabled by default.  But it sounds
> like you want it enabled by default?  Easy to change, I'm just
> confirming your intent.
>

For practical reasons I would suggest enabling a warning like that only
with -Wall; otherwise you would have to decorate lots of test cases
with dg-warning directives (and it is rather difficult to do that for
all affected targets).

> 2) Is 'deprecated' handled differently than other types of warnings?
> There is a -Wno-deprecated, but it seems to have a very specific meaning
> that does not apply here.
>
> 3) The warning text in the old patch was "asm statement in function does
> not use extended syntax".  The intent was:
>
> a) Don't make it sound like basic asm is completely gone, since it can
> still be used at top level.
> b) Don't make it sound like all inline asm is gone, since extended asm
> can still be used in functions.
> c) Convey all that in as few words as possible.
>

The warning could also mention the changed behavior regarding the memory
clobbers, and recommend using extended asm syntax for that reason.
That was at least my initial thought.

> Now that we want to add the word 'deprecated,' perhaps one of these:
>
> - Basic asm in functions is deprecated in favor of extended syntax
> - asm in functions without extended syntax is deprecated
> - Deprecated: basic asm in function
> - Deprecated: asm in function without extended syntax
>
> I like the last one (people may not know what 'basic' means in this
> context), but any of these would work for me.  Preferences?
>
> In order to avoid conflicts, I'll wait for Bernd to commit his patch first.
>

Maybe we should not deprecate every use case; asm("") was fine for
certain reasons.

Furthermore I think the ia64 port could still theoretically use
traditional asm to specify the stop bits (see config/ia64/ia64.c,
rtx_needs_barrier).

BTW: My patch still waits to be reviewed in detail by one of the global
reviewers, before I can apply it.

Meanwhile I added this to doc/extend.texi, in response to David's
comments:

--- gcc/doc/extend.texi (revision 231412)
+++ gcc/doc/extend.texi (working copy)
@@ -7508,7 +7508,7 @@
  inside them.  GCC has no visibility of symbols in the @code{asm} and may
  discard them as unreferenced.  It also does not know about side effects of
  the assembler code, such as modifications to memory or registers.  Unlike
-some compilers, GCC assumes that no changes to either memory or registers
+some compilers, GCC assumes that no changes to general purpose registers
  occur.  This assumption may change in a future release.

  To avoid complications from future changes to the semantics and the


Which is just stating the facts.  Obviously the doc will need further
polishing, though; I'd like to leave that to David.


Thanks
Bernd.


Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Richard Biener
On Mon, 23 May 2016, Jan Hubicka wrote:

> > 
> > The assert below is unnecessary btw - it is ensured by IL checking.
> I removed the assert but had to add a check that the sizes match.  As
> spotted by the testsuite, the declaration size doesn't need to match the
> size of the object that we see.
> > 
> > Rather than annotating an ARRAY_REF I'd have FEs annotate FIELD_DECLs
> > that they are possibly flexible-size members.
> 
> This was my original plan. The problem however is that in many cases we do
> not see any FIELD_DECL.  When I dump the Fortran cases we give up on, I 
> typically
> see something like:
> Index: trans-types.c
> ===
> --- trans-types.c   (revision 236556)
> +++ trans-types.c   (working copy)
> @@ -1920,7 +1920,7 @@ gfc_get_array_type_bounds (tree etype, i
>  
>/* We define data as an array with the correct size if possible.
>   Much better than doing pointer arithmetic.  */
> -  if (stride)
> +  if (stride && akind >= GFC_ARRAY_ALLOCATABLE)
>  rtype = build_range_type (gfc_array_index_type, gfc_index_zero_node,
>   int_const_binop (MINUS_EXPR, stride,
>build_int_cst (TREE_TYPE 
> (stride), 1)));
> 
> It does not seem to make sense to build range types for arrays where the
> permitted value range is often above the upper bound.

Well, the ME explicitely allows domains with NULL TYPE_MAX_VALUE for this.
In the above case TYPE_MIN_VALUE is zero so you can omit the domain but
I believe that usually the FE communicates a lower bound of one to the ME.

> In that case I think we may just add ARRAY_TYPE_STRICT_DOMAIN flag 
> specifying that the value must be within the given range. Then we can just
> build arrays with strict ranges when we know these are not trailing.
> 
> Honza
> > 
> > Richard.
> > 
> 
>   * tree.c (array_at_struct_end_p): Look through MEM_REF.
> Index: tree.c
> ===
> --- tree.c(revision 236529)
> +++ tree.c(working copy)
> @@ -13076,9 +13076,28 @@ array_at_struct_end_p (tree ref)
>ref = TREE_OPERAND (ref, 0);
>  }
>  
> +  tree size = NULL;
> +
> +  if (TREE_CODE (ref) == MEM_REF
> +  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR)
> +{
> +  size = TYPE_SIZE (TREE_TYPE (ref));
> +  ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
> +}
> +
>/* If the reference is based on a declared entity, the size of the array
>   is constrained by its given domain.  (Do not trust commons PR/69368).  
> */
>if (DECL_P (ref)
> +  /* Be sure the size of MEM_REF target match.  For example:
> +
> +char buf[10];
> +struct foo *str = (struct foo *)&buf;
> +
> +str->trailin_array[2] = 1;
> +
> +  is valid because BUF allocate enough space.  */
> +
> +  && (!size || operand_equal_p (DECL_SIZE (ref), size, 0))

But it's still an array at struct end.  So I don't see how you
can validly claim it is not.

Richard.

>&& !(flag_unconstrained_commons
>  && TREE_CODE (ref) == VAR_DECL && DECL_COMMON (ref)))
>  return false;
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[AArch64, 0/6] Remove inline assembly in arm_neon.h

2016-05-24 Thread Jiong Wang

This patch set is a further step toward removing those intrinsics
implemented by inline assembly.

The touched intrinsics include those for fixed-point conversion,
frsqrt*, fabd and faddp.

The implementation approach is quite simple; for those intrinsics that
were implemented by inline assembly:

  * If there are rtl instruction patterns introduced later, then migrate
to builtins which are backed by these patterns.

  * If there aren't rtl instruction patterns, then add missing patterns,
and migrate to builtins.

AArch64 bootstrap OK, no regressions in the Linux configuration, and
also no regressions in the big-endian bare-metal tests.

---
Jiong Wang (6)
  Reimplement scalar fixed-point intrinsics
  Reimplement vector fixed-point intrinsics
  Reimplement frsqrte intrinsics
  Reimplement frsqrts intrinsics
  Reimplement fabd intrinsics & merge rtl patterns
  Reimplement vpadd intrinsics & extends rtl patterns to all modes

 gcc/config/aarch64/aarch64-builtins.c|  12 ++-
 gcc/config/aarch64/aarch64-builtins.def  | 473 ++
 gcc/config/aarch64/aarch64-simd-builtins.def | 447 -
 gcc/config/aarch64/aarch64-simd.md   |  72 ++---
 gcc/config/aarch64/aarch64.c |  20 ++---
 gcc/config/aarch64/aarch64.md|  26 ++
 gcc/config/aarch64/arm_neon.h| 700 +++--
 gcc/config/aarch64/iterators.md  |  31 ++-
 gcc/config/aarch64/t-aarch64 |   2 +-
 9 files changed, 842 insertions(+), 941 deletions(-)


[AArch64, 1/6] Reimplement scalar fixed-point intrinsics

2016-05-24 Thread Jiong Wang

This patch reimplements the scalar intrinsics for conversion between
floating-point and fixed-point.

Previously, all such intrinsics were implemented through inline assembly.
This patch adds RTL patterns for these operations so that the intrinsics
can be implemented through builtins.

gcc/
2016-05-23  Jiong Wang

* config/aarch64/aarch64-builtins.c (TYPES_BINOP_USS): New
(TYPES_BINOP_SUS): Likewise.
(aarch64_simd_builtin_data): Update include file name.
(aarch64_builtins): Likewise.
* config/aarch64/aarch64-simd-builtins.def: Rename to
aarch64-builtins.def.
(scvtfsi): New entries for conversion between scalar
float-point and fixed-point.
(scvtfdi): Likewise.
(ucvtfsi): Likewise.
(ucvtfdi): Likewise.
(fcvtzssf): Likewise.
(fcvtzsdf): Likewise.
(fcvtzusf): Likewise.
(fcvtzudf): Likewise.
* config/aarch64/aarch64.md
(3): New
pattern for conversion between scalar float to fixed-pointer.
(3): Likewise.
(UNSPEC_FCVTZS_SCALAR): New UNSPEC enumeration.
(UNSPEC_FCVTZU_SCALAR): Likewise.
(UNSPEC_SCVTF_SCALAR): Likewise.
(UNSPEC_UCVTF_SCALAR): Likewise.
* config/aarch64/aarch64-simd.md
(3): New pattern for conversion
between scalar variant of SIMD and fixed-point
(3): Likewise.
* config/aarch64/arm_neon.h (vcvtd_n_f64_s64): Remove inline assembly.
Use builtin.
(vcvtd_n_f64_u64): Likewise.
(vcvtd_n_s64_f64): Likewise.
(vcvtd_n_u64_f64): Likewise.
(vcvtd_n_f32_s32): Likewise.
(vcvts_n_f32_u32): Likewise.
(vcvtd_n_s32_f32): Likewise.
(vcvts_n_u32_f32): Likewise.
* config/aarch64/iterators.md (UNSPEC_FCVTZS): New.
(UNSPEC_FCVTZU): Likewise.
(UNSPEC_SCVTF): Likewise.
(UNSPEC_UCVTF): Likewise.
(fcvt_target): Support integer to float mapping.
(FCVT_TARGET): Likewise.
(FCVT_FIXED2F): New iterator.
(FCVT_F2FIXED): Likewise.
(FCVT_FIXED2F_SCALAR): Likewise.
(FCVT_F2FIXED_SCALAR): Likewise.
(fcvt_fixed_insn): New define_int_attr.
* config/aarch64/t-aarch64 (aarch64-builtins.o): Change dependency file
name from "aarch64-simd-builtins.def" to "aarch64-builtins.def".

>From 91adf34dbcf5a233c3d159e7038256d3f5c7572e Mon Sep 17 00:00:00 2001
From: "Jiong.Wang" 
Date: Mon, 23 May 2016 12:11:53 +0100
Subject: [PATCH 1/6] 1

---
 gcc/config/aarch64/aarch64-builtins.c|  12 +-
 gcc/config/aarch64/aarch64-builtins.def  | 457 +++
 gcc/config/aarch64/aarch64-simd-builtins.def | 447 --
 gcc/config/aarch64/aarch64-simd.md   |  22 ++
 gcc/config/aarch64/aarch64.md|  26 ++
 gcc/config/aarch64/arm_neon.h| 148 +++--
 gcc/config/aarch64/iterators.md  |  25 +-
 gcc/config/aarch64/t-aarch64 |   2 +-
 8 files changed, 591 insertions(+), 548 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-builtins.def
 delete mode 100644 gcc/config/aarch64/aarch64-simd-builtins.def

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 5573903..d79ba3d 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -139,6 +139,14 @@ aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_unsigned };
 #define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers)
 static enum aarch64_type_qualifiers
+aarch64_types_binop_uss_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none, qualifier_none };
+#define TYPES_BINOP_USS (aarch64_types_binop_uss_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_binop_sus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_unsigned, qualifier_none };
+#define TYPES_BINOP_SUS (aarch64_types_binop_sus_qualifiers)
+static enum aarch64_type_qualifiers
 aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_poly, qualifier_poly, qualifier_poly };
 #define TYPES_BINOPP (aarch64_types_binopp_qualifiers)
@@ -291,7 +299,7 @@ aarch64_types_storestruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #include "aarch64-builtin-iterators.h"
 
 static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
-#include "aarch64-simd-builtins.def"
+#include "aarch64-builtins.def"
 };
 
 /* There's only 8 CRC32 builtins.  Probably not worth their own .def file.  */
@@ -336,7 +344,7 @@ enum aarch64_builtins
   AARCH64_BUILTIN_RSQRT_V4SF,
   AARCH64_SIMD_BUILTIN_BASE,
   AARCH64_SIMD_BUILTIN_LANE_CHECK,
-#include "aarch64-simd-builtins.def"
+#include "aarch64-builtins.def"
   /* The first enum element which is based on an insn_data pattern.  */
   AARCH64_SIMD_PATTERN_START = AARCH64_SIMD_BUILTIN_LANE_CHECK + 1,
   AARCH64_SIMD_BUILTIN_MAX = AARCH64_S

[AArch64, 3/6] Reimplement frsqrte intrinsics

2016-05-24 Thread Jiong Wang

These intrinsics were implemented before the instruction pattern
"aarch64_rsqrte" was added, so they were implemented through inline
assembly.

This migrates the implementation to builtins.

gcc/
2016-05-23  Jiong Wang 

* config/aarch64/aarch64-builtins.def (rsqrte): New builtins for
modes VALLF.
* config/aarch64/aarch64-simd.md (aarch64_rsqrte_2): Rename to
"aarch64_rsqrte".
* config/aarch64/aarch64.c (get_rsqrte_type): Update gen* name.
* config/aarch64/arm_neon.h (vrsqrts_f32): Remove inline assembly.
Use builtin.
(vrsqrted_f64): Likewise.
(vrsqrte_f32): Likewise.
(vrsqrteq_f32): Likewise.
(vrsqrteq_f64): Likewise.

>From 4921317940fe69353cd057cc329943350bc45adf Mon Sep 17 00:00:00 2001
From: "Jiong.Wang" 
Date: Mon, 23 May 2016 12:12:19 +0100
Subject: [PATCH 3/6] 3

---
 gcc/config/aarch64/aarch64-builtins.def |  3 ++
 gcc/config/aarch64/aarch64-simd.md  |  2 +-
 gcc/config/aarch64/aarch64.c| 10 ++--
 gcc/config/aarch64/arm_neon.h   | 87 -
 4 files changed, 41 insertions(+), 61 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.def b/gcc/config/aarch64/aarch64-builtins.def
index 5e6280c..32bcd06 100644
--- a/gcc/config/aarch64/aarch64-builtins.def
+++ b/gcc/config/aarch64/aarch64-builtins.def
@@ -459,3 +459,6 @@
   BUILTIN_VALLI (BINOP_SUS, ucvtf, 3)
   BUILTIN_VALLF (BINOP, fcvtzs, 3)
   BUILTIN_VALLF (BINOP_USS, fcvtzu, 3)
+
+  /* Implemented by aarch64_rsqrte.  */
+  BUILTIN_VALLF (UNOP, rsqrte, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 66ca2de..c34d21e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -382,7 +382,7 @@
   [(set_attr "type" "neon_mul__scalar")]
 )
 
-(define_insn "aarch64_rsqrte_2"
+(define_insn "aarch64_rsqrte"
   [(set (match_operand:VALLF 0 "register_operand" "=w")
 	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")]
 		 UNSPEC_RSQRTE))]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index bd45a7d..18a8c1e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7358,11 +7358,11 @@ get_rsqrte_type (machine_mode mode)
 {
   switch (mode)
   {
-case DFmode:   return gen_aarch64_rsqrte_df2;
-case SFmode:   return gen_aarch64_rsqrte_sf2;
-case V2DFmode: return gen_aarch64_rsqrte_v2df2;
-case V2SFmode: return gen_aarch64_rsqrte_v2sf2;
-case V4SFmode: return gen_aarch64_rsqrte_v4sf2;
+case DFmode:   return gen_aarch64_rsqrtedf;
+case SFmode:   return gen_aarch64_rsqrtesf;
+case V2DFmode: return gen_aarch64_rsqrtev2df;
+case V2SFmode: return gen_aarch64_rsqrtev2sf;
+case V4SFmode: return gen_aarch64_rsqrtev4sf;
 default: gcc_unreachable ();
   }
 }
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index bd712fc..4c9976e 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -9163,17 +9163,6 @@ vqrdmulhq_n_s32 (int32x4_t a, int32_t b)
result;  \
  })
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vrsqrte_f32 (float32x2_t a)
-{
-  float32x2_t result;
-  __asm__ ("frsqrte %0.2s,%1.2s"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
 vrsqrte_f64 (float64x1_t a)
 {
@@ -9196,39 +9185,6 @@ vrsqrte_u32 (uint32x2_t a)
   return result;
 }
 
-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-vrsqrted_f64 (float64_t a)
-{
-  float64_t result;
-  __asm__ ("frsqrte %d0,%d1"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vrsqrteq_f32 (float32x4_t a)
-{
-  float32x4_t result;
-  __asm__ ("frsqrte %0.4s,%1.4s"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vrsqrteq_f64 (float64x2_t a)
-{
-  float64x2_t result;
-  __asm__ ("frsqrte %0.2d,%1.2d"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
 vrsqrteq_u32 (uint32x4_t a)
 {
@@ -9240,17 +9196,6 @@ vrsqrteq_u32 (uint32x4_t a)
   return result;
 }
 
-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-vrsqrtes_f32 (float32_t a)
-{
-  float32_t result;
-  __asm__ ("frsqrte %s0,%s1"
-   : "=w"(result)
-   : "w"(a)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 v

[AArch64, 4/6] Reimplement frsqrts intrinsics

2016-05-24 Thread Jiong Wang

Similar to [3/6], these intrinsics were implemented before the instruction
pattern "aarch64_rsqrts" was added, so they were implemented through
inline assembly.

This migrates the implementation to builtins.

gcc/
2016-05-23  Jiong Wang 

* config/aarch64/aarch64-builtins.def (rsqrts): New builtins for
modes VALLF.
* config/aarch64/aarch64-simd.md (aarch64_rsqrts_3): Rename to
"aarch64_rsqrts".
* config/aarch64/aarch64.c (get_rsqrts_type): Update gen* name.
* config/aarch64/arm_neon.h (vrsqrtss_f32): Remove inline assembly.
Use builtin.
(vrsqrtsd_f64): Likewise.
(vrsqrts_f32): Likewise.
(vrsqrtsq_f32): Likewise.
(vrsqrtsq_f64): Likewise.
>From ea271deeb19e3a1e611cbc1ddf3abfec06388958 Mon Sep 17 00:00:00 2001
From: "Jiong.Wang" 
Date: Mon, 23 May 2016 12:12:33 +0100
Subject: [PATCH 4/6] 4

---
 gcc/config/aarch64/aarch64-builtins.def |  3 ++
 gcc/config/aarch64/aarch64-simd.md  |  2 +-
 gcc/config/aarch64/aarch64.c| 10 ++--
 gcc/config/aarch64/arm_neon.h   | 87 -
 4 files changed, 41 insertions(+), 61 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.def b/gcc/config/aarch64/aarch64-builtins.def
index 32bcd06..1955d17 100644
--- a/gcc/config/aarch64/aarch64-builtins.def
+++ b/gcc/config/aarch64/aarch64-builtins.def
@@ -462,3 +462,6 @@
 
   /* Implemented by aarch64_rsqrte.  */
   BUILTIN_VALLF (UNOP, rsqrte, 0)
+
+  /* Implemented by aarch64_rsqrts.  */
+  BUILTIN_VALLF (BINOP, rsqrts, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c34d21e..cca6c1b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -390,7 +390,7 @@
   "frsqrte\\t%0, %1"
   [(set_attr "type" "neon_fp_rsqrte_")])
 
-(define_insn "aarch64_rsqrts_3"
+(define_insn "aarch64_rsqrts"
   [(set (match_operand:VALLF 0 "register_operand" "=w")
 	(unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w")
 	   (match_operand:VALLF 2 "register_operand" "w")]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 18a8c1e..ba71d2a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7377,11 +7377,11 @@ get_rsqrts_type (machine_mode mode)
 {
   switch (mode)
   {
-case DFmode:   return gen_aarch64_rsqrts_df3;
-case SFmode:   return gen_aarch64_rsqrts_sf3;
-case V2DFmode: return gen_aarch64_rsqrts_v2df3;
-case V2SFmode: return gen_aarch64_rsqrts_v2sf3;
-case V4SFmode: return gen_aarch64_rsqrts_v4sf3;
+case DFmode:   return gen_aarch64_rsqrtsdf;
+case SFmode:   return gen_aarch64_rsqrtssf;
+case V2DFmode: return gen_aarch64_rsqrtsv2df;
+case V2SFmode: return gen_aarch64_rsqrtsv2sf;
+case V4SFmode: return gen_aarch64_rsqrtsv4sf;
 default: gcc_unreachable ();
   }
 }
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index be48a5e..1971373 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -9196,61 +9196,6 @@ vrsqrteq_u32 (uint32x4_t a)
   return result;
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vrsqrts_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("frsqrts %0.2s,%1.2s,%2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-vrsqrtsd_f64 (float64_t a, float64_t b)
-{
-  float64_t result;
-  __asm__ ("frsqrts %d0,%d1,%d2"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vrsqrtsq_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("frsqrts %0.4s,%1.4s,%2.4s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vrsqrtsq_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("frsqrts %0.2d,%1.2d,%2.2d"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-vrsqrtss_f32 (float32_t a, float32_t b)
-{
-  float32_t result;
-  __asm__ ("frsqrts %s0,%s1,%s2"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 #define vshrn_high_n_s16(a, b, c)   \
   __extension__ \
 ({  \
@@ -21481,6 +21426,38 @@ vrsqrteq_f64 (float64x2_t a)
   return __builtin_aarch64_rsqrtev2df (a);
 }
 
+/* vrsqrts.  */
+
+__extens

[AArch64, 2/6] Reimplement vector fixed-point intrinsics

2016-05-24 Thread Jiong Wang

Based on top of [1/6], this patch reimplements the vector intrinsics for
conversion between floating-point and fixed-point.

gcc/
2016-05-23  Jiong Wang 

* config/aarch64/aarch64-builtins.def (scvtf): New builtins for
vector types.
(ucvtf): Likewise.
(fcvtzs): Likewise.
(fcvtzu): Likewise.
* config/aarch64/aarch64-simd.md
(<FCVT_F2FIXED:fcvt_fixed_insn><GPF:mode>3): Extend to more modes.
Rename to <FCVT_F2FIXED:fcvt_fixed_insn><VALLF:mode>3.
(<FCVT_FIXED2F:fcvt_fixed_insn><GPI:mode>3): Likewise and rename to
<FCVT_FIXED2F:fcvt_fixed_insn><VALLI:mode>3.
* config/aarch64/arm_neon.h (vcvt_n_f32_s32): Remove inline assembly.
Use builtin.
(vcvt_n_f32_u32): Likewise.
(vcvt_n_s32_f32): Likewise.
(vcvt_n_u32_f32): Likewise.
(vcvtq_n_f32_s32): Likewise.
(vcvtq_n_f32_u32): Likewise.
(vcvtq_n_f64_s64): Likewise.
(vcvtq_n_f64_u64): Likewise.
(vcvtq_n_s32_f32): Likewise.
(vcvtq_n_s64_f64): Likewise.
(vcvtq_n_u32_f32): Likewise.
(vcvtq_n_u64_f64): Likewise.
* config/aarch64/iterators.md (VALLI): New mode iterator.
(fcvt_target): Support V2DI, V4SI and V2SI.
(FCVT_TARGET): Likewise.
>From 63e8362e7d0afc2f4dd4288d38d3f64b62bfd657 Mon Sep 17 00:00:00 2001
From: "Jiong.Wang" 
Date: Mon, 23 May 2016 12:12:04 +0100
Subject: [PATCH 2/6] 2

---
 gcc/config/aarch64/aarch64-builtins.def |   4 +
 gcc/config/aarch64/aarch64-simd.md  |  22 ++--
 gcc/config/aarch64/arm_neon.h   | 216 +++-
 gcc/config/aarch64/iterators.md |   5 +
 4 files changed, 92 insertions(+), 155 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.def b/gcc/config/aarch64/aarch64-builtins.def
index 4528db3..5e6280c 100644
--- a/gcc/config/aarch64/aarch64-builtins.def
+++ b/gcc/config/aarch64/aarch64-builtins.def
@@ -455,3 +455,7 @@
   BUILTIN_GPI (BINOP, fcvtzsdf, 3)
   BUILTIN_GPI (BINOP_USS, fcvtzusf, 3)
   BUILTIN_GPI (BINOP_USS, fcvtzudf, 3)
+  BUILTIN_VALLI (BINOP, scvtf, 3)
+  BUILTIN_VALLI (BINOP_SUS, ucvtf, 3)
+  BUILTIN_VALLF (BINOP, fcvtzs, 3)
+  BUILTIN_VALLF (BINOP_USS, fcvtzu, 3)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 670c690..66ca2de 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1778,26 +1778,26 @@
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-;; Convert between fixed-point and floating-point (scalar variant from SIMD)
+;; Convert between fixed-point and floating-point (SIMD)
 
-(define_insn "3"
-  [(set (match_operand: 0 "register_operand" "=w")
-	(unspec: [(match_operand:GPF 1 "register_operand" "w")
-   (match_operand:SI 2 "immediate_operand" "i")]
+(define_insn "3"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(match_operand:VALLF 1 "register_operand" "w")
+ (match_operand:SI 2 "immediate_operand" "i")]
 	 FCVT_F2FIXED))]
   "TARGET_SIMD"
   "\t%0, %1, #%2"
-  [(set_attr "type" "neon_fp_to_int_")]
+  [(set_attr "type" "neon_fp_to_int_")]
 )
 
-(define_insn "3"
-  [(set (match_operand: 0 "register_operand" "=w")
-	(unspec: [(match_operand:GPI 1 "register_operand" "w")
-   (match_operand:SI 2 "immediate_operand" "i")]
+(define_insn "3"
+  [(set (match_operand: 0 "register_operand" "=w")
+	(unspec: [(match_operand:VALLI 1 "register_operand" "w")
+ (match_operand:SI 2 "immediate_operand" "i")]
 	 FCVT_FIXED2F))]
   "TARGET_SIMD"
   "\t%0, %1, #%2"
-  [(set_attr "type" "neon_int_to_fp_")]
+  [(set_attr "type" "neon_int_to_fp_")]
 )
 
 ;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 012a11a..bd712fc 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -6025,150 +6025,6 @@ vaddlvq_u32 (uint32x4_t a)
result;  \
  })
 
-#define vcvt_n_f32_s32(a, b)\
-  __extension__ \
-({  \
-   int32x2_t a_ = (a);  \
-   float32x2_t result;  \
-   __asm__ ("scvtf %0.2s, %1.2s, #%2"   \
-: "=w"(result)  \
-: "w"(a_), "i"(b)   \
-: /* No clobbers */);   \
-   result;  \
- })
-
-#define vcvt_n_f32_u32(a, b)\
-  __extension__ \
-({  \
-   uint32x2_t a_ = (a); \
-   float32x2_t result;

[AArch64, 5/6] Reimplement fabd intrinsics & merge rtl patterns

2016-05-24 Thread Jiong Wang
These intrinsics were implemented before "fabd_<mode>3" was introduced.
Meanwhile, the patterns "fabd_<mode>3" and "*fabd_scalar<mode>3" can be
merged into a single "fabd<mode>3" using VALLF.

This patch migrates the implementation to builtins backed by this pattern.

gcc/
2016-05-23  Jiong Wang 

* config/aarch64/aarch64-builtins.def (fabd): New builtins for
modes VALLF.
* config/aarch64/aarch64-simd.md (fabd_<mode>3): Extend modes from
VDQF to VALLF.
(*fabd_scalar<mode>3): Delete.
* config/aarch64/arm_neon.h (vabds_f32): Remove inline assembly.
Use builtin.
(vabdd_f64): Likewise.
(vabd_f32): Likewise.
(vabdq_f32): Likewise.
(vabdq_f64): Likewise.

>From 9bafb58055d4e379df7b626acd6aa80bdb0d4b22 Mon Sep 17 00:00:00 2001
From: "Jiong.Wang" 
Date: Mon, 23 May 2016 12:12:53 +0100
Subject: [PATCH 5/6] 5

---
 gcc/config/aarch64/aarch64-builtins.def |  3 ++
 gcc/config/aarch64/aarch64-simd.md  | 23 +++--
 gcc/config/aarch64/arm_neon.h   | 87 -
 3 files changed, 42 insertions(+), 71 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.def b/gcc/config/aarch64/aarch64-builtins.def
index 1955d17..40baebe 100644
--- a/gcc/config/aarch64/aarch64-builtins.def
+++ b/gcc/config/aarch64/aarch64-builtins.def
@@ -465,3 +465,6 @@
 
   /* Implemented by aarch64_rsqrts.  */
   BUILTIN_VALLF (BINOP, rsqrts, 0)
+
+  /* Implemented by fabd_3.  */
+  BUILTIN_VALLF (BINOP, fabd, 3)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index cca6c1b..71dd74a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -474,23 +474,14 @@
   [(set_attr "type" "neon_arith_acc")]
 )
 
-(define_insn "fabd_3"
-  [(set (match_operand:VDQF 0 "register_operand" "=w")
-	(abs:VDQF (minus:VDQF
-		   (match_operand:VDQF 1 "register_operand" "w")
-		   (match_operand:VDQF 2 "register_operand" "w"]
-  "TARGET_SIMD"
-  "fabd\t%0., %1., %2."
-  [(set_attr "type" "neon_fp_abd_")]
-)
-
-(define_insn "*fabd_scalar3"
-  [(set (match_operand:GPF 0 "register_operand" "=w")
-(abs:GPF (minus:GPF
- (match_operand:GPF 1 "register_operand" "w")
- (match_operand:GPF 2 "register_operand" "w"]
+(define_insn "fabd3"
+  [(set (match_operand:VALLF 0 "register_operand" "=w")
+	(abs:VALLF
+	  (minus:VALLF
+	(match_operand:VALLF 1 "register_operand" "w")
+	(match_operand:VALLF 2 "register_operand" "w"]
   "TARGET_SIMD"
-  "fabd\t%0, %1, %2"
+  "fabd\t%0, %1, %2"
   [(set_attr "type" "neon_fp_abd_")]
 )
 
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 9bbe815..ca29074 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -5440,17 +5440,6 @@ vabaq_u32 (uint32x4_t a, uint32x4_t b, uint32x4_t c)
   return result;
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vabd_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("fabd %0.2s, %1.2s, %2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
 vabd_s8 (int8x8_t a, int8x8_t b)
 {
@@ -5517,17 +5506,6 @@ vabd_u32 (uint32x2_t a, uint32x2_t b)
   return result;
 }
 
-__extension__ static __inline float64_t __attribute__ ((__always_inline__))
-vabdd_f64 (float64_t a, float64_t b)
-{
-  float64_t result;
-  __asm__ ("fabd %d0, %d1, %d2"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int16x8_t __attribute__ ((__always_inline__))
 vabdl_high_s8 (int8x16_t a, int8x16_t b)
 {
@@ -5660,28 +5638,6 @@ vabdl_u32 (uint32x2_t a, uint32x2_t b)
   return result;
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vabdq_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("fabd %0.4s, %1.4s, %2.4s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vabdq_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("fabd %0.2d, %1.2d, %2.2d"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vabdq_s8 (int8x16_t a, int8x16_t b)
 {
@@ -5748,17 +5704,6 @@ vabdq_u32 (uint32x4_t a, uint32x4_t b)
   return result;
 }
 
-__extension__ static __inline float32_t __attribute__ ((__always_inline__))
-vabds_f32 (float32_t a, float32_t b)
-{
-  float32_t result;
-  __asm__ ("fabd %s0, %s1, %s2"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline i

[AArch64, 6/6] Reimplement vpadd intrinsics & extend rtl patterns to all modes

2016-05-24 Thread Jiong Wang
These intrinsics were implemented by inline assembly using the "faddp"
instruction.  There was a pattern "aarch64_addpv4sf" which supported the
V4SF mode only; we can extend this pattern to support all VDQF modes and
then reimplement these intrinsics through builtins.

gcc/
2016-05-23  Jiong Wang 

* config/aarch64/aarch64-builtins.def (faddp): New builtins for
modes in VDQF.
* config/aarch64/aarch64-simd.md (aarch64_faddp<mode>): New.
(aarch64_addpv4sf): Delete.
(reduc_plus_scal_v4sf): Use "gen_aarch64_faddpv4sf" instead of
"gen_aarch64_addpv4sf".
* config/aarch64/iterators.md (UNSPEC_FADDP): New.
* config/aarch64/arm_neon.h (vpadd_f32): Remove inline assembly.
Use builtin.
(vpaddq_f32): Likewise.
(vpaddq_f64): Likewise.

>From d97a40ac2e69403b64bcf53596581b49b86ef40c Mon Sep 17 00:00:00 2001
From: "Jiong.Wang" 
Date: Mon, 23 May 2016 12:13:13 +0100
Subject: [PATCH 6/6] 6

---
 gcc/config/aarch64/aarch64-builtins.def |  3 ++
 gcc/config/aarch64/aarch64-simd.md  | 23 ---
 gcc/config/aarch64/arm_neon.h   | 51 -
 gcc/config/aarch64/iterators.md |  1 +
 4 files changed, 34 insertions(+), 44 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.def b/gcc/config/aarch64/aarch64-builtins.def
index 40baebe..37d8183 100644
--- a/gcc/config/aarch64/aarch64-builtins.def
+++ b/gcc/config/aarch64/aarch64-builtins.def
@@ -468,3 +468,6 @@
 
   /* Implemented by fabd_3.  */
   BUILTIN_VALLF (BINOP, fabd, 3)
+
+  /* Implemented by aarch64_faddp.  */
+  BUILTIN_VDQF (BINOP, faddp, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 71dd74a..9b9f8df 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1992,6 +1992,16 @@
   }
 )
 
+(define_insn "aarch64_faddp"
+ [(set (match_operand:VDQF 0 "register_operand" "=w")
+   (unspec:VDQF [(match_operand:VDQF 1 "register_operand" "w")
+		 (match_operand:VDQF 2 "register_operand" "w")]
+		 UNSPEC_FADDP))]
+ "TARGET_SIMD"
+ "faddp\t%0., %1., %2."
+  [(set_attr "type" "neon_fp_reduc_add_")]
+)
+
 (define_insn "aarch64_reduc_plus_internal"
  [(set (match_operand:VDQV 0 "register_operand" "=w")
(unspec:VDQV [(match_operand:VDQV 1 "register_operand" "w")]
@@ -2019,15 +2029,6 @@
   [(set_attr "type" "neon_fp_reduc_add_")]
 )
 
-(define_insn "aarch64_addpv4sf"
- [(set (match_operand:V4SF 0 "register_operand" "=w")
-   (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "w")]
-		UNSPEC_FADDV))]
- "TARGET_SIMD"
- "faddp\\t%0.4s, %1.4s, %1.4s"
-  [(set_attr "type" "neon_fp_reduc_add_s_q")]
-)
-
 (define_expand "reduc_plus_scal_v4sf"
  [(set (match_operand:SF 0 "register_operand")
(unspec:V4SF [(match_operand:V4SF 1 "register_operand")]
@@ -2036,8 +2037,8 @@
 {
   rtx elt = GEN_INT (ENDIAN_LANE_N (V4SFmode, 0));
   rtx scratch = gen_reg_rtx (V4SFmode);
-  emit_insn (gen_aarch64_addpv4sf (scratch, operands[1]));
-  emit_insn (gen_aarch64_addpv4sf (scratch, scratch));
+  emit_insn (gen_aarch64_faddpv4sf (scratch, operands[1], operands[1]));
+  emit_insn (gen_aarch64_faddpv4sf (scratch, scratch, scratch));
   emit_insn (gen_aarch64_get_lanev4sf (operands[0], scratch, elt));
   DONE;
 })
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index ae4c429..a37ceeb 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -8225,17 +8225,6 @@ vpadalq_u32 (uint64x2_t a, uint32x4_t b)
   return result;
 }
 
-__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
-vpadd_f32 (float32x2_t a, float32x2_t b)
-{
-  float32x2_t result;
-  __asm__ ("faddp %0.2s,%1.2s,%2.2s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int16x4_t __attribute__ ((__always_inline__))
 vpaddl_s8 (int8x8_t a)
 {
@@ -8368,28 +8357,6 @@ vpaddlq_u32 (uint32x4_t a)
   return result;
 }
 
-__extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vpaddq_f32 (float32x4_t a, float32x4_t b)
-{
-  float32x4_t result;
-  __asm__ ("faddp %0.4s,%1.4s,%2.4s"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
-__extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
-vpaddq_f64 (float64x2_t a, float64x2_t b)
-{
-  float64x2_t result;
-  __asm__ ("faddp %0.2d,%1.2d,%2.2d"
-   : "=w"(result)
-   : "w"(a), "w"(b)
-   : /* No clobbers */);
-  return result;
-}
-
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
 vpaddq_s8 (int8x16_t a, int8x16_t b)
 {
@@ -18629,6 +18596,24 @@ vnegq_s64 (int64x2_t __a)
 
 /* vpadd  */
 
+__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
+vpadd_f32 (float32x2_t __a, float32x2_t __b)
+{
+  return __builtin_aarc

Re: [PATCH] Fix PR tree-optimization/71170

2016-05-24 Thread Christophe Lyon
On 24 May 2016 at 05:13, Kugan Vivekanandarajah
 wrote:
> On 23 May 2016 at 21:35, Richard Biener  wrote:
>> On Sat, May 21, 2016 at 8:08 AM, Kugan Vivekanandarajah
>>  wrote:
>>> On 20 May 2016 at 21:07, Richard Biener  wrote:
 On Fri, May 20, 2016 at 1:51 AM, Kugan Vivekanandarajah
  wrote:
> Hi Richard,
>
>> I think it should have the same rank as op or op + 1 which is the current
>> behavior.  Sth else doesn't work correctly here I think, like inserting 
>> the
>> multiplication not near the definition of op.
>>
>> Well, the whole "clever insertion" logic is simply flawed.
>
> What I meant to say was that the simple logic we have now wouldn’t
> work. "clever logic" is knowing where exactly where it is needed and
> inserting there.  I think thats what  you are suggesting below in a
> simple to implement way.
>
>> I'd say that ideally we would delay inserting the multiplication to
>> rewrite_expr_tree time.  For example by adding a ops->stmt_to_insert
>> member.
>>
>
> Here is an implementation based on above. Bootstrap on x86-linux-gnu
> is OK. regression testing is ongoing.

 I like it.  Please push the insertion code to a helper as I think you need
 to post-pone setting the stmts UID to that point.

 Ideally we'd make use of the same machinery in attempt_builtin_powi,
 removing the special-casing of powi_result.  (same as I said that ideally
 the plus->mult stuff would use the repeat-ops machinery...)

 I'm not 100% convinced the place you insert the stmt is correct but I
 haven't spent too much time to decipher reassoc in this area.
>>>
>>>
>>> Hi Richard,
>>>
>>> Thanks. Here is a tested version of the patch. I did miss one place
>>> which I fixed now (tranform_stmt_to_copy) I also created a function to
>>> do the insertion.
>>>
>>>
>>> Bootstrap and regression testing on x86_64-linux-gnu are fine. Is this
>>> OK for trunk.
>>
>> @@ -3798,6 +3805,7 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
>>oe1 = ops[opindex];
>>oe2 = ops[opindex + 1];
>>
>> +
>>if (rhs1 != oe1->op || rhs2 != oe2->op)
>> {
>>   gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
>>
>> please remove this stray change.
>>
>> Ok with that change.
>
> Hi Richard,
>
> Thanks for the review. I also found another issue with this patch.
> I.e. for the stmt_to_insert we will get gimple_bb of NULL which is not
> expected in sort_by_operand_rank. This only showed up only while
> building a version of glibc.
>
> Bootstrap and regression testing are ongoing.Is this OK for trunk if
> passes regression and bootstrap.
>

I'm seeing build failures in glibc after you committed r236619.
This new patch is fixing it, right?


> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2016-05-24  Kugan Vivekanandarajah  
>
> * tree-ssa-reassoc.c (sort_by_operand_rank): Check for gimple_bb of NULL
> for stmt_to_insert.
>
>
> gcc/testsuite/ChangeLog:
>
> 2016-05-24  Kugan Vivekanandarajah  
>
> * gcc.dg/tree-ssa/reassoc-44.c: New test.


RE: [PATCH][MIPS] Add -mgrow-frame-downwards option

2016-05-24 Thread Matthew Fortune
Sandra Loosemore  writes:
> On 05/20/2016 08:58 AM, Robert Suchanek wrote:
> > Hi,
> >
> > The patch changes the default behaviour of the direction in which the
> > local frame grows for MIPS16.
> >
> > The code size reduces by about 0.5% in average case for -Os, hence, it
> > is good to turn the option on by default.
> >
> > Ok to apply?
> >
> > Regards,
> > Robert
> >
> > gcc/
> >
> > 2016-05-20  Matthew Fortune  
> >
> > * config/mips/mips.h (FRAME_GROWS_DOWNWARD): Enable it
> > conditionally for MIPS16.
> > * config/mips/mips.opt: Add -mgrow-frame-downwards option.
> > Enable it by default for MIPS16.
> > * doc/invoke.texi: Document the option.
> 
> This may be a stupid question, but what point is there in exposing this
> as an option to users?  Users generally just want the compiler to emit

Hi Sandra,

Firstly, thanks for reviewing it is appreciated.  There is some method to
the madness in the sense that Robert and I have a reasonable number of
patches that have been pending submission but have been released as
part of toolchains from Imagination.  We figured it would be best to post
them as is and then have this kind of discussion in the open about what
to keep and what to change so I expect there to be a few more things like
this to review. I'm likely to propose changes myself too with my upstream
hat on.

> good code when they compile with -O, not more individual optimization
> switches to twiddle.

Agreed.

> Is FRAME_GROWS_DOWNWARD likely to be so buggy or
> poorly tested that it's necessary to provide a way to turn it off?

No. I have no objection to removing this option. We have had it for
a while in our sources and found no need to advise anyone to turn it off
so hardwiring FRAME_GROWS_DOWNWARD for mips16 looks good.

Thanks,
Matthew

> If we really must have this option
> 
> > diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt index
> > 3b92ef5..53feb23 100644
> > --- a/gcc/config/mips/mips.opt
> > +++ b/gcc/config/mips/mips.opt
> > @@ -447,3 +447,7 @@ Enum(mips_cb_setting) String(always)
> Value(MIPS_CB_ALWAYS)
> >   minline-intermix
> >   Target Report Var(TARGET_INLINE_INTERMIX)
> >   Allow inlining even if the compression flags differ between caller
> and callee.
> > +
> > +mgrow-frame-downwards
> > +Target Report Var(TARGET_FRAME_GROWS_DOWNWARDS) Init(1) Change the
> > +behaviour to grow the frame downwards for MIPS16.
> 
> British spelling of "behaviour" here.  How about just "Grow the frame
> downwards for MIPS16."
> 
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index
> > 2f6195e..6e5d620 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -17929,6 +17930,18 @@ vice-versa.  When using this option it is
> necessary to protect functions
> >   that cannot be compiled as MIPS16 with a @code{noinline} attribute
> to ensure
> >   they are not inlined into a MIPS16 function.
> >
> > +@item -mgrow-frame-downwards
> > +@itemx -mno-grow-frame-downwards
> > +@opindex mgrow-frame-downwards
> > +Grow the local frame down (up) for MIPS16.
> > +
> > +Growing the frame downwards allows us to get spill slots created at
> > +the lowest
> 
> s/allows us to get spill slots created/allows GCC to create spill slots/
> 
> > +address rather than the highest address in a local frame.  The
> > +benefit of this is smaller code size as accessing spill splots closer
> > +to the stack pointer can be done using using 16-bit instructions.
> 
> s/spill splots/spill slots/
> 
> But, this option description is so implementor-speaky that it just
> reinforces my thinking that it's likely to be uninteresting to users
> 
> > +
> > +The option is enabled by default (to grow frame downwards) for
> MIPS16.
> > +
> >   @item -mabi=32
> >   @itemx -mabi=o64
> >   @itemx -mabi=n32
> >
> 
> -Sandra



Re: [PATCH] Fix PR tree-optimization/71170

2016-05-24 Thread Kugan Vivekanandarajah
On 24 May 2016 at 18:36, Christophe Lyon  wrote:
> On 24 May 2016 at 05:13, Kugan Vivekanandarajah
>  wrote:
>> On 23 May 2016 at 21:35, Richard Biener  wrote:
>>> On Sat, May 21, 2016 at 8:08 AM, Kugan Vivekanandarajah
>>>  wrote:
 On 20 May 2016 at 21:07, Richard Biener  wrote:
> On Fri, May 20, 2016 at 1:51 AM, Kugan Vivekanandarajah
>  wrote:
>> Hi Richard,
>>
>>> I think it should have the same rank as op or op + 1 which is the 
>>> current
>>> behavior.  Sth else doesn't work correctly here I think, like inserting 
>>> the
>>> multiplication not near the definition of op.
>>>
>>> Well, the whole "clever insertion" logic is simply flawed.
>>
>> What I meant to say was that the simple logic we have now wouldn’t
>> work. "clever logic" is knowing where exactly where it is needed and
>> inserting there.  I think thats what  you are suggesting below in a
>> simple to implement way.
>>
>>> I'd say that ideally we would delay inserting the multiplication to
>>> rewrite_expr_tree time.  For example by adding a ops->stmt_to_insert
>>> member.
>>>
>>
>> Here is an implementation based on above. Bootstrap on x86-linux-gnu
>> is OK. regression testing is ongoing.
>
> I like it.  Please push the insertion code to a helper as I think you need
> to post-pone setting the stmts UID to that point.
>
> Ideally we'd make use of the same machinery in attempt_builtin_powi,
> removing the special-casing of powi_result.  (same as I said that ideally
> the plus->mult stuff would use the repeat-ops machinery...)
>
> I'm not 100% convinced the place you insert the stmt is correct but I
> haven't spent too much time to decipher reassoc in this area.


 Hi Richard,

 Thanks. Here is a tested version of the patch. I did miss one place
 which I fixed now (tranform_stmt_to_copy) I also created a function to
 do the insertion.


 Bootstrap and regression testing on x86_64-linux-gnu are fine. Is this
 OK for trunk.
>>>
>>> @@ -3798,6 +3805,7 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
>>>oe1 = ops[opindex];
>>>oe2 = ops[opindex + 1];
>>>
>>> +
>>>if (rhs1 != oe1->op || rhs2 != oe2->op)
>>> {
>>>   gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
>>>
>>> please remove this stray change.
>>>
>>> Ok with that change.
>>
>> Hi Richard,
>>
>> Thanks for the review. I also found another issue with this patch.
>> I.e. for the stmt_to_insert we will get gimple_bb of NULL which is not
>> expected in sort_by_operand_rank. This only showed up only while
>> building a version of glibc.
>>
>> Bootstrap and regression testing are ongoing.Is this OK for trunk if
>> passes regression and bootstrap.
>>
>
> I'm seeing build failures in glibc after you committed r236619.
> This new patch is fixing it, right?


Yes (same patch attached).  Also, bootstrap and regression testing on
x86_64-linux-gnu showed no new failures.

Is this OK for trunk?

Thanks,
Kugan

gcc/ChangeLog:

2016-05-24  Kugan Vivekanandarajah  

* tree-ssa-reassoc.c (sort_by_operand_rank): Check gimple_bb for NULL.


gcc/testsuite/ChangeLog:

2016-05-24  Kugan Vivekanandarajah  

* gcc.dg/tree-ssa/reassoc-44.c: New test.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-44.c 
b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-44.c
index e69de29..9b12212 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-44.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-44.c
@@ -0,0 +1,10 @@
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int a;
+int b, c;
+void fn1 ()
+{
+  b = a + c + c;
+}
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index fb683ad..06f4d1b 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -525,7 +525,7 @@ sort_by_operand_rank (const void *pa, const void *pb)
  gimple *stmtb = SSA_NAME_DEF_STMT (oeb->op);
  basic_block bba = gimple_bb (stmta);
  basic_block bbb = gimple_bb (stmtb);
- if (bbb != bba)
+ if (bba && bbb && bbb != bba)
{
  if (bb_rank[bbb->index] != bb_rank[bba->index])
return bb_rank[bbb->index] - bb_rank[bba->index];


Tighten syntax checking for OpenACC routine construct in C

2016-05-24 Thread Thomas Schwinge
Hi!

OK for trunk?

commit 155feb878deedd09fd60e2322b1515de41595b13
Author: Thomas Schwinge 
Date:   Tue May 24 10:42:08 2016 +0200

Tighten syntax checking for OpenACC routine construct in C

gcc/c/
* c-parser.c (c_parser_oacc_routine): Tighten syntax checks.
gcc/testsuite/
* c-c++-common/goacc/routine-5.c: Add tests.
* g++.dg/goacc/routine-2.C: Remove duplicate tests.
* gfortran.dg/goacc/routine-6.f90: Add tests.
---
 gcc/c/c-parser.c  | 19 +--
 gcc/testsuite/c-c++-common/goacc/routine-5.c  | 21 +
 gcc/testsuite/g++.dg/goacc/routine-2.C|  6 --
 gcc/testsuite/gfortran.dg/goacc/routine-6.f90 |  7 +++
 4 files changed, 33 insertions(+), 20 deletions(-)

diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 80ac4d5..cbd4e4c 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -13984,25 +13984,24 @@ c_parser_oacc_routine (c_parser *parser, enum 
pragma_context context)
   c_parser_consume_token (parser);
 
   c_token *token = c_parser_peek_token (parser);
-
   if (token->type == CPP_NAME && (token->id_kind == C_ID_ID
  || token->id_kind == C_ID_TYPENAME))
{
  decl = lookup_name (token->value);
  if (!decl)
-   {
- error_at (token->location, "%qE has not been declared",
-   token->value);
- decl = error_mark_node;
-   }
+   error_at (token->location, "%qE has not been declared",
+ token->value);
+ c_parser_consume_token (parser);
}
   else
c_parser_error (parser, "expected function name");
 
-  if (token->type != CPP_CLOSE_PAREN)
-   c_parser_consume_token (parser);
-
-  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
+  if (!decl
+ || !c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+   {
+ c_parser_skip_to_pragma_eol (parser, false);
+ return;
+   }
 }
 
   /* Build a chain of clauses.  */
diff --git gcc/testsuite/c-c++-common/goacc/routine-5.c 
gcc/testsuite/c-c++-common/goacc/routine-5.c
index 2a9db90..1efd154 100644
--- gcc/testsuite/c-c++-common/goacc/routine-5.c
+++ gcc/testsuite/c-c++-common/goacc/routine-5.c
@@ -38,13 +38,26 @@ namespace g {}
 #pragma acc routine /* { dg-error "not followed by" "" { target c++ } } */
 using namespace g;
 
-#pragma acc routine (g) /* { dg-error "does not refer to" "" { target c++ } } 
*/
+#pragma acc routine (g) /* { dg-error "does not refer to a function" "" { 
target c++ } } */
 
-#endif
+#endif /* __cplusplus */
 
-#pragma acc routine (a) /* { dg-error "does not refer to" } */
+#pragma acc routine (a) /* { dg-error "does not refer to a function" } */
   
-#pragma acc routine (c) /* { dg-error "does not refer to" } */
+#pragma acc routine (c) /* { dg-error "does not refer to a function" } */
+
+
+#pragma acc routine () vector /* { dg-error "expected (function 
name|unqualified-id) before .\\). token" } */
+
+#pragma acc routine (+) /* { dg-error "expected (function name|unqualified-id) 
before .\\+. token" } */
+
+
+extern void R1(void);
+extern void R2(void);
+#pragma acc routine (R1, R2, R3) worker /* { dg-error "expected .\\). before 
.,. token" } */
+#pragma acc routine (R1 R2 R3) worker /* { dg-error "expected .\\). before 
.R2." } */
+#pragma acc routine (R1) worker
+#pragma acc routine (R2) worker
 
 
 void Bar ();
diff --git gcc/testsuite/g++.dg/goacc/routine-2.C 
gcc/testsuite/g++.dg/goacc/routine-2.C
index 2d16466..ea7c9bf 100644
--- gcc/testsuite/g++.dg/goacc/routine-2.C
+++ gcc/testsuite/g++.dg/goacc/routine-2.C
@@ -14,15 +14,9 @@ one()
 
 int incr (int);
 float incr (float);
-int inc;
 
 #pragma acc routine (incr) /* { dg-error "names a set of overloads" } */
 
-#pragma acc routine (increment) /* { dg-error "has not been declared" } */
-
-#pragma acc routine (inc) /* { dg-error "does not refer to a function" } */
-
-#pragma acc routine (+) /* { dg-error "expected unqualified-id before '.' token" } */
 
 int sum (int, int);
 
diff --git gcc/testsuite/gfortran.dg/goacc/routine-6.f90 gcc/testsuite/gfortran.dg/goacc/routine-6.f90
index 10951ee..10943cf 100644
--- gcc/testsuite/gfortran.dg/goacc/routine-6.f90
+++ gcc/testsuite/gfortran.dg/goacc/routine-6.f90
@@ -29,6 +29,13 @@ program main
   !$acc routine (subr1) ! { dg-error "invalid function name" }
   external :: subr2
   !$acc routine (subr2)
+
+  external :: R1, R2
+  !$acc routine (R1 R2 R3) ! { dg-error "Syntax error in \\!\\\$ACC ROUTINE \\( NAME \\) at \\(1\\), expecting .\\). after NAME" }
+  !$acc routine (R1, R2, R3) ! { dg-error "Syntax error in \\!\\\$ACC ROUTINE \\( NAME \\) at \\(1\\), expecting .\\). after NAME" }
+  !$acc routine (R1)
+  !$acc routine (R2)
+
   !$acc parallel
   !$acc loop
   do i = 1, n


Regards
 Thomas


[PATCH][ARM] PR target/69857 Remove bogus early return false; in gen_operands_ldrd_strd

2016-05-24 Thread Kyrill Tkachov

Hi all,

As the PR says, the gen_operands_ldrd_strd function has a spurious return false 
in it.
It seems to have been there from the beginning when that code was added.

The code is trying to transform:
mov r0, 0
str r0, [r2]
mov r0, 1
str r0, [r2, #4]
 into:
mov r0, 0
mov r1, 1
strd r0, r1, [r2]

but thanks to this return it only works on Thumb2.
This means that this code path was not getting tested, and we weren't generating
as many STRD instructions as we could when compiling with -marm.
This patch removes the spurious return. It also re-indents the comment for this
transformation (replacing 8 spaces with tab) and adds a mention of the behaviour
for ARM state that was missing.
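For illustration, C source of roughly this shape (a hypothetical reduction, not taken from the PR) is what produces the str/str pair that the peephole merges into a single strd:

```c
/* Hypothetical reduction: two stores of different constants through
   the same base pointer.  At -O2 for ARM these are the str/str pair
   that can now be merged into one strd in ARM state as well.  */
void
store_pair (int *p)
{
  p[0] = 0;   /* mov r0, #0; str r0, [r2]       */
  p[1] = 1;   /* mov r1, #1; str r1, [r2, #4]   */
}
```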

With it, bootstrap and testing in ARM state on arm-none-linux-gnueabihf run fine.

Across all of SPEC2006 this increased the number of STRD instructions used by 0.12% (from 121375 -> 121530).

Ok for trunk?

Thanks,
Kyrill

2016-05-24  Kyrylo Tkachov  

PR target/69857
* config/arm/arm.c (gen_operands_ldrd_strd): Remove bogus early
return.  Reindent transformation comment and mention the ARM state
behavior.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 089d1483ac534cfd7693fdd309149aa5e8bf8191..fe1c37cda62fa76ef9438208e39b3a91e7161972 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15985,14 +15985,17 @@ gen_operands_ldrd_strd (rtx *operands, bool load,
   /* If the same input register is used in both stores
  when storing different constants, try to find a free register.
  For example, the code
-mov r0, 0
-str r0, [r2]
-mov r0, 1
-str r0, [r2, #4]
+	mov r0, 0
+	str r0, [r2]
+	mov r0, 1
+	str r0, [r2, #4]
  can be transformed into
-mov r1, 0
-strd r1, r0, [r2]
- in Thumb mode assuming that r1 is free.  */
+	mov r1, 0
+	mov r0, 1
+	strd r1, r0, [r2]
+ in Thumb mode assuming that r1 is free.
+ For ARM mode do the same but only if the starting register
+ can be made to be even.  */
   if (const_store
   && REGNO (operands[0]) == REGNO (operands[1])
   && INTVAL (operands[4]) != INTVAL (operands[5]))
@@ -16011,7 +16014,6 @@ gen_operands_ldrd_strd (rtx *operands, bool load,
   }
 else if (TARGET_ARM)
   {
-return false;
 int regno = REGNO (operands[0]);
 if (!peep2_reg_dead_p (4, operands[0]))
   {


Re: Tighten syntax checking for OpenACC routine construct in C

2016-05-24 Thread Jakub Jelinek
On Tue, May 24, 2016 at 10:51:15AM +0200, Thomas Schwinge wrote:
> Hi!
> 
> OK for trunk?
> 
> commit 155feb878deedd09fd60e2322b1515de41595b13
> Author: Thomas Schwinge 
> Date:   Tue May 24 10:42:08 2016 +0200
> 
> Tighten syntax checking for OpenACC routine construct in C
> 
>   gcc/c/
>   * c-parser.c (c_parser_oacc_routine): Tighten syntax checks.
>   gcc/testsuite/
>   * c-c++-common/goacc/routine-5.c: Add tests.
>   * g++.dg/goacc/routine-2.C: Remove duplicate tests.
>   * gfortran.dg/goacc/routine-6.f90: Add tests.

Ok.

Jakub


Re: [Patch wwwdocs] Add aarch64-none-linux-gnu as a primary platform for GCC-7

2016-05-24 Thread Richard Biener
On Tue, May 24, 2016 at 12:20 AM, Gerald Pfeifer  wrote:
> On Mon, 23 May 2016, Richard Biener wrote:
>> So I propose to demote -freebsd to secondary and use
>> i686-unknown-freebsd (or x86_64-unknown-freebsd?).
>>
>> Gerald, Andreas, can you comment on both issues?  Esp. i386
>> is putting quite some burden on libstdc++ and atomics support
>> for example.
>
> As Jeff noted, i386 actually is the "marketing" name used for the
> platform, GCC has been defaulting to i486 for ages, and I upgraded
> to i586 last year:
>
> 2015-11-15  Gerald Pfeifer  
>
> * config/i386/freebsd.h (SUBTARGET32_DEFAULT_CPU): Change to i586.
> Remove support for FreeBSD 5 and earlier.
>
> And, yes, the system compiler on current versions of FreeBSD is
> LLVM (for most platforms including x86).  There is still a fair
> user base, though.
>
> Given the above, do you still see a desire to make this change?

Can we update to a non-marketing name then, like i586-unknown-freebsd please?
config.gcc accepts i[34567]86-*-freebsd*.  It at least confused me.

Richard.

> Gerald


Re: [PATCH] Fix PR tree-optimization/71170

2016-05-24 Thread Richard Biener
On Tue, May 24, 2016 at 5:13 AM, Kugan Vivekanandarajah
 wrote:
> On 23 May 2016 at 21:35, Richard Biener  wrote:
>> On Sat, May 21, 2016 at 8:08 AM, Kugan Vivekanandarajah
>>  wrote:
>>> On 20 May 2016 at 21:07, Richard Biener  wrote:
 On Fri, May 20, 2016 at 1:51 AM, Kugan Vivekanandarajah
  wrote:
> Hi Richard,
>
>> I think it should have the same rank as op or op + 1 which is the current
>> behavior.  Sth else doesn't work correctly here I think, like inserting 
>> the
>> multiplication not near the definition of op.
>>
>> Well, the whole "clever insertion" logic is simply flawed.
>
> What I meant to say was that the simple logic we have now wouldn't
> work. "clever logic" is knowing exactly where it is needed and
> inserting there.  I think that's what you are suggesting below in a
> simple to implement way.
>
>> I'd say that ideally we would delay inserting the multiplication to
>> rewrite_expr_tree time.  For example by adding a ops->stmt_to_insert
>> member.
>>
>
> Here is an implementation based on above. Bootstrap on x86-linux-gnu
> is OK. regression testing is ongoing.

 I like it.  Please push the insertion code to a helper as I think you need
 to post-pone setting the stmts UID to that point.

 Ideally we'd make use of the same machinery in attempt_builtin_powi,
 removing the special-casing of powi_result.  (same as I said that ideally
 the plus->mult stuff would use the repeat-ops machinery...)

 I'm not 100% convinced the place you insert the stmt is correct but I
 haven't spent too much time to decipher reassoc in this area.
>>>
>>>
>>> Hi Richard,
>>>
>>> Thanks. Here is a tested version of the patch. I did miss one place
>>> which I fixed now (tranform_stmt_to_copy) I also created a function to
>>> do the insertion.
>>>
>>>
>>> Bootstrap and regression testing on x86_64-linux-gnu are fine. Is this
>>> OK for trunk.
>>
>> @@ -3798,6 +3805,7 @@ rewrite_expr_tree (gimple *stmt, unsigned int opindex,
>>oe1 = ops[opindex];
>>oe2 = ops[opindex + 1];
>>
>> +
>>if (rhs1 != oe1->op || rhs2 != oe2->op)
>> {
>>   gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
>>
>> please remove this stray change.
>>
>> Ok with that change.
>
> Hi Richard,
>
> Thanks for the review. I also found another issue with this patch.
> I.e. for the stmt_to_insert we will get gimple_bb of NULL which is not
> expected in sort_by_operand_rank.  This showed up only while
> building a version of glibc.
>
> Bootstrap and regression testing are ongoing.Is this OK for trunk if
> passes regression and bootstrap.

Hmm, I'd rather fall thru to the SSA_NAME_VERSION or id comparison here
than to stmt_dominates_stmt which is only well-defined for stmts in the same BB.

So sth like

Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c  (revision 236630)
+++ gcc/tree-ssa-reassoc.c  (working copy)
@@ -519,6 +519,8 @@ sort_by_operand_rank (const void *pa, co
 See PR60418.  */
   if (!SSA_NAME_IS_DEFAULT_DEF (oea->op)
  && !SSA_NAME_IS_DEFAULT_DEF (oeb->op)
+ && !oea->stmt_to_insert
+ && !oeb->stmt_to_insert
  && SSA_NAME_VERSION (oeb->op) != SSA_NAME_VERSION (oea->op))
{
  gimple *stmta = SSA_NAME_DEF_STMT (oea->op);

ok with that change.

Richard.

> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2016-05-24  Kugan Vivekanandarajah  
>
> * tree-ssa-reassoc.c (sort_by_operand_rank): Check for gimple_bb of NULL
> for stmt_to_insert.
>
>
> gcc/testsuite/ChangeLog:
>
> 2016-05-24  Kugan Vivekanandarajah  
>
> * gcc.dg/tree-ssa/reassoc-44.c: New test.


Re: [PATCH][ARM] PR target/69857 Remove bogus early return false; in gen_operands_ldrd_strd

2016-05-24 Thread Ramana Radhakrishnan


On 24/05/16 09:52, Kyrill Tkachov wrote:
> Hi all,
> 
> As the PR says, the gen_operands_ldrd_strd function has a spurious return 
> false in it.
> It seems to have been there from the beginning when that code was added.
> 
> The code is trying to transform:
> mov r0, 0
> str r0, [r2]
> mov r0, 1
> str r0, [r2, #4]
>  into:
> mov r0, 0
> mov r1, 1
> strd r0, r1, [r2]
> 
> but thanks to this return it only works on Thumb2.
> This means that this code path was not getting tested, and we weren't generating
> as many STRD instructions as we could when compiling with -marm.
> This patch removes the spurious return. It also re-indents the comment for 
> this
> transformation (replacing 8 spaces with tab) and adds a mention of the 
> behaviour
> for ARM state that was missing.
> 
> With it, bootstrap and testing in ARM state on arm-none-linux-gnueabihf run fine.
> 
> Across all of SPEC2006 this increased the number of STRD instructions used by 0.12% (from 121375 -> 121530).
> 
> Ok for trunk?

Ok - oops ! 

Ramana
> 
> Thanks,
> Kyrill
> 
> 2016-05-24  Kyrylo Tkachov  
> 
> PR target/69857
> * config/arm/arm.c (gen_operands_ldrd_strd): Remove bogus early
> return.  Reindent transformation comment and mention the ARM state
> behavior.


Re: [ARM] Add support for overflow add, sub, and neg operations

2016-05-24 Thread Kyrill Tkachov

Hi Michael,

Sorry for the delay in reviewing. A few comments at the bottom.

On 29/03/16 00:19, Michael Collison wrote:

An updated patch that resolves the issues with thumb2 support and adds test 
cases as requested. Looking to check this into GCC 7 stage1 when it opens.

2016-02-24  Michael Collison  

* config/arm/arm-modes.def: Add new condition code mode CC_V
to represent the overflow bit.
* config/arm/arm.c (maybe_get_arm_condition_code):
Add support for CC_Vmode.
* config/arm/arm.md (addv4, add3_compareV,
addsi3_compareV_upper): New patterns to support signed
builtin overflow add operations.
(uaddv4, add3_compareC, addsi3_compareV_upper):
New patterns to support unsigned builtin add overflow operations.
(subv4, sub3_compare1): New patterns to support signed
builtin overflow subtract operations,
(usubv4): New patterns to support unsigned builtin subtract
overflow operations.
(negvsi3, negvdi3, negdi2_compare, negsi2_carryin_compare): New patterns
to support builtin overflow negate operations.
* gcc.target/arm/builtin_saddl.c: New testcase.
* gcc.target/arm/builtin_saddll.c: New testcase.
* gcc.target/arm/builtin_uaddl.c: New testcase.
* gcc.target/arm/builtin_uaddll.c: New testcase.
* gcc.target/arm/builtin_ssubl.c: New testcase.
* gcc.target/arm/builtin_ssubll.c: New testcase.
* gcc.target/arm/builtin_usubl.c: New testcase.
* gcc.target/arm/builtin_usubll.c: New testcase.

On 02/29/2016 04:13 AM, Kyrill Tkachov wrote:


On 26/02/16 10:32, Michael Collison wrote:



On 02/25/2016 02:51 AM, Kyrill Tkachov wrote:

Hi Michael,

On 24/02/16 23:02, Michael Collison wrote:

This patch adds support for builtin overflow of add, subtract and negate. This 
patch is targeted for gcc 7 stage 1. It was tested with no regressions in arm 
and thumb modes on the following targets:

arm-none-linux-gnueabi
arm-none-linux-gnueabihf
armeb-none-linux-gnueabihf
arm-none-eabi



I'll have a deeper look once we're closer to GCC 7 development.
I've got a few comments in the meantime.


2016-02-24 Michael Collison 

* config/arm/arm-modes.def: Add new condition code mode CC_V
to represent the overflow bit.
* config/arm/arm.c (maybe_get_arm_condition_code):
Add support for CC_Vmode.
* config/arm/arm.md (addv4, add3_compareV,
addsi3_compareV_upper): New patterns to support signed
builtin overflow add operations.
(uaddv4, add3_compareC, addsi3_compareV_upper):
New patterns to support unsigned builtin add overflow operations.
(subv4, sub3_compare1): New patterns to support signed
builtin overflow subtract operations,
(usubv4): New patterns to support unsigned builtin subtract
overflow operations.
(negvsi3, negvdi3, negdi2_compre, negsi2_carryin_compare): New patterns
to support builtin overflow negate operations.




Can you please summarise what sequences are generated for these operations, and 
how
they are better than the default fallback sequences.


Sure for a simple test case such as:

int
fn3 (int x, int y, int *ovf)
{
  int res;
  *ovf = __builtin_sadd_overflow (x, y, &res);
  return res;
}

Current trunk at -O2 generates

fn3:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
cmp r1, #0
mov r3, #0
add r1, r0, r1
blt .L4
cmp r1, r0
blt .L3
.L2:
str r3, [r2]
mov r0, r1
bx  lr
.L4:
cmp r1, r0
ble .L2
.L3:
mov r3, #1
b   .L2

With the overflow patch this now generates:

   adds    r0, r0, r1
   movvs   r3, #1
   movvc   r3, #0
   str r3, [r2]
   bx  lr



Thanks! That looks much better.


Also, we'd need tests for each of these overflow operations, since these are 
pretty complex
patterns that are being added.


The patterns are tested now most notably by tests in:

c-c++-common/torture/builtin-arith-overflow*.c

I had a few failures I resolved so the builtin overflow arithmetic functions 
are definitely being exercised.


Great, that gives me more confidence on the correctness aspects but...



Also, you may want to consider splitting this into a patch series, each adding 
a single
overflow operation, together with its tests. That way it will be easier to keep 
track of
which pattern applies to which use case and they can go in independently of 
each other.


Let me know if you still feel the same way given the existing test cases.



... I'd like us to still have scan-assembler tests. The torture tests exercise 
the correctness,
but we'd want tests to catch regressions where we stop generating the new 
patterns due to other
optimisation changes, which would lead to code quality regressions.
So I'd like us to have scan-assembler tests for these sequences to make sure we 
generate the right
instructions.

Thanks,

Re: [PATCH] Fix PR tree-optimization/71170

2016-05-24 Thread Jakub Jelinek
On Tue, May 24, 2016 at 06:46:49PM +1000, Kugan Vivekanandarajah wrote:
> 2016-05-24  Kugan Vivekanandarajah  
> 
> * tree-ssa-reassoc.c (sort_by_operand_rank): Check fgimple_bb for NULL.

s/fgimple/gimple/ ?

> --- a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-44.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-44.c
> @@ -0,0 +1,10 @@
> +
> +/* { dg-do compile } */

Why the empty line above?  Either stick there a PR number if one is filed,
or leave it out.

> +/* { dg-options "-O2" } */
> +
> +unsigned int a;
> +int b, c;
> +void fn1 ()
> +{
> +  b = a + c + c;
> +}
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index fb683ad..06f4d1b 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -525,7 +525,7 @@ sort_by_operand_rank (const void *pa, const void *pb)
> gimple *stmtb = SSA_NAME_DEF_STMT (oeb->op);
> basic_block bba = gimple_bb (stmta);
> basic_block bbb = gimple_bb (stmtb);
> -   if (bbb != bba)
> +   if (bba && bbb && bbb != bba)
>   {
> if (bb_rank[bbb->index] != bb_rank[bba->index])
>   return bb_rank[bbb->index] - bb_rank[bba->index];

Can bb_rank ever be the same for bbb != bba?  If yes, perhaps it would be
better to fallthrough into the reassoc_stmt_dominates_stmt_p testing
code, if not, perhaps just assert that it is different and just
return the difference unconditionally?

Jakub


Re: [PATCH] Improve TBAA with unions

2016-05-24 Thread Richard Biener
On Wed, 18 May 2016, Richard Biener wrote:

> 
> The following adjusts get_alias_set beahvior when applied to
> union accesses to use the union alias-set rather than alias-set
> zero.  This is in line with behavior from the alias oracle
> which (bogusly) circumvents alias-set zero by looking at
> the alias-sets of the base object.  Thus for
> 
> union U { int i; float f; };
> 
> float
> foo (union U *u, double *p)
> {
>   u->f = 1.;
>   *p = 0;
>   return u->f;
> }
> 
> the langhooks ensured u->f has alias-set zero and thus disambiguation
> against *p was not allowed.  Still the alias-oracle did the disambiguation
> by using the alias set of the union here (I think optimizing the
> return to return 1. is valid).
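(As a standalone aside, not part of the patch: the union punning that remains permitted — access directly through the union object — can be exercised like this; `bits_of` is an illustrative name.)

```c
#include <stdint.h>

/* GNU C permits type punning when the access goes directly through
   the union; taking a member's address and storing through that
   pointer is what is NOT covered.  */
union U { int32_t i; float f; };

int32_t
bits_of (float x)
{
  union U u;
  u.f = x;        /* store through one member ...             */
  return u.i;     /* ... read back through the other member.  */
}
```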
> 
> We have a good place in the middle-end to apply such rules which
> is component_uses_parent_alias_set_from - this is where I move
> the logic that is duplicated in various frontends.
> 
> The Java and Ada frontends do not allow union type punning (LTO does),
> so this patch may eventually pessimize them.  I don't care anything
> about Java but Ada folks might want to chime in.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> Ok for trunk?

Ping.

Thanks,
Richard.

> Thanks,
> Richard.
> 
> 2016-05-18  Richard Biener  
> 
>   * alias.c (component_uses_parent_alias_set_from): Handle
>   type punning through union accesses by using the union alias set.
>   * gimple.c (gimple_get_alias_set): Remove union type punning case.
> 
>   c-family/
>   * c-common.c (c_common_get_alias_set): Remove union type punning case.
>   
>   fortran/
>   * f95-lang.c (LANG_HOOKS_GET_ALIAS_SET): Remove (un-)define.
>   (gfc_get_alias_set): Remove.
> 
>   
> Index: trunk/gcc/alias.c
> ===
> *** trunk.orig/gcc/alias.c2016-05-18 11:15:41.744792403 +0200
> --- trunk/gcc/alias.c 2016-05-18 11:31:40.139709782 +0200
> *** component_uses_parent_alias_set_from (co
> *** 619,624 
> --- 619,632 
>   case COMPONENT_REF:
> if (DECL_NONADDRESSABLE_P (TREE_OPERAND (t, 1)))
>   found = t;
> +   /* Permit type-punning when accessing a union, provided the access
> +  is directly through the union.  For example, this code does not
> +  permit taking the address of a union member and then storing
> +  through it.  Even the type-punning allowed here is a GCC
> +  extension, albeit a common and useful one; the C standard says
> +  that such accesses have implementation-defined behavior.  */
> +   else if (TREE_CODE (TREE_TYPE (TREE_OPERAND (t, 0))) == UNION_TYPE)
> + found = t;
> break;
>   
>   case ARRAY_REF:
> Index: trunk/gcc/c-family/c-common.c
> ===
> *** trunk.orig/gcc/c-family/c-common.c2016-05-18 11:15:41.744792403 
> +0200
> --- trunk/gcc/c-family/c-common.c 2016-05-18 11:31:40.143709828 +0200
> *** static GTY(()) hash_table
> *** 4734,4741 
>   alias_set_type
>   c_common_get_alias_set (tree t)
>   {
> -   tree u;
> - 
> /* For VLAs, use the alias set of the element type rather than the
>default of alias set 0 for types compared structurally.  */
> if (TYPE_P (t) && TYPE_STRUCTURAL_EQUALITY_P (t))
> --- 4734,4739 
> *** c_common_get_alias_set (tree t)
> *** 4745,4763 
> return -1;
>   }
>   
> -   /* Permit type-punning when accessing a union, provided the access
> -  is directly through the union.  For example, this code does not
> -  permit taking the address of a union member and then storing
> -  through it.  Even the type-punning allowed here is a GCC
> -  extension, albeit a common and useful one; the C standard says
> -  that such accesses have implementation-defined behavior.  */
> -   for (u = t;
> -TREE_CODE (u) == COMPONENT_REF || TREE_CODE (u) == ARRAY_REF;
> -u = TREE_OPERAND (u, 0))
> - if (TREE_CODE (u) == COMPONENT_REF
> - && TREE_CODE (TREE_TYPE (TREE_OPERAND (u, 0))) == UNION_TYPE)
> -   return 0;
> - 
> /* That's all the expressions we handle specially.  */
> if (!TYPE_P (t))
>   return -1;
> --- 4743,4748 
> Index: trunk/gcc/fortran/f95-lang.c
> ===
> *** trunk.orig/gcc/fortran/f95-lang.c 2016-05-18 11:15:41.744792403 +0200
> --- trunk/gcc/fortran/f95-lang.c  2016-05-18 11:31:48.623806334 +0200
> *** static bool global_bindings_p (void);
> *** 74,80 
>   static bool gfc_init (void);
>   static void gfc_finish (void);
>   static void gfc_be_parse_file (void);
> - static alias_set_type gfc_get_alias_set (tree);
>   static void gfc_init_ts (void);
>   static tree gfc_builtin_function (tree);
>   
> --- 74,79 
> *** static const struct attribute_spec gfc_a
> *** 110,116 
>

Re: [PATCH AArch64]Support missing vcond pattern by adding/using vec_cmp/vcond_mask patterns.

2016-05-24 Thread Bin.Cheng
Ping.

Thanks,
bin

On Tue, May 17, 2016 at 10:02 AM, Bin Cheng  wrote:
> Hi,
> Alan and Renlin noticed that some vcond patterns are not supported in 
> AArch64(or AArch32?) backend, and they both had some patches fixing this.  
> After investigation, I agree with them that vcond/vcondu in AArch64's backend 
> should be re-implemented using vec_cmp/vcond_mask patterns, so here comes 
> this patch which is based on Alan's.  This patch supports all vcond/vcondu 
> patterns by implementing/using vec_cmp and vcond_mask patterns.  Different to 
> the original patch, it doesn't change GCC's expanding process, and it keeps 
> vcond patterns.  The patch also introduces vec_cmp*_internal to support 
> special case optimization for vcond/vcondu which current implementation does.
> Apart from Alan's patch, I also learned ideas from Renlin's, and it is my 
> change that shall be blamed if any potential bug is introduced.
>
> With this patch, GCC's test condition "vect_cond_mixed" can be enabled on 
> AArch64 (in a following patch).
> Bootstrap and test on AArch64.  Is it OK?  BTW, this patch is necessary for 
> gcc.dg/vect/PR56541.c (on AArch64) which was added before in tree 
> if-conversion patch.
>
> Thanks,
> bin
>
> 2016-05-11  Alan Lawrence  
> Renlin Li  
> Bin Cheng  
>
> * config/aarch64/iterators.md (V_cmp_mixed, v_cmp_mixed): New.
> * config/aarch64/aarch64-simd.md (v2di3): Call
> gen_vcondv2div2di instead of gen_aarch64_vcond_internalv2div2di.
> (aarch64_vcond_internal): Delete pattern.
> (aarch64_vcond_internal): Ditto.
> (vcond_mask_): New pattern.
> (vec_cmp_internal, vec_cmp): New pattern.
> (vec_cmp_internal): New pattern.
> (vec_cmp, vec_cmpu): New pattern.
> (vcond): Re-implement using vec_cmp and vcond_mask.
> (vcondu): Ditto.
> (vcond): Delete.
> (vcond): New pattern.
> (vcondu): New pattern.
> (aarch64_cmtst): Revise comment using aarch64_vcond instead
> of aarch64_vcond_internal.
>
> gcc/testsuite/ChangeLog
> 2016-05-11  Bin Cheng  
>
> * gcc.target/aarch64/vect-vcond.c: New test.
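A hypothetical source-level example (not from the patch's testsuite) of the kind of conditional select that the vectorizer expands through vec_cmp + vcond_mask:

```c
/* Conditional select over arrays: the comparison becomes a vec_cmp
   producing a mask, and the select becomes a vcond_mask when the
   loop is vectorized.  */
void
select_max (int *r, const int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    r[i] = a[i] > b[i] ? a[i] : b[i];
}
```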


Re: [PATCH 2/3] Add profiling support for IVOPTS

2016-05-24 Thread Bin.Cheng
On Thu, May 19, 2016 at 11:28 AM, Martin Liška  wrote:
> On 05/17/2016 12:27 AM, Bin.Cheng wrote:
>>> As profile-guided optimization can provide very useful information
>>> about basic block frequencies within a loop, the following patch set leverages
>>> that information. It speeds up a single benchmark from the upcoming SPECv6
>>> suite by 20% (-O2 -fprofile-generate/-fprofile-use) and I think it can
>>> also improve others (currently measuring numbers for PGO).
>> Hi,
>> Is this 20% improvement from this patch, or does it include the
>> existing PGO's improvement?
>
> Hello.
>
> It shows that current trunk (compared to GCC 6 branch)
> has significantly improved the benchmark with PGO.
> Currently, my patch improves PGO by ~5% w/ -O2, but our plan is to
> improve static profile that would utilize the patch.
>
>>
>> For the patch:
>>> +
>>> +  /* Return true if the frequency has a valid value.  */
>>> +  bool has_frequency ();
>>> +
>>>/* Return infinite comp_cost.  */
>>>static comp_cost get_infinite ();
>>>
>>> @@ -249,6 +272,9 @@ private:
>>>   complexity field should be larger for more
>>>   complex expressions and addressing modes).  */
>>>int m_scratch;  /* Scratch used during cost computation.  */
>>> +  sreal m_frequency;  /* Frequency of the basic block this comp_cost
>>> + belongs to.  */
>>> +  sreal m_cost_scaled;  /* Scalled runtime cost.  */
>> IMHO we shouldn't embed frequency in comp_cost, neither record scaled
>> cost in it.  I would suggest we compute cost and amortize the cost
>> over frequency in get_computation_cost_at before storing it into
>> comp_cost.  That is, once cost is computed/stored in comp_cost, it is
>> already scaled with frequency.  One argument is frequency info is only
>> valid for use's statement/basic_block, it really doesn't have clear
>> meaning in comp_cost structure.  Outside of function
>> get_computation_cost_at, I found it's hard to understand/remember
>> what's the meaning of comp_cost.m_frequency and where it came from.
>> There are other reasons embedded in below comments.
>>>
>>>
>>>  comp_cost&
>>> @@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
>>>m_cost = other.m_cost;
>>>m_complexity = other.m_complexity;
>>>m_scratch = other.m_scratch;
>>> +  m_frequency = other.m_frequency;
>>> +  m_cost_scaled = other.m_cost_scaled;
>>>
>>>return *this;
>>>  }
>>> @@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
>>>
>>>cost1.m_cost += cost2.m_cost;
>>>cost1.m_complexity += cost2.m_complexity;
>>> +  cost1.m_cost_scaled += cost2.m_cost_scaled;
>>>
>>>return cost1;
>>>  }
>>> @@ -290,6 +319,8 @@ comp_cost
>>>  comp_cost::operator+= (HOST_WIDE_INT c)
>> This and below operators need check for infinite cost first and return
>> immediately.
>>>  {
>>>this->m_cost += c;
>>> +  if (has_frequency ())
>>> +this->m_cost_scaled += scale_cost (c);
>>>
>>>return *this;
>>>  }
>>> @@ -5047,18 +5128,21 @@ get_computation_cost_at (struct ivopts_data *data,
>>>   (symbol/var1/const parts may be omitted).  If we are looking for an
>>>   address, find the cost of addressing this.  */
>>>if (address_p)
>>> -return cost + get_address_cost (symbol_present, var_present,
>>> -offset, ratio, cstepi,
>>> -mem_mode,
>>> -TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> -speed, stmt_is_after_inc, can_autoinc);
>>> +{
>>> +  cost += get_address_cost (symbol_present, var_present,
>>> + offset, ratio, cstepi,
>>> + mem_mode,
>>> + TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> + speed, stmt_is_after_inc, can_autoinc);
>>> +  goto ret;
>>> +}
>>>
>>>/* Otherwise estimate the costs for computing the expression.  */
>>>if (!symbol_present && !var_present && !offset)
>>>  {
>>>if (ratio != 1)
>>>   cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
>>> -  return cost;
>>> +  goto ret;
>>>  }
>>>
>>>/* Symbol + offset should be compile-time computable so consider that 
>>> they
>>> @@ -5077,7 +5161,8 @@ get_computation_cost_at (struct ivopts_data *data,
>>>aratio = ratio > 0 ? ratio : -ratio;
>>>if (aratio != 1)
>>>  cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
>>> -  return cost;
>>> +
>>> +  goto ret;
>>>
>>>  fallback:
>>>if (can_autoinc)
>>> @@ -5093,8 +5178,13 @@ fallback:
>>>  if (address_p)
>>>comp = build_simple_mem_ref (comp);
>>>
>>> -return comp_cost (computation_cost (comp, speed), 0);
>>> +cost = comp_cost (computation_cost (comp, speed), 0);
>>>}
>>> +
>>> +ret:
>>> +  cost.calculate_scaled_cost (at->bb->frequency,
>>> +  data->current_loop->header->frequency);
>> Here cost consists of two parts.  One is for loop invariant
>> computation, we amortize is against avg_loop_niter and record register
>> pressure (either via invriant variables or invariant expressions) for
>> it;  the other is loop variant part.  For the first part, we should
>> not scaled it using
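In rough terms, the frequency-based amortization under discussion amounts to something like the following sketch (an assumption about the intent, not the actual GCC code; `scale_cost` is an illustrative name):

```c
/* Amortize a statement's cost by the ratio of its basic block's
   frequency to the loop header's frequency, so costs in rarely
   executed blocks weigh less.  */
double
scale_cost (int cost, int bb_freq, int header_freq)
{
  if (header_freq == 0)   /* no usable profile: leave the cost as-is */
    return (double) cost;
  return (double) cost * bb_freq / header_freq;
}
```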

Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Jan Hubicka
> > -  if (stride)
> > +  if (stride && akind >= GFC_ARRAY_ALLOCATABLE)
> >  rtype = build_range_type (gfc_array_index_type, gfc_index_zero_node,
> >   int_const_binop (MINUS_EXPR, stride,
> >build_int_cst (TREE_TYPE 
> > (stride), 1)));
> > 
> > It does not seem to make sense to build range types for arrays where the
> > permitted value range is often above the upper bound.
> 
> Well, the ME explicitely allows domains with NULL TYPE_MAX_VALUE for this.
> In the above case TYPE_MIN_VALUE is zero so you can omit the domain but
> I believe that usually the FE communicates a lower bound of one to the ME.

I will give passing NULL here a try; makes sense to me.  It seems the FE always
compensates by hand and all arrays in the array descriptor start at 0.
> >if (DECL_P (ref)
> > +  /* Be sure the size of MEM_REF target match.  For example:
> > +
> > +  char buf[10];
> > +  struct foo *str = (struct foo *)&buf;
> > +
> > +  str->trailin_array[2] = 1;
> > +
> > +is valid because BUF allocate enough space.  */
> > +
> > +  && (!size || operand_equal_p (DECL_SIZE (ref), size, 0))
> 
> But it's still an array at struct end.  So I don't see how you
> can validly claim it is not.

It is because the predicate is a bit misnamed.  It really checks whether the
array is possibly a trailing array, i.e. we cannot rely on all accesses being
within the specified domain.  The test I updated that looks for a DECL simply
assumes that declarations cannot be accessed past their end.
It would make more sense to use the object size machinery here somehow
(i.e. even in Fortran we have accesses to malloc'ed buffers of constant size).
But this could probably be better handled on the niter side, where we can also
deal with the case of real trailing arrays of known size.

Honza
> 
> Richard.
> 
> >&& !(flag_unconstrained_commons
> >&& TREE_CODE (ref) == VAR_DECL && DECL_COMMON (ref)))
> >  return false;
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


[PATCH] Fix PR71254

2016-05-24 Thread Richard Biener

The following testsuite regression on the gcc 5 branch exposes a latent
issue on aarch64.  Fixed by compiling the testcase only on the arch
the fix was backported for.

Tested w/ aarch64 cross and x86_64, committed to branch.

Richard.

2016-05-24  Richard Biener  

PR testsuite/71254
* gcc.dg/simd-7.c: Compile on x86_64 and i?86 only.

Index: gcc/testsuite/gcc.dg/simd-7.c
===
*** gcc/testsuite/gcc.dg/simd-7.c   (revision 236632)
--- gcc/testsuite/gcc.dg/simd-7.c   (working copy)
***
*** 1,4 
! /* { dg-do compile } */
  /* { dg-options "-w -Wno-psabi" } */
  
  #if __SIZEOF_LONG_DOUBLE__ == 16 || __SIZEOF_LONG_DOUBLE__ == 8
--- 1,4 
! /* { dg-do compile { target x86_64-*-* i?86-*-* } } */
  /* { dg-options "-w -Wno-psabi" } */
  
  #if __SIZEOF_LONG_DOUBLE__ == 16 || __SIZEOF_LONG_DOUBLE__ == 8


Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Richard Biener
On Tue, 24 May 2016, Jan Hubicka wrote:

> > > -  if (stride)
> > > +  if (stride && akind >= GFC_ARRAY_ALLOCATABLE)
> > >  rtype = build_range_type (gfc_array_index_type, gfc_index_zero_node,
> > >   int_const_binop (MINUS_EXPR, stride,
> > >build_int_cst (TREE_TYPE 
> > > (stride), 1)));
> > > 
> > > It does not seem to make sense to build range types for arrays where the
> > > permitted value range is often above the upper bound.
> > 
> > Well, the ME explicitely allows domains with NULL TYPE_MAX_VALUE for this.
> > In the above case TYPE_MIN_VALUE is zero so you can omit the domain but
> > I believe that usually the FE communicates a lower bound of one to the ME.
> 
> I will give it a try to pass NULL here. Makes sense to me.  It seems the FE 
> always 
> compensates by hand and all arrays in array descriptor starts by 0.
> > >if (DECL_P (ref)
> > > +  /* Be sure the size of MEM_REF target match.  For example:
> > > +
> > > +char buf[10];
> > > +struct foo *str = (struct foo *)&buf;
> > > +
> > > +str->trailin_array[2] = 1;
> > > +
> > > +  is valid because BUF allocate enough space.  */
> > > +
> > > +  && (!size || operand_equal_p (DECL_SIZE (ref), size, 0))
> > 
> > But it's still an array at struct end.  So I don't see how you
> > can validly claim it is not.
> 
> It is because the predicate is a bit misnamed. It really checks whether the
> array is possibly a trailing array, i.e. we cannot rely on the fact that
> all accesses are within the specified domain.

Ah, yes.  Now I see.

>  The test I updated that looks for DECL simply assumes
> that declarations cannot be accessed past their end.
> It would make more sense to use the object size machinery here somehow
> (i.e. even in Fortran we have accesses to malloc'ed buffers of constant
> size).
> But this probably could be better handled on the niter side where we can
> also deal with the case of real trailing arrays of known size.

But then I'm not sure that TYPE_SIZE (TREE_TYPE (ref)) == NULL is
handled correctly.  I suppose you can hope for the array to be the
one forcing it NULL and thus its TYPE_DOMAIN max val being NULL ...

Richard.


[PR71252][PATCH] ICE: verify_ssa failed

2016-05-24 Thread Kugan Vivekanandarajah
Hi,

In build_and_add_sum, a new stmt is created and inserted (which is the
actual use stmt). Therefore stmt_to_insert has to be inserted after
this is created. This patch moves it after.

I don’t know how I can reduce the Fortran test-case, so I am adding the
test-case from the bug report. Any help in reducing the test-case is
appreciated.

Regression testing on x86_64-linux-gnu and bootstrap didn’t find any new issues.

Is this OK for trunk?

Thanks,
Kugan

gcc/testsuite/ChangeLog:

2016-05-24  Kugan Vivekanandarajah  

* gfortran.dg/pr71252.f90: New test.

gcc/ChangeLog:

2016-05-24  Kugan Vivekanandarajah  

* tree-ssa-reassoc.c (rewrite_expr_tree_parallel): Add stmt_to_insert after
build_and_add_sum creates new use stmt.
diff --git a/gcc/testsuite/gfortran.dg/pr71252.f90 b/gcc/testsuite/gfortran.dg/pr71252.f90
index e69de29..dae210b 100644
--- a/gcc/testsuite/gfortran.dg/pr71252.f90
+++ b/gcc/testsuite/gfortran.dg/pr71252.f90
@@ -0,0 +1,88 @@
+
+! { dg-do compile }
+! { dg-options "-O1 -ffast-math" }
+
+MODULE xc_b97
+  INTEGER, PARAMETER :: dp=8
+  PRIVATE
+  PUBLIC :: b97_lda_info, b97_lsd_info, b97_lda_eval, b97_lsd_eval
+CONTAINS
+  SUBROUTINE b97_lsd_eval(rho_set,deriv_set,grad_deriv,b97_params)
+INTEGER, INTENT(in)  :: grad_deriv
+INTEGER  :: handle, npoints, param, stat
+LOGICAL  :: failure
+REAL(kind=dp):: epsilon_drho, epsilon_rho, &
+scale_c, scale_x
+REAL(kind=dp), DIMENSION(:, :, :), POINTER :: dummy, e_0, e_ndra, &
+  e_ndra_ndra, e_ndra_ndrb, e_ndra_ra, e_ndra_rb, e_ndrb, e_ndrb_ndrb, &
+  e_ndrb_ra, e_ndrb_rb, e_ra, e_ra_ra, e_ra_rb, e_rb, e_rb_rb, &
+  norm_drhoa, norm_drhob, rhoa, rhob
+IF (.NOT. failure) THEN
+   CALL b97_lsd_calc(&
+rhoa=rhoa, rhob=rhob, norm_drhoa=norm_drhoa,&
+norm_drhob=norm_drhob, e_0=e_0, &
+e_ra=e_ra, e_rb=e_rb, &
+e_ndra=e_ndra, e_ndrb=e_ndrb, &
+e_ra_ra=e_ra_ra, e_ra_rb=e_ra_rb, e_rb_rb=e_rb_rb,&
+e_ra_ndra=e_ndra_ra, e_ra_ndrb=e_ndrb_ra, &
+e_rb_ndrb=e_ndrb_rb, e_rb_ndra=e_ndra_rb,&
+e_ndra_ndra=e_ndra_ndra, e_ndrb_ndrb=e_ndrb_ndrb,&
+e_ndra_ndrb=e_ndra_ndrb,&
+grad_deriv=grad_deriv, npoints=npoints, &
+epsilon_rho=epsilon_rho,epsilon_drho=epsilon_drho,&
+param=param,scale_c_in=scale_c,scale_x_in=scale_x)
+END IF
+  END SUBROUTINE b97_lsd_eval
+  SUBROUTINE b97_lsd_calc(rhoa, rhob, norm_drhoa, norm_drhob,&
+   e_0, e_ra, e_rb, e_ndra, e_ndrb, &
+   e_ra_ndra,e_ra_ndrb, e_rb_ndra, e_rb_ndrb,&
+   e_ndra_ndra, e_ndrb_ndrb, e_ndra_ndrb, &
+   e_ra_ra, e_ra_rb, e_rb_rb,&
+   grad_deriv,npoints,epsilon_rho,epsilon_drho, &
+   param, scale_c_in, scale_x_in)
+REAL(kind=dp), DIMENSION(*), INTENT(in)  :: rhoa, rhob, norm_drhoa, &
+norm_drhob
+REAL(kind=dp), DIMENSION(*), INTENT(inout) :: e_0, e_ra, e_rb, e_ndra, &
+  e_ndrb, e_ra_ndra, e_ra_ndrb, e_rb_ndra, e_rb_ndrb, e_ndra_ndra, &
+  e_ndrb_ndrb, e_ndra_ndrb, e_ra_ra, e_ra_rb, e_rb_rb
+INTEGER, INTENT(in)  :: grad_deriv, npoints
+REAL(kind=dp), INTENT(in):: epsilon_rho, epsilon_drho
+INTEGER, INTENT(in)  :: param
+REAL(kind=dp), INTENT(in):: scale_c_in, scale_x_in
+REAL(kind=dp) :: A_1, A_2, A_3, alpha_1_1, alpha_1_2, alpha_1_3, alpha_c, &
+  t133, t134, t1341, t1348, t1351, t1360, t1368, t138, t1388, t139, &
+  u_x_bnorm_drhobnorm_drhob, u_x_brhob, u_x_brhobnorm_drhob, u_x_brhobrhob
+SELECT CASE(grad_deriv)
+CASE default
+   DO ii=1,npoints
+  IF (rho>epsilon_rho) THEN
+ IF (grad_deriv/=0) THEN
+IF (grad_deriv>1 .OR. grad_deriv<-1) THEN
+   alpha_c1rhob = alpha_crhob
+   f1rhob = frhob
+   t1360 = -0.4e1_dp * t105 * t290 * chirhobrhob + (-0.2e1_dp * t239 &
+* t257 + t709 * t1236 * t711 * t62 / 0.2e1_dp - e_c_u_0rhobrhob) * f&
+* t108 + t438 * f1rhob * t108 + 0.4e1_dp * t439 * t443 + t1341 * &
+0.4e1_dp * t1348 * t443 + 0.4e1_dp * t1351 * t443 + 0.12e2_dp * t113&
+* t107 * t1299 + 0.4e1_dp * t113 * t289 * chirhobrhob
+   IF (grad_deriv>1 .OR. grad_deriv==-2) THEN
+   exc_rhob_rhob = scale_x * (-t4 * t6 / t1152 * gx_b / &
+0.6e1_dp + e_lsda_x_brhob * (u_x_b1rhob * t31 + u_x_b * u_x_b1rhob *&
+u_x_brhobrhob * c_x_2)) + scale_c * (((e_c_u_0rhobrhob + (0.2e1_dp *&
+t726 * t1270 * t278 - t266 * (-t731 * t1205 / 0.4e1_dp + t267 * &
+t1205 * t647) * t278

Re: [PR71252][PATCH] ICE: verify_ssa failed

2016-05-24 Thread Richard Biener
On Tue, May 24, 2016 at 12:38 PM, Kugan Vivekanandarajah
 wrote:
> Hi,
>
> In build_and_add_sum, new stmt is created and inserted (which is the
> actual use stmt). Therefore stmt_to_insert has to be inserted after
> this is created. This patch moves it after.
>
> I don’t know how I can reduce the Fortran test-case so adding the
> test-case from bug report. Any help in reducing the test-case is
> appreciated.
>
> Regression testing on x86_64-linux-gnu and bootstrap didn’t find any new 
> issues.
>
> Is this OK for trunk?

Ok.

Richard.

> Thanks,
> Kugan
>
> gcc/testsuite/ChangeLog:
>
> 2016-05-24  Kugan Vivekanandarajah  
>
> * gfortran.dg/pr71252.f90: New test.
>
> gcc/ChangeLog:
>
> 2016-05-24  Kugan Vivekanandarajah  
>
> * tree-ssa-reassoc.c (rewrite_expr_tree_parallel): Add stmt_to_insert 
> after
> build_and_add_sum creates new use stmt.


[PATCH] Fix PR71253

2016-05-24 Thread Richard Biener

I am currently testing the following patch that makes the control 
dependences data structures survive edge redirection when the
pass knows it doesn't alter control dependences.

It basically replaces the edge list with a vector of BB indices
(so even removing BBs and still querying the control dependences
is possible if you handle NULL edge src/dests).

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-05-24  Richard Biener  

PR tree-optimization/71253
* cfganal.h (control_dependences): Make robust against edge
and BB removal.
(control_dependences::control_dependences): Remove edge_list argument.
(control_dependences::get_edge): Remove.
(control_dependences::get_edge_src): Add.
(control_dependences::get_edge_dest): Likewise.
(control_dependences::m_el): Make a vector of edge src/dest index.
* cfganal.c (control_dependences::find_control_dependence): Adjust.
(control_dependences::control_dependences): Likewise.
(control_dependences::~control_dependences): Likewise.
(control_dependences::get_edge): Remove.
(control_dependences::get_edge_src): Add.
(control_dependences::get_edge_dest): Likewise.
* tree-ssa-dce.c (mark_control_dependent_edges_necessary): Use
get_edge_src.
(perform_tree_ssa_dce): Adjust.
* tree-loop-distribution.c (create_edge_for_control_dependence): Use
get_edge_src.
(pass_loop_distribution::execute): Adjust.  Do loop destroying
conditional on changed.

* gcc.dg/torture/pr71253.c: New testcase.

Index: gcc/cfganal.c
===
*** gcc/cfganal.c   (revision 236630)
--- gcc/cfganal.c   (working copy)
*** control_dependences::find_control_depend
*** 408,450 
basic_block current_block;
basic_block ending_block;
  
!   gcc_assert (INDEX_EDGE_PRED_BB (m_el, edge_index)
! != EXIT_BLOCK_PTR_FOR_FN (cfun));
  
!   if (INDEX_EDGE_PRED_BB (m_el, edge_index) == ENTRY_BLOCK_PTR_FOR_FN (cfun))
  ending_block = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
else
! ending_block = find_pdom (INDEX_EDGE_PRED_BB (m_el, edge_index));
  
!   for (current_block = INDEX_EDGE_SUCC_BB (m_el, edge_index);
 current_block != ending_block
 && current_block != EXIT_BLOCK_PTR_FOR_FN (cfun);
 current_block = find_pdom (current_block))
! {
!   edge e = INDEX_EDGE (m_el, edge_index);
! 
!   /* For abnormal edges, we don't make current_block control
!dependent because instructions that throw are always necessary
!anyway.  */
!   if (e->flags & EDGE_ABNORMAL)
!   continue;
! 
!   set_control_dependence_map_bit (current_block, edge_index);
! }
  }
  
  /* Record all blocks' control dependences on all edges in the edge
 list EL, ala Morgan, Section 3.6.  */
  
! control_dependences::control_dependences (struct edge_list *edges)
!   : m_el (edges)
  {
timevar_push (TV_CONTROL_DEPENDENCES);
control_dependence_map.create (last_basic_block_for_fn (cfun));
for (int i = 0; i < last_basic_block_for_fn (cfun); ++i)
  control_dependence_map.quick_push (BITMAP_ALLOC (NULL));
!   for (int i = 0; i < NUM_EDGES (m_el); ++i)
  find_control_dependence (i);
timevar_pop (TV_CONTROL_DEPENDENCES);
  }
  
--- 408,461 
basic_block current_block;
basic_block ending_block;
  
!   gcc_assert (get_edge_src (edge_index) != EXIT_BLOCK_PTR_FOR_FN (cfun));
  
!   /* For abnormal edges, we don't make current_block control
!  dependent because instructions that throw are always necessary
!  anyway.  */
!   edge e = find_edge (get_edge_src (edge_index), get_edge_dest (edge_index));
!   if (e->flags & EDGE_ABNORMAL)
! return;
! 
!   if (get_edge_src (edge_index) == ENTRY_BLOCK_PTR_FOR_FN (cfun))
  ending_block = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
else
! ending_block = find_pdom (get_edge_src (edge_index));
  
!   for (current_block = get_edge_dest (edge_index);
 current_block != ending_block
 && current_block != EXIT_BLOCK_PTR_FOR_FN (cfun);
 current_block = find_pdom (current_block))
! set_control_dependence_map_bit (current_block, edge_index);
  }
  
  /* Record all blocks' control dependences on all edges in the edge
 list EL, ala Morgan, Section 3.6.  */
  
! control_dependences::control_dependences ()
  {
timevar_push (TV_CONTROL_DEPENDENCES);
+ 
+   /* Initialize the edge list.  */
+   int num_edges = 0;
+   basic_block bb;
+   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun),
+ EXIT_BLOCK_PTR_FOR_FN (cfun), next_bb)
+ num_edges += EDGE_COUNT (bb->succs);
+   m_el.create (num_edges);
+   edge e;
+   edge_iterator ei;
+   FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun),
+ EXIT_BLOCK_PTR_FOR_FN (cfun), next_bb)
+ FOR_EACH_EDGE (e, ei

Re: [PR71252][PATCH] ICE: verify_ssa failed

2016-05-24 Thread Jakub Jelinek
On Tue, May 24, 2016 at 08:38:29PM +1000, Kugan Vivekanandarajah wrote:
> Hi,
> 
> In build_and_add_sum, new stmt is created and inserted (which is the
> actual use stmt). Therefore stmt_to_insert has to be inserted after
> this is created. This patch moves it after.
> 
> I don’t know how I can reduce the Fortran test-case so adding the
> test-case from bug report. Any help in reducing the test-case is
> appreciated.
> 
> Regression testing on x86_64-linux-gnu and bootstrap didn’t find any new 
> issues.
> 
> Is this OK for trunk?
> 
> Thanks,
> Kugan
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-05-24  Kugan Vivekanandarajah  
> 
> * gfortran.dg/pr71252.f90: New test.
> 
> gcc/ChangeLog:
> 
> 2016-05-24  Kugan Vivekanandarajah  
> 

Please add
PR tree-optimization/71252
to both ChangeLog entries and
! PR tree-optimization/71252
to the first line in the testcase.

> * tree-ssa-reassoc.c (rewrite_expr_tree_parallel): Add stmt_to_insert
> after build_and_add_sum creates new use stmt.

> diff --git a/gcc/testsuite/gfortran.dg/pr71252.f90 b/gcc/testsuite/gfortran.dg/pr71252.f90
> index e69de29..dae210b 100644
> --- a/gcc/testsuite/gfortran.dg/pr71252.f90
> +++ b/gcc/testsuite/gfortran.dg/pr71252.f90
> @@ -0,0 +1,88 @@
> +
> +! { dg-do compile }
> +! { dg-options "-O1 -ffast-math" }
> +
> +MODULE xc_b97
> +  INTEGER, PARAMETER :: dp=8
> +  PRIVATE
> +  PUBLIC :: b97_lda_info, b97_lsd_info, b97_lda_eval, b97_lsd_eval
> +CONTAINS
> +  SUBROUTINE b97_lsd_eval(rho_set,deriv_set,grad_deriv,b97_params)
> +INTEGER, INTENT(in)  :: grad_deriv
> +INTEGER  :: handle, npoints, param, stat
> +LOGICAL  :: failure
> +REAL(kind=dp):: epsilon_drho, epsilon_rho, &
> +scale_c, scale_x
> +REAL(kind=dp), DIMENSION(:, :, :), POINTER :: dummy, e_0, e_ndra, &
> +  e_ndra_ndra, e_ndra_ndrb, e_ndra_ra, e_ndra_rb, e_ndrb, e_ndrb_ndrb, &
> +  e_ndrb_ra, e_ndrb_rb, e_ra, e_ra_ra, e_ra_rb, e_rb, e_rb_rb, &
> +  norm_drhoa, norm_drhob, rhoa, rhob
> +IF (.NOT. failure) THEN
> +   CALL b97_lsd_calc(&
> +rhoa=rhoa, rhob=rhob, norm_drhoa=norm_drhoa,&
> +norm_drhob=norm_drhob, e_0=e_0, &
> +e_ra=e_ra, e_rb=e_rb, &
> +e_ndra=e_ndra, e_ndrb=e_ndrb, &
> +e_ra_ra=e_ra_ra, e_ra_rb=e_ra_rb, e_rb_rb=e_rb_rb,&
> +e_ra_ndra=e_ndra_ra, e_ra_ndrb=e_ndrb_ra, &
> +e_rb_ndrb=e_ndrb_rb, e_rb_ndra=e_ndra_rb,&
> +e_ndra_ndra=e_ndra_ndra, e_ndrb_ndrb=e_ndrb_ndrb,&
> +e_ndra_ndrb=e_ndra_ndrb,&
> +grad_deriv=grad_deriv, npoints=npoints, &
> +epsilon_rho=epsilon_rho,epsilon_drho=epsilon_drho,&
> +param=param,scale_c_in=scale_c,scale_x_in=scale_x)
> +END IF
> +  END SUBROUTINE b97_lsd_eval
> +  SUBROUTINE b97_lsd_calc(rhoa, rhob, norm_drhoa, norm_drhob,&
> +   e_0, e_ra, e_rb, e_ndra, e_ndrb, &
> +   e_ra_ndra,e_ra_ndrb, e_rb_ndra, e_rb_ndrb,&
> +   e_ndra_ndra, e_ndrb_ndrb, e_ndra_ndrb, &
> +   e_ra_ra, e_ra_rb, e_rb_rb,&
> +   grad_deriv,npoints,epsilon_rho,epsilon_drho, &
> +   param, scale_c_in, scale_x_in)
> +REAL(kind=dp), DIMENSION(*), INTENT(in)  :: rhoa, rhob, norm_drhoa, &
> +norm_drhob
> +REAL(kind=dp), DIMENSION(*), INTENT(inout) :: e_0, e_ra, e_rb, e_ndra, &
> +  e_ndrb, e_ra_ndra, e_ra_ndrb, e_rb_ndra, e_rb_ndrb, e_ndra_ndra, &
> +  e_ndrb_ndrb, e_ndra_ndrb, e_ra_ra, e_ra_rb, e_rb_rb
> +INTEGER, INTENT(in)  :: grad_deriv, npoints
> +REAL(kind=dp), INTENT(in):: epsilon_rho, epsilon_drho
> +INTEGER, INTENT(in)  :: param
> +REAL(kind=dp), INTENT(in):: scale_c_in, scale_x_in
> +REAL(kind=dp) :: A_1, A_2, A_3, alpha_1_1, alpha_1_2, alpha_1_3, alpha_c, &
> +  t133, t134, t1341, t1348, t1351, t1360, t1368, t138, t1388, t139, &
> +  u_x_bnorm_drhobnorm_drhob, u_x_brhob, u_x_brhobnorm_drhob, u_x_brhobrhob
> +SELECT CASE(grad_deriv)
> +CASE default
> +   DO ii=1,npoints
> +  IF (rho>epsilon_rho) THEN
> + IF (grad_deriv/=0) THEN
> +IF (grad_deriv>1 .OR. grad_deriv<-1) THEN
> +   alpha_c1rhob = alpha_crhob
> +   f1rhob = frhob
> +   t1360 = -0.4e1_dp * t105 * t290 * chirhobrhob + (-0.2e1_dp * t239 &
> +* t257 + t709 * t1236 * t711 * t62 / 0.2e1_dp - e_c_u_0rhobrhob) * f&
> +* t108 + t438 * f1rhob * t108 + 0.4e1_dp * t439 * t443 + t1341 * &
> +0.4e1_dp * t1348 * t443 + 0.4e1_dp * t1351 * t443 + 0.12e2_dp * t113&
> +* t107 * t1299 + 0.4e1_dp * t113 * t289 * chirhobrhob
> +   IF (grad_deriv>1 .OR. grad

Re: [PATCH v2] Ensure source_date_epoch is always initialised

2016-05-24 Thread Dhole
Hey!

I'm the original author of the SOURCE_DATE_EPOCH patch.

I've just seen this.  I believe that this bug was fixed in the
rework of the patch I sent some days ago [1], although the latest
version of that patch has not been reviewed yet.  In [1] the
initialization of source_date_epoch is done at init.c
(cpp_create_reader), so now it should be initialized properly even when
just calling the preprocessor.  I tested your example and it gives the
expected output.

Although thinking further, maybe it would be more wise to use "0" as a
default value, to mean "not yet set", so that errors like this are
avoided.  So source_date_epoch could be:
0: not yet set
negative: disabled
positive: use this value as SOURCE_DATE_EPOCH

In such case, SOURCE_DATE_EPOCH would need to be a positive number
instead of a non-negative number.

In my latest patch it's:
-2: not yet set
-1: disabled
non-negative: use this value as SOURCE_DATE_EPOCH


[1] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01026.html

Cheers,
-- 
Dhole




Re: [PATCH v2] Ensure source_date_epoch is always initialised

2016-05-24 Thread James Clarke
Hi,
> On 24 May 2016, at 11:59, Dhole  wrote:
> 
> Hey!
> 
> I'm the original author of the SOURCE_DATE_EPOCH patch.
> 
> I've just seen this.  I believe that this bug was fixed in the the
> rework of the patch I sent some days ago [1], although the latest
> version of that patch has not been reviewed yet.  In [1] the
> initialization of source_date_epoch is done at init.c
> (cpp_create_reader), so now it should be initialized properly even when
> just calling the preprocessor.  I tested your example and it gives the
> expected output.
> 
> Although thinking further, maybe it would be more wise to use "0" as a
> default value, to mean "not yet set", so that errors like this are
> avoided.  So source_date_epoch could be:
> 0: not yet set
> negative: disabled
> positive: use this value as SOURCE_DATE_EPOCH
> 
> In such case, SOURCE_DATE_EPOCH would need to be a positive number
> instead of a non-negative number.

0 *is* a valid SOURCE_DATE_EPOCH, i.e. Jan  1 1970 00:00:00, and should
definitely be allowed.

I see your patch continues to put some of the code inside c-family? Is
there a reason for doing that instead of keeping it all inside libcpp
like mine, given it’s inherently preprocessor-only? You’ve also merged
all the error paths into one message which is not as helpful.

Regards,
James





Re: RFC [1/2] divmod transform

2016-05-24 Thread Prathamesh Kulkarni
On 23 May 2016 at 17:35, Richard Biener  wrote:
> On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
>  wrote:
>> Hi,
>> I have updated my patch for divmod (attached), which was originally
>> based on Kugan's patch.
>> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> having same operands to divmod representation, so we can cse computation of 
>> mod.
>>
>> t1 = a TRUNC_DIV_EXPR b;
>> t2 = a TRUNC_MOD_EXPR b
>> is transformed to:
>> complex_tmp = DIVMOD (a, b);
>> t1 = REALPART_EXPR (complex_tmp);
>> t2 = IMAGPART_EXPR (complex_tmp);
>>
>> * New hook expand_divmod_libfunc
>> The rationale for introducing the hook is that different targets have
>> incompatible calling conventions for divmod libfunc.
>> Currently three ports define divmod libfunc: c6x, spu and arm.
>> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> return quotient and store remainder in argument passed as pointer,
>> while the arm version takes two arguments and returns both
>> quotient and remainder having mode double the size of the operand mode.
>> The port should hence override the hook expand_divmod_libfunc
>> to generate call to target-specific divmod.
>> Ports should define this hook if:
>> a) The port does not have divmod or div insn for the given mode.
>> b) The port defines divmod libfunc for the given mode.
>> The default hook default_expand_divmod_libfunc() generates call
>> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
>> are of DImode.
>>
>> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> cross-tested on arm*-*-*.
>> Bootstrap+test in progress on arm-linux-gnueabihf.
>> Does this patch look OK ?
>
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 6b4601b..e4a021a 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
>return true;
>  }
>
> +void
> +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
> +  rtx op0, rtx op1,
> +  rtx *quot_p, rtx *rem_p)
>
> functions need a comment.
>
> ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In that
> case we could avoid the target hook.
Well I would prefer adding the hook because that's easier -;)
Would it be ok for now to go with the hook ?
>
> +  /* If target overrides expand_divmod_libfunc hook
> +then perform divmod by generating call to the target-specific divmod libfunc.  */
> +  if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
> +   return true;
> +
> +  /* Fall back to using libgcc2.c:__udivmoddi4.  */
> +  return (mode == DImode && unsignedp);
>
> I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
> but still restrict this to DImode && unsigned?  Also if
> targetm.expand_divmod_libfunc
> is not the default we expect the target to handle all modes?
Ah indeed, the check for DImode is unnecessary.
However I suppose the check for unsignedp should be there,
since we want to generate a call to __udivmoddi4 only if the operands are unsigned?
>
> That said - I expected the above piece to be simply a 'return true;' ;)
>
> Usually we use some can_expand_XXX helper in optabs.c to query if the target
> supports a specific operation (for example SImode divmod would use DImode
> divmod by means of widening operands - for the unsigned case of course).
Thanks for pointing out. So if a target does not support a divmod
libfunc for a mode but does for a wider mode, then we could zero-extend
the operands to the wider mode, perform divmod in the wider mode, and
then cast the results back to the original mode.
I haven't done that in this patch; would it be OK to do that as a follow-up?
>
> +  /* Disable the transform if either is a constant, since division-by-constant
> + may have specialized expansion.  */
> +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
> +return false;
>
> please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
>
> +  if (TYPE_OVERFLOW_TRAPS (type))
> +return false;
>
> why's that?  Generally please first test cheap things (trapping, 
> constant-ness)
> before checking expensive stuff (target_supports_divmod_p).
I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html
"When looking at TRUNC_DIV_EXPR you should also exclude
the case where TYPE_OVERFLOW_TRAPS (type) as that should
expand using the [su]divv optabs (no trapping overflow
divmod optab exists)."
>
> +static bool
> +convert_to_divmod (gassign *stmt)
> +{
> +  if (!divmod_candidate_p (stmt))
> +return false;
> +
> +  tree op1 = gimple_assign_rhs1 (stmt);
> +  tree op2 = gimple_assign_rhs2 (stmt);
> +
> +  vec<gimple *> stmts = vNULL;
>
> use an auto_vec<gimple *> - you currently leak it in at least one place.
>
> +  if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
> +   cfg_changed = true;
>
> note

[C/C++ PATCH] Fix bogus warning with -Wswitch-unreachable (PR c/71249)

2016-05-24 Thread Marek Polacek
Martin S. noticed that cc1plus bogusly warns on the following test.  That's
because I didn't realize that GIMPLE_BINDs might be nested in C++, so we need
to look through them and only then get the first statement in the seq.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-05-24  Marek Polacek  

PR c/71249
* gimplify.c (gimplify_switch_expr): Look into the innermost lexical
scope.

* c-c++-common/Wswitch-unreachable-2.c: New test.

diff --git gcc/gimplify.c gcc/gimplify.c
index 6473544..5c5e9d6 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -1605,8 +1605,9 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
  && switch_body_seq != NULL)
{
  gimple_seq seq = switch_body_seq;
- if (gimple_code (switch_body_seq) == GIMPLE_BIND)
-   seq = gimple_bind_body (as_a <gbind *> (switch_body_seq));
+ /* Look into the innermost lexical scope.  */
+ while (gimple_code (seq) == GIMPLE_BIND)
+   seq = gimple_bind_body (as_a <gbind *> (seq));
  gimple *stmt = gimple_seq_first_stmt (seq);
  enum gimple_code code = gimple_code (stmt);
  if (code != GIMPLE_LABEL && code != GIMPLE_TRY)
diff --git gcc/testsuite/c-c++-common/Wswitch-unreachable-2.c 
gcc/testsuite/c-c++-common/Wswitch-unreachable-2.c
index e69de29..8f57392 100644
--- gcc/testsuite/c-c++-common/Wswitch-unreachable-2.c
+++ gcc/testsuite/c-c++-common/Wswitch-unreachable-2.c
@@ -0,0 +1,18 @@
+/* PR c/71249 */
+/* { dg-do compile } */
+
+int
+f (int i)
+{
+  switch (i)
+{
+  {
+   int j;
+  foo:
+   return i; /* { dg-bogus "statement will never be executed" } */
+  };
+case 3:
+  goto foo;
+}
+  return i;
+}

Marek


Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-24 Thread Wilco Dijkstra
Jim Wilson wrote:
> It looks like a slight loss on qdf24xx on SPEC CPU2006 at -O3.  I see
> about a 0.37% loss on the integer benchmarks, and no significant
> change on the FP benchmarks.  The integer loss is mainly due to
> 458.sjeng which drops 2%.  We had tried various values for
> max_case_values earlier, and didn't see any performance improvement
> from setting it, so we are using the default value.

That's interesting as sjeng shows ~2% gain on Cortex-A72 due to the
hot switches being badly laid out... I wonder whether the loss you see is
due to code alignment or some other secondary effect.

> We've been tracking changes to the FSF tree, and adjust our tuning
> structure as necessary, so I'm not too concerned about this.  We will
> just set the max_case_values field in the tuning structure to get the
> result we want.  What I am slightly concerned about is that the
> max_case_values field is only used at -O3 and above which limits the
> usefulness.  If a port has specified a value, it probably should be
> used for all non-size optimization, which means we should check for
> optimize_size first, then check for a cpu specific value, then use the
> default.  If you do that, then you don't need to change the default to
> get better generic/a53 code, you can change it in the generic and/or
> a53 tuning tables.

Yes it would be better to ensure max_case_values is used at -O2 as well.
But even then you'd want a reasonable default so that you only need to
set it if you want something very different.

> Though I see that the original patch from Samsung that added the
> max_case_values field has the -O3 check, so there was apparently some
> reason why they wanted it to work that way.  The value that the
> exynos-m1 is using, 48, looks pretty large, so maybe they thought that
> the code size expansion from that is only OK at -O3 and above.  Worst
> case, we might need two max_case_value fields, one to use at -O1/-O2,
> and one to use at -O3.

I hope we can improve switch expansion in GCC7 to avoid the worst of the
issues, and then we can revisit these settings.

Wilco



Re: RFC [1/2] divmod transform

2016-05-24 Thread Richard Biener
On Tue, 24 May 2016, Prathamesh Kulkarni wrote:

> On 23 May 2016 at 17:35, Richard Biener  wrote:
> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
> >  wrote:
> >> Hi,
> >> I have updated my patch for divmod (attached), which was originally
> >> based on Kugan's patch.
> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
> >> having same operands to divmod representation, so we can cse computation 
> >> of mod.
> >>
> >> t1 = a TRUNC_DIV_EXPR b;
> >> t2 = a TRUNC_MOD_EXPR b
> >> is transformed to:
> >> complex_tmp = DIVMOD (a, b);
> >> t1 = REALPART_EXPR (complex_tmp);
> >> t2 = IMAGPART_EXPR (complex_tmp);
> >>
> >> * New hook expand_divmod_libfunc
> >> The rationale for introducing the hook is that different targets have
> >> incompatible calling conventions for divmod libfunc.
> >> Currently three ports define divmod libfunc: c6x, spu and arm.
> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
> >> return quotient and store remainder in argument passed as pointer,
> >> while the arm version takes two arguments and returns both
> >> quotient and remainder having mode double the size of the operand mode.
> >> The port should hence override the hook expand_divmod_libfunc
> >> to generate call to target-specific divmod.
> >> Ports should define this hook if:
> >> a) The port does not have divmod or div insn for the given mode.
> >> b) The port defines divmod libfunc for the given mode.
> >> The default hook default_expand_divmod_libfunc() generates call
> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
> >> are of DImode.
> >>
> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
> >> cross-tested on arm*-*-*.
> >> Bootstrap+test in progress on arm-linux-gnueabihf.
> >> Does this patch look OK ?
> >
> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> > index 6b4601b..e4a021a 100644
> > --- a/gcc/targhooks.c
> > +++ b/gcc/targhooks.c
> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
> >return true;
> >  }
> >
> > +void
> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
> > +  rtx op0, rtx op1,
> > +  rtx *quot_p, rtx *rem_p)
> >
> > functions need a comment.
> >
> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In 
> > that
> > case we could avoid the target hook.
> Well I would prefer adding the hook because that's more easier -;)
> Would it be ok for now to go with the hook ?
> >
> > +  /* If target overrides expand_divmod_libfunc hook
> > +then perform divmod by generating call to the target-specific divmod libfunc.  */
> > +  if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
> > +   return true;
> > +
> > +  /* Fall back to using libgcc2.c:__udivmoddi4.  */
> > +  return (mode == DImode && unsignedp);
> >
> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
> > but still restrict this to DImode && unsigned?  Also if
> > targetm.expand_divmod_libfunc
> > is not the default we expect the target to handle all modes?
> Ah indeed, the check for DImode is unnecessary.
> However I suppose the check for unsignedp should be there,
> since we want to generate call to __udivmoddi4 only if operand is unsigned ?

The optab libfunc for sdivmod should be NULL in that case.

> >
> > That said - I expected the above piece to be simply a 'return true;' ;)
> >
> > Usually we use some can_expand_XXX helper in optabs.c to query if the target
> > supports a specific operation (for example SImode divmod would use DImode
> > divmod by means of widening operands - for the unsigned case of course).
> Thanks for pointing out. So if a target does not support divmod
> libfunc for a mode
> but for a wider mode, then we could zero-extend operands to the wider-mode,
> perform divmod on the wider-mode, and then cast result back to the
> original mode.
> I haven't done that in this patch, would it be OK to do that as a follow up ?

I think that you should conservatively handle the div_optab query, thus if
the target has a HW division in a wider mode don't use the divmod IFN.
You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
out if that is available.

> > +  /* Disable the transform if either is a constant, since division-by-constant
> > + may have specialized expansion.  */
> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
> > +return false;
> >
> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
> >
> > +  if (TYPE_OVERFLOW_TRAPS (type))
> > +return false;
> >
> > why's that?  Generally please first test cheap things (trapping, 
> > constant-ness)
> > before checking expensive stuff (target_supports_divmod_p).
> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:
> https://www.mail-archive.com/gc

Re: [PATCH] Introduce can_remove_lhs_p

2016-05-24 Thread Richard Biener
On Mon, 23 May 2016, Marek Polacek wrote:

> On Mon, May 23, 2016 at 04:36:30PM +0200, Jakub Jelinek wrote:
> > On Mon, May 23, 2016 at 04:28:33PM +0200, Marek Polacek wrote:
> > > As promised in ,
> > > this is a simple clean-up which makes use of a new predicate.  Richi 
> > > suggested
> > > adding maybe_drop_lhs_from_noreturn_call which would be nicer, but I 
> > > didn't
> > > know how to do that, given the handling if lhs is an SSA_NAME.
> > 
> > Shouldn't it be should_remove_lhs_p instead?
> > I mean, it is not just an optimization, but part of how we define the IL.
>  
> Aha, ok.  Renamed.
> 
> > Shouldn't it be also used in tree-cfg.c (verify_gimple_call)?
> 
> I left that spot on purpose but now I don't quite see why, fixed.  Thanks,
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Can you move should_remove_lhs_p to tree-cfg.h please?

Ok with that change.

Richard.

> 2016-05-23  Marek Polacek  
> 
>   * tree.h (should_remove_lhs_p): New predicate.
>   * cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Use it.
>   * gimple-fold.c (gimple_fold_call): Likewise.
>   * gimplify.c (gimplify_modify_expr): Likewise.
>   * tree-cfg.c (verify_gimple_call): Likewise.
>   * tree-cfgcleanup.c (fixup_noreturn_call): Likewise.
> 
> diff --git gcc/cgraph.c gcc/cgraph.c
> index cf9192f..1a4f665 100644
> --- gcc/cgraph.c
> +++ gcc/cgraph.c
> @@ -1513,10 +1513,7 @@ cgraph_edge::redirect_call_stmt_to_callee (void)
>  }
>  
>/* If the call becomes noreturn, remove the LHS if possible.  */
> -  if (lhs
> -  && (gimple_call_flags (new_stmt) & ECF_NORETURN)
> -  && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs))) == INTEGER_CST
> -  && !TREE_ADDRESSABLE (TREE_TYPE (lhs)))
> +  if (gimple_call_noreturn_p (new_stmt) && should_remove_lhs_p (lhs))
>  {
>if (TREE_CODE (lhs) == SSA_NAME)
>   {
> diff --git gcc/gimple-fold.c gcc/gimple-fold.c
> index 858f484..6b50d43 100644
> --- gcc/gimple-fold.c
> +++ gcc/gimple-fold.c
> @@ -3052,12 +3052,9 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool 
> inplace)
> == void_type_node))
>   gimple_call_set_fntype (stmt, TREE_TYPE (fndecl));
> /* If the call becomes noreturn, remove the lhs.  */
> -   if (lhs
> -   && (gimple_call_flags (stmt) & ECF_NORETURN)
> +   if (gimple_call_noreturn_p (stmt)
> && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (stmt)))
> -   || ((TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
> -== INTEGER_CST)
> -   && !TREE_ADDRESSABLE (TREE_TYPE (lhs)
> +   || should_remove_lhs_p (lhs)))
>   {
> if (TREE_CODE (lhs) == SSA_NAME)
>   {
> diff --git gcc/gimplify.c gcc/gimplify.c
> index 4a544e3..c77eb51 100644
> --- gcc/gimplify.c
> +++ gcc/gimplify.c
> @@ -4847,9 +4847,7 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p,
>   }
>   }
>notice_special_calls (call_stmt);
> -  if (!gimple_call_noreturn_p (call_stmt)
> -   || TREE_ADDRESSABLE (TREE_TYPE (*to_p))
> -   || TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (*to_p))) != INTEGER_CST)
> +  if (!gimple_call_noreturn_p (call_stmt) || !should_remove_lhs_p 
> (*to_p))
>   gimple_call_set_lhs (call_stmt, *to_p);
>else if (TREE_CODE (*to_p) == SSA_NAME)
>   /* The above is somewhat premature, avoid ICEing later for a
> diff --git gcc/tree-cfg.c gcc/tree-cfg.c
> index 7c2ee78..82f0da6c 100644
> --- gcc/tree-cfg.c
> +++ gcc/tree-cfg.c
> @@ -3385,11 +3385,9 @@ verify_gimple_call (gcall *stmt)
>return true;
>  }
>  
> -  if (lhs
> -  && gimple_call_ctrl_altering_p (stmt)
> +  if (gimple_call_ctrl_altering_p (stmt)
>&& gimple_call_noreturn_p (stmt)
> -  && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs))) == INTEGER_CST
> -  && !TREE_ADDRESSABLE (TREE_TYPE (lhs)))
> +  && should_remove_lhs_p (lhs))
>  {
>error ("LHS in noreturn call");
>return true;
> diff --git gcc/tree-cfgcleanup.c gcc/tree-cfgcleanup.c
> index 46d0fa3..4134c38 100644
> --- gcc/tree-cfgcleanup.c
> +++ gcc/tree-cfgcleanup.c
> @@ -604,8 +604,7 @@ fixup_noreturn_call (gimple *stmt)
>   temporaries of variable-sized types is not supported.  Also don't
>   do this with TREE_ADDRESSABLE types, as assign_temp will abort.  */
>tree lhs = gimple_call_lhs (stmt);
> -  if (lhs && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs))) == INTEGER_CST
> -  && !TREE_ADDRESSABLE (TREE_TYPE (lhs)))
> +  if (should_remove_lhs_p (lhs))
>  {
>gimple_call_set_lhs (stmt, NULL_TREE);
>  
> diff --git gcc/tree.h gcc/tree.h
> index 2510d16..1d72437 100644
> --- gcc/tree.h
> +++ gcc/tree.h
> @@ -5471,4 +5471,14 @@ desired_pro_or_demotion_p (const_tree 
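The tree.h hunk is truncated in the archive; from the call sites above, the new predicate presumably factors out the repeated test, roughly as follows (a GCC-internals sketch inferred from the usage, not the committed hunk, and not compilable standalone):

```c
/* Return true if the LHS of a noreturn call can be removed:
   it exists, has a constant-size type, and is not TREE_ADDRESSABLE.  */
static inline bool
should_remove_lhs_p (tree lhs)
{
  return (lhs
	  && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs))) == INTEGER_CST
	  && !TREE_ADDRESSABLE (TREE_TYPE (lhs)));
}
```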

Re: [PATCH v2] Ensure source_date_epoch is always initialised

2016-05-24 Thread Dhole
On 16-05-24 12:06:48, James Clarke wrote:
> Hi,
> > On 24 May 2016, at 11:59, Dhole  wrote:
> > 
> > Hey!
> > 
> > I'm the original author of the SOURCE_DATE_EPOCH patch.
> > 
> > I've just seen this.  I believe that this bug was fixed in the
> > rework of the patch I sent some days ago [1], although the latest
> > version of that patch has not been reviewed yet.  In [1] the
> > initialization of source_date_epoch is done at init.c
> > (cpp_create_reader), so now it should be initialized properly even when
> > just calling the preprocessor.  I tested your example and it gives the
> > expected output.
> > 
> > Although thinking further, maybe it would be wiser to use "0" as a
> > default value, to mean "not yet set", so that errors like this are
> > avoided.  So source_date_epoch could be:
> > 0: not yet set
> > negative: disabled
> > positive: use this value as SOURCE_DATE_EPOCH
> > 
> > In such case, SOURCE_DATE_EPOCH would need to be a positive number
> > instead of a non-negative number.
> 
> 0 *is* a valid SOURCE_DATE_EPOCH, ie Jan  1 1970 00:00:00, and should
> definitely be allowed.

You're right in the sense that 0 is a valid unix epoch.  In my
suggestion I was considering that SOURCE_DATE_EPOCH is used to set the
date the source code was last modified, and I guess no build process
nowadays has code that was last modified in 1970.  But it may be easier
to understand if 0 is left as a valid value.

> I see your patch continues to put some of the code inside c-family? Is
> there a reason for doing that instead of keeping it all inside libcpp
> like mine, given it’s inherently preprocessor-only? You’ve also merged
> all the error paths into one message which is not as helpful.

I merged the error paths into one as suggested in [1].  I'm not
knowledgeable enough about GCC to make a call on this, so I just followed
the suggestion from Martin.  But it could be reverted if needed.

Regarding having the code inside c-family, I'm following the suggestion
from Joseph [2]:

Joseph Myers wrote:
> Since cpplib is a library and doesn't have any existing getenv calls, I 
> wonder if it would be better for the cpplib client (i.e. something in the 
> gcc/ directory) to be what calls getenv and then informs cpplib of the 
> timestamp it should treat as being the time of compilation.

Jakub also found it reasonable [3]:

Jakub Jelinek wrote:
> Doing this on the gcc/ side is of course reasonable, but can be done through
> callbacks, libcpp already has lots of other callbacks into the gcc/ code,
> look for e.g. cpp_get_callbacks in gcc/c-family/* and in libcpp/ for
> corresponding code.
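The division of labour Joseph and Jakub describe, where the gcc/ side reads the environment and hands libcpp a value through a callback, can be sketched as follows (function name and error handling are illustrative, not the actual patch):

```c
#include <errno.h>
#include <stdlib.h>

/* Parse SOURCE_DATE_EPOCH on the client (gcc/) side.  Returns the
   epoch (>= 0) on success, or -1 when the variable is unset or
   malformed.  The value would then be handed to libcpp via a callback
   rather than libcpp calling getenv itself.  */
static long long
parse_source_date_epoch (const char *str)
{
  char *end;
  long long epoch;

  if (str == NULL || *str == '\0')
    return -1;
  errno = 0;
  epoch = strtoll (str, &end, 10);
  if (errno != 0 || *end != '\0' || epoch < 0)
    return -1;  /* reject garbage, overflow and negative values */
  return epoch;
}
```

Note that "0" parses successfully here, matching James's point that Jan 1 1970 00:00:00 is a valid epoch.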


[1] https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01889.html
[2] https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02270.html
[3] https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01930.html


Cheers,
-- 
Dhole




Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Jan Hubicka
> 
> Ah, yes.  Now I see.
> 
> >  The test I updated that looks for DECL simply assumes
> > that declarations can not be accessed past their end.
> > It would make more sense to use object size machinery here somehow.
> > (i.e. even in fortran we have accesses to mallocated buffers of constant 
> > size).
> > But this probably could be better handled at niter side where we can also 
> > deal with
> > case of real trailing arrays of known size.
> 
> But then I'm not sure that TYPE_SIZE (TREE_TYPE (ref)) == NULL is
> handled correctly.  I suppose you can hope for the array to be the
> one forcing it NULL and thus its TYPE_DOMAIN max val being NULL ...

Hmm, you are probably right. If we can have array with TYPE_DOMAIN != NULL
and sane bounds, but with TYPE_SIZE == NULL, we probably need to punt on NULL
TYPE_SIZE.  I can add it just to be sure.

I am testing

Index: tree.c
===
--- tree.c  (revision 236557)
+++ tree.c  (working copy)
@@ -13079,7 +13079,8 @@ array_at_struct_end_p (tree ref)
   tree size = NULL;
 
   if (TREE_CODE (ref) == MEM_REF
-  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR)
+  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR
+  && TYPE_SIZE (TREE_TYPE (ref)))
 {
   size = TYPE_SIZE (TREE_TYPE (ref));
   ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);


[C++ Patch] PR 69872 ("[6/7 Regression] -Wnarrowing note without warning/errror")

2016-05-24 Thread Paolo Carlini

Hi,

in this small diagnostic regression we emit an inform without a 
preceding warning/error: checking the return value of the pedwarn, as we 
normally want to do, fixes the problem. Tested x86_64-linux.


Thanks,
Paolo.

/
/cp
2016-05-24  Paolo Carlini  

PR c++/69872
* typeck2.c (check_narrowing): Check pedwarn return value.

/testsuite
2016-05-24  Paolo Carlini  

PR c++/69872
* g++.dg/warn/Wno-narrowing1.C: New.
Index: cp/typeck2.c
===
--- cp/typeck2.c(revision 236630)
+++ cp/typeck2.c(working copy)
@@ -950,10 +950,12 @@ check_narrowing (tree type, tree init, tsubst_flag
{
  if (complain & tf_warning_or_error)
{
- if (!almost_ok || pedantic)
-   pedwarn (loc, OPT_Wnarrowing, "narrowing conversion of %qE "
-"from %qT to %qT inside { }", init, ftype, type);
- if (pedantic && almost_ok)
+ if ((!almost_ok || pedantic)
+ && pedwarn (loc, OPT_Wnarrowing,
+ "narrowing conversion of %qE "
+ "from %qT to %qT inside { }",
+ init, ftype, type)
+ && almost_ok)
inform (loc, " the expression has a constant value but is not "
"a C++ constant-expression");
  ok = true;
Index: testsuite/g++.dg/warn/Wno-narrowing1.C
===
--- testsuite/g++.dg/warn/Wno-narrowing1.C  (revision 0)
+++ testsuite/g++.dg/warn/Wno-narrowing1.C  (working copy)
@@ -0,0 +1,7 @@
+// PR c++/69872
+// { dg-options "-Wall -Wextra -pedantic -Wno-narrowing" }
+
+struct s { int x, y; };
+short offsets[1] = {
+  ((char*) &(((struct s*)16)->y) - (char *)16),  // { dg-bogus "note" }
+};


Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Richard Biener
On Tue, 24 May 2016, Jan Hubicka wrote:

> > 
> > Ah, yes.  Now I see.
> > 
> > >  The test I updated that looks for DECL simply assumes
> > > that declarations can not be accessed past their end.
> > > It would make more sense to use object size machinery here somehow.
> > > (i.e. even in fortran we have accesses to mallocated buffers of constant 
> > > size).
> > > But this probably could be better handled at niter side where we can also 
> > > deal with
> > > case of real trailing arrays of known size.
> > 
> > But then I'm not sure that TYPE_SIZE (TREE_TYPE (ref)) == NULL is
> > handled correctly.  I suppose you can hope for the array to be the
> > one forcing it NULL and thus its TYPE_DOMAIN max val being NULL ...
> 
> Hmm, you are probably right. If we can have array with TYPE_DOMAIN != NULL
> and sane bounds, but with TYPE_SIZE == NULL, we probably need to punt on NULL
> TYPE_SIZE.  I can add it just to be sure.

As a MEM_REF embeds a VIEW_CONVERT you can placement-new

struct { int a[5]; char b[]; };

on top of char x[24]; and access MEM_REF[&x].a[3] (not at struct end)
and MEM_REF[&x].b[4] but _both_ accesses would have TYPE_SIZE NULL.

So I'm not sure TYPE_SIZE tells you anything here...
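Richard's example in concrete, self-contained C (the storage is padded beyond the 24 bytes in the text so the b[4] access stays in bounds; offsets assume 4-byte int):

```c
/* A struct with a flexible trailing array, "placed" on top of raw
   storage via a pointer cast, as in the discussion.  The access to
   a[3] is not at the struct's end, while the access to b[4] is.  */
struct S { int a[5]; char b[]; };

static int
read_a3 (struct S *p)
{
  return p->a[3];  /* interior array: not at struct end */
}

static char
read_b4 (struct S *p)
{
  return p->b[4];  /* flexible trailing array: at struct end */
}
```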

Richard.

> I am testing
> 
> Index: tree.c
> ===
> --- tree.c(revision 236557)
> +++ tree.c(working copy)
> @@ -13079,7 +13079,8 @@ array_at_struct_end_p (tree ref)
>tree size = NULL;
>  
>if (TREE_CODE (ref) == MEM_REF
> -  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR)
> +  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR
> +  && TYPE_SIZE (TREE_TYPE (ref)))
>  {
>size = TYPE_SIZE (TREE_TYPE (ref));
>ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Jan Hubicka
> > Hmm, you are probably right. If we can have array with TYPE_DOMAIN != NULL
> > and sane bounds, but with TYPE_SIZE == NULL, we probably need to punt on 
> > NULL
> > TYPE_SIZE.  I can add it just to be sure.
> 
> As a MEM_REF embeds a VIEW_CONVERT you can placement-new
> 
> struct { int a[5]; char b[]; };

Yep. This is what I am trying to handle with the TYPE_SIZE condition.
> 
> on top of char x[24]; and access MEM_REF[&x].a[3] (not at struct end)
> and MEM_REF[&x].b[4] but _both_ accesses would have TYPE_SIZE NULL.
> 
> So I'm not sure TYPE_SIZE tells you anything here...

Well, here when parsing MEM_REF[&x].a[3], array_at_struct_end_p should
return false, because it walks the handled components and will see the
COMPONENT_REF for .a, which is not at the end:

  while (handled_component_p (ref)) 
{   
  /* If the reference chain contains a component reference to a 
 non-union type and there follows another field the reference   
 is not at the end of a structure.  */  
  if (TREE_CODE (ref) == COMPONENT_REF  
  && TREE_CODE (TREE_TYPE (TREE_OPERAND (ref, 0))) == RECORD_TYPE)  
{   
  tree nextf = DECL_CHAIN (TREE_OPERAND (ref, 1));  
  while (nextf && TREE_CODE (nextf) != FIELD_DECL)  
nextf = DECL_CHAIN (nextf); 
  if (nextf)
return false;   
}   

  ref = TREE_OPERAND (ref, 0);  
}   

The size compare is meant to distinguish between
struct a { int a[5]; char b[5]; };
placed in char buf[sizeof(struct a)]
and placed in char buf[sizeof(struct a)+5].

The REF seen at this place is the REF of the outer type after unwinding the
handled components, so it should have TYPE_SIZE defined in this case, I think.

Honza
> 
> Richard.
> 
> > I am testing
> > 
> > Index: tree.c
> > ===
> > --- tree.c  (revision 236557)
> > +++ tree.c  (working copy)
> > @@ -13079,7 +13079,8 @@ array_at_struct_end_p (tree ref)
> >tree size = NULL;
> >  
> >if (TREE_CODE (ref) == MEM_REF
> > -  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR)
> > +  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR
> > +  && TYPE_SIZE (TREE_TYPE (ref)))
> >  {
> >size = TYPE_SIZE (TREE_TYPE (ref));
> >ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Jan Hubicka
Hi,
I tried the attached patch that gets rid of gfc_array_range_type because it
seems pointless from middle-end POV. It however affects .original dumps in the
following way:
--- assumed_type_2.f90.003t.original2016-05-24 14:32:45.771503552 +0200
+++ ../assumed_type_2.f90.003t.original 2016-05-24 14:34:07.637311579 +0200
@@ -246,7 +246,7 @@
 parm.20.offset = NON_LVALUE_EXPR ;
 D.3504 = _gfortran_internal_pack (&parm.20);
 sub_array_assumed (D.3504);
-if ((void *[0:] *) parm.20.data != (void *[0:] *) D.3504)
+if ((void *[] *) parm.20.data != (void *[] *) D.3504)
   { 
 _gfortran_internal_unpack (&parm.20, D.3504);
 __builtin_free (D.3504);
@@ -576,12 +576,12 @@
 { 
   static logical(kind=4) C.3584 = 1;

-  sub_scalar (&(*(real(kind=4)[0:] * restrict) 
array_real_alloc.data)[(array_real_alloc.offset + 
array_real_alloc.dim[1].stride * 2) + 3], &C.3584);
+  sub_scalar (&(*(real(kind=4)[] * restrict) 
array_real_alloc.data)[(array_real_alloc.offset + 
array_real_alloc.dim[1].stride * 2) + 3], &C.3584);
 }
 { 
   static logical(kind=4) C.3585 = 1;

-  sub_scalar (&(*(character(kind=1)[0:][1:1] *) 
array_char_ptr.data)[array_char_ptr.offset + NON_LVALUE_EXPR 
], &C.3585, 1);
+  sub_scalar (&(*(character(kind=1)[][1:1] *) 
array_char_ptr.data)[array_char_ptr.offset + NON_LVALUE_EXPR 
], &C.3585, 1);
 }
 { 
   static logical(kind=4) C.3586 = 1;

Which breaks the testsuite.  Perhaps just <unknown> can be printed as 0:
(because that is what a NULL domain means).  This is done by
dump_array_domain in tree-pretty-print.c and I am not quite sure who else
relies on the format.
Or we can just compensate in the testsuite given that the bounds are really
unknown...

Honza

Index: trans-types.c
===
--- trans-types.c   (revision 236556)
+++ trans-types.c   (working copy)
@@ -52,7 +52,6 @@ along with GCC; see the file COPYING3.
 CInteropKind_t c_interop_kinds_table[ISOCBINDING_NUMBER];
 
 tree gfc_array_index_type;
-tree gfc_array_range_type;
 tree gfc_character1_type_node;
 tree pvoid_type_node;
 tree prvoid_type_node;
@@ -945,12 +944,6 @@ gfc_init_types (void)
 = build_pointer_type (build_function_type_list (void_type_node, 
NULL_TREE));
 
   gfc_array_index_type = gfc_get_int_type (gfc_index_integer_kind);
-  /* We cannot use gfc_index_zero_node in definition of gfc_array_range_type,
- since this function is called before gfc_init_constants.  */
-  gfc_array_range_type
- = build_range_type (gfc_array_index_type,
- build_int_cst (gfc_array_index_type, 0),
- NULL_TREE);
 
   /* The maximum array element size that can be handled is determined
  by the number of bits available to store this field in the array
@@ -1920,12 +1913,12 @@ gfc_get_array_type_bounds (tree etype, i
 
   /* We define data as an array with the correct size if possible.
  Much better than doing pointer arithmetic.  */
-  if (stride)
+  if (stride && akind >= GFC_ARRAY_UNKNOWN)
 rtype = build_range_type (gfc_array_index_type, gfc_index_zero_node,
  int_const_binop (MINUS_EXPR, stride,
   build_int_cst (TREE_TYPE 
(stride), 1)));
   else
-rtype = gfc_array_range_type;
+rtype = NULL;
   arraytype = build_array_type (etype, rtype);
   arraytype = build_pointer_type (arraytype);
   if (restricted)


Re: [PATCH][ARM] PR target/70830: Avoid POP-{reglist}^ when returning from interrupt handlers

2016-05-24 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01211.html

Thanks,
Kyrill

On 17/05/16 11:40, Kyrill Tkachov wrote:


On 13/05/16 12:05, Kyrill Tkachov wrote:

Hi Christophe,

On 12/05/16 20:57, Christophe Lyon wrote:

On 12 May 2016 at 11:48, Ramana Radhakrishnan  wrote:

On Thu, May 5, 2016 at 12:50 PM, Kyrill Tkachov
 wrote:

Hi all,

In this PR we deal with some fallout from the conversion to unified
assembly.
We now end up emitting instructions like:
   pop {r0,r1,r2,r3,pc}^
which is not legal. We have to use an LDM form.

There are bugs in two arm.c functions: output_return_instruction and
arm_output_multireg_pop.

In output_return_instruction the buggy hunk from the conversion was:
   else
-   if (TARGET_UNIFIED_ASM)
   sprintf (instr, "pop%s\t{", conditional);
-   else
- sprintf (instr, "ldm%sfd\t%%|sp!, {", conditional);

The code was already very obscurely structured and arguably the bug was
latent.
It emitted POP only when TARGET_UNIFIED_ASM was on, and since
TARGET_UNIFIED_ASM was on only for Thumb, we never went down this path
for interrupt handling code, since the interrupt attribute is only
available for ARM code.  After the removal of
TARGET_UNIFIED_ASM we ended up
using POP unconditionally. So this patch adds a check for IS_INTERRUPT and
outputs the
appropriate LDM form.

In arm_output_multireg_pop the buggy hunk was:
-  if ((regno_base == SP_REGNUM) && TARGET_THUMB)
+  if ((regno_base == SP_REGNUM) && update)
  {
-  /* Output pop (not stmfd) because it has a shorter encoding.  */
-  gcc_assert (update);
sprintf (pattern, "pop%s\t{", conditional);
  }

Again, the POP was guarded on TARGET_THUMB and so would never be taken on
interrupt handling
routines. This patch guards that with the appropriate check on interrupt
return.

Also, there are a couple of bugs in the 'else' branch of that 'if':
* The "ldmfd%s" was output without a '\t' at the end which meant that the
base register
name would be concatenated with the 'ldmfd', creating invalid assembly.

* The logic:

   if (regno_base == SP_REGNUM)
   /* update is never true here, hence there is no need to handle
  pop here.  */
 sprintf (pattern, "ldmfd%s", conditional);

   if (update)
 sprintf (pattern, "ldmia%s\t", conditional);
   else
 sprintf (pattern, "ldm%s\t", conditional);

Meant that for "regno == SP_REGNUM && !update" we'd end up printing
"ldmfd%sldm%s\t"
to pattern. I didn't manage to reproduce that condition though, so maybe it
can't ever occur.
This patch fixes both these issues nevertheless.

I've added the testcase from the PR to catch the fix in
output_return_instruction.
The testcase doesn't catch the bugs in arm_output_multireg_pop, but the
existing tests
gcc.target/arm/interrupt-1.c and gcc.target/arm/interrupt-2.c would have
caught them
if only they were assemble tests rather than just compile. So this patch
makes them
assembly tests (and reverts the scan-assembler checks for the correct LDM
pattern).

Bootstrapped and tested on arm-none-linux-gnueabihf.
Ok for trunk and GCC 6?


Hi Kyrill,

Did you test --with-mode=thumb?
When using arm mode, I see regressions:

   gcc.target/arm/neon-nested-apcs.c (test for excess errors)
   gcc.target/arm/nested-apcs.c (test for excess errors)


It's because I have a local patch in my binutils that makes gas warn on the
deprecated sequences that these two tests generate (they use the deprecated
-mapcs option), so these tests were already showing the (test for excess
errors) FAIL for me and they didn't appear in my test diff for this patch. :(

I've reproduced the failure with a clean tree.
Where before we generated:
ldm	sp, {fp, sp, pc}
now we generate:
pop	{fp, sp, pc}

which are not equivalent (pop performs a write-back) and gas warns:
Warning: writeback of base register when in register list is UNPREDICTABLE

I'm testing a patch to fix this.
Sorry for the regression.


Here is the fix.
I had removed the update check from the condition for the "pop"
erroneously.  Of course, if we're not updating the SP we can't use POP,
which has an implicit writeback.

Bootstrapped on arm-none-linux-gnueabihf. Tested with -mthumb and -marm.

Ok for trunk and GCC 6?

Thanks,
Kyrill

2016-05-17  Kyrylo Tkachov  

PR target/70830
* config/arm/arm.c (arm_output_multireg_pop): Guard "pop" on update.





Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Richard Biener
On Tue, 24 May 2016, Jan Hubicka wrote:

> Hi,
> I tried the attached patch that gets rid of gfc_array_range_type because it
> seems pointless from middle-end POV. It however affects .original dumps in the
> following way:
> --- assumed_type_2.f90.003t.original2016-05-24 14:32:45.771503552 +0200
> +++ ../assumed_type_2.f90.003t.original 2016-05-24 14:34:07.637311579 +0200
> @@ -246,7 +246,7 @@
>  parm.20.offset = NON_LVALUE_EXPR ;
>  D.3504 = _gfortran_internal_pack (&parm.20);
>  sub_array_assumed (D.3504);
> -if ((void *[0:] *) parm.20.data != (void *[0:] *) D.3504)
> +if ((void *[] *) parm.20.data != (void *[] *) D.3504)
>{ 
>  _gfortran_internal_unpack (&parm.20, D.3504);
>  __builtin_free (D.3504);
> @@ -576,12 +576,12 @@
>  { 
>static logical(kind=4) C.3584 = 1;
> 
> -  sub_scalar (&(*(real(kind=4)[0:] * restrict) 
> array_real_alloc.data)[(array_real_alloc.offset + 
> array_real_alloc.dim[1].stride * 2) + 3], &C.3584);
> +  sub_scalar (&(*(real(kind=4)[] * restrict) 
> array_real_alloc.data)[(array_real_alloc.offset + 
> array_real_alloc.dim[1].stride * 2) + 3], &C.3584);
>  }
>  { 
>static logical(kind=4) C.3585 = 1;
> 
> -  sub_scalar (&(*(character(kind=1)[0:][1:1] *) 
> array_char_ptr.data)[array_char_ptr.offset + NON_LVALUE_EXPR 
> ], &C.3585, 1);
> +  sub_scalar (&(*(character(kind=1)[][1:1] *) 
> array_char_ptr.data)[array_char_ptr.offset + NON_LVALUE_EXPR 
> ], &C.3585, 1);
>  }
>  { 
>static logical(kind=4) C.3586 = 1;
> 
> Which breaks the testsuite.  Perhaps just <unknown> can be printed as 0:
> (because that is what a NULL domain means).  This is done by
> dump_array_domain in tree-pretty-print.c and I am not quite sure who else
> relies on the format.
> Or we can just compensate in the testsuite given that the bounds are really
> unknown...

As said I'd simply use NULL TYPE_MAX_VALUE, not drop TYPE_DOMAIN 
completely (yes, NULL TYPE_DOMAIN is equal to [0:] so we can as well
print that - as you say, not sure what else breaks with that ;))

Richard.

> Honza
> 
> Index: trans-types.c
> ===
> --- trans-types.c (revision 236556)
> +++ trans-types.c (working copy)
> @@ -52,7 +52,6 @@ along with GCC; see the file COPYING3.
>  CInteropKind_t c_interop_kinds_table[ISOCBINDING_NUMBER];
>  
>  tree gfc_array_index_type;
> -tree gfc_array_range_type;
>  tree gfc_character1_type_node;
>  tree pvoid_type_node;
>  tree prvoid_type_node;
> @@ -945,12 +944,6 @@ gfc_init_types (void)
>  = build_pointer_type (build_function_type_list (void_type_node, 
> NULL_TREE));
>  
>gfc_array_index_type = gfc_get_int_type (gfc_index_integer_kind);
> -  /* We cannot use gfc_index_zero_node in definition of gfc_array_range_type,
> - since this function is called before gfc_init_constants.  */
> -  gfc_array_range_type
> -   = build_range_type (gfc_array_index_type,
> -   build_int_cst (gfc_array_index_type, 0),
> -   NULL_TREE);
>  
>/* The maximum array element size that can be handled is determined
>   by the number of bits available to store this field in the array
> @@ -1920,12 +1913,12 @@ gfc_get_array_type_bounds (tree etype, i
>  
>/* We define data as an array with the correct size if possible.
>   Much better than doing pointer arithmetic.  */
> -  if (stride)
> +  if (stride && akind >= GFC_ARRAY_UNKNOWN)
>  rtype = build_range_type (gfc_array_index_type, gfc_index_zero_node,
> int_const_binop (MINUS_EXPR, stride,
>  build_int_cst (TREE_TYPE 
> (stride), 1)));
>else
> -rtype = gfc_array_range_type;
> +rtype = NULL;
>arraytype = build_array_type (etype, rtype);
>arraytype = build_pointer_type (arraytype);
>if (restricted)
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH][ARM][4/4] Simplify checks for CONST_INT_P and comparison against 1/0

2016-05-24 Thread Kyrill Tkachov

Hi all,

Following up from patch 3/4 there are a few more instances where we check that 
an RTX is CONST_INT_P and then
compare its INTVAL against 1 or 0. These can be replaced by just comparing the 
RTX directly against CONST1_RTX
or CONST0_RTX.

This patch does that.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk as obvious.

Thanks,
Kyrill

2016-05-24  Kyrylo Tkachov  

* config/arm/neon.md (ashldi3_neon):  Replace comparison of INTVAL of
operands[2] against 1 with comparison against CONST1_RTX.
(<shift>di3_neon): Likewise.
* config/arm/predicates.md (const0_operand): Replace with comparison
against CONST0_RTX.
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 21eed7bb99c48d508a1c8be9c8f992ae07f3d550..e2fdfbb04621ee6f8603849be089e8bce624214d 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1082,7 +1082,7 @@ (define_insn_and_split "ashldi3_neon"
   }
 else
   {
-	if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 1
+	if (operands[2] == CONST1_RTX (SImode)
 	&& (!reg_overlap_mentioned_p (operands[0], operands[1])
 		|| REGNO (operands[0]) == REGNO (operands[1])))
 	  /* This clobbers CC.  */
@@ -1184,7 +1184,7 @@ (define_insn_and_split "<shift>di3_neon"
   }
 else
   {
-	if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 1
+	if (operands[2] == CONST1_RTX (SImode)
 	&& (!reg_overlap_mentioned_p (operands[0], operands[1])
 		|| REGNO (operands[0]) == REGNO (operands[1])))
 	  /* This clobbers CC.  */
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 86c1bb62ae9ba433afe3169e07055c1b818e26c8..762c828c98bdccebb773142f1202ec171e3438f7 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -149,8 +149,7 @@ (define_predicate "arm_not_immediate_operand"
(match_test "const_ok_for_arm (~INTVAL (op))")))
 
 (define_predicate "const0_operand"
-  (and (match_code "const_int")
-   (match_test "INTVAL (op) == 0")))
+  (match_test "op == CONST0_RTX (mode)"))
 
 ;; Something valid on the RHS of an ARM data-processing instruction
 (define_predicate "arm_rhs_operand"


Re: [fortran] Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-24 Thread Jan Hubicka
> As said I'd simply use NULL TYPE_MAX_VALUE, not drop TYPE_DOMAIN 
> completely (yes, NULL TYPE_DOMAIN is equal to [0:] so we can as well
> print that - as you say, not sure what else breaks with that ;))

NULL TYPE_MAX_VALUE was used by my previous patch, because it used
gfc_array_range_type, which was built that way.  I am testing the patch
below plus the tree-pretty-print update:
Index: tree-pretty-print.c
===
--- tree-pretty-print.c (revision 236556)
+++ tree-pretty-print.c (working copy)
@@ -362,7 +362,7 @@ dump_array_domain (pretty_printer *pp, t
}
 }
   else
-    pp_string (pp, "<unknown>");
+    pp_string (pp, "0:");
   pp_right_bracket (pp);
 }
 
I suppose this is slightly better because it will make things more regular
across frontends and will let LTO merge a bit more.

Honza


[PATCH][ARM][1/4] Replace uses of int_log2 by exact_log2

2016-05-24 Thread Kyrill Tkachov

Hi all,

The int_log2 function in arm.c is not really useful since we already have a 
generic function for calculating
the log2 of HOST_WIDE_INTs. The only difference in functionality is that 
int_log2 also asserts that the result
is no greater than 31.

This patch removes int_log2 in favour of exact_log2 and adds an assert on the 
result to make sure the return
value was as expected.
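The semantics being relied on can be sketched in standalone C: exact_log2 returns the base-2 logarithm for exact powers of two and -1 otherwise, which the callers here pair with an assert on the range (a sketch of the behaviour, not GCC's hwint.h code):

```c
/* exact_log2-style helper: log2 of x if x is a power of two, else -1. */
static int
exact_log2_sketch (unsigned long long x)
{
  int shift = 0;

  if (x == 0 || (x & (x - 1)) != 0)  /* zero or more than one bit set */
    return -1;
  while ((x >> shift) != 1)
    shift++;
  return shift;
}
```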

Bootstrapped and tested on arm-none-linux-gnueabihf.

Is this ok? Or is there something I'm missing about int_log2?

Thanks,
Kyrill

2016-05-24  Kyrylo Tkachov  

* config/arm/arm.c (int_log2): Delete definition and prototype.
(shift_op): Use exact_log2 instead of int_log2.
(vfp3_const_double_for_fract_bits): Likewise.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6cc0feb6f87157171c889e998e52b4e5d8683c66..3fe6eab46f3c18ace6899b5be45ad646992f43e4 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -104,7 +104,6 @@ static void arm_print_operand_address (FILE *, machine_mode, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
 static const char *fp_const_from_val (REAL_VALUE_TYPE *);
 static arm_cc get_arm_condition_code (rtx);
-static HOST_WIDE_INT int_log2 (HOST_WIDE_INT);
 static const char *output_multi_immediate (rtx *, const char *, const char *,
 	   int, HOST_WIDE_INT);
 static const char *shift_op (rtx, HOST_WIDE_INT *);
@@ -18920,7 +18919,8 @@ shift_op (rtx op, HOST_WIDE_INT *amountp)
 	  return NULL;
 	}
 
-  *amountp = int_log2 (*amountp);
+  *amountp = exact_log2 (*amountp);
+  gcc_assert (IN_RANGE (*amountp, 0, 31));
   return ARM_LSL_NAME;
 
 default:
@@ -18952,22 +18952,6 @@ shift_op (rtx op, HOST_WIDE_INT *amountp)
   return mnem;
 }
 
-/* Obtain the shift from the POWER of two.  */
-
-static HOST_WIDE_INT
-int_log2 (HOST_WIDE_INT power)
-{
-  HOST_WIDE_INT shift = 0;
-
-  while ((((HOST_WIDE_INT) 1 << shift) & power) == 0)
-{
-  gcc_assert (shift <= 31);
-  shift++;
-}
-
-  return shift;
-}
-
 /* Output a .ascii pseudo-op, keeping track of lengths.  This is
because /bin/as is horribly restrictive.  The judgement about
whether or not each character is 'printable' (and can be output as
@@ -27691,7 +27675,11 @@ vfp3_const_double_for_fract_bits (rtx operand)
 	  HOST_WIDE_INT value = real_to_integer (&r0);
	  value = value & 0xffffffff;
 	  if ((value != 0) && ( (value & (value - 1)) == 0))
-	return int_log2 (value);
+	{
+	  int ret = exact_log2 (value);
+	  gcc_assert (IN_RANGE (ret, 0, 31));
+	  return ret;
+	}
 	}
 }
   return 0;


[PATCH] Fix PR71230

2016-05-24 Thread Richard Biener

The following fixes the ICEs in PR71230.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-05-24  Richard Biener  

PR tree-optimization/71240
* tree-ssa-math-opts.c (init_symbolic_number): Verify the source
has integral type.

* gcc.dg/optimize-bswapsi-5.c: New testcase.

Index: gcc/tree-ssa-math-opts.c
===
*** gcc/tree-ssa-math-opts.c(revision 236630)
--- gcc/tree-ssa-math-opts.c(working copy)
*** init_symbolic_number (struct symbolic_nu
*** 2051,2056 ****
--- 2051,2059 ----
  {
int size;
  
+   if (! INTEGRAL_TYPE_P (TREE_TYPE (src)))
+ return false;
+ 
n->base_addr = n->offset = n->alias_set = n->vuse = NULL_TREE;
  
/* Set up the symbolic number N by setting each byte to a value between 1 
and
Index: gcc/testsuite/gcc.dg/optimize-bswapsi-5.c
===================================================================
*** gcc/testsuite/gcc.dg/optimize-bswapsi-5.c   (revision 0)
--- gcc/testsuite/gcc.dg/optimize-bswapsi-5.c   (working copy)
***
*** 0 
--- 1,31 
+ /* { dg-do compile } */
+ /* { dg-require-effective-target bswap32 } */
+ /* { dg-options "-O2 -fdump-tree-bswap" } */
+ /* { dg-additional-options "-march=z900" { target s390-*-* } } */
+ 
+ struct L { unsigned int l[2]; };
+ union U { double a; struct L l; } u;
+ 
+ void
+ foo (double a, struct L *p)
+ {
+   u.a = a;
+   struct L l = u.l, m;
+   m.l[0] = (((l.l[1] & 0xff000000) >> 24)
+   | ((l.l[1] & 0x00ff0000) >> 8)
+   | ((l.l[1] & 0x0000ff00) << 8)
+   | ((l.l[1] & 0x000000ff) << 24));
+   m.l[1] = (((l.l[0] & 0xff000000) >> 24)
+   | ((l.l[0] & 0x00ff0000) >> 8)
+   | ((l.l[0] & 0x0000ff00) << 8)
+   | ((l.l[0] & 0x000000ff) << 24));
+   *p = m;
+ }
+ 
+ void
+ bar (double a, struct L *p)
+ {
+   foo (a, p);
+ }
+ 
+ /* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 2 
"bswap" } } */

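The shift-and-mask expression in the testcase above is exactly a 32-bit byte swap, which is what the bswap pass is expected to recognize twice. A quick stand-alone check of that equivalence (swap32 is illustrative, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* The same mask-and-shift pattern as in the testcase, written out
   for a single 32-bit value.  */
static uint32_t
swap32 (uint32_t v)
{
  return ((v & 0xff000000u) >> 24)
       | ((v & 0x00ff0000u) >> 8)
       | ((v & 0x0000ff00u) << 8)
       | ((v & 0x000000ffu) << 24);
}
```

Applying the function twice reconstructs the original value, the defining property of a byte swap.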

[PATCH][ARM][2/4] Replace casts of 1 to HOST_WIDE_INT by HOST_WIDE_INT_1 and HOST_WIDE_INT_1U

2016-05-24 Thread Kyrill Tkachov

Hi all,

hwint.h defines a number of useful macros to access the constants -1,0,1 cast to
HOST_WIDE_INT or unsigned HOST_WIDE_INT. We can use these to save some 
horizontal
space and parentheses when we need such constants.

This patch replaces such uses with these macros to slightly improve the 
readability
of some of the expressions in the arm backend.
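For reference, the hwint.h macros in question are just spellings of a correctly typed constant 1. The definitions below are a local sketch of their described meaning, not copied from hwint.h:

```c
#include <assert.h>

/* Local stand-ins for HOST_WIDE_INT and the hwint.h macros (an
   assumption based on their described meaning, not hwint.h's actual
   definitions).  */
typedef long long hwi;
typedef unsigned long long uhwi;
#define HWI_1  ((hwi) 1)
#define HWI_1U ((uhwi) 1)

/* Before the patch: (((HOST_WIDE_INT) 1) << i) - 1
   After the patch:  (HOST_WIDE_INT_1 << i) - 1  -- same value,
   fewer parentheses.  */
static hwi
mask_below (int i)
{
  return (HWI_1 << i) - 1;
}
```

The point of the cleanup is purely syntactic: the macro carries the cast, so call sites shed one level of parentheses.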

Bootstrapped and tested on arm-none-linux-gnueabihf.

Will commit as obvious.

Thanks,
Kyrill

P.S. One such usage remains in thumb1_rtx_costs since Thomas will be removing 
it as part
of his ARMv8-M patches, so I didn't want to introduce a dependency.

2016-05-24  Kyrylo Tkachov  

* config/arm/arm.md (andsi3): Replace cast of 1 to HOST_WIDE_INT
with HOST_WIDE_INT_1.
(insv): Likewise.
* config/arm/arm.c (optimal_immediate_sequence): Replace cast of
1 to unsigned HOST_WIDE_INT with HOST_WIDE_INT_1U.
(arm_canonicalize_comparison): Likewise.
(thumb1_rtx_costs): Replace cast of 1 to HOST_WIDE_INT with
HOST_WIDE_INT_1.
(thumb1_size_rtx_costs): Likewise.
(vfp_const_double_index): Replace cast of 1 to unsigned
HOST_WIDE_INT with HOST_WIDE_INT_1U.
(get_jump_table_size): Replace cast of 1 to HOST_WIDE_INT with
HOST_WIDE_INT_1.
(arm_asan_shadow_offset): Replace cast of 1 to unsigned
HOST_WIDE_INT with HOST_WIDE_INT_1U.
* config/arm/neon.md (vec_set): Replace cast of 1 to
HOST_WIDE_INT with HOST_WIDE_INT_1.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3fe6eab46f3c18ace6899b5be45ad646992f43e4..78478303593522d186734c452c970fb013bf846e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -4053,7 +4053,7 @@ optimal_immediate_sequence (enum rtx_code code, unsigned HOST_WIDE_INT val,
  yield a shorter sequence, we may as well use zero.  */
   insns1 = optimal_immediate_sequence_1 (code, val, return_sequence, best_start);
   if (best_start != 0
-  && ((((unsigned HOST_WIDE_INT) 1) << best_start) < val))
+  && ((HOST_WIDE_INT_1U << best_start) < val))
 {
   insns2 = optimal_immediate_sequence_1 (code, val, &tmp_sequence, 0);
   if (insns2 <= insns1)
@@ -4884,7 +4884,7 @@ arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
   if (mode == VOIDmode)
 mode = GET_MODE (*op1);
 
-  maxval = (((unsigned HOST_WIDE_INT) 1) << (GET_MODE_BITSIZE(mode) - 1)) - 1;
+  maxval = (HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (mode) - 1)) - 1;
 
   /* For DImode, we have GE/LT/GEU/LTU comparisons.  In ARM mode
  we can also use cmp/cmpeq for GTU/LEU.  GT/LE must be either
@@ -8254,8 +8254,8 @@ thumb1_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
 	  int i;
 	  /* This duplicates the tests in the andsi3 expander.  */
 	  for (i = 9; i <= 31; i++)
-	if ((((HOST_WIDE_INT) 1) << i) - 1 == INTVAL (x)
-		|| (((HOST_WIDE_INT) 1) << i) - 1 == ~INTVAL (x))
+	if ((HOST_WIDE_INT_1 << i) - 1 == INTVAL (x)
+		|| (HOST_WIDE_INT_1 << i) - 1 == ~INTVAL (x))
 	  return COSTS_N_INSNS (2);
 	}
   else if (outer == ASHIFT || outer == ASHIFTRT
@@ -9007,8 +9007,8 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
   int i;
   /* This duplicates the tests in the andsi3 expander.  */
   for (i = 9; i <= 31; i++)
-if ((((HOST_WIDE_INT) 1) << i) - 1 == INTVAL (x)
-|| (((HOST_WIDE_INT) 1) << i) - 1 == ~INTVAL (x))
+if ((HOST_WIDE_INT_1 << i) - 1 == INTVAL (x)
+|| (HOST_WIDE_INT_1 << i) - 1 == ~INTVAL (x))
   return COSTS_N_INSNS (2);
 }
   else if (outer == ASHIFT || outer == ASHIFTRT
@@ -12122,7 +12122,7 @@ vfp3_const_double_index (rtx x)
 
   /* We can permit four significant bits of mantissa only, plus a high bit
  which is always 1.  */
-  mask = ((unsigned HOST_WIDE_INT)1 << (point_pos - 5)) - 1;
+  mask = (HOST_WIDE_INT_1U << (point_pos - 5)) - 1;
   if ((mantissa & mask) != 0)
 return -1;
 
@@ -16216,7 +16216,7 @@ get_jump_table_size (rtx_jump_table_data *insn)
 	{
 	case 1:
 	  /* Round up size  of TBB table to a halfword boundary.  */
-	  size = (size + 1) & ~(HOST_WIDE_INT)1;
+	  size = (size + 1) & ~HOST_WIDE_INT_1;
 	  break;
 	case 2:
 	  /* No padding necessary for TBH.  */
@@ -29694,7 +29694,7 @@ arm_fusion_enabled_p (unsigned int op)
 static unsigned HOST_WIDE_INT
 arm_asan_shadow_offset (void)
 {
-  return (unsigned HOST_WIDE_INT) 1 << 29;
+  return HOST_WIDE_INT_1U << 29;
 }
 
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 2b190e23a11f23f6e076a84bd309260c8bc4b9da..8c63bf7b75c4e84283ffee471375389f5a5b1a34 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2154,13 +2154,13 @@ (define_expand "andsi3"
 
   for (i = 9; i <= 31; i++)
 	{
-	  if ((((HOST_WIDE_INT) 1) << i) - 1 == INTVAL (operands[2]))
+	  if ((HOST_WIDE_INT_1 << i) - 1 == INTVAL (operands[2]))
 	{
 	  emit_insn (gen_extzv (operands[0], operands[1], GEN_INT (i),
 			 	cons

[PATCH] Fix PR71230 some more

2016-05-24 Thread Richard Biener

There were more omissions in how zero_one_operation works with the new
way of processing negates in the context of multiplication chains.
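To make the reassociation context concrete: in a multiplication chain an operand can be "hidden" under a negation, since -x * y has the same value as -(x * y), and zero_one_operation must look through the NEGATE_EXPR when removing one occurrence of the operand. The sketch below just checks that identity (illustrative, not from the patch):

```c
#include <assert.h>

/* -x * y and -(x * y) are the same value, which is why reassoc may
   strip an operand that sits under a NEGATE_EXPR inside a
   multiplication chain.  */
static double
neg_first (double x, double y)
{
  return -x * y;
}

static double
neg_last (double x, double y)
{
  return -(x * y);
}
```

Under -ffast-math (as in the testcases) the pass is free to move the negate around the chain, which is where the previously unhandled cases arose.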

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.  CPU 2006
build in progress as well.

Richard.

2016-05-24  Richard Biener  

PR tree-optimization/71230
* tree-ssa-reassoc.c (zero_one_operation): Handle negate special ops.

* gcc.dg/torture/pr71230.c: New testcase.
* g++.dg/torture/pr71230.C: Likewise.

Index: gcc/tree-ssa-reassoc.c
===================================================================
*** gcc/tree-ssa-reassoc.c  (revision 236630)
--- gcc/tree-ssa-reassoc.c  (working copy)
*** zero_one_operation (tree *def, enum tree
*** 1189,1200 
  {
tree name;
  
!   if (opcode == MULT_EXPR
! && stmt_is_power_of_op (stmt, op))
{
! if (decrement_power (stmt) == 1)
!   propagate_op_to_single_use (op, stmt, def);
! return;
}
  
name = gimple_assign_rhs1 (stmt);
--- 1191,1210 
  {
tree name;
  
!   if (opcode == MULT_EXPR)
{
! if (stmt_is_power_of_op (stmt, op))
!   {
! if (decrement_power (stmt) == 1)
!   propagate_op_to_single_use (op, stmt, def);
! return;
!   }
! else if (gimple_assign_rhs_code (stmt) == NEGATE_EXPR
!  && gimple_assign_rhs1 (stmt) == op)
!   {
! propagate_op_to_single_use (op, stmt, def);
! return;
!   }
}
  
name = gimple_assign_rhs1 (stmt);
*** zero_one_operation (tree *def, enum tree
*** 1213,1219 
}
  
/* We might have a multiply of two __builtin_pow* calls, and
!the operand might be hiding in the rightmost one.  */
if (opcode == MULT_EXPR
  && gimple_assign_rhs_code (stmt) == opcode
  && TREE_CODE (gimple_assign_rhs2 (stmt)) == SSA_NAME
--- 1223,1230 
}
  
/* We might have a multiply of two __builtin_pow* calls, and
!the operand might be hiding in the rightmost one.  Likewise
!this can happen for a negate.  */
if (opcode == MULT_EXPR
  && gimple_assign_rhs_code (stmt) == opcode
  && TREE_CODE (gimple_assign_rhs2 (stmt)) == SSA_NAME
*** zero_one_operation (tree *def, enum tree
*** 1226,1231 
--- 1237,1249 
propagate_op_to_single_use (op, stmt2, def);
  return;
}
+ else if (is_gimple_assign (stmt2)
+  && gimple_assign_rhs_code (stmt2) == NEGATE_EXPR
+  && gimple_assign_rhs1 (stmt2) == op)
+   {
+ propagate_op_to_single_use (op, stmt2, def);
+ return;
+   }
}
  
/* Continue walking the chain.  */
Index: gcc/testsuite/gcc.dg/torture/pr71230.c
===================================================================
*** gcc/testsuite/gcc.dg/torture/pr71230.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr71230.c  (working copy)
***
*** 0 
--- 1,25 
+ /* { dg-do compile } */
+ /* { dg-additional-options "-ffast-math" } */
+ 
+ void metric_carttosphere(int *cctk_lsh, double txz, double tyz, double txx,
+double tzz, double sint, double cosp, double cost,
+double tyy, double sinp, double txy, double *grp,
+double *grq, double *r)
+ {
+   int i;
+   for(i=0; i

+template <int dim> class Tensor;
+template <int dim> class Point {
+public:
+Point (const double x, const double y, const double z);
+double operator () (const unsigned int index) const;
+};
+template <int celldim, int dim> class TriaObjectAccessor  {
+Point<dim> & vertex (const unsigned int i) const;
+Point<dim> barycenter (double, double, double, double, double) const;
+};
+template <> Point<3> TriaObjectAccessor<3, 3>::barycenter (double s6, double s7, double s1, double s2, double s3) const
+{
+const double x[8] = {
+   vertex(0)(0),vertex(1)(0),vertex(2)(0),vertex(3)(0),vertex(4)(0),vertex(5)(0),vertex(6)(0),vertex(7)(0) };
+const double y[8] = {
+   vertex(0)(1),vertex(1)(1),vertex(2)(1),vertex(3)(1),vertex(4)(1),vertex(5)(1),vertex(6)(1),vertex(7)(1) };
+const double z[8] = {
+   vertex(0)(2),vertex(1)(2),vertex(2)(2),vertex(3)(2),vertex(4)(2),vertex(5)(2),vertex(6)(2),vertex(7)(2) };
+double s4, s5, s8;
+const double unknown0 = s1*s2;
+const double unknown1 = s1*s2;
+s8 = -z[2]*x[1]*y[2]*z[5]+z[2]*y[1]*x[2]*z[5]-z[2]*z[1]*x[2]*y[5]+z[2]*z[1]*x[5]*y[2]+2.0*y[5]*x[7]*z[4]*z[4]-y[1]*x[2]*z[0]*z[0]+x[0]*y[3]*z[7]*z[7]-2.0*z[5]*z[5]*x[4]*y[1]+2.0*z[5]*z[5]*x[1]*y[4]+z[5]*z[5]*x[0]*y[4]-2.0*z[2]*z[2]*x[1]*y[3]+2.0*z[2]*z[2]*x[3]*y[1]-x[0]*y[4]*z[7]*z[7]-y[0]*x[3]*z[7]*z[7]+x[1]*y[0]*z[5]*z[5];
+s5 = s8

[PATCH][ARM][3/4] Cleanup casts from INTVAL to [unsigned] HOST_WIDE_INT

2016-05-24 Thread Kyrill Tkachov

Hi all,

We have a few instances in the arm backend where we take the INTVAL of an RTX 
and immediately cast it to
an (unsigned HOST_WIDE_INT). This is exactly equivalent to taking the UINTVAL 
of the RTX.

This patch fixes such uses. A couple of uses in arm.md take the INTVAL and then 
compare it to the constant
1 which can be replaced by a comparison with CONST1_RTX without extracting the 
INTVAL.
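The key identity is that UINTVAL (x) is defined as INTVAL (x) reinterpreted as unsigned HOST_WIDE_INT, so the two spellings behave identically even for negative constants. A sketch with stand-in types (these are not GCC's actual rtl.h definitions):

```c
#include <assert.h>

typedef long long hwi;		/* stand-in for HOST_WIDE_INT */
typedef unsigned long long uhwi;

/* Stand-ins for INTVAL/UINTVAL applied to a "constant" C.  */
static hwi
intval (hwi c)
{
  return c;
}

static uhwi
uintval (hwi c)
{
  /* UINTVAL is just the unsigned reinterpretation of INTVAL.  */
  return (uhwi) c;
}
```

Negative values wrap to huge unsigned values, so an out-of-range shift count like -1 is caught by a `> 31` test under either spelling.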

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing as obvious.

Thanks,
Kyrill

2016-05-24  Kyrylo Tkachov  

* config/arm/arm.md (ashldi3): Replace comparison of INTVAL of
operands[2] against 1 with comparison against CONST1_RTX.
(ashrdi3): Likewise.
(lshrdi3): Likewise.
(ashlsi3): Replace cast of INTVAL to unsigned HOST_WIDE_INT with
UINTVAL.
(ashrsi3): Likewise.
(lshrsi3): Likewise.
(rotrsi3): Likewise.
(define_split above *compareqi_eq0): Likewise.
(define_split above "prologue"): Likewise.
* config/arm/arm.c (thumb1_size_rtx_costs): Likewise.
* config/arm/predicates.md (shift_operator): Likewise.
(shift_nomul_operator): Likewise.
(sat_shift_operator): Likewise.
(thumb1_cmp_operand): Likewise.
(const_neon_scalar_shift_amount_operand): Replace manual range
check with IN_RANGE.
* config/arm/thumb1.md (define_peephole2 above *thumb_subdi3):
Replace cast of INTVAL to unsigned HOST_WIDE_INT with UINTVAL.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 78478303593522d186734c452c970fb013bf846e..55b3a82618ef4138573baad3f0654162a33e1032 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8986,7 +8986,7 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
 case CONST_INT:
   if (outer == SET)
 {
-  if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)
+  if (UINTVAL (x) < 256)
 return COSTS_N_INSNS (1);
 	  /* See split "TARGET_THUMB1 && satisfies_constraint_J".  */
 	  if (INTVAL (x) >= -255 && INTVAL (x) <= -1)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8c63bf7b75c4e84283ffee471375389f5a5b1a34..e78ede8945fb2d0c0ac5a5af7b96a64d061cf5c3 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3761,8 +3761,7 @@ (define_expand "ashldi3"
 {
   rtx scratch1, scratch2;
 
-  if (CONST_INT_P (operands[2])
-	  && (HOST_WIDE_INT) INTVAL (operands[2]) == 1)
+  if (operands[2] == CONST1_RTX (SImode))
 {
   emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
   DONE;
@@ -3807,7 +3806,7 @@ (define_expand "ashlsi3"
   "TARGET_EITHER"
   "
   if (CONST_INT_P (operands[2])
-  && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) > 31)
+  && (UINTVAL (operands[2])) > 31)
 {
   emit_insn (gen_movsi (operands[0], const0_rtx));
   DONE;
@@ -3835,8 +3834,7 @@ (define_expand "ashrdi3"
 {
   rtx scratch1, scratch2;
 
-  if (CONST_INT_P (operands[2])
-	  && (HOST_WIDE_INT) INTVAL (operands[2]) == 1)
+  if (operands[2] == CONST1_RTX (SImode))
 {
   emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
   DONE;
@@ -3881,7 +3879,7 @@ (define_expand "ashrsi3"
   "TARGET_EITHER"
   "
   if (CONST_INT_P (operands[2])
-  && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) > 31)
+  && UINTVAL (operands[2]) > 31)
 operands[2] = GEN_INT (31);
   "
 )
@@ -3906,8 +3904,7 @@ (define_expand "lshrdi3"
 {
   rtx scratch1, scratch2;
 
-  if (CONST_INT_P (operands[2])
-	  && (HOST_WIDE_INT) INTVAL (operands[2]) == 1)
+  if (operands[2] == CONST1_RTX (SImode))
 {
   emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
   DONE;
@@ -3952,7 +3949,7 @@ (define_expand "lshrsi3"
   "TARGET_EITHER"
   "
   if (CONST_INT_P (operands[2])
-  && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) > 31)
+  && (UINTVAL (operands[2])) > 31)
 {
   emit_insn (gen_movsi (operands[0], const0_rtx));
   DONE;
@@ -3986,7 +3983,7 @@ (define_expand "rotrsi3"
   if (TARGET_32BIT)
 {
   if (CONST_INT_P (operands[2])
-  && ((unsigned HOST_WIDE_INT) INTVAL (operands[2])) > 31)
+  && UINTVAL (operands[2]) > 31)
 operands[2] = GEN_INT (INTVAL (operands[2]) % 32);
 }
   else /* TARGET_THUMB1 */
@@ -5129,7 +5126,7 @@ (define_split
 		 (match_operator 5 "subreg_lowpart_operator"
 		  [(match_operand:SI 4 "s_register_operand" "")]]
   "TARGET_32BIT
-   && ((unsigned HOST_WIDE_INT) INTVAL (operands[3])
+   && (UINTVAL (operands[3])
== (GET_MODE_MASK (GET_MODE (operands[5]))
& (GET_MODE_MASK (GET_MODE (operands[5]))
	  << (INTVAL (operands[2])))))"
@@ -10187,8 +10184,8 @@ (define_split
 	 (match_operand 1 "const_int_operand" "")))
(clobber (match_scratch:SI 2 ""))]
   "TARGET_ARM
-   && (((unsigned HOST_WIDE_INT) INTVAL (operands[1]))
-   == (((unsigned HOST_WIDE_INT) INTVAL (operands[1])) >> 24) << 24)"
+   && ((UINTVA

Re: [PATCH v3] gcov: Runtime configurable destination output

2016-05-24 Thread Nathan Sidwell

On 05/23/16 16:03, Aaron Conole wrote:

The previous gcov behavior was to always output errors on the stderr channel.
This is fine for most uses, but some programs will require stderr to be
untouched by libgcov for certain tests. This change allows configuring
the gcov output via an environment variable which will be used to open
the appropriate file.


this patch is nearly there, but a couple of nits and an error on my part.



+/* Configured via the GCOV_ERROR_FILE environment variable;
+   it will either be stderr, or a file of the user's choosing. */
+static FILE *gcov_error_file;


I was wrong about making this static.  Your original externally visible 
definition (with leading __) was right.  The reason is that multiple gcov-aware 
shared objects should use the same FILE for errors.  If you could restore that 
part of your previous patch, along  with a comment explaining why 
gcov_error_file is externally visible, but get_gcov_error is static, that'd be 
great.



+
+/* A utility function to populate the gcov_error_file pointer */
+
+static FILE *
+get_gcov_error_file(void)
+{
+#if IN_GCOV_TOOL
+  return stderr;
+#endif


Prefer #else ... #endif to encapsulate the  remaining bit of the function.
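The suggested shape, sketched as a stand-alone approximation (the getenv/fopen fallback here is a guess at the patch's non-gcov-tool branch, not the actual submitted code):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define IN_GCOV_TOOL 0		/* stand-in for the build-time macro */

/* Reviewer's suggested structure: wrap the remaining body in
   #else ... #endif rather than falling through after an early
   return.  */
static FILE *
get_gcov_error_file (void)
{
#if IN_GCOV_TOOL
  return stderr;
#else
  const char *name = getenv ("GCOV_ERROR_FILE");
  FILE *f = name ? fopen (name, "a") : NULL;
  /* If the variable is unset or the file cannot be opened, fall
     back to stderr, preserving the old behavior.  */
  return f ? f : stderr;
#endif
}
```

With the #else in place both configurations of IN_GCOV_TOOL compile the function as a single expression of the policy, instead of relying on unreachable code after the early return.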



 /* A utility function for outputing errors.  */


May as well fix the spelling error
  outputing -> outputting



+#if !IN_GCOV_TOOL
+static void
+gcov_error_exit(void)
+{
+  if (gcov_error_file && gcov_error_file != stderr)
+{
+  fclose(gcov_error_file);


needs space -- the habit'll grow



+#if !IN_GCOV_TOOL
+static void gcov_error_exit(void);


space before '('

nathan


Re: [PATCH] Vectorize inductions that are live after the loop.

2016-05-24 Thread Richard Biener
On Mon, May 23, 2016 at 2:53 PM, Alan Hayward  wrote:
>
> Thanks for the review.
>
> On 23/05/2016 11:35, "Richard Biener"  wrote:
>
>>
>>@@ -6332,79 +6324,81 @@ vectorizable_live_operation (gimple *stmt,
>>   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>>   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>>   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>>-  tree op;
>>-  gimple *def_stmt;
>>-  ssa_op_iter iter;
>>+  imm_use_iterator imm_iter;
>>+  tree lhs, lhs_type, vec_lhs;
>>+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>>+  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
>>+  int ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
>>+  gimple *use_stmt;
>>
>>   gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
>>
>>+  if (STMT_VINFO_TYPE (stmt_info) == reduc_vec_info_type)
>>+return true;
>>+
>>
>>This is an odd check - it says the stmt is handled by
>>vectorizable_reduction.  And your
>>return claims it is handled by vectorizable_live_operation ...
>
> Previously this check was made to decide whether to call
> vectorizable_live_operation,
> So it made sense to put this check inside the function.
>
> But, yes, I agree that the return value of the function no longer makes
> sense.
> I can revert this.

Please.

>>
>>You removed the SIMD lane handling?
>
> The SIMD lane handling effectively checked for a special case, then added
> code which would extract the final value of the vector.
> The new code I’ve added does the exact same thing for more generic cases,
> so the SIMD check can be removed and it’ll still be vectorized correctly.

Ah, that's nice then.
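A minimal shape of the code this patch targets: an induction variable whose final value is used after the loop, so the vectorizer must extract the last lane of the vectorized IV rather than give up on the statement (the function below is illustrative, not taken from the patch):

```c
#include <assert.h>

/* 'i' is an induction variable that is live after the loop: its
   final value is the function's return value.  */
static int
fill_and_count (int *a, int n)
{
  int i;
  for (i = 0; i < n; i++)
    a[i] = 2 * i;
  return i;			/* live-after-loop use of the IV */
}
```

Before the patch only the SIMD-lane special case handled such live uses; the generalized extraction covers this pattern too.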

>>
>>@@ -303,6 +335,16 @@ vect_stmt_relevant_p (gimple *stmt, loop_vec_info
>>loop_vinfo,
>>}
>> }
>>
>>+  if (*live_p && *relevant == vect_unused_in_scope
>>+  && !is_simple_and_all_uses_invariant (stmt, loop_vinfo))
>>+{
>>+  if (dump_enabled_p ())
>>+   dump_printf_loc (MSG_NOTE, vect_location,
>>+"vec_stmt_relevant_p: live and not all uses "
>>+"invariant.\n");
>>+  *relevant = vect_used_only_live;
>>+}
>>
>>But that's a missed invariant motion / code sinking opportunity then.
>>Did you have a
>>testcase for this?
>
> I don’t have a test case :(
> It made sense that this was the correct action to do on the failure
> (rather than assert).

I meant a testcase that has is_simple_and_all_uses_invariant == true.

>>
>>@@ -618,57 +660,31 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info
>>loop_vinfo)
>>}
>>
>>   /* Examine the USEs of STMT. For each USE, mark the stmt that
>>defines it
>>-(DEF_STMT) as relevant/irrelevant and live/dead according to the
>>-liveness and relevance properties of STMT.  */
>>+(DEF_STMT) as relevant/irrelevant according to the relevance
>>property
>>+of STMT.  */
>>   stmt_vinfo = vinfo_for_stmt (stmt);
>>   relevant = STMT_VINFO_RELEVANT (stmt_vinfo);
>>-  live_p = STMT_VINFO_LIVE_P (stmt_vinfo);
>>-
>>-  /* Generally, the liveness and relevance properties of STMT are
>>-propagated as is to the DEF_STMTs of its USEs:
>>- live_p <-- STMT_VINFO_LIVE_P (STMT_VINFO)
>>- relevant <-- STMT_VINFO_RELEVANT (STMT_VINFO)
>>-
>>-One exception is when STMT has been identified as defining a
>>reduction
>>-variable; in this case we set the liveness/relevance as follows:
>>-  live_p = false
>>-  relevant = vect_used_by_reduction
>>-This is because we distinguish between two kinds of relevant
>>stmts -
>>-those that are used by a reduction computation, and those that
>>are
>>-(also) used by a regular computation.  This allows us later on to
>>-identify stmts that are used solely by a reduction, and
>>therefore the
>>-order of the results that they produce does not have to be kept.
>> */
>>-
>>-  def_type = STMT_VINFO_DEF_TYPE (stmt_vinfo);
>>-  tmp_relevant = relevant;
>>-  switch (def_type)
>>+
>>+  switch (STMT_VINFO_DEF_TYPE (stmt_vinfo))
>> {
>>
>>you removed this comment.  Is it no longer valid?  Can you please
>>instead update it?
>>This is a tricky area.
>
> I’ll replace with a new comment.
>
>>
>>
>>@@ -1310,17 +1325,14 @@ vect_init_vector (gimple *stmt, tree val, tree
>>type, gimple_stmt_iterator *gsi)
>>In case OP is an invariant or constant, a new stmt that creates a
>>vector def
>>needs to be introduced.  VECTYPE may be used to specify a required
>>type for
>>vector invariant.  */
>>-
>>-tree
>>-vect_get_vec_def_for_operand (tree op, gimple *stmt, tree vectype)
>>+static tree
>>+vect_get_vec_def_for_operand_internal (tree op, gimple *stmt,
>>+  loop_vec_info loop_vinfo, tree
>>vectype)
>> {
>>   tree vec_oprnd;
>>...
>>
>>+tree
>>+vect_get_vec_def_for_operand (tree op, gimple *stmt, tree vectype)
>>+{
>>+  stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
>>+  loop_vec_info loop_vinfo = STMT_

Re: RFC [1/2] divmod transform

2016-05-24 Thread Prathamesh Kulkarni
On 24 May 2016 at 17:42, Richard Biener  wrote:
> On Tue, 24 May 2016, Prathamesh Kulkarni wrote:
>
>> On 23 May 2016 at 17:35, Richard Biener  wrote:
>> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni
>> >  wrote:
>> >> Hi,
>> >> I have updated my patch for divmod (attached), which was originally
>> >> based on Kugan's patch.
>> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> >> having same operands to divmod representation, so we can cse computation 
>> >> of mod.
>> >>
>> >> t1 = a TRUNC_DIV_EXPR b;
>> >> t2 = a TRUNC_MOD_EXPR b
>> >> is transformed to:
>> >> complex_tmp = DIVMOD (a, b);
>> >> t1 = REALPART_EXPR (complex_tmp);
>> >> t2 = IMAGPART_EXPR (complex_tmp);
>> >>
>> >> * New hook divmod_expand_libfunc
>> >> The rationale for introducing the hook is that different targets have
>> >> incompatible calling conventions for divmod libfunc.
>> >> Currently three ports define divmod libfunc: c6x, spu and arm.
>> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:
>> >> return quotient and store remainder in argument passed as pointer,
>> >> while the arm version takes two arguments and returns both
>> >> quotient and remainder having mode double the size of the operand mode.
>> >> The port should hence override the hook expand_divmod_libfunc
>> >> to generate call to target-specific divmod.
>> >> Ports should define this hook if:
>> >> a) The port does not have divmod or div insn for the given mode.
>> >> b) The port defines divmod libfunc for the given mode.
>> >> The default hook default_expand_divmod_libfunc() generates call
>> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and
>> >> are of DImode.
>> >>
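The two libfunc conventions being contrasted can be sketched side by side; both helpers below are local stand-ins for illustration (the real interfaces are libgcc's __udivmoddi4 and ARM's __aeabi_uldivmod):

```c
#include <assert.h>

/* Convention 1 (libgcc2.c:__udivmoddi4 style, also c6x and spu):
   return the quotient, store the remainder through a pointer.  */
static unsigned long long
udivmod_ptr (unsigned long long n, unsigned long long d,
	     unsigned long long *rp)
{
  *rp = n % d;
  return n / d;
}

/* Convention 2 (arm style): return quotient and remainder packed
   in a value twice the operand width, modelled here as a struct.  */
struct qr
{
  unsigned long long quot, rem;
};

static struct qr
udivmod_pair (unsigned long long n, unsigned long long d)
{
  struct qr v = { n / d, n % d };
  return v;
}
```

Because the two calling conventions are incompatible at the RTL-expansion level, a target hook lets each port emit the call shape its libfunc expects.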
>> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and
>> >> cross-tested on arm*-*-*.
>> >> Bootstrap+test in progress on arm-linux-gnueabihf.
>> >> Does this patch look OK ?
>> >
>> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> > index 6b4601b..e4a021a 100644
>> > --- a/gcc/targhooks.c
>> > +++ b/gcc/targhooks.c
>> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,
>> > machine_mode, optimization_type)
>> >return true;
>> >  }
>> >
>> > +void
>> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,
>> > +  rtx op0, rtx op1,
>> > +  rtx *quot_p, rtx *rem_p)
>> >
>> > functions need a comment.
>> >
>> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In 
>> > that
>> > case we could avoid the target hook.
>> Well I would prefer adding the hook because that's easier ;-)
>> Would it be ok for now to go with the hook ?
>> >
>> > +  /* If target overrides expand_divmod_libfunc hook
>> >> +then perform divmod by generating call to the target-specific 
>> > divmod
>> > libfunc.  */
>> > +  if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)
>> > +   return true;
>> > +
>> > +  /* Fall back to using libgcc2.c:__udivmoddi4.  */
>> > +  return (mode == DImode && unsignedp);
>> >
>> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'
>> > but still restrict this to DImode && unsigned?  Also if
>> > targetm.expand_divmod_libfunc
>> > is not the default we expect the target to handle all modes?
>> Ah indeed, the check for DImode is unnecessary.
>> However I suppose the check for unsignedp should be there,
>> since we want to generate call to __udivmoddi4 only if operand is unsigned ?
>
> The optab libfunc for sdivmod should be NULL in that case.
Ah indeed, thanks.
>
>> >
>> > That said - I expected the above piece to be simply a 'return true;' ;)
>> >
>> > Usually we use some can_expand_XXX helper in optabs.c to query if the 
>> > target
>> > supports a specific operation (for example SImode divmod would use DImode
>> > divmod by means of widening operands - for the unsigned case of course).
>> Thanks for pointing out. So if a target does not support divmod
>> libfunc for a mode
>> but for a wider mode, then we could zero-extend operands to the wider-mode,
>> perform divmod on the wider-mode, and then cast result back to the
>> original mode.
>> I haven't done that in this patch, would it be OK to do that as a follow up ?
>
> I think that you should conservatively handle the div_optab query, thus if
> the target has a HW division in a wider mode don't use the divmod IFN.
> You'd simply iterate over GET_MODE_WIDER_MODE and repeat the
> if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing
> out if that is available.
Done.
>
>> > +  /* Disable the transform if either is a constant, since
>> > division-by-constant
>> > + may have specialized expansion.  */
>> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
>> > +return false;
>> >
>> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)
>> >
>> > +  if (TYPE_OVERFLOW_TRAPS (type))
>> > +return false;
>> >
>> > why's that?  Generally please first test ch

  1   2   >