[PATCH] Fix PR69157

2016-01-12 Thread Richard Biener

With the work-around-limited-IL patch I put in earlier we now need
to cope with dt_external stmts (in self-referencing SLPs).  Thus
restrict the def-type check to the analysis phase (a larger refactoring
to split the analysis and transform phases more properly is not appropriate
at this stage).
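The shape of the guard repeated across all the vectorizable_* routines below can be sketched with a toy model (not GCC's actual API; the key point is that a NULL vec_stmt distinguishes the analysis phase from the transform phase):

```python
# Toy model of the patched guard (names mirror GCC's, but this is not GCC code).
# During analysis vec_stmt is None; during transform it points at the output slot.
def vectorizable_op(def_type, vec_stmt):
    # Reject non-internal defs only while analyzing; at transform time
    # dt_external stmts in self-referencing SLP instances must be accepted.
    if def_type != "vect_internal_def" and vec_stmt is None:
        return False
    return True
```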

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-12  Richard Biener  

PR tree-optimization/69157
* tree-vect-stmts.c (vectorizable_mask_load_store): Check
stmts def type only during analyze phase.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.

* gcc.dg/torture/pr69157.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 232213)
--- gcc/tree-vect-stmts.c   (working copy)
*** vectorizable_mask_load_store (gimple *st
*** 1757,1763 
if (!STMT_VINFO_RELEVANT_P (stmt_info))
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
if (!STMT_VINFO_DATA_REF (stmt_info))
--- 1760,1767 
if (!STMT_VINFO_RELEVANT_P (stmt_info))
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
if (!STMT_VINFO_DATA_REF (stmt_info))
*** vectorizable_call (gimple *gs, gimple_st
*** 2206,2212 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
/* Is GS a vectorizable call?   */
--- 2210,2217 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
/* Is GS a vectorizable call?   */
*** vectorizable_simd_clone_call (gimple *st
*** 2811,2817 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
if (gimple_call_lhs (stmt)
--- 2816,2823 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
if (gimple_call_lhs (stmt)
*** vectorizable_conversion (gimple *stmt, g
*** 3669,3675 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
if (!is_gimple_assign (stmt))
--- 3675,3682 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
if (!is_gimple_assign (stmt))
*** vectorizable_assignment (gimple *stmt, g
*** 4246,4252 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
/* Is vectorizable assignment?  */
--- 4253,4260 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
/* Is vectorizable assignment?  */
*** vectorizable_shift (gimple *stmt, gimple
*** 4462,4468 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
/* Is STMT a vectorizable binary/unary operation?   */
--- 4470,4477 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
/* Is STMT a vectorizable binary/unary operation?   */
*** vectorizable_operation (gimple *stmt, gi
*** 4823,4829 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
  return false;
  
/* Is STMT a vectorizable binary/unary operation?   */
--- 4832,4839 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
!   && ! vec_stmt)
  return false;
  
/* Is STMT a vectorizable binary/unary operation?   */
*** vectorizable_store (gimple *stmt, gimple
*** 5248,5254 
if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
  return false;
  
!   i

[PATCH] Fix PR69174

2016-01-12 Thread Richard Biener

This fixes an oversight in strided permuted SLP loads which miscomputed
the number of required loads.
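The corrected computation is a ceiling division over the whole group; a hedged sketch with made-up numbers (group size, vectorization factor and lane count below are illustrative, not taken from the PR):

```python
def loads_for_permuted_slp(group_size, vf, nunits):
    """Ceiling division mirroring the patched formula: for SLP permutations
    the whole group must be loaded, not only the number of vector stmts the
    permutation result fits in."""
    return (group_size * vf + nunits - 1) // nunits

# A group of 3 strided loads, vectorization factor 4, 4 lanes per vector:
# 12 scalars require 3 vector loads.
print(loads_for_permuted_slp(3, 4, 4))  # 3
```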

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-12  Richard Biener  

PR tree-optimization/69174
* tree-vect-stmts.c (vect_mark_relevant): Remove excessive vertical
space.
(vectorizable_load): Properly compute the number of loads needed
for permuted strided SLP loads and do not spuriously assign
to SLP_TREE_VEC_STMTS.

* gcc.dg/torture/pr69174.c: New testcase.

Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 232213)
--- gcc/tree-vect-stmts.c   (working copy)
*** vect_mark_relevant (vec *workl
*** 190,197 
gimple *pattern_stmt;
  
if (dump_enabled_p ())
! dump_printf_loc (MSG_NOTE, vect_location,
!  "mark relevant %d, live %d.\n", relevant, live_p);
  
/* If this stmt is an original stmt in a pattern, we might need to mark its
   related pattern stmt instead of the original stmt.  However, such stmts
--- 190,200 
gimple *pattern_stmt;
  
if (dump_enabled_p ())
! {
!   dump_printf_loc (MSG_NOTE, vect_location,
!  "mark relevant %d, live %d: ", relevant, live_p);
!   dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
! }
  
/* If this stmt is an original stmt in a pattern, we might need to mark its
   related pattern stmt instead of the original stmt.  However, such stmts
*** vectorizable_load (gimple *stmt, gimple_
*** 6748,6756 
  else
ltype = vectype;
  ltype = build_aligned_type (ltype, TYPE_ALIGN (TREE_TYPE (vectype)));
! ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
  if (slp_perm)
!   dr_chain.create (ncopies);
}
for (j = 0; j < ncopies; j++)
{
--- 6751,6766 
  else
ltype = vectype;
  ltype = build_aligned_type (ltype, TYPE_ALIGN (TREE_TYPE (vectype)));
! /* For SLP permutation support we need to load the whole group,
!not only the number of vector stmts the permutation result
!fits in.  */
  if (slp_perm)
!   {
! ncopies = (group_size * vf + nunits - 1) / nunits;
! dr_chain.create (ncopies);
!   }
! else
!   ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
}
for (j = 0; j < ncopies; j++)
{
*** vectorizable_load (gimple *stmt, gimple_
*** 6798,6806 
  
  if (slp)
{
- SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
  if (slp_perm)
dr_chain.quick_push (gimple_assign_lhs (new_stmt));
}
  else
{
--- 6808,6817 
  
  if (slp)
{
  if (slp_perm)
dr_chain.quick_push (gimple_assign_lhs (new_stmt));
+ else
+   SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
}
  else
{
Index: gcc/testsuite/gcc.dg/torture/pr69174.c
===
*** gcc/testsuite/gcc.dg/torture/pr69174.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr69174.c  (working copy)
***
*** 0 
--- 1,19 
+ /* { dg-do compile } */
+ 
+ typedef int pixval;
+ typedef struct { pixval r, g, b; } xel;
+ int convertRow_sample, convertRaster_col;
+ short *convertRow_samplebuf;
+ xel *convertRow_xelrow;
+ short convertRow_spp;
+ void fn1() {
+ int *alpharow;
+ for (; convertRaster_col;
+++convertRaster_col, convertRow_sample += convertRow_spp) {
+   convertRow_xelrow[convertRaster_col].r =
+   convertRow_xelrow[convertRaster_col].g =
+   convertRow_xelrow[convertRaster_col].b =
+   convertRow_samplebuf[convertRow_sample];
+   alpharow[convertRaster_col] = convertRow_samplebuf[convertRow_sample + 3];
+ }
+ }


[PATCH] Fix PR69168

2016-01-12 Thread Richard Biener

The following fixes the assumption that we consistently have patterns
used in SLP.  That's not true given we skip them if the original
def is live or relevant during SLP analysis.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-12  Richard Biener  

PR tree-optimization/69168
* tree-vect-loop.c (vect_analyze_loop_2): Reset both main and
pattern stmt SLP type.
* tree-vect-slp.c (vect_detect_hybrid_slp_stmts): Patterns may
end up unused so cope with that case.

* gcc.dg/torture/pr69168.c: New testcase.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c(revision 232231)
+++ gcc/tree-vect-loop.c(working copy)
@@ -2189,10 +2189,11 @@ again:
   !gsi_end_p (si); gsi_next (&si))
{
  stmt_vec_info stmt_info = vinfo_for_stmt (gsi_stmt (si));
+ STMT_SLP_TYPE (stmt_info) = loop_vect;
  if (STMT_VINFO_IN_PATTERN_P (stmt_info))
{
- gcc_assert (STMT_SLP_TYPE (stmt_info) == loop_vect);
  stmt_info = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (stmt_info));
+ STMT_SLP_TYPE (stmt_info) = loop_vect;
  for (gimple_stmt_iterator pi
 = gsi_start (STMT_VINFO_PATTERN_DEF_SEQ (stmt_info));
   !gsi_end_p (pi); gsi_next (&pi))
@@ -2201,7 +2202,6 @@ again:
  STMT_SLP_TYPE (vinfo_for_stmt (pstmt)) = loop_vect;
}
}
- STMT_SLP_TYPE (stmt_info) = loop_vect;
}
 }
   /* Free optimized alias test DDRS.  */
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 232231)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2016,10 +2016,10 @@ vect_detect_hybrid_slp_stmts (slp_tree n
 {
   /* Check if a pure SLP stmt has uses in non-SLP stmts.  */
   gcc_checking_assert (PURE_SLP_STMT (stmt_vinfo));
-  /* We always get the pattern stmt here, but for immediate
-uses we have to use the LHS of the original stmt.  */
-  gcc_checking_assert (!STMT_VINFO_IN_PATTERN_P (stmt_vinfo));
-  if (STMT_VINFO_RELATED_STMT (stmt_vinfo))
+  /* If we get a pattern stmt here we have to use the LHS of the
+ original stmt for immediate uses.  */
+  if (! STMT_VINFO_IN_PATTERN_P (stmt_vinfo)
+ && STMT_VINFO_RELATED_STMT (stmt_vinfo))
stmt = STMT_VINFO_RELATED_STMT (stmt_vinfo);
   if (TREE_CODE (gimple_op (stmt, 0)) == SSA_NAME)
FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, gimple_op (stmt, 0))
Index: gcc/testsuite/gcc.dg/torture/pr69168.c
===
--- gcc/testsuite/gcc.dg/torture/pr69168.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr69168.c  (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+long a, b, e;
+short *c;
+int *d;
+void fn1()
+{
+  int i;
+  for (; e; e--)
+{
+  i = 2;
+  for (; i; i--)
+   a = b = *d++ / (1 << 9);
+  b = b ? 8 : a;
+  *c++ = *c++ = b;
+}
+}


[PATCH, testsuite] Stabilize test result output of dump-noaddr

2016-01-12 Thread Thomas Preud'homme
Hi,

Every time the static pass number of a pass changes, the testsuite output for
dump-noaddr will change, leading to a series of noise lines like the following
under dg-cmp-results:

PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -O1  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -O2  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -O2 -flto -fno-use-linker-plugin -flto-partition=none  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -O3 -g  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -Og -g  comparison
PASS->NA: gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1,  -Os  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -O1  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -O2  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -O2 -flto -fno-use-linker-plugin -flto-partition=none  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -O3 -g  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -Og -g  comparison
NA->PASS: gcc.c-torture/unsorted/dump-noaddr.c.034t.fre1,  -Os  comparison

This patch solves this problem by replacing the static pass number in the
output by a star, allowing for a stable output while retaining easy
copy/pasting in shell.
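The effect of the substitution can be illustrated as follows (in Python rather than Tcl, purely for demonstration; the harness itself uses Tcl's regsub):

```python
import re

# Same pattern as the Tcl regsub in dump-noaddr.x: strip the static pass
# number while keeping the t/r/i kind letter and the pass name.
def stabilize(dumptail):
    return re.sub(r"\.\d+((t|r|i)\.[^.]+)$", r".*\1", dumptail)

print(stabilize("gcc.c-torture/unsorted/dump-noaddr.c.036t.fre1"))
# gcc.c-torture/unsorted/dump-noaddr.c.*t.fre1
```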

ChangeLog entry is as follows:


*** gcc/testsuite/ChangeLog ***

2015-12-30  Thomas Preud'homme  

* gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Replace static
pass number in output by a star.


diff --git a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
index a8174e0..001dd6b 100644
--- a/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
+++ b/gcc/testsuite/gcc.c-torture/unsorted/dump-noaddr.x
@@ -18,6 +18,7 @@ proc dump_compare { src options } {
foreach dump1 [lsort [glob -nocomplain dump1/*]] {
regsub dump1/ $dump1 dump2/ dump2
set dumptail "gcc.c-torture/unsorted/[file tail $dump1]"
+   regsub {\.\d+((t|r|i)\.[^.]+)$} $dumptail {.*\1} dumptail
#puts "$option $dump1"
set tmp [ diff "$dump1" "$dump2" ]
if { $tmp == 0 } {


Is this ok for stage3?

Best regards,

Thomas


RE: [PATCH, libgcc/ARM 1/6] Fix Thumb-1 only == ARMv6-M & Thumb-2 only == ARMv7-M assumptions

2016-01-12 Thread Thomas Preud'homme
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Thursday, December 17, 2015 1:58 PM
> 
> Hi,
> 
> We decided to apply the following patch to the ARM embedded 5 branch.
> This is *not* intended for trunk for now. We will send a separate email
> for trunk.

And now a rebased patch on top of trunk.


> 
> This patch is part of a patch series to add support for ARMv8-M [1] to GCC.
> This specific patch fixes some assumptions related to M profile
> architectures.  Currently GCC (mostly libgcc) contains several assumptions
> that the only ARM architecture with Thumb-1 only instructions is ARMv6-M
> and the only one with Thumb-2 only instructions is ARMv7-M.  ARMv8-M [1]
> makes this wrong since ARMv8-M baseline is also (mostly) Thumb-1 only and
> ARMv8-M mainline is also Thumb-2 only.  This patch replaces checks for
> __ARM_ARCH_*__ with checks against __ARM_ARCH_ISA_THUMB and
> __ARM_ARCH_ISA_ARM instead.  For instance, Thumb-1 only can be checked
> with #if !defined(__ARM_ARCH_ISA_ARM) && (__ARM_ARCH_ISA_THUMB == 1).
> It also fixes the guard for DIV code to not apply to ARMv8-M Baseline
> since it uses Thumb-2 instructions.
> 
> [1] For a quick overview of ARMv8-M please refer to the initial cover
> letter.
> 
> ChangeLog entries are as follow:
> 
> 

*** gcc/ChangeLog ***

2015-11-13  Thomas Preud'homme  

* config/arm/elf.h: Use __ARM_ARCH_ISA_THUMB and __ARM_ARCH_ISA_ARM to
decide whether to prevent some libgcc routines being included for some
multilibs rather than __ARM_ARCH_6M__ and add comment to indicate the
link between this condition and the one in
libgcc/config/arm/lib1func.S.
* config/arm/arm.h (TARGET_ARM_V6M): Add check to TARGET_ARM_ARCH.
(TARGET_ARM_V7M): Likewise.


*** gcc/testsuite/ChangeLog ***

2015-11-10  Thomas Preud'homme  

* lib/target-supports.exp (check_effective_target_arm_cortex_m): Use
__ARM_ARCH_ISA_ARM to test for Cortex-M devices.


*** libgcc/ChangeLog ***

2015-12-17  Thomas Preud'homme  

* config/arm/bpabi-v6m.S: Fix header comment to mention Thumb-1 rather
than ARMv6-M.
* config/arm/lib1funcs.S (__prefer_thumb__): Define among other cases
for all Thumb-1 only targets.
(__only_thumb1__): Define for all Thumb-1 only targets.
(THUMB_LDIV0): Test for __only_thumb1__ rather than __ARM_ARCH_6M__.
(EQUIV): Likewise.
(ARM_FUNC_ALIAS): Likewise.
(umodsi3): Add check to __only_thumb1__ to guard the idiv version.
(modsi3): Likewise.
(HAVE_ARM_CLZ): Remove block defining it.
(clzsi2): Test for __only_thumb1__ rather than __ARM_ARCH_6M__ and
check __ARM_FEATURE_CLZ instead of HAVE_ARM_CLZ.
(clzdi2): Likewise.
(ctzsi2): Likewise.
(L_interwork_call_via_rX): Test for __ARM_ARCH_ISA_ARM rather than
__ARM_ARCH_6M__ in guard for checking whether it is defined.
(final includes): Test for __only_thumb1__ rather than
__ARM_ARCH_6M__ and add comment to indicate the connection between
this condition and the one in gcc/config/arm/elf.h.
* config/arm/libunwind.S: Test for __ARM_ARCH_ISA_THUMB and
__ARM_ARCH_ISA_ARM rather than __ARM_ARCH_6M__.
* config/arm/t-softfp: Likewise.
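The feature-based condition used throughout these changes boils down to a small truth table; a sketch (macro availability modeled with plain booleans and integers, not the real preprocessor):

```python
# Thumb-1-only per the quoted check:
# #if !defined(__ARM_ARCH_ISA_ARM) && (__ARM_ARCH_ISA_THUMB == 1)
def thumb1_only(has_arm_isa, thumb_isa_version):
    return not has_arm_isa and thumb_isa_version == 1

print(thumb1_only(False, 1))  # True: ARMv6-M and ARMv8-M baseline
print(thumb1_only(False, 2))  # False: ARMv7-M / ARMv8-M mainline
print(thumb1_only(True, 2))   # False: profiles that also have the ARM ISA
```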


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index fd999dd..0d23f39 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2182,8 +2182,10 @@ extern int making_const_table;
 #define TARGET_ARM_ARCH\
   (arm_base_arch)  \
 
-#define TARGET_ARM_V6M (!arm_arch_notm && !arm_arch_thumb2)
-#define TARGET_ARM_V7M (!arm_arch_notm && arm_arch_thumb2)
+#define TARGET_ARM_V6M (TARGET_ARM_ARCH == BASE_ARCH_6M && !arm_arch_notm \
+   && !arm_arch_thumb2)
+#define TARGET_ARM_V7M (TARGET_ARM_ARCH == BASE_ARCH_7M && !arm_arch_notm \
+   && arm_arch_thumb2)
 
 /* The highest Thumb instruction set version supported by the chip.  */
 #define TARGET_ARM_ARCH_ISA_THUMB  \
diff --git a/gcc/config/arm/elf.h b/gcc/config/arm/elf.h
index 3795728..579a580 100644
--- a/gcc/config/arm/elf.h
+++ b/gcc/config/arm/elf.h
@@ -148,8 +148,9 @@
   while (0)
 
 /* Horrible hack: We want to prevent some libgcc routines being included
-   for some multilibs.  */
-#ifndef __ARM_ARCH_6M__
+   for some multilibs.  The condition should match the one in
+   libgcc/config/arm/lib1funcs.S.  */
+#if __ARM_ARCH_ISA_ARM || __ARM_ARCH_ISA_THUMB != 1
 #undef L_fixdfsi
 #undef L_fixunsdfsi
 #undef L_truncdfsf2
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 4e349e9..3f96826 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3221,10 +3221,8 @@ proc check_effective_target_arm_cortex_m { } {
return 0
 }
 return [check_no_compiler_messages arm_cortex_m assembly {
-

Re: [PATCH, ARM] Fix target/69180: #pragma GCC target should not warn about redefined macros

2016-01-12 Thread Kyrill Tkachov


On 12/01/16 09:00, Christian Bruel wrote:



On 01/11/2016 03:37 PM, Kyrill Tkachov wrote:

On 11/01/16 12:54, Christian Bruel wrote:

Hi Kyrill,

On 01/11/2016 12:32 PM, Kyrill Tkachov wrote:

Hi Christian,

On 07/01/16 15:40, Christian Bruel wrote:
as discussed with Kyrill (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00307.html),
this patch avoids confusing (for the testsuite) macro redefinition warnings or
pedantic errors when the user changes FP versions implicitly with a
#pragma GCC target.  The warning is kept when the macro is redefined
explicitly by the user.

tested on arm-linux-gnueabi for {,-mfpu=neon-fp-armv8,-mfpu=neon}


Index: config/arm/arm-c.c
===
--- config/arm/arm-c.c(revision 232101)
+++ config/arm/arm-c.c(working copy)
@@ -23,6 +23,7 @@
#include "c-family/c-common.h"
#include "tm_p.h"
#include "c-family/c-pragma.h"
+#include "stringpool.h"

/* Output C specific EABI object attributes.  These can not be done in
   arm.c because they require information from the C frontend.  */
@@ -245,8 +246,18 @@ arm_pragma_target_parse (tree args, tree

  /* Update macros.  */
  gcc_assert (cur_opt->x_target_flags == target_flags);
-  /* This one can be redefined by the pragma without warning.  */
-  cpp_undef (parse_in, "__ARM_FP");
+
+  /* Don't warn for macros that have context sensitive values depending on
+ other attributes.
+ See warn_of_redefinition, Reset after cpp_create_definition.  */
+  tree acond_macro = get_identifier ("__ARM_NEON_FP");
+  C_CPP_HASHNODE (acond_macro)->flags |= NODE_CONDITIONAL ;
+
+  acond_macro = get_identifier ("__ARM_FP");
+  C_CPP_HASHNODE (acond_macro)->flags |= NODE_CONDITIONAL;
+
+  acond_macro = get_identifier ("__ARM_FEATURE_LDREX");
+  C_CPP_HASHNODE (acond_macro)->flags |= NODE_CONDITIONAL;

I see this mechanism also being used by rs6000, s390 and spu but I'm not very
familiar with it.
Could you please provide a short explanation of what NODE_CONDITIONAL means?
I suspect this is ok, but I'd like to get a better understanding of what's
going on here.

This is part of a larger support for context-sensitive keywords implemented for
rs6000 (patch digging https://gcc.gnu.org/ml/gcc-patches/2007-12/msg00306.html).

On ARM those preprocessor macros are always defined so we don't need to define
the macro_to_expand cpp hook.  However their value does legitimately change in
the specific #pragma target path so we reuse this logic for that path.
The macro will always be correctly recognized on the other paths (#ifdef, ...)
because the NODE_CONDITIONAL bit is cleared when defined (see
cpp_create_definition).  The idea of the original rs6000 patch is that if a
macro is user-defined it is not context-sensitive.
So this is absolutely a reuse of a subpart of a larger support, but this logic
fits and works well for our goal, given that the preprocessor value can change
between target contexts, and that the bit is not set for "normal" builtin
definitions.

In short: ask warn_of_redefinition to be permissive about those macro
redefinitions when we come from a pragma target definition, as if we were
redefining a context-sensitive macro; the difference is that it is always
defined.

does this sound clear :-) ?


Thanks, it's much clearer now.
A couple of comments on the patch then

+  tree acond_macro = get_identifier ("__ARM_NEON_FP");
+  C_CPP_HASHNODE (acond_macro)->flags |= NODE_CONDITIONAL ;

So what happens if __ARM_FP was never defined, does get_identifier return NULL_TREE?
If so, won't C_CPP_HASHNODE (acond_macro)->flags ICE?


get_identifier returns an allocated tree, even if it is not in the pool
already, so it won't ICE.



Index: testsuite/gcc.target/arm/pr69180.c
===
--- testsuite/gcc.target/arm/pr69180.c(revision 0)
+++ testsuite/gcc.target/arm/pr69180.c(working copy)
@@ -0,0 +1,16 @@
+/* PR target/69180
+   Check that __ARM_NEON_FP redefinition warns for user setting and not for
+   #pragma GCC target.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-mfloat-abi=softfp -mfpu=neon" } */
+

I believe we should use /* { dg-add-options arm_neon } */ here.


I also first did this, but the test would fail because -pedantic-errors set by
DEFAULT_CFLAGS turns warnings into errors.  So I preferred to reset the
options explicitly.



Thanks for the explanations.
This is ok for trunk.

Kyrill



Thanks,
Kyrill







Ping: check initializer to be zero in .bss-like sections

2016-01-12 Thread Jan Beulich
>>> On 10.12.15 at 08:21,  wrote:
> Just like gas, which has recently learned to reject such initializers,
> gcc shouldn't accept such either.
> ---
> The only question really is whether the new test case should be limited
> to certain targets - I haven't been able to figure out possible valid
> qualifiers to use here.
> 
> gcc/
> 2015-12-10  Jan Beulich  
> 
>   * varasm.c (get_variable_section): Validate initializer in
>   named .bss-like sections.
> 
> gcc/testsuite/
> 2015-12-10  Jan Beulich  
> 
>   * gcc.dg/bss.c: New.
> 
> --- 2015-12-09/gcc/testsuite/gcc.dg/bss.c
> +++ 2015-12-09/gcc/testsuite/gcc.dg/bss.c
> @@ -0,0 +1,8 @@
> +/* Test non-zero initializers in .bss-like sections get properly refused.  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +
> +int __attribute__((section(".bss.local"))) x = 1; /* { dg-error "" "zero init" } */
> +int *__attribute__((section(".bss.local"))) px = &x; /* { dg-error "" "zero init" } */
> +int __attribute__((section(".bss.local"))) y = 0;
> +int *__attribute__((section(".bss.local"))) py = (void*)0;
> --- 2015-12-09/gcc/varasm.c
> +++ 2015-12-09/gcc/varasm.c
> @@ -1150,7 +1150,18 @@ get_variable_section (tree decl, bool pr
>  
>resolve_unique_section (decl, reloc, flag_data_sections);
>if (IN_NAMED_SECTION (decl))
> -return get_named_section (decl, NULL, reloc);
> +{
> +  section *sect = get_named_section (decl, NULL, reloc);
> +
> +  if ((sect->common.flags & SECTION_BSS) && !bss_initializer_p (decl))
> + {
> +   error_at (DECL_SOURCE_LOCATION (decl),
> + "only zero initializers are allowed in section %qs",
> + sect->named.name);
> +   DECL_INITIAL (decl) = error_mark_node;
> + }
> +  return sect;
> +}
>  
>if (ADDR_SPACE_GENERIC_P (as)
>&& !DECL_THREAD_LOCAL_P (decl)
> 
> 
> 





[PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

2016-01-12 Thread Tom de Vries

Hi,

This patch fixes PR69110, a wrong-code bug in autopar.


I.

consider testcase test.c:
...
#define N 1000

unsigned int i = 0;

static void __attribute__((noinline, noclone))
foo (void)
{
  unsigned int z;

  for (z = 0; z < N; ++z)
++i;
}

extern void abort (void);

int
main (void)
{
  foo ();
  if (i != N)
abort ();

  return 0;
}
...

When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the 
test fails:

...
$ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd -P)//install/lib64 -fno-tree-loop-im

$ ./a.out
Aborted (core dumped)
$
...


II.

Before parloops, at ivcanon we have the loop body:
...
  :
  # z_10 = PHI 
  # ivtmp_12 = PHI 
  i.1_4 = i;
  _5 = i.1_4 + 1;
  i = _5;
  z_7 = z_10 + 1;
  ivtmp_2 = ivtmp_12 - 1;
  if (ivtmp_2 != 0)
goto ;
  else
goto ;
...

There's a loop-carried dependency in i, that is, the read from i in 
iteration z == 1 depends on the write to i in iteration z == 0. So the 
loop cannot be parallelized. The test-case fails because parloops still 
parallelizes the loop.
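The lost-update nature of this dependency can be shown with a small, deterministic simulation (hypothetical interleaving, not GCC code): reversing the iteration order preserves the result, but overlapping two read-modify-write iterations does not.

```python
def run(order):
    # Sequential read-modify-write of a shared counter.
    i = 0
    for _ in order:
        i = i + 1
    return i

# Reversing the loop is harmless: same final value either way.
assert run(range(4)) == run(reversed(range(4))) == 4

# But interleaving two iterations loses an update: both "threads" read
# the old value before either writes back.
i = 0
r1 = i        # iteration A reads 0
r2 = i        # iteration B reads 0
i = r1 + 1    # A writes 1
i = r2 + 1    # B writes 1, not 2 -- one increment lost
print(i)      # 1
```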



III.

Since the loop carried dependency is in-memory, it is not handled by the 
code analyzing reductions, since that code ignores the virtual phi.


So, AFAIU, this loop carried dependency should be handled by the 
dependency testing in loop_parallel_p. And loop_parallel_p returns true 
for this loop.


A comment in loop_parallel_p reads: "Check for problems with 
dependences.  If the loop can be reversed, the iterations are independent."


AFAIU, the loop order can actually be reversed. But, it cannot be 
executed in parallel.


So from this perspective, it seems in this case the comment matches the 
check, but the check is not sufficient.



IV.

OTOH, if we replace the declaration of i with i[1], and replace the 
references of i with i[0], we see that loop_parallel_p fails.  So the 
loop_parallel_p check in this case seems sufficient, and there's 
something else that causes the check to fail in this case.


The difference is in the generated data ref:
- in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to a
  vector with a single element: access function 0.
- in the 'i' case, we set DR_ACCESS_FNS to NULL.

This difference causes different handling in the dependency generation, 
in particular in add_distance_for_zero_overlaps which has no effect for 
the 'i' case because  DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of 
the NULL access_fns of both the source and sink data refs).


From this perspective, it seems that the loop_parallel_p check is 
sufficient, and that dr_analyze_indices shouldn't return a NULL 
access_fns for 'i'.



V.

When compiling with graphite using -floop-parallelize-all --param 
graphite-min-loops-per-function=1, we find:

...
[scop-detection-fail] Graphite cannot handle data-refs in stmt:
# VUSE <.MEM_11>
i.1_4 = i;
...

The function scop_detection::stmt_has_simple_data_refs_p returns false 
because of the code recently added for PR66980 at r228357:

...
  int nb_subscripts = DR_NUM_DIMENSIONS (dr);

  if (nb_subscripts < 1)
{
  free_data_refs (drs);
  return false;
}
...

[ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ]

This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis 
has failed'.


From this perspective, it seems that the dependence handling should 
bail out once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or 
DR_ACCESS_FNS == 0).



VI.

This test-case used to pass in 4.6 because in 
find_data_references_in_stmt we had:

...
  /* FIXME -- data dependence analysis does not work correctly for
 objects with invariant addresses in loop nests.  Let us fail
 here until the problem is fixed.  */
  if (dr_address_invariant_p (dr) && nest)
{
  free_data_ref (dr);
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file,
 "\tFAILED as dr address is invariant\n");
  ret = false;
  break;
}
...

That FIXME was removed in the fix for PR46787, at r175704.

The test-case fails in 4.8, and I guess from there onwards.


VII.

The attached patch fixes the problem by returning a zero access function 
for 'i' in dr_analyze_indices.


[ But I can also imagine a solution similar to the graphite fix:
...
@@ -3997,6 +3999,12 @@ find_data_references_in_stmt
   dr = create_data_ref (nest, loop_containing_stmt (stmt),
ref->ref, stmt, ref->is_read);
   gcc_assert (dr != NULL);
+  if (DR_NUM_DIMENSIONS (dr) == 0)
+   {
+ datarefs->release ();
+ return false;
+   }
+
   datarefs->safe_push (dr);
 }
   references.release ();
...

I'm not familiar enough with the dependency analysis code to know where 
exactly this should be fixed. ]


Bootstrapped and reg-tested on x86_64.

OK for trunk?

OK for release branches?

Thanks,
- Tom
Don't return NULL access_fns in dr_analyze

Re: [Patch ifcvt] Add a new parameter to limit if-conversion

2016-01-12 Thread Andreas Schwab
gcc.dg/ifcvt-5.c fails on ia64:

From ifcvt-5.c.223r.ce1:

== Pass 2 ==


== no more changes

1 possible IF blocks searched.
1 IF blocks converted.
2 true changes made.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH] Handle inter-block notes before BARRIER in rtl merge_blocks (PR target/69175, take 2)

2016-01-12 Thread Jakub Jelinek
On Sat, Jan 09, 2016 at 12:27:02AM +0100, Bernd Schmidt wrote:
> Well, I checked a bit more. Most callers of merge_blocks seem to already
> look for barriers if they are a concern and remove them. This occurs
> multiple times in ifcvt.c and cfgcleanup.c. Oddly,
> merge_blocks_move_predecessor_nojumps uses next_nonnote_insn to find the
> barrier, while merge_blocks_move_successor_nojumps uses just NEXT_INSN. That
> should probably be fixed too.
> 
> So the situation is a bit odd in that most callers remove the barrier but
> merge_blocks tries to handle an isolated barrier as well. The area could
> probably cleaned up a little, but on the whole I still lean towards
> requiring the caller to remove an isolated barrier. That leaves the RTL in a
> more consistent state before the call to merge_blocks.

So is the following ok for trunk?
Bootstrapped/regtested on x86_64-linux and i686-linux, and Kyrill has kindly
bootstrapped/regtested it on arm too.

2016-01-12  Jakub Jelinek  

PR target/69175
* ifcvt.c (cond_exec_process_if_block): When removing the last
insn from then_bb, remove also any possible barriers that follow it.

* g++.dg/opt/pr69175.C: New test.

--- gcc/ifcvt.c.jj  2016-01-04 14:55:53.0 +0100
+++ gcc/ifcvt.c 2016-01-11 16:13:22.833174933 +0100
@@ -739,7 +739,7 @@ cond_exec_process_if_block (ce_if_block
   rtx_insn *from = then_first_tail;
   if (!INSN_P (from))
from = find_active_insn_after (then_bb, from);
-  delete_insn_chain (from, BB_END (then_bb), false);
+  delete_insn_chain (from, get_last_bb_insn (then_bb), false);
 }
   if (else_last_head)
 delete_insn_chain (first_active_insn (else_bb), else_last_head, false);
--- gcc/testsuite/g++.dg/opt/pr69175.C.jj   2016-01-08 13:04:04.084805432 +0100
+++ gcc/testsuite/g++.dg/opt/pr69175.C  2016-01-08 13:03:47.0 +0100
@@ -0,0 +1,29 @@
+// PR target/69175
+// { dg-do compile }
+// { dg-options "-O2" }
+// { dg-additional-options "-march=armv7-a -mfloat-abi=hard -mfpu=vfpv3-d16 -mthumb" { target { arm_hard_vfp_ok && arm_thumb2_ok } } }
+
+struct A { A *c, *d; } a;
+struct B { A *e; A *f; void foo (); };
+void *b;
+
+void
+B::foo ()
+{
+  if (b) 
+{
+  A *n = (A *) b;
+  if (b == e)
+   if (n == f)
+ e = __null;
+   else
+ e->c = __null;
+  else
+   n->d->c = &a;
+  n->d = e;
+  if (e == __null)
+   e = f = n;
+  else
+   e = n;
+}
+}


Jakub


Re: Backport: [Patch AArch64] Reinstate CANNOT_CHANGE_MODE_CLASS to fix pr67609

2016-01-12 Thread Marcus Shawcroft
On 18 December 2015 at 12:13, James Greenhalgh  wrote:

> Looking back at the patch just before I hit commit, the 4.9 backport was
> a little different (as we still have a CANNOT_CHANGE_MODE_CLASS there).
> We can drop the aarch64-protos.h and aarch64.h changes, and we need to
> change the sense of the new check, such that we can return true for the
> case added by this patch, and false for the limited number of other safe
> cases in 4.9.
>
> Bootstrapped on aarch64-none-linux-gnu.
>
> OK?
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2015-12-14  James Greenhalgh  
>
> Backport from mainline.
> 2015-12-09  James Greenhalgh  
>
> PR rtl-optimization/67609
> * config/aarch64/aarch64.c
> (aarch64_cannot_change_mode_class): Don't permit word_mode
> subregs of full vector modes.
> * config/aarch64/aarch64.md (aarch64_movdi_low): Use
> zero_extract rather than truncate.
> (aarch64_movdi_high): Likewise.
>
> gcc/testsuite/
>
> 2015-12-14  James Greenhalgh  
>
> Backport from mainline.
> 2015-12-09  James Greenhalgh  
>
> PR rtl-optimization/67609
> * gcc.dg/torture/pr67609.c: New.
>

OK /Marcus


[PATCH] [RTEMS] Add Cortex-M7 multilib for FPU support

2016-01-12 Thread Sebastian Huber
gcc/ChangeLog
2016-01-12  Sebastian Huber  

* config/arm/t-rtems: Add cortex-m7/fpv5-d16 multilib.
---
 gcc/config/arm/t-rtems | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/arm/t-rtems b/gcc/config/arm/t-rtems
index 3b62181..02dcd65 100644
--- a/gcc/config/arm/t-rtems
+++ b/gcc/config/arm/t-rtems
@@ -1,7 +1,7 @@
 # Custom RTEMS multilibs for ARM
 
-MULTILIB_OPTIONS  = mbig-endian mthumb 
march=armv6-m/march=armv7-a/march=armv7-r/march=armv7-m 
mfpu=neon/mfpu=vfpv3-d16/mfpu=fpv4-sp-d16 mfloat-abi=hard
-MULTILIB_DIRNAMES = eb thumb armv6-m armv7-a armv7-r armv7-m neon vfpv3-d16 
fpv4-sp-d16 hard
+MULTILIB_OPTIONS  = mbig-endian mthumb 
march=armv6-m/march=armv7-a/march=armv7-r/march=armv7-m/mcpu=cortex-m7 
mfpu=neon/mfpu=vfpv3-d16/mfpu=fpv4-sp-d16/mfpu=fpv5-d16 mfloat-abi=hard
+MULTILIB_DIRNAMES = eb thumb armv6-m armv7-a armv7-r armv7-m cortex-m7 neon 
vfpv3-d16 fpv4-sp-d16 fpv5-d16 hard
 
 # Enumeration of multilibs
 
@@ -16,5 +16,6 @@ MULTILIB_REQUIRED += mthumb/march=armv7-a
 MULTILIB_REQUIRED += mthumb/march=armv7-r/mfpu=vfpv3-d16/mfloat-abi=hard
 MULTILIB_REQUIRED += mthumb/march=armv7-r
 MULTILIB_REQUIRED += mthumb/march=armv7-m/mfpu=fpv4-sp-d16/mfloat-abi=hard
+MULTILIB_REQUIRED += mthumb/mcpu=cortex-m7/mfpu=fpv5-d16/mfloat-abi=hard
 MULTILIB_REQUIRED += mthumb/march=armv7-m
 MULTILIB_REQUIRED += mthumb
-- 
1.8.4.5



Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Uros Bizjak
On Tue, Jan 12, 2016 at 1:15 AM, Joseph Myers  wrote:
> On Mon, 11 Jan 2016, H.J. Lu wrote:
>
>> Here is the updated patch.  Joseph, is this OK?
>
> I have no objections to this patch.

Thinking some more, it looks to me that we also have to return 2 when
SSE2 (SSE doubles) is not enabled.

I'm testing the following patch:

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 6c63871..b71cf4f 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -693,8 +693,9 @@ extern const char *host_detect_local_cpu (int
argc, const char **argv);
only SSE, rounding is correct; when using both SSE and the FPU,
the rounding precision is indeterminate, since either may be chosen
apparently at random.  */
-#define TARGET_FLT_EVAL_METHOD \
-  (TARGET_MIX_SSE_I387 ? -1 : (TARGET_80387 && !TARGET_SSE_MATH) ? 2 : 0)
+#define TARGET_FLT_EVAL_METHOD \
+  (TARGET_MIX_SSE_I387 ? -1\
+   : (TARGET_80387 && !TARGET_SSE2 && !TARGET_SSE_MATH) ? 2 : 0)

 /* Whether to allow x87 floating-point arithmetic on MODE (one of
SFmode, DFmode and XFmode) in the current excess precision

Uros.


Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Jakub Jelinek
On Tue, Jan 12, 2016 at 12:10:20PM +0100, Uros Bizjak wrote:
> On Tue, Jan 12, 2016 at 1:15 AM, Joseph Myers  wrote:
> > On Mon, 11 Jan 2016, H.J. Lu wrote:
> >
> >> Here is the updated patch.  Joseph, is this OK?
> >
> > I have no objections to this patch.
> 
> Thinking some more, it looks to me that we also have to return 2 when
> SSE2 (SSE doubles) is not enabled.
> 
> I'm testing the following patch:

That looks weird.  If TARGET_80387 and !TARGET_SSE_MATH, then no matter
whether sse2 is enabled or not, normal floating point operations will be
performed in 387 stack and thus FLT_EVAL_METHOD should be 2, not 0.
Do you want to do this because some instructions might be vectorized and
therefore end up in sse registers?  For -std=c99 that shouldn't happen,
already the C FE would promote all the arithmetics to be done in long
doubles, and for -std=gnu99 it is acceptable if non-vectorized computations
honor FLT_EVAL_METHOD and vectorized ones don't.
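For reference, FLT_EVAL_METHOD is the user-visible macro that TARGET_FLT_EVAL_METHOD ultimately controls; a minimal, hedged probe (not part of the patch) that a program could use to check which excess-precision regime it was compiled for:

```c
#include <float.h>

/* Hedged sketch: 2 means float and double arithmetic is evaluated in
   long double precision (the 387 stack), 0 means each operation is
   evaluated in its nominal type (SSE math), and -1 means the method is
   indeterminate (e.g. -mfpmath=sse,387).  */
static int
flt_eval_method (void)
{
  return (int) FLT_EVAL_METHOD;
}
```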
> 
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 6c63871..b71cf4f 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -693,8 +693,9 @@ extern const char *host_detect_local_cpu (int
> argc, const char **argv);
> only SSE, rounding is correct; when using both SSE and the FPU,
> the rounding precision is indeterminate, since either may be chosen
> apparently at random.  */
> -#define TARGET_FLT_EVAL_METHOD \
> -  (TARGET_MIX_SSE_I387 ? -1 : (TARGET_80387 && !TARGET_SSE_MATH) ? 2 : 0)
> +#define TARGET_FLT_EVAL_METHOD \
> +  (TARGET_MIX_SSE_I387 ? -1\
> +   : (TARGET_80387 && !TARGET_SSE2 && !TARGET_SSE_MATH) ? 2 : 0)
> 
>  /* Whether to allow x87 floating-point arithmetic on MODE (one of
> SFmode, DFmode and XFmode) in the current excess precision
> 
> Uros.

Jakub


Re: [PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

2016-01-12 Thread Richard Biener
On Tue, 12 Jan 2016, Tom de Vries wrote:

> Hi,
> 
> This patch fixes PR69110, a wrong-code bug in autopar.
> 
> 
> I.
> 
> consider testcase test.c:
> ...
> #define N 1000
> 
> unsigned int i = 0;
> 
> static void __attribute__((noinline, noclone))
> foo (void)
> {
>   unsigned int z;
> 
>   for (z = 0; z < N; ++z)
> ++i;
> }
> 
> extern void abort (void);
> 
> int
> main (void)
> {
>   foo ();
>   if (i != N)
> abort ();
> 
>   return 0;
> }
> ...
> 
> When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the test
> fails:
> ...
> $ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd
> -P)//install/lib64 -fno-tree-loop-im
> $ ./a.out
> Aborted (core dumped)
> $
> ...
> 
> 
> II.
> 
> Before parloops, at ivcanon we have the loop body:
> ...
>   :
>   # z_10 = PHI 
>   # ivtmp_12 = PHI 
>   i.1_4 = i;
>   _5 = i.1_4 + 1;
>   i = _5;
>   z_7 = z_10 + 1;
>   ivtmp_2 = ivtmp_12 - 1;
>   if (ivtmp_2 != 0)
> goto ;
>   else
> goto ;
> ...
> 
> There's a loop-carried dependency in i, that is, the read from i in iteration
> z == 1 depends on the write to i in iteration z == 0. So the loop cannot be
> parallelized. The test-case fails because parloops still parallelizes the
> loop.
> 
> 
> III.
> 
> Since the loop carried dependency is in-memory, it is not handled by the code
> analyzing reductions, since that code ignores the virtual phi.
> 
> So, AFAIU, this loop carried dependency should be handled by the dependency
> testing in loop_parallel_p. And loop_parallel_p returns true for this loop.
> 
> A comment in loop_parallel_p reads: "Check for problems with dependences.  If
> the loop can be reversed, the iterations are independent."
> 
> AFAIU, the loop order can actually be reversed. But, it cannot be executed in
> parallel.
> 
> So from this perspective, it seems in this case the comment matches the check,
> but the check is not sufficient.
> 
> 
> IV.
> 
> OTOH, if we replace the declaration of i with i[1], and replace the references
> of i with i[0], we see that loop_parallel_p fails.  So the loop_parallel_p
> check in this case seems sufficient, and there's something else that causes
> the check to fail in this case.
> 
> The difference is in the generated data ref:
> - in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to
>   vector with a single element: access function 0.
> - in the 'i' case, we set DR_ACCESS_FNS to NULL.
> 
> This difference causes different handling in the dependency generation, in
> particular in add_distance_for_zero_overlaps which has no effect for the 'i'
> case because  DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of the NULL
> access_fns of both the source and sink data refs).
> 
> From this perspective, it seems that the loop_parallel_p check is sufficient,
> and that dr_analyze_indices shouldn't return a NULL access_fns for 'i'.
> 
> 
> V.
> 
> When compiling with graphite using -floop-parallelize-all --param
> graphite-min-loops-per-function=1, we find:
> ...
> [scop-detection-fail] Graphite cannot handle data-refs in stmt:
> # VUSE <.MEM_11>
> i.1_4 = i;
> ...
> 
> The function scop_detection::stmt_has_simple_data_refs_p returns false because
> of the code recently added for PR66980 at r228357:
> ...
>   int nb_subscripts = DR_NUM_DIMENSIONS (dr);
> 
>   if (nb_subscripts < 1)
>   {
>   free_data_refs (drs);
>   return false;
> }
> ...
> 
> [ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ]
> 
> This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis has
> failed'.
> 
> From this perspective, it seems that the dependence handling should bail out
> once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or DR_ACCESS_FNS ==
> 0).
> 
> 
> VI.
> 
> This test-case used to pass in 4.6 because in find_data_references_in_stmt we
> had:
> ...
>   /* FIXME -- data dependence analysis does not work correctly for
>  objects with invariant addresses in loop nests.  Let us fail
>  here until the problem is fixed.  */
>   if (dr_address_invariant_p (dr) && nest)
>   {
>   free_data_ref (dr);
>   if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file,
>  "\tFAILED as dr address is invariant\n");
>   ret = false;
>   break;
>   }
> ...
> 
> That FIXME was removed in the fix for PR46787, at r175704.
> 
> The test-case fails in 4.8, and I guess from there onwards.
> 
> 
> VII.
> 
> The attached patch fixes the problem by returning a zero access function for
> 'i' in dr_analyze_indices.
> 
> [ But I can also imagine a solution similar to the graphite fix:
> ...
> @@ -3997,6 +3999,12 @@ find_data_references_in_stmt
>dr = create_data_ref (nest, loop_containing_stmt (stmt),
> ref->ref, stmt, ref->is_read);
>gcc_assert (dr != NULL);
> +  if (DR_NUM_DIMENSIONS (dr) == 0)
> +   {
> + datarefs->release ();
>
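The 'i[1]' variant described in section IV can be sketched as follows (a hedged reconstruction, not the actual test file): the same loop-carried dependency, but through an array element, so dr_analyze_indices produces a non-empty access-function vector and loop_parallel_p correctly rejects the loop.

```c
#define N 1000

unsigned int i[1] = { 0 };

/* Same loop as the original testcase, but the read-modify-write of the
   induction target goes through i[0] instead of a scalar 'i'.  */
static void __attribute__ ((noinline, noclone))
foo (void)
{
  unsigned int z;

  for (z = 0; z < N; ++z)
    ++i[0];
}
```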

Re: [PATCH] Cleanup vect testsuite includes

2016-01-12 Thread Richard Biener
On Mon, Jan 11, 2016 at 3:01 PM, Alan Lawrence  wrote:
> This was an attempt to make more of the vect testsuite compilable with a 
> stage-1
> compiler, i.e. without standard header files like stdlib.h, to ease looking 
> for
> differences in assembly output. (It is still necessary to comment out most of
> tree-vect.h to do this, but at least such temporary/local changes can be
> restricted to one file.)
>
> Inclusion of stdlib.h and signal.h are quite inconsistent, with some files
> explicitly declaring common functions like abort, and others #including the
> header, sometimes totally unnecessarily.
>
> I left files using malloc, calloc and free as is, though I guess the same 
> treatment
> could be applied there.
>
> Tested (natively) on x86_64-none-linux-gnu and aarch64-none-linux-gnu.
>
> Is this OK for trunk?

Ok.

Richard.

> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/fast-math-bb-slp-call-3.c: Declare functions as 'extern'
> rather than #including math.h & stdlib.h.
> * gcc.dg/vect/pr47001.c: Declare abort as 'extern', remove stdlib.h.
> * gcc.dg/vect/pr49771.c: Likewise.
> * gcc.dg/vect/vect-10-big-array.c: Likewise.
> * gcc.dg/vect/vect-neg-store-1.c: Likewise.
> * gcc.dg/vect/vect-neg-store-2.c: Likewise.
> * gcc.dg/vect/slp-37.c: Change NULL to 0, remove stdlib.h.
> * gcc.dg/vect/pr40254.c: Remove unnecessary include of stdlib.h.
> * gcc.dg/vect/pr44507.c: Likewise.
> * gcc.dg/vect/pr45902.c: Likewise.
> * gcc.dg/vect/slp-widen-mult-half.c: Likewise.
> * gcc.dg/vect/vect-117.c: Likewise.
> * gcc.dg/vect/vect-99.c: Likewise.
> * gcc.dg/vect/vect-aggressive-1.c: Likewise.
> * gcc.dg/vect/vect-cond-1.c: Likewise.
> * gcc.dg/vect/vect-cond-2.c: Likewise.
> * gcc.dg/vect/vect-cond-3.c: Likewise.
> * gcc.dg/vect/vect-cond-4.c: Likewise.
> * gcc.dg/vect/vect-mask-load-1.c: Likewise.
> * gcc.dg/vect/vect-mask-loadstore-1.c: Likewise.
> * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-1.c: Likewise.
> * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-2.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4.c: Likewise.
> * gcc.dg/vect/vect-widen-mult-const-s16.c: Likewise.
> * gcc.dg/vect/vect-widen-mult-const-u16.c: Likewise.
> * gcc.dg/vect/vect-widen-mult-half-u8.c: Likewise.
> * gcc.dg/vect/vect-widen-mult-half.c: Likewise.
> * gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c: Remove unnecessary
> include of signal.h.
> * gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c: Likewise.
> * gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c: Likewise.
> * gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c: Likewise.
> * gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c: Likewise.
> * gcc.dg/vect/no-trapping-math-vect-ifcvt-16.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-16.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-17.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-2.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-3.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-4.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-5.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-6.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-7.c: Likewise.
> * gcc.dg/vect/vect-ifcvt-9.c: Likewise.
> * gcc.dg/vect/vect-outer-5.c: Likewise.
> * gcc.dg/vect/vect-outer-6.c: Likewise.
> * gcc.dg/vect/vect-strided-u8-i8-gap4-unknown.c: Remove unnecessary
> include of stdio.h.
> ---
>  gcc/testsuite/gcc.dg/vect/fast-math-bb-slp-call-3.c | 8 ++--
>  gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-11.c  | 1 -
>  gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-12.c  | 1 -
>  gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-13.c  | 1 -
>  gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-14.c  | 1 -
>  gcc/testsuite/gcc.dg/vect/no-trapping-math-vect-ifcvt-15.c  | 1 -
>  gcc/testsuite/gcc.dg/vect/no-vfa-vect-dv-2.c| 1 -
>  gcc/testsuite/gcc.dg/vect/pr40254.c | 1 -
>  gcc/testsuite/gcc.dg/vect/pr44507.c | 1 -
>  gcc/testsuite/gcc.dg/vect/pr45902.c | 1 -
>  gcc/testsuite/gcc.dg/vect/pr47001.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/pr49771.c | 3 ++-
>  gcc/testsuite/gcc.dg/vect/slp-37.c  | 5 ++---
>  gcc/testsuite/gcc.dg/vect/slp-widen-mult-half.c | 1 -
>  gcc/testsuite/gcc.dg/vect/vect-10-big-array.c   | 3 ++-
>  gcc/testsuite/gcc.dg/vect/vect-117.c| 1 -
>  gcc
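The header-removal pattern the patch applies looks roughly like this (a hedged sketch, not an actual testsuite file): declare the handful of functions a test needs as 'extern' instead of pulling in stdlib.h, so the file also compiles against a stage-1 compiler without standard headers.

```c
/* Instead of #include <stdlib.h>, declare only what the test uses.  */
extern void abort (void);

static int
sum (int a, int b)
{
  return a + b;
}

/* Typical testsuite self-check: abort on a wrong result.  */
static void
check (void)
{
  if (sum (2, 2) != 4)
    abort ();
}
```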

Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt

2016-01-12 Thread James Greenhalgh
On Mon, Jan 11, 2016 at 04:57:56PM -0600, Evandro Menezes wrote:
> On 01/11/2016 05:53 AM, James Greenhalgh wrote:
> >I'd like to switch the logic around in aarch64.c such that
> >-mlow-precision-recip-sqrt causes us to always emit the low-precision
> >software expansion for reciprocal square root. I have two reasons to do
> >this; first is consistency across -mcpu targets, second is enabling more
> >-mcpu targets to use the flag for peak tuning.
> >
> >I don't much like that the precision we use for -mlow-precision-recip-sqrt
> >differs between cores (and possibly compiler revisions). Yes, we're
> >under -ffast-math but I take this flag to mean the user explicitly wants the
> >low-precision expansion, and we should not diverge from that based on an
> >internal decision as to what is optimal for performance in the
> >high-precision case. I'd prefer to keep things as predictable as possible,
> >and here that means always emitting the low-precision expansion when asked.
> >
> >Judging by the comments in the thread proposing the reciprocal square
> >root optimisation, this will benefit all cores currently supported by GCC.
> >To be clear, we would still not expand in the high-precision case for any
> >cores which do not explicitly ask for it. Currently that is Cortex-A57
> >and xgene, though I will be proposing a patch to remove Cortex-A57 from
> >that list shortly.
> >
> >Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> >is intended as a tuning flag for situations where performance is more
> >important than precision, but the current logic requires setting an
> >internal flag which also changes the performance characteristics where
> >high-precision is needed. This conflates two decisions the target might
> >want to make, and reduces the applicability of an option targets might
> >want to enable for performance. In particular, I'd still like to see
> >-mlow-precision-recip-sqrt continue to emit the cheaper, low-precision
> >sequence for floats under Cortex-A57.
> >
> >Based on that reasoning, this patch makes the appropriate change to the
> >logic. I've checked with the current -mcpu values to ensure that behaviour
> >without -mlow-precision-recip-sqrt does not change, and that behaviour
> >with -mlow-precision-recip-sqrt is to emit the low precision sequences.
> >
> >I've also put this through bootstrap and test on aarch64-none-linux-gnu
> >with no issues.
> >
> >OK?
> 
> Yes, it LGTM.

Thanks.

> I appreciate the idea of uniformity when an option is specified,
> which led me to wonder whether it wouldn't be a good idea to add an
> option that would have the effect of forcing the emission of the
> reciprocal square root, effectively forcing the flag
> AARCH64_EXTRA_TUNE_RECIP_SQRT on, regardless of the tuning flags for
> the given core.  I think that this flag would be particularly useful
> when specifying flags for specific functions, irrespective of the
> core.
> 
> Thoughts?

Currently you can do this using the (mostly unsupported) -moverride
mechanism as -moverride=tune=recip_sqrt from the command line.
I'm not sure how reliable using this from
__attribute__((target("override=tune=recip_sqrt"))) would be, I wrote a small
testcase that didn't work as intended, but whether that is a bug or a
design decision I'm not yet sure. I think the logic for parsing the
target attribute is set up to reapply the command-line override string
over whichever tuning options you apply through the attribute, rather than
to allow you to apply a per-function override.

As to whether we'd want to expose this as a fully supported,
user-visible setting, I'd rather not. Our claim is that for the
higher-precision sequences the results are close enough that we can
consider this like reassociation width or other core-specific tuning
parameters that we don't expose. What I'm hoping to avoid is a
proliferation of supported options which are not in anybody's regular
testing matrix. This one would not be so bad as it is automatically
enabled by some cores. For now I'd rather not add the option.

Thanks,
James
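The low-precision expansion under discussion is the standard Newton-Raphson reciprocal square-root refinement; a hedged portable sketch (on AArch64 the initial estimate would come from FRSQRTE and the refinement step from FRSQRTS; the bit-trick seed below is only a stand-in):

```c
#include <stdint.h>
#include <string.h>

/* Newton-Raphson refinement for 1/sqrt(x):
   y <- y * (1.5 - 0.5 * x * y * y).  */
static float
rsqrt_lowprec (float x)
{
  uint32_t bits;
  float y;

  memcpy (&bits, &x, sizeof bits);
  bits = 0x5f3759df - (bits >> 1);      /* crude initial estimate */
  memcpy (&y, &bits, sizeof y);

  y = y * (1.5f - 0.5f * x * y * y);    /* first Newton step */
  y = y * (1.5f - 0.5f * x * y * y);    /* second Newton step */
  return y;
}
```

Each step roughly doubles the number of correct bits, which is why the number of steps emitted (and hence the final precision) is a per-core tuning decision.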



Re: [PATCH PR68911]Check overflow when computing range information from loop niter bound

2016-01-12 Thread Richard Biener
On Mon, Jan 11, 2016 at 5:11 PM, Bin Cheng  wrote:
> Hi,
> A wrong-code bug is reported in PR68911, where GCC generates an infinite 
> loop for the example below after the loop niter analysis changes.  After that 
> change, scev_probably_wraps_p identifies that e_1 in the case below never 
> overflows/wraps:
> :
> e_15 = e_1 + 1;
>
> :
> # e_1 = PHI 
> if (e_1 <= 93)
>   goto ;
> else
>   goto ;
>
> The loop niter analysis gives us:
> Analyzing # of iterations of loop 2
>   exit condition [e_8, + , 1] <= 93
>   bounds on difference of bases: -4294967202 ... 93
>   result:
> zero if e_8 > 94
> # of iterations 94 - e_8, bounded by 94
>
> I think the analysis is good.  When scev_probably_wraps_p returns false, it 
> may imply two possible cases.
> CASE 1) If loop's latch gets executed, then we know the SCEV doesn't 
> overflow/wrap during loop execution.
> CASE 2) If loop's latch isn't executed, i.e., the loop exits immediately at 
> its first check on exit condition.  In this case the SCEV doesn't 
> overflow/wrap because it isn't increased at all.
>
> The real problem I think is VRP only checks scev_probably_wraps_p then 
> assumes SCEV won't overflow/wrap after niter.bound iterations.  This is not 
> true for CASE 2).  If we have a large enough starting value for e_1, for 
> example, 0xfff8 in this example, e_1 is guaranteed not overflow/wrap only 
> because the loop exits immediately, not after niter.bound iterations.  Here 
> VRP assuming "e_1 + niter.bound" doesn't overflow/wrap is wrong.
>
> This patch fixes the issue by adding overflow check in range information 
> computed for "e_1 + niter.bound".  It catches overflow/wrap of the expression 
> when loop may exit immediately.
>
> With this change, actually I think it may be possible for us to remove the 
> call to scev_probably_wraps_p, though I didn't do that in this patch.
>
> Bootstrap and test on x86_64 and AArch64.  Is it OK?

Ok.

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-01-10  Bin Cheng  
>
> PR tree-optimization/68911
> * tree-vrp.c (adjust_range_with_scev): Check overflow in range
> information computed for expression "init + nit * step".
>
> gcc/testsuite/ChangeLog
> 2016-01-10  Bin Cheng  
>
> PR tree-optimization/68911
> * gcc.c-torture/execute/pr68911.c: New test.
>
>


[PATCH, PING] DWARF: process all TYPE_DECL nodes when iterating on scopes

2016-01-12 Thread Pierre-Marie de Rodat

Hello,

Ping for the patch submitted in 
. Thanks!


--
Pierre-Marie de Rodat


[PATCH] Fix PR69007

2016-01-12 Thread Richard Biener

The following fixes fallout by no longer overwriting detected patterns.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-12  Richard Biener  

PR tree-optimization/69007
* tree-vect-patterns.c (vect_vect_recog_func_ptrs): Move
widen_sum after dot_prod and sad.

Index: gcc/tree-vect-patterns.c
===
*** gcc/tree-vect-patterns.c(revision 232261)
--- gcc/tree-vect-patterns.c(working copy)
*** struct vect_recog_func
*** 75,85 
vect_recog_func_ptr fn;
const char *name;
  };
  static vect_recog_func vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
{ vect_recog_widen_mult_pattern, "widen_mult" },
-   { vect_recog_widen_sum_pattern, "widen_sum" },
{ vect_recog_dot_prod_pattern, "dot_prod" },
{ vect_recog_sad_pattern, "sad" },
{ vect_recog_pow_pattern, "pow" },
{ vect_recog_widen_shift_pattern, "widen_shift" },
{ vect_recog_over_widening_pattern, "over_widening" },
--- 75,89 
vect_recog_func_ptr fn;
const char *name;
  };
+ 
+ /* Note that ordering matters - the first pattern matching on a stmt
+is taken, which means the more complex one usually needs to precede
+the less complex ones (widen_sum only after dot_prod or sad for example).  
*/
  static vect_recog_func vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
{ vect_recog_widen_mult_pattern, "widen_mult" },
{ vect_recog_dot_prod_pattern, "dot_prod" },
{ vect_recog_sad_pattern, "sad" },
+   { vect_recog_widen_sum_pattern, "widen_sum" },
{ vect_recog_pow_pattern, "pow" },
{ vect_recog_widen_shift_pattern, "widen_shift" },
{ vect_recog_over_widening_pattern, "over_widening" },


[PATCH] Fix PR69053

2016-01-12 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-12  Richard Biener  

PR tree-optimization/69053
* tree-vect-loop.c (get_initial_def_for_reduction): Properly
convert initial value for cond reductions.

* g++.dg/torture/pr69053.C: New testcase.

Index: gcc/tree-vect-loop.c
===
*** gcc/tree-vect-loop.c(revision 232261)
--- gcc/tree-vect-loop.c(working copy)
*** get_initial_def_for_reduction (gimple *s
*** 4075,4084 
tree *elts;
int i;
bool nested_in_vect_loop = false;
-   tree init_value;
REAL_VALUE_TYPE real_init_val = dconst0;
int int_init_val = 0;
gimple *def_stmt = NULL;
  
gcc_assert (vectype);
nunits = TYPE_VECTOR_SUBPARTS (vectype);
--- 4075,4084 
tree *elts;
int i;
bool nested_in_vect_loop = false;
REAL_VALUE_TYPE real_init_val = dconst0;
int int_init_val = 0;
gimple *def_stmt = NULL;
+   gimple_seq stmts = NULL;
  
gcc_assert (vectype);
nunits = TYPE_VECTOR_SUBPARTS (vectype);
*** get_initial_def_for_reduction (gimple *s
*** 4107,4122 
return vect_create_destination_var (init_val, vectype);
  }
  
-   if (TREE_CONSTANT (init_val))
- {
-   if (SCALAR_FLOAT_TYPE_P (scalar_type))
- init_value = build_real (scalar_type, TREE_REAL_CST (init_val));
-   else
- init_value = build_int_cst (scalar_type, TREE_INT_CST_LOW (init_val));
- }
-   else
- init_value = init_val;
- 
switch (code)
  {
case WIDEN_SUM_EXPR:
--- 4107,4112 
*** get_initial_def_for_reduction (gimple *s
*** 4193,4199 
break;
  }
  }
!   init_def = build_vector_from_val (vectype, init_value);
break;
  
default:
--- 4183,4192 
break;
  }
  }
!   init_val = gimple_convert (&stmts, TREE_TYPE (vectype), init_val);
!   if (! gimple_seq_empty_p (stmts))
! gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
!   init_def = build_vector_from_val (vectype, init_val);
break;
  
default:
Index: gcc/testsuite/g++.dg/torture/pr69053.C
===
*** gcc/testsuite/g++.dg/torture/pr69053.C  (revision 0)
--- gcc/testsuite/g++.dg/torture/pr69053.C  (working copy)
***
*** 0 
--- 1,17 
+ // { dg-do compile }
+ // { dg-additional-options "-march=core-avx2" { target x86_64-*-* i?86-*-* } }
+ struct A {
+ int *elem[1];
+ };
+ int a, d, e;
+ A *b;
+ int *c;
+ int main()
+ {
+   int *f = 0;
+   for (; e; e++)
+ if (b->elem[e])
+   f = c;
+   if (f)
+ a = d;
+ }


Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt

2016-01-12 Thread Kyrill Tkachov

Hi all,

On 12/01/16 11:32, James Greenhalgh wrote:

On Mon, Jan 11, 2016 at 04:57:56PM -0600, Evandro Menezes wrote:

On 01/11/2016 05:53 AM, James Greenhalgh wrote:

I'd like to switch the logic around in aarch64.c such that
-mlow-precision-recip-sqrt causes us to always emit the low-precision
software expansion for reciprocal square root. I have two reasons to do
this; first is consistency across -mcpu targets, second is enabling more
-mcpu targets to use the flag for peak tuning.

I don't much like that the precision we use for -mlow-precision-recip-sqrt
differs between cores (and possibly compiler revisions). Yes, we're
under -ffast-math but I take this flag to mean the user explicitly wants the
low-precision expansion, and we should not diverge from that based on an
internal decision as to what is optimal for performance in the
high-precision case. I'd prefer to keep things as predictable as possible,
and here that means always emitting the low-precision expansion when asked.

Judging by the comments in the thread proposing the reciprocal square
root optimisation, this will benefit all cores currently supported by GCC.
To be clear, we would still not expand in the high-precision case for any
cores which do not explicitly ask for it. Currently that is Cortex-A57
and xgene, though I will be proposing a patch to remove Cortex-A57 from
that list shortly.

Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
is intended as a tuning flag for situations where performance is more
important than precision, but the current logic requires setting an
internal flag which also changes the performance characteristics where
high-precision is needed. This conflates two decisions the target might
want to make, and reduces the applicability of an option targets might
want to enable for performance. In particular, I'd still like to see
-mlow-precision-recip-sqrt continue to emit the cheaper, low-precision
sequence for floats under Cortex-A57.

Based on that reasoning, this patch makes the appropriate change to the
logic. I've checked with the current -mcpu values to ensure that behaviour
without -mlow-precision-recip-sqrt does not change, and that behaviour
with -mlow-precision-recip-sqrt is to emit the low precision sequences.

I've also put this through bootstrap and test on aarch64-none-linux-gnu
with no issues.

OK?

Yes, it LGTM.

Thanks.


I appreciate the idea of uniformity when an option is specified,
which led me to wonder whether it wouldn't be a good idea to add an
option that would have the effect of forcing the emission of the
reciprocal square root, effectively forcing the flag
AARCH64_EXTRA_TUNE_RECIP_SQRT on, regardless of the tuning flags for
the given core.  I think that this flag would be particularly useful
when specifying flags for specific functions, irrespective of the
core.

Thoughts?

Currently you can do this using the (mostly unsupported) -moverride
mechanism as -moverride=tune=recip_sqrt from the command line.
I'm not sure how reliable using this from
__attribute__((target("override=tune=recip_sqrt"))) would be, I wrote a small
testcase that didn't work as intended, but whether that is a bug or a
design decision I'm not yet sure. I think the logic for parsing the
target attribute is set up to reapply the command-line override string
over whichever tuning options you apply through the attribute, rather than
to allow you to apply a per-function override.


As a clarification: we don't support an "override" target attribute on aarch64.
I had a patch earlier in the year to hook up the override string parsing 
machinery
into the attributes parsing code, but didn't end up proposing it.
IIRC the syntax of the override string (using '=' multiple times) would 
needlessly
complicate the parsing code for something that's not intended to be used by 
regular
users but rather by power users that are exploring gcc internals.

Thanks,
Kyrill


As to whether we'd want to expose this as a fully supported,
user-visible setting, I'd rather not. Our claim is that for the
higher-precision sequences the results are close enough that we can
consider this like reassociation width or other core-specific tuning
parameters that we don't expose. What I'm hoping to avoid is a
proliferation of supported options which are not in anybody's regular
testing matrix. This one would not be so bad as it is automatically
enabled by some cores. For now I'd rather not add the option.

Thanks,
James





Re: [PATCH, testsuite] Stabilize test result output of dump-noaddr

2016-01-12 Thread Mike Stump
On Jan 12, 2016, at 12:48 AM, Thomas Preud'homme 
 wrote:
> This patch solves this problem by replacing the static pass number in the 
> output by a star, allowing for a stable output while retaining easy copy/
> pasting in shell.

> Is this ok for stage3?

Ok.


Re: [PR tree-optimization/64946] Push integer type conversion to ABS_EXPR argument when possible.

2016-01-12 Thread Matthew Wahab

On 11/01/16 17:46, Richard Biener wrote:

On January 11, 2016 5:36:33 PM GMT+01:00, Bernd Schmidt wrote:

On 01/11/2016 05:33 PM, Matthew Wahab wrote:

The case I'm trying to fix has (short)abs((int)short_var). I'd thought
that if abs(short_var) was undefined because the result couldn't be
represented then the type conversion from int to short would also be
undefined. In fact, it's implementation defined and S4.5 of the GCC
manual says that the value is reduced until it can be represented. So
(short)abs((int)short_var) will produce a value when abs(short_var) is
undefined meaning this transformation isn't correct.

I'll drop this patch.

Maybe we could have an optab and corresponding internal function for an
abs that's always defined.

I'd like to have ABSU_EXPR (or allow unsigned result on ABS_EXPR).



I'll see if I can do anything along those lines. This looks like something
for stage 1, though.


Matthew



Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt

2016-01-12 Thread James Greenhalgh
On Tue, Jan 12, 2016 at 05:53:21AM +, Kumar, Venkataramanan wrote:
> Hi James,
> 
> > -Original Message-
> > From: James Greenhalgh [mailto:james.greenha...@arm.com]
> > Sent: Monday, January 11, 2016 5:24 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: n...@arm.com; marcus.shawcr...@arm.com;
> > richard.earns...@arm.com; Kumar, Venkataramanan;
> > philipp.toms...@theobroma-systems.com; pins...@gmail.com;
> > kyrylo.tkac...@arm.com; e.mene...@samsung.com
> > Subject: [Patch AArch64] Use software sqrt expansion always for -mlow-
> > precision-recip-sqrt
> > 
> > 
> > Hi,
> > 
> > I'd like to switch the logic around in aarch64.c such that -mlow-precision-
> > recip-sqrt causes us to always emit the low-precision software expansion for
> > reciprocal square root. I have two reasons to do this; first is consistency
> > across -mcpu targets, second is enabling more -mcpu targets to use the flag
> > for peak tuning.
> > 
> > I don't much like that the precision we use for -mlow-precision-recip-sqrt
> > differs between cores (and possibly compiler revisions). Yes, we're under -
> > ffast-math but I take this flag to mean the user explicitly wants the low-
> > precision expansion, and we should not diverge from that based on an
> > internal decision as to what is optimal for performance in the 
> > high-precision
> > case. I'd prefer to keep things as predictable as possible, and here that
> > means always emitting the low-precision expansion when asked.
> > 
> > Judging by the comments in the thread proposing the reciprocal square root
> > optimisation, this will benefit all cores currently supported by GCC.
> > To be clear, we would still not expand in the high-precision case for any 
> > cores
> > which do not explicitly ask for it. Currently that is Cortex-A57 and xgene,
> > though I will be proposing a patch to remove Cortex-A57 from that list
> > shortly.
> > 
> > Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> > is intended as a tuning flag for situations where performance is more
> > important than precision, but the current logic requires setting an internal
> > flag which also changes the performance characteristics where high-precision
> > is needed. This conflates two decisions the target might want to make, and
> > reduces the applicability of an option targets might want to enable for
> > performance. In particular, I'd still like to see -mlow-precision-recip-sqrt
> > continue to emit the cheaper, low-precision sequence for floats under
> > Cortex-A57.
> > 
> > Based on that reasoning, this patch makes the appropriate change to the
> > logic. I've checked with the current -mcpu values to ensure that behaviour
> > without -mlow-precision-recip-sqrt does not change, and that behaviour
> > with -mlow-precision-recip-sqrt is to emit the low precision sequences.
> > 
> > I've also put this through bootstrap and test on aarch64-none-linux-gnu with
> > no issues.
> > 
> > OK?
> > 
> > Thanks,
> > James
> > 
> 
> Yes, I like enabling this optimization for all CPU targets via
> -mlow-precision-recip-sqrt.
>  
> If my understanding is correct, for Cortex-A57 we now need to use only
> -mlow-precision-recip-sqrt to emit the software sqrt expansion?
> 
> In the below code 
> ---snip---
> void
> aarch64_emit_swrsqrt (rtx dst, rtx src)
> {
> 
> 
>   int iterations = double_mode ? 3 : 2;
> 
>   if (flag_mrecip_low_precision_sqrt)
> iterations--;
>  ---snip---
> 
> Now, in the Cortex-A57 case we will always do 2 and 1 steps for double and
> float, and 3 and 2 will never be used. Should we make 2 and 1 the default?
> Or does any target still need to use 3 and 2?

The code here should handle two cases:

  1) Normal -Ofast case -> Some targets use the estimate expansion with
 3 iterations for double, 2 for float. Other targets use the hardware
 fsqrt/fdiv instructions.
  2) -mlow-precision-recip-sqrt -> All targets use the estimate expansion
 with 2 iterations for double, 1 for float.

-mlow-precision-recip-sqrt is a specialisation to be used only when the
programmer knows the lower precision is acceptable. It should not be on
by default...

> PS: I remember that reducing iterations benefited gromacs but caused some
> VE in other FP benchmarks.

... For exactly this reason :-)

Thanks,
James



[PATCH, i386, AVX512] PR target/69228: Restrict default masks for prefetch gathers/scatters instructions.

2016-01-12 Thread Alexander Fomin
This patch addresses PR target/69228. Expanding non-mask builtins
for prefetch gather/scatter insns results in using the default mask.
Although the Intel ISA Extensions Programming Reference's statement about
the EVEX.aaa field in the prefetch gather/scatter insn encoding is a bit
opaque, no default mask is allowed for that family.

Bootstrapped and regtested on x86_64-linux-gnu. OK for trunk?

Thanks,
Alexander
---
gcc/

PR target/69228
* config/i386/sse.md (define_expand "avx512pf_gatherpfsf"):
Change first operand predicate from register_or_constm1_operand
to register_operand.
(define_expand "avx512pf_gatherpfdf"): Likewise.
(define_expand "avx512pf_scatterpfsf"): Likewise.
(define_expand "avx512pf_scatterpfdf"): Likewise.
(define_insn "*avx512pf_gatherpfsf"): Remove.
(define_insn "*avx512pf_gatherpfdf"): Likewise.
(define_insn "*avx512pf_scatterpfsf"): Likewise.
(define_insn "*avx512pf_scatterpfdf"): Likewise.
* config/i386/i386.c (ix86_expand_builtin): Remove first operand
comparison with constm1_rtx from vec_prefetch_gen part.

gcc/testsuite

PR target/69228
* gcc.target/i386/avx512pf-vscatterpf0dpd-1.c: Adjust.
* gcc.target/i386/avx512pf-vscatterpf0dps-1.c: Likewise.
* gcc.target/i386/avx512pf-vscatterpf0qpd-1.c: Likewise.
* gcc.target/i386/avx512pf-vscatterpf0qps-1.c: Likewise.
* gcc.target/i386/avx512pf-vscatterpf1dpd-1.c: Likewise.
* gcc.target/i386/avx512pf-vscatterpf1dps-1.c: Likewise.
* gcc.target/i386/avx512pf-vscatterpf1qpd-1.c: Likewise.
* gcc.target/i386/avx512pf-vscatterpf1qps-1.c: Likewise.
---
 gcc/config/i386/i386.c |   5 +-
 gcc/config/i386/sse.md | 120 +
 .../gcc.target/i386/avx512pf-vscatterpf0dpd-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf0dps-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf0qpd-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf0qps-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf1dpd-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf1dps-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf1qpd-1.c|   3 +-
 .../gcc.target/i386/avx512pf-vscatterpf1qps-1.c|   3 +-
 10 files changed, 14 insertions(+), 135 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index aac0847..c37eb74 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41821,13 +41821,12 @@ rdseed_step:
 
   op0 = fixup_modeless_constant (op0, mode0);
 
-  if (GET_MODE (op0) == mode0
- || (GET_MODE (op0) == VOIDmode && op0 != constm1_rtx))
+  if (GET_MODE (op0) == mode0 || GET_MODE (op0) == VOIDmode)
{
  if (!insn_data[icode].operand[0].predicate (op0, mode0))
op0 = copy_to_mode_reg (mode0, op0);
}
-  else if (op0 != constm1_rtx)
+  else
{
  op0 = copy_to_reg (op0);
  op0 = simplify_gen_subreg (mode0, op0, GET_MODE (op0), 0);
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 278dd38..b96be36 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15674,7 +15674,7 @@
 
 (define_expand "avx512pf_gatherpfsf"
   [(unspec
- [(match_operand: 0 "register_or_constm1_operand")
+ [(match_operand: 0 "register_operand")
   (mem:
(match_par_dup 5
  [(match_operand 2 "vsib_address_operand")
@@ -15716,37 +15716,10 @@
(set_attr "prefix" "evex")
(set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_gatherpfsf"
-  [(unspec
- [(const_int -1)
-  (match_operator: 4 "vsib_mem_operator"
-   [(unspec:P
-  [(match_operand:P 1 "vsib_address_operand" "Tv")
-   (match_operand:VI48_512 0 "register_operand" "v")
-   (match_operand:SI 2 "const1248_operand" "n")]
-  UNSPEC_VSIBADDR)])
-  (match_operand:SI 3 "const_2_to_3_operand" "n")]
- UNSPEC_GATHER_PREFETCH)]
-  "TARGET_AVX512PF"
-{
-  switch (INTVAL (operands[3]))
-{
-case 3:
-  return "vgatherpf0ps\t{%4|%4}";
-case 2:
-  return "vgatherpf1ps\t{%4|%4}";
-default:
-  gcc_unreachable ();
-}
-}
-  [(set_attr "type" "sse")
-   (set_attr "prefix" "evex")
-   (set_attr "mode" "XI")])
-
 ;; Packed double variants
 (define_expand "avx512pf_gatherpfdf"
   [(unspec
- [(match_operand: 0 "register_or_constm1_operand")
+ [(match_operand: 0 "register_operand")
   (mem:V8DF
(match_par_dup 5
  [(match_operand 2 "vsib_address_operand")
@@ -15788,37 +15761,10 @@
(set_attr "prefix" "evex")
(set_attr "mode" "XI")])
 
-(define_insn "*avx512pf_gatherpfdf"
-  [(unspec
- [(const_int -1)
-  (match_operator:V8DF 4 "vsib_mem_operator"
-   [(unspec:P
-  [(match_operand:P 1 "vsib_address_operand" "Tv")
-   (match_operand:VI4_256_8_512 0 "register_operand" "v")
-   (match_operand:SI 2 "cons

Re: [RFC][ARM][PR67714] signed char is zero-extended instead of sign-extended

2016-01-12 Thread Kyrill Tkachov

Hi Kugan,

On 12/01/16 06:22, kugan wrote:


When promote_function_mode and promote_ssa_mode change the sign differently,
the following is the cause of the problem in PR67714.

 _8 = fn1D.5055 ();
  f_13 = _8;

The function returns -15, and in _8 it is sign-extended. In the second
statement, we say that the value is SUBREG_PROMOTED with an unsigned promoted
sign, which is wrong. If the value in _8 had come some way other than a
function call, that would be correct (as it would be zero-extended). The
attached patch checks for this and uses the correct promoted sign in that case.


The problem with the approach is that with the following piece of code we can
still fail. But I don't think it will ever happen. Any thoughts?


 _8 = fn1D.5055 ();
  _9 = _8
  f_13 = _9;

This is similar to PR65932, where a sign change in PROMOTE_MODE causes a
problem for parameters, but that needs a different fix.
Regression tested on arm-none-linux-gnu with no new regressions. I also
bootstrapped and regression tested (on an earlier version of trunk) on
x86_64-none-linux-gnu with no new regressions. If this is OK, I will do full
testing again. Comments?

Thanks,
Kugan


gcc/ChangeLog:

2016-01-12  Kugan Vivekanandarajah  

* expr.c (expand_expr_real_1): Fix promoted sign in SUBREG_PRMOTED
for SSA_NAME when rhs has a value returned from function call.



Thanks for working on this.
I'll leave to other to comment on this part as I'm not overly familiar with 
that area but...


gcc/testsuite/ChangeLog:

2016-01-12  Kugan Vivekanandarajah  

* gcc.target/arm/pr67714.c: New test.


This test doesn't contain any arm-specific code so can you please put it in 
gcc.c-torture/execute/

Thanks,
Kyrill



Re: [RFC][ARM][PR67714] signed char is zero-extended instead of sign-extended

2016-01-12 Thread Jakub Jelinek
On Tue, Jan 12, 2016 at 12:04:22PM +, Kyrill Tkachov wrote:
> >2016-01-12  Kugan Vivekanandarajah  
> >
> >* expr.c (expand_expr_real_1): Fix promoted sign in SUBREG_PRMOTED

I'd like to just point at the ChangeLog typo - PRMOTED instead of PROMOTED.

Jakub


Re: [RFC][ARM][PR67714] signed char is zero-extended instead of sign-extended

2016-01-12 Thread Kyrill Tkachov


On 12/01/16 12:08, Jakub Jelinek wrote:

On Tue, Jan 12, 2016 at 12:04:22PM +, Kyrill Tkachov wrote:

2016-01-12  Kugan Vivekanandarajah  

* expr.c (expand_expr_real_1): Fix promoted sign in SUBREG_PRMOTED

I'd like to just point at the ChangeLog typo - PRMOTED instead of PROMOTED.


Since we're on the subject of the ChangeLog...
It should also refer to the PR: PR middle-end/67714

Kyrill


Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Uros Bizjak
On Tue, Jan 12, 2016 at 12:18 PM, Jakub Jelinek  wrote:
> On Tue, Jan 12, 2016 at 12:10:20PM +0100, Uros Bizjak wrote:
>> On Tue, Jan 12, 2016 at 1:15 AM, Joseph Myers  
>> wrote:
>> > On Mon, 11 Jan 2016, H.J. Lu wrote:
>> >
>> >> Here is the updated patch.  Joseph, is this OK?
>> >
>> > I have no objections to this patch.
>>
>> Thinking some more, it looks to me that we also have to return 2 when
>> SSE2 (SSE doubles) is not enabled.
>>
>> I'm testing following patch:
>
> That looks weird.  If TARGET_80387 and !TARGET_SSE_MATH, then no matter
> whether sse2 is enabled or not, normal floating point operations will be
> performed in 387 stack and thus FLT_EVAL_METHOD should be 2, not 0.
> Do you want to do this because some instructions might be vectorized and
> therefore end up in sse registers?  For -std=c99 that shouldn't happen,
> already the C FE would promote all the arithmetics to be done in long
> doubles, and for -std=gnu99 it is acceptable if non-vectorized computations
> honor FLT_EVAL_METHOD and vectorized ones don't.

Eh, today is just not the day for science.

Hopefully, the logic in the patch below is correct:

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 6c63871..5b42e89 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -693,8 +693,9 @@ extern const char *host_detect_local_cpu (int
argc, const char **argv);
only SSE, rounding is correct; when using both SSE and the FPU,
the rounding precision is indeterminate, since either may be chosen
apparently at random.  */
-#define TARGET_FLT_EVAL_METHOD \
-  (TARGET_MIX_SSE_I387 ? -1 : (TARGET_80387 && !TARGET_SSE_MATH) ? 2 : 0)
+#define TARGET_FLT_EVAL_METHOD \
+  (TARGET_MIX_SSE_I387 ? -1\
+   : TARGET_80387 && !(TARGET_SSE2 && TARGET_SSE_MATH) ? 2 : 0)

 /* Whether to allow x87 floating-point arithmetic on MODE (one of
SFmode, DFmode and XFmode) in the current excess precision

Uros.


Re: [PATCH : RL78] Disable interrupts during hardware multiplication routines

2016-01-12 Thread Nick Clifton

Hi Kaushik,


+/* Structure for G13 MDUC registers.  */
+struct mduc_reg_type
+{
+  unsigned int address;
+  enum machine_mode mode;
+  bool is_volatile;
+};
+
+struct mduc_reg_type  mduc_regs[NUM_OF_MDUC_REGS] =
+  {{0xf00e8, QImode, true},
+   {0x0, HImode, true},
+   {0x2, HImode, true},
+   {0xf2224, HImode, true},
+   {0xf00e0, HImode, true},
+   {0xf00e2, HImode, true}};


If the is_volatile field is true for all members of this array, why
bother having it at all?  (If I remember correctly, in your previous
patch only some of these addresses were being treated as volatile
registers, not all of them.)




+/* Check if the block uses mul/div insns for G13 target.  */
+static bool
+check_mduc_usage ()


Add a void type to the declaration.  I.e.:

  check_mduc_usage (void)



+{
+  rtx insn;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+  {


You should have a blank line between the end of the variable 
declarations and the start of the code.




+FOR_BB_INSNS (bb, insn)
+{
+  if (get_attr_is_g13_muldiv_insn (insn) == IS_G13_MULDIV_INSN_YES)
+return true;


I am not sure - but it might be safer to check INSN_P(insn) first 
before checking for the muldiv attribute.




+  for (int i = 0; i 

Indentation.



+  mem_mduc = gen_rtx_MEM (QImode, GEN_INT (mduc_regs[i].address));
+  MEM_VOLATILE_P (mem_mduc) = 1;
+  emit_insn (gen_movqi (gen_rtx_REG (QImode, A_REG), mem_mduc));
+}
+else
+{
+  mem_mduc = gen_rtx_MEM (HImode, GEN_INT (mduc_regs[i].address));
+  MEM_VOLATILE_P (mem_mduc) = 1;
+  emit_insn (gen_movqi (gen_rtx_REG (HImode, AX_REG), mem_mduc));
+}


In the else case you are using gen_movqi to move an HImode value...

Also you could simplify the above code like this:

  for (int i = 0; i < NUM_OF_MDUC_REGS; i++)
{
   mduc_reg_type *reg = mduc_regs + i;
   rtx mem_mduc = gen_rtx_MEM (reg->mode, GEN_INT (reg->address));

   MEM_VOLATILE_P (mem_mduc) = reg->is_volatile;
   if (reg->mode == QImode)
 emit_insn (gen_movqi (gen_rtx_REG (QImode, A_REG), mem_mduc));
   else
 emit_insn (gen_movhi (gen_rtx_REG (HImode, AX_REG), mem_mduc));
   emit_insn (gen_push (gen_rtx_REG (HImode, AX_REG)));
}


fs = cfun->machine->framesize_locals + cfun->machine->framesize_outgoing;
+  if (MUST_SAVE_MDUC_REGISTERS && (!crtl->is_leaf || check_mduc_usage ()))
+fs = fs + NUM_OF_MDUC_REGS * 2;
if (fs > 0)
  {
/* If we need to subtract more than 254*3 then it is faster and
@@ -1426,6 +1490,8 @@
else
  {
fs = cfun->machine->framesize_locals + 
cfun->machine->framesize_outgoing;
+  if (MUST_SAVE_MDUC_REGISTERS && (!crtl->is_leaf || check_mduc_usage ()))
+fs = fs + NUM_OF_MDUC_REGS * 2;
if (fs > 254 * 3)


No - this is wrong.  "fs" is the amount of extra space needed in the 
stack frame to hold local variables and outgoing variables.  It should 
not include the stack space used for already pushed registers.




Index: gcc/config/rl78/rl78.h
===
--- gcc/config/rl78/rl78.h(revision 2871)
+++ gcc/config/rl78/rl78.h(working copy)
@@ -28,6 +28,7 @@
  #define TARGET_G14(rl78_cpu_type == CPU_G14)


+#define NUM_OF_MDUC_REGS 6


Why define this here?  It is only ever used in rl78.c and it can be
computed automatically by applying the ARRAY_SIZE macro to the mduc_regs
array.





Index: gcc/config/rl78/rl78.opt
===
--- gcc/config/rl78/rl78.opt(revision 2871)
+++ gcc/config/rl78/rl78.opt(working copy)
@@ -103,4 +103,10 @@
  Target Mask(ES0)
  Assume ES is zero throughout program execution, use ES: for read-only data.

+msave-mduc-in-interrupts
+Target Mask(SAVE_MDUC_REGISTERS)
+Stores the MDUC registers in interrupt handlers for G13 target.

+mno-save-mduc-in-interrupts
+Target RejectNegative Mask(NO_SAVE_MDUC_REGISTERS)
+Does not save the MDUC registers in interrupt handlers for G13 target.


This looks wrong.  Surely you only need the msave-mduc-in-interrupts 
definition.  That will automatically allow -mno-save-mduc-in-interrupts, 
since it does not have the RejectNegative attribute.  Also there is no 
need to have two separate target mask bits.  Just SAVE_MDUC_REGISTERS 
will do.






Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi(revision 2871)
+++ gcc/doc/invoke.texi(working copy)


You should also add the name of the new option to the Machine Dependent 
Options section of the manual.  (Approx line 896 in invoke.texi)




+@item -msave-mduc-in-interrupts
+@item -mno-save-mduc-in-interrupts
+@opindex msave-mduc-in-interrupts
+@opindex mno-save-mduc-in-interrupts
+Specifies that interrupt handler functions should preserve the
+MDUC registers.  This is only necessary if normal code might use
+the MDUC registers, for example because it performs multiplication
+and division operations. The default is to ig

Re: [PATCH] Handle inter-block notes before BARRIER in rtl merge_blocks (PR target/69175, take 2)

2016-01-12 Thread Bernd Schmidt

On 01/12/2016 11:17 AM, Jakub Jelinek wrote:


PR target/69175
* ifcvt.c (cond_exec_process_if_block): When removing the last
insn from then_bb, remove also any possible barriers that follow it.

* g++.dg/opt/pr69175.C: New test.


Ok.


Bernd



Re: [PATCH] Be less conservative in process_{output,input}_constraints (PR target/65689)

2016-01-12 Thread James Greenhalgh
On Wed, Apr 08, 2015 at 11:00:59PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> Right now, for constraints not hardcoded in it and not
> define_{register,address,memory}_constraint, stmt.c just assumes the
> constraint might allow both reg and mem.  Unfortunately, on some
> constraints which clearly can't allow either of those, this leads to
> errors at -O0, because the expander doesn't try as hard to expand it as
> EXPAND_INITIALIZER.
> 
> The following patch is an attempt to handle at least the easy cases
> - define_constraint like:
> (define_constraint "S"
>   "A constraint that matches an absolute symbolic address."
>   (and (match_code "const,symbol_ref,label_ref")
>(match_test "aarch64_symbolic_address_p (op)")))
> where the match_code clearly proves that it never can match any REG/SUBREG,
> nor MEM, by teaching genpreds.c to emit an extra inline function that
> stmt.c can in process_{output,input}_constraint use for the unknown
> constraints.
> 
> On x86_64/i686 this only detects constraint G as not allowing reg nor mem
> (it is match_code const_double), and V (plus < and >, but those are
> hardcoded in stmt.c already) that it allows mem but not reg.
> On aarch64, in the first category it detects several constraints.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2015-04-08  Jakub Jelinek  
> 
>   PR target/65689
>   * genpreds.c (struct constraint_data): Add maybe_allows_reg and
>   maybe_allows_mem bitfields.
>   (maybe_allows_none_start, maybe_allows_none_end,
>   maybe_allows_reg_start, maybe_allows_reg_end, maybe_allows_mem_start,
>   maybe_allows_mem_end): New variables.
>   (compute_maybe_allows): New function.
>   (add_constraint): Use it to initialize maybe_allows_reg and
>   maybe_allows_mem fields.
>   (choose_enum_order): Sort the non-is_register/is_const_int/is_memory/
>   is_address constraints such that those that allow neither mem nor
>   reg come first, then those that only allow reg but not mem, then
>   those that only allow mem but not reg, then the rest.
>   (write_allows_reg_mem_function): New function.
>   (write_tm_preds_h): Call it.
>   * stmt.c (parse_output_constraint, parse_input_constraint): Use
>   the generated insn_extra_constraint_allows_reg_mem function
>   instead of always setting *allows_reg = true; *allows_mem = true;
>   for unknown extra constraints.

Hi Jakub,

This applies clean to gcc-5-branch. I've bootstrapped and tested it on
x86_64-none-linux-gnu, aarch64-none-linux-gnu and arm-none-linux-gnueabihf
with no problems.

Is this OK to commit to gcc-5-branch so I can close out PR 65689?

Thanks,
James



Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Uros Bizjak
On Tue, Jan 12, 2016 at 1:12 PM, Uros Bizjak  wrote:
> On Tue, Jan 12, 2016 at 12:18 PM, Jakub Jelinek  wrote:
>> On Tue, Jan 12, 2016 at 12:10:20PM +0100, Uros Bizjak wrote:
>>> On Tue, Jan 12, 2016 at 1:15 AM, Joseph Myers  
>>> wrote:
>>> > On Mon, 11 Jan 2016, H.J. Lu wrote:
>>> >
>>> >> Here is the updated patch.  Joseph, is this OK?
>>> >
>>> > I have no objections to this patch.
>>>
>>> Thinking some more, it looks to me that we also have to return 2 when
>>> SSE2 (SSE doubles) is not enabled.
>>>
>>> I'm testing following patch:
>>
>> That looks weird.  If TARGET_80387 and !TARGET_SSE_MATH, then no matter
>> whether sse2 is enabled or not, normal floating point operations will be
>> performed in 387 stack and thus FLT_EVAL_METHOD should be 2, not 0.
>> Do you want to do this because some instructions might be vectorized and
>> therefore end up in sse registers?  For -std=c99 that shouldn't happen,
>> already the C FE would promote all the arithmetics to be done in long
>> doubles, and for -std=gnu99 it is acceptable if non-vectorized computations
>> honor FLT_EVAL_METHOD and vectorized ones don't.
>
> Eh, today is just not the day for science.
>
> Hopefully, the logic in the patch below is correct:
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 6c63871..5b42e89 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -693,8 +693,9 @@ extern const char *host_detect_local_cpu (int
> argc, const char **argv);
> only SSE, rounding is correct; when using both SSE and the FPU,
> the rounding precision is indeterminate, since either may be chosen
> apparently at random.  */
> -#define TARGET_FLT_EVAL_METHOD \
> -  (TARGET_MIX_SSE_I387 ? -1 : (TARGET_80387 && !TARGET_SSE_MATH) ? 2 : 0)
> +#define TARGET_FLT_EVAL_METHOD \
> +  (TARGET_MIX_SSE_I387 ? -1\
> +   : TARGET_80387 && !(TARGET_SSE2 && TARGET_SSE_MATH) ? 2 : 0)
>
>  /* Whether to allow x87 floating-point arithmetic on MODE (one of
> SFmode, DFmode and XFmode) in the current excess precision

Using this patch, SSE math won't be emitted for a simple testcase
using " -O2 -msse -m32 -std=c99 -mfpmath=sse" compile flags:

float test (float a, float b)
{
  return a + b;
}

since we start with:

test (float a, float b)
{
  long double _2;
  long double _4;
  long double _5;
  float _6;

  :
  _2 = (long double) a_1(D);
  _4 = (long double) b_3(D);
  _5 = _2 + _4;
  _6 = (float) _5;
  return _6;
}

This is counter-intuitive, so I'd say we leave things as they are. The
situation where only floats are evaluated as floats and doubles are
evaluated as long doubles is not covered in the FLT_EVAL_METHOD spec.

Uros.


Re: [PATCH] Be less conservative in process_{output,input}_constraints (PR target/65689)

2016-01-12 Thread Jakub Jelinek
On Tue, Jan 12, 2016 at 12:30:20PM +, James Greenhalgh wrote:
> > 2015-04-08  Jakub Jelinek  
> > 
> > PR target/65689
> > * genpreds.c (struct constraint_data): Add maybe_allows_reg and
> > maybe_allows_mem bitfields.
> > (maybe_allows_none_start, maybe_allows_none_end,
> > maybe_allows_reg_start, maybe_allows_reg_end, maybe_allows_mem_start,
> > maybe_allows_mem_end): New variables.
> > (compute_maybe_allows): New function.
> > (add_constraint): Use it to initialize maybe_allows_reg and
> > maybe_allows_mem fields.
> > (choose_enum_order): Sort the non-is_register/is_const_int/is_memory/
> > is_address constraints such that those that allow neither mem nor
> > reg come first, then those that only allow reg but not mem, then
> > those that only allow mem but not reg, then the rest.
> > (write_allows_reg_mem_function): New function.
> > (write_tm_preds_h): Call it.
> > * stmt.c (parse_output_constraint, parse_input_constraint): Use
> > the generated insn_extra_constraint_allows_reg_mem function
> > instead of always setting *allows_reg = true; *allows_mem = true;
> > for unknown extra constraints.
> 
> Hi Jakub,
> 
> This applies clean to gcc-5-branch. I've bootstrapped and tested it on
> x86_64-none-linux-gnu, aarch64-none-linux-gnu and arm-none-linux-gnueabihf
> with no problems.
> 
> Is this OK to commit to gcc-5-branch so I can close out PR 65689?

Ok, thanks.

Jakub


Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Jakub Jelinek
On Tue, Jan 12, 2016 at 01:32:05PM +0100, Uros Bizjak wrote:
> Using this patch, SSE math won't be emitted for a simple testcase
> using " -O2 -msse -m32 -std=c99 -mfpmath=sse" compile flags:
> 
> float test (float a, float b)
> {
>   return a + b;
> }
> 
> since we start with:
> 
> test (float a, float b)
> {
>   long double _2;
>   long double _4;
>   long double _5;
>   float _6;
> 
>   :
>   _2 = (long double) a_1(D);
>   _4 = (long double) b_3(D);
>   _5 = _2 + _4;
>   _6 = (float) _5;
>   return _6;
> }
> 
> This is counter-intuitive, so I'd say we leave things as they are. The
> situation where only floats are evaluated as floats and doubles are
> evaluated as long doubles is not covered in the FLT_EVAL_METHOD spec.

Well, for the -fexcess-precision=standard case (== -std=c99), FLT_EVAL_METHOD
2 doesn't hurt; it forces long double computation in the FE.  Whereas if it
is 0 with -msse -mfpmath=sse, it means that the FE leaves computations as-is
and they are computed in float precision for floats and in long double
precision for doubles.  For -fexcess-precision=fast it is different, because
the FE doesn't do anything, so in the end it is mixed in that case.
So, for -msse -mfpmath=sse, I think we need either FLT_EVAL_METHOD 2 or -1,
or 2 for -fexcess-precision=standard and -1 for -fexcess-precision=fast.

Jakub


Re: [PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

2016-01-12 Thread Tom de Vries

On 12/01/16 12:22, Richard Biener wrote:

Doesn't the same issue apply to


>unsigned int *p;
>
>static void __attribute__((noinline, noclone))
>foo (void)
>{
>   unsigned int z;
>
>   for (z = 0; z < N; ++z)
> ++(*p);
>}

thus when we have a MEM_REF[p_1]?  SCEV will not analyze
its evolution to a POLYNOMIAL_CHREC and thus access_fns will
be NULL again.



I didn't manage to trigger this scenario, though I could probably make 
it happen by modifying ftree-loop-im to work in one case (the load of 
the value of p) but not the other (the *p load and store).



I think avoiding a NULL access_fns is ok but it should be done
unconditionally, not only for the DECL_P case.


Ok, I'll retest and commit this patch.

Thanks,
- Tom
Don't return NULL access_fns in dr_analyze_indices

2016-01-12  Tom de Vries  

	* tree-data-ref.c (dr_analyze_indices): Don't return NULL access_fns.

	* gcc.dg/autopar/pr69110.c: New test.

	* testsuite/libgomp.c/pr69110.c: New test.

---
 gcc/testsuite/gcc.dg/autopar/pr69110.c | 19 +++
 gcc/tree-data-ref.c|  3 +++
 libgomp/testsuite/libgomp.c/pr69110.c  | 26 ++
 3 files changed, 48 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/autopar/pr69110.c b/gcc/testsuite/gcc.dg/autopar/pr69110.c
new file mode 100644
index 000..e236015
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/pr69110.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -ftree-parallelize-loops=2 -fno-tree-loop-im -fdump-tree-parloops-details" } */
+
+#define N 1000
+
+unsigned int i = 0;
+
+void
+foo (void)
+{
+  unsigned int z;
+  for (z = 0; z < N; ++z)
+++i;
+}
+
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 0 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "FAILED: data dependencies exist across iterations" 1 "parloops" } } */
+
+
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index a40f40d..6503012 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1023,6 +1023,9 @@ dr_analyze_indices (struct data_reference *dr, loop_p nest, loop_p loop)
 		build_int_cst (reference_alias_ptr_type (ref), 0));
 }
 
+  if (access_fns == vNULL)
+access_fns.safe_push (integer_zero_node);
+
   DR_BASE_OBJECT (dr) = ref;
   DR_ACCESS_FNS (dr) = access_fns;
 }
diff --git a/libgomp/testsuite/libgomp.c/pr69110.c b/libgomp/testsuite/libgomp.c/pr69110.c
new file mode 100644
index 000..0d9e5ca
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr69110.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=2 -O1 -fno-tree-loop-im" } */
+
+#define N 1000
+
+unsigned int i = 0;
+
+static void __attribute__((noinline, noclone))
+foo (void)
+{
+  unsigned int z;
+  for (z = 0; z < N; ++z)
+++i;
+}
+
+extern void abort (void);
+
+int
+main (void)
+{
+  foo ();
+  if (i != N)
+abort ();
+
+  return 0;
+}


C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Marek Polacek
It seems that people find the compile-time error on the following testcase
overly pedantic, i.e. that "enum A { X = -1 << 1 };" should compile, at least
with -fpermissive.  So I've changed the error_at into permerror, and the
return value of cxx_eval_check_shift_p now depends on flag_permissive.
Luckily, I didn't have to modify any of the existing tests.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-01-12  Marek Polacek  

PR c++/68979
* constexpr.c (cxx_eval_check_shift_p): Use permerror rather than
error_at and return negated flag_permissive.

* g++.dg/warn/permissive-1.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index e60180e..dbcc242 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -1512,17 +1512,17 @@ cxx_eval_check_shift_p (location_t loc, const 
constexpr_ctx *ctx,
   if (tree_int_cst_sgn (rhs) == -1)
 {
   if (!ctx->quiet)
-   error_at (loc, "right operand of shift expression %q+E is negative",
- build2_loc (loc, code, type, lhs, rhs));
-  return true;
+   permerror (loc, "right operand of shift expression %q+E is negative",
+  build2_loc (loc, code, type, lhs, rhs));
+  return !flag_permissive;
 }
   if (compare_tree_int (rhs, uprec) >= 0)
 {
   if (!ctx->quiet)
-   error_at (loc, "right operand of shift expression %q+E is >= than "
- "the precision of the left operand",
- build2_loc (loc, code, type, lhs, rhs));
-  return true;
+   permerror (loc, "right operand of shift expression %q+E is >= than "
+  "the precision of the left operand",
+  build2_loc (loc, code, type, lhs, rhs));
+  return !flag_permissive;
 }
 
   /* The value of E1 << E2 is E1 left-shifted E2 bit positions; [...]
@@ -1536,9 +1536,10 @@ cxx_eval_check_shift_p (location_t loc, const constexpr_ctx *ctx,
   if (tree_int_cst_sgn (lhs) == -1)
{
  if (!ctx->quiet)
-   error_at (loc, "left operand of shift expression %q+E is negative",
- build2_loc (loc, code, type, lhs, rhs));
- return true;
+   permerror (loc,
+  "left operand of shift expression %q+E is negative",
+  build2_loc (loc, code, type, lhs, rhs));
+ return !flag_permissive;
}
   /* For signed x << y the following:
 (unsigned) x >> ((prec (lhs) - 1) - y)
@@ -1555,9 +1556,9 @@ cxx_eval_check_shift_p (location_t loc, const constexpr_ctx *ctx,
   if (tree_int_cst_lt (integer_one_node, t))
{
  if (!ctx->quiet)
-   error_at (loc, "shift expression %q+E overflows",
- build2_loc (loc, code, type, lhs, rhs));
- return true;
+   permerror (loc, "shift expression %q+E overflows",
+  build2_loc (loc, code, type, lhs, rhs));
+ return !flag_permissive;
}
 }
   return false;
diff --git gcc/testsuite/g++.dg/warn/permissive-1.C gcc/testsuite/g++.dg/warn/permissive-1.C
index e69de29..7223e68 100644
--- gcc/testsuite/g++.dg/warn/permissive-1.C
+++ gcc/testsuite/g++.dg/warn/permissive-1.C
@@ -0,0 +1,8 @@
+// PR c++/68979
+// { dg-do compile }
+// { dg-options "-fpermissive -Wno-shift-overflow -Wno-shift-count-overflow -Wno-shift-count-negative" }
+
+enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" { target c++11 } }
+enum B { BB = 1 << -4 }; // { dg-warning "operand of shift expression" }
+enum C { CC = 1 << 100 }; // { dg-warning "operand of shift expression" }
+enum D { DD = 31 << 30 }; // { dg-warning "shift expression" "" { target c++11 } }

Marek


Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Alexander Monakov
Hello, Martin, Jakub, community,

This part of the patch:

On Mon, 7 Dec 2015, Martin Jambor wrote:
> include/
>   * gomp-constants.h (GOMP_DEVICE_HSA): New macro.
[snip]
>   (GOMP_kernel_launch_attributes): New type.
>   (GOMP_hsa_kernel_dispatch): New type.

is going to break the build of the NVPTX cross-compiler, because it uses
uint32_t and uint64_t types like below, but those types will not be available
when building nvptx libgcc.  gomp-constants.h is #include'd in libgcc via tm.h
and offload.h.

Note how other files in include/ need to do a special dance with #ifdef
HAVE_STDINT_H to include <stdint.h> and obtain uint64_t.

Shall I move the problematic structs into a separate file, gomp-types.h?

Thanks.
Alexander

> diff --git a/include/gomp-constants.h b/include/gomp-constants.h
> index dffd631..1dae474 100644
> --- a/include/gomp-constants.h
> +++ b/include/gomp-constants.h
[snip]
> +/* Structure describing the run-time and grid properties of an HSA kernel
> +   lauch.  */
> +
> +struct GOMP_kernel_launch_attributes
> +{
> +  /* Number of dimensions the workload has.  Maximum number is 3.  */
> +  uint32_t ndim;
> +  /* Size of the grid in the three respective dimensions.  */
> +  uint32_t gdims[3];
> +  /* Size of work-groups in the respective dimensions.  */
> +  uint32_t wdims[3];
> +};


Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Jakub Jelinek
On Tue, Jan 12, 2016 at 01:52:01PM +0100, Marek Polacek wrote:
> --- gcc/testsuite/g++.dg/warn/permissive-1.C
> +++ gcc/testsuite/g++.dg/warn/permissive-1.C
> @@ -0,0 +1,8 @@
> +// PR c++/68979
> +// { dg-do compile }
> +// { dg-options "-fpermissive -Wno-shift-overflow -Wno-shift-count-overflow 
> -Wno-shift-count-negative" }
> +
> +enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" { 
> target c++11 } }
> +enum B { BB = 1 << -4 }; // { dg-warning "operand of shift expression" }
> +enum C { CC = 1 << 100 }; // { dg-warning "operand of shift expression" }
> +enum D { DD = 31 << 30 }; // { dg-warning "shift expression" "" { target 
> c++11 } }

Shouldn't this test be limited to
// { dg-do compile { target int32 } }
or better yet replace the 100 and 30 above with
say __SIZEOF_INT__ * 4 * __CHAR_BIT__ - 4 and __SIZEOF_INT__ * __CHAR_BIT__ - 2
?
I'd guess that on say int16 targets, or int64 targets (if we have any at
some point) or int128 targets this wouldn't do what you are expecting.
{ target int32 } is not exactly right, because it still assumes __CHAR_BIT__ == 8
and for other char sizes it could fail.

Jakub


Re: [PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

2016-01-12 Thread Richard Biener
On Tue, 12 Jan 2016, Tom de Vries wrote:

> On 12/01/16 12:22, Richard Biener wrote:
> > Doesn't the same issue apply to
> > 
> > > >unsigned int *p;
> > > >
> > > >static void __attribute__((noinline, noclone))
> > > >foo (void)
> > > >{
> > > >   unsigned int z;
> > > >
> > > >   for (z = 0; z < N; ++z)
> > > > ++(*p);
> > > >}
> > thus when we have a MEM_REF[p_1]?  SCEV will not analyze
> > its evolution to a POLYNOMIAL_CHREC and thus access_fns will
> > be NULL again.
> > 
> 
> I didn't manage to trigger this scenario, though I could probably make it
> happen by modifying ftree-loop-im to work in one case (the load of the value
> of p) but not the other (the *p load and store).
> 
> > I think avoiding a NULL access_fns is ok but it should be done
> > unconditionally, not only for the DECL_P case.
> 
> Ok, I'll retest and commit this patch.

Please add a comment as well.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Jakub Jelinek
On Tue, Jan 12, 2016 at 04:00:11PM +0300, Alexander Monakov wrote:
> Hello, Martin, Jakub, community,
> 
> This part of the patch:
> 
> On Mon, 7 Dec 2015, Martin Jambor wrote:
> > include/
> > * gomp-constants.h (GOMP_DEVICE_HSA): New macro.
> [snip]
> > (GOMP_kernel_launch_attributes): New type.
> > (GOMP_hsa_kernel_dispatch): New type.
> 
> is going to break build of NVPTX cross-compiler, because it uses uint32_t,
> uint64_t types like below, but those types will not be available when building
> nvptx libgcc.  gomp-constants.h is #include'd in libgcc via tm.h and
> offload.h.
> 
> Note how other files in include/ need to do a special dance with #ifdef
> HAVE_STDINT_H to include <stdint.h> and obtain uint64_t.
> 
> Shall I move the problematic structs into a separate file, gomp-types.h?

Or just move those into libgomp-plugin.h, those type definitions don't have
to be shared between the compiler and libgomp, the compiler has to duplicate
those definitions anyway, as it needs to create the IL of those types and
can't use the host structure type for that purpose.

> > diff --git a/include/gomp-constants.h b/include/gomp-constants.h
> > index dffd631..1dae474 100644
> > --- a/include/gomp-constants.h
> > +++ b/include/gomp-constants.h
> [snip]
> > +/* Structure describing the run-time and grid properties of an HSA kernel
> > +   lauch.  */
> > +
> > +struct GOMP_kernel_launch_attributes
> > +{
> > +  /* Number of dimensions the workload has.  Maximum number is 3.  */
> > +  uint32_t ndim;
> > +  /* Size of the grid in the three respective dimensions.  */
> > +  uint32_t gdims[3];
> > +  /* Size of work-groups in the respective dimensions.  */
> > +  uint32_t wdims[3];
> > +};

Jakub


Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Uros Bizjak
On Tue, Jan 12, 2016 at 1:43 PM, Jakub Jelinek  wrote:
> On Tue, Jan 12, 2016 at 01:32:05PM +0100, Uros Bizjak wrote:
>> Using this patch, SSE math won't be emitted for a simple testcase
>> using " -O2 -msse -m32 -std=c99 -mfpmath=sse" compile flags:
>>
>> float test (float a, float b)
>> {
>>   return a + b;
>> }
>>
>> since we start with:
>>
>> test (float a, float b)
>> {
>>   long double _2;
>>   long double _4;
>>   long double _5;
>>   float _6;
>>
>>   :
>>   _2 = (long double) a_1(D);
>>   _4 = (long double) b_3(D);
>>   _5 = _2 + _4;
>>   _6 = (float) _5;
>>   return _6;
>> }
>>
>> This is counter-intuitive, so I'd say we leave things as they are. The
>> situation where only floats are evaluated as floats and doubles are
>> evaluated as long doubles is not covered in the FLT_EVAL_METHOD spec.
>
> Well, for the -fexcess-precision=standard case (== -std=c99) FLT_EVAL_METHOD
> 2 doesn't hurt, that forces in the FE long double computation.  While if it
> is 0 with -msse -mfpmath=sse, it means that the FE leaves computations as is
> and they are computed in float precision for floats and in long double
> precision for doubles.  For -fexcess-precision=fast it is different, because
> the FE doesn't do anything, so in the end it is mixed in that case.
> So, for -msse -mfpmath=sse, I think either we need FLT_EVAL_METHOD 2 or -1
> or 2 for -fexcess-precision=standard and -1 for -fexcess-precision=fast.

I think that following definition describes -msse -mfpmath=sse
situation in the most elegant way. We can just declare that the
precision is not known in this case:

#define TARGET_FLT_EVAL_METHOD\
  (TARGET_MIX_SSE_I387 ? -1\
   : (TARGET_80387 && !TARGET_SSE_MATH) ? 2 : TARGET_SSE2 ? 0 : -1)

Using this patch, the compiler will still generate SSE instructions
for the above test.

Joseph, what is your opinion on this approach?

Uros.


Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Marek Polacek
On Tue, Jan 12, 2016 at 02:02:16PM +0100, Jakub Jelinek wrote:
> On Tue, Jan 12, 2016 at 01:52:01PM +0100, Marek Polacek wrote:
> > --- gcc/testsuite/g++.dg/warn/permissive-1.C
> > +++ gcc/testsuite/g++.dg/warn/permissive-1.C
> > @@ -0,0 +1,8 @@
> > +// PR c++/68979
> > +// { dg-do compile }
> > +// { dg-options "-fpermissive -Wno-shift-overflow 
> > -Wno-shift-count-overflow -Wno-shift-count-negative" }
> > +
> > +enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" 
> > { target c++11 } }
> > +enum B { BB = 1 << -4 }; // { dg-warning "operand of shift expression" }
> > +enum C { CC = 1 << 100 }; // { dg-warning "operand of shift expression" }
> > +enum D { DD = 31 << 30 }; // { dg-warning "shift expression" "" { target 
> > c++11 } }
> 
> Shouldn't this test be limited to
> // { dg-do compile { target int32 } }
> or better yet replace the 100 and 30 above with
> say __SIZEOF_INT__ * 4 * __CHAR_BIT__ - 4 and __SIZEOF_INT__ * __CHAR_BIT__ - 2
> ?
> I'd guess that on say int16 targets, or int64 targets (if we have any at
> some point) or int128 targets this wouldn't do what you are expecting.
> { target int32 } is not exactly right, because it still assumes __CHAR_BIT__ == 8
> and for other char sizes it could fail.

Oh yeah, forgot about those...  The following should be better.
Thanks,

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-01-12  Marek Polacek  

PR c++/68979
* constexpr.c (cxx_eval_check_shift_p): Use permerror rather than
error_at and return negated flag_permissive.

* g++.dg/warn/permissive-1.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index e60180e..dbcc242 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -1512,17 +1512,17 @@ cxx_eval_check_shift_p (location_t loc, const constexpr_ctx *ctx,
   if (tree_int_cst_sgn (rhs) == -1)
 {
   if (!ctx->quiet)
-   error_at (loc, "right operand of shift expression %q+E is negative",
- build2_loc (loc, code, type, lhs, rhs));
-  return true;
+   permerror (loc, "right operand of shift expression %q+E is negative",
+  build2_loc (loc, code, type, lhs, rhs));
+  return !flag_permissive;
 }
   if (compare_tree_int (rhs, uprec) >= 0)
 {
   if (!ctx->quiet)
-   error_at (loc, "right operand of shift expression %q+E is >= than "
- "the precision of the left operand",
- build2_loc (loc, code, type, lhs, rhs));
-  return true;
+   permerror (loc, "right operand of shift expression %q+E is >= than "
+  "the precision of the left operand",
+  build2_loc (loc, code, type, lhs, rhs));
+  return !flag_permissive;
 }
 
   /* The value of E1 << E2 is E1 left-shifted E2 bit positions; [...]
@@ -1536,9 +1536,10 @@ cxx_eval_check_shift_p (location_t loc, const constexpr_ctx *ctx,
   if (tree_int_cst_sgn (lhs) == -1)
{
  if (!ctx->quiet)
-   error_at (loc, "left operand of shift expression %q+E is negative",
- build2_loc (loc, code, type, lhs, rhs));
- return true;
+   permerror (loc,
+  "left operand of shift expression %q+E is negative",
+  build2_loc (loc, code, type, lhs, rhs));
+ return !flag_permissive;
}
   /* For signed x << y the following:
 (unsigned) x >> ((prec (lhs) - 1) - y)
@@ -1555,9 +1556,9 @@ cxx_eval_check_shift_p (location_t loc, const constexpr_ctx *ctx,
   if (tree_int_cst_lt (integer_one_node, t))
{
  if (!ctx->quiet)
-   error_at (loc, "shift expression %q+E overflows",
- build2_loc (loc, code, type, lhs, rhs));
- return true;
+   permerror (loc, "shift expression %q+E overflows",
+  build2_loc (loc, code, type, lhs, rhs));
+ return !flag_permissive;
}
 }
   return false;
diff --git gcc/testsuite/g++.dg/warn/permissive-1.C gcc/testsuite/g++.dg/warn/permissive-1.C
index e69de29..bfaca76 100644
--- gcc/testsuite/g++.dg/warn/permissive-1.C
+++ gcc/testsuite/g++.dg/warn/permissive-1.C
@@ -0,0 +1,8 @@
+// PR c++/68979
+// { dg-do compile { target int32 } }
+// { dg-options "-fpermissive -Wno-shift-overflow -Wno-shift-count-overflow -Wno-shift-count-negative" }
+
+enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" { target c++11 } }
+enum B { BB = 1 << -4 }; // { dg-warning "operand of shift expression" }
+enum C { CC = 1 << __SIZEOF_INT__ * 4 * __CHAR_BIT__ - 4 }; // { dg-warning "operand of shift expression" }
+enum D { DD = 10 << __SIZEOF_INT__ * __CHAR_BIT__ - 2 }; // { dg-warning "shift expression" "" { target c++11 } }

Marek


Re: [Patch ifcvt] Add a new parameter to limit if-conversion

2016-01-12 Thread Yuri Rumyantsev
Andreas,

Is it OK for you if we exclude dg/ifcvt-5.c from ia64 testing, since
predication must be used there instead of conditional moves?

2016-01-12 13:07 GMT+03:00 Andreas Schwab :
> gcc.dg/ifcvt-5.c fails on ia64:
>
> From ifcvt-5.c.223r.ce1:
>
> == Pass 2 ==
>
>
> == no more changes
>
> 1 possible IF blocks searched.
> 1 IF blocks converted.
> 2 true changes made.
>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."


Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)

2016-01-12 Thread Kirill Yukhin
Hello Jakub,
On 08 Jan 21:20, Jakub Jelinek wrote:
> Hi!
> 
> This patch fixes
> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ 
> t]+[^{\\n]*%xmm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ 
> t]+[^{\\n]*%ymm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ 
> t]+[^{\\n]*%xmm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ 
> t]+[^{\\n]*%ymm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
> regressions that were introduced recently by fixing up the masked store check 
> for misalignment.
> The problem is that for v2df/v4df/v4sf/v8sf masked stores 
> ix86_expand_special_args_builtin
> failed to set aligned_mem and thus didn't set correct memory alignment.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
I followed your discussion w/ HJ.
I think that the mentioned intrinsics should assume proper alignment, and this
agrees with the SDM.

So, your patch is ok for main trunk.

--
Thanks, K


> 
> 2016-01-08  Jakub Jelinek  
> 
>   PR target/69198
>   * config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
>   aligned_mem is properly set for AVX512-VL floating point masked
>   stores.
> 
> --- gcc/config/i386/i386.c.jj 2016-01-08 07:31:11.0 +0100
> +++ gcc/config/i386/i386.c2016-01-08 18:16:21.030354042 +0100
> @@ -39776,7 +39776,11 @@ ix86_expand_special_args_builtin (const
>memory = 0;
>break;
>  case VOID_FTYPE_PV8DF_V8DF_UQI:
> +case VOID_FTYPE_PV4DF_V4DF_UQI:
> +case VOID_FTYPE_PV2DF_V2DF_UQI:
>  case VOID_FTYPE_PV16SF_V16SF_UHI:
> +case VOID_FTYPE_PV8SF_V8SF_UQI:
> +case VOID_FTYPE_PV4SF_V4SF_UQI:
>  case VOID_FTYPE_PV8DI_V8DI_UQI:
>  case VOID_FTYPE_PV4DI_V4DI_UQI:
>  case VOID_FTYPE_PV2DI_V2DI_UQI:
> @@ -39834,10 +39838,6 @@ ix86_expand_special_args_builtin (const
>  case VOID_FTYPE_PV16QI_V16QI_UHI:
>  case VOID_FTYPE_PV32QI_V32QI_USI:
>  case VOID_FTYPE_PV64QI_V64QI_UDI:
> -case VOID_FTYPE_PV4DF_V4DF_UQI:
> -case VOID_FTYPE_PV2DF_V2DF_UQI:
> -case VOID_FTYPE_PV8SF_V8SF_UQI:
> -case VOID_FTYPE_PV4SF_V4SF_UQI:
>nargs = 2;
>klass = store;
>/* Reserve memory operand for target.  */
> 
>   Jakub


[Committed, PATCH] Define STDINT_LONG32 and add predefined integer types for IAMCU

2016-01-12 Thread H.J. Lu
Define STDINT_LONG32 to 0, add SIZE_TYPE, PTRDIFF_TYPE and WCHAR_TYPE
for IAMCU to make integer types compatible with i386 Linux.

Checked into trunk.

H.J.

PR target/68456
PR target/69226
* config/i386/iamcu.h (SIZE_TYPE): New macro.
(PTRDIFF_TYPE): Likewise.
(WCHAR_TYPE): Likewise.
(WCHAR_TYPE_SIZE): Likewise.
(STDINT_LONG32): Likewise.
---
 gcc/config/i386/iamcu.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/config/i386/iamcu.h b/gcc/config/i386/iamcu.h
index 53afbc0..e16c9d63 100644
--- a/gcc/config/i386/iamcu.h
+++ b/gcc/config/i386/iamcu.h
@@ -94,3 +94,19 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 goto DONE; \
   }
\
   } while (0)
+
+#undef SIZE_TYPE
+#define SIZE_TYPE "unsigned int"
+
+#undef PTRDIFF_TYPE
+#define PTRDIFF_TYPE "int"
+
+#undef WCHAR_TYPE
+#define WCHAR_TYPE "long int"
+
+#undef WCHAR_TYPE_SIZE
+#define WCHAR_TYPE_SIZE BITS_PER_WORD
+
+/* Use int, instead of long int, for int32_t and uint32_t.  */
+#undef STDINT_LONG32
+#define STDINT_LONG32 0
-- 
2.5.0



Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Jason Merrill
Changing the diagnostic is OK, but cxx_eval_check_shift_p should return 
true regardless of flag_permissive, so that SFINAE results follow the 
standard.


Jason


Re: [C++ PATCH] Fix ICE due to Cilk+ related cp_gimplify_expr bug (PR objc++/68511, PR c++/69213)

2016-01-12 Thread Jason Merrill

OK.

Jason


Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Jakub Jelinek
On Tue, Jan 12, 2016 at 02:29:06PM +0100, Martin Jambor wrote:
> GOMP_kernel_launch_attributes should not be there (it is a
> reminiscence from before the device-specific target arguments) and
> should be moved just to the HSA plugin.  I'll prepare a patch today.
> 
> While we do not have to share GOMP_hsa_kernel_dispatch, we actually do
> use them in both the plugin and the compiler, where we only use it in
> an offsetof, so that we only have the structure defined once.

But, even using it in offsetof might be wrong, the compiler could be a
cross-compiler, and you'd use offsetof on the host, while you want it for
the target, and that would be different.
So, IMHO you need (unless you already have) built the structure as a tree
type, lay it out, and then you can use at TYPE_SIZE_UNIT or
DECL_FIELD_OFFSET and the like.

Jakub


Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)

2016-01-12 Thread H.J. Lu
On Tue, Jan 12, 2016 at 5:12 AM, Kirill Yukhin  wrote:
> Hello Jakub
> On 08 Jan 21:20, Jakub Jelinek wrote:
>> Hi!
>>
>> This patch fixes
>> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ 
>> t]+[^{\\n]*%xmm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
>> FAIL: gcc.target/i386/avx512vl-vmovapd-1.c scan-assembler-times vmovapd[ 
>> t]+[^{\\n]*%ymm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
>> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ 
>> t]+[^{\\n]*%xmm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
>> FAIL: gcc.target/i386/avx512vl-vmovaps-1.c scan-assembler-times vmovaps[ 
>> t]+[^{\\n]*%ymm[0-9]+[^\\n]*){%k[1-7]}(?:\\n|[ t]+#) 1
>> regressions that were introduced recently by fixing up the masked store 
>> check for misalignment.
>> The problem is that for v2df/v4df/v4sf/v8sf masked stores 
>> ix86_expand_special_args_builtin
>> failed to set aligned_mem and thus didn't set correct memory alignment.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> Followed you discussion w/ HJ.
> I think that metioned intrinsics should assume proper alignement and this
> agrees with SDM.
>
> So, your patch is ok for main trunk.
>
> --
> Thanks, K
>
>
>>
>> 2016-01-08  Jakub Jelinek  
>>
>>   PR target/69198
>>   * config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
>>   aligned_mem is properly set for AVX512-VL floating point masked
>>   stores.
>>
>> --- gcc/config/i386/i386.c.jj 2016-01-08 07:31:11.0 +0100
>> +++ gcc/config/i386/i386.c2016-01-08 18:16:21.030354042 +0100
>> @@ -39776,7 +39776,11 @@ ix86_expand_special_args_builtin (const
>>memory = 0;
>>break;
>>  case VOID_FTYPE_PV8DF_V8DF_UQI:
>> +case VOID_FTYPE_PV4DF_V4DF_UQI:
>> +case VOID_FTYPE_PV2DF_V2DF_UQI:
>>  case VOID_FTYPE_PV16SF_V16SF_UHI:
>> +case VOID_FTYPE_PV8SF_V8SF_UQI:
>> +case VOID_FTYPE_PV4SF_V4SF_UQI:
>>  case VOID_FTYPE_PV8DI_V8DI_UQI:
>>  case VOID_FTYPE_PV4DI_V4DI_UQI:
>>  case VOID_FTYPE_PV2DI_V2DI_UQI:
>> @@ -39834,10 +39838,6 @@ ix86_expand_special_args_builtin (const
>>  case VOID_FTYPE_PV16QI_V16QI_UHI:
>>  case VOID_FTYPE_PV32QI_V32QI_USI:
>>  case VOID_FTYPE_PV64QI_V64QI_UDI:
>> -case VOID_FTYPE_PV4DF_V4DF_UQI:
>> -case VOID_FTYPE_PV2DF_V2DF_UQI:
>> -case VOID_FTYPE_PV8SF_V8SF_UQI:
>> -case VOID_FTYPE_PV4SF_V4SF_UQI:
>>nargs = 2;
>>klass = store;
>>/* Reserve memory operand for target.  */
>>
>>   Jakub

GCC 5 has the same issue.  This patch should be backported to GCC 5
with

https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00528.html

which supersedes:

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=231269

OK to backport Jakub's and my patch for GCC 5?

-- 
H.J.


Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)

2016-01-12 Thread Jakub Jelinek
On Tue, Jan 12, 2016 at 05:39:29AM -0800, H.J. Lu wrote:
> GCC 5 has the same issue.  This patch should be backported to GCC 5
> with
> 
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00528.html
> 
> which supersedes:
> 
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=231269
> 
> OK to backport Jakub's and my patch for GCC 5?

I think I'd prefer just r231269 and my patch for the branch, to make the
changes as small as possible, leave the cleanup on the trunk only.
But, I'm not x86_64 maintainer, so I'll leave that decision to Uros/Kirill.

Jakub


Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)

2016-01-12 Thread Uros Bizjak
On Tue, Jan 12, 2016 at 2:42 PM, Jakub Jelinek  wrote:
> On Tue, Jan 12, 2016 at 05:39:29AM -0800, H.J. Lu wrote:
>> GCC 5 has the same issue.  This patch should be backported to GCC 5
>> with
>>
>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00528.html
>>
>> which supersedes:
>>
>> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=231269
>>
>> OK to backport Jakub's and my patch for GCC 5?
>
> I think I'd prefer just r231269 and my patch for the branch, to make the
> changes as small as possible, leave the cleanup on the trunk only.
> But, I'm not x86_64 maintainer, so I'll leave that decision to Uros/Kirill.

I agree with Jakub.

Those two patches are OK for backport.

Thanks,
Uros.


Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Martin Jambor
Hi,

On Fri, Dec 11, 2015 at 07:05:29PM +0100, Jakub Jelinek wrote:
> On Thu, Dec 10, 2015 at 06:52:23PM +0100, Martin Jambor wrote:
> > > > --- a/libgomp/task.c
> > > > +++ b/libgomp/task.c
> > > > @@ -581,6 +581,7 @@ GOMP_PLUGIN_target_task_completion (void *data)
> > > >gomp_mutex_unlock (&team->task_lock);
> > > >  }
> > > >ttask->state = GOMP_TARGET_TASK_FINISHED;
> > > > +  free (ttask->firstprivate_copies);
> > > >gomp_target_task_completion (team, task);
> > > >gomp_mutex_unlock (&team->task_lock);
> > > >  }
> > > 
> > > So, this function should have a special case for the SHARED_MEM case, 
> > > handle
> > > it closely to say how GOMP_taskgroup_end handles the finish_cancelled:
> > > case.  Just note that the target task is missing from certain queues at 
> > > that
> > > point.
> > 
> > I'm afraid I need some help here.  I do not quite understand how is
> > finish_cancelled in GOMP_taskgroup_end similar, it seems to be doing
> > much more than freeing one pointer.  What is exactly the issue with
> > the above?
> > 
> > Nevertheless, after reading through bits of task.c again, I wonder
> > whether any copying (for both shared memory target and the host) in
> > gomp_target_task_fn is actually necessary because it seems to be also
> > done in gomp_create_target_task.  Does that not apply somehow?
> 
> The target task is scheduled for the first action as normal task, and the
> scheduling of it already removes it from some of the queues (each task is
> put into 1-3 queues), i.e. actions performed mostly by
> gomp_task_run_pre.  Then the team task lock is unlocked and the task is run.
> Finally, for normal tasks, gomp_task_run_post_handle_depend,
> gomp_task_run_post_remove_parent, etc. is run.  Now, for async target tasks
> that have something running on some other device at that point, we don't do
> that, but instead make it GOMP_TASK_ASYNC_RUNNING.  And continue with other
> stuff, until gomp_target_task_completion is run.
> For non-shared mem that needs to readd the task again into the queues, so
> that it will be scheduled again.  But you don't need that for shared mem
> target tasks, they can just free the firstprivate_copies and finalize the
> task.
> At the time gomp_target_task_completion is called, the task is pretty much
> in the same state as it is around the finish_cancelled:; label.
> So instead of what the gomp_target_task_completion function does,
> you would for SHARED_MEM do something like:
>   size_t new_tasks
> = gomp_task_run_post_handle_depend (task, team);
>   gomp_task_run_post_remove_parent (task);
>   gomp_clear_parent (&task->children_queue);
>   gomp_task_run_post_remove_taskgroup (task);
>   team->task_count--;
> do_wake = 0;
>   if (new_tasks > 1)
> {
>   do_wake = team->nthreads - team->task_running_count
> - !task->in_tied_task;
>   if (do_wake > new_tasks)
> do_wake = new_tasks;
> }
> // Unlike other places, the following will be also run with the
> // task_lock held, but I'm afraid there is nothing to do about it.
> // See the comment in gomp_target_task_completion.
> gomp_finish_task (task);
> free (task);
> if (do_wake)
>   gomp_team_barrier_wake (&team->barrier, do_wake);
> 

I tried the above but libgomp testcase target-33.c always got stuck
within GOMP_taskgroup_end call, more specifically in
gomp_team_barrier_wait_end in config/linux/bar.c where the first
call to gomp_barrier_handle_tasks left the barrier->generation as
BAR_WAITING_FOR_TASK and then nothing ever happened, even as the
callbacks fired.

After looking into the tasking mechanism for basically the whole day
yesterday, I *think* I fixed it by calling
gomp_team_barrier_set_task_pending from the callback and another hunk
in gomp_barrier_handle_tasks so that it clears that barrier flag even
if it has not picked up any tasks.  Please let me know if you think it
makes sense.

If so, I'll include it in an HSA patch set I hope to generate today.
Otherwise I guess I'd prefer to remove the shared-memory path and
revert to old behavior as a temporary measure until we find out what
was wrong.

Thanks and sorry that it took me so long to resolve this,

Martin


diff --git a/libgomp/task.c b/libgomp/task.c
index ab5df51..828c1fb 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -566,6 +566,14 @@ gomp_target_task_completion (struct gomp_team *team, struct gomp_task *task)
 gomp_team_barrier_wake (&team->barrier, 1);
 }
 
+static inline size_t
+gomp_task_run_post_handle_depend (struct gomp_task *child_task,
+ struct gomp_team *team);
+static inline void
+gomp_task_run_post_remove_parent (struct gomp_task *child_task);
+static inline void
+gomp_task_run_post_remove_taskgroup (struct gomp_task *child_task);
+
 /* Signal that a target task TTASK has completed the async

Re: [Patch ifcvt] Add a new parameter to limit if-conversion

2016-01-12 Thread Andreas Schwab
Yuri Rumyantsev  writes:

> Is it OK for you if we exclude dg/ifcvt-5.c from ia64 testing

Sure, go ahead.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Marek Polacek
On Tue, Jan 12, 2016 at 08:27:47AM -0500, Jason Merrill wrote:
> Changing the diagnostic is OK, but cxx_eval_check_shift_p should return true
> regardless of flag_permissive, so that SFINAE results follow the standard.

There's a complication, because if I keep returning true, we'll give a
compile-time error like this:

permissive-1.C:5:18: warning: left operand of shift expression ‘(-1 << 4)’ is
negative [-fpermissive]
 enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" {
target c++11 } }

permissive-1.C:5:21: error: enumerator value for ‘AA’ is not an integer
constant
 enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" {
target c++11 } }

So I suppose that wouldn't really help.  :(

Marek


Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Jason Merrill

On 01/12/2016 09:05 AM, Marek Polacek wrote:

On Tue, Jan 12, 2016 at 08:27:47AM -0500, Jason Merrill wrote:

Changing the diagnostic is OK, but cxx_eval_check_shift_p should return true
regardless of flag_permissive, so that SFINAE results follow the standard.


There's a complication, because if I keep returning true, we'll give a
compile-time error like this:

permissive-1.C:5:18: warning: left operand of shift expression ‘(-1 << 4)’ is
negative [-fpermissive]
  enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" {
target c++11 } }

permissive-1.C:5:21: error: enumerator value for ‘AA’ is not an integer
constant
  enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" {
target c++11 } }

So I suppose that wouldn't really help.  :(


In that case, we need to return (!flag_permissive || ctx->quiet).

Jason



[gomp4] fix kernel reductions

2016-01-12 Thread Nathan Sidwell
This patch fixes an ICE encountered with the Houston testsuite when kernel
optimizations are enabled.


The reduction is implemented via a compare-and-swap loop, but later than the omp
code usually does that lowering.  At the point it happens for kernels, loops must
have simple latches, which this patch ensures by splitting the non-simple
latch's back edge (which is what force_single_succ_latches does when run over
the loop structure).


applied to gomp4

nathan
2016-01-08  Nathan Sidwell  

	gcc/
	* omp-low.c (expand_omp_atomic_pipeline): Pay attention to
	LOOPS_HAVE_SIMPLE_LATCHES state.

2016-01-12  Nathan Sidwell  

	gcc/testsuite/
	* gcc.dg/goacc/kern-1.c: New.

Index: omp-low.c
===
--- omp-low.c	(revision 232179)
+++ omp-low.c	(revision 232180)
@@ -12370,6 +12370,9 @@ expand_omp_atomic_pipeline (basic_block
   loop->header = loop_header;
   loop->latch = store_bb;
   add_loop (loop, loop_header->loop_father);
+  if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES))
+/* Split the edge from store_bb to loop_header */
+split_edge (e);
 
   if (gimple_in_ssa_p (cfun))
 update_ssa (TODO_update_ssa_no_phi);
Index: gcc.dg/goacc/kern-1.c
===
--- gcc.dg/goacc/kern-1.c	(revision 0)
+++ gcc.dg/goacc/kern-1.c	(working copy)
@@ -0,0 +1,23 @@
+/* { dg-additional-options "-fopenacc -O2 -ftree-parallelize-loops=32" } */
+
+/* The reduction on sum could cause an ICE with a non-simple latch loop.   */
+
+int printf (char const *, ...);
+
+int
+main ()
+{
+  int i;
+  double a[1000], sum = 0;
+
+  
+#pragma acc kernels pcopyin(a[0:1000])
+#pragma acc loop reduction(+:sum)
+  for(int i=0; i<1000; i++) {
+    sum += a[i];
+  }
+
+  printf ("%lf\n", sum);
+
+  return 0;
+}


Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Marek Polacek
On Tue, Jan 12, 2016 at 09:09:38AM -0500, Jason Merrill wrote:
> In that case, we need to return (!flag_permissive || ctx->quiet).

Thanks.  So is this one ok once it passes testing?

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-01-12  Marek Polacek  

PR c++/68979
* constexpr.c (cxx_eval_check_shift_p): Use permerror rather than
error_at and adjust the return value.

* g++.dg/warn/permissive-1.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index e60180e..36a1e42 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -1512,17 +1512,17 @@ cxx_eval_check_shift_p (location_t loc, const constexpr_ctx *ctx,
   if (tree_int_cst_sgn (rhs) == -1)
 {
   if (!ctx->quiet)
-   error_at (loc, "right operand of shift expression %q+E is negative",
- build2_loc (loc, code, type, lhs, rhs));
-  return true;
+   permerror (loc, "right operand of shift expression %q+E is negative",
+  build2_loc (loc, code, type, lhs, rhs));
+  return (!flag_permissive || ctx->quiet);
 }
   if (compare_tree_int (rhs, uprec) >= 0)
 {
   if (!ctx->quiet)
-   error_at (loc, "right operand of shift expression %q+E is >= than "
- "the precision of the left operand",
- build2_loc (loc, code, type, lhs, rhs));
-  return true;
+   permerror (loc, "right operand of shift expression %q+E is >= than "
+  "the precision of the left operand",
+  build2_loc (loc, code, type, lhs, rhs));
+  return (!flag_permissive || ctx->quiet);
 }
 
   /* The value of E1 << E2 is E1 left-shifted E2 bit positions; [...]
@@ -1536,9 +1536,10 @@ cxx_eval_check_shift_p (location_t loc, const constexpr_ctx *ctx,
   if (tree_int_cst_sgn (lhs) == -1)
{
  if (!ctx->quiet)
-   error_at (loc, "left operand of shift expression %q+E is negative",
- build2_loc (loc, code, type, lhs, rhs));
- return true;
+   permerror (loc,
+  "left operand of shift expression %q+E is negative",
+  build2_loc (loc, code, type, lhs, rhs));
+ return (!flag_permissive || ctx->quiet);
}
   /* For signed x << y the following:
 (unsigned) x >> ((prec (lhs) - 1) - y)
@@ -1555,9 +1556,9 @@ cxx_eval_check_shift_p (location_t loc, const constexpr_ctx *ctx,
   if (tree_int_cst_lt (integer_one_node, t))
{
  if (!ctx->quiet)
-   error_at (loc, "shift expression %q+E overflows",
- build2_loc (loc, code, type, lhs, rhs));
- return true;
+   permerror (loc, "shift expression %q+E overflows",
+  build2_loc (loc, code, type, lhs, rhs));
+ return (!flag_permissive || ctx->quiet);
}
 }
   return false;
diff --git gcc/testsuite/g++.dg/warn/permissive-1.C gcc/testsuite/g++.dg/warn/permissive-1.C
index e69de29..bfaca76 100644
--- gcc/testsuite/g++.dg/warn/permissive-1.C
+++ gcc/testsuite/g++.dg/warn/permissive-1.C
@@ -0,0 +1,8 @@
+// PR c++/68979
+// { dg-do compile { target int32 } }
+// { dg-options "-fpermissive -Wno-shift-overflow -Wno-shift-count-overflow -Wno-shift-count-negative" }
+
+enum A { AA = -1 << 4 }; // { dg-warning "operand of shift expression" "" { target c++11 } }
+enum B { BB = 1 << -4 }; // { dg-warning "operand of shift expression" }
+enum C { CC = 1 << __SIZEOF_INT__ * 4 * __CHAR_BIT__ - 4 }; // { dg-warning "operand of shift expression" }
+enum D { DD = 10 << __SIZEOF_INT__ * __CHAR_BIT__ - 2 }; // { dg-warning "shift expression" "" { target c++11 } }

Marek


[PATCH] Fix PR69077

2016-01-12 Thread Richard Biener

The following fixes PR69077.

LTO bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-01-12  Richard Biener  

lto/
PR lto/69077
* lto-symtab.c (lto_symtab_prevailing_virtual_decl): Properly
merge TREE_ADDRESSABLE and DECL_POSSIBLY_INLINED flags.

* g++.dg/lto/pr69077_0.C: New testcase.
* g++.dg/lto/pr69077_1.C: Likewise.

Index: gcc/lto/lto-symtab.c
===
*** gcc/lto/lto-symtab.c(revision 232261)
--- gcc/lto/lto-symtab.c(working copy)
*** lto_symtab_prevailing_virtual_decl (tree
*** 997,1002 
--- 997,1014 
  n = n->next_sharing_asm_name;
if (n)
  {
+   /* Merge decl state in both directions, we may still end up using
+the other decl.  */
+   TREE_ADDRESSABLE (n->decl) |= TREE_ADDRESSABLE (decl);
+   TREE_ADDRESSABLE (decl) |= TREE_ADDRESSABLE (n->decl);
+ 
+   if (TREE_CODE (decl) == FUNCTION_DECL)
+   {
+ /* Merge decl state in both directions, we may still end up using
+the other decl.  */
+ DECL_POSSIBLY_INLINED (n->decl) |= DECL_POSSIBLY_INLINED (decl);
+ DECL_POSSIBLY_INLINED (decl) |= DECL_POSSIBLY_INLINED (n->decl);
+   }
lto_symtab_prevail_decl (n->decl, decl);
decl = n->decl;
  }
Index: gcc/testsuite/g++.dg/lto/pr69077_0.C
===
*** gcc/testsuite/g++.dg/lto/pr69077_0.C(revision 0)
--- gcc/testsuite/g++.dg/lto/pr69077_0.C(working copy)
***
*** 0 
--- 1,14 
+ // { dg-lto-do link }
+ // { dg-lto-options { { -O3 -g -flto } } }
+ // { dg-extra-ld-options "-r -nostdlib" }
+ 
+ struct cStdDev
+ {
+   long ns;
+   virtual double mean() const {  return ns;  }
+ };
+ 
+ struct cWeightedStdDev : public cStdDev {
+ virtual int netPack();
+ };
+ int cWeightedStdDev::netPack() { }
Index: gcc/testsuite/g++.dg/lto/pr69077_1.C
===
*** gcc/testsuite/g++.dg/lto/pr69077_1.C(revision 0)
--- gcc/testsuite/g++.dg/lto/pr69077_1.C(working copy)
***
*** 0 
--- 1,15 
+ struct cStdDev
+ {
+   long ns;
+   virtual double mean() const {  return ns;  }
+ };
+ 
+ struct sf
+ {
+   void recordScalar(double value);
+   cStdDev eedStats;
+   virtual void finish();
+ };
+ void sf::finish() {
+ recordScalar(eedStats.mean());
+ }


Re: [PATCH] Fix up my recent change to vect_get_constant_vectors (PR tree-optimization/69207)

2016-01-12 Thread Ilya Enkovich
2016-01-11 20:13 GMT+03:00 Jakub Jelinek :
> Hi!
>
> Based on discussions on IRC, I'm submitting following fix for a regression
> on aarch64 - partial reversion (the case where VCE works too, just I thought
> using NOP_EXPR would be nicer) and change in the assert - op better be
> some integral value if converting it to VECTOR_BOOLEAN_TYPE_P's element
> type.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-01-11  Jakub Jelinek  
>
> PR tree-optimization/69207
> * tree-vect-slp.c (vect_get_constant_vectors): For
> VECTOR_BOOLEAN_TYPE_P, assert op has integral type instead of
> fold_convertible_p to vector_type's element type, and always
> use VCE for non-VECTOR_BOOLEAN_TYPE_P.
>
> --- gcc/tree-vect-slp.c.jj  2016-01-08 21:45:57.0 +0100
> +++ gcc/tree-vect-slp.c 2016-01-11 12:07:19.633366712 +0100
> @@ -2999,12 +2999,9 @@ vect_get_constant_vectors (tree op, slp_
>   gimple *init_stmt;
>   if (VECTOR_BOOLEAN_TYPE_P (vector_type))
> {
> - gcc_assert (fold_convertible_p (TREE_TYPE (vector_type),
> - op));
> + gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (op)));
>   init_stmt = gimple_build_assign (new_temp, NOP_EXPR, op);

In vect_init_vector we had to introduce COND_EXPR to choose between 0 and -1 for
boolean vectors.  Shouldn't we do similar in SLP?

Thanks,
Ilya

> }
> - else if (fold_convertible_p (TREE_TYPE (vector_type), op))
> -   init_stmt = gimple_build_assign (new_temp, NOP_EXPR, op);
>   else
> {
>   op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vector_type),
>
> Jakub


Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Jakub Jelinek
On Tue, Jan 12, 2016 at 02:46:52PM +0100, Martin Jambor wrote:
> diff --git a/libgomp/task.c b/libgomp/task.c
> index ab5df51..828c1fb 100644
> --- a/libgomp/task.c
> +++ b/libgomp/task.c
> @@ -584,8 +592,34 @@ GOMP_PLUGIN_target_task_completion (void *data)
>gomp_mutex_unlock (&team->task_lock);
>  }
>ttask->state = GOMP_TARGET_TASK_FINISHED;
> -  free (ttask->firstprivate_copies);
> -  gomp_target_task_completion (team, task);
> +
> +  if (ttask->devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)

First of all, I'm surprised you've changed
GOMP_PLUGIN_target_task_completion rather than gomp_target_task_completion.
The difference between those two is that the latter is run not just from
the async event, but also if GOMP_PLUGIN_target_task_completion happens to
run before the gomp_mutex_lock (&team->task_lock); is acquired in the
various spots before
child_task->kind = GOMP_TASK_ASYNC_RUNNING;
The point is if the async completion happens too early for the thread
spawning it to notice, we want to complete it only when the spawning thread
is ready for that.

But looking at GOMP_PLUGIN_target_task_completion, I see we have a bug in
there,
  gomp_mutex_lock (&team->task_lock);
  if (ttask->state == GOMP_TARGET_TASK_READY_TO_RUN)
{
  ttask->state = GOMP_TARGET_TASK_FINISHED;
  gomp_mutex_unlock (&team->task_lock);
}
  ttask->state = GOMP_TARGET_TASK_FINISHED;
  gomp_target_task_completion (team, task);
  gomp_mutex_unlock (&team->task_lock);
there was meant to be, I think, a return; after the first unlock; otherwise
it doubly unlocks the same lock, and performs gomp_target_task_completion
without the lock held, which may cause great havoc.

I'll test the obvious change here.

> +{
> +  free (ttask->firstprivate_copies);
> +  size_t new_tasks
> + = gomp_task_run_post_handle_depend (task, team);
> +  gomp_task_run_post_remove_parent (task);
> +  gomp_clear_parent (&task->children_queue);
> +  gomp_task_run_post_remove_taskgroup (task);
> +  team->task_count--;
> +  int do_wake = 0;
> +  if (new_tasks)
> + {
> +   do_wake = team->nthreads - team->task_running_count
> + - !task->in_tied_task;
> +   if (do_wake > new_tasks)
> + do_wake = new_tasks;
> + }
> +  /* Unlike other places, the following will be also run with the 
> task_lock
> + held, but there is nothing to do about it.  See the comment in
> + gomp_target_task_completion.  */
> +  gomp_finish_task (task);
> +  free (task);
> +  gomp_team_barrier_set_task_pending (&team->barrier);

This one really looks weird.  I mean, this should be done if we increase the
number of team's tasks, and gomp_task_run_post_handle_depend should do that
if it adds new tasks (IMHO it does), but if new_tasks is 0, then
there is no new task to schedule and therefore it should not be set.

> +  gomp_team_barrier_wake (&team->barrier, do_wake ? do_wake : 1);
> +}
> +  else
> +gomp_target_task_completion (team, task);
>gomp_mutex_unlock (&team->task_lock);
>  }
>  
> @@ -1275,7 +1309,12 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
> thr->task = task;
>   }
>else
> - return;
> + {
> +   if (team->task_count == 0
> +   && gomp_team_barrier_waiting_for_tasks (&team->barrier))
> + gomp_team_barrier_done (&team->barrier, state);
> +   return;
> + }
>gomp_mutex_lock (&team->task_lock);
>if (child_task)
>   {

And this hunk looks wrong too.  gomp_team_barrier_done shouldn't be done
outside of the lock held, there is no waking and I don't understand the
rationale for why you think current gomp_barrier_handle_tasks is wrong.

Anyway, if you make the HSA branch work the less efficient way of creating a
task that just frees the firstprivate copies, and post after the merge into
trunk a WIP patch that includes this, plus if there are clear instructions
how to build the HSA stuff on the wiki, my son has a box with AMD Kaveri,
so I'll try to debug it there.

Jakub


[RFC] non-unit stride loads for size power of 2.

2016-01-12 Thread Kumar, Venkataramanan
Hi 

In the code below, it looks like we always call "vect_permute_load_chain" to load
non-unit strides whose size is a power of 2.

(---snip---)
/* If reassociation width for vector type is 2 or greater target machine can
 execute 2 or more vector instructions in parallel.  Otherwise try to
 get chain for loads group using vect_shift_permute_load_chain.  */
  mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
  
  if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
  || exact_log2 (size) != -1
  || !vect_shift_permute_load_chain (dr_chain, size, stmt,
 gsi, &result_chain))
vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);

static bool
vect_shift_permute_load_chain (vec dr_chain,
   unsigned int length,
   gimple *stmt,
   gimple_stmt_iterator *gsi,
   vec *result_chain)
{
…...
…...
  if (exact_log2 (length) != -1 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4) <== 
This is not used.
{
  unsigned int j, log_length = exact_log2 (length);
  for (i = 0; i < nelt / 2; ++i)
sel[i] = i * 2;
  for (i = 0; i < nelt / 2; ++i)
sel[nelt / 2 + i] = i * 2 + 1; 
(---snip--)


Is there any reason to do so? 

I have not done any benchmarking, but I tried simple test cases for -mavx
targets with sizes 2, 4 and VF > 4 (short/char types).
Using vect_shift_permute_load_chain looks better.

Should we change it to something like this ?

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index d0e20da..b0f0a02 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -5733,9 +5733,9 @@ vect_transform_grouped_load (gimple *stmt, vec dr_chain, int size,
  get chain for loads group using vect_shift_permute_load_chain.  */
   mode = TYPE_MODE (STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt)));
   if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) > 1
-  || exact_log2 (size) != -1
-  || !vect_shift_permute_load_chain (dr_chain, size, stmt,
-gsi, &result_chain))
+  || (!vect_shift_permute_load_chain (dr_chain, size, stmt,
+gsi, &result_chain)
+ && exact_log2 (size) != -1))
 vect_permute_load_chain (dr_chain, size, stmt, gsi, &result_chain);
   vect_record_grouped_load_vectors (stmt, result_chain);
   result_chain.release ();
 
regards,
Venkat.


[patch] libstdc++/69222 Prevent recursive instantiation in std::function

2016-01-12 Thread Jonathan Wakely

This fixes PR 69222 and PR 69005 for gcc-5-branch, by ensuring we
don't try to determine the result of invoking the function(Functor)
constructor argument when the type is incomplete (because that might
require instantiating the constructor again, which recurses).

Jason fixed 69005 on trunk by making the front end skip that
constructor when performing overload resolution for copy construction,
because it cannot be instantiated to make a copy, but there is still a
problem on the branch, so I'm fixing it in the library. I'm making the
same change on trunk, because it's an improvement anyway.

Tested x86_64-linux, committed to trunk and gcc-5-branch.

commit 540303f8e8f24a89ecd3698c70efe9ab753ef9d9
Author: redi 
Date:   Tue Jan 12 14:55:00 2016 +

Prevent recursive instantiation in std::function

	PR libstdc++/69005
	PR libstdc++/69222
	* include/std/functional (function::_Invoke): Remove, use result_of.
	(function::_Callable): Replace alias template with class template
	and use partial specialization instead of _NotSelf alias template.
	(function(_Functor)): Add "not self" constraint so that _Callable is
	not used while type is incomplete.
	* testsuite/20_util/function/69222.cc: New.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-5-branch@232274 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/include/std/functional b/libstdc++-v3/include/std/functional
index 139be61..717d1bf 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -1977,19 +1977,14 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
 {
   typedef _Res _Signature_type(_ArgTypes...);
 
-  template
-	using _Invoke = decltype(__callable_functor(std::declval<_Functor&>())
- (std::declval<_ArgTypes>()...) );
+  template::type>
+	struct _Callable : __check_func_return_type<_Res2, _Res> { };
 
   // Used so the return type convertibility checks aren't done when
   // performing overload resolution for copy construction/assignment.
   template
-	using _NotSelf = __not_>;
-
-  template
-	using _Callable
-	  = __and_<_NotSelf<_Functor>,
-		   __check_func_return_type<_Invoke<_Functor>, _Res>>;
+	struct _Callable : false_type { };
 
   template
 	using _Requires = typename enable_if<_Cond::value, _Tp>::type;
@@ -2054,6 +2049,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
*  reference_wrapper, this function will not throw.
*/
   template>, void>,
 	   typename = _Requires<_Callable<_Functor>, void>>
 	function(_Functor);
 
@@ -2246,7 +2242,7 @@ _GLIBCXX_MEM_FN_TRAITS(&&, false_type, true_type)
 }
 
   template
-template
+template
   function<_Res(_ArgTypes...)>::
   function(_Functor __f)
   : _Function_base()
diff --git a/libstdc++-v3/testsuite/20_util/function/69222.cc b/libstdc++-v3/testsuite/20_util/function/69222.cc
new file mode 100644
index 000..7c9dfec
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/function/69222.cc
@@ -0,0 +1,30 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11" }
+// { dg-do compile }
+
+#include 
+
+// Reduced from c++/69005
+struct Foo {
+  std::function f;
+};
+
+extern Foo exfoo;
+Foo f(exfoo);
+Foo& r = f = exfoo;


[PATCH][committed] libitm: Remove dead code and data.

2016-01-12 Thread Torvald Riegel
This removes code and data members that have not been used for quite a
while now.  The user-visible benefit is 8MB less space overhead if
libitm is used.

Tested on x86_64-linux and committed as r232275.


2016-01-12  Torvald Riegel  

* libitm_i.h (gtm_mask_stack): Remove.
* beginend.cc (gtm_stmlock_array, gtm_clock): Likewise.
* stmlock.h: Remove file.
* config/alpha/cacheline.h: Likewise.
* config/generic/cacheline.h: Likewise.
* config/powerpc/cacheline.h: Likewise.
* config/sparc/cacheline.h: Likewise.
* config/x86/cacheline.h: Likewise.

commit fe0abed5782347922d4f9dba13b9a917fe9d5296
Author: Torvald Riegel 
Date:   Mon Jan 11 19:30:14 2016 +0100

libitm: Remove dead code and data.

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 367edc8..c801dab 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -36,9 +36,6 @@ gtm_rwlock GTM::gtm_thread::serial_lock;
 gtm_thread *GTM::gtm_thread::list_of_threads = 0;
 unsigned GTM::gtm_thread::number_of_threads = 0;
 
-gtm_stmlock GTM::gtm_stmlock_array[LOCK_ARRAY_SIZE];
-atomic GTM::gtm_clock;
-
 /* ??? Move elsewhere when we figure out library initialization.  */
 uint64_t GTM::gtm_spin_count_var = 1000;
 
diff --git a/libitm/config/alpha/cacheline.h b/libitm/config/alpha/cacheline.h
deleted file mode 100644
index c8da46d..000
--- a/libitm/config/alpha/cacheline.h
+++ /dev/null
@@ -1,38 +0,0 @@
-/* Copyright (C) 2009-2016 Free Software Foundation, Inc.
-   Contributed by Richard Henderson .
-
-   This file is part of the GNU Transactional Memory Library (libitm).
-
-   Libitm is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3 of the License, or
-   (at your option) any later version.
-
-   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   .  */
-
-#ifndef LIBITM_ALPHA_CACHELINE_H
-#define LIBITM_ALPHA_CACHELINE_H 1
-
-// A cacheline is the smallest unit with which locks are associated.
-// The current implementation of the _ITM_[RW] barriers assumes that
-// all data types can fit (aligned) within a cachline, which means
-// in practice sizeof(complex long double) is the smallest cacheline size.
-// It ought to be small enough for efficient manipulation of the
-// modification mask, below.
-#define CACHELINE_SIZE 64
-
-#include "config/generic/cacheline.h"
-
-#endif // LIBITM_ALPHA_CACHELINE_H
diff --git a/libitm/config/generic/cacheline.h b/libitm/config/generic/cacheline.h
deleted file mode 100644
index 8b9f927..000
--- a/libitm/config/generic/cacheline.h
+++ /dev/null
@@ -1,58 +0,0 @@
-/* Copyright (C) 2009-2016 Free Software Foundation, Inc.
-   Contributed by Richard Henderson .
-
-   This file is part of the GNU Transactional Memory Library (libitm).
-
-   Libitm is free software; you can redistribute it and/or modify it
-   under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 3 of the License, or
-   (at your option) any later version.
-
-   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
-   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
-   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
-   more details.
-
-   Under Section 7 of GPL version 3, you are granted additional
-   permissions described in the GCC Runtime Library Exception, version
-   3.1, as published by the Free Software Foundation.
-
-   You should have received a copy of the GNU General Public License and
-   a copy of the GCC Runtime Library Exception along with this program;
-   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
-   .  */
-
-#ifndef LIBITM_CACHELINE_H
-#define LIBITM_CACHELINE_H 1
-
-namespace GTM HIDDEN {
-
-// A cacheline is the smallest unit with which locks are associated.
-// The current implementation of the _ITM_[RW] barriers assumes that
-// all data types can fit (aligned) within a cachline, which means
-// in practice sizeof(complex long double) is the smallest cacheline size.
-// It ought to be small enough for efficient manipulation of the
-// modification mask, below.
-#ifndef CACHELINE_SIZE
-# define CACHELINE_SIZE 32
-#endif
-
-/

Re: [PATCH] remove mark_hook gty attribute

2016-01-12 Thread Richard Biener
On Mon, Jan 11, 2016 at 11:54 PM,   wrote:
> From: Trevor Saunders 
>
> Hi,
>
> this hardly counts as a bug fix, but going through open bugs I saw PR54809, 
> and
> realized we don't actually need this attribute any more, so we might as well
> just remove it.
>
> bootstrapped + regtested on x86_64-linux-gnu, ok for now or gcc 7?  I don't
> mind waiting, but it would be nice to have one less thing to remember to do.

Ok.

Richard.

> Trev
>
> gcc/ChangeLog:
>
> 2016-01-11  Trevor Saunders  
>
> PR middle-end/54809
> * doc/gty.texi: Remove documentation of mark_hook.
> * gengtype.c (struct write_types_data): Remove code to support
> mark_hook attribute.
> (walk_type): Likewise.
> (write_func_for_structure): Likewise.
> ---
>  gcc/doc/gty.texi | 10 --
>  gcc/gengtype.c   | 33 +++--
>  2 files changed, 3 insertions(+), 40 deletions(-)
>
> diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
> index d3ca4e0..1a22e4b 100644
> --- a/gcc/doc/gty.texi
> +++ b/gcc/doc/gty.texi
> @@ -261,16 +261,6 @@ garbage collection runs, there's no need to mark 
> anything pointed to
>  by this variable, it can just be set to @code{NULL} instead.  This is used
>  to keep a list of free structures around for re-use.
>
> -@findex mark_hook
> -@item mark_hook ("@var{hook-routine-name}")
> -
> -If provided for a structure or union type, the given
> -@var{hook-routine-name} (between double-quotes) is the name of a
> -routine called when the garbage collector has just marked the data as
> -reachable. This routine should not change the data, or call any ggc
> -routine. Its only argument is a pointer to the just marked (const)
> -structure or union.
> -
>  @findex maybe_undef
>  @item maybe_undef
>
> diff --git a/gcc/gengtype.c b/gcc/gengtype.c
> index 966e597..be49660 100644
> --- a/gcc/gengtype.c
> +++ b/gcc/gengtype.c
> @@ -2407,7 +2407,6 @@ struct write_types_data
>const char *marker_routine;
>const char *reorder_note_routine;
>const char *comment;
> -  int skip_hooks;  /* skip hook generation if non zero */
>enum write_types_kinds kind;
>  };
>
> @@ -2677,8 +2676,6 @@ walk_type (type_p t, struct walk_type_data *d)
>maybe_undef_p = 1;
>  else if (strcmp (oo->name, "desc") == 0 && oo->kind == OPTION_STRING)
>desc = oo->info.string;
> -else if (strcmp (oo->name, "mark_hook") == 0)
> -  ;
>  else if (strcmp (oo->name, "nested_ptr") == 0
>  && oo->kind == OPTION_NESTED)
>nested_ptr_d = (const struct nested_ptr_data *) oo->info.nested;
> @@ -2918,7 +2915,6 @@ walk_type (type_p t, struct walk_type_data *d)
> const char *oldval = d->val;
> const char *oldprevval1 = d->prev_val[1];
> const char *oldprevval2 = d->prev_val[2];
> -   const char *struct_mark_hook = NULL;
> const int union_p = t->kind == TYPE_UNION;
> int seen_default_p = 0;
> options_p o;
> @@ -2942,13 +2938,6 @@ walk_type (type_p t, struct walk_type_data *d)
>   if (!desc && strcmp (o->name, "desc") == 0
>   && o->kind == OPTION_STRING)
> desc = o->info.string;
> - else if (!struct_mark_hook && strcmp (o->name, "mark_hook") == 0
> -  && o->kind == OPTION_STRING)
> -   struct_mark_hook = o->info.string;
> -
> -   if (struct_mark_hook)
> - oprintf (d->of, "%*s%s (&%s);\n",
> -  d->indent, "", struct_mark_hook, oldval);
>
> d->prev_val[2] = oldval;
> d->prev_val[1] = oldprevval2;
> @@ -3473,7 +3462,6 @@ write_func_for_structure (type_p orig_s, type_p s,
>const char *chain_next = NULL;
>const char *chain_prev = NULL;
>const char *chain_circular = NULL;
> -  const char *mark_hook_name = NULL;
>options_p opt;
>struct walk_type_data d;
>
> @@ -3509,9 +3497,6 @@ write_func_for_structure (type_p orig_s, type_p s,
>  else if (strcmp (opt->name, "chain_circular") == 0
>  && opt->kind == OPTION_STRING)
>chain_circular = opt->info.string;
> -else if (strcmp (opt->name, "mark_hook") == 0
> -&& opt->kind == OPTION_STRING)
> -  mark_hook_name = opt->info.string;
>  else if (strcmp (opt->name, "for_user") == 0)
>for_user = true;
>if (chain_prev != NULL && chain_next == NULL)
> @@ -3576,17 +3561,11 @@ write_func_for_structure (type_p orig_s, type_p s,
>oprintf (d.of, "))\n");
>if (chain_circular != NULL)
> oprintf (d.of, "return;\n  do\n");
> -  if (mark_hook_name && !wtd->skip_hooks)
> -   {
> - oprintf (d.of, "{\n");
> - oprintf (d.of, "  %s (xlimit);\n   ", mark_hook_name);
> -   }
> +
>oprintf (d.of, "   xlimit = (");
>d.prev_val[2] = "*xlimit";
>output_escaped_param (&d, chain_next, "chain_next");
>oprintf (d.of, ");\n");
> -  if (mark_hook_name && !wtd->skip_hooks)
> -   oprintf (d.of, "}\n"

Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands

2016-01-12 Thread Richard Biener
On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law  wrote:
> On 01/11/2016 03:32 AM, Richard Biener wrote:
>
>>
>> Yeah, reassoc is largely about canonicalization.
>>
>>> Plus doing it in TER is almost certainly more complex than getting it
>>> right
>>> in reassoc to begin with.
>>
>>
>> I guess canonicalizing differently is ok but you'll still create
>> ((a & b) & 1) & c then if you only change the above place.
>
> What's best for that expression would depend on factors like whether or not
> the target can exploit ILP.  ie (a & b) & (1 & c) exposes more parallelism
> while (((a & b) & c) & 1) is not good for parallelism, but does expose the
> bit test.
>
> reassoc currently generates ((a & 1) & b) & c which is dreadful as there's
> no ILP or chance of creating a bit test.  My patch shuffles things around,
> but still doesn't expose the ILP or bit test in the 4 operand case.  Based
> on the comments in reassoc, it didn't seem like the author thought anything
> beyond the 3-operand case was worth handling. So my patch just handles the
> 3-operand case.
>
>
>
>>
>> So I'm not sure what pattern the backend is looking for?
>
> It just wants the constant last in the sequence.  That exposes bit clear,
> set, flip, test, etc idioms.

But those don't feed another bit operation, right?  Thus we'd like to see
((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
are designed to feed conditionals (aka CC consuming ops)?

Richard.

>
>
> Jeff


Re: [PATCH] OpenACC documentation for libgomp

2016-01-12 Thread James Norris

Bernd,

On 01/11/2016 11:23 AM, Bernd Schmidt wrote:

On 01/05/2016 04:47 PM, James Norris wrote:

I've updated the original patch after some very helpful
comments from Sandra (thank you, thank you).

OK to commit to trunk?


I'm probably not fully qualified to review the contents either, but few people
are and it looks reasonable enough that I guess I'll just ack it. Before that,
some questions though:


+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{function acc_async_test(arg);}
+@item   @tab @code{integer(kind=acc_handle_kind) arg}
+@item   @tab @code{logical acc_async_test}
+@end multitable


I guess this is how Fortran functions and their args/return values are
documented? Do we have other examples of this somewhere?


Yes, in the earlier section that describes OpenMP. One thing
that needs changing is 'Prototype' should be changed to 'Interface'
for Fortran.


+about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
+sections 4.1 and 4.2 of the “The OpenACC
+Application Programming Interface”, Version 2.0, June, 2013.}.


Non-ascii characters. I'm guessing this should probably be some kind of texinfo
@something{} block; OTOH references to C standards in standards.texi just name
them in plain text.


As Jakub pointed out in followup, those instances should
be using a @uref and not double quoted.



I wonder if things like OpenMP and OpenACC should be mentioned in
standards.texi, but that is tangential to this patch.



That's a good idea. Thanks!

Thanks for taking the time for the review.

Jim





[4.9][PR69082]Backport "[PATCH][ARM]Tighten the conditions for arm_movw, arm_movt"

2016-01-12 Thread Renlin Li

Hi all,

Here I backport r227129 to branch 4.9 to fix exactly the same issue reported in 
PR69082.
It's already been committed on trunk and backported to branch 5.


I have quoted the original message for the explanation.
The patch applies to branch 4.9 without any modifications.
A test case is not added, as the one provided in the bugzilla ticket is too big 
and complex.

arm-none-linux-gnueabihf regression tested without any issues.

Is it okay to backport to branch 4.9?

Renlin Li


gcc/ChangeLog

2016-01-08  Renlin Li  

PR target/69082
Backport from mainline:
2015-08-24  Renlin Li  

* config/arm/arm-protos.h (arm_valid_symbolic_address_p): Declare.
* config/arm/arm.c (arm_valid_symbolic_address_p): Define.
* config/arm/arm.md (arm_movt): Use arm_valid_symbolic_address_p.
* config/arm/constraints.md ("j"): Add check for high code


On 19/08/15 15:37, Renlin Li wrote:



On 19/08/15 12:49, Renlin Li wrote:

Hi all,

This simple patch tightens the conditions when matching the movw and
arm_movt rtx patterns.
Those two patterns will generate the following assembly:

movw w1, #:lower16: dummy + addend
movt w1, #:upper16: dummy + addend

The addend here is optional.  However, it should be a 16-bit signed
value within the range -32768 <= A < 32768.

By imposing this restriction explicitly, it will prevent the LRA/reload code
from generating invalid high/lo_sum code for the arm target.
In process_address_1(), if the address is not legitimate, it will try to
generate a high/lo_sum pair to put the address into a register.  It will
check whether the target supports those newly generated reload instructions.
By defining those two patterns, arm will reject them if the conditions are not
met.

Otherwise, it might generate movw/movt instructions with an addend outside
that range, and this will cause a GAS error: GAS produces an ``offset out
of range'' error message when the addend for a MOVW/MOVT REL relocation is
too large.


arm-none-eabi regression tests are okay. Okay to commit to the trunk and
backport to 5.0?

Regards,
Renlin

gcc/ChangeLog:

2015-08-19  Renlin Li  

	* config/arm/arm-protos.h (arm_valid_symbolic_address_p): Declare.
	* config/arm/arm.c (arm_valid_symbolic_address_p): Define.
	* config/arm/arm.md (arm_movt): Use arm_valid_symbolic_address_p.
	* config/arm/constraints.md ("j"): Add check for high code.


Thank you,
Renlin



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cef9eec..ff168bf 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -319,6 +319,7 @@ extern int vfp3_const_double_for_bits (rtx);
 
 extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
 	   rtx);
+extern bool arm_valid_symbolic_address_p (rtx);
 extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
 #endif /* RTX_CODE */
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c2095a3..7cc4d93 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28664,6 +28664,38 @@ arm_emit_coreregs_64bit_shift (enum rtx_code code, rtx out, rtx in,
   #undef BRANCH
 }
 
+/* Returns true if the pattern is a valid symbolic address, which is either a
+   symbol_ref or (symbol_ref + addend).
+
+   According to the ARM ELF ABI, the initial addend of REL-type relocations
+   processing MOVW and MOVT instructions is formed by interpreting the 16-bit
+   literal field of the instruction as a 16-bit signed value in the range
+   -32768 <= A < 32768.  */
+
+bool
+arm_valid_symbolic_address_p (rtx addr)
+{
+  rtx xop0, xop1 = NULL_RTX;
+  rtx tmp = addr;
+
+  if (GET_CODE (tmp) == SYMBOL_REF || GET_CODE (tmp) == LABEL_REF)
+    return true;
+
+  /* (const (plus: symbol_ref const_int))  */
+  if (GET_CODE (addr) == CONST)
+    tmp = XEXP (addr, 0);
+
+  if (GET_CODE (tmp) == PLUS)
+    {
+      xop0 = XEXP (tmp, 0);
+      xop1 = XEXP (tmp, 1);
+
+      if (GET_CODE (xop0) == SYMBOL_REF && CONST_INT_P (xop1))
+	return IN_RANGE (INTVAL (xop1), -0x8000, 0x7fff);
+    }
+
+  return false;
+}
 
 /* Returns true if a valid comparison operation and makes
the operands in a form that is valid.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 288bbb9..eefb7fa 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5774,7 +5774,7 @@
   [(set (match_operand:SI 0 "nonimmediate_operand" "=r")
 	(lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0")
 		   (match_operand:SI 2 "general_operand"  "i")))]
-  "arm_arch_thumb2"
+  "arm_arch_thumb2 && arm_valid_symbolic_address_p (operands[2])"
   "movt%?\t%0, #:upper16:%c2"
   [(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 42935a4..f9e11e0 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -67,7 +67,8 @@
 (define_constraint "j"
  "A constant suitable for a MOVW instruction. (ARM/Thumb-2)"
 

Re: [Patch ifcvt] Add a new parameter to limit if-conversion

2016-01-12 Thread Yuri Rumyantsev
Hi All,

Here is a simple fix to exclude the dg/ifcvt-5.c test from ia64 testing.

Is it OK for trunk?
gcc/testsuite/ChangeLog:

2016-01-12  Yuri Rumyantsev  

	PR rtl-optimization/68920
	* gcc.dg/ifcvt-5.c: Exclude it from ia64 testing.

2016-01-12 17:01 GMT+03:00 Andreas Schwab :
> Yuri Rumyantsev  writes:
>
>> Is it OK for you if we exclude dg/ifcvt-5.c from ia64 testing
>
> Sure, go ahead.
>
> Andreas.
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."


patch.1
Description: Binary data


Re: [PATCH : RL78] Disable interrupts during hardware multiplication routines

2016-01-12 Thread Mike Stump
On Jan 11, 2016, at 11:20 PM, Kaushik M Phatak  wrote:
> Kindly review the updated patch and let me know if it is OK.

My review comment is still outstanding.


Re: prevent "undef var" errors on gcc --help or --version

2016-01-12 Thread Olivier Hainque
Hello Bernd,

Thanks for your feedback on this :-)

> On 11 Jan 2016, at 17:09, Bernd Schmidt  wrote:
> 
> On 01/08/2016 02:23 PM, Olivier Hainque wrote:
>> +  /* Undefined variable references in specs are harmless if
>> + we're running for --help or --version alone, or together.  */
>> +  spec_undefvar_allowed =
>> +(((print_version || print_help_list)
>> +  && decoded_options_count == 2)
>> + ||
>> + ((print_version && print_help_list)
>> +  && decoded_options_count == 3));
>> +
> 
> This doesn't follow the formatting rules.

Arg, indeed. Revised version attached.

> Also, there are a couple of other options that cause gcc to just print 
> something and exit. Are these affected by missing env vars?

Some of these, for sure. For example, a common use case here is to
define a default --sysroot. We need this to be set properly for at
least --print-search-dirs and --print-prog-name, probably --print-file-name.

The print-multi family might be ok. It's heavily based on the presence
of other options on the command line, but maybe never depending on argument
values. I wasn't ready to bet though and opted for a conservative approach
first.

The attached patch is doing the same as the previous one, except more
explicitly and making it easier to adapt if deemed useful.

I could extract the decision code in a separate function if you prefer.

Olivier



spec-undef.diff
Description: Binary data




Re: prevent "undef var" errors on gcc --help or --version

2016-01-12 Thread Bernd Schmidt



On 01/12/2016 05:11 PM, Olivier Hainque wrote:

+  /* Decide if undefined variable references are allowed in specs.  */
+  {
+/* --version and --help alone or together are safe.  Note that -v would
+   make them unsafe, as they'd then be run for subprocesses as well, the
+   location of which might depend on variables possibly coming from
+   self-specs.  */
+
+/* Count the number of options we have for which undefined variables
+   are harmless for sure, and check that nothing else is set.  */
+
+unsigned n_varsafe_options = 0;
+


I think you can do without the outer braces. Ok with those removed.


Bernd


[Committed, PATCH] Sync top-level configure.ac with binutils-gdb

2016-01-12 Thread H.J. Lu
diff --git a/ChangeLog b/ChangeLog
index 1c5330a..4821c1f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,13 @@
+2016-01-12  H.J. Lu  
+
+   Sync with binutils-gdb:
+   2015-10-21  Nick Clifton  
+
+   PR gas/19109
+   * configure.ac: Note the 'none' is an acceptable argument to
+   --enable-compressed-debug-sections.
+   * configure: Regenerate.
+
 2016-01-12  Bernd Edlinger  
 
PR bootstrap/69134
diff --git a/configure b/configure
index f5786ed..19eb2a4 100755
--- a/configure
+++ b/configure
@@ -1477,7 +1477,7 @@ Optional Features:
   offload target compiler during the build
   --enable-gold[=ARG] build gold [ARG={default,yes,no}]
   --enable-ld[=ARG]   build ld [ARG={default,yes,no}]
-  --enable-compressed-debug-sections={all,gas,gold,ld}
+  --enable-compressed-debug-sections={all,gas,gold,ld,none}
   Enable compressed debug sections for gas, gold or ld
   by default
   --disable-libquadmath   do not build libquadmath directory
diff --git a/configure.ac b/configure.ac
index a719e03..0ae53ac 100644
--- a/configure.ac
+++ b/configure.ac
@@ -397,7 +397,7 @@ esac
 # Decide the default method for compressing debug sections.
 # Provide a configure time option to override our default.
 AC_ARG_ENABLE(compressed_debug_sections,
-[AS_HELP_STRING([--enable-compressed-debug-sections={all,gas,gold,ld}],
+[AS_HELP_STRING([--enable-compressed-debug-sections={all,gas,gold,ld,none}],
[Enable compressed debug sections for gas, gold or ld by
 default])],
 [


Re: [4.9][PR69082]Backport "[PATCH][ARM]Tighten the conditions for arm_movw, arm_movt"

2016-01-12 Thread Richard Earnshaw (lists)
On 12/01/16 15:31, Renlin Li wrote:
> Hi all,
> 
> Here I backport r227129 to branch 4.9 to fix exactly the same issue
> reported in PR69082.
> It has already been committed on trunk and backported to branch 5.
> 
> 
> I have quoted the original message for the explanation.
> The patch applies to branch 4.9 without any modifications.
> A test case is not added, as the one provided in the bugzilla ticket is too
> big and complex.
> 
> arm-none-linux-gnueabihf regression tested without any issues.
> 
> Is it okay to backport this to branch 4.9?
> 
> Renlin Li
> 
> 
> gcc/ChangeLog
> 
> 2016-01-08  Renlin Li  
> 
> PR target/69082
> Backport from mainline:
> 2015-08-24  Renlin Li  
> 
> * config/arm/arm-protos.h (arm_valid_symbolic_address_p): Declare.
> * config/arm/arm.c (arm_valid_symbolic_address_p): Define.
> * config/arm/arm.md (arm_movt): Use arm_valid_symbolic_address_p.
> * config/arm/constraints.md ("j"): Add check for high code
> 
> 

OK.

R.

> On 19/08/15 15:37, Renlin Li wrote:
>>
>>> On 19/08/15 12:49, Renlin Li wrote:
 Hi all,

 This simple patch tightens the conditions when matching the movw and
 arm_movt rtx patterns.
 Those two patterns will generate the following assembly:

 movw w1, #:lower16: dummy + addend
 movt w1, #:upper16: dummy + addend

 The addend here is optional. However, it should be a 16-bit signed
 value within the range -32768 <= A < 32768.

 By imposing this restriction explicitly, we prevent the LRA/reload code
 from generating invalid high/lo_sum code for the arm target.
 In process_address_1(), if the address is not legitimate, it will
 try to generate a high/lo_sum pair to put the address into a register.
 It will check whether the target supports those newly generated reload
 instructions. By defining conditions on those two patterns, arm will
 reject them if the conditions are not met.

 Otherwise, it might generate movw/movt instructions with an addend
 outside that range, and this will cause a GAS error: GAS produces an
 ``offset out of range'' error message when the addend for a MOVW/MOVT
 REL relocation is too large.


 arm-none-eabi regression tests are okay. Okay to commit to the trunk and
 backport to 5.0?

 Regards,
 Renlin

 gcc/ChangeLog:

 2015-08-19  Renlin Li  

* config/arm/arm-protos.h (arm_valid_symbolic_address_p):
 Declare.
* config/arm/arm.c (arm_valid_symbolic_address_p): Define.
* config/arm/arm.md (arm_movt): Use
 arm_valid_symbolic_address_p.
* config/arm/constraints.md ("j"): Add check for high code.
>>
>> Thank you,
>> Renlin
>>
> 
> 
> backport.diff
> 
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index cef9eec..ff168bf 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -319,6 +319,7 @@ extern int vfp3_const_double_for_bits (rtx);
>  
>  extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx,
>  rtx);
> +extern bool arm_valid_symbolic_address_p (rtx);
>  extern bool arm_validize_comparison (rtx *, rtx *, rtx *);
>  #endif /* RTX_CODE */
>  
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index c2095a3..7cc4d93 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -28664,6 +28664,38 @@ arm_emit_coreregs_64bit_shift (enum rtx_code code, 
> rtx out, rtx in,
>#undef BRANCH
>  }
>  
> +/* Returns true if the pattern is a valid symbolic address, which is either a
> +   symbol_ref or (symbol_ref + addend).
> +
> +   According to the ARM ELF ABI, the initial addend of REL-type relocations
> +   processing MOVW and MOVT instructions is formed by interpreting the 16-bit
> +   literal field of the instruction as a 16-bit signed value in the range
> +   -32768 <= A < 32768.  */
> +
> +bool
> +arm_valid_symbolic_address_p (rtx addr)
> +{
> +  rtx xop0, xop1 = NULL_RTX;
> +  rtx tmp = addr;
> +
> +  if (GET_CODE (tmp) == SYMBOL_REF || GET_CODE (tmp) == LABEL_REF)
> +return true;
> +
> +  /* (const (plus: symbol_ref const_int))  */
> +  if (GET_CODE (addr) == CONST)
> +tmp = XEXP (addr, 0);
> +
> +  if (GET_CODE (tmp) == PLUS)
> +{
> +  xop0 = XEXP (tmp, 0);
> +  xop1 = XEXP (tmp, 1);
> +
> +  if (GET_CODE (xop0) == SYMBOL_REF && CONST_INT_P (xop1))
> +   return IN_RANGE (INTVAL (xop1), -0x8000, 0x7fff);
> +}
> +
> +  return false;
> +}
>  
>  /* Returns true if a valid comparison operation and makes
> the operands in a form that is valid.  */
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 288bbb9..eefb7fa 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -5774,7 +5774,7 @@
>[(set (match_operand:SI 0 "nonimmediate_operand" "=r")
>   (lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0")
>  (match_operand

Re: C++ PATCH to abate shift warnings (PR c++/68979)

2016-01-12 Thread Jason Merrill

OK.

Jason


Re: prevent "undef var" errors on gcc --help or --version

2016-01-12 Thread Olivier Hainque

> On 12 Jan 2016, at 17:14, Bernd Schmidt  wrote:
> 
> I think you can do without the outer braces. Ok with those removed.

Great! Thanks for the review and comments.

With Kind Regards,

Olivier



[trans-mem, aa64, arm, ppc, s390] Fixing PR68964

2016-01-12 Thread Richard Henderson
The problem in this PR is that we never got around to fleshing out the vector
support for transactions for anything but x86.  My goal here is to make this as
generic as possible, so that it should Just Work with existing vector support
in the backend.

In addition, if I encounter other unexpected register types, I will now copy
them to memory and use memcpy, rather than crash.

The one piece of this that requires a tiny bit of extra work is enabling the
vector entry points in libitm.

For x86, we make sure to build the files with SSE or AVX support enabled.  For
s390x, I do the same thing, enabling z13 support.  I suppose we might need to
check for binutils support, but I'd rather do this only if necessary.

For arm I'm less sure what to do, since I seem to recall that use of Neon sets
a bit in the ELF header.  Which presumably means that the binary could no
longer be run without neon, even though the entry points wouldn't be used.

For powerpc, I don't know how to select Altivec if VSX isn't already enabled,
or indeed if that's the best thing to do.


Thanks for the review,


r~
PR tree-opt/68964
* target.def (builtin_tm_load, builtin_tm_store): Remove.
* config/i386/i386.c (ix86_builtin_tm_load): Remove.
(ix86_builtin_tm_store): Remove.
(TARGET_VECTORIZE_BUILTIN_TM_LOAD): Remove.
(TARGET_VECTORIZE_BUILTIN_TM_STORE): Remove.
* doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_TM_LOAD): Remove.
(TARGET_VECTORIZE_BUILTIN_TM_STORE): Remove.
* doc/tm.texi: Rebuild.

* gtm-builtins.def (BUILT_IN_TM_MEMCPY_RNWT): New.
(BUILT_IN_TM_MEMCPY_RTWN): New.
* trans-mem.c (tm_log_emit_stmt): Rearrange code for better
fallback from vector to integer helpers.
(build_tm_load): Handle vector types directly, instead of
via target hook.
(build_tm_store): Likewise.
(expand_assign_tm): Prepare for register types not handled by
the above.  Copy them to memory and use memcpy.
* tree.c (tm_define_builtin): New.
(find_tm_vector_type): New.
(build_tm_vector_builtins): New.
(build_common_builtin_nodes): Call it.

gcc/testsuite/
* gcc.dg/tm/memopt-13.c: Update expected function.
* gcc.dg/tm/memopt-6.c: Likewise.

libitm/
* Makefile.am (libitm_la_SOURCES) [ARCH_AARCH64]: Add neon.cc
(libitm_la_SOURCES) [ARCH_ARM]: Add neon.cc
(libitm_la_SOURCES) [ARCH_PPC]: Add vect.cc
(libitm_la_SOURCES) [ARCH_S390]: Add vx.cc
* configure.ac (ARCH_AARCH64): New conditional.
(ARCH_PPC, ARCH_S390): Likewise.
* Makefile.in, configure: Rebuild.

* libitm.h (_ITM_TYPE_M128): Always define.
* config/generic/dispatch-m64.cc: Split ...
* config/generic/dispatch-m128.cc: ... out of...
* config/x86/x86_sse.cc: ... here.
* config/aarch64/neon.cc: New file.
* config/arm/neon.cc: New file.
* config/powerpc/vect.cc: New file.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ed91e5d..0b31ccd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -35214,48 +35214,6 @@ static const struct builtin_description bdesc_tm[] =
   { OPTION_MASK_ISA_AVX, CODE_FOR_nothing, "__builtin__ITM_LM256", (enum 
ix86_builtins) BUILT_IN_TM_LOG_M256, UNKNOWN, VOID_FTYPE_PCVOID },
 };
 
-/* TM callbacks.  */
-
-/* Return the builtin decl needed to load a vector of TYPE.  */
-
-static tree
-ix86_builtin_tm_load (tree type)
-{
-  if (TREE_CODE (type) == VECTOR_TYPE)
-    {
-      switch (tree_to_uhwi (TYPE_SIZE (type)))
-	{
-	case 64:
-	  return builtin_decl_explicit (BUILT_IN_TM_LOAD_M64);
-	case 128:
-	  return builtin_decl_explicit (BUILT_IN_TM_LOAD_M128);
-	case 256:
-	  return builtin_decl_explicit (BUILT_IN_TM_LOAD_M256);
-	}
-    }
-  return NULL_TREE;
-}
-
-/* Return the builtin decl needed to store a vector of TYPE.  */
-
-static tree
-ix86_builtin_tm_store (tree type)
-{
-  if (TREE_CODE (type) == VECTOR_TYPE)
-    {
-      switch (tree_to_uhwi (TYPE_SIZE (type)))
-	{
-	case 64:
-	  return builtin_decl_explicit (BUILT_IN_TM_STORE_M64);
-	case 128:
-	  return builtin_decl_explicit (BUILT_IN_TM_STORE_M128);
-	case 256:
-	  return builtin_decl_explicit (BUILT_IN_TM_STORE_M256);
-	}
-    }
-  return NULL_TREE;
-}
-
 /* Initialize the transactional memory vector load/store builtins.  */
 
 static void
@@ -54341,12 +54299,6 @@ ix86_addr_space_zero_address_valid (addr_space_t as)
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
   ix86_builtin_vectorized_function
 
-#undef TARGET_VECTORIZE_BUILTIN_TM_LOAD
-#define TARGET_VECTORIZE_BUILTIN_TM_LOAD ix86_builtin_tm_load
-
-#undef TARGET_VECTORIZE_BUILTIN_TM_STORE
-#define TARGET_VECTORIZE_BUILTIN_TM_STORE ix86_builtin_tm_store
-
 #undef TARGET_VECTORIZE_BUILTIN_GATHER
 #define TARGET_VECTORIZE_BUILTIN_GATHER ix86_vec

[PATCH] DWARF: add abstract origin links on lexical blocks DIEs

2016-01-12 Thread Pierre-Marie de Rodat

Hello,

Although the following patch does not fix a regression, I believe it 
fixes a bug visible from a debugger, so I think it’s a valid candidate 
at this stage.


This change tracks, in DWARF, which abstract lexical block each concrete 
one comes from, so that debuggers can make concrete blocks inherit from 
their abstract counterparts. This enables debuggers to properly handle 
the following case:


  * function Child2 is nested in a lexical block, itself nested in
function Child1;
  * function Child1 is inlined into some call site;
  * function Child2 is never inlined.

Here, Child2 is described in DWARF only in the abstract instance of 
Child1. So when debuggers decode Child1's concrete instances, they need 
to fetch the definition of Child2 from the corresponding abstract 
instance: the DW_AT_abstract_origin link on the lexical block that 
embeds Child2 enables them to do that.
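The resulting DIE shape can be sketched roughly as follows (illustrative only; names and layout are schematic, not actual DWARF dump output):

```
DW_TAG_subprogram              ; abstract instance of Child1
  DW_TAG_lexical_block         ; abstract block B
    DW_TAG_subprogram          ; Child2, described only here

DW_TAG_inlined_subroutine      ; concrete Child1 at the inline call site
  DW_TAG_lexical_block         ; concrete copy of B
    DW_AT_abstract_origin -> B ; link added by this patch
```

Following the new link from the concrete block back to B is what lets a debugger find Child2's definition while unwinding through inlined Child1 frames.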


Bootstrapped and regtested on x86_64-linux.
Ok to commit? Thank you in advance!

gcc/ChangeLog:

* dwarf2out.c (add_abstract_origin_attribute): Adjust
documentation comment.  For BLOCK nodes, add a
DW_AT_abstract_origin attribute that points to the DIE generated
for the origin BLOCK.
(gen_lexical_block_die): Call add_abstract_origin_attribute for
blocks from inlined functions.
---
 gcc/dwarf2out.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index da5524e..a889dbb 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -18463,15 +18463,16 @@ add_prototyped_attribute (dw_die_ref die, tree func_type)
 }
 
 /* Add an 'abstract_origin' attribute below a given DIE.  The DIE is found
-   by looking in either the type declaration or object declaration
-   equate table.  */
+   by looking in the type declaration, the object declaration equate table or
+   the block mapping.  */
 
 static inline dw_die_ref
 add_abstract_origin_attribute (dw_die_ref die, tree origin)
 {
   dw_die_ref origin_die = NULL;
-  if (TREE_CODE (origin) != FUNCTION_DECL)
+  if (TREE_CODE (origin) != FUNCTION_DECL
+      && TREE_CODE (origin) != BLOCK)
     {
       /* We may have gotten separated from the block for the inlined
 	 function, if we're in an exception handler or some such; make
@@ -18493,6 +18494,8 @@ add_abstract_origin_attribute (dw_die_ref die, tree origin)
     origin_die = lookup_decl_die (origin);
   else if (TYPE_P (origin))
     origin_die = lookup_type_die (origin);
+  else if (TREE_CODE (origin) == BLOCK)
+    origin_die = BLOCK_DIE (origin);
 
   /* XXX: Functions that are never lowered don't always have correct block
      trees (in the case of java, they simply have no block tree, in some other
@@ -21294,6 +21297,10 @@ gen_lexical_block_die (tree stmt, dw_die_ref context_die)
 	  BLOCK_DIE (stmt) = stmt_die;
 	  old_die = NULL;
 	}
+
+      tree origin = block_ultimate_origin (stmt);
+      if (origin != NULL_TREE && origin != stmt)
+	add_abstract_origin_attribute (stmt_die, origin);
     }
 
   if (old_die)
--
2.3.3.199.g52cae64



Re: [PATCH] OpenACC documentation for libgomp

2016-01-12 Thread James Norris

Hi!

On 01/11/2016 11:35 AM, Jakub Jelinek wrote:

On Tue, Jan 05, 2016 at 09:47:59AM -0600, James Norris wrote:

I've updated the original patch after some very helpful
comments from Sandra (thank you, thank you).


I'd prefer if OpenMP
* Enabling OpenMP::How to enable OpenMP for your applications.
* Runtime Library Routines::   The OpenMP runtime application programming
interface.
* Environment Variables::  Influencing runtime behavior with environment
variables.
chapters precede the OpenACC chapters, most libgomp users are not really
using any offloading, which is new, but using OpenMP for host
parallelization, and only far fewer users are actually trying some
acceleration, whether OpenACC or OpenMP offloading parts.


OpenACC content has been moved after the OpenMP content.



As Bernd found, there are some UTF-8 quotes or what in the patch, those
need to be replaced by some texinfo markup, say


+sections 4.1 and 4.2 of the “The OpenACC
+Application Programming Interface”, Version 2.0, June, 2013.}.


@uref{http://www.openacc.org/, OpenACC Application Programming Interface, 
Version 2.0, June, 2013}
or something similar.


Those were double quotes and have been changed to @uref's.

Patch committed to trunk.

Thanks for taking the time to review.

Jim


Index: ChangeLog
===
--- ChangeLog	(revision 232278)
+++ ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2016-01-12  James Norris  
+
+	* libgomp.texi: Updates for OpenACC.
+
 2016-01-11  Alexander Monakov  
 
 	* plugin/plugin-nvptx.c (link_ptx): Do not set CU_JIT_TARGET.
Index: libgomp.texi
===
--- libgomp.texi	(revision 232278)
+++ libgomp.texi	(working copy)
@@ -99,6 +99,16 @@
interface.
 * Environment Variables::  Influencing runtime behavior with environment 
variables.
+* Enabling OpenACC::   How to enable OpenACC for your
+   applications.
+* OpenACC Runtime Library Routines:: The OpenACC runtime application
+   programming interface.
+* OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
+   environment variables.
+* CUDA Streams Usage:: Notes on the implementation of
+   asynchronous operations.
+* OpenACC Library Interoperability:: OpenACC library interoperability with the
+   NVIDIA CUBLAS library.
 * The libgomp ABI::Notes on the external ABI presented by libgomp.
 * Reporting Bugs:: How to report bugs in the GNU Offloading and
Multi Processing Runtime Library.
@@ -1790,6 +1800,1272 @@
 
 
 @c -
+@c Enabling OpenACC
+@c -
+
+@node Enabling OpenACC
+@chapter Enabling OpenACC
+
+To activate the OpenACC extensions for C/C++ and Fortran, the compile-time 
+flag @option{-fopenacc} must be specified.  This enables the OpenACC directive
+@code{#pragma acc} in C/C++ and @code{!$acc} directives in free form,
+@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
+@code{!$} conditional compilation sentinels in free form and @code{c$},
+@code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
+arranges for automatic linking of the OpenACC runtime library 
+(@ref{OpenACC Runtime Library Routines}).
+
+A complete description of all OpenACC directives accepted may be found in 
+the @uref{http://www.openacc.org/, OpenACC} Application Programming
+Interface manual, version 2.0.
+
+Note that this is an experimental feature and subject to
+change in future versions of GCC.  See
+@uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
+
+
+
+@c -
+@c OpenACC Runtime Library Routines
+@c -
+
+@node OpenACC Runtime Library Routines
+@chapter OpenACC Runtime Library Routines
+
+The runtime routines described here are defined by section 3 of the OpenACC
+specifications in version 2.0.
+They have C linkage, and do not throw exceptions.
+Generally, they are available only for the host, with the exception of
+@code{acc_on_device}, which is available for both the host and the
+acceleration device.
+
+@menu
+* acc_get_num_devices:: Get number of devices for the given device
+type.
+* acc_set_device_type:: Set type of device accelerator to use.
+* acc_get_device_type:: Get type of device accelerator to be used.
+* acc_set_device_num::  Set device number to use.
+* acc_get_device_num::  Get d

Re: [trans-mem, aa64, arm, ppc, s390] Fixing PR68964

2016-01-12 Thread Richard Earnshaw (lists)
On 12/01/16 16:53, Richard Henderson wrote:
> The problem in this PR is that we never got around to fleshing out the vector
> support for transactions for anything but x86.  My goal here is to make this 
> as
> generic as possible, so that it should Just Work with existing vector support
> in the backend.
> 
> In addition, if I encounter other unexpected register types, I will now copy
> them to memory and use memcpy, rather than crash.
> 
> The one piece of this that requires a tiny bit of extra work is enabling the
> vector entry points in libitm.
> 
> For x86, we make sure to build the files with SSE or AVX support enabled.  For
> s390x, I do the same thing, enabling z13 support.  I suppose we might need to
> check for binutils support, but I'd rather do this only if necessary.
> 
> For arm I'm less sure what to do, since I seem to recall that use of Neon sets
> a bit in the ELF header.  Which presumably means that the binary could no
> longer be run without neon, even though the entry points wouldn't be used.

No, we don't use bits in the elf headers: there wouldn't be enough of
them!  Instead we use build attributes to record user intentions.  These
are (normally) derived from .arch and .fpu directives.

For normal core attributes you can use .object_arch to force the .arch
entry recorded in the attributes to a specific value, but I'm not sure
if you can override the .fpu directive in this way.  You might have to
experiment a bit.  Alternatively you might be able to force out the
relevant build attributes using .eabi_attribute to record some explicit
values (which then override the values that would be normally detected).

R.


> 
> For powerpc, I don't know how to select Altivec if VSX isn't already enabled,
> or indeed if that's the best thing to do.
> 
> 
> Thanks for the review,
> 
> 
> r~
> 
> 
> d-68964
> 
> 
>   PR tree-opt/68964
>   * target.def (builtin_tm_load, builtin_tm_store): Remove.
>   * config/i386/i386.c (ix86_builtin_tm_load): Remove.
>   (ix86_builtin_tm_store): Remove.
>   (TARGET_VECTORIZE_BUILTIN_TM_LOAD): Remove.
>   (TARGET_VECTORIZE_BUILTIN_TM_STORE): Remove.
>   * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_TM_LOAD): Remove.
>   (TARGET_VECTORIZE_BUILTIN_TM_STORE): Remove.
>   * doc/tm.texi: Rebuild.
> 
>   * gtm-builtins.def (BUILT_IN_TM_MEMCPY_RNWT): New.
>   (BUILT_IN_TM_MEMCPY_RTWN): New.
>   * trans-mem.c (tm_log_emit_stmt): Rearrange code for better
>   fallback from vector to integer helpers.
>   (build_tm_load): Handle vector types directly, instead of
>   via target hook.
>   (build_tm_store): Likewise.
>   (expand_assign_tm): Prepare for register types not handled by
>   the above.  Copy them to memory and use memcpy.
>   * tree.c (tm_define_builtin): New.
>   (find_tm_vector_type): New.
>   (build_tm_vector_builtins): New.
>   (build_common_builtin_nodes): Call it.
> 
> gcc/testsuite/
>   * gcc.dg/tm/memopt-13.c: Update expected function.
>   * gcc.dg/tm/memopt-6.c: Likewise.
> 
> libitm/
>   * Makefile.am (libitm_la_SOURCES) [ARCH_AARCH64]: Add neon.cc
>   (libitm_la_SOURCES) [ARCH_ARM]: Add neon.cc
>   (libitm_la_SOURCES) [ARCH_PPC]: Add vect.cc
>   (libitm_la_SOURCES) [ARCH_S390]: Add vx.cc
>   * configure.ac (ARCH_AARCH64): New conditional.
>   (ARCH_PPC, ARCH_S390): Likewise.
>   * Makefile.in, configure: Rebuild.
> 
>   * libitm.h (_ITM_TYPE_M128): Always define.
>   * config/generic/dispatch-m64.cc: Split ...
>   * config/generic/dispatch-m128.cc: ... out of...
>   * config/x86/x86_sse.cc: ... here.
>   * config/aarch64/neon.cc: New file.
>   * config/arm/neon.cc: New file.
>   * config/powerpc/vect.cc: New file.
> 
> 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index ed91e5d..0b31ccd 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -35214,48 +35214,6 @@ static const struct builtin_description bdesc_tm[] =
>{ OPTION_MASK_ISA_AVX, CODE_FOR_nothing, "__builtin__ITM_LM256", (enum 
> ix86_builtins) BUILT_IN_TM_LOG_M256, UNKNOWN, VOID_FTYPE_PCVOID },
>  };
>  
> -/* TM callbacks.  */
> -
> -/* Return the builtin decl needed to load a vector of TYPE.  */
> -
> -static tree
> -ix86_builtin_tm_load (tree type)
> -{
> -  if (TREE_CODE (type) == VECTOR_TYPE)
> -{
> -  switch (tree_to_uhwi (TYPE_SIZE (type)))
> - {
> - case 64:
> -   return builtin_decl_explicit (BUILT_IN_TM_LOAD_M64);
> - case 128:
> -   return builtin_decl_explicit (BUILT_IN_TM_LOAD_M128);
> - case 256:
> -   return builtin_decl_explicit (BUILT_IN_TM_LOAD_M256);
> - }
> -}
> -  return NULL_TREE;
> -}
> -
> -/* Return the builtin decl needed to store a vector of TYPE.  */
> -
> -static tree
> -ix86_builtin_tm_store (tree type)
> -{
> -  if (TREE_CODE (type) == VECTOR_TYPE)
> -{
> -  switch (tree_to_uhwi (TYPE_SIZE (type)))
> - {
> - case 64:
> - 

Re: [trans-mem, aa64, arm, ppc, s390] Fixing PR68964

2016-01-12 Thread Richard Earnshaw (lists)
On 12/01/16 17:16, Richard Earnshaw (lists) wrote:
> On 12/01/16 16:53, Richard Henderson wrote:
>> The problem in this PR is that we never got around to fleshing out the vector
>> support for transactions for anything but x86.  My goal here is to make this 
>> as
>> generic as possible, so that it should Just Work with existing vector support
>> in the backend.
>>
>> In addition, if I encounter other unexpected register types, I will now copy
>> them to memory and use memcpy, rather than crash.
>>
>> The one piece of this that requires a tiny bit of extra work is enabling the
>> vector entry points in libitm.
>>
>> For x86, we make sure to build the files with SSE or AVX support enabled.  
>> For
>> s390x, I do the same thing, enabling z13 support.  I suppose we might need to
>> check for binutils support, but I'd rather do this only if necessary.
>>
>> For arm I'm less sure what to do, since I seem to recall that use of Neon 
>> sets
>> a bit in the ELF header.  Which presumably means that the binary could no
>> longer be run without neon, even though the entry points wouldn't be used.
> 
> No, we don't use bits in the elf headers: there wouldn't be enough of
> them!  Instead we use build attributes to record user intentions.  These
> are (normally) derived from .arch and .fpu directives.
> 
> For normal core attributes you can use .object_arch to force the .arch
> entry recorded in the attributes to a specific value, but I'm not sure
> if you can override the .fpu directive in this way.  You might have to
> experiment a bit.  Alternatively you might be able to force out the
> relevant build attributes using .eabi_attribute to record some explicit
> values (which then override the values that would be normally detected).
> 

BTW, the above only applies to AArch32 (traditional ARM), AArch64
doesn't put any marking out -- we assume that Neon is available.

R.

> R.
> 
> 
>>
>> For powerpc, I don't know how to select Altivec if VSX isn't already enabled,
>> or indeed if that's the best thing to do.
>>
>>
>> Thanks for the review,
>>
>>
>> r~
>>
>>
>> d-68964
>>
>>
>>  PR tree-opt/68964
>>  * target.def (builtin_tm_load, builtin_tm_store): Remove.
>>  * config/i386/i386.c (ix86_builtin_tm_load): Remove.
>>  (ix86_builtin_tm_store): Remove.
>>  (TARGET_VECTORIZE_BUILTIN_TM_LOAD): Remove.
>>  (TARGET_VECTORIZE_BUILTIN_TM_STORE): Remove.
>>  * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_TM_LOAD): Remove.
>>  (TARGET_VECTORIZE_BUILTIN_TM_STORE): Remove.
>>  * doc/tm.texi: Rebuild.
>>
>>  * gtm-builtins.def (BUILT_IN_TM_MEMCPY_RNWT): New.
>>  (BUILT_IN_TM_MEMCPY_RTWN): New.
>>  * trans-mem.c (tm_log_emit_stmt): Rearrange code for better
>>  fallback from vector to integer helpers.
>>  (build_tm_load): Handle vector types directly, instead of
>>  via target hook.
>>  (build_tm_store): Likewise.
>>  (expand_assign_tm): Prepare for register types not handled by
>>  the above.  Copy them to memory and use memcpy.
>>  * tree.c (tm_define_builtin): New.
>>  (find_tm_vector_type): New.
>>  (build_tm_vector_builtins): New.
>>  (build_common_builtin_nodes): Call it.
>>
>> gcc/testsuite/
>>  * gcc.dg/tm/memopt-13.c: Update expected function.
>>  * gcc.dg/tm/memopt-6.c: Likewise.
>>
>> libitm/
>>  * Makefile.am (libitm_la_SOURCES) [ARCH_AARCH64]: Add neon.cc
>>  (libitm_la_SOURCES) [ARCH_ARM]: Add neon.cc
>>  (libitm_la_SOURCES) [ARCH_PPC]: Add vect.cc
>>  (libitm_la_SOURCES) [ARCH_S390]: Add vx.cc
>>  * configure.ac (ARCH_AARCH64): New conditional.
>>  (ARCH_PPC, ARCH_S390): Likewise.
>>  * Makefile.in, configure: Rebuild.
>>
>>  * libitm.h (_ITM_TYPE_M128): Always define.
>>  * config/generic/dispatch-m64.cc: Split ...
>>  * config/generic/dispatch-m128.cc: ... out of...
>>  * config/x86/x86_sse.cc: ... here.
>>  * config/aarch64/neon.cc: New file.
>>  * config/arm/neon.cc: New file.
>>  * config/powerpc/vect.cc: New file.
>>
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index ed91e5d..0b31ccd 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -35214,48 +35214,6 @@ static const struct builtin_description bdesc_tm[] =
>>{ OPTION_MASK_ISA_AVX, CODE_FOR_nothing, "__builtin__ITM_LM256", (enum 
>> ix86_builtins) BUILT_IN_TM_LOG_M256, UNKNOWN, VOID_FTYPE_PCVOID },
>>  };
>>  
>> -/* TM callbacks.  */
>> -
>> -/* Return the builtin decl needed to load a vector of TYPE.  */
>> -
>> -static tree
>> -ix86_builtin_tm_load (tree type)
>> -{
>> -  if (TREE_CODE (type) == VECTOR_TYPE)
>> -{
>> -  switch (tree_to_uhwi (TYPE_SIZE (type)))
>> -{
>> -case 64:
>> -  return builtin_decl_explicit (BUILT_IN_TM_LOAD_M64);
>> -case 128:
>> -  return builtin_decl_explicit (BUILT_IN_TM_LOAD_M128);
>> -case 256:
>> -  return builtin_decl_explicit (BUILT_IN_TM_LOAD_M256);
>> -}
>> -}
>> -  return NULL_TREE;

Re: [patch] Avoid an unwanted decl re-map in copy_gimple_seq_and_replace_locals

2016-01-12 Thread Martin Jambor
Hi,

On Mon, Jan 11, 2016 at 05:38:47PM +0100, Jakub Jelinek wrote:
> On Mon, Jan 11, 2016 at 09:41:31AM +0100, Richard Biener wrote:
> > Hum.  Can't you check id->remapping_type_depth?

For some reason, last week I reached the conclusion that no.  But I
must have done something wrong because I have tested it again today
and just never creating a new decl in remap_decl if
id->remapping_type_depth is non zero is good enough for my testcase
and it survives bootstrap and testing too (previously I thought it did
not).

id->remapping_type_depth seems to be incremented for DECL_VALUE_EXPR
as well, so it actually might help in that situation too.

> That said, how do
> > we end up recursing into remap_decl when copying the variable length
> > decl/type?  Can't we avoid the recursion (basically avoid remapping
> > variable-size types at all?)

Here I agree with Jakub that there are situations where we have to.
There is a comment towards the end of remap_type_1 saying that when
remapping types, all required decls should have already been mapped.
If that is correct, and I believe it is, the remapping_type_depth test
should be fine.

> 
> I guess it depends, VLA types that refer in their various gimplified
> expressions only to decls defined outside of bind stmts we are duplicating
> are fine as is, they don't need remapping, or could be remapped to VLA types
> that use all the same temporary decls.
> VLAs that have some or all references to decls inside of the bind stmts
> we are duplicating IMHO need to be remapped.
> So, perhaps we need to remap_decls in replace_locals_stmt in two phases
> in presence of VLAs (or also vars with DECL_VALUE_EXPR)

I'm a bit worried about what would happen to local DECLs that are pointers
to VLAs, because...

> - phase 1 would just walk the
>   for (old_var = decls; old_var; old_var = DECL_CHAIN (old_var))
> {
>   if (!can_be_nonlocal (old_var, id)
> && ! variably_modified_type_p (TREE_TYPE (old_var), id->src_fn))

...variably_modified_type_p seems to return true for them and...

>   remap_decl (old_var, id);
> }
> - phase 2 - do the full remap_decls, but during that arrange that
>   remap_decl for non-zero id->remapping_type_depth if (!n) just returns
>   decl

...they would not be copied here because remap_decl would not be
duplicating stuff.  So I'd end up with an original local decl when I
actually need a duplicate.

But let me go with just checking the remapping_type_depth for now.

Thanks for looking into this,

Martin


> That way, I think if the types refer to some temporaries that are defined
> in the bind stmts being copied, they will be properly duplicated, otherwise
> they will be shared.
> So, we'd need some flag in *id (just bool bitfield would be enough) that would
> allow replace_locals_stmt to set it before the remap_decls call in phase 2
> and clear it afterwards, and use that flag together with
> id->remapping_type_depth in remap_decls.
> 
>   Jakub


Re: [trans-mem, aa64, arm, ppc, s390] Fixing PR68964

2016-01-12 Thread Richard Henderson
On 01/12/2016 09:16 AM, Richard Earnshaw (lists) wrote:
> For normal core attributes you can use .object_arch to force the .arch
> entry recorded in the attributes to a specific value, but I'm not sure
> if you can override the .fpu directive in this way.  You might have to
> experiment a bit.  Alternatively you might be able to force out the
> relevant build attributes using .eabi_attribute to record some explicit
> values (which then override the values that would be normally detected).

Ouch.  Any chance I can get some help from arm folk about this?

I don't know how much this would affect anything in practice.

In particular, it's much more common for arm32 to configure
for the exact cpu+vfp variant that will be in use.  E.g. the
Fedora armv7+hardfloat settings.  For cases like that, __ARM_NEON
will be set, and the file will build just fine.


r~


[Patch,microblaze]: Optimized register reorganization for Microblaze.

2016-01-12 Thread Ajit Kumar Agarwal
The patch changes the FIXED_REGISTERS and CALL_USED_REGISTERS macros.
Previously, register r21 was marked as fixed and was also marked as 1 in
CALL_USED_REGISTERS; on top of that, r21 was never assigned to any of the
temporaries in RTL insns.

This made it impossible to use r21 in callee functions, wasting a register
that could otherwise have been allocated there.  The change makes r21
allocatable in callee functions, and its availability reduces spills and
fetches.  This is achieved by marking r21 as not fixed and as 0 in
CALL_USED_REGISTERS.  Register r20 was likewise marked as fixed; the patch
also unmarks it, allowing it to be used and further reducing spills and
fetches.

Regtested for Microblaze.

Performance runs were made on MiBench/EEMBC benchmarks for MicroBlaze.
The following benchmarks show gains:

Benchmarks             Gains
automotive_qsort1      3.96%
automotive_susan_c     7.68%
consumer_mad           9.60%
security_rijndael_d   19.57%
telecom_CRC32          7.66%
bitmnp01_lite         10.61%
a2time01_lite          6.97%

ChangeLog:
2016-01-12  Ajit Agarwal  

* config/microblaze/microblaze.h
(FIXED_REGISTERS): Update in macro.
(CALL_USED_REGISTERS): Update in macro.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com.
---
 gcc/config/microblaze/microblaze.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/microblaze/microblaze.h 
b/gcc/config/microblaze/microblaze.h
index e115c42..dbfb652 100644
--- a/gcc/config/microblaze/microblaze.h
+++ b/gcc/config/microblaze/microblaze.h
@@ -253,14 +253,14 @@ extern enum pipeline_type microblaze_pipe;
 #define FIXED_REGISTERS
\
 {  \
   1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,  \
-  1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  \
+  1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  \
   1, 1, 1, 1   \
 }
 
 #define CALL_USED_REGISTERS\
 {  \
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  \
-  1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  \
+  1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  \
   1, 1, 1, 1   \
 }
 #define GP_REG_FIRST0
-- 
1.7.1

Thanks & Regards
Ajit



reg-reorg.patch
Description: reg-reorg.patch


[doc, 1/n] invoke.texi: name of gcc executable

2016-01-12 Thread Sandra Loosemore
I've checked in the first installment of my planned reorganization of 
the invoke.texi chapter.  Here I've deleted the section placed randomly 
in the middle of option descriptions that contained only a paragraph 
about the name of the gcc executable, and incorporated that information 
into the chapter introduction instead.  I did a little bit of editing of 
the text in the introduction as well, and rewrote the one reference to 
the deleted node so it makes sense without it.


-Sandra

2016-01-12  Sandra Loosemore 

	gcc/
	* doc/invoke.texi (Invoking GCC): Copy-edit.  Incorporate information
	about name of GCC executable.  Remove deleted node from menu.
	(Directory Options) <-B>: Remove cross-reference to deleted node.
	(Target Options): Delete section.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 232279)
+++ gcc/doc/invoke.texi	(working copy)
@@ -72,8 +72,9 @@ assembly and linking.  The ``overall opt
 process at an intermediate stage.  For example, the @option{-c} option
 says not to run the linker.  Then the output consists of object files
 output by the assembler.
+@xref{Overall Options,,Options Controlling the Kind of Output}.
 
-Other options are passed on to one stage of processing.  Some options
+Other options are passed on to one or more stages of processing.  Some options
 control the preprocessor and others the compiler itself.  Yet other
 options control the assembler and linker; most of these are not
 documented here, since you rarely need to use any of them.
@@ -85,9 +86,18 @@ for C programs; when an option is only u
 for a particular option does not mention a source language, you can use
 that option with all supported languages.
 
-@cindex C++ compilation options
-@xref{Invoking G++,,Compiling C++ Programs}, for a summary of special
-options for compiling C++ programs.
+@cindex cross compiling
+@cindex specifying machine version
+@cindex specifying compiler version and target machine
+@cindex compiler version, specifying
+@cindex target machine, specifying
+The usual way to run GCC is to run the executable called @command{gcc}, or
+@command{@var{machine}-gcc} when cross-compiling, or
+@command{@var{machine}-gcc-@var{version}} to run a specific version of GCC.
+When you compile C++ programs, you should invoke GCC as @command{g++} 
+instead.  @xref{Invoking G++,,Compiling C++ Programs}, 
+for information about the differences in behavior between @command{gcc} 
+and @code{g++} when compiling C++ programs.
 
 @cindex grouping options
 @cindex options, grouping
@@ -137,7 +147,6 @@ only one of these two forms, whichever o
 * Directory Options::   Where to find header files and libraries.
 Where to find the compiler executable files.
 * Spec Files::  How to pass switches to sub-processes.
-* Target Options::  Running a cross-compiler, or an old version of GCC.
 * Submodel Options::Specifying minor hardware or convention variations,
 such as 68010 vs 68020.
 * Code Gen Options::Specifying conventions for function calls, data layout
@@ -11733,7 +11742,8 @@ include files, and data files of the com
 The compiler driver program runs one or more of the subprograms
 @command{cpp}, @command{cc1}, @command{as} and @command{ld}.  It tries
 @var{prefix} as a prefix for each program it tries to run, both with and
-without @samp{@var{machine}/@var{version}/} (@pxref{Target Options}).
+without @samp{@var{machine}/@var{version}/} for the corresponding target
+machine and compiler version.
 
 For each subprogram to be run, the compiler driver first tries the
 @option{-B} prefix, if any.  If that name is not found, or if @option{-B}
@@ -12409,20 +12419,6 @@ proper position among the other output f
 
 @c man begin OPTIONS
 
-@node Target Options
-@section Specifying Target Machine and Compiler Version
-@cindex target options
-@cindex cross compiling
-@cindex specifying machine version
-@cindex specifying compiler version and target machine
-@cindex compiler version, specifying
-@cindex target machine, specifying
-
-The usual way to run GCC is to run the executable called @command{gcc}, or
-@command{@var{machine}-gcc} when cross-compiling, or
-@command{@var{machine}-gcc-@var{version}} to run a version other than the
-one that was installed last.
-
 @node Submodel Options
 @section Hardware Models and Configurations
 @cindex submodel options


Re: [patch] Avoid an unwanted decl re-map in copy_gimple_seq_and_replace_locals

2016-01-12 Thread Martin Jambor
On Tue, Jan 12, 2016 at 06:36:21PM +0100, Martin Jambor wrote:
> > remap_decl (old_var, id);
> > }
> > - phase 2 - do the full remap_decls, but during that arrange that
> >   remap_decl for non-zero id->remapping_type_depth if (!n) just returns
> >   decl
> 
> ...they would not be copied here because remap_decl would not be
> duplicating stuff.  So I'd end up with an original local decl when I
> actually need a duplicate.
> 

ugh, I'm trying to be too fast and obviously forgot about the
id->remapping_type_depth part of the proposed condition.

Still, when could relying solely on id->remapping_type_depth fail?

Sorry for the noise,

Martin


Re: [PATCH] c++/58109 - alignas() fails to compile with constant expression

2016-01-12 Thread Martin Sebor

On 01/11/2016 10:20 PM, Jason Merrill wrote:

On 12/22/2015 09:32 PM, Martin Sebor wrote:

+  if (is_attribute_p ("aligned", name)
+  || is_attribute_p ("vector_size", name))
+{
+  /* Attribute argument may be a dependent identifier.  */
+  if (tree t = args ? TREE_VALUE (args) : NULL_TREE)
+if (value_dependent_expression_p (t)
+|| type_dependent_expression_p (t))
+  return true;
+}


Instead of this, is_late_template_attribute should be fixed to check
attribute_takes_identifier_p.


attribute_takes_identifier_p() returns false for the aligned
attribute and for vector_size (it returns true only for
attributes cleanup, format, and mode, and none others).

Are you suggesting to also change attribute_takes_identifier_p
to return true for these attributes?  (That would likely mean
changes to the C front end as well.)

Thanks
Martin


Re: [patch] Avoid an unwanted decl re-map in copy_gimple_seq_and_replace_locals

2016-01-12 Thread Jakub Jelinek
On Tue, Jan 12, 2016 at 06:51:31PM +0100, Martin Jambor wrote:
> On Tue, Jan 12, 2016 at 06:36:21PM +0100, Martin Jambor wrote:
> > >   remap_decl (old_var, id);
> > > }
> > > - phase 2 - do the full remap_decls, but during that arrange that
> > >   remap_decl for non-zero id->remapping_type_depth if (!n) just returns
> > >   decl
> > 
> > ...they would not be copied here because remap_decl would not be
> > duplicating stuff.  So I'd end up with an original local decl when I
> > actually need a duplicate.
> > 
> 
> ugh, I'm trying to be too fast and obviously forgot about the
> id->remapping_type_depth part of the proposed condition.
> 
> Still, when could relying solely on id->remapping_type_depth fail?

Well, those functions are used for numerous purposes, and you'd only want
to skip remapping not-already-remapped decls when id->remapping_type_depth
is non-zero and you are inside of the copy_gimple_seq_and_replace_locals
path (and only for the remap_decls in there), so IMHO you need some
flag to distinguish that.

And the reason for the above suggested 2 phases, where the first phase just
calls remap_decl and nothing else on the non-VLAs is to make sure that
if a VLA type or DECL_VALUE_EXPR uses (usually scalar) vars declared in the
same bind block, then those are processed first.

Jakub


Re: [PATCH] PR target/69225: Set FLT_EVAL_METHOD to 2 only if 387 FPU is used

2016-01-12 Thread Joseph Myers
On Tue, 12 Jan 2016, Uros Bizjak wrote:

> I think that following definition describes -msse -mfpmath=sse
> situation in the most elegant way. We can just declare that the
> precision is not known in this case:
> 
> #define TARGET_FLT_EVAL_METHOD\
>   (TARGET_MIX_SSE_I387 ? -1\
>: (TARGET_80387 && !TARGET_SSE_MATH) ? 2 : TARGET_SSE2 ? 0 : -1)
> 
> Using this patch, the compiler will still generate SSE instructions
> for the above test.
> 
> Joseph, what is your opinion on this approach?

I think this is reasonable.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices

2016-01-12 Thread Tom de Vries

On 12/01/16 14:04, Richard Biener wrote:

On Tue, 12 Jan 2016, Tom de Vries wrote:


On 12/01/16 12:22, Richard Biener wrote:

Doesnt' the same issue apply to


unsigned int *p;

static void __attribute__((noinline, noclone))
foo (void)
{
   unsigned int z;

   for (z = 0; z < N; ++z)
 ++(*p);
}

thus when we have a MEM_REF[p_1]?  SCEV will not analyze
its evolution to a POLYNOMIAL_CHREC and thus access_fns will
be NULL again.



I didn't manage to trigger this scenario, though I could probably make it
happen by modifying -ftree-loop-im to work in one case (the load of the
value of p) but not the other (the *p load and store).


I think avoiding a NULL access_fns is ok but it should be done
unconditionally, not only for the DECL_P case.


Ok, I'll retest and commit this patch.


Please add a comment as well.


Patch updated with comment.

During testing however, I ran into two testsuite regressions:

1.

-PASS: gfortran.dg/graphite/pr39516.f   -O  (test for excess errors)
+FAIL: gfortran.dg/graphite/pr39516.f   -O  (internal compiler error)
+FAIL: gfortran.dg/graphite/pr39516.f   -O  (test for excess errors)

AFAIU, this is a duplicate of PR68976.

Should I wait with committing the patch until PR68976 is fixed?

2.

-XFAIL: gcc.dg/graphite/scop-pr66980.c scan-tree-dump-times graphite 
"number of SCoPs: 1" 1
+XPASS: gcc.dg/graphite/scop-pr66980.c scan-tree-dump-times graphite 
"number of SCoPs: 1" 1


AFAIU, this is not a real regression, but the testcase needs to be 
updated. I'm not sure how. Sebastian, perhaps you have an idea there?


Thanks,
- Tom

>From 24dfdb5a8a536203ad159bcbeaee6931be032f32 Mon Sep 17 00:00:00 2001
From: Tom de Vries 
Date: Tue, 12 Jan 2016 01:45:11 +0100
Subject: [PATCH] Don't return NULL access_fns in dr_analyze_indices

2016-01-12  Tom de Vries  

	* tree-data-ref.c (dr_analyze_indices): Don't return NULL access_fns.

	* gcc.dg/autopar/pr69110.c: New test.

	* testsuite/libgomp.c/pr69110.c: New test.
---
 gcc/testsuite/gcc.dg/autopar/pr69110.c | 19 +++
 gcc/tree-data-ref.c|  4 
 libgomp/testsuite/libgomp.c/pr69110.c  | 26 ++
 3 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/autopar/pr69110.c
 create mode 100644 libgomp/testsuite/libgomp.c/pr69110.c

diff --git a/gcc/testsuite/gcc.dg/autopar/pr69110.c b/gcc/testsuite/gcc.dg/autopar/pr69110.c
new file mode 100644
index 000..e236015
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/pr69110.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -ftree-parallelize-loops=2 -fno-tree-loop-im -fdump-tree-parloops-details" } */
+
+#define N 1000
+
+unsigned int i = 0;
+
+void
+foo (void)
+{
+  unsigned int z;
+  for (z = 0; z < N; ++z)
+++i;
+}
+
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 0 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "FAILED: data dependencies exist across iterations" 1 "parloops" } } */
+
+
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index a40f40d..7ff5db7 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1023,6 +1023,10 @@ dr_analyze_indices (struct data_reference *dr, loop_p nest, loop_p loop)
 		build_int_cst (reference_alias_ptr_type (ref), 0));
 }
 
+  /* Ensure that DR_NUM_DIMENSIONS (dr) != 0.  */
+  if (access_fns == vNULL)
+access_fns.safe_push (integer_zero_node);
+
   DR_BASE_OBJECT (dr) = ref;
   DR_ACCESS_FNS (dr) = access_fns;
 }
diff --git a/libgomp/testsuite/libgomp.c/pr69110.c b/libgomp/testsuite/libgomp.c/pr69110.c
new file mode 100644
index 000..0d9e5ca
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr69110.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=2 -O1 -fno-tree-loop-im" } */
+
+#define N 1000
+
+unsigned int i = 0;
+
+static void __attribute__((noinline, noclone))
+foo (void)
+{
+  unsigned int z;
+  for (z = 0; z < N; ++z)
+++i;
+}
+
+extern void abort (void);
+
+int
+main (void)
+{
+  foo ();
+  if (i != N)
+abort ();
+
+  return 0;
+}
-- 
1.9.1



Re: [PATCH], PowerPC IEEE 128-bit fp, #11-rev3 (enable libgcc conversions)

2016-01-12 Thread Michael Meissner
On Tue, Jan 12, 2016 at 12:18:55AM +, Joseph Myers wrote:
> On Mon, 11 Jan 2016, Michael Meissner wrote:
> 
> > I fixed the #ifdef to use __NO_FPRS__ (thanks for the heads up on that).  I
> > also believe I fixed the various formatting issues.  These two patches 
> > build on
> > a big endian power7 host and little endian power8 host with no regressions 
> > in
> > the testsuite (the gcc patch is included here, but it hasn't changed since 
> > the
> > previous version of this patch).  Are they ok to be checked in?
> 
> Are you sure you sent the right patch version?  I don't see those fixes in 
> this one.

You are right.  I did not update the patches from the changes I had made in the
branch.

[gcc]
2016-01-12  Michael Meissner  

* config/rs6000/rs6000-builtin.def (BU_FLOAT128_2): Add support
for pack/unpack functions for __ibm128.
(PACK_IF): Likewise.
(UNPACK_IF): Likewise.

* config/rs6000/rs6000.c (rs6000_builtin_mask_calculate): Add
support for __ibm128 pack/unpack functions.
(rs6000_invalid_builtin): Likewise.
(rs6000_init_builtins): Likewise.
(rs6000_opt_masks): Likewise.

* config/rs6000/rs6000.h (MASK_FLOAT128): Add short name.
(RS6000_BTM_FLOAT128): Add support for __ibm128 pack/unpack
functions
(RS6000_BTM_COMMON): Likewise.

* config/rs6000/rs6000.md (f128_vsx): New mode attribute.
(unpack): Use FMOVE128_FPR iterator instead of FMOVE128, to
disallow __builtin_{pack,unpack}_longdouble if long double is IEEE
128-bit floating point.  Add support for the double values to be
in Altivec registers for TF/IF packing and unpacking, but restrict
TD packing sub-fields to be FPR registers.  Don't allow overlapped
register support for packing.  Allow pack inputs to be memory
locations.  Don't build generator functions for unpack_dm
and unpack_nodm.
(unpack_dm): Likewise.
(unpack_nodm): Likewise.
(pack): Likewise.

* config/rs6000/rs6000-builtin.def (__builtin_pack_ibm128): Add
built-in functions to pack/unpack explicit __ibm128 values.
(__builtin_unpack_ibm128): Likewise.

* doc/extend.texi (PowerPC Built-in Functions): Document
__builtin_pack_ibm128 and __builtin_unpack_ibm128.

[libgcc]
2016-01-12  Michael Meissner  
Steven Munroe 
Tulio Magno Quites Machado Filho 

* config/rs6000/sfp-exceptions.c: New file to provide exception
support for IEEE 128-bit floating point.

* config/rs6000/float128-hw.c: New file for ISA 3.0 IEEE 128-bit
floating point hardware support.

* config/rs6000/floattikf.c: New files for IEEE 128-bit floating
point conversions.
* config/rs6000/fixunskfti.c: Likewise.
* config/rs6000/fixkfti.c: Likewise.
* config/rs6000/floatuntikf.c: Likewise.
* config/rs6000/extendkftf2-sw.c: Likewise.
* config/rs6000/trunctfkf2-sw.c: Likewise.

* config/rs6000/float128-ifunc.c: New file to pick either IEEE
128-bit floating point software emulation or use ISA 3.0 hardware
support if it is available.

* config/rs6000/quad-float128.h: New file to support IEEE 128-bit
floating point.

* config/rs6000/t-float128: New Makefile fragments to enable
building __float128 emulation support.
* config/rs6000/t-float128-hw: Likewise.

* config/rs6000/float128-sed: New file to convert TF names to KF
names for PowerPC IEEE 128-bit floating point support.

* config/rs6000/sfp-machine.h (_FP_W_TYPE_SIZE): Use 64-bit types
when building on 64-bit systems, or when VSX is enabled.
(_FP_W_TYPE): Likewise.
(_FP_WS_TYPE): Likewise.
(_FP_I_TYPE): Likewise.
(TItype): Define on 64-bit systems.
(UTItype): Likewise.
(TI_BITS): Likewise.
(_FP_MUL_MEAT_D): Add support for using 64-bit types.
(_FP_MUL_MEAT_Q): Likewise.
(_FP_DIV_MEAT_D): Likewise.
(_FP_DIV_MEAT_Q): Likewise.
(_FP_NANFRAC_D): Likewise.
(_FP_NANFRAC_Q): Likewise.
(ISA_BIT): Add exception support if we are being compiled on a
machine with hardware floating point support to build the IEEE
128-bit emulation functions.
(FP_EX_INVALID): Likewise.
(FP_EX_OVERFLOW): Likewise.
(FP_EX_UNDERFLOW): Likewise.
(FP_EX_DIVZERO): Likewise.
(FP_EX_INEXACT): Likewise.
(FP_EX_ALL): Likewise.
(__sfp_handle_exceptions): Likewise.
(FP_HANDLE_EXCEPTIONS): Likewise.
(FP_RND_NEAREST): Likewise.
(FP_RND_ZERO): Likewise.
(FP_RND_PINF): Likewise.
(FP_RND_MINF): Likewise.
(FP_RND_MASK): Likewise.
(_FP_DECL_EX): Likewise.
(FP_INIT_ROUNDMODE): Likewise.
(FP_ROUNDMODE): Likewise.

* configure.ac (powerp

Re: [hsa 2/10] Modifications to libgomp proper

2016-01-12 Thread Martin Jambor
Hi,

On Tue, Jan 12, 2016 at 02:38:15PM +0100, Jakub Jelinek wrote:
> On Tue, Jan 12, 2016 at 02:29:06PM +0100, Martin Jambor wrote:
> > GOMP_kernel_launch_attributes should not be there (it is a
> > reminiscence from before the device-specific target arguments) and
> > should be moved just to the HSA plugin.  I'll prepare a patch today.
> > 
> > While we do not have to share GOMP_hsa_kernel_dispatch, we actually do
> > use them in both the plugin and the compiler, where we only use it in
> > an offsetof, so that we only have the structure defined once.
> 
> But, even using it in offsetof might be wrong, the compiler could be a
> cross-compiler, and you'd use offsetof on the host, while you want it for
> the target, and that would be different.
> So, IMHO you need (unless you already have) built the structure as a tree
> type, lay it out, and then you can use at TYPE_SIZE_UNIT or
> DECL_FIELD_OFFSET and the like.
> 

I see. For now I have just put a FIXME there but have talked to Martin
about laying out the type properly.  This is what I have committed to
the branch.

Thanks,

Martin

2016-01-12  Martin Jambor  

include/
* gomp-constants.h (GOMP_kernel_launch_attributes): Removed.
(GOMP_hsa_kernel_dispatch): Likewise.

libgomp/
* plugin/plugin-hsa.c (GOMP_kernel_launch_attributes): Moved here.
(GOMP_hsa_kernel_dispatch): Likewise.

gcc/
* hsa-gen.c (GOMP_hsa_kernel_dispatch): Moved here.
---
 gcc/hsa-gen.c   | 35 +
 include/gomp-constants.h| 44 --
 libgomp/plugin/plugin-hsa.c | 47 +
 3 files changed, 82 insertions(+), 44 deletions(-)

diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index 1715b57..f633dfd 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3747,6 +3747,41 @@ gen_set_num_threads (tree value, hsa_bb *hbb)
   hbb->append_insn (basic);
 }
 
+/* Collection of information needed for a dispatch of a kernel from a
+   kernel.  Keep in sync with libgomp's plugin-hsa.c.
+
+   FIXME: In order to support cross-compilations, we need to lay out the type as
+   a tree and then use field_decl positions.
+ */
+
+struct GOMP_hsa_kernel_dispatch
+{
+  /* Pointer to a command queue associated with a kernel dispatch agent.  */
+  void *queue;
+  /* Pointer to reserved memory for OMP data struct copying.  */
+  void *omp_data_memory;
+  /* Pointer to a memory space used for kernel arguments passing.  */
+  void *kernarg_address;
+  /* Kernel object.  */
+  uint64_t object;
+  /* Synchronization signal used for dispatch synchronization.  */
+  uint64_t signal;
+  /* Private segment size.  */
+  uint32_t private_segment_size;
+  /* Group segment size.  */
+  uint32_t group_segment_size;
+  /* Number of children kernel dispatches.  */
+  uint64_t kernel_dispatch_count;
+  /* Number of threads.  */
+  uint32_t omp_num_threads;
+  /* Debug purpose argument.  */
+  uint64_t debug;
+  /* Levels-var ICV.  */
+  uint64_t omp_level;
+  /* Kernel dispatch structures created for children kernel dispatches.  */
+  struct GOMP_hsa_kernel_dispatch **children_dispatches;
+};
+
 /* Return an HSA register that will contain number of threads for
a future dispatched kernel.  Instructions are added to HBB.  */
 
diff --git a/include/gomp-constants.h b/include/gomp-constants.h
index 1dae474..a8e7723 100644
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -256,48 +256,4 @@ enum gomp_map_kind
 /* Identifiers of device-specific target arguments.  */
 #define GOMP_TARGET_ARG_HSA_KERNEL_ATTRIBUTES  (1 << 8)
 
-/* Structure describing the run-time and grid properties of an HSA kernel
-   lauch.  */
-
-struct GOMP_kernel_launch_attributes
-{
-  /* Number of dimensions the workload has.  Maximum number is 3.  */
-  uint32_t ndim;
-  /* Size of the grid in the three respective dimensions.  */
-  uint32_t gdims[3];
-  /* Size of work-groups in the respective dimensions.  */
-  uint32_t wdims[3];
-};
-
-/* Collection of information needed for a dispatch of a kernel from a
-   kernel.  */
-
-struct GOMP_hsa_kernel_dispatch
-{
-  /* Pointer to a command queue associated with a kernel dispatch agent.  */
-  void *queue;
-  /* Pointer to reserved memory for OMP data struct copying.  */
-  void *omp_data_memory;
-  /* Pointer to a memory space used for kernel arguments passing.  */
-  void *kernarg_address;
-  /* Kernel object.  */
-  uint64_t object;
-  /* Synchronization signal used for dispatch synchronization.  */
-  uint64_t signal;
-  /* Private segment size.  */
-  uint32_t private_segment_size;
-  /* Group segment size.  */
-  uint32_t group_segment_size;
-  /* Number of children kernel dispatches.  */
-  uint64_t kernel_dispatch_count;
-  /* Number of threads.  */
-  uint32_t omp_num_threads;
-  /* Debug purpose argument.  */
-  uint64_t debug;
-  /* Levels-var ICV.  */
-  uint64_t omp_level;
-  /* Kernel dispatch structures created for children kernel dispa

[gomp4] OpenACC documentation for libgomp.

2016-01-12 Thread James Norris

Hi,

Backported:

2016-01-12  James Norris  

* libgomp.texi: Updates for OpenACC.

from trunk.

Thanks,
Jim
Index: ChangeLog.gomp
===
--- ChangeLog.gomp	(revision 232292)
+++ ChangeLog.gomp	(working copy)
@@ -1,3 +1,9 @@
+2016-01-12  James Norris  
+
+	Backport from trunk:
+	2016-01-12  James Norris  
+	* libgomp.texi: Updates for OpenACC.
+
 2016-01-11  Thomas Schwinge  
 
 	* testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c: Remove
Index: libgomp.texi
===
--- libgomp.texi	(revision 232292)
+++ libgomp.texi	(working copy)
@@ -94,6 +94,14 @@
 @comment  better formatting.
 @comment
 @menu
+* Enabling OpenMP::  How to enable OpenMP for your
+ applications.
+* OpenMP Runtime Library Routines: Runtime Library Routines.
+ The OpenMP runtime application programming
+ interface.
+* OpenMP Environment Variables: Environment Variables.
+ Influencing OpenMP runtime behavior with
+ environment variables.
 * Enabling OpenACC:: How to enable OpenACC for your
  applications.
 * OpenACC Runtime Library Routines:: The OpenACC runtime application
@@ -104,14 +112,6 @@
  asynchronous operations.
 * OpenACC Library Interoperability:: OpenACC library interoperability with the
  NVIDIA CUBLAS library.
-* Enabling OpenMP::  How to enable OpenMP for your
- applications.
-* OpenMP Runtime Library Routines: Runtime Library Routines.
- The OpenMP runtime application programming
- interface.
-* OpenMP Environment Variables: Environment Variables.
- Influencing OpenMP runtime behavior with
- environment variables.
 * The libgomp ABI::  Notes on the external libgomp ABI.
 * Reporting Bugs::   How to report bugs in the GNU Offloading
  and Multi Processing Runtime Library.
@@ -126,643 +126,6 @@
 
 
 @c ---------------------------------------------------------------------
-@c Enabling OpenACC
-@c ---------------------------------------------------------------------
-
-@node Enabling OpenACC
-@chapter Enabling OpenACC
-
-To activate the OpenACC extensions for C/C++ and Fortran, the compile-time 
-flag @command{-fopenacc} must be specified.  This enables the OpenACC directive
-@code{#pragma acc} in C/C++ and @code{!$acc} directives in free form,
-@code{c$acc}, @code{*$acc} and @code{!$acc} directives in fixed form,
-@code{!$} conditional compilation sentinels in free form and @code{c$},
-@code{*$} and @code{!$} sentinels in fixed form, for Fortran.  The flag also
-arranges for automatic linking of the OpenACC runtime library 
-(@ref{OpenACC Runtime Library Routines}).
-
-A complete description of all OpenACC directives accepted may be found in 
-the @uref{http://www.openacc.org/, OpenACC Application Programming
-Interface} manual, version 2.0.
-
-Note that this is an experimental feature, incomplete, and subject to
-change in future versions of GCC.  See
-@uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
-
-
-
-@c ---------------------------------------------------------------------
-@c OpenACC Runtime Library Routines
-@c ---------------------------------------------------------------------
-
-@node OpenACC Runtime Library Routines
-@chapter OpenACC Runtime Library Routines
-
-The runtime routines described here are defined by section 3 of the OpenACC
-specifications in version 2.0.
-They have C linkage, and do not throw exceptions.
-Generally, they are available only for the host, with the exception of
-@code{acc_on_device}, which is available for both the host and the
-acceleration device.
-
-@menu
-* acc_get_num_devices:: Get number of devices for the given device type
-* acc_set_device_type::
-* acc_get_device_type::
-* acc_set_device_num::
-* acc_get_device_num::
-* acc_init::
-* acc_shutdown::
-* acc_on_device::   Whether executing on a particular device
-* acc_malloc::
-* acc_free::
-* acc_copyin::
-* acc_present_or_copyin::
-* acc_create::
-* acc_present_or_create::
-* acc_copyout::
-* acc_delete::
-* acc_update_device::
-* acc_update_self::
-* acc_map_data::
-* acc_unmap_data::
-* acc_deviceptr::
-* acc_hostptr::
-* acc_is_present::
-* acc_memcpy_to_device::
-* acc_memcpy_from_device::
-
-API routines for target platforms.
-
-* acc_get_current_cuda_device::
-* acc_get_current_cuda_context::
-* acc_get_cuda_stream::
-* acc_set_cuda_stream::
-@end menu
-
-
-
-@node acc_get_num_devices
-@s

Re: [PATCH] PR testsuite/69181: ensure expected multiline outputs is cleared per-test

2016-01-12 Thread David Malcolm
On Sat, 2016-01-09 at 03:07 +0100, Bernd Schmidt wrote:
> On 01/09/2016 01:51 AM, David Malcolm wrote:
> > The root cause here is that the logic to reset the list of expected
> > multiline outputs was being run from:
> >handle-multiline-outputs, called by
> >  prune.exp's prune_gcc_output
> > and none of that happens if the test is skipped by a target exclusion
> > in dg-do.
> 
> Thanks for tackling this.
> 
> > diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
> > index f9ec206..f778bca 100644
> > --- a/gcc/testsuite/lib/gcc-dg.exp
> > +++ b/gcc/testsuite/lib/gcc-dg.exp
> > @@ -836,6 +836,7 @@ if { [info procs saved-dg-test] == [list] } {
> > global testname_with_flags
> > global set_target_env_var
> > global keep_saved_temps_suffixes
> > +   global multiline_expected_outputs
> >
> > if { [ catch { eval saved-dg-test $args } errmsg ] } {
> > set saved_info $errorInfo
> > @@ -871,6 +872,7 @@ if { [info procs saved-dg-test] == [list] } {
> > if [info exists testname_with_flags] {
> > unset testname_with_flags
> > }
> > +   set multiline_expected_outputs []
> >   }
> >   }
> 
> I looked at this code, and there are two near-identical blocks which 
> reset all these variables. You are modifying only one of them, leaving 
> the one inside the if { catch } thing unchanged - is this intentional?

I'm not particularly strong at Tcl, but am I right in thinking that
given that we have this:

if { [ catch { eval saved-dg-test $args } errmsg ] } {
(A) set and unset various things
error $errmsg $saved_info
}
   (B) set and unset the same various things as (A)

that (B) will always be reached, and that the duplicates in (A) are
redundant? (unless they affect "error")

I see that this pattern was introduced back in r67696 aka
91a385a522a94154f9e0cd940c5937177737af02:

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 39ccaf6..c660eca 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2003-06-09  Mark Mitchell  
+
+   * lib/gcc-dg.exp (dg-test): Clear additional_files and
+   additional_sources.
+
 2003-05-21  David Taylor  
 
* gcc.dg/Wpadded.c: New file.
diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index aade663..1feadc4 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -294,3 +294,31 @@ proc dg-require-gc-sections { args } {
return
 }
 }
+
+# We need to make sure that additional_files and additional_sources
+# are both cleared out after every test.  It is not enough to clear
+# them out *before* the next test run because gcc-target-compile gets
+# run directly from some .exp files (outside of any test).  (Those
+# uses should eventually be eliminated.) 
+
+# Because the DG framework doesn't provide a hook that is run at the
+# end of a test, we must replace dg-test with a wrapper.
+
+if { [info procs saved-dg-test] == [list] } {
+rename dg-test saved-dg-test
+
+proc dg-test { args } {
+   global additional_files
+   global additional_sources
+   global errorInfo
+
+   if { [ catch { eval saved-dg-test $args } errmsg ] } {
+   set saved_info $errorInfo
+   set additional_files ""
+   set additional_sources ""
+   error $errmsg $saved_info
+   }
+   set additional_files ""
+   set additional_sources ""
+}
+}

and this pattern has been extended over the years.

I *could* add the
  set multiline_expected_outputs []
to the block guarded by the if {}, but it feels like cargo-culting to
me.  Am I missing something?


> Otherwise this looks reasonable IMO.
> 
> 
> Bernd




Re: [patch] libstdc++/68276 and libstdc++68995 qualification in

2016-01-12 Thread Jonathan Wakely

On 21/12/15 13:02 +, Jonathan Wakely wrote:

Two patches to add missing std:: qualification to prevent ADL
problems. Both are regressions, 68276 only on trunk, but 68995 has
been broken since 4.8.0 (but only affects people mixing TR1 with
C++11, and I was already rude about them in Bugzilla so won't do it
again here ;-)


For the branches I added a better test for 68995, this extends the
test on trunk to match what's on the branches now.

Tested x86_64-linux, committed to trunk.
commit 574125855cb79becc19ed564040a0ca1b23ebabc
Author: Jonathan Wakely 
Date:   Tue Jan 12 19:19:02 2016 +

Extend std::function test for PR 68995

	* testsuite/20_util/function/68995.cc: Test reference_wrapper cases.

diff --git a/libstdc++-v3/testsuite/20_util/function/68995.cc b/libstdc++-v3/testsuite/20_util/function/68995.cc
index 78712d6..5690657 100644
--- a/libstdc++-v3/testsuite/20_util/function/68995.cc
+++ b/libstdc++-v3/testsuite/20_util/function/68995.cc
@@ -25,3 +25,8 @@
 std::tr1::shared_ptr<int> test() { return {}; }
 
 std::function<std::tr1::shared_ptr<int>()> func = test;
+std::function<std::tr1::shared_ptr<int>()> funcr = std::ref(test);
+
+void test2(std::tr1::shared_ptr<int>) { }
+
+std::function<void(std::tr1::shared_ptr<int>)> func2 = std::ref(test2);


Re: [PATCH] Fix memory alignment on AVX512VL masked floating point stores (PR target/69198)

2016-01-12 Thread H.J. Lu
On Tue, Jan 12, 2016 at 5:45 AM, Uros Bizjak  wrote:
> On Tue, Jan 12, 2016 at 2:42 PM, Jakub Jelinek  wrote:
>> On Tue, Jan 12, 2016 at 05:39:29AM -0800, H.J. Lu wrote:
>>> GCC 5 has the same issue.  This patch should be backported to GCC 5
>>> with
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00528.html
>>>
>>> which supersedes:
>>>
>>> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=231269
>>>
>>> OK to backport Jakub's and my patch for GCC 5?
>>
>> I think I'd prefer just r231269 and my patch for the branch, to make the
>> changes as small as possible, leave the cleanup on the trunk only.
>> But, I'm not x86_64 maintainer, so I'll leave that decision to Uros/Kirill.
>
> I agree with Jakub.
>
> Those two patches are OK for backport.
>

This is what I checked in.

Thanks.


-- 
H.J.
From e6a6fd4b2fb4bb239fed4de6f9374f9b102e9c0f Mon Sep 17 00:00:00 2001
From: ienkovich 
Date: Fri, 4 Dec 2015 14:18:58 +
Subject: [PATCH] Fix alignment check in AVX-512 masked store

	Backport from mainline
	2016-01-12  Jakub Jelinek  

	PR target/69198
	* config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
	aligned_mem is properly set for AVX512-VL floating point masked
	stores.

	2015-12-04  Ilya Enkovich  

	* config/i386/sse.md (<avx512>_store<mode>_mask): Fix
	operand checked for alignment.
---
 gcc/ChangeLog  | 15 +++
 gcc/config/i386/i386.c |  8 
 gcc/config/i386/sse.md |  2 +-
 3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d7bc6a2..be24722 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2016-01-12  H.J. Lu  
+
+	Backport from mainline
+	2016-01-12  Jakub Jelinek  
+
+	PR target/69198
+	* config/i386/i386.c (ix86_expand_special_args_builtin): Ensure
+	aligned_mem is properly set for AVX512-VL floating point masked
+	stores.
+
+	2015-12-04  Ilya Enkovich  
+
+	* config/i386/sse.md (<avx512>_store<mode>_mask): Fix
+	operand checked for alignment.
+
 2016-01-12  James Greenhalgh  
 
 	Backport from mainline r222186.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3547ba6..b0c301b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -38259,7 +38259,11 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
   memory = 0;
   break;
 case VOID_FTYPE_PV8DF_V8DF_QI:
+case VOID_FTYPE_PV4DF_V4DF_QI:
+case VOID_FTYPE_PV2DF_V2DF_QI:
 case VOID_FTYPE_PV16SF_V16SF_HI:
+case VOID_FTYPE_PV8SF_V8SF_QI:
+case VOID_FTYPE_PV4SF_V4SF_QI:
 case VOID_FTYPE_PV8DI_V8DI_QI:
 case VOID_FTYPE_PV4DI_V4DI_QI:
 case VOID_FTYPE_PV2DI_V2DI_QI:
@@ -38319,10 +38323,6 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
 case VOID_FTYPE_PV16QI_V16QI_HI:
 case VOID_FTYPE_PV32QI_V32QI_SI:
 case VOID_FTYPE_PV64QI_V64QI_DI:
-case VOID_FTYPE_PV4DF_V4DF_QI:
-case VOID_FTYPE_PV2DF_V2DF_QI:
-case VOID_FTYPE_PV8SF_V8SF_QI:
-case VOID_FTYPE_PV4SF_V4SF_QI:
   nargs = 2;
   klass = store;
   /* Reserve memory operand for target.  */
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9235753..15d7188 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1022,7 +1022,7 @@
   sse_suffix = "";
 }
 
-  if (misaligned_operand (operands[1], <MODE>mode))
+  if (misaligned_operand (operands[0], <MODE>mode))
 align = "u";
   else
 align = "a";
-- 
2.5.0



Re: [Patch, libstdc++/68877] Reimplement __is_[nothrow_]swappable

2016-01-12 Thread Daniel Krügler
Ping - this is a gentle reminder for this patch proposal.

2015-12-23 22:15 GMT+01:00 Daniel Krügler :
> This is a second try for a patch for libstdc++ bug 68877. See below
> for responses.
>
> 2015-12-22 22:42 GMT+01:00 Jonathan Wakely :
>> On 21/12/15 12:45 +0100, Daniel Krügler wrote:
>>>
>>> 2015-12-14 21:48 GMT+01:00 Daniel Krügler :

 This is a reimplementation of __is_swappable and
 __is_nothrow_swappable according to

 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4511.html

 and also adds a missing usage of __is_nothrow_swappable in the swap
 overload for arrays. Strictly speaking the latter change differs from
 the Standard specification which requires the expression
 noexcept(swap(*a, *b)) to be used. On the other hand the Standard is
 broken in this regard, as pointed out by

 http://cplusplus.github.io/LWG/lwg-active.html#2554
>>
>> The patch doesn't apply cleanly because it repeats some of the new
>> files either twice or three times (and also has some trailing
>> whitespace that shouldn't be there).
>
> I can confirm this, albeit I don't understand why this happens. I'm
> using TortoiseSVN and when trying to create a patch file it creates
> double entries for new directories. I have now explicitly removed the
> added directories from the patch, I hope that your patch experience is
> now better.
>
>> After fixing the patch to only
>> create new files once it applies, but then I get some FAILs:
>>
>> FAIL: 20_util/is_nothrow_swappable/value.cc (test for excess errors)
>> FAIL: 20_util/is_swappable/value.cc (test for excess errors)
>>
>> I don't have time to analyse these today, so I'll wait until you're
>> able to do so.
>
> I'm sorry for these errors. I could now find a way to reproduce the
> tests and found that they were partially due to an incomplete commit
> and partially because of sleepiness on my side. I hopefully fixed
> these blatant errors and took the chance to increase the test cases
> even further.
>
> Thanks again,
>
> - Daniel



-- 





Re: [trans-mem, aa64, arm, ppc, s390] Fixing PR68964

2016-01-12 Thread David Edelsohn
On Tue, Jan 12, 2016 at 11:53 AM, Richard Henderson  wrote:
> The problem in this PR is that we never got around to flushing out the vector
> support for transactions for anything but x86.  My goal here is to make this 
> as
> generic as possible, so that it should Just Work with existing vector support
> in the backend.
>
> In addition, if I encounter other unexpected register types, I will now copy
> them to memory and use memcpy, rather than crash.
>
> The one piece of this that requires a tiny bit of extra work is enabling the
> vector entry points in libitm.
>
> For x86, we make sure to build the files with SSE or AVX support enabled.  For
> s390x, I do the same thing, enabling z13 support.  I suppose we might need to
> check for binutils support, but I'd rather do this only if necessary.
>
> For arm I'm less sure what to do, since I seem to recall that use of Neon sets
> a bit in the ELF header.  Which presumably means that the binary could no
> longer be run without neon, even though the entry points wouldn't be used.
>
> For powerpc, I don't know how to select Altivec if VSX isn't already enabled,
> or indeed if that's the best thing to do.

VSX is an extension of Altivec (VMX) -- VSX always includes Altivec.
If VSX is enabled, Altivec will be enabled and available.

Thanks, David


Re: [Patch, libstdc++/68877] Reimplement __is_[nothrow_]swappable

2016-01-12 Thread Jonathan Wakely

On 23/12/15 22:15 +0100, Daniel Krügler wrote:


   PR libstdc++/68877
   * include/std/type_traits: Following N4511, reimplement __is_swappable and
   __is_nothrow_swappable. Move __is_swappable to namespace std, adjust
   callers. Use __is_nothrow_swappable in swap.
   * include/bits/move.h: Use __is_nothrow_swappable in swap.
   * testsuite/20_util/is_nothrow_swappable/value.cc: Extend; remove
   __is_swappable related tests.
   * testsuite/20_util/is_swappable/value.cc: New.
   * testsuite/20_util/is_swappable/requirements/explicit_instantiation.cc:
   New.
   * testsuite/20_util/is_swappable/requirements/typedefs.cc: New.
   * testsuite/25_algorithms/swap/68877.cc: New.


Committed to trunk now, thanks!

