date:20130906

[PATCH, PR 57748] Check for out of bounds access

2013-09-06 Thread Bernd Edlinger

Hi,

this patch fixes the ICE and wrong code issues from PR57748. It contains two 
test cases.
One for the ICE and one for the wrong code. Both test cases fail for all gcc 
versions,
and are fixed with this patch.

Bootstrapped and tested on x86_64-linux without any problems.

OK for trunk and the 4.8 / 4.7 branch?

Regards
Bernd Edlinger2013-09-06  Bernd Edlinger  

PR middle-end/57748
* expr.c (expand_assignment): Check for out of bounds
access to structures.

testsuite:

* gcc.dg/torture/pr57748-1.c: New test.
* gcc.dg/torture/pr57748-2.c: New test.



patch-pr57748.diff
Description: Binary data

Re: Fix PR middle-end/57370

2013-09-06 Thread Richard Biener

On Thu, Sep 5, 2013 at 9:23 PM, Easwaran Raman  wrote:
> On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener
>  wrote:
>> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman  wrote:
>>> I have a new patch that supersedes this. The new patch also fixes PR
>>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
>>> no test regression on x86_64/linux. Ok for trunk?
>>>
>>> 2013-07-31  Easwaran Raman  
>>>
>>> PR middle-end/57370
>>> * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
>>> of debug statements that cause inconsistent IR.
>>
>> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
>> at all.  The other hunks are ok, but we need to work harder to preserve
>> debug stmts - simply removing all is not going to fly.
>>
>> Richard.
>
> I looked into the problem related to the debug stmts in this failing
> test case in detail. Initially, the code sequence looks
>
>
> s$n_13 = MEM[(struct S *)&s];
> # DEBUG s$n => s$n_13
> _2 = (double) s$n_13;
> _4 = _2 * a_3(D);
> _6 = (double) i_5(D);
> _7 = _4 * _6;
> _9 = (double) j_8(D);
> _10 = _7 * _9;
> bar (_10);
> # DEBUG D#2 => (double) k_12(D)
> # DEBUG D#1 => _7 * D#2
> # DEBUG t$n => (int) D#1
>
> After reassociation
>
> s$n_13 = MEM[(struct S *)&s];
> # DEBUG s$n => s$n_13
> _2 = (double) s$n_13;
> _6 = (double) i_5(D);
> # DEBUG D#3 => _4 * _6
> _9 = (double) j_8(D);
> _4 = _9 * _2;
> _7 = _4 * a_3(D);
> _10 = _7 * _6;
> bar (_10);
> # DEBUG D#2 => (double) k_12(D)
> # DEBUG D#1 => D#3 * D#2
> # DEBUG t$n => (int) D#1
>
> In the above, # DEBUG D#3 => _4 * _6  appears above the statement that
> defines _4. But even if I move the def of D#3 below the statement
> defining _4, it is not sufficient. Before reassociation, t$n refers to
> the expression (s$n_13 * a_3(D) * i_5(D) * k_12(D)), but after
> reassociation it would refer to ( j_8(D) * s$n_13 *  i_5(D) *
> k_12(D)). It seems the correct fix is to discard the debug temps whose
> RHS contains a variable that is involved in reassociation and then
> reconstruct it. Is that the expected fix here?

The value of the DEBUG expression changes because the value
that _4 computes changes - that is a no-no that may not happen.
We cannot re-use _4 (without previously releasing it and allocating
it newly) with a new value.  releasing _4 should have fixed up the
debug stmts.

So - can you verify whether we are indeed just replacing the
RHS of _4 = _2 * a_3(D) to _9 * _2 without changing the SSA name of
that expression?

Another reason why re-using the same LHS for another value is
wrong is that annotated information like points-to sets, alignment
and soon value-range information will be wrong.

Thanks,
Richard.

> - Easwaran
>
>>>
>>> testsuite/ChangeLog:
>>> 2013-07-31  Easwaran Raman  
>>>
>>> PR middle-end/57370
>>> PR tree-optimization/57393
>>> PR tree-optimization/58011
>>> * gfortran.dg/reassoc_12.f90: New testcase.
>>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
>>>
>>>
>>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman  wrote:
 Ping.

 On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  wrote:
> A newly generated statement in build_and_add_sum  function of
> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
> statement. In one instance, it was assigned the wrong uid (of an
> earlier phi statement) which messed up the IR and caused the test
> program to hang. Bootstraps and no test regressions on x86_64/linux.
> Ok for trunk?
>
> Thanks,
> Easwaran
>
>
> 2013-06-27  Easwaran Raman  
>
> PR middle-end/57370
> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a 
> phi
> node for a non-phi gimple statement.
>
> testsuite/ChangeLog:
> 2013-06-27  Easwaran Raman  
>
> PR middle-end/57370
> * gfortran.dg/reassoc_12.f90: New testcase.
>
>
> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
> ===
> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
> @@ -0,0 +1,74 @@
> +! { dg-do compile }
> +! { dg-options "-O2 -ffast-math" }
> +! PR middle-end/57370
> +
> + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
> +   grad_deriv,npoints, sx)
> +IMPLICIT REAL*8 (t)
> +INTEGER, PARAMETER :: dp=8
> +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, &
> +   e_ndrho_ndrho_rho
> +  DO ii=1,npoints
> +  IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN
> +t1425 = t233 * t557
> +t1429 = beta * t225
> +t1622 = t327 * t1621
> +t1626 = t327 * t1625
>

[RS6000] powerpc64 -mcmodel=medium large symbol offsets

2013-09-06 Thread Alan Modra

The following testcase taken from the linux kernel is miscompiled on
powerpc64-linux.

/* -m64 -mcmodel=medium -O -S -fno-section-anchors */
static int x;

unsigned long
foo (void)
{
  return ((unsigned long) &x) - 0xc000;
}

generates
addis 3,2,x+4611686018427387904@toc@ha
addi 3,3,x+4611686018427387904@toc@l
blr

losing the top 32 bits of the offset.  Sadly, the assembler and linker
do not complain, which is a hole in the ABI.  (@ha and _HA relocs as
per the ABI won't complain about overflow since they might be used in
a @highesta, @highera sequence loading a 64-bit value.)

This patch stops combine merging large offsets into a symbol addend
by copying code from reg_or_add_cint_operand to a new predicate,
add_cint_operand, and using that to restrict the range of offsets.
Bootstrapped and regression tested powerpc64-linux.  OK to apply?

* config/rs6000/predicates.md (add_cint_operand): New.
* config/rs6000/rs6000.md (largetoc_high_plus): Restrict offset
using add_cint_operand.
(largetoc_high_plus_aix): Likewise.

Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 202264)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -376,6 +376,12 @@
   (ior (match_code "const_int")
(match_operand 0 "gpc_reg_operand")))
 
+;; Return 1 if op is a constant integer valid for addition with addis, addi.
+(define_predicate "add_cint_operand"
+  (and (match_code "const_int")
+   (match_test "(unsigned HOST_WIDE_INT) (INTVAL (op) + 0x80008000)
+   < (unsigned HOST_WIDE_INT) 0x1ll")))
+
 ;; Return 1 if op is a constant integer valid for addition
 ;; or non-special register.
 (define_predicate "reg_or_add_cint_operand"
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 202264)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -12207,7 +12209,7 @@
(unspec [(match_operand:DI 1 "" "")
 (match_operand:DI 2 "gpc_reg_operand" "b")]
UNSPEC_TOCREL)
-   (match_operand 3 "const_int_operand" "n"]
+   (match_operand 3 "add_cint_operand" "n"]
"TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL"
"addis %0,%2,%1+%3@toc@ha")
 
@@ -12218,7 +12220,7 @@
(unspec [(match_operand:P 1 "" "")
 (match_operand:P 2 "gpc_reg_operand" "b")]
UNSPEC_TOCREL)
-   (match_operand 3 "const_int_operand" "n"]
+   (match_operand 3 "add_cint_operand" "n"]
"TARGET_XCOFF && TARGET_CMODEL != CMODEL_SMALL"
"addis %0,%1+%3@u(%2)")
 

-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH, PR 57748] Check for out of bounds access

2013-09-06 Thread Richard Biener

On Fri, Sep 6, 2013 at 9:03 AM, Bernd Edlinger
 wrote:
> Hi,
>
> this patch fixes the ICE and wrong code issues from PR57748. It contains two 
> test cases.
> One for the ICE and one for the wrong code. Both test cases fail for all gcc 
> versions,
> and are fixed with this patch.
>
> Bootstrapped and tested on x86_64-linux without any problems.
>
> OK for trunk and the 4.8 / 4.7 branch?

This doesn't look like the correct test.  The movmisalign optab query
uses the wrong mode.

Richard.

> Regards
> Bernd Edlinger

Re: Silence gcc.dg/autopar/pr49960.c

2013-09-06 Thread Jan Hubicka

> > +2013-09-05  Jan Hubicka  
> > +
> > +   * gcc.dg/autopar/pr49960.c: Disable partial inlining
> > 2013-09-05  Richard Biener  
> 
> Please add a blank line between entries?

The actual commmit has the extra line.  Not sure why the patch doesn't.
Sorry for that.

Honza

Re: RFC - Next refactoring steps

2013-09-06 Thread Mike Stump

On Sep 5, 2013, at 11:54 PM, Richard Biener  wrote:
> Most of the GCC headerfiles do not include all their required headers but
> rely on .c files doing that (in the appropriate order).  I somehow like
> that though I cannot explain why ;)

Very old school.  I can explain why I don't like it, but I'll resist.  The 
universal standard is for each header to include all it needs and for the 
ordering of includes not to matter any.  Deviations from this are the exception 
and should virtually never should be the case.

> Also grepping for #includes I see that this doesn't seem to be true anymore.

Yeah, everyone else hates the old school style with a passion, it was never a 
good idea.  :-)  When they come across it, people have a tendency to fix the 
headers as broken and over time, all the dependencies eventually get added.

[Patch AArch64] Fix register constraints for lane intrinsics.

2013-09-06 Thread James Greenhalgh


Hi,

Most of the vector-by-element instructions in AArch64 have the restriction
that, if the vector they are taking an element from has type "h"
then it must be in a register from the lower half of the vector register
set (i.e. v0-v15). While we have imposed that restriction in places, we
have not been consistent.

Fix that.

Tested with aarch64.exp with no regressions.

OK for trunk?

Thanks,
James

---
gcc/

2013-09-06  James Greenhalgh  

* config/aarch64/aarch64-simd.md
(aarch64_sqdmll_n_internal): Use
 iterator to ensure correct register choice.
(aarch64_sqdmll2_n_internal): Likewise.
(aarch64_sqdmull_n): Likewise.
(aarch64_sqdmull2_n_internal): Likewise.
* config/aarch64/arm_neon.h
(vml_lane_16): Use 'x' constraint for element vector.
(vml_n_16): Likewise.
(vmll_high_lane_16): Likewise.
(vmll_high_n_16): Likewise.
(vmll_lane_16): Likewise.
(vmll_n_16): Likewise.
(vmul_lane_16): Likewise.
(vmul_n_16): Likewise.
(vmull_lane_16): Likewise.
(vmull_n_16): Likewise.
(vmull_high_lane_16): Likewise.
(vmull_high_n_16): Likewise.
(vqrdmulh_n_s16): Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c085fb9c49958c5f402a28c0b39fe45ec1aadbc7..5161e48dfcea91910b9dc2ee68219c3ede55f4aa 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2797,7 +2797,7 @@ (define_insn "aarch64_sqdml
 		  (match_operand:VD_HSI 2 "register_operand" "w"))
 		(sign_extend:
 		  (vec_duplicate:VD_HSI
-		(match_operand: 3 "register_operand" "w"
+		(match_operand: 3 "register_operand" ""
 	  (const_int 1]
   "TARGET_SIMD"
   "sqdmll\\t%0, %2, %3.[0]"
@@ -2955,7 +2955,7 @@ (define_insn "aarch64_sqdml
   (match_operand:VQ_HSI 4 "vect_par_cnst_hi_half" "")))
 	  (sign_extend:
 (vec_duplicate:
-		  (match_operand: 3 "register_operand" "w"
+		  (match_operand: 3 "register_operand" ""
 	(const_int 1]
   "TARGET_SIMD"
   "sqdmll2\\t%0, %2, %3.[0]"
@@ -3083,7 +3083,7 @@ (define_insn "aarch64_sqdmull_n"
 		 (match_operand:VD_HSI 1 "register_operand" "w"))
 	   (sign_extend:
  (vec_duplicate:VD_HSI
-   (match_operand: 2 "register_operand" "w")))
+   (match_operand: 2 "register_operand" "")))
 	   )
 	 (const_int 1)))]
   "TARGET_SIMD"
@@ -3193,7 +3193,7 @@ (define_insn "aarch64_sqdmull2_n_i
(match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" "")))
 	   (sign_extend:
  (vec_duplicate:
-   (match_operand: 2 "register_operand" "w")))
+   (match_operand: 2 "register_operand" "")))
 	   )
 	 (const_int 1)))]
   "TARGET_SIMD"
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 29d1378..e20d34e 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7146,7 +7146,7 @@ vld1q_dup_u64 (const uint64_t * a)
int16x4_t result;\
__asm__ ("mla %0.4h, %2.4h, %3.h[%4]"\
 : "=w"(result)  \
-: "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+: "0"(a_), "w"(b_), "x"(c_), "i"(d) \
 : /* No clobbers */);   \
result;  \
  })
@@ -7174,7 +7174,7 @@ vld1q_dup_u64 (const uint64_t * a)
uint16x4_t result;   \
__asm__ ("mla %0.4h, %2.4h, %3.h[%4]"\
 : "=w"(result)  \
-: "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+: "0"(a_), "w"(b_), "x"(c_), "i"(d) \
 : /* No clobbers */);   \
result;  \
  })
@@ -7202,7 +7202,7 @@ vld1q_dup_u64 (const uint64_t * a)
int16x4_t result;\
__asm__ ("mla %0.4h, %2.4h, %3.h[%4]"\
 : "=w"(result)  \
-: "0"(a_), "w"(b_), "w"(c_), "i"(d) \
+: "0"(a_), "w"(b_), "x"(c_), "i"(d) \
 : /* No clobbers */);   \
result;  \
  })
@@ -7230,7 +7230,7 @@ vld1q_dup_u64 (const uint64_t * a)
uint16x4_t result;   \
__asm__ ("mla %0.4h, %2

Re: RFC - Next refactoring steps

2013-09-06 Thread Richard Biener

On Fri, Sep 6, 2013 at 10:14 AM, Mike Stump  wrote:
> On Sep 5, 2013, at 11:54 PM, Richard Biener  
> wrote:
>> Most of the GCC headerfiles do not include all their required headers but
>> rely on .c files doing that (in the appropriate order).  I somehow like
>> that though I cannot explain why ;)
>
> Very old school.  I can explain why I don't like it, but I'll resist.  The 
> universal standard is for each header to include all it needs and for the 
> ordering of includes not to matter any.  Deviations from this are the 
> exception and should virtually never should be the case.
>
>> Also grepping for #includes I see that this doesn't seem to be true anymore.
>
> Yeah, everyone else hates the old school style with a passion, it was never a 
> good idea.  :-)  When they come across it, people have a tendency to fix the 
> headers as broken and over time, all the dependencies eventually get added.

Well, I still think you cannot randomly change include file order in
GCC.  At least some

#ifdef ...

hackery in some headers will suddenly break (that is, change outcome)
if you include for example
tm.h before or after it.

But yes, making all dependencies explicit puts #includes where they
belong and shows where
header refactoring would make sense.  It also removes weird includes
from .c files that are
only necessary to make included required headers not break.

Richard.

[i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.

2013-09-06 Thread Kirill Yukhin

Hello,
Here is a patch to fix pr58269.
Actually this is not a full fix, but an obvious part.

ChangeLog entry:
2013-09-06  Kirill Yukhin  

PR target/58269
* gcc/config/i386/i386.c (ix86_conditional_register_usage):
Proper initialize extended SSE registers.

Bootstrap pass.

Ok for trunk?

--
Thanks, K

---
 gcc/config/i386/i386.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a8d70bc..d6a40a8 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4218,7 +4218,7 @@ ix86_conditional_register_usage (void)
 
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
-for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
+for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++)
   fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
 }
 
-- 
1.7.11.7

[patch] Fix another fallout of partial inlining change

2013-09-06 Thread Eric Botcazou

Hi,

see http://gcc.gnu.org/ml/gcc/2013-09/msg00028.html for the context.
The patch sets DECL_NO_INLINE_WARNING_P on the non-inlinable part after 
splitting (an alternative would be to clear DECL_DECLARED_INLINE_P).

Tested on x86_64-suse-linux, OK for the mainline?


2013-09-06  Eric Botcazou  

* ipa-split.c (split_function): Set DECL_NO_INLINE_WARNING_P on the
non-inlinable part.



2013-09-06  Eric Botcazou  

* gnat.dg/warn10.ad[sb]: New test.
* gnat.dg/warn10_pkg.ads: New helper.


-- 
Eric BotcazouIndex: ipa-split.c
===
--- ipa-split.c	(revision 202287)
+++ ipa-split.c	(working copy)
@@ -1222,6 +1222,9 @@ split_function (struct split_point *spli
   DECL_BUILT_IN_CLASS (node->symbol.decl) = NOT_BUILT_IN;
   DECL_FUNCTION_CODE (node->symbol.decl) = (enum built_in_function) 0;
 }
+  /* If the original function is declared inline, there is no point in issuing
+ a warning for the non-inlinable part.  */
+  DECL_NO_INLINE_WARNING_P (node->symbol.decl) = 1;
   cgraph_node_remove_callees (cur_node);
   ipa_remove_all_references (&cur_node->symbol.ref_list);
   if (!split_part_return_p)-- { dg-do compile }
-- { dg-options "-O3 -gnatn -Winline" }

package body Warn10 is

   procedure Do_Something(Driver : My_Driver) is
  X : Float;
   begin
   X := Get_Input_Value( Driver, 1, 1);
   end;

end Warn10;with Warn10_Pkg; use Warn10_Pkg;

package Warn10 is

   type My_Driver is new Root with record
  Extra : Natural;
   end record;

   procedure Do_Something(Driver : My_Driver);

end Warn10;package Warn10_Pkg is

   Size : constant Natural := 100;
   type My_Array is array(1..Size, 1..Size) of Float;

   type Root is tagged record
  Input_Values : My_Array;
   end record;

   function Get_Input_Value( Driver : Root; I, J : Natural) return Float;

end Warn10_Pkg;

Re: Ping: RFA: testsuite patches (1/6): keeps_null_pointer_checks effect on pta/alias dump files

2013-09-06 Thread Joern Rennecke


Quoting Richard Biener :


I'd say rather than just disabling the scan for
keeps_null_pointer_checks you should add
appropriate scanning for those targets as well (I expect you just
don't see the 'NULL's in the points-to sets).


Well, for gcc.dg/ipa/ipa-pta-14.c, which doesn't have NONLOCAL otherwise,
you instead see NONLOCAL.

Please find attached the amended patch.

I have verified that this gets the new expected PASSes for atmega128-sim.
2013-09-06  Joern Rennecke  

* gcc.dg/ipa/ipa-pta-14.c (scan-ipa-dump) [keeps_null_pointer_checks]:
Don't expect NULL in foo.result set.
* gcc.dg/tree-ssa/pta-escape-1.c (scan-tree-dump): Don't expect NULL
in ESCAPED set.
* gcc.dg/tree-ssa/pta-escape-2.c: Likewise.
* gcc.dg/tree-ssa/pta-escape-3.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c 
b/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c
index e8abc32..ed59cbf 100644
--- a/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c
+++ b/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c
@@ -22,7 +22,8 @@ int main()
   a.p = (void *)&c;
   p = foo(&a, &a);
   /* { dg-final { scan-ipa-dump "foo.result = { NULL a\[^ \]* c\[^ \]* }" 
"pta" { xfail *-*-* } } } */
-  /* { dg-final { scan-ipa-dump "foo.result = { NULL a\[^ \]* a\[^ \]* c\[^ 
\]* }" "pta" } } */
+  /* { dg-final { scan-ipa-dump "foo.result = { NULL a\[^ \]* a\[^ \]* c\[^ 
\]* }" "pta" { target { ! keeps_null_pointer_checks } } } } */
+  /* { dg-final { scan-ipa-dump "foo.result = { NONLOCAL a\[^ \]* a\[^ \]* 
c\[^ \]* }" "pta" { target { keeps_null_pointer_checks } } } } */
   ((struct X *)p)->p = (void *)0;
   if (a.p != (void *)0)
 abort ();
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c
index a424417..dcfae5d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c
@@ -33,5 +33,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" "alias" 
} } */
+/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" "alias" 
{ target { ! keeps_null_pointer_checks } } } } */
+/* { dg-final { scan-tree-dump "ESCAPED = { ESCAPED NONLOCAL x }" "alias" { 
target { keeps_null_pointer_checks } } } } */
 /* { dg-final { cleanup-tree-dump "alias" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c
index 8580382..e613959 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c
@@ -34,5 +34,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" "alias" 
} } */
+/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" "alias" 
{ target { ! keeps_null_pointer_checks } } } } */
+/* { dg-final { scan-tree-dump "ESCAPED = { ESCAPED NONLOCAL x }" "alias" { 
target { keeps_null_pointer_checks } } } } */
 /* { dg-final { cleanup-tree-dump "alias" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c
index 5a121a0..870dcf6 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c
@@ -38,5 +38,6 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" "alias" 
} } */
+/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" "alias" 
{ target { ! keeps_null_pointer_checks } } } } */
+/* { dg-final { scan-tree-dump "ESCAPED = { ESCAPED NONLOCAL x }" "alias" { 
target { keeps_null_pointer_checks } } } } */
 /* { dg-final { cleanup-tree-dump "alias" } } */

Re: [PATCH, PR 57748] Check for out of bounds access

2013-09-06 Thread Richard Biener

On Fri, Sep 6, 2013 at 10:35 AM, Bernd Edlinger
 wrote:
> Richard,
>
>> But the movmisalign path skips all this code and with the
>> current code thinks the actual access is in the mode of the
>> whole structure. (and also misses the address adjustment
>> as shown in the other testcase for the bug)
>>
>> The movmisalign handling in this path is just broken. And
>> it's _not_ just for optimization! If you have
>>
>> struct __attribute__((packed)) {
>> double x;
>> v2df y;
>> } *p;
>>
>> and do
>>
>> p = malloc (48); // gets you 8-byte aligned memory
>> p->y = (v2df) { 1., 2. };
>>
>> then you cannot skip the movmisaling handling because
>> we'd generate an aligned move (based on the mode) then.
>>
>
> Ok, test examples are really helpful here.
>
> This time the structure is BLKmode, unaligned,
> movmisalign = false anyway.
>
> I tried to make a test case out of your example,
> and as I expected, the code is also correct:
>
> foo:
> .cfi_startproc
> movdqa  .LC1(%rip), %xmm0
> movq$-1, (%rdi)
> movl$0x4048f5c3, 8(%rdi)
> movdqu  %xmm0, 12(%rdi)
> ret
>
> movdqu.
>
> The test executes without trap.
> And I did everything to make the object unaligned.

Yeah, well - for the expand path with BLKmode it's working fine.
We're doing all
the dances because of correctness for non-BLKmode expand paths.

> I am sure we could completely remove the
> movmisalign path, and nothing would happen.
> probably. except maybe for a performance regression.

In this place probably yes.  But we can also fix it to use the proper mode and
properly do the offset adjustment.  But just adding a bounds check looks
broken to me.

Note that there may have been a correctness reason for this code in the
light of the IPA-SRA code.  Maybe Martin remembers.

If removing the movmisalign handling inside the handled-component
case in expand_assignment works out (make sure to also test a
strict-alignment target) then that is probably fine.

Richard.

>
> Bernd.

Re: Ping: RFA: testsuite patches (1/6): keeps_null_pointer_checks effect on pta/alias dump files

2013-09-06 Thread Richard Biener

On Fri, Sep 6, 2013 at 11:13 AM, Joern Rennecke
 wrote:
> Quoting Richard Biener :
>
>> I'd say rather than just disabling the scan for
>> keeps_null_pointer_checks you should add
>> appropriate scanning for those targets as well (I expect you just
>> don't see the 'NULL's in the points-to sets).
>
>
> Well, for gcc.dg/ipa/ipa-pta-14.c, which doesn't have NONLOCAL otherwise,
> you instead see NONLOCAL.

Yes.  On those targets NULL == NONLOCAL (0 is just another global address).

> Please find attached the amended patch.
>
> I have verified that this gets the new expected PASSes for atmega128-sim.

Ok.

Thanks,
Richard.

> 2013-09-06  Joern Rennecke  
>
> * gcc.dg/ipa/ipa-pta-14.c (scan-ipa-dump)
> [keeps_null_pointer_checks]:
> Don't expect NULL in foo.result set.
> * gcc.dg/tree-ssa/pta-escape-1.c (scan-tree-dump): Don't expect NULL
> in ESCAPED set.
> * gcc.dg/tree-ssa/pta-escape-2.c: Likewise.
> * gcc.dg/tree-ssa/pta-escape-3.c: Likewise.
>
> diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c
> b/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c
> index e8abc32..ed59cbf 100644
> --- a/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c
> +++ b/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c
> @@ -22,7 +22,8 @@ int main()
>a.p = (void *)&c;
>p = foo(&a, &a);
>/* { dg-final { scan-ipa-dump "foo.result = { NULL a\[^ \]* c\[^ \]* }"
> "pta" { xfail *-*-* } } } */
> -  /* { dg-final { scan-ipa-dump "foo.result = { NULL a\[^ \]* a\[^ \]* c\[^
> \]* }" "pta" } } */
> +  /* { dg-final { scan-ipa-dump "foo.result = { NULL a\[^ \]* a\[^ \]* c\[^
> \]* }" "pta" { target { ! keeps_null_pointer_checks } } } } */
> +  /* { dg-final { scan-ipa-dump "foo.result = { NONLOCAL a\[^ \]* a\[^ \]*
> c\[^ \]* }" "pta" { target { keeps_null_pointer_checks } } } } */
>((struct X *)p)->p = (void *)0;
>if (a.p != (void *)0)
>  abort ();
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c
> index a424417..dcfae5d 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c
> @@ -33,5 +33,6 @@ int main()
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }"
> "alias" } } */
> +/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }"
> "alias" { target { ! keeps_null_pointer_checks } } } } */
> +/* { dg-final { scan-tree-dump "ESCAPED = { ESCAPED NONLOCAL x }" "alias" {
> target { keeps_null_pointer_checks } } } } */
>  /* { dg-final { cleanup-tree-dump "alias" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c
> index 8580382..e613959 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c
> @@ -34,5 +34,6 @@ int main()
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }"
> "alias" } } */
> +/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }"
> "alias" { target { ! keeps_null_pointer_checks } } } } */
> +/* { dg-final { scan-tree-dump "ESCAPED = { ESCAPED NONLOCAL x }" "alias" {
> target { keeps_null_pointer_checks } } } } */
>  /* { dg-final { cleanup-tree-dump "alias" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c
> b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c
> index 5a121a0..870dcf6 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c
> @@ -38,5 +38,6 @@ int main()
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }"
> "alias" } } */
> +/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }"
> "alias" { target { ! keeps_null_pointer_checks } } } } */
> +/* { dg-final { scan-tree-dump "ESCAPED = { ESCAPED NONLOCAL x }" "alias" {
> target { keeps_null_pointer_checks } } } } */
>  /* { dg-final { cleanup-tree-dump "alias" } } */
>

Re: [patch] Fix another fallout of partial inlining change

2013-09-06 Thread Jan Hubicka

> Hi,
> 
> see http://gcc.gnu.org/ml/gcc/2013-09/msg00028.html for the context.
> The patch sets DECL_NO_INLINE_WARNING_P on the non-inlinable part after 
> splitting (an alternative would be to clear DECL_DECLARED_INLINE_P).

Sorry, I missed your mail and it seems that my original mail did not
hit the mailing list.  I am attaching what I wrote back then for a record.
The patch fixes situation where function is externaly visible and called
once.  In this case it makes sense to partially inline it into the
one caller.  Previous heuristic was wrong assuming that the function is
static and thus preventing splitting because it makes more sense to inline
it always.
> 
> Tested on x86_64-suse-linux, OK for the mainline?
> 
> 
> 2013-09-06  Eric Botcazou  
> 
>   * ipa-split.c (split_function): Set DECL_NO_INLINE_WARNING_P on the
>   non-inlinable part.

Yes, this is OK.
Thinking about it, we ought to prvent splitting always_inline functions: those
may contain something that relies on inlining.  Either you can include in your
change or I will fix it as a followup.

Thank you,
Honza


Hi,
while analyzing profile issues with Martin I noticed that the testcases bellow
demonstrate three different bugs in profling and splitting.

This patch fixes first of it - the function t() should be split and partially
inlinined, while it is not.

Bootstrapped/regtested ppc64-linux, will commit it shortly.
Honza
Index: ipa-split.c
===
--- ipa-split.c (revision 202153)
+++ ipa-split.c (working copy)
@@ -1537,7 +1538,9 @@ execute_split_functions (void)
  Note that we are not completely conservative about disqualifying functions
  called once.  It is possible that the caller is called more then once and
  then inlining would still benefit.  */
-  if ((!node->callers || !node->callers->next_caller)
+  if ((!node->callers
+   /* Local functions called once will be completely inlined most of time. 
 */
+   || (!node->callers->next_caller && node->local.local))
   && !node->symbol.address_taken
   && (!flag_lto || !node->symbol.externally_visible))
 {
Index: testsuite/gcc.dg/tree-ssa/fnsplit-1.c
===
--- testsuite/gcc.dg/tree-ssa/fnsplit-1.c   (revision 0)
+++ testsuite/gcc.dg/tree-ssa/fnsplit-1.c   (working copy)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fnsplit" } */
+#include 
+int a[1000];
+
+void
+t(int a)
+{
+  if (a)
+printf ("I Am Completely Operational,"),
+printf ("And All My Circuits Are Functioning Perfectly\n");
+}
+int
+main(void)
+{
+  int i;
+  for (i = 0; i < 1000; i++)
+t(a[i]);
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "Splitting function at:" 1 "fnsplit"} } */
+
+/* { dg-final { cleanup-tree-dump "fnsplit" } } */

Re: [PATCH, PR 57748] Check for out of bounds access

2013-09-06 Thread Eric Botcazou

> this patch fixes the ICE and wrong code issues from PR57748. It contains two
> test cases. One for the ICE and one for the wrong code. Both test cases
> fail for all gcc versions, and are fixed with this patch.
> 
> Bootstrapped and tested on x86_64-linux without any problems.
> 
> OK for trunk and the 4.8 / 4.7 branch?

Please avoid touching the 4.7 branch at this point, this is a sensitive area 
and the testcases look quite artificial to me.

-- 
Eric Botcazou

Re: [i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.

2013-09-06 Thread Uros Bizjak

On Fri, Sep 6, 2013 at 10:34 AM, Kirill Yukhin  wrote:
> Hello,
> Here is a patch to fix pr58269.
> Actually this is not a full fix, but an obvious part.
>
> ChangeLog entry:
> 2013-09-06  Kirill Yukhin  
>
> PR target/58269
> * gcc/config/i386/i386.c (ix86_conditional_register_usage):
> Proper initialize extended SSE registers.

This is OK.

Thanks,
Uros.

RE: [PATCH, PR 57748] Check for out of bounds access

2013-09-06 Thread Bernd Edlinger

Just for completeness, these were the test examples on
this private communication:

On Fri, 6 Sep 2013 11:19:18, Richard Biener wrote:
> On Fri, Sep 6, 2013 at 10:35 AM, Bernd Edlinger
>  wrote:
>> Richard,
>>
>>> But the movmisalign path skips all this code and with the
>>> current code thinks the actual access is in the mode of the
>>> whole structure. (and also misses the address adjustment
>>> as shown in the other testcase for the bug)
>>>
>>> The movmisalign handling in this path is just broken. And
>>> it's _not_ just for optimization! If you have
>>>
>>> struct __attribute__((packed)) {
>>> double x;
>>> v2df y;
>>> } *p;
>>>
>>> and do
>>>
>>> p = malloc (48); // gets you 8-byte aligned memory
>>> p->y = (v2df) { 1., 2. };
>>>
>>> then you cannot skip the movmisaling handling because
>>> we'd generate an aligned move (based on the mode) then.
>>>
>>
>> Ok, test examples are really helpful here.
>>
>> This time the structure is BLKmode, unaligned,
>> movmisalign = false anyway.
>>
>> I tried to make a test case out of your example,
>> and as I expected, the code is also correct:
>>
>> foo:
>> .cfi_startproc
>> movdqa .LC1(%rip), %xmm0
>> movq $-1, (%rdi)
>> movl $0x4048f5c3, 8(%rdi)
>> movdqu %xmm0, 12(%rdi)
>> ret
>>
>> movdqu.
>>
>> The test executes without trap.
>> And I did everything to make the object unaligned.
>
> Yeah, well - for the expand path with BLKmode it's working fine.
> We're doing all
> the dances because of correctness for non-BLKmode expand paths.
>
>> I am sure we could completely remove the
>> movmisalign path, and nothing would happen.
>> probably. except maybe for a performance regression.
>
> In this place probably yes. But we can also fix it to use the proper mode and
> properly do the offset adjustment. But just adding a bounds check looks
> broken to me.
>
> Note that there may have been a correctness reason for this code in the
> light of the IPA-SRA code. Maybe Martin remembers.
>
> If removing the movmisalign handling inside the handled-component
> case in expand_assignment works out (make sure to also test a
> strict-alignment target) then that is probably fine.
>
> Richard.
>
>>
>> Bernd. #include 
#include 

typedef long long V
  __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
union x
{
   long long a;
   float b;
} __attribute__((aligned(1))) ;

struct s
{
  union x xx[0];
  V x;
  
} __attribute__((packed));

void __attribute__((noinline, noclone))
foo(struct s * x)
{
  x->xx[0].a = -1;
  x->xx[0].b = 3.14;
  x->x[1] = 0x123456789ABCDEF;
}

int
main()
{
  struct s ss;
  memset(&ss, 0, sizeof(ss));
  foo (&ss);
  printf("%f %llX\n", ss.xx[0].b, ss.xx[0].a);
  printf("%llX %llX\n", ss.x[0], ss.x[1]);
}
#include 
#include 

typedef long long V
  __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));

struct __attribute__((packed)) s
{
   long long a;
   float b;
   V x;
};

void __attribute__((noinline, noclone))
foo(struct s * x)
{
  V a = { 1, 2 };
  x->a = -1;
  x->b = 3.14;
  x->x = a;
}

int
main()
{
  long long buf[100];
  struct s* ss = (struct s*) ((char*)buf+1);
  memset(ss, 0, sizeof(*ss));
  foo (ss);
  printf("%f %llX\n", ss->b, ss->a);
  printf("%llX %llX\n", ss->x[0], ss->x[1]);
}

Re: [i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.

2013-09-06 Thread Jakub Jelinek

On Fri, Sep 06, 2013 at 11:28:53AM +0200, Uros Bizjak wrote:
> On Fri, Sep 6, 2013 at 10:34 AM, Kirill Yukhin  
> wrote:
> > Hello,
> > Here is a patch to fix pr58269.
> > Actually this is not a full fix, but an obvious part.
> >
> > ChangeLog entry:
> > 2013-09-06  Kirill Yukhin  
> >
> > PR target/58269
> > * gcc/config/i386/i386.c (ix86_conditional_register_usage):
> > Proper initialize extended SSE registers.
> 
> This is OK.

But please leave out gcc/ prefix from the ChangeLog entry.

Jakub

Re: [PATCH, PR 57748] Check for out of bounds access

2013-09-06 Thread Jakub Jelinek

On Fri, Sep 06, 2013 at 11:28:44AM +0200, Eric Botcazou wrote:
> > this patch fixes the ICE and wrong code issues from PR57748. It contains two
> > test cases. One for the ICE and one for the wrong code. Both test cases
> > fail for all gcc versions, and are fixed with this patch.
> > 
> > Bootstrapped and tested on x86_64-linux without any problems.
> > 
> > OK for trunk and the 4.8 / 4.7 branch?
> 
> Please avoid touching the 4.7 branch at this point, this is a sensitive area 
> and the testcases look quite artificial to me.

Even for 4.8 it needs at least several weeks on the trunk before considering
backports of such code.

Jakub

Re: [patch] Fix another fallout of partial inlining change

2013-09-06 Thread Eric Botcazou

> Sorry, I missed your mail and it seems that my original mail did not
> hit the mailing list.  I am attaching what I wrote back then for a record.
> The patch fixes situation where function is externaly visible and called
> once.  In this case it makes sense to partially inline it into the
> one caller.  Previous heuristic was wrong assuming that the function is
> static and thus preventing splitting because it makes more sense to inline
> it always.

OK, thanks for the explanation.

> Thinking about it, we ought to prvent splitting always_inline functions:
> those may contain something that relies on inlining.  Either you can
> include in your change or I will fix it as a followup.

I'll let you make the change.

-- 
Eric Botcazou

Re: [PATCH, C++, PR58282] Handle noexcept on transactions with -fno-exceptions

2013-09-06 Thread Tom de Vries

On 04/09/13 19:21, Jason Merrill wrote:
> On 09/03/2013 06:03 AM, Tom de Vries wrote:
>>  * semantics.c (finish_transaction_stmt, build_transaction_expr): Handle
>>  flag_exceptions.
> 
> I'd rather handle this at a lower level, by making 
> build_must_not_throw_expr return its argument if -fno-exceptions.
> 

Jason,

This patch implements that way of fixing. In addition, it adds a test-case.

Bootstrapped and reg-tested on x86_64.

OK for trunk?

Thanks,
- Tom


2013-09-02  Tom de Vries  

PR c++/58282
* except.c (build_must_not_throw_expr): Handle
flag_exceptions.

* g++.dg/tm/noexcept-6.C: New test.

diff --git a/gcc/cp/except.c b/gcc/cp/except.c
index fbebcba..c76d944 100644
--- a/gcc/cp/except.c
+++ b/gcc/cp/except.c
@@ -374,6 +374,9 @@ build_must_not_throw_expr (tree body, tree cond)
 {
   tree type = body ? TREE_TYPE (body) : void_type_node;
 
+  if (!flag_exceptions)
+return body;
+
   if (cond && !value_dependent_expression_p (cond))
 {
   cond = cxx_constant_value (cond);
diff --git a/gcc/testsuite/g++.dg/tm/noexcept-6.C b/gcc/testsuite/g++.dg/tm/noexcept-6.C
new file mode 100644
index 000..4391159
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tm/noexcept-6.C
@@ -0,0 +1,23 @@
+// { dg-do compile }
+// { dg-options "-fno-exceptions -fgnu-tm -O -std=c++0x -fdump-tree-tmlower" }
+
+struct TrueFalse
+{
+  static constexpr bool v() { return true; }
+};
+
+int global;
+
+template int foo()
+{
+  return __transaction_atomic noexcept(T::v()) (global + 1);
+}
+
+int f1()
+{
+  return foo();
+}
+
+/* { dg-final { scan-tree-dump-times "eh_must_not_throw" 0 "tmlower" } } */
+/* { dg-final { scan-tree-dump-times "__transaction_atomic" 1 "tmlower" } } */
+/* { dg-final { cleanup-tree-dump "tmlower" } } */

Re: [i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.

2013-09-06 Thread Iain Sandoe

Hi Kirill,

Thanks for Ilya's input on the PR thread.

We've done some testing/checking across the Darwin versions last night and I've 
bootstrapped all langs including Ada, and tested the patch below (together with 
the fragment you posted earlier) on Darwin12.

>> On Fri, Sep 6, 2013 at 10:34 AM, Kirill Yukhin  
>> wrote:

>>> Here is a patch to fix pr58269.
>>> Actually this is not a full fix, but an obvious part.

Here's what I propose for the remainder of the fix (FAOD, I cannot approve the 
Darwin changes).

Darwin is supposed to follow the System V psABI for x86_64 and, AFAICT for 
Darwin9, 10, 11 and 12 it is complying for function calls involving xmm0-7 
(additional float args are, indeed, placed on the stack).  Ergo, saving xmm8-15 
in __builtin_apply_args() only consumes time and stack space - the content is 
not part of the call.

As a fall-back; if I've missed a subtlety and, for some reason, we need to 
adjust the xmm set saved for compatibility with an older (gcc-4.2 based) system 
compiler, then we can override SSE_REGPARM_MAX in i386/darwin*.h for the 
relevant system version.

Losing TARGET_MACHO conditional code is always nice :)

Mike: Actually, since this seems to have uncovered a pre-existing wrong code 
bug, perhaps (this part of the) fix should also be applied to open branches?

gcc:

PR target/58269
config/i386/i386.c (ix86_function_arg_regno_p): Make Darwin use the
xmm register set described in the psABI.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a8d70bc..e68b3f9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5706,17 +5706,9 @@ ix86_function_arg_regno_p (int regno)
&& (regno < FIRST_SSE_REG + SSE_REGPARM_MAX)));
 }

-  if (TARGET_MACHO)
-{
-  if (SSE_REGNO_P (regno) && TARGET_SSE)
-return true;
-}
-  else
-{
-  if (TARGET_SSE && SSE_REGNO_P (regno)
-  && (regno < FIRST_SSE_REG + SSE_REGPARM_MAX))
-return true;
-}
+  if (TARGET_SSE && SSE_REGNO_P (regno)
+  && (regno < FIRST_SSE_REG + SSE_REGPARM_MAX))
+return true;

   /* TODO: The function should depend on current function ABI but
  builtins.c would need updating then. Therefore we use the

Fix ipa-devirt-11.C on AIX part 1

2013-09-06 Thread Jan Hubicka

Hi,
ipa-devirt-11.C (invented by me) checks for series of events that lead
to an ICE.  With local alias changes these events are not happening for multiple
reasons.   This patch solves first problem: we now redirect call to a function
to call to its alias and that breaks detection of self recursion on several 
places.
Introducewd by adding recursive_call_p predicate and also symtab logic to try to
prove if two symbols must point to semantically equivalent piece of code (or 
vtable)

Bootstrapped/regtested x86_64-linux, will commit it shortly.

PR middle-end/58094
* cgraph.h (symtab_semantically_equivalent_p): Declare.
* tree-tailcall.c: Include ipa-utils.h.
(find_tail_calls): Use it.
* ipa-pure-const.c (check_call): Likewise.
* ipa-utils.c (recursive_call_p): New function.
* ipa-utils.h (recursive_call_p): Dclare.
* symtab.c (symtab_nonoverwritable_alias): Fix formatting.
(symtab_semantically_equivalent_p): New function.
* Makefile.in (tree-tailcall.o): Update dependencies.
Index: cgraph.h
===
--- cgraph.h(revision 202315)
+++ cgraph.h(working copy)
@@ -627,6 +627,7 @@ bool symtab_for_node_and_aliases (symtab
  bool);
 symtab_node symtab_nonoverwritable_alias (symtab_node);
 enum availability symtab_node_availability (symtab_node);
+bool symtab_semantically_equivalent_p (symtab_node, symtab_node);
 
 /* In cgraph.c  */
 void dump_cgraph (FILE *);
Index: tree-tailcall.c
===
--- tree-tailcall.c (revision 202315)
+++ tree-tailcall.c (working copy)
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.
 #include "target.h"
 #include "cfgloop.h"
 #include "common/common-target.h"
+#include "ipa-utils.h"
 
 /* The file implements the tail recursion elimination.  It is also used to
analyze the tail calls in general, passing the results to the rtl level
@@ -445,7 +446,7 @@ find_tail_calls (basic_block bb, struct
   /* We found the call, check whether it is suitable.  */
   tail_recursion = false;
   func = gimple_call_fndecl (call);
-  if (func == current_function_decl)
+  if (func && recursive_call_p (current_function_decl, func))
 {
   tree arg;
 
Index: ipa-pure-const.c
===
--- ipa-pure-const.c(revision 202315)
+++ ipa-pure-const.c(working copy)
@@ -541,7 +541,8 @@ check_call (funct_state local, gimple ca
 }
 
   /* When not in IPA mode, we can still handle self recursion.  */
-  if (!ipa && callee_t == current_function_decl)
+  if (!ipa && callee_t
+  && recursive_call_p (current_function_decl, callee_t))
 {
   if (dump_file)
 fprintf (dump_file, "Recursive call can loop.\n");
@@ -1079,8 +1080,9 @@ ignore_edge (struct cgraph_edge *e)
 }
 
 /* Return true if NODE is self recursive function.
-   ??? self recursive and indirectly recursive funcions should
-   be the same, so this function seems unnecessary.  */
+   Indirectly recursive functions appears as non-trivial strongly
+   connected components, so we need to care about self recursion
+   only.  */
 
 static bool
 self_recursive_p (struct cgraph_node *node)
Index: ipa-utils.c
===
--- ipa-utils.c (revision 202315)
+++ ipa-utils.c (working copy)
@@ -791,3 +791,14 @@ ipa_merge_profiles (struct cgraph_node *
   src->symbol.decl = oldsrcdecl;
 }
 
+/* Return true if call to DEST is known to be self-recusive call withing FUNC. 
  */
+
+bool
+recursive_call_p (tree func, tree dest)
+{
+  struct cgraph_node *dest_node = cgraph_get_create_node (dest);
+  struct cgraph_node *cnode = cgraph_get_create_node (func);
+
+  return symtab_semantically_equivalent_p ((symtab_node)dest_node,
+  (symtab_node)cnode);
+}
Index: ipa-utils.h
===
--- ipa-utils.h (revision 202315)
+++ ipa-utils.h (working copy)
@@ -46,6 +46,7 @@ int ipa_reverse_postorder (struct cgraph
 tree get_base_var (tree);
 void ipa_merge_profiles (struct cgraph_node *dst,
 struct cgraph_node *src);
+bool recursive_call_p (tree, tree);
 
 /* In ipa-profile.c  */
 bool ipa_propagate_frequency (struct cgraph_node *node);
Index: symtab.c
===
--- symtab.c(revision 202315)
+++ symtab.c(working copy)
@@ -1106,11 +1106,48 @@ symtab_nonoverwritable_alias (symtab_nod
 {
   DECL_STATIC_CONSTRUCTOR (new_decl) = 0;
   DECL_STATIC_DESTRUCTOR (new_decl) = 0;
-  new_node = (symtab_node) cgraph_create_function_alias (new_decl, 
node->symbol.decl);
+  new_node = (symtab_node) cgraph_create_function_alias
+(new_decl, node->symbol.decl);
 }

RFA: Fix mark_target_live_regs to take COND_EXEC into account

2013-09-06 Thread Joern Rennecke


We found that

std::basic_string, std::allocator
::copy(char*, unsigned long, unsigned long) const

got miscompiled for ARC because reorg thought that all call-clobbered
registers were dead after a conditional call.
I can't reproduce the test case with current trunk + ARC port, but I reckon
this is just due to the fickleness of the register allocator.  Chances are,
every other version, some other obscure function is miscompiled without
this patch.

bootstrapped on i686-pc-linux-gnu.

OK to apply?
2013-04-29 Claudiu Zissulescu 

* resource.c (mark_target_live_regs): Compute resources taking
into account if a call is predicated or not.

diff --git a/gcc/resource.c b/gcc/resource.c
Index: resource.c
===
--- resource.c  (revision 202295)
+++ resource.c  (working copy)
@@ -994,11 +994,18 @@ mark_target_live_regs (rtx insns, rtx ta
 
  if (CALL_P (real_insn))
{
- /* CALL clobbers all call-used regs that aren't fixed except
-sp, ap, and fp.  Do this before setting the result of the
-call live.  */
- AND_COMPL_HARD_REG_SET (current_live_regs,
- regs_invalidated_by_call);
+ /* Values in call-clobbered registers survive a COND_EXEC CALL
+if that is not executed; this matters for resoure use because
+they may be used by a complementarily (or more strictly)
+predicated instruction, or if the CALL is NORETURN.  */
+ if (GET_CODE (PATTERN (real_insn)) != COND_EXEC)
+   {
+ /* CALL clobbers all call-used regs that aren't fixed except
+sp, ap, and fp.  Do this before setting the result of the
+call live.  */
+ AND_COMPL_HARD_REG_SET (current_live_regs,
+ regs_invalidated_by_call);
+   }
 
  /* A CALL_INSN sets any global register live, since it may
 have been modified by the call.  */

Fix ipa-devirt-11.C on AIX part 2

2013-09-06 Thread Jan Hubicka

Hi,
this patch makes tree-sra to do its job in the case where function has an alias.
There were two problems; first recursion is not detected correctly and second
we did not see the callers and thus skipped the function.  Rest of tree-sra 
seems
to work as expected.

Bootstrapped/regtsted x86_64-linux, comitted.

Honza

* Makefile.in (tree-sra.o): Update dependencies.
* tree-sra.c: Include ipa-utils.h
(scan_function): Use recursive_call_p.
(has_caller_p): New function.
(cgraph_for_node_and_aliases): Count also callers of aliases.
Index: Makefile.in
===
--- Makefile.in (revision 202316)
+++ Makefile.in (working copy)
@@ -3104,7 +3104,7 @@ tree-sra.o : tree-sra.c $(CONFIG_H) $(SY
$(HASH_TABLE_H) $(TM_H) $(TREE_H) $(GIMPLE_H) $(CGRAPH_H) $(TREE_FLOW_H) \
$(IPA_PROP_H) $(DIAGNOSTIC_H) statistics.h \
$(PARAMS_H) $(TARGET_H) $(FLAGS_H) \
-   $(DBGCNT_H) $(TREE_INLINE_H) $(GIMPLE_PRETTY_PRINT_H)
+   $(DBGCNT_H) $(TREE_INLINE_H) $(GIMPLE_PRETTY_PRINT_H) ipa-utils.h
 tree-switch-conversion.o : tree-switch-conversion.c $(CONFIG_H) $(SYSTEM_H) \
 $(TREE_H) $(TM_P_H) $(TREE_FLOW_H) $(DIAGNOSTIC_H) $(TREE_INLINE_H) \
 $(TM_H) coretypes.h $(GIMPLE_H) $(CFGLOOP_H) \
Index: tree-sra.c
===
--- tree-sra.c  (revision 202315)
+++ tree-sra.c  (working copy)
@@ -91,6 +91,7 @@ along with GCC; see the file COPYING3.
 #include "tree-inline.h"
 #include "gimple-pretty-print.h"
 #include "ipa-inline.h"
+#include "ipa-utils.h"
 
 /* Enumeration of all aggregate reductions we can do.  */
 enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
@@ -1256,8 +1257,7 @@ scan_function (void)
  if (DECL_BUILT_IN_CLASS (dest) == BUILT_IN_NORMAL
  && DECL_FUNCTION_CODE (dest) == BUILT_IN_APPLY_ARGS)
encountered_apply_args = true;
- if (cgraph_get_node (dest)
- == cgraph_get_node (current_function_decl))
+ if (recursive_call_p (current_function_decl, dest))
{
  encountered_recursive_call = true;
  if (!callsite_has_enough_arguments_p (stmt))
@@ -4906,6 +4906,16 @@ modify_function (struct cgraph_node *nod
   return cfg_changed;
 }
 
+/* If NODE has a caller, return true.  */
+
+static bool
+has_caller_p (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED)
+{
+  if (node->callers)
+return true;
+  return false;
+}
+
 /* Return false the function is apparently unsuitable for IPA-SRA based on it's
attributes, return true otherwise.  NODE is the cgraph node of the current
function.  */
@@ -4949,7 +4959,7 @@ ipa_sra_preliminary_function_checks (str
   return false;
 }
 
-  if (!node->callers)
+  if (!cgraph_for_node_and_aliases (node, has_caller_p, NULL, true))
 {
   if (dump_file)
fprintf (dump_file,

RFA: Fix mark_referenced_resources to handle COND_EXEC

2013-09-06 Thread Joern Rennecke


We found that gethostbyname_r from uClibc got miscompiled for ARC because
reorg thought a COND_EXEC unconditionally kills all the conditionally set
registers.  Making it think that the registers are not killed is no good
either, because then we would miss anti-dependencies.  So, short of adding
some complex predicate-tracing across all paths, it is best to consider
the result registers to be used before set.

bootstrapped / regtested on i686-pc-linux-gnu.

Well, that's not a particular relevant port since it doesn't combine
delay_slots with COND_EXEC, but then, only ARC does, and it's still
waiting for review.  And an arc-linux-uclibc system can't boot properly
without this patch being applied to the compiler, and you have to boot
the system before doing a compiler bootstrap... and the later is also a
bit uncommon for embedded targets.

This is also the reason I placed the test into gcc.target; in principle
the test could also live in gcc.dg, but it would only be relevant to ARC.

OK to apply?
gcc:

2013-04-30  Joern Rennecke  

* resource.c (mark_referenced_resources): Handle COND_EXEC.

Index: resource.c
===
--- resource.c  (revision 202295)
+++ resource.c  (working copy)
@@ -374,6 +374,16 @@ mark_referenced_resources (rtx x, struct
 case INSN:
 case JUMP_INSN:
 
+  if (GET_CODE (PATTERN (x)) == COND_EXEC)
+  /* In addition to the usual references, also consider all outputs
+as referenced, to compensate for mark_set_resources treating
+them as killed.  This is similar to ZERO_EXTRACT / STRICT_LOW_PART
+handling, execpt that we got a partial incidence instead of a partial
+width.  */
+  mark_set_resources (x, res, 0,
+ include_delayed_effects
+ ? MARK_SRC_DEST_CALL : MARK_SRC_DEST);
+
 #ifdef INSN_REFERENCES_ARE_DELAYED
   if (! include_delayed_effects
  && INSN_REFERENCES_ARE_DELAYED (x))
gcc/testsuite:

2013-08-31  Joern Rennecke  

* gcc.target/arc/cond-set-use.c: New test.

diff --git a/gcc/testsuite/gcc.target/arc/cond-set-use.c 
b/gcc/testsuite/gcc.target/arc/cond-set-use.c
new file mode 100644
index 000..aee2725
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/cond-set-use.c
@@ -0,0 +1,128 @@
+/* { dg-do run } */
+/* { dg-options "-Os" } */
+
+/* Based on gethostbyname_r,
+ * Copyright (C) 2000-2006 Erik Andersen 
+ *
+ * Licensed under the LGPL v2.1, see the file COPYING.LIB
+ *
+ * Extraction / wrapping as test by
+ * Joern Rennecke  
+ * Copyright (C) 2013 Free Software Foundation, Inc.
+ */
+
+typedef unsigned size_t;
+typedef int ssize_t;
+typedef unsigned uint32_t;
+struct resolv_answer {
+ char *dotted;
+ int atype;
+ int aclass;
+ int ttl;
+ int rdlength;
+ const unsigned char *rdata;
+ int rdoffset;
+ char* buf;
+ size_t buflen;
+ size_t add_count;
+};
+struct hostent
+{
+  char *h_name;
+  char **h_aliases;
+  int h_addrtype;
+  int h_length;
+  char **h_addr_list;
+};
+
+int *__attribute__  ((noinline,weak)) nop (void * p) { return p; };
+void __attribute__  ((noinline,weak)) seta (struct resolv_answer * p)
+{ p->atype = 1;}
+
+int ghostbyname_r(
+  struct hostent *result_buf,
+  char *buf,
+  size_t buflen,
+  struct hostent **result,
+  int *h_errnop)
+{
+ char **addr_list;
+ char **alias;
+ char *alias0;
+ int i0;
+ struct resolv_answer a;
+ int i;
+
+ *result = ((void *)0);
+
+ *h_errnop = -1;
+
+ if ((ssize_t)buflen <= 5)
+  return 34;
+
+ alias = (char **)buf;
+ addr_list = (char **)buf;
+
+  /* This got turned into branch with conditional move in delay slot.  */
+ if ((ssize_t)buflen < 256)
+  return 34;
+
+
+ {
+  if (!nop(&i0)) {
+   result_buf->h_aliases = alias;
+   result_buf->h_addrtype = 2;
+   result_buf->h_length = 4;
+   result_buf->h_addr_list = addr_list;
+   *result = result_buf;
+   *h_errnop = 0;
+   return 0;
+  }
+ }
+
+
+ seta (&a);
+
+ if (a.atype == 1) {
+
+  int need_bytes = sizeof(addr_list[0]) * (a.add_count + 1 + 1);
+
+  int ips_len = a.add_count * a.rdlength;
+
+  buflen -= (need_bytes + ips_len);
+  if ((ssize_t)buflen < 0) {
+   i = 34;
+   goto free_and_ret;
+  }
+
+  result_buf->h_addrtype = 2;
+  *result = result_buf;
+  *h_errnop = 0;
+  i = 0;
+  goto free_and_ret;
+ }
+
+ /* For cse, the 1 was is loaded into a call-saved register;
+the load was hoisted into a delay slot before the conditional load,
+clobbering result_buf, which (conditionally) lived in the same
+call-saved register, because mark_referenced_resources considered the
+destination of the COND_EXEC only clobbered, but not used.  */
+ *h_errnop = 1;
+ *nop(&i0) = 1;
+ i = 2;
+
+ free_and_ret:
+ nop (&i0);
+ return i;
+}
+
+int
+main ()
+{
+  struct hostent buf, *res;
+  int i;
+  char c;
+  ghostbyname_r (&buf,  &c, 1024, &res, &i);
+  ghostbyname_r (&buf,  0, 1024, &res, &i);
+  return 0;
+}

RE: [PATCH, PR 57748] Check for out of bounds access

2013-09-06 Thread Bernd Edlinger

On Fri, 6 Sep 2013 11:30:53, Jakub Jelinek wrote:
> On Fri, Sep 06, 2013 at 11:28:44AM +0200, Eric Botcazou wrote:
>>> this patch fixes the ICE and wrong code issues from PR57748. It contains two
>>> test cases. One for the ICE and one for the wrong code. Both test cases
>>> fail for all gcc versions, and are fixed with this patch.
>>>
>>> Bootstrapped and tested on x86_64-linux without any problems.
>>>
>>> OK for trunk and the 4.8 / 4.7 branch?
>>
>> Please avoid touching the 4.7 branch at this point, this is a sensitive area
>> and the testcases look quite artificial to me.
>
> Even for 4.8 it needs at least several weeks on the trunk before considering
> backports of such code.
>
> Jakub

That's true, the examples are quite artificial, because it is difficult
and quite target-dependent, if there will be a movmisalign optab at all.

And the original test case will no longer reproduce on trunk because of this
change:

2013-08-08  Bernd Edlinger  

PR target/58065
* config/arm/arm.h (MALLOC_ABI_ALIGNMENT): Define.

maybe we should first backport this one.

Yesterday someone did that on the linaro branch,
but maybe /branches/gcc-4_8-branch should be updated as well?

Bernd.

Fix ICE with -O0 -fdevirtualize

2013-09-06 Thread Jan Hubicka

Hi,
this patch fixes ICE with -O0 -fdevirtualize where we try to access type 
inheritance
graph that does not exist.

Bootstrapped/regtested x86_64-linux, comitted.

2013-09-06  Jan Hubicka  

PR tree-optimization/58311
* ipa-devirt.c (gate_ipa_devirt): Only execute when optimizing.

Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 202315)
+++ ipa-devirt.c(working copy)
@@ -1114,9 +1114,7 @@ ipa_devirt (void)
 static bool
 gate_ipa_devirt (void)
 {
-  /* FIXME: We should remove the optimize check after we ensure we never run
- IPA passes when not optimizing.  */
-  return flag_devirtualize && !in_lto_p;
+  return flag_devirtualize && !in_lto_p && optimize;
 }
 
 namespace {

[PATCH] Adjust control dependence delete

2013-09-06 Thread Richard Biener


To be slightly more obvious now that it is a vec ...

Committed.

Richard.

2013-09-06  Richard Biener 

* cfganal.c (control_dependences::~control_dependences):
Properly free all of the vector.

Index: gcc/cfganal.c
===
--- gcc/cfganal.c   (revision 202316)
+++ gcc/cfganal.c   (working copy)
@@ -431,7 +431,7 @@ control_dependences::control_dependences
 
 control_dependences::~control_dependences ()
 {
-  for (int i = 0; i < last_basic_block; ++i)
+  for (unsigned i = 0; i < control_dependence_map.length (); ++i)
 BITMAP_FREE (control_dependence_map[i]);
   control_dependence_map.release ();
   free_edge_list (el);

Re: [i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.

2013-09-06 Thread Kirill Yukhin

Hello,
On 06 Sep 11:29, Jakub Jelinek wrote:
> On Fri, Sep 06, 2013 at 11:28:53AM +0200, Uros Bizjak wrote:
> > This is OK.
> 
> But please leave out gcc/ prefix from the ChangeLog entry.

Thanks, checked into main trunk: 
http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00181.html
with fixed ChangeLog.

--
Thanks, K

[AArch64] Fix types of second parameter to qtbl/qtbx intrinsics

2013-09-06 Thread James Greenhalgh


Hi,

The signed variants of the qtbl and qtbx intrinsics currently
take an int8x<8,16> for their control vector parameter.
This should be a uint8x<8,16> parameter.

Fixed as attached and checked against aarch64.exp on aarch64-none-elf
with no regressions.

Is this OK to commit?

I have some similair patches kicking around in my tree, these feel
obvious, but I'd like to check that others' share that perspective
before I go committing anything!

Thanks,
James

---
gcc/

2013-09-06  James Greenhalgh  

* config/aarch64/arm_neon.h
(vqtbl<1,2,3,4>_s8): Fix control vector parameter type.
(vqtbx<1,2,3,4>_s8): Likewise.

gcc/testsuite/

2013-09-06  James Greenhalgh  

* gcc.target/aarch64/table-intrinsics.c
(qtbl_tests8_< ,2,3,4>): Fix control vector parameter type.
(qtb_tests8_< ,2,3,4>): Likewise.
(qtblq_tests8_< ,2,3,4>): Likewise.
(qtbxq_tests8_< ,2,3,4>): Likewise.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index e20d34e..5864f2c 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -15973,7 +15973,7 @@ vqtbl1_p8 (poly8x16_t a, uint8x8_t b)
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vqtbl1_s8 (int8x16_t a, int8x8_t b)
+vqtbl1_s8 (int8x16_t a, uint8x8_t b)
 {
   int8x8_t result;
   __asm__ ("tbl %0.8b, {%1.16b}, %2.8b"
@@ -16006,7 +16006,7 @@ vqtbl1q_p8 (poly8x16_t a, uint8x16_t b)
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vqtbl1q_s8 (int8x16_t a, int8x16_t b)
+vqtbl1q_s8 (int8x16_t a, uint8x16_t b)
 {
   int8x16_t result;
   __asm__ ("tbl %0.16b, {%1.16b}, %2.16b"
@@ -16028,7 +16028,7 @@ vqtbl1q_u8 (uint8x16_t a, uint8x16_t b)
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vqtbl2_s8 (int8x16x2_t tab, int8x8_t idx)
+vqtbl2_s8 (int8x16x2_t tab, uint8x8_t idx)
 {
   int8x8_t result;
   __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
@@ -16064,7 +16064,7 @@ vqtbl2_p8 (poly8x16x2_t tab, uint8x8_t idx)
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vqtbl2q_s8 (int8x16x2_t tab, int8x16_t idx)
+vqtbl2q_s8 (int8x16x2_t tab, uint8x16_t idx)
 {
   int8x16_t result;
   __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
@@ -16100,7 +16100,7 @@ vqtbl2q_p8 (poly8x16x2_t tab, uint8x16_t idx)
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vqtbl3_s8 (int8x16x3_t tab, int8x8_t idx)
+vqtbl3_s8 (int8x16x3_t tab, uint8x8_t idx)
 {
   int8x8_t result;
   __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
@@ -16136,7 +16136,7 @@ vqtbl3_p8 (poly8x16x3_t tab, uint8x8_t idx)
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vqtbl3q_s8 (int8x16x3_t tab, int8x16_t idx)
+vqtbl3q_s8 (int8x16x3_t tab, uint8x16_t idx)
 {
   int8x16_t result;
   __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t"
@@ -16172,7 +16172,7 @@ vqtbl3q_p8 (poly8x16x3_t tab, uint8x16_t idx)
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vqtbl4_s8 (int8x16x4_t tab, int8x8_t idx)
+vqtbl4_s8 (int8x16x4_t tab, uint8x8_t idx)
 {
   int8x8_t result;
   __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
@@ -16209,7 +16209,7 @@ vqtbl4_p8 (poly8x16x4_t tab, uint8x8_t idx)
 
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vqtbl4q_s8 (int8x16x4_t tab, int8x16_t idx)
+vqtbl4q_s8 (int8x16x4_t tab, uint8x16_t idx)
 {
   int8x16_t result;
   __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t"
@@ -16246,7 +16246,7 @@ vqtbl4q_p8 (poly8x16x4_t tab, uint8x16_t idx)
 
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vqtbx1_s8 (int8x8_t r, int8x16_t tab, int8x8_t idx)
+vqtbx1_s8 (int8x8_t r, int8x16_t tab, uint8x8_t idx)
 {
   int8x8_t result = r;
   __asm__ ("tbx %0.8b,{%1.16b},%2.8b"
@@ -16279,7 +16279,7 @@ vqtbx1_p8 (poly8x8_t r, poly8x16_t tab, uint8x8_t idx)
 }
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vqtbx1q_s8 (int8x16_t r, int8x16_t tab, int8x16_t idx)
+vqtbx1q_s8 (int8x16_t r, int8x16_t tab, uint8x16_t idx)
 {
   int8x16_t result = r;
   __asm__ ("tbx %0.16b,{%1.16b},%2.16b"
@@ -16312,7 +16312,7 @@ vqtbx1q_p8 (poly8x16_t r, poly8x16_t tab, uint8x16_t idx)
 }
 
 __extension__ static __inline int8x8_t __attribute__ ((__always_inline__))
-vqtbx2_s8 (int8x8_t r, int8x16x2_t tab, int8x8_t idx)
+vqtbx2_s8 (int8x8_t r, int8x16x2_t tab, uint8x8_t idx)
 {
   int8x8_t result = r;
   __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
@@ -16349,7 +16349,7 @@ vqtbx2_p8 (poly8x8_t r, poly8x16x2_t tab, uint8x8_t idx)
 
 
 __extension__ static __inline int8x16_t __attribute__ ((__always_inline__))
-vqtbx2q_s8 (int8x16_t r, int8x16x2_t tab, int8x16_t idx)
+vqtbx2q_s8 (int8x16_t r, int8x16x2_t tab, uint8x16_t idx)
 {
   int8x16_t result = r;
   __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t"
@@ -16386,7 +16386,7 @@ vqtbx2q_p8 (poly8x16_t r, poly8x16x2_t tab, uint8x16_t idx)
 
 
 __extensio

Re: [Patch AArch64] Fix register constraints for lane intrinsics.

2013-09-06 Thread Marcus Shawcroft

On 6 September 2013 09:18, James Greenhalgh  wrote:
>
> Hi,
>
> Most of the vector-by-element instructions in AArch64 have the restriction
> that, if the vector they are taking an element from has type "h"
> then it must be in a register from the lower half of the vector register
> set (i.e. v0-v15). While we have imposed that restriction in places, we
> have not been consistent.
>
> Fix that.
>
> Tested with aarch64.exp with no regressions.
>
> OK for trunk?

OK
/Marcus

Re: [Patch, AArch64] Fix vdup_lane_* intrinsics' lane parameter.

2013-09-06 Thread Marcus Shawcroft

On 5 September 2013 17:21, Tejas Belagod  wrote:
>
> Hi,
>
> This patch fixes vdup_lane_* intrinsics in arm_neon.h to have the
> correct lane parameter as opposed to the present '0'.
>
> Tested on aarch64-none-elf. OK for trunk?
>
> Thanks,
> Tejas Belagod
> ARM.
>
> Changelog:
>
> 2013-09-05  Tejas Belagod  

OK
/Marcus

Re: [AArch64] Fix types of second parameter to qtbl/qtbx intrinsics

2013-09-06 Thread Marcus Shawcroft

On 6 September 2013 11:45, James Greenhalgh  wrote:

> The signed variants of the qtbl and qtbx intrinsics currently
> take an int8x<8,16> for their control vector parameter.
> This should be a uint8x<8,16> parameter.
>
> Fixed as attached and checked against aarch64.exp on aarch64-none-elf
> with no regressions.
>
> Is this OK to commit?


This is OK.

/Marcus

Re: [Patch ARM] Add "type" attribute to Everything!

2013-09-06 Thread Richard Earnshaw

On 05/09/13 17:44, James Greenhalgh wrote:
> 
> This patch changes the default type to be 'untyped' and then adds
> type information to everything in the ARM backend.
> 
> To do this, we introduce three new types:
> 
>   * "multiple" which is used where an insn will be split, or
> where it will generate a sequence of more than one instruction.
>   * "no_insn" which is used where an insn will not generate an
> instruction in the final output.
>   * "untyped" which is used where no type has been given, and
> is the default type.
> 
> Then we add these types everywhere appropriate in the arm backend.
> 
> Checked and bootstrapped on arm-none-eabi and arm-none-linux-gnueabihf
> with no issues.
> 
> Thanks,
> James
> 
> ---
> gcc/
> 
> 2013-09-05  James Greenhalgh  
> 
>   * config/arm/types.md: Add "no_insn", "multiple" and "untyped"
>   types.
>   * config/arm/arm-fixed.md: Add type attribute to all insn
>   patterns.
>   (add3): Add type attribute.
>   (add3): Likewise.
>   (usadd3): Likewise.
>   (ssadd3): Likewise.
>   (sub3): Likewise.
>   (sub3): Likewise.
>   (ussub3): Likewise.
>   (sssub3): Likewise.
>   (ssmulsa3): Likewise.
>   (usmulusa3): Likewise.
>   (arm_usatsihi): Likewise.
>   * config/arm/vfp.md
>   (*movdi_vfp): Add types for all instructions.
>   (*movdi_vfp_cortexa8): Likewise.
>   (*movhf_vfp_neon): Likewise.
>   (*movhf_vfp): Likewise.
>   (*movdf_vfp): Likewise.
>   (*thumb2_movdf_vfp): Likewise.
>   (*thumb2_movdfcc_vfp): Likewise.
>   * config/arm/arm.md: Add type attribute to all insn patterns.
>   (*thumb1_adddi3): Add type attribute.
>   (*arm_adddi3): Likewise.
>   (*adddi_sesidi_di): Likewise.
>   (*adddi_zesidi_di): Likewise.
>   (*thumb1_addsi3): Likewise.
>   (addsi3_compare0): Likewise.
>   (*addsi3_compare0_scratch): Likewise.
>   (*compare_negsi_si): Likewise.
>   (cmpsi2_addneg): Likewise.
>   (*addsi3_carryin_): Likewise.
>   (*addsi3_carryin_alt2_): Likewise.
>   (*addsi3_carryin_clobercc_): Likewise.
>   (*subsi3_carryin): Likewise.
>   (*subsi3_carryin_const): Likewise.
>   (*subsi3_carryin_compare): Likewise.
>   (*subsi3_carryin_compare_const): Likewise.
>   (*arm_subdi3): Likewise.
>   (*thumb_subdi3): Likewise.
>   (*subdi_di_zesidi): Likewise.
>   (*subdi_di_sesidi): Likewise.
>   (*subdi_zesidi_di): Likewise.
>   (*subdi_sesidi_di): Likewise.
>   (*subdi_zesidi_ze): Likewise.
>   (thumb1_subsi3_insn): Likewise.
>   (*arm_subsi3_insn): Likewise.
>   (*anddi3_insn): Likewise.
>   (*anddi_zesidi_di): Likewise.
>   (*anddi_sesdi_di): Likewise.
>   (*ne_zeroextracts): Likewise.
>   (*ne_zeroextracts): Likewise.
>   (*ite_ne_zeroextr): Likewise.
>   (*ite_ne_zeroextr): Likewise.
>   (*anddi_notdi_di): Likewise.
>   (*anddi_notzesidi): Likewise.
>   (*anddi_notsesidi): Likewise.
>   (andsi_notsi_si): Likewise.
>   (thumb1_bicsi3): Likewise.
>   (*iordi3_insn): Likewise.
>   (*iordi_zesidi_di): Likewise.
>   (*iordi_sesidi_di): Likewise.
>   (*thumb1_iorsi3_insn): Likewise.
>   (*xordi3_insn): Likewise.
>   (*xordi_zesidi_di): Likewise.
>   (*xordi_sesidi_di): Likewise.
>   (*arm_xorsi3): Likewise.
>   (*andsi_iorsi3_no): Likewise.
>   (*smax_0): Likewise.
>   (*smax_m1): Likewise.
>   (*arm_smax_insn): Likewise.
>   (*smin_0): Likewise.
>   (*arm_smin_insn): Likewise.
>   (*arm_umaxsi3): Likewise.
>   (*arm_uminsi3): Likewise.
>   (*minmax_arithsi): Likewise.
>   (*minmax_arithsi_): Likewise.
>   (*satsi_): Likewise.
>   (arm_ashldi3_1bit): Likewise.
>   (arm_ashrdi3_1bit): Likewise.
>   (arm_lshrdi3_1bit): Likewise.
>   (*arm_negdi2): Likewise.
>   (*thumb1_negdi2): Likewise.
>   (*arm_negsi2): Likewise.
>   (*thumb1_negsi2): Likewise.
>   (*negdi_extendsid): Likewise.
>   (*negdi_zero_extend): Likewise.
>   (*arm_abssi2): Likewise.
>   (*thumb1_abssi2): Likewise.
>   (*arm_neg_abssi2): Likewise.
>   (*thumb1_neg_abss): Likewise.
>   (one_cmpldi2): Likewise.
>   (extenddi2): Likewise.
>   (*compareqi_eq0): Likewise.
>   (*arm_extendhisi2addsi): Likewise.
>   (*arm_movdi): Likewise.
>   (*thumb1_movdi_insn): Likewise.
>   (*arm_movt): Likewise.
>   (*thumb1_movsi_insn): Likewise.
>   (pic_add_dot_plus_four): Likewise.
>   (pic_add_dot_plus_eight): Likewise.
>   (tls_load_dot_plus_eight): Likewise.
>   (*thumb1_movhi_insn): Likewise.
>   (*thumb1_movsf_insn): Likewise.
>   (*movdf_soft_insn): Likewise.
>   (*thumb_movdf_insn): Likewise.
>   (cbranchsi4_insn): Likewise.
>   (cbranchsi4_scratch): Likewise.
>   (*negated_cbranchsi4): Likewise.
>   (*tbit_cbranch): Likewise.
>   (*tlobits_cbranch): Likewise.
>   (*

Re: RFC - Next refactoring steps

2013-09-06 Thread Michael Matz

Hi,

On Thu, 5 Sep 2013, Andrew MacLeod wrote:

> 1 -  I think we ought to split out the data structures from gimple.h into
> gimple-core.h, like we did with tree.h

Why?

> gimple.h.  That won't  really affect my work.  I think it probably ought to be
> done for clarity eventually.gimple.h would then simply become a collector
> of "gimple-blah.h" files which are required for a complete gimple system.

Doesn't make sense to me.  If you split it into multiple files, just to 
then create a wrapper called gimple.h including all of them again for the 
consumers you can just as well have all of it in one file.

> 2 - all the uses of TREE_NULL to refer to an empty/nonexistent object...  it
> is no longer in tree-core.h, which I think is correct. what should we start
> using instead in converted files?  options are:
>   a)  0
>   b) NULL
>   c) something we define as 0, like GIMPLE_NULL

I think I'd prefer 0, but it creates a problem with varargs functions, so 
you'd always need "(gimple)0" for those, which you'd probably hide behind 
a macro, at which point you'd have two forms for the nil gimple ("0" and 
that macro), for consistency you'd want to use the macro in place of "0" 
also in the non-varargs places, et voila, you're back to "NULL" or 
"NULL_GIMPLE" :-/

> 3) remove tree.h from other .h files
>   Now that tree.h is split, there are a few .h files which directly include
> tree.h themselves. It would be nice if we could remove the implementation
> requirement of tree.h to ease moving to gimple.h. The 4 files are :
>  ipa-utils.h   lto-streamer.h  streamer-hooks.h  tree-streamer.h
> It should be possible to not directly include tree.h itself in these files.
> Worst case, the header file is included  after tree.h from the .C files.. that
> seems to be the way most of the other include files avoid including tree.h
> directly.

That can be done right now I think.

For the rest of the topic, tree vs gimple: I don't see much value in that.  
Why create a gimple_expr type that basically is just tree?  You can as 
well use tree then.


Ciao,
Michael.

[ARM,AARCH64] Insn type reclassification. Split f_cvt type.

2013-09-06 Thread James Greenhalgh


This patch splits the f_cvt attribute to:

 * f_cvt conversions between float representations.
 * f_cvti2f conversions from int to float.
 * f_cvtf2i conversions from float to int.

Then we update the pipeline descriptions to refelct this change.

Regression tested for aarch64-none-elf and arm-none-eabi and sanity
checked. Bootstrapped in series with other type splitting patches.

OK?

Thanks,
James

---
2013-09-06  James Greenhalgh  

* config/arm/types.md
(type): Split f_cvt as f_cvt, f_cvtf2i, f_cvti2f.
* config/aarch64/aarch64.md
(l2): Update with
new attributes.
(fix_trunc2): Likewise.
(fixuns_trunc2): Likewise.
(float2): Likewise.
* config/arm/vfp.md
(*truncsisf2_vfp): Update with new attributes.
(*truncsidf2_vfp): Likewise.
(fixuns_truncsfsi2): Likewise.
(fixuns_truncdfsi2): Likewise.
(*floatsisf2_vfp): Likewise.
(*floatsidf2_vfp): Likewise.
(floatunssisf2): Likewise.
(floatunssidf2): Likewise.
(*combine_vcvt_f32_): Likewise.
(*combine_vcvt_f64_): Likewise.
* config/arm/arm1020e.md: Update with new attributes.
* config/arm/cortex-a15-neon.md: Update with new attributes.
* config/arm/cortex-a5.md: Update with new attributes.
* config/arm/cortex-a53.md: Update with new attributes.
* config/arm/cortex-a7.md: Update with new attributes.
* config/arm/cortex-a8-neon.md: Update with new attributes.
* config/arm/cortex-a9.md: Update with new attributes.
* config/arm/cortex-m4-fpu.md: Update with new attributes.
* config/arm/cortex-r4f.md: Update with new attributes.
* config/arm/marvell-pj4.md: Update with new attributes.
* config/arm/vfp11.md: Update with new attributes.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4dfd2ab83d00601dc8192ad47fec2c1e404d1264..6a4a975bb89c48311659db0091c76266d29cdba2 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3685,7 +3685,7 @@ (define_insn "l<
   "TARGET_FLOAT"
   "fcvt\\t%0, %1"
   [(set_attr "v8type" "fcvtf2i")
-   (set_attr "type" "f_cvt")
+   (set_attr "type" "f_cvtf2i")
(set_attr "mode" "")
(set_attr "mode2" "")]
 )
@@ -3785,7 +3785,7 @@ (define_insn "fix_trunc0, %1"
   [(set_attr "v8type" "fcvtf2i")
-   (set_attr "type" "f_cvt")
+   (set_attr "type" "f_cvtf2i")
(set_attr "mode" "")
(set_attr "mode2" "")]
 )
@@ -3796,7 +3796,7 @@ (define_insn "fixuns_trunc0, %1"
   [(set_attr "v8type" "fcvtf2i")
-   (set_attr "type" "f_cvt")
+   (set_attr "type" "f_cvtf2i")
(set_attr "mode" "")
(set_attr "mode2" "")]
 )
@@ -3807,7 +3807,7 @@ (define_insn "float2
   "TARGET_FLOAT"
   "scvtf\\t%0, %1"
   [(set_attr "v8type" "fcvti2f")
-   (set_attr "type" "f_cvt")
+   (set_attr "type" "f_cvti2f")
(set_attr "mode" "")
(set_attr "mode2" "")]
 )
diff --git a/gcc/config/arm/arm1020e.md b/gcc/config/arm/arm1020e.md
index 615c6a5b16de647cbd8c0fa947f8b763a1353ee3..e16e862c1f49b36f75ba1faf20c2095fb9aeacdf 100644
--- a/gcc/config/arm/arm1020e.md
+++ b/gcc/config/arm/arm1020e.md
@@ -289,7 +289,7 @@ (define_insn_reservation "v10_farith" 5
 
 (define_insn_reservation "v10_cvt" 5
  (and (eq_attr "vfp10" "yes")
-  (eq_attr "type" "f_cvt"))
+  (eq_attr "type" "f_cvt,f_cvti2f,f_cvtf2i"))
  "1020a_e+v10_fmac")
 
 (define_insn_reservation "v10_fmul" 6
diff --git a/gcc/config/arm/cortex-a15-neon.md b/gcc/config/arm/cortex-a15-neon.md
index f1cac9e1af88bd5e3f0d87ff50c44376ad82d441..b5d14e7f7f9c3965e02e0d6e0edf0044df341812 100644
--- a/gcc/config/arm/cortex-a15-neon.md
+++ b/gcc/config/arm/cortex-a15-neon.md
@@ -471,7 +471,7 @@ (define_insn_reservation "cortex_a15_vfp
 
 (define_insn_reservation "cortex_a15_vfp_cvt" 6
   (and (eq_attr "tune" "cortexa15")
-   (eq_attr "type" "f_cvt"))
+   (eq_attr "type" "f_cvt,f_cvtf2i,f_cvti2f"))
   "ca15_issue1,ca15_cx_vfp")
 
 (define_insn_reservation "cortex_a15_vfp_cmpd" 8
diff --git a/gcc/config/arm/cortex-a5.md b/gcc/config/arm/cortex-a5.md
index 8930baf8daff5be2d2872324cd41fd5a1cd03778..54c8c420324a155523bc961917c475c5aeb86a96 100644
--- a/gcc/config/arm/cortex-a5.md
+++ b/gcc/config/arm/cortex-a5.md
@@ -168,7 +168,8 @@ (define_insn_reservation "cortex_a5_bran
 
 (define_insn_reservation "cortex_a5_fpalu" 4
   (and (eq_attr "tune" "cortexa5")
-   (eq_attr "type" "ffariths, fadds, ffarithd, faddd, fcpys, fmuls, f_cvt,\
+   (eq_attr "type" "ffariths, fadds, ffarithd, faddd, fcpys, fmuls,\
+f_cvt,f_cvtf2i,f_cvti2f,\
 			fcmps, fcmpd"))
   "cortex_a5_ex1+cortex_a5_fpadd_pipe")
 
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index 66d4cb436f5fb43b545f94ac57a8c5e909360353..e84b9ea1a71ef1df2476d3b25900522469074914 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -209,7 +209,8 @@ (define_insn_reservation "cortex_a53_bra
 
 (define_

[Patch ARM AARCH64] Split "type" attributes: fdiv

2013-09-06 Thread James Greenhalgh


Hi,

The type attributes "fdivs,fdivd" should be split as:

fdivs -> fsqrts, fdivs
fdivd -> fsqrtd, fdivd

Do this and update pipelines as needed.

Regression tested on aarch64-none-elf and arm-none-eabi and
bootstrapped in series with other type splitting patches.

OK?

Thanks,
James

---
2013-09-06  James Greenhalgh  

* config/arm/types.md: Split fdiv as fsqrt, fdiv.
* config/arm/arm.md (core_cycles): Remove fdiv.
* config/arm/vfp.md:
(*sqrtsf2_vfp): Update for attribute changes.
(*sqrtdf2_vfp): Likewise.
* config/aarch64/aarch64.md:
(sqrt2): Update for attribute changes.
* config/arm/arm1020e.md: Update with new attributes.
* config/arm/cortex-a15-neon.md: Update with new attributes.
* config/arm/cortex-a5.md: Update with new attributes.
* config/arm/cortex-a53.md: Update with new attributes.
* config/arm/cortex-a7.md: Update with new attributes.
* config/arm/cortex-a8-neon.md: Update with new attributes.
* config/arm/cortex-a9.md: Update with new attributes.
* config/arm/cortex-m4-fpu.md: Update with new attributes.
* config/arm/cortex-r4f.md: Update with new attributes.
* config/arm/marvell-pj4.md: Update with new attributes.
* config/arm/vfp11.md: Update with new attributes.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6a4a975bb89c48311659db0091c76266d29cdba2..ded37efb4c86130af8dd82db66d50cc227bfeff0 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3903,7 +3903,7 @@ (define_insn "sqrt2"
   "TARGET_FLOAT"
   "fsqrt\\t%0, %1"
   [(set_attr "v8type" "fsqrt")
-   (set_attr "type" "fdiv")
+   (set_attr "type" "fsqrt")
(set_attr "mode" "")]
 )
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 5ed8ee7dc6293bf93869545bef4cd3f60966908b..6c0fbf44288c9f6e077fe2d9836cd5c1e2042a0a 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -335,7 +335,6 @@ (define_attr "core_cycles" "single,multi
 alus_shift_imm, alus_shift_reg, bfm, csel, rev, logic_imm, logic_reg,\
 logic_shift_imm, logic_shift_reg, logics_imm, logics_reg,\
 logics_shift_imm, logics_shift_reg, extend, shift_imm, float, fcsel,\
-fdivd, fdivs,\
 wmmx_wor, wmmx_wxor, wmmx_wand, wmmx_wandn, wmmx_wmov, wmmx_tmcrr,\
 wmmx_tmrrc, wmmx_wldr, wmmx_wstr, wmmx_tmcr, wmmx_tmrc, wmmx_wadd,\
 wmmx_wsub, wmmx_wmul, wmmx_wmac, wmmx_wavg2, wmmx_tinsr, wmmx_textrm,\
diff --git a/gcc/config/arm/arm1020e.md b/gcc/config/arm/arm1020e.md
index e16e862c1f49b36f75ba1faf20c2095fb9aeacdf..8cf0890d9300527962b14f60e08c190155616425 100644
--- a/gcc/config/arm/arm1020e.md
+++ b/gcc/config/arm/arm1020e.md
@@ -299,12 +299,12 @@ (define_insn_reservation "v10_fmul" 6
 
 (define_insn_reservation "v10_fdivs" 18
  (and (eq_attr "vfp10" "yes")
-  (eq_attr "type" "fdivs"))
+  (eq_attr "type" "fdivs, fsqrts"))
  "1020a_e+v10_ds*14")
 
 (define_insn_reservation "v10_fdivd" 32
  (and (eq_attr "vfp10" "yes")
-  (eq_attr "type" "fdivd"))
+  (eq_attr "type" "fdivd, fsqrtd"))
  "1020a_e+v10_fmac+v10_ds*28")
 
 (define_insn_reservation "v10_floads" 4
diff --git a/gcc/config/arm/cortex-a15-neon.md b/gcc/config/arm/cortex-a15-neon.md
index b5d14e7f7f9c3965e02e0d6e0edf0044df341812..057507a762ab546e37b4a32a9771b4098a693d55 100644
--- a/gcc/config/arm/cortex-a15-neon.md
+++ b/gcc/config/arm/cortex-a15-neon.md
@@ -501,12 +501,12 @@ (define_insn_reservation "cortex_a15_vfp
 
 (define_insn_reservation "cortex_a15_vfp_divs" 10
   (and (eq_attr "tune" "cortexa15")
-   (eq_attr "type" "fdivs"))
+   (eq_attr "type" "fdivs, fsqrts"))
   "ca15_issue1,ca15_cx_ik")
 
 (define_insn_reservation "cortex_a15_vfp_divd" 18
   (and (eq_attr "tune" "cortexa15")
-   (eq_attr "type" "fdivd"))
+   (eq_attr "type" "fdivd, fsqrtd"))
   "ca15_issue1,ca15_cx_ik")
 
 ;; Define bypasses.
diff --git a/gcc/config/arm/cortex-a5.md b/gcc/config/arm/cortex-a5.md
index 54c8c420324a155523bc961917c475c5aeb86a96..03d3cc99106f4e8875b649b06dbd1341f18a5f55 100644
--- a/gcc/config/arm/cortex-a5.md
+++ b/gcc/config/arm/cortex-a5.md
@@ -233,14 +233,14 @@ (define_insn_reservation "cortex_a5_fpma
 
 (define_insn_reservation "cortex_a5_fdivs" 14
   (and (eq_attr "tune" "cortexa5")
-   (eq_attr "type" "fdivs"))
+   (eq_attr "type" "fdivs, fsqrts"))
   "cortex_a5_ex1, cortex_a5_fp_div_sqrt * 13")
 
 ;; ??? Similarly for fdivd.
 
 (define_insn_reservation "cortex_a5_fdivd" 29
   (and (eq_attr "tune" "cortexa5")
-   (eq_attr "type" "fdivd"))
+   (eq_attr "type" "fdivd, fsqrtd"))
   "cortex_a5_ex1, cortex_a5_fp_div_sqrt * 28")
 
 
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index e84b9ea1a71ef1df2476d3b25900522469074914..e52f4377c62f217172c2f0b88af724f59ed7cef0 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53

[Patch AArch64] Fix types for some multiply instructions.

2013-09-06 Thread James Greenhalgh


Hi,

We don't really need to split the types on these
instructions. The ARM backend already has suitable descriptions
of things like mla and smlal. Use them.

Regression tested on aarch64-none-elf with no regressions.

OK?

Thanks,
James
---
2013-09-06  James Greenhalgh  

* config/aarch64/aarch64.md
(*madd): Fix type attribute.
(*maddsi_uxtw): Likewise.
(*msub): Likewise.
(*msubsi_uxtw): Likewise.
(maddsidi4): Likewise.
(msubsidi4): Likewise.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ded37efb4c86130af8dd82db66d50cc227bfeff0..e28764da5dd608259098d3150783e6eacd09be27 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2281,7 +2281,7 @@ (define_insn "*madd"
   ""
   "madd\\t%0, %1, %2, %3"
   [(set_attr "v8type" "madd")
-   (set_attr "type" "mul")
+   (set_attr "type" "mla")
(set_attr "mode" "")]
 )
 
@@ -2295,7 +2295,7 @@ (define_insn "*maddsi_uxtw"
   ""
   "madd\\t%w0, %w1, %w2, %w3"
   [(set_attr "v8type" "madd")
-   (set_attr "type" "mul")
+   (set_attr "type" "mla")
(set_attr "mode" "SI")]
 )
 
@@ -2308,7 +2308,7 @@ (define_insn "*msub"
   ""
   "msub\\t%0, %1, %2, %3"
   [(set_attr "v8type" "madd")
-   (set_attr "type" "mul")
+   (set_attr "type" "mla")
(set_attr "mode" "")]
 )
 
@@ -2323,7 +2323,7 @@ (define_insn "*msubsi_uxtw"
   ""
   "msub\\t%w0, %w1, %w2, %w3"
   [(set_attr "v8type" "madd")
-   (set_attr "type" "mul")
+   (set_attr "type" "mla")
(set_attr "mode" "SI")]
 )
 
@@ -2373,7 +2373,7 @@ (define_insn "maddsidi4"
   ""
   "maddl\\t%0, %w1, %w2, %3"
   [(set_attr "v8type" "maddl")
-   (set_attr "type" "mul")
+   (set_attr "type" "mlal")
(set_attr "mode" "DI")]
 )
 
@@ -2387,7 +2387,7 @@ (define_insn "msubsidi4"
   ""
   "msubl\\t%0, %w1, %w2, %3"
   [(set_attr "v8type" "maddl")
-   (set_attr "type" "mul")
+   (set_attr "type" "mlal")
(set_attr "mode" "DI")]
 )

Re: [ARM,AARCH64] Insn type reclassification. Split f_cvt type.

2013-09-06 Thread Richard Earnshaw

On 06/09/13 14:24, James Greenhalgh wrote:
> 
> This patch splits the f_cvt attribute to:
> 
>  * f_cvt conversions between float representations.
>  * f_cvti2f conversions from int to float.
>  * f_cvtf2i conversions from float to int.
> 
> Then we update the pipeline descriptions to refelct this change.
> 
> Regression tested for aarch64-none-elf and arm-none-eabi and sanity
> checked. Bootstrapped in series with other type splitting patches.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> 2013-09-06  James Greenhalgh  
> 
>   * config/arm/types.md
>   (type): Split f_cvt as f_cvt, f_cvtf2i, f_cvti2f.
>   * config/aarch64/aarch64.md
>   (l2): Update with
>   new attributes.
>   (fix_trunc2): Likewise.
>   (fixuns_trunc2): Likewise.
>   (float2): Likewise.
>   * config/arm/vfp.md
>   (*truncsisf2_vfp): Update with new attributes.
>   (*truncsidf2_vfp): Likewise.
>   (fixuns_truncsfsi2): Likewise.
>   (fixuns_truncdfsi2): Likewise.
>   (*floatsisf2_vfp): Likewise.
>   (*floatsidf2_vfp): Likewise.
>   (floatunssisf2): Likewise.
>   (floatunssidf2): Likewise.
>   (*combine_vcvt_f32_): Likewise.
>   (*combine_vcvt_f64_): Likewise.
>   * config/arm/arm1020e.md: Update with new attributes.
>   * config/arm/cortex-a15-neon.md: Update with new attributes.
>   * config/arm/cortex-a5.md: Update with new attributes.
>   * config/arm/cortex-a53.md: Update with new attributes.
>   * config/arm/cortex-a7.md: Update with new attributes.
>   * config/arm/cortex-a8-neon.md: Update with new attributes.
>   * config/arm/cortex-a9.md: Update with new attributes.
>   * config/arm/cortex-m4-fpu.md: Update with new attributes.
>   * config/arm/cortex-r4f.md: Update with new attributes.
>   * config/arm/marvell-pj4.md: Update with new attributes.
>   * config/arm/vfp11.md: Update with new attributes.
> 
> 

OK.

R.

Re: [Patch ARM AARCH64] Split "type" attributes: fdiv

2013-09-06 Thread Richard Earnshaw

On 06/09/13 14:27, James Greenhalgh wrote:
> 
> Hi,
> 
> The type attributes "fdivs,fdivd" should be split as:
> 
> fdivs -> fsqrts, fdivs
> fdivd -> fsqrtd, fdivd
> 
> Do this and update pipelines as needed.
> 
> Regression tested on aarch64-none-elf and arm-none-eabi and
> bootstrapped in series with other type splitting patches.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> 2013-09-06  James Greenhalgh  
> 
>   * config/arm/types.md: Split fdiv as fsqrt, fdiv.
>   * config/arm/arm.md (core_cycles): Remove fdiv.
>   * config/arm/vfp.md:
>   (*sqrtsf2_vfp): Update for attribute changes.
>   (*sqrtdf2_vfp): Likewise.
>   * config/aarch64/aarch64.md:
>   (sqrt2): Update for attribute changes.
>   * config/arm/arm1020e.md: Update with new attributes.
>   * config/arm/cortex-a15-neon.md: Update with new attributes.
>   * config/arm/cortex-a5.md: Update with new attributes.
>   * config/arm/cortex-a53.md: Update with new attributes.
>   * config/arm/cortex-a7.md: Update with new attributes.
>   * config/arm/cortex-a8-neon.md: Update with new attributes.
>   * config/arm/cortex-a9.md: Update with new attributes.
>   * config/arm/cortex-m4-fpu.md: Update with new attributes.
>   * config/arm/cortex-r4f.md: Update with new attributes.
>   * config/arm/marvell-pj4.md: Update with new attributes.
>   * config/arm/vfp11.md: Update with new attributes.
> 
> 

OK.

R.

Re: [PATCH, libvtv] Fix most of the testsuite.

2013-09-06 Thread Diego Novillo

On Thu, Sep 5, 2013 at 4:52 PM, Caroline Tice  wrote:
> Ping?  Could somebody please review this for me?

Mike already approved this upthread.

Re: [Patch AArch64] Fix types for some multiply instructions.

2013-09-06 Thread Richard Earnshaw

On 06/09/13 14:29, James Greenhalgh wrote:
> 
> Hi,
> 
> We don't really need to split the types on these
> instructions. The ARM backend already has suitable descriptions
> of things like mla and smlal. Use them.
> 
> Regression tested on aarch64-none-elf with no regressions.
> 
> OK?
> 
> Thanks,
> James
> ---
> 2013-09-06  James Greenhalgh  
> 
>   * config/aarch64/aarch64.md
>   (*madd): Fix type attribute.
>   (*maddsi_uxtw): Likewise.
>   (*msub): Likewise.
>   (*msubsi_uxtw): Likewise.
>   (maddsidi4): Likewise.
>   (msubsidi4): Likewise.
> 

OK.

R.

Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.

2013-09-06 Thread Kirill Yukhin

Hello,

PING.

--
Thanks, K

[AArch64, ARM] Rename the FCPYS type to FMOV

2013-09-06 Thread James Greenhalgh


Hi,

This patch updates the AArch64 backend such that floating point
moves are correctly categorized with type "FMOV".

Then in the ARM backend we rename "FCPYS" to "FMOV" everywhere
where it is appropriate to do so.

Regression tested on aarch64-none-elf and arm-none-eabi with no
regressions.

OK?

Thanks,
James
---
gcc/

2013-09-06  James Greenhalgh  

* config/arm/types.md (type): Rename fcpys to fmov.
* config/arm/vfp.md
(*arm_movsi_vfp): Rename type fcpys as fmov.
(*thumb2_movsi_vfp): Likewise
(*movhf_vfp_neon): Likewise
(*movhf_vfp): Likewise
(*movsf_vfp): Likewise
(*thumb2_movsf_vfp): Likewise
(*movsfcc_vfp): Likewise
(*thumb2_movsfcc_vfp): Likewise
* config/aarch64/aarch64-simd.md
(move_lo_quad_): Replace type mov_reg with fmovs.
* config/aarch64/aarch64.md
(*movsi_aarch64): Replace type mov_reg with fmovs.
(*movdi_aarch64): Likewise
(*movsf_aarch64): Likewise
(*movdf_aarch64): Likewise
* config/arm/arm.c
(cortexa7_older_only): Rename TYPE_FCPYS to TYPE_FMOV.
* config/arm/iwmmxt.md
(*iwmmxt_movsi_insn): Rename type fcpys as fmov.
* config/arm/arm1020e.md: Update with new attributes.
* config/arm/cortex-a15-neon.md: Update with new attributes.
* config/arm/cortex-a5.md: Update with new attributes.
* config/arm/cortex-a53.md: Update with new attributes.
* config/arm/cortex-a7.md: Update with new attributes.
* config/arm/cortex-a8-neon.md: Update with new attributes.
* config/arm/cortex-a9.md: Update with new attributes.
* config/arm/cortex-m4-fpu.md: Update with new attributes.
* config/arm/cortex-r4f.md: Update with new attributes.
* config/arm/marvell-pj4.md: Update with new attributes.
* config/arm/vfp11.md: Update with new attributes.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c085fb9c49958c5f402a28c0b39fe45ec1aadbc7..882fe4a19368d6ef6f9ef862dffce6d6307c5ac3 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1052,7 +1052,7 @@ (define_insn "move_lo_quad_"
fmov\\t%d0, %1
dup\\t%d0, %1"
   [(set_attr "v8type" "*,fmov,*")
-   (set_attr "type" "*,mov_reg,*")
+   (set_attr "type" "*,fmov,*")
(set_attr "simd_type" "simd_dup,*,simd_dup")
(set_attr "simd_mode" "")
(set_attr "simd" "yes,*,yes")
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index e28764da5dd608259098d3150783e6eacd09be27..db6aa1d3fa15e17095ba26a64e020d098e9fa6c0 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -831,7 +831,7 @@ (define_insn "*movsi_aarch64"
fmov\\t%s0, %s1"
   [(set_attr "v8type" "move,move,move,alu,load1,load1,store1,store1,adr,adr,fmov,fmov,fmov")
(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,load1,load1,store1,store1,\
- adr,adr,mov_reg,mov_reg,mov_reg")
+ adr,adr,fmov,fmov,fmov")
(set_attr "mode" "SI")
(set_attr "fp" "*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes")]
 )
@@ -858,7 +858,7 @@ (define_insn "*movdi_aarch64"
movi\\t%d0, %1"
   [(set_attr "v8type" "move,move,move,alu,load1,load1,store1,store1,adr,adr,fmov,fmov,fmov,fmov")
(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,load1,load1,store1,store1,\
- adr,adr,mov_reg,mov_reg,mov_reg,mov_reg")
+ adr,adr,fmov,fmov,fmov,fmov")
(set_attr "mode" "DI")
(set_attr "fp" "*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
(set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
@@ -961,8 +961,8 @@ (define_insn "*movsf_aarch64"
   [(set_attr "v8type" "fmovi2f,fmovf2i,\
 		   fmov,fconst,fpsimd_load,\
 		   fpsimd_store,fpsimd_load,fpsimd_store,fmov")
-   (set_attr "type" "f_mcr,f_mrc,mov_reg,fconsts,\
- f_loads,f_stores,f_loads,f_stores,mov_reg")
+   (set_attr "type" "f_mcr,f_mrc,fmov,fconsts,\
+ f_loads,f_stores,f_loads,f_stores,fmov")
(set_attr "mode" "SF")]
 )
 
@@ -984,7 +984,7 @@ (define_insn "*movdf_aarch64"
   [(set_attr "v8type" "fmovi2f,fmovf2i,\
 		   fmov,fconst,fpsimd_load,\
 		   fpsimd_store,fpsimd_load,fpsimd_store,move")
-   (set_attr "type" "f_mcr,f_mrc,mov_reg,fconstd,\
+   (set_attr "type" "f_mcr,f_mrc,fmov,fconstd,\
  f_loadd,f_stored,f_loadd,f_stored,mov_reg")
(set_attr "mode" "DF")]
 )
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cac98cc..f9027dd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8977,7 +8977,7 @@ cortexa7_older_only (rtx insn)
 case TYPE_FADDS:
 case TYPE_FFARITHD:
 case TYPE_FADDD:
-case TYPE_FCPYS:
+case TYPE_FMOV:
 case TYPE_F_CVT:
 case TYPE_FCMPS:
 case TYPE_FCMPD:
diff --git a/gcc/config/arm/arm1020e.md b/gcc/config/arm/arm1020e.md
index 8cf0890d9300527962b14f60e08c1901556

[AArch64] Use "multiple" for type, where more than one instruction is used for a move

2013-09-06 Thread James Greenhalgh


Hi,

We could introduce a whole new type for insns which generate two moves,
but we have already introduced a "multiple" classification for
types in the ARM backend, so use that in place of "mov_reg" where
appropriate.

Regression tested on aarch64-none-elf and arm-none-eabi with no
regressions.

OK?

Thanks,
James

---
gcc/

2013-09-06  James Greenhalgh  

* config/aarch64/aarch64.md
(*movti_aarch64): Use "multiple" for type where v8type is "move2".
(*movtf_aarch64): Likewise.
* config/arm/arm.md
(thumb1_movdi_insn): Use "multiple" for type where more than one
instruction is used for a move.
(*arm32_movhf): Likewise.
(*thumb_movdf_insn): Likewise.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index db6aa1d3fa15e17095ba26a64e020d098e9fa6c0..96705862de6b53322828fd60df15207af4b2ed61 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -906,7 +906,7 @@ (define_insn "*movti_aarch64"
str\\t%q1, %0"
   [(set_attr "v8type" "move2,fmovi2f,fmovf2i,*, \
 		   load2,store2,store2,fpsimd_load,fpsimd_store")
-   (set_attr "type" "mov_reg,f_mcr,f_mrc,*, \
+   (set_attr "type" "multiple,f_mcr,f_mrc,*, \
 		 load2,store2,store2,f_loadd,f_stored")
(set_attr "simd_type" "*,*,*,simd_move,*,*,*,*,*")
(set_attr "mode" "DI,DI,DI,TI,DI,DI,DI,TI,TI")
@@ -1024,7 +1024,7 @@ (define_insn "*movtf_aarch64"
ldp\\t%0, %H0, %1
stp\\t%1, %H1, %0"
   [(set_attr "v8type" "logic,move2,fmovi2f,fmovf2i,fconst,fconst,fpsimd_load,fpsimd_store,fpsimd_load2,fpsimd_store2")
-   (set_attr "type" "logic_reg,mov_reg,f_mcr,f_mrc,fconstd,fconstd,\
+   (set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,fconstd,fconstd,\
  f_loadd,f_stored,f_loadd,f_stored")
(set_attr "mode" "DF,DF,DF,DF,DF,DF,TF,TF,DF,DF")
(set_attr "length" "4,8,8,8,4,4,4,4,4,4")
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 6c0fbf44288c9f6e077fe2d9836cd5c1e2042a0a..fd0b1cbdccd23ad4d18be50417c7532b29840b91 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6141,7 +6141,7 @@ (define_insn "*thumb1_movdi_insn"
 }
   }"
   [(set_attr "length" "4,4,6,2,2,6,4,4")
-   (set_attr "type" "multiple,mov_reg,multiple,load2,store2,load2,store2,mov_reg")
+   (set_attr "type" "multiple,multiple,multiple,load2,store2,load2,store2,multiple")
(set_attr "pool_range" "*,*,*,*,*,1018,*,*")]
 )
 
@@ -7221,7 +7221,7 @@ (define_insn "*arm32_movhf"
 }
   "
   [(set_attr "conds" "unconditional")
-   (set_attr "type" "load1,store1,mov_reg,mov_reg")
+   (set_attr "type" "load1,store1,mov_reg,multiple")
(set_attr "length" "4,4,4,8")
(set_attr "predicable" "yes")]
 )
@@ -7466,7 +7466,7 @@ (define_insn "*thumb_movdf_insn"
 }
   "
   [(set_attr "length" "4,2,2,6,4,4")
-   (set_attr "type" "multiple,load2,store2,load2,store2,mov_reg")
+   (set_attr "type" "multiple,load2,store2,load2,store2,multiple")
(set_attr "pool_range" "*,*,*,1018,*,*")]
 )

Re: [PATCH, vtv update] Add documentation for --enable-vtable-verify configure option...

2013-09-06 Thread Diego Novillo


On 2013-08-28 17:15 , Caroline Tice wrote:


 # Least ordering for dependencies mean linking w/o libstdc++ for as
 # long as the development of libvtv does not absolutely require it.
Index: gcc/doc/install.texi
===
--- gcc/doc/install.texi(revision 202060)
+++ gcc/doc/install.texi(working copy)
@@ -1032,6 +1032,16 @@ and for cross builds configured with @op
 More documentation about multiarch can be found at
 @uref{http://wiki.debian.org/Multiarch}.

+@item --enable-vtable-verify
+Specify whether to enable or disable the vtable verification feature.
+Enabling this feature causes libstdc++ to be built with its virtual calls
+in verifiable mode.  This means that, when linked with libvtv, every
+virtual call in libstdc++ will verify the vtable pointer through 
which the
+call will be made before actually making the call.  If not linked 
with libvtv,
+the verifier will call stub functions (in libstdc++ itself) and do 
nothing.

+If vtable verification is disabled, then libstdc++ is not built with its
+virutal calls in verifiable mode at all.


s/virutal/virtual/

Could you clarify in the docs whether --enable-vtable-verify is the 
default behaviour or not?


Why would I need --disable-vtv, if I can use --disable-vtable-verify?


OK with those changes.


Diego.

[AArch64, ARM] Introduce "mrs" type attribute.

2013-09-06 Thread James Greenhalgh


Hi,

This patch adds an "mrs" type to be used to categorize instructions
which read or write from a special/system/co-processor register.

Then we add this type to all the pipeline descriptions. This probably
ends up as a miscategorization in most cases as we put "mrs" in the
same category as "multiple" in the pipelines. This will give the most
consistant behaviour with what came before.

Regression tested on aarch64-none-elf and arm-none-eabi with no
regressions.

OK?

Thanks,
James

---
gcc/

2013-09-06  James Greenhalgh  

* config/arm/types.md (type): Add "mrs" type.
* config/aarch64/aarch64.md
(aarch64_load_tp_hard): Make type "mrs".
* config/arm/arm.md
(load_tp_hard): Make type "mrs".
* config/arm/cortex-a15.md: Update with new attributes.
* config/arm/cortex-a5.md: Update with new attributes.
* config/arm/cortex-a53.md: Update with new attributes.
* config/arm/cortex-a7.md: Update with new attributes.
* config/arm/cortex-a8.md: Update with new attributes.
* config/arm/cortex-a9.md: Update with new attributes.
* config/arm/cortex-m4.md: Update with new attributes.
* config/arm/cortex-r4.md: Update with new attributes.
* config/arm/fa526.md: Update with new attributes.
* config/arm/fa606te.md: Update with new attributes.
* config/arm/fa626te.md: Update with new attributes.
* config/arm/fa726te.md: Update with new attributes.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 96705862de6b53322828fd60df15207af4b2ed61..5aa127bcb47912f1986007d4491b86e92c23 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4134,7 +4134,7 @@ (define_insn "aarch64_load_tp_hard"
   ""
   "mrs\\t%0, tpidr_el0"
   [(set_attr "v8type" "mrs")
-   (set_attr "type" "mov_reg")
+   (set_attr "type" "mrs")
(set_attr "mode" "DI")]
 )
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index fd0b1cbdccd23ad4d18be50417c7532b29840b91..8a482b570ec039aad888d7d7d902b48f7e453abc 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -12453,7 +12453,7 @@ (define_insn "load_tp_hard"
   "TARGET_HARD_TP"
   "mrc%?\\tp15, 0, %0, c13, c0, 3\\t@ load_tp_hard"
   [(set_attr "predicable" "yes")
-   (set_attr "type" "mov_reg")]
+   (set_attr "type" "mrs")]
 )
 
 ;; Doesn't clobber R1-R3.  Must use r0 for the first operand.
diff --git a/gcc/config/arm/cortex-a15.md b/gcc/config/arm/cortex-a15.md
index 6b1559260246a11e6d74f7f467dbeae761d934ea..ccad62076089b5e095f472fdbf298ba7226ae4ec 100644
--- a/gcc/config/arm/cortex-a15.md
+++ b/gcc/config/arm/cortex-a15.md
@@ -68,7 +68,7 @@ (define_insn_reservation "cortex_a15_alu
 shift_imm,shift_reg,\
 mov_imm,mov_reg,\
 mvn_imm,mvn_reg,\
-multiple,no_insn"))
+mrs,multiple,no_insn"))
   "ca15_issue1,(ca15_sx1,ca15_sx1_alu)|(ca15_sx2,ca15_sx2_alu)")
 
 ;; ALU ops with immediate shift
diff --git a/gcc/config/arm/cortex-a5.md b/gcc/config/arm/cortex-a5.md
index fa3e9d59c91028214ca7aa1be2c6668b4af5e6d3..22e0a08f38e7620cef745d28c5373e2daf957f7d 100644
--- a/gcc/config/arm/cortex-a5.md
+++ b/gcc/config/arm/cortex-a5.md
@@ -64,7 +64,7 @@ (define_insn_reservation "cortex_a5_alu"
 adr,bfm,rev,\
 shift_imm,shift_reg,\
 mov_imm,mov_reg,mvn_imm,mvn_reg,\
-multiple,no_insn"))
+mrs,multiple,no_insn"))
   "cortex_a5_ex1")
 
 (define_insn_reservation "cortex_a5_alu_shift" 2
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index 33b5ca30150fc57e7a3c4886c01b9e8092fc3ffa..48d0d03853f147d2d7cc15c1208304617b9c1ec4 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -73,7 +73,7 @@ (define_insn_reservation "cortex_a53_alu
 adr,bfm,csel,rev,\
 shift_imm,shift_reg,\
 mov_imm,mov_reg,mvn_imm,mvn_reg,\
-multiple,no_insn"))
+mrs,multiple,no_insn"))
   "cortex_a53_slot_any")
 
 (define_insn_reservation "cortex_a53_alu_shift" 2
diff --git a/gcc/config/arm/cortex-a7.md b/gcc/config/arm/cortex-a7.md
index ba9da8046ebd4a886c425238ab28df5ab9d85a8a..a72a88d90af1c5491115ee84af47ec6d4f593535 100644
--- a/gcc/config/arm/cortex-a7.md
+++ b/gcc/config/arm/cortex-a7.md
@@ -110,7 +110,7 @@ (define_insn_reservation "cortex_a7_alu_
 logic_shift_reg,logics_shift_reg,\
 mov_shift,mov_shift_reg,\
 mvn_shift,mvn_shift_reg,\
-multiple,no_insn"))
+mrs,multiple,no_insn"))
   "cortex_a7_ex1")
 
 ;; Forwarding path for unshifted operands.
diff --git a/gcc/config/arm/cortex-a8.md b/gcc/config/arm/cortex-a8.md
index ed0b3513

Re: [AArch64, ARM] Rename the FCPYS type to FMOV

2013-09-06 Thread Richard Earnshaw

On 06/09/13 14:44, James Greenhalgh wrote:
> 
> Hi,
> 
> This patch updates the AArch64 backend such that floating point
> moves are correctly categorized with type "FMOV".
> 
> Then in the ARM backend we rename "FCPYS" to "FMOV" everywhere
> where it is appropriate to do so.
> 
> Regression tested on aarch64-none-elf and arm-none-eabi with no
> regressions.
> 
> OK?
> 
> Thanks,
> James
> ---
> gcc/
> 
> 2013-09-06  James Greenhalgh  
> 
>   * config/arm/types.md (type): Rename fcpys to fmov.
>   * config/arm/vfp.md
>   (*arm_movsi_vfp): Rename type fcpys as fmov.
>   (*thumb2_movsi_vfp): Likewise
>   (*movhf_vfp_neon): Likewise
>   (*movhf_vfp): Likewise
>   (*movsf_vfp): Likewise
>   (*thumb2_movsf_vfp): Likewise
>   (*movsfcc_vfp): Likewise
>   (*thumb2_movsfcc_vfp): Likewise
>   * config/aarch64/aarch64-simd.md
>   (move_lo_quad_): Replace type mov_reg with fmovs.
>   * config/aarch64/aarch64.md
>   (*movsi_aarch64): Replace type mov_reg with fmovs.
>   (*movdi_aarch64): Likewise
>   (*movsf_aarch64): Likewise
>   (*movdf_aarch64): Likewise
>   * config/arm/arm.c
>   (cortexa7_older_only): Rename TYPE_FCPYS to TYPE_FMOV.
>   * config/arm/iwmmxt.md
>   (*iwmmxt_movsi_insn): Rename type fcpys as fmov.
>   * config/arm/arm1020e.md: Update with new attributes.
>   * config/arm/cortex-a15-neon.md: Update with new attributes.
>   * config/arm/cortex-a5.md: Update with new attributes.
>   * config/arm/cortex-a53.md: Update with new attributes.
>   * config/arm/cortex-a7.md: Update with new attributes.
>   * config/arm/cortex-a8-neon.md: Update with new attributes.
>   * config/arm/cortex-a9.md: Update with new attributes.
>   * config/arm/cortex-m4-fpu.md: Update with new attributes.
>   * config/arm/cortex-r4f.md: Update with new attributes.
>   * config/arm/marvell-pj4.md: Update with new attributes.
>   * config/arm/vfp11.md: Update with new attributes.
> 
> 

OK.

R.

Re: [PATCH i386 2/8] [AVX512] Add mask registers.

2013-09-06 Thread Kirill Yukhin

Hello,
On 29 Aug 15:59, Kirill Yukhin wrote:
>  /* Define parameter passing and return registers.  */
> @@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void)
>  
>/* If AVX512F is disabled, squash the registers.  */
>if (! TARGET_AVX512F)
> -for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
> -  fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
> +{
> +  for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
> + fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
> +
> +  for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++)
> + fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
> +}
>  }
This place should be updated as here: 
http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00181.html
I am not reposting as the change is obvious.

--
Thanks, K

Re: [AArch64] Use "multiple" for type, where more than one instruction is used for a move

2013-09-06 Thread Richard Earnshaw

On 06/09/13 14:46, James Greenhalgh wrote:
> 
> Hi,
> 
> We could introduce a whole new type for insns which generate two moves,
> but we have already introduced a "multiple" classification for
> types in the ARM backend, so use that in place of "mov_reg" where
> appropriate.
> 
> Regression tested on aarch64-none-elf and arm-none-eabi with no
> regressions.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> gcc/
> 
> 2013-09-06  James Greenhalgh  
> 
>   * config/aarch64/aarch64.md
>   (*movti_aarch64): Use "multiple" for type where v8type is "move2".
>   (*movtf_aarch64): Likewise.
>   * config/arm/arm.md
>   (thumb1_movdi_insn): Use "multiple" for type where more than one
>   instruction is used for a move.
>   (*arm32_movhf): Likewise.
>   (*thumb_movdf_insn): Likewise.
> 
> 

OK.

R.

[AArch64] Use neon__2 where appropriate as "type".

2013-09-06 Thread James Greenhalgh


Hi,

The final (!!!) patch in the series making types equivalent between
AArch64 and ARM backends deals with insns in the AArch64 backend
which generate ldp and stp. We could invent a new type for these and
add that type to all the pipeline descriptions, but I think the types
neon_ldm_2 and neon_stm_2 describe them adequately.

Tested on aarch64-none-elf with no regressions.

OK?

Thanks,
James

---
gcc/

2013-09-06  James Greenhalgh  

* config/aarch64/aarch64.md
(*movtf_aarch64): Use neon_dm_2 as type where v8type
is fpsimd_2.
(load_pair): Likewise.
(store_pair): Likewise.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5aa127bcb47912f1986007d4491b86e92c23..f37f98f9994bb773785d8573a7efd1e625b5e23a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1025,7 +1025,7 @@ (define_insn "*movtf_aarch64"
stp\\t%1, %H1, %0"
   [(set_attr "v8type" "logic,move2,fmovi2f,fmovf2i,fconst,fconst,fpsimd_load,fpsimd_store,fpsimd_load2,fpsimd_store2")
(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,fconstd,fconstd,\
- f_loadd,f_stored,f_loadd,f_stored")
+ f_loadd,f_stored,neon_ldm_2,neon_stm_2")
(set_attr "mode" "DF,DF,DF,DF,DF,DF,TF,TF,DF,DF")
(set_attr "length" "4,8,8,8,4,4,4,4,4,4")
(set_attr "fp" "*,*,yes,yes,*,yes,yes,yes,*,*")
@@ -1090,7 +1090,7 @@ (define_insn "load_pair"
 			   GET_MODE_SIZE (mode)))"
   "ldp\\t%0, %2, %1"
   [(set_attr "v8type" "fpsimd_load2")
-   (set_attr "type" "f_load")
+   (set_attr "type" "neon_ldm_2")
(set_attr "mode" "")]
 )
 
@@ -1106,8 +1106,8 @@ (define_insn "store_pair"
 			   XEXP (operands[0], 0),
 			   GET_MODE_SIZE (mode)))"
   "stp\\t%1, %3, %0"
-  [(set_attr "v8type" "fpsimd_load2")
-   (set_attr "type" "f_load")
+  [(set_attr "v8type" "fpsimd_store2")
+   (set_attr "type" "neon_stm_2")
(set_attr "mode" "")]
 )

Re: [AArch64, ARM] Introduce "mrs" type attribute.

2013-09-06 Thread Richard Earnshaw

On 06/09/13 14:48, James Greenhalgh wrote:
> 
> Hi,
> 
> This patch adds an "mrs" type to be used to categorize instructions
> which read or write from a special/system/co-processor register.
> 
> Then we add this type to all the pipeline descriptions. This probably
> ends up as a miscategorization in most cases as we put "mrs" in the
> same category as "multiple" in the pipelines. This will give the most
> consistant behaviour with what came before.
> 
> Regression tested on aarch64-none-elf and arm-none-eabi with no
> regressions.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> gcc/
> 
> 2013-09-06  James Greenhalgh  
> 
>   * config/arm/types.md (type): Add "mrs" type.
>   * config/aarch64/aarch64.md
>   (aarch64_load_tp_hard): Make type "mrs".
>   * config/arm/arm.md
>   (load_tp_hard): Make type "mrs".
>   * config/arm/cortex-a15.md: Update with new attributes.
>   * config/arm/cortex-a5.md: Update with new attributes.
>   * config/arm/cortex-a53.md: Update with new attributes.
>   * config/arm/cortex-a7.md: Update with new attributes.
>   * config/arm/cortex-a8.md: Update with new attributes.
>   * config/arm/cortex-a9.md: Update with new attributes.
>   * config/arm/cortex-m4.md: Update with new attributes.
>   * config/arm/cortex-r4.md: Update with new attributes.
>   * config/arm/fa526.md: Update with new attributes.
>   * config/arm/fa606te.md: Update with new attributes.
>   * config/arm/fa626te.md: Update with new attributes.
>   * config/arm/fa726te.md: Update with new attributes.
> 


OK.

R.

Re: [AArch64] Use neon__2 where appropriate as "type".

2013-09-06 Thread Richard Earnshaw

On 06/09/13 14:50, James Greenhalgh wrote:
> 
> Hi,
> 
> The final (!!!) patch in the series making types equivalent between
> AArch64 and ARM backends deals with insns in the AArch64 backend
> which generate ldp and stp. We could invent a new type for these and
> add that type to all the pipeline descriptions, but I think the types
> neon_ldm_2 and neon_stm_2 describe them adequately.
> 

It's close enough.  At least for now.

> Tested on aarch64-none-elf with no regressions.
> 
> OK?

OK.

> 
> Thanks,
> James
> 
> ---
> gcc/
> 
> 2013-09-06  James Greenhalgh  
> 
>   * config/aarch64/aarch64.md
>   (*movtf_aarch64): Use neon_dm_2 as type where v8type
>   is fpsimd_2.
>   (load_pair): Likewise.
>   (store_pair): Likewise.
> 
> 

R.

Re: RFC - Next refactoring steps

2013-09-06 Thread Andrew MacLeod


On 09/06/2013 09:09 AM, Michael Matz wrote:

Hi,

On Thu, 5 Sep 2013, Andrew MacLeod wrote:


1 -  I think we ought to split out the data structures from gimple.h into
gimple-core.h, like we did with tree.h

Why?
For the seam reason we split tree.h.  The new gimple mechanism needs to 
coexist with trees until the conversion is complete.  There will be a 
period of time when some files have been converted over to the new 
gimple wrappers, and they will export functions which unconverted files 
still need to call.  The functional prototypes will be using things like 
GimpleType, so those files will need to see the basic class description 
to compile and call the routine. (just like how the gimple function need 
to see the baic layout of a tree to interact with trees)  They don't 
need to see all the implementation for the methods in gimple-type.h, but 
they need to see the basic class description which also contains the few 
routines required to cast back and forth to tree.  This allows both to 
coexist while we transition.


Doing it now before we branch into stage 2 makes life much easier in the 
branch I will be working on during stage2, as well as test the 
functionality of it with the rest of the refactoring we are doing before 
the branch.




gimple.h.  That won't  really affect my work.  I think it probably ought to be
done for clarity eventually.gimple.h would then simply become a collector
of "gimple-blah.h" files which are required for a complete gimple system.

Doesn't make sense to me.  If you split it into multiple files, just to
then create a wrapper called gimple.h including all of them again for the
consumers you can just as well have all of it in one file.
Well, gimple.h right now is 5400 lines, and that just implements gimple 
statements.  when we add the methods for types and decls and expression 
etc etc etc, its going to be a very very very large file.   The split is 
to make it managable.  Im fine with leaving gimple.h to be the statement 
implementation, and then the top of gimple.h can just include all the 
other ones... it just means gimple-stmt is gimple...   Its just a 
consistency naming thing, and i don't feel strongly about it.

2 - all the uses of TREE_NULL to refer to an empty/nonexistent object...  it
is no longer in tree-core.h, which I think is correct. what should we start
using instead in converted files?  options are:
   a)  0
   b) NULL
   c) something we define as 0, like GIMPLE_NULL

I think I'd prefer 0, but it creates a problem with varargs functions, so
you'd always need "(gimple)0" for those, which you'd probably hide behind
a macro, at which point you'd have two forms for the nil gimple ("0" and
that macro), for consistency you'd want to use the macro in place of "0"
also in the non-varargs places, et voila, you're back to "NULL" or
"NULL_GIMPLE" :-/
I prefer 0 too I think, but I had forgotten about vararg issues... so 
are we thinking maybe just plain old NULL?  I guess thats pretty standard..



3) remove tree.h from other .h files
   Now that tree.h is split, there are a few .h files which directly include
tree.h themselves. It would be nice if we could remove the implementation
requirement of tree.h to ease moving to gimple.h. The 4 files are :
  ipa-utils.h   lto-streamer.h  streamer-hooks.h  tree-streamer.h
It should be possible to not directly include tree.h itself in these files.
Worst case, the header file is included  after tree.h from the .C files.. that
seems to be the way most of the other include files avoid including tree.h
directly.

That can be done right now I think.

For the rest of the topic, tree vs gimple: I don't see much value in that.
Why create a gimple_expr type that basically is just tree?  You can as
well use tree then.

Because once its all done... we can do many other things because the 
uses are detached from the implementation.  In a gimple expression we 
can see concisely what elements of tree are actually used, and how. We 
can then remove tree from the implementation of gimple_expr if it makes 
sense. We can change the underlying object from a tree to something 
which more concisely represents what we need in the middle/back end.  We 
can easily change the implementation of gimpe-expr to use an index into 
a table of expressions which makes streaming much simpler.. I am also 
doing this for types and decls and such. We get typesafe checking at 
compile time, And the ultimate goal is to remove dependencies from the 
front end and the gimple... gimple will be detachable from trees.  All 
the fun stuff I talked about in the original document motivating this.

Re: [PATCH, vtv update] Add documentation for --enable-vtable-verify configure option...

2013-09-06 Thread Caroline Tice

On Fri, Sep 6, 2013 at 6:47 AM, Diego Novillo  wrote:
> On 2013-08-28 17:15 , Caroline Tice wrote:
>
>>  # Least ordering for dependencies mean linking w/o libstdc++ for as
>>  # long as the development of libvtv does not absolutely require it.
>> Index: gcc/doc/install.texi
>> ===
>> --- gcc/doc/install.texi(revision 202060)
>> +++ gcc/doc/install.texi(working copy)
>> @@ -1032,6 +1032,16 @@ and for cross builds configured with @op
>>  More documentation about multiarch can be found at
>>  @uref{http://wiki.debian.org/Multiarch}.
>>
>> +@item --enable-vtable-verify
>> +Specify whether to enable or disable the vtable verification feature.
>> +Enabling this feature causes libstdc++ to be built with its virtual calls
>> +in verifiable mode.  This means that, when linked with libvtv, every
>> +virtual call in libstdc++ will verify the vtable pointer through which
>> the
>> +call will be made before actually making the call.  If not linked with
>> libvtv,
>> +the verifier will call stub functions (in libstdc++ itself) and do
>> nothing.
>> +If vtable verification is disabled, then libstdc++ is not built with its
>> +virutal calls in verifiable mode at all.
>
>
> s/virutal/virtual/
>
> Could you clarify in the docs whether --enable-vtable-verify is the default
> behaviour or not?
>

It will not be the default, but yes I will clarify.

> Why would I need --disable-vtv, if I can use --disable-vtable-verify?
>

--disable-vtable-verify prevents  libstdc++ from being build with
verification turned on, but the libvtv library will still be built.
--disable-libvtv prevents the libvtv library from being built.
Obviously I need to add a comment like this to the documentation, so I
will.

>
> OK with those changes.
>
>
> Diego.

Fix glitch in -fstack-usage output

2013-09-06 Thread Eric Botcazou

The output of -fstack-usage doesn't use fully qualified names of functions 
(because they can be quite long in Ada) so the code attempts to strip the 
scope prefix.  But this isn't robust enough in presence of suffixes created by 
the compiler, for example by the cloning machinery.

Tested on x86_64-suse-linux, applied on the mainline as obvious.


2013-09-06  Eric Botcazou  

* toplev.c (output_stack_usage): Be prepared for suffixes created by
the compiler in the function names.


2013-09-06  Eric Botcazou  

* gnat.dg/stack_usage2.adb: New test.


-- 
Eric BotcazouIndex: toplev.c
===
--- toplev.c	(revision 202160)
+++ toplev.c	(working copy)
@@ -1017,22 +1017,35 @@ output_stack_usage (void)
 {
   expanded_location loc
 	= expand_location (DECL_SOURCE_LOCATION (current_function_decl));
-  const char *raw_id, *id;
-
-  /* Strip the scope prefix if any.  */
-  raw_id = lang_hooks.decl_printable_name (current_function_decl, 2);
-  id = strrchr (raw_id, '.');
-  if (id)
-	id++;
+  /* We don't want to print the full qualified name because it can be long,
+	 so we strip the scope prefix, but we may need to deal with the suffix
+	 created by the compiler.  */
+  const char *suffix
+	= strchr (IDENTIFIER_POINTER (DECL_NAME (current_function_decl)), '.');
+  const char *name
+	= lang_hooks.decl_printable_name (current_function_decl, 2);
+  if (suffix)
+	{
+	  const char *dot = strchr (name, '.');
+	  while (dot && strcasecmp (dot, suffix) != 0)
+	{
+	  name = dot + 1;
+	  dot = strchr (name, '.');
+	}
+	}
   else
-	id = raw_id;
+	{
+	  const char *dot = strrchr (name, '.');
+	  if (dot)
+	name = dot + 1;
+	}
 
   fprintf (stack_usage_file,
 	   "%s:%d:%d:%s\t"HOST_WIDE_INT_PRINT_DEC"\t%s\n",
 	   lbasename (loc.file),
 	   loc.line,
 	   loc.column,
-	   id,
+	   name,
 	   stack_usage,
 	   stack_usage_kind_str[stack_usage_kind]);
 }-- { dg-do compile }
-- { dg-options "-O2 -fstack-usage" }

with System;

procedure Stack_Usage2 is

   Sink : System.Address;
   pragma Import (Ada, Sink);

   procedure Transmit_Data (Branch : Integer) is
  pragma No_Inline (Transmit_Data);
  X : Integer;
   begin
  case Branch is
 when 1 => Sink := X'Address;
 when others => null;
  end case;
   end;

begin
   Transmit_Data (Branch => 1);
end;

-- { dg-final { scan-stack-usage-not ":Constprop" } }
-- { dg-final { cleanup-stack-usage } }

Re: RFC - Next refactoring steps

2013-09-06 Thread Andrew MacLeod


On 09/06/2013 04:19 AM, Richard Biener wrote:

On Fri, Sep 6, 2013 at 10:14 AM, Mike Stump  wrote:

On Sep 5, 2013, at 11:54 PM, Richard Biener  wrote:

Most of the GCC headerfiles do not include all their required headers but
rely on .c files doing that (in the appropriate order).  I somehow like
that though I cannot explain why ;)

Very old school.  I can explain why I don't like it, but I'll resist.  The 
universal standard is for each header to include all it needs and for the 
ordering of includes not to matter any.  Deviations from this are the exception 
and should virtually never should be the case.


Also grepping for #includes I see that this doesn't seem to be true anymore.

Yeah, everyone else hates the old school style with a passion, it was never a 
good idea.  :-)  When they come across it, people have a tendency to fix the 
headers as broken and over time, all the dependencies eventually get added.

Well, I still think you cannot randomly change include file order in
GCC.  At least some

#ifdef ...

hackery in some headers will suddenly break (that is, change outcome)
if you include for example
tm.h before or after it.

these would be really good to identify and fix, if possible. (surely 
they can be fixed.. :-)if they cant be fixed for whatever reason, we 
ought to protect them with some mechanism.. ie assert that tm.h either 
has or has not been included before hand... whichever way is required.   
At least then we get an error if it changes.


Do you know of any of the top of your head?

Andrew

Re: RFC - Next refactoring steps

2013-09-06 Thread Andrew MacLeod


On 09/05/2013 08:26 PM, Joseph S. Myers wrote:

On Thu, 5 Sep 2013, Andrew MacLeod wrote:


Or are you suggesting that coretypes.h is a file we can assume is available?

Every .c file should start with includes of config.h, system.h and
coretypes.h, in that order, so all other headers can assume they have been
included.



OK, I can work with those assumptions :-).

Andrew

[AArch64] Fix parameters to vcvtx_high

2013-09-06 Thread James Greenhalgh


Hi,

vcvtx_high_f32_f64 should have two parameters, a float32x2 which
provides the lower half of the target vector, and a float64x2
which will be converted to the higher half of the target vector.

Fix thusly.

Tested with aarch64.exp on aarch64-none-elf.

OK?

Thanks,
James

---
gcc/

2013-09-06  James Greenhalgh  

* config/aarch64/arm_neon.h
(vcvtx_high_f32_f64): Fix parameters.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 5864f2c..47b45f4 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -5756,12 +5756,12 @@ vcvtx_f32_f64 (float64x2_t a)
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
-vcvtx_high_f32_f64 (float64x2_t a)
+vcvtx_high_f32_f64 (float32x2_t a, float64x2_t b)
 {
   float32x4_t result;
   __asm__ ("fcvtxn2 %0.4s,%1.2d"
: "=w"(result)
-   : "w"(a)
+   : "w" (b), "0"(a)
: /* No clobbers */);
   return result;
 }

Fix ipa-devirt-11.C on AIX part 3

2013-09-06 Thread Jan Hubicka

Hi,
this patch makes inlining of functions called once to work even if they are 
called
by alias.

Bootstrapped/regtested x86_64-linux, comitted.

PR middle-end/58094
* ipa-inline.c (has_caller_p): New function.
(want_inline_function_to_all_callers_p): Use it.
(sum_callers, inline_to_all_callers): Break out from ...
(ipa_inline): ... here.
Index: ipa-inline.c
===
--- ipa-inline.c(revision 202334)
+++ ipa-inline.c(working copy)
@@ -750,6 +750,15 @@ check_caller_edge (struct cgraph_node *n
   && node->callers != edge);
 }
 
+/* If NODE has a caller, return true.  */
+
+static bool
+has_caller_p (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED)
+{
+  if (node->callers)
+return true;
+  return false;
+}
 
 /* Decide if inlining NODE would reduce unit size by eliminating
the offline copy of function.  
@@ -763,7 +772,7 @@ want_inline_function_to_all_callers_p (s
bool has_hot_call = false;
 
/* Does it have callers?  */
-   if (!node->callers)
+   if (!cgraph_for_node_and_aliases (node, has_caller_p, NULL, true))
  return false;
/* Already inlined?  */
if (function->global.inlined_to)
@@ -1892,6 +1901,60 @@ flatten_function (struct cgraph_node *no
 inline_update_overall_summary (node);
 }
 
+/* Count number of callers of NODE and store it into DATA (that
+   points to int.  Worker for cgraph_for_node_and_aliases.  */
+
+static bool
+sum_callers (struct cgraph_node *node, void *data)
+{
+  struct cgraph_edge *e;
+  int *num_calls = (int *)data;
+
+  for (e = node->callers; e; e = e->next_caller)
+(*num_calls)++;
+  return false;
+}
+
+/* Inline NODE to all callers.  Worker for cgraph_for_node_and_aliases.
+   DATA points to number of calls originally found so we avoid infinite
+   recursion.  */
+
+static bool
+inline_to_all_callers (struct cgraph_node *node, void *data)
+{
+  int *num_calls = (int *)data;
+  while (node->callers && !node->global.inlined_to)
+{
+  struct cgraph_node *caller = node->callers->caller;
+
+  if (dump_file)
+   {
+ fprintf (dump_file,
+  "\nInlining %s size %i.\n",
+  cgraph_node_name (node),
+  inline_summary (node)->size);
+ fprintf (dump_file,
+  " Called once from %s %i insns.\n",
+  cgraph_node_name (node->callers->caller),
+  inline_summary (node->callers->caller)->size);
+   }
+
+  inline_call (node->callers, true, NULL, NULL, true);
+  if (dump_file)
+   fprintf (dump_file,
+" Inlined into %s which now has %i size\n",
+cgraph_node_name (caller),
+inline_summary (caller)->size);
+  if (!(*num_calls)--)
+   {
+ if (dump_file)
+   fprintf (dump_file, "New calls found; giving up.\n");
+ break;
+   }
+}
+  return false;
+}
+
 /* Decide on the inlining.  We do so in the topological order to avoid
expenses on updating data structures.  */
 
@@ -2003,39 +2066,11 @@ ipa_inline (void)
  && want_inline_function_to_all_callers_p (node, cold))
{
  int num_calls = 0;
- struct cgraph_edge *e;
- for (e = node->callers; e; e = e->next_caller)
-   num_calls++;
- while (node->callers && !node->global.inlined_to)
-   {
- struct cgraph_node *caller = node->callers->caller;
-
- if (dump_file)
-   {
- fprintf (dump_file,
-  "\nInlining %s size %i.\n",
-  cgraph_node_name (node),
-  inline_summary (node)->size);
- fprintf (dump_file,
-  " Called once from %s %i insns.\n",
-  cgraph_node_name (node->callers->caller),
-  inline_summary (node->callers->caller)->size);
-   }
-
- inline_call (node->callers, true, NULL, NULL, true);
- remove_functions = true;
- if (dump_file)
-   fprintf (dump_file,
-" Inlined into %s which now has %i size\n",
-cgraph_node_name (caller),
-inline_summary (caller)->size);
- if (!num_calls--)
-   {
- if (dump_file)
-   fprintf (dump_file, "New calls found; giving up.\n");
- break;
-   }
-   }
+ cgraph_for_node_and_aliases (node, sum_callers,
+  &num_calls, true);
+ cgraph_for_node_and_aliases (node, inline_to_all_callers,
+  &num_calls, true);
+

Re: [c++-concepts] Class template constraints

2013-09-06 Thread Andrew Sutton

Updated as per comments.

I moved the resolve_template_scope function out to
finish_template_type. I couldn't figure out how to get the parsed
template parameter from the looked-up template in
lookup_class_template. That information may not be available outside
the parse state.

Andrew
Andrew Sutton

On Wed, Sep 4, 2013 at 3:49 PM, Jason Merrill  wrote:
> On 09/04/2013 01:33 PM, Andrew Sutton wrote:
>>
>> Ah. The goal is to check after we've deduced/coerced template
>> arguments into a valid substitution. With functions, that's in
>> fn_type_unification (hopefully called from instantiate_template)
>
>
> Actually fn_type_unification calls instantiate_template, but yep, we're on
> the same page.
>
> Jason
>

templates.patch
Description: Binary data

[gomp4] Further accel fixes

2013-09-06 Thread Jakub Jelinek

Hi!

This fixes mainly VLA handling in target{, data, update} constructs,
but also deals with field alignments in the structure and field order.
Committed to gomp-4_0-branch.

2013-09-06  Jakub Jelinek  

* omp-low.c (scan_sharing_clauses): Handle VLAs in
OMP_CLAUSE_{MAP,TO,FROM}.  Set DECL_ALIGN (field) before
calling insert_field_into_struct.
(scan_omp_target): Reverse TYPE_FIELDS, verify that
all field alignments are the same.
(lower_omp_target): Use maybe_lookup_field instead of
lookup_sfield to check if field is present.  Handle VLAs.
* tree-pretty-print.c (dump_omp_clause): Only check
OMP_CLAUSE_MAP_KIND on OMP_CLAUSE_MAP clauses.
* gimplify.c (enum gimplify_omp_var_data): Add GOVD_MAP_TO_ONLY.
(omp_firstprivatize_variable, omp_add_variable,
gimplify_adjust_omp_clauses_1, gimplify_adjust_omp_clauses): Handle
VLAs in OMP_CLAUSE_{MAP,TO,FROM}.
libgomp/
* testsuite/libgomp.c/target-2.c: New test.
* testsuite/libgomp.c++/target-3.C: New test.

--- gcc/omp-low.c.jj2013-09-05 17:11:14.0 +0200
+++ gcc/omp-low.c   2013-09-06 16:15:16.367638718 +0200
@@ -1574,10 +1574,24 @@ scan_sharing_clauses (tree clauses, omp_
}
  if (DECL_P (decl))
{
- install_var_field (decl, true, 3, ctx);
- if (gimple_omp_target_kind (ctx->stmt)
- == GF_OMP_TARGET_KIND_REGION)
-   install_var_local (decl, ctx);
+ if (DECL_SIZE (decl)
+ && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+   {
+ tree decl2 = DECL_VALUE_EXPR (decl);
+ gcc_assert (TREE_CODE (decl2) == INDIRECT_REF);
+ decl2 = TREE_OPERAND (decl2, 0);
+ gcc_assert (DECL_P (decl2));
+ install_var_field (decl2, true, 3, ctx);
+ install_var_local (decl2, ctx);
+ install_var_local (decl, ctx);
+   }
+ else
+   {
+ install_var_field (decl, true, 3, ctx);
+ if (gimple_omp_target_kind (ctx->stmt)
+ == GF_OMP_TARGET_KIND_REGION)
+   install_var_local (decl, ctx);
+   }
}
  else
{
@@ -1600,6 +1614,7 @@ scan_sharing_clauses (tree clauses, omp_
  tree field
= build_decl (OMP_CLAUSE_LOCATION (c),
  FIELD_DECL, NULL_TREE, ptr_type_node);
+ DECL_ALIGN (field) = TYPE_ALIGN (ptr_type_node);
  insert_field_into_struct (ctx->record_type, field);
  splay_tree_insert (ctx->field_map, (splay_tree_key) decl,
 (splay_tree_value) field);
@@ -1684,6 +1699,16 @@ scan_sharing_clauses (tree clauses, omp_
  TREE_TYPE (new_decl)
= remap_type (TREE_TYPE (decl), &ctx->cb);
}
+ else if (DECL_SIZE (decl)
+  && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
+   {
+ tree decl2 = DECL_VALUE_EXPR (decl);
+ gcc_assert (TREE_CODE (decl2) == INDIRECT_REF);
+ decl2 = TREE_OPERAND (decl2, 0);
+ gcc_assert (DECL_P (decl2));
+ fixup_remapped_decl (decl2, ctx, false);
+ fixup_remapped_decl (decl, ctx, true);
+   }
  else
fixup_remapped_decl (decl, ctx, false);
}
@@ -2126,6 +2151,16 @@ scan_omp_target (gimple stmt, omp_contex
 ctx->record_type = ctx->receiver_decl = NULL;
   else
 {
+  TYPE_FIELDS (ctx->record_type)
+   = nreverse (TYPE_FIELDS (ctx->record_type));
+#ifdef ENABLE_CHECKING
+  tree field;
+  unsigned int align = DECL_ALIGN (TYPE_FIELDS (ctx->record_type));
+  for (field = TYPE_FIELDS (ctx->record_type);
+  field;
+  field = DECL_CHAIN (field))
+   gcc_assert (DECL_ALIGN (field) == align);
+#endif
   layout_type (ctx->record_type);
   if (kind == GF_OMP_TARGET_KIND_REGION)
fixup_child_record_type (ctx);
@@ -9201,7 +9236,18 @@ lower_omp_target (gimple_stmt_iterator *
  map_cnt++;
continue;
  }
-   if (!lookup_sfield (var, ctx))
+
+   if (DECL_SIZE (var)
+   && TREE_CODE (DECL_SIZE (var)) != INTEGER_CST)
+ {
+   tree var2 = DECL_VALUE_EXPR (var);
+   gcc_assert (TREE_CODE (var2) == INDIRECT_REF);
+   var2 = TREE_OPERAND (var2, 0);
+   gcc_assert (DECL_P (var2));
+   var = var2;
+ }
+
+   if (!maybe_lookup_field (var, ctx))
  continue;
 
if (kind == GF_OMP_TARGET_KIND_REGION)
@@ -9293,8 +9339,20 @@ lower_omp_target (gimple_stmt_iterator *
nc = NULL_TREE;
  }
  }
-   else if (!

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-09-06 Thread H.J. Lu

On Tue, Sep 3, 2013 at 12:01 PM, Eric Botcazou  wrote:
>> Changes were checked into trunk:
>> http://gcc.gnu.org/ml/gcc-cvs/2013-07/msg00179.html
>
> The patch miscompiles the MPFR library on x86 Pentium Pro.  Reduced testcase
> attached, compile for x86 with -mtune=pentiumpro.
>

The change in question has

@@ -23169,7 +23150,7 @@ ix86_expand_movmem (rtx dst, rtx src, rtx
count_exp, rtx align_exp,

   if (count_exp != const0_rtx && epilogue_size_needed > 1)
 expand_movmem_epilogue (dst, src, destreg, srcreg, count_exp,
-   epilogue_size_needed);
+   size_needed);
   if (jump_around_label)
 emit_label (jump_around_label);
   return true;

Michael, why did you change epilogue_size_needed to size_needed
here?  It looks wrong to me.

-- 
H.J.

Re: [RFC] Fix for PR58201

2013-09-06 Thread Paolo Carlini


Hi Honza,

On 09/06/2013 01:05 AM, Jan Hubicka wrote:

Hi,
this is the patch I commited after testing on x86_64-linux.

Honza

Index: ChangeLog
===
*** ChangeLog   (revision 202271)
--- ChangeLog   (working copy)
***
*** 1,3 
--- 1,9 
+ 2013-09-04  Jan Hubicka  
+
+   PR middle-end/58201
+   * cgraphunit.c (analyze_functions): Clear AUX fields
+   after processing; initialize assembler name has.
+

I checked and double checked and with this commit a C++ test regressed:

FAIL: g++.dg/template/cond2.C -std=gnu++98 (test for warnings, line 9)
FAIL: g++.dg/template/cond2.C -std=gnu++11 (test for warnings, line 9)

In practice we emit a message off by one line, line 10 instead of the 
correct line 9. Is it possible that you are clearing too much, so to speak?


After the recent breakages, up to date testresults are arriving only now 
on gcc-testresults confirming my finding, for example Andreas Schwab 
already posted a couple.


Thanks!
Paolo.

Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-09-06 Thread Wei Mi

SCHED_GROUP works after I add chain_to_prev_insn after
add_branch_dependences, in order to chain control dependences to prev
insn for sched group. Here is the new patch. Testing is going on.

Thanks,
Wei Mi.

2013-09-06  Wei Mi  

* config/i386/i386.c (ix86_macro_fusion_p): New function.
(ix86_macro_fusion_pair_p): Ditto.
* config/i386/x86-tune.def (DEF_TUNE): Add m_COREI7 for
X86_TUNE_FUSE_CMP_AND_BRANCH.
* sched-deps.c (group_insns_for_macro_fusion): New function.
(sched_analyze_insn): Call group_insns_for_macro_fusion.
(chain_to_prev_insn): Change it from static to extern.
(chain_to_prev_insn_p): Ditto.
* doc/tm.texi: Generated.
* doc/tm.texi.in: Ditto.
* sched-int.h: New declarations.
* sched-rgn.c (add_branch_dependences): Chain control
dependences to prev insn for sched group.
* target.def: Add macro_fusion_p and macro_fusion_pair_p.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 201963)
+++ config/i386/i386.c  (working copy)
@@ -24850,6 +24850,99 @@ ia32_multipass_dfa_lookahead (void)
 }
 }

+/* Return true if target platform supports macro-fusion.  */
+
+static bool
+ix86_macro_fusion_p ()
+{
+  if (TARGET_FUSE_CMP_AND_BRANCH)
+return true;
+  else
+return false;
+}
+
+/* Check whether current microarchitecture support macro fusion
+   for insn pair "CONDGEN + CONDJMP". Refer to
+   "Intel Architectures Optimization Reference Manual". */
+
+static bool
+ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp)
+{
+  rtx src;
+  if (!strcmp (ix86_tune_string, "corei7"))
+{
+  /* For Nehalem.  */
+  rtx single_set = single_set (condgen);
+  /* Nehalem doesn't support macro-fusion for add/sub+jmp.  */
+  if (single_set == NULL_RTX)
+return false;
+
+  src = SET_SRC (single_set);
+  if (GET_CODE (src) != COMPARE)
+   return false;
+
+  /* Nehalem doesn't support macro-fusion for cmp/test MEM-IMM
+insn pattern.  */
+  if ((MEM_P (XEXP (src, 0))
+  && CONST_INT_P (XEXP (src, 1)))
+ || (MEM_P (XEXP (src, 1))
+ && CONST_INT_P (XEXP (src, 0
+   return false;
+
+  /* Nehalem doesn't support macro-fusion for add/sub/dec/inc + jmp.  */
+  if (get_attr_type (condgen) != TYPE_TEST
+ && get_attr_type (condgen) != TYPE_ICMP)
+   return false;
+  return true;
+}
+  else if (!strcmp (ix86_tune_string, "corei7-avx"))
+{
+  /* For Sandybridge.  */
+  enum rtx_code ccode;
+  rtx compare_set = NULL_RTX, test_if, cond;
+  rtx single_set = single_set (condgen);
+  if (single_set != NULL_RTX)
+compare_set = single_set;
+  else
+   {
+ int i;
+ rtx pat = PATTERN (condgen);
+ for (i = 0; i < XVECLEN (pat, 0); i++)
+   if (GET_CODE (XVECEXP (pat, 0, i)) == SET
+   && GET_CODE (SET_SRC (XVECEXP (pat, 0, i))) == COMPARE)
+ compare_set = XVECEXP (pat, 0, i);
+   }
+
+  if (compare_set == NULL_RTX)
+   return false;
+  src = SET_SRC (compare_set);
+  if (GET_CODE (src) != COMPARE)
+   return false;
+
+  /* Sandybridge doesn't support macro-fusion for cmp/test MEM-IMM
+insn pattern.  */
+  if ((MEM_P (XEXP (src, 0))
+   && CONST_INT_P (XEXP (src, 1)))
+  || (MEM_P (XEXP (src, 1))
+  && CONST_INT_P (XEXP (src, 0
+return false;
+
+  /* Sandybridge doesn't support macro-fusion for inc/dec +
+unsigned comparison jmp.  */
+  test_if = SET_SRC (pc_set (condjmp));
+  cond = XEXP (test_if, 0);
+  ccode = GET_CODE (cond);
+  if (get_attr_type (condgen) == TYPE_INCDEC
+ && (ccode == GEU
+ || ccode == GTU
+ || ccode == LEU
+ || ccode == LTU))
+   return false;
+  return true;
+}
+  return false;
+}
+
 /* Try to reorder ready list to take advantage of Atom pipelined IMUL
execution. It is applied if
(1) IMUL instruction is on the top of list;
@@ -42982,6 +43075,10 @@ ix86_memmodel_check (unsigned HOST_WIDE_
 #undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD
 #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \
   ia32_multipass_dfa_lookahead
+#undef TARGET_SCHED_MACRO_FUSION_P
+#define TARGET_SCHED_MACRO_FUSION_P ix86_macro_fusion_p
+#undef TARGET_SCHED_MACRO_FUSION_PAIR_P
+#define TARGET_SCHED_MACRO_FUSION_PAIR_P ix86_macro_fusion_pair_p

 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL ix86_function_ok_for_sibcall
Index: config/i386/x86-tune.def
===
--- config/i386/x86-tune.def(revision 201963)
+++ config/i386/x86-tune.def(working copy)
@@ -196,7 +196,8 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS,
 /* X86_TUNE_FUSE_CMP_AND_BRANCH: Fuse a compare or test i

Re: RFA: Fix mark_target_live_regs to take COND_EXEC into account

2013-09-06 Thread Steven Bosscher

On Fri, Sep 6, 2013 at 12:20 PM, Joern Rennecke wrote:
> We found that
>
> std::basic_string, std::allocator
> ::copy(char*, unsigned long, unsigned long) const
>
> got miscompiled for ARC because reorg thought that all call-clobbered
> registers were dead after a conditional call.

Hmm, interesting to have a target with both predication and delay slots...

> I can't reproduce the test case with current trunk + ARC port, but I reckon
> this is just due to the fickleness of the register allocator.  Chances are,
> every other version, some other obscure function is miscompiled without
> this patch.
>
> bootstrapped on i686-pc-linux-gnu.

That's not a very good test, i686 doesn't use this code ;-)

> OK to apply?

Yes, the patch is OK.

However, I wouldn't be surprised if using COND_EXECs together with
delay slots has more hidden problems. Browsing through resource.c, the
"if (jump_insn)" block at lines 1095:1119 jumps out as another place
you should probably look into.

Ciao!
Steven

Re: [RS6000] powerpc64 -mcmodel=medium large symbol offsets

2013-09-06 Thread David Edelsohn

On Fri, Sep 6, 2013 at 3:13 AM, Alan Modra  wrote:
> The following testcase taken from the linux kernel is miscompiled on
> powerpc64-linux.
>
> /* -m64 -mcmodel=medium -O -S -fno-section-anchors */
> static int x;
>
> unsigned long
> foo (void)
> {
>   return ((unsigned long) &x) - 0xc000;
> }
>
> generates
> addis 3,2,x+4611686018427387904@toc@ha
> addi 3,3,x+4611686018427387904@toc@l
> blr
>
> losing the top 32 bits of the offset.  Sadly, the assembler and linker
> do not complain, which is a hole in the ABI.  (@ha and _HA relocs as
> per the ABI won't complain about overflow since they might be used in
> a @highesta, @highera sequence loading a 64-bit value.)
>
> This patch stops combine merging large offsets into a symbol addend
> by copying code from reg_or_add_cint_operand to a new predicate,
> add_cint_operand, and using that to restrict the range of offsets.
> Bootstrapped and regression tested powerpc64-linux.  OK to apply?
>
> * config/rs6000/predicates.md (add_cint_operand): New.
> * config/rs6000/rs6000.md (largetoc_high_plus): Restrict offset
> using add_cint_operand.
> (largetoc_high_plus_aix): Likewise.

This patch should include a testcase.

But what user feedback are you expecting if the offset is too large,
such as your example? In my test with the patch, it produces an
unrecognizable insn error, which seems less than friendly.

Thanks, David

Re: [i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.

2013-09-06 Thread Mike Stump

On Sep 6, 2013, at 3:04 AM, Iain Sandoe  wrote:
> Thanks for Ilya's input on the PR thread.

> Here's what I propose for the remainder of the fix (FAOD, I cannot approve 
> the Darwin changes).

Ok.

> Mike: Actually, since this seems to have uncovered a pre-existing wrong code 
> bug, perhaps (this part of the) fix should also be applied to open branches?

Bad wrong code gen I think is suitable for the release branches.  I double 
checked llvm:

  llvm/lib/Target/X86/X86CallingConv.td

and it seems to match.

Re: RFC - Next refactoring steps

2013-09-06 Thread Joseph S. Myers

On Fri, 6 Sep 2013, Richard Biener wrote:

> But yes, making all dependencies explicit puts #includes where they 
> belong and shows where header refactoring would make sense.  It also 
> removes weird includes from .c files that are only necessary to make 
> included required headers not break.

One difficulty is that e.g. flags.h uses SWITCHABLE_TARGET (tm.h) but is 
included in lots of front-end files that don't actually care about the 
SWITCHABLE_TARGET use in flags.h and have no other reason to include tm.h.  
Since we generally want to eliminate tm.h uses in front-end files 
(converting target macros used there into hooks, etc.), making the 
"dependency" explicit in the obvious way isn't such a good idea.  
Instead, probably the bits that don't depend on SWITCHABLE_TARGET should 
move out of flags.h to other headers, the residual bit that does depend on 
tm.h should become e.g. switchable-target-flags.h (and include tm.h), and 
files that currently include flags.h should change to including the 
generated options.h (and any other headers needed for things formerly got 
from flags.h).

(If other headers do start including tm.h, of course it becomes more 
complicated to identify which front-end files are still including tm.h, as 
they may include it indirectly.)

-- 
Joseph S. Myers
jos...@codesourcery.com

[C++ PATCH] Fix PR58300, issue with -fvtable-verify=preinit

2013-09-06 Thread Paolo Carlini


Hi,

On 09/06/2013 12:35 AM, Caroline Tice wrote:

This fixes a bug where using -fvtable-verify=preinit sometimes causes
an ICE.  In particular, when the preinit flag was used, the vtable
verification constructor initialization function was being written to
the assembly file before it was being checked with
cgraph_process_new_functions.  This sometimes caused an assertion
failure in decide_is_symbol_needed.  This patch fixes the problem by
reordering events so that the function is written to the assembly file
after the call to cgraph_process_new_functions.  I have verified that
this fixes the test case attached to the bug, and it has passed all my
regular vtable verification tests.   Is this patch ok to commit?

-- Caroline Tice
cmt...@google.com


2013-09-04  Caroline Tice  

 PR c++/58300
 * vtable-class-hierarchy.c (vtv_generate_init_routine): In
 preinit case, move call to assemble_vtv_preinit_initializer to
after call to cgraph_process_new_functions.
Thanks Caroline. Let's add Jason in CC, I almost missed the patch myself 
(and I was actively paying attention to the issue)


Paolo.

Fix missed propagation opportunity in DOM

2013-09-06 Thread Jeff Law



I recently noticed that we were failing to propagate edge equivalences 
into PHI arguments in non-dominated successors.


The case loos like this:

;;   basic block 11, loop depth 0, count 0, freq 160, maybe hot
;;prev block 10, next block 12, flags: (NEW, REACHABLE)
;;pred:   10 [50.0%]  (FALSE_VALUE,EXECUTABLE)
  _257 = di_13(D)->comps;
  _258 = (long unsigned int) _255;
  _259 = _258 * 24;
  p_260 = _257 + _259;
  _261 = _255 + 1;
  di_13(D)->next_comp = _261;
  if (p_260 != 0B)
goto ;
  else
goto ;
;;succ:   12 [100.0%]  (TRUE_VALUE,EXECUTABLE)
;;13 (FALSE_VALUE,EXECUTABLE)

;;   basic block 12, loop depth 0, count 0, freq 272, maybe hot
;;   Invalid sum of incoming frequencies 160, should be 272
;;prev block 11, next block 13, flags: (NEW, REACHABLE)
;;pred:   11 [100.0%]  (TRUE_VALUE,EXECUTABLE)
  p_260->type = 37;
  p_260->u.s_builtin.type = _139;
;;succ:   13 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 13, loop depth 0, count 0, freq 319, maybe hot
;;   Invalid sum of incoming frequencies 432, should be 319
;;prev block 12, next block 14, flags: (NEW, REACHABLE)
;;pred:   110 [100.0%]  (FALLTHRU)
;;12 [100.0%]  (FALLTHRU,EXECUTABLE)
;;11 (FALSE_VALUE,EXECUTABLE)
  # _478 = PHI <0B(110), p_260(12), p_260(11)>
  ret = _478;
  _142 = di_13(D)->expansion;
  _143 = _478->u.s_builtin.type;

In particular note block 11 does *not* dominate block 13.  However, we 
know that when we traverse the edge 11->13 that p_260 will have the 
value zero, which should be propagated into the PHI node.


After fixing the propagation with the attached patch we have
_478 = PHI <0B(110), p_260(12), 0B(11)>

I have other code which then discovers the unconditional NULL pointer 
dereferences when we traverse 110->13 or 11->13 and isolates those paths.


That in turn allows blocks 12 and 13 to be combined, which in turn 
allows discovery of additional CSE opportunities.


Bootstrapped and regression tested on x86_64-unknown-linux-gnu.  Applied 
to the trunk.


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8493ee1..1f1f939 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2013-09-06  Jeff Law  
+
+   * tree-ssa-dom.c (cprop_into_successor_phis): Also propagate
+   edge implied equivalences into successor phis.
+
 2013-09-06 Claudiu Zissulescu 
 
* resource.c (mark_target_live_regs): Compute resources taking
diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 691e6f9..f999a64 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -1642,6 +1642,28 @@ cprop_into_successor_phis (basic_block bb)
   if (gsi_end_p (gsi))
continue;
 
+  /* We may have an equivalence associated with this edge.  While
+we can not propagate it into non-dominated blocks, we can
+propagate them into PHIs in non-dominated blocks.  */
+
+  /* Push the unwind marker so we can reset the const and copies
+table back to its original state after processing this edge.  */
+  const_and_copies_stack.safe_push (NULL_TREE);
+
+  /* Extract and record any simple NAME = VALUE equivalences. 
+
+Don't bother with [01] = COND equivalences, they're not useful
+here.  */
+  struct edge_info *edge_info = (struct edge_info *) e->aux;
+  if (edge_info)
+   {
+ tree lhs = edge_info->lhs;
+ tree rhs = edge_info->rhs;
+
+ if (lhs && TREE_CODE (lhs) == SSA_NAME)
+   record_const_or_copy (lhs, rhs);
+   }
+
   indx = e->dest_idx;
   for ( ; !gsi_end_p (gsi); gsi_next (&gsi))
{
@@ -1667,6 +1689,8 @@ cprop_into_successor_phis (basic_block bb)
  && may_propagate_copy (orig_val, new_val))
propagate_value (orig_p, new_val);
}
+
+  restore_vars_to_original_value ();
 }
 }

[patch i386]: Turn on -fms-extensions option for i386 ms-abi targets

2013-09-06 Thread Kai Tietz

Hello,

This patch changes for i386 targets using by default ms-abi the
default-value of option -fms-extensions.  For those targets the option
will be turned on.

I will apply this patch next week, if there are no objections.

Regards,
Kai

ChangeLog

2013-09-06  Kai Tietz  

* doc/invoke.texi (fms-extensions): Document changed
behavior for ms-abi targets.
* config/i386/i386.c (ix86_option_override_internal):
Set default value of option -fms-extension for ms-abi targets.

Index: doc/invoke.texi
===
--- doc/invoke.texi (Revision 202346)
+++ doc/invoke.texi (Arbeitskopie)
@@ -1856,6 +1856,8 @@ Some cases of unnamed fields in structures and uni
 accepted with this option.  @xref{Unnamed Fields,,Unnamed struct/union
 fields within structs/unions}, for details.

+Note that this option is off for all targets but i?86 and x86_64
+targets using ms-abi.
 @item -fplan9-extensions
 Accept some non-standard constructs used in Plan 9 code.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (Revision 202134)
+++ config/i386/i386.c  (Arbeitskopie)
@@ -3357,6 +3357,12 @@ ix86_option_override_internal (bool main_args_p)
   if (!global_options_set.x_ix86_abi)
 ix86_abi = DEFAULT_ABI;

+  /* For targets using ms ABI enable ms-extensions, if not
+ explicit turned off.  For non-ms ABI we turn off this
+ option.  */
+  if (!global_options_set.x_flag_ms_extensions)
+flag_ms_extensions = (MS_ABI == DEFAULT_ABI);
+
   if (global_options_set.x_ix86_cmodel)
 {
   switch (ix86_cmodel)

Re: RFC - Next refactoring steps

2013-09-06 Thread Steven Bosscher

On Fri, Sep 6, 2013 at 5:21 PM, Andrew MacLeod wrote:

>> hackery in some headers will suddenly break (that is, change outcome)
>> if you include for example
>> tm.h before or after it.
>>
> these would be really good to identify and fix, if possible. (surely they
> can be fixed.. :-)if they cant be fixed for whatever reason, we ought to
> protect them with some mechanism.. ie assert that tm.h either has or has not
> been included before hand... whichever way is required.   At least then we
> get an error if it changes.
>
> Do you know of any of the top of your head?

sched-int.h comes to mind as an example I fixed recently (it needs
INSN_SCHEDULING from insn-attr.h.

But these cases are hard to discover. Usually you simply run into them
while trying to re-arrange the #includes of a .c file.

Ciao!
Steven

Re: RFC - Next refactoring steps

2013-09-06 Thread Diego Novillo

On Fri, Sep 6, 2013 at 4:14 AM, Mike Stump  wrote:
> On Sep 5, 2013, at 11:54 PM, Richard Biener  
> wrote:
>> Most of the GCC headerfiles do not include all their required headers but
>> rely on .c files doing that (in the appropriate order).  I somehow like
>> that though I cannot explain why ;)
>
> Very old school.  I can explain why I don't like it, but I'll resist.  The 
> universal standard is for each header to include all it needs and for the 
> ordering of includes not to matter any.  Deviations from this are the 
> exception and should virtually never should be the case.

Agreed. Header files should be self-contained and include everything
they need. This will be increasingly more important in the presence of
C++ modules.


Diego.

Re: Fix PR middle-end/57370

2013-09-06 Thread Easwaran Raman

On Fri, Sep 6, 2013 at 12:05 AM, Richard Biener
 wrote:
> On Thu, Sep 5, 2013 at 9:23 PM, Easwaran Raman  wrote:
>> On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener
>>  wrote:
>>> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman  wrote:
 I have a new patch that supersedes this. The new patch also fixes PR
 tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and
 no test regression on x86_64/linux. Ok for trunk?

 2013-07-31  Easwaran Raman  

 PR middle-end/57370
 * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset
 of debug statements that cause inconsistent IR.
>>>
>>> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like
>>> at all.  The other hunks are ok, but we need to work harder to preserve
>>> debug stmts - simply removing all is not going to fly.
>>>
>>> Richard.
>>
>> I looked into the problem related to the debug stmts in this failing
>> test case in detail. Initially, the code sequence looks
>>
>>
>> s$n_13 = MEM[(struct S *)&s];
>> # DEBUG s$n => s$n_13
>> _2 = (double) s$n_13;
>> _4 = _2 * a_3(D);
>> _6 = (double) i_5(D);
>> _7 = _4 * _6;
>> _9 = (double) j_8(D);
>> _10 = _7 * _9;
>> bar (_10);
>> # DEBUG D#2 => (double) k_12(D)
>> # DEBUG D#1 => _7 * D#2
>> # DEBUG t$n => (int) D#1
>>
>> After reassociation
>>
>> s$n_13 = MEM[(struct S *)&s];
>> # DEBUG s$n => s$n_13
>> _2 = (double) s$n_13;
>> _6 = (double) i_5(D);
>> # DEBUG D#3 => _4 * _6
>> _9 = (double) j_8(D);
>> _4 = _9 * _2;
>> _7 = _4 * a_3(D);
>> _10 = _7 * _6;
>> bar (_10);
>> # DEBUG D#2 => (double) k_12(D)
>> # DEBUG D#1 => D#3 * D#2
>> # DEBUG t$n => (int) D#1
>>
>> In the above, # DEBUG D#3 => _4 * _6  appears above the statement that
>> defines _4. But even if I move the def of D#3 below the statement
>> defining _4, it is not sufficient. Before reassociation, t$n refers to
>> the expression (s$n_13 * a_3(D) * i_5(D) * k_12(D)), but after
>> reassociation it would refer to ( j_8(D) * s$n_13 *  i_5(D) *
>> k_12(D)). It seems the correct fix is to discard the debug temps whose
>> RHS contains a variable that is involved in reassociation and then
>> reconstruct it. Is that the expected fix here?
>
> The value of the DEBUG expression changes because the value
> that _4 computes changes - that is a no-no that may not happen.
> We cannot re-use _4 (without previously releasing it and allocating
> it newly) with a new value.  releasing _4 should have fixed up the
> debug stmts.
>
> So - can you verify whether we are indeed just replacing the
> RHS of _4 = _2 * a_3(D) to _9 * _2 without changing the SSA name of
> that expression?

I have confirmed that the SSA name of the LHS remains the same. The
reassociation code simply calls gimple_assign_set_rhs2 to modify RHS2
and then calls update_stmt.

- Easwaran

> Another reason why re-using the same LHS for another value is
> wrong is that annotated information like points-to sets, alignment
> and soon value-range information will be wrong.
>
> Thanks,
> Richard.
>
>> - Easwaran
>>

 testsuite/ChangeLog:
 2013-07-31  Easwaran Raman  

 PR middle-end/57370
 PR tree-optimization/57393
 PR tree-optimization/58011
 * gfortran.dg/reassoc_12.f90: New testcase.
 * gcc.dg/tree-ssa/reassoc-31.c: New testcase.
 * gcc.dg/tree-ssa/reassoc-31.c: New testcase.


 On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman  wrote:
> Ping.
>
> On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman  
> wrote:
>> A newly generated statement in build_and_add_sum  function of
>> tree-ssa-reassoc.c has to be assigned the UID of its adjacent
>> statement. In one instance, it was assigned the wrong uid (of an
>> earlier phi statement) which messed up the IR and caused the test
>> program to hang. Bootstraps and no test regressions on x86_64/linux.
>> Ok for trunk?
>>
>> Thanks,
>> Easwaran
>>
>>
>> 2013-06-27  Easwaran Raman  
>>
>> PR middle-end/57370
>> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of 
>> a phi
>> node for a non-phi gimple statement.
>>
>> testsuite/ChangeLog:
>> 2013-06-27  Easwaran Raman  
>>
>> PR middle-end/57370
>> * gfortran.dg/reassoc_12.f90: New testcase.
>>
>>
>> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90
>> ===
>> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0)
>> @@ -0,0 +1,74 @@
>> +! { dg-do compile }
>> +! { dg-options "-O2 -ffast-math" }
>> +! PR middle-end/57370
>> +
>> + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, &
>> +   grad_deriv,npoints, sx)
>> +IMPLICIT REAL*8 (t)
>> +INTEGER, PARAMETER :: dp=8
>

Re: Simplify jump threading slightly

2013-09-06 Thread Jeff Law


On 09/06/2013 02:10 PM, Gerald Pfeifer wrote:

Hi Jeff,

On Thu, 5 Sep 2013, Jeff Law wrote:

Bootstrapped and regression tested on x86_64-unknown-linux-gnu.
Installed onto the trunk.


is it possible this

   2013-09-05  Jeff Law  

* tree-ssa-threadedge.c (thread_around_empty_blocks): Renamed
from thread_around_empty_block.  Record threading path into PATH.
Recurse if threading through the initial block is successful.
(thread_across_edge): Corresponding changes to slightly simplify.

is causing http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58340 , or
least unearthing it?

That's amd64-unknown-freebsd8.0, pretty similar to your test platform,
so this would be a bit surprising, but...
I'll take a look.  I'm not a big believer in coincidences -- with that 
code recently changing and you seeing a bootstrap failure, odds are, 
it's something in that change.


The day is winding down, so it'll probably be monday before I get too 
far on it.


jeff

Re: Simplify jump threading slightly

2013-09-06 Thread Gerald Pfeifer

Hi Jeff,

On Thu, 5 Sep 2013, Jeff Law wrote:
> Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
> Installed onto the trunk.

is it possible this

  2013-09-05  Jeff Law  

   * tree-ssa-threadedge.c (thread_around_empty_blocks): Renamed
   from thread_around_empty_block.  Record threading path into PATH.
   Recurse if threading through the initial block is successful.
   (thread_across_edge): Corresponding changes to slightly simplify.

is causing http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58340 , or
least unearthing it?

That's amd64-unknown-freebsd8.0, pretty similar to your test platform,
so this would be a bit surprising, but...

Gerald

Re: [PATCH, x86] Use vector moves in memmove expanding

2013-09-06 Thread Michael Zolotukhin

> Michael, why did you change epilogue_size_needed to size_needed
> here?  It looks wrong to me.
This function was changed in several places and meaning of
'size_needed' and 'epilogue_size_needed' could've been changed. It
needs more careful examination and I'll do it shortly.
Briefly, I significantly changed prologue and epilogue generation
routines and that obviously caused that regression. I guess that in
some of these function I began to look at size_needed instead of
desired_align (usually, size_needed >= desired_align, but for pentium
pro that's not true). I hope to dig into it on the weekend.

Thanks for reporting and investigation.

Michael
> H.J.

Factor gimple structures out of gimple.h

2013-09-06 Thread Diego Novillo

This patch introduces gimple-core.h, which contains just the data
structures needed to define gimple. I left everything else in
gimple.h

The addition of alias.h to tree-ssa-alias.h is so that we can
include tree-ssa-alias.h in isolation. It now includes everything
it needs.

More discussion on rationale at the thread:
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00300.html

Tested on x86_64.


2013-09-06  Diego Novillo  

* Makefile.in (GIMPLE_CORE_H): New.
(GIMPLE_H): Depend on GIMPLE_CORE_H.
(TREE_SSA_ALIAS_H): New. Replace references to tree-ssa-alias.h with
TREE_SSA_ALIAS_H.
* gimple-core.h: New. Factor all gimple data structures out of ...
* gimple.h: ... here.
* tree-ssa-alias.h: Include alias.h.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a72b753..2be8846 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -882,15 +882,19 @@ TREE_H = tree.h $(TREE_CORE_H)  tree-check.h
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) \
cfg-flags.def cfghooks.h
-GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \
+GIMPLE_CORE_H = gimple-core.h $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) \
+   $(INPUT_H) gimple.def gsstruct.def $(TREE_SSA_ALIAS_H) \
+   $(INTERNAL_FN_H) $(TREE_CORE_H)
+GIMPLE_H = gimple.h $(GIMPLE_CORE_H) pointer-set.h $(VEC_H) \
$(GGC_H) $(BASIC_BLOCK_H) $(TREE_H) tree-ssa-operands.h \
-   tree-ssa-alias.h $(INTERNAL_FN_H) $(HASH_TABLE_H)
+   $(HASH_TABLE_H)
 TRANS_MEM_H = trans-mem.h
 GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h
 COVERAGE_H = coverage.h $(GCOV_IO_H)
 DEMANGLE_H = $(srcdir)/../include/demangle.h
 RECOG_H = recog.h
 ALIAS_H = alias.h
+TREE_SSA_ALIAS_H = tree-ssa-alias.h $(ALIAS_H)
 EMIT_RTL_H = emit-rtl.h
 FLAGS_H = flags.h flag-types.h $(OPTIONS_H)
 OPTIONS_H = options.h flag-types.h $(OPTIONS_H_EXTRA)
@@ -946,7 +950,7 @@ TREE_PASS_H = tree-pass.h $(TIMEVAR_H) $(DUMPFILE_H)
 TREE_FLOW_H = tree-flow.h tree-flow-inline.h tree-ssa-operands.h \
$(BITMAP_H) sbitmap.h $(BASIC_BLOCK_H) $(GIMPLE_H) \
$(HASHTAB_H) $(CGRAPH_H) $(IPA_REFERENCE_H) \
-   tree-ssa-alias.h
+   $(TREE_SSA_ALIAS_H)
 TREE_HASHER_H = tree-hasher.h $(HASH_TABLE_H) $(TREE_FLOW_H)
 TREE_SSA_LIVE_H = tree-ssa-live.h $(PARTITION_H)
 SSAEXPAND_H = ssaexpand.h $(TREE_SSA_LIVE_H)
@@ -2234,7 +2238,7 @@ test-dump.o : test-dump.c $(CONFIG_H) $(SYSTEM_H) 
$(CORETYPES_H) \
$(BITMAP_H) sbitmap.h sreal.h $(TREE_H) $(CXX_PARSER_H) $(DWARF2OUT_H) \
$(GIMPLE_PRETTY_PRINT_H) $(BASIC_BLOCK_H) insn-config.h $(LRA_INT.H) \
$(SEL_SCHED_DUMP_H) $(IRA_INT_H) $(TREE_DATA_REF_H) $(TREE_FLOW_H) \
-   $(TREE_SSA_LIVE_H) tree-ssa-alias.h $(OMEGA_H) $(RTL_H)
+   $(TREE_SSA_LIVE_H) $(TREE_SSA_ALIAS_H) $(OMEGA_H) $(RTL_H)
 tree.o: tree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \
all-tree.def $(FLAGS_H) $(FUNCTION_H) $(PARAMS_H) \
toplev.h $(DIAGNOSTIC_CORE_H) $(GGC_H) $(HASHTAB_H) $(TARGET_H) output.h 
$(TM_P_H) \
@@ -2722,7 +2726,7 @@ targhooks.o : targhooks.c $(CONFIG_H) $(SYSTEM_H) 
coretypes.h $(TREE_H) \
$(EXPR_H) $(TM_H) $(RTL_H) $(TM_P_H) $(FUNCTION_H) output.h 
$(DIAGNOSTIC_CORE_H) \
$(MACHMODE_H) $(TARGET_DEF_H) $(TARGET_H) $(GGC_H) gt-targhooks.h \
$(OPTABS_H) $(RECOG_H) $(REGS_H) reload.h hard-reg-set.h intl.h $(OPTS_H) \
-   tree-ssa-alias.h $(TREE_FLOW_H)
+   $(TREE_SSA_ALIAS_H) $(TREE_FLOW_H)
 common/common-targhooks.o : common/common-targhooks.c $(CONFIG_H) $(SYSTEM_H) \
coretypes.h $(INPUT_H) $(TM_H) $(COMMON_TARGET_H) common/common-targhooks.h
 
@@ -2746,7 +2750,7 @@ toplev.o : toplev.c $(CONFIG_H) $(SYSTEM_H) coretypes.h 
$(TM_H) $(TREE_H) \
langhooks.h insn-flags.h $(CFGLOOP_H) hosthooks.h \
$(CGRAPH_H) $(COVERAGE_H) alloc-pool.h $(GGC_H) \
$(OPTS_H) params.def tree-mudflap.h $(TREE_PASS_H) $(GIMPLE_H) \
-   tree-ssa-alias.h $(PLUGIN_H) realmpfr.h tree-diagnostic.h \
+   $(TREE_SSA_ALIAS_H) $(PLUGIN_H) realmpfr.h tree-diagnostic.h \
$(TREE_PRETTY_PRINT_H) opts-diagnostic.h $(COMMON_TARGET_H) \
tsan.h diagnostic-color.h $(CONTEXT_H) $(PASS_MANAGER_H)
 
@@ -3303,7 +3307,7 @@ alias.o : alias.c $(CONFIG_H) $(SYSTEM_H) coretypes.h 
$(DUMPFILE_H) \
$(ALIAS_H) $(EMIT_RTL_H) $(GGC_H) $(FUNCTION_H) cselib.h $(TREE_H) 
$(TM_P_H) \
langhooks.h $(TARGET_H) gt-alias.h $(TIMEVAR_H) $(CGRAPH_H) \
$(SPLAY_TREE_H) $(DF_H) \
-   tree-ssa-alias.h pointer-set.h $(TREE_FLOW_H)
+   $(TREE_SSA_ALIAS_H) pointer-set.h $(TREE_FLOW_H)
 stack-ptr-mod.o : stack-ptr-mod.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
$(TM_H) $(TREE_H) $(RTL_H) $(REGS_H) $(EXPR_H) $(TREE_PASS_H) \
$(BASIC_BLOCK_H) $(FLAGS_H) output.h $(DF_H)
@@ -3819,7 +3823,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/reg-stack.c $(srcdir)/cfgrtl.c \
   $(srcdir)/sdbout.c $(srcdir)/stor-layout.c \
   $(srcdir)/stringpool.c $(s

Re: [PATCH] Fixing improper conversion from sin() to sinf() in optimization mode.

2013-09-06 Thread Cong Hou

First, thank you for your detailed comments again! Then I deeply
apologize for not explaining my patch properly and responding to your
previous comment. I didn't understand thoroughly the problem before
submitting the patch.

Previously I only considered the following three conversions for sqrt():


1: (float) sqrt ((double) float_val)  ->  sqrtf (float_val)
2: (float) sqrtl ((long double) float_val)  ->  sqrtf (float_val)
3: (double) sqrtl ((long double) double_val)  ->  sqrt (double_val)


We have four types here:

TYPE is the type to which the result of the function call is converted.
ITYPE is the type of the math call expression.
TREE_TYPE(arg0) is the type of the function argument (before type conversion).
NEWTYPE is chosen from TYPE and TREE_TYPE(arg0) with higher precision.
It will be the type of the new math call expression after conversion.

For all three cases above, TYPE is always the same as NEWTYPE. That is
why I only considered TYPE during the precision comparison. ITYPE can
only be double_type_node or long_double_type_node depending on the
type of the math function. That is why I explicitly used those two
types instead of ITYPE (no correctness issue). But you are right,
ITYPE is more elegant and better here.

After further analysis, I found I missed two more cases. Note that we
have the following conditions according to the code in convert.c:

TYPE_PRECISION(NEWTYPE) >= TYPE_PRECISION(TYPE)
TYPE_PRECISION(NEWTYPE) >= TYPE_PRECISION(TREE_TYPE(arg0))
TYPE_PRECISION (NEWTYPE) < TYPE_PRECISION (ITYPE)

the last condition comes from the fact that we only consider
converting a math function call into another one with narrower type.
Therefore we have

TYPE_PRECISION(TYPE) < TYPE_PRECISION (ITYPE)
TYPE_PRECISION(TREE_TYPE(arg0)) < TYPE_PRECISION (ITYPE)

So for sqrt(), TYPE and TREE_TYPE(arg0) can only be float, and for
sqrtl(), TYPE and TREE_TYPE(arg0) can be either float or double with
four possible combinations. Therefore we have two more conversions to
consider besides the three ones I mentioned above:


4: (float) sqrtl ((long double) double_val)  ->  (float) sqrt (double_val)
5: (double) sqrtl ((long double) float_val)  ->  sqrt ((double) float_val)


For the first conversion here, TYPE (float) is different from NEWTYPE
(double), and my previous patch doesn't handle this case.The correct
way is to compare precisions of ITYPE and NEWTYPE now.

To sum up, we are converting the expression

(TYPE) sqrtITYPE ((ITYPE) expr)

to

(TYPE) sqrtNEWTYPE ((NEWTYPE) expr)

and we require

PRECISION (ITYPE) >= PRECISION (NEWTYPE) * 2 + 2

to make it a safe conversion.


The new patch is pasted below.

I appreciate your detailed comments and analysis, and next time when I
submit a patch I will be more carefully about the reviewer's comment.


Thank you!

Cong



Index: gcc/convert.c
===
--- gcc/convert.c (revision 201891)
+++ gcc/convert.c (working copy)
@@ -135,16 +135,19 @@ convert_to_real (tree type, tree expr)
   CASE_MATHFN (COS)
   CASE_MATHFN (ERF)
   CASE_MATHFN (ERFC)
-  CASE_MATHFN (FABS)
   CASE_MATHFN (LOG)
   CASE_MATHFN (LOG10)
   CASE_MATHFN (LOG2)
   CASE_MATHFN (LOG1P)
-  CASE_MATHFN (LOGB)
   CASE_MATHFN (SIN)
-  CASE_MATHFN (SQRT)
   CASE_MATHFN (TAN)
   CASE_MATHFN (TANH)
+/* The above functions are not safe to do this conversion. */
+if (!flag_unsafe_math_optimizations)
+  break;
+  CASE_MATHFN (SQRT)
+  CASE_MATHFN (FABS)
+  CASE_MATHFN (LOGB)
 #undef CASE_MATHFN
 {
   tree arg0 = strip_float_extensions (CALL_EXPR_ARG (expr, 0));
@@ -155,6 +158,27 @@ convert_to_real (tree type, tree expr)
   if (TYPE_PRECISION (TREE_TYPE (arg0)) > TYPE_PRECISION (type))
  newtype = TREE_TYPE (arg0);

+  /* We consider to convert
+
+ (T1) sqrtT2 ((T2) exprT3)
+ to
+ (T1) sqrtT4 ((T4) exprT3)
+
+  , where T1 is TYPE, T2 is ITYPE, T3 is TREE_TYPE (ARG0),
+ and T4 is NEWTYPE. All those types are of floating point types.
+ T4 (NEWTYPE) should be narrower than T2 (ITYPE). This conversion
+ is safe only if P1 >= P2*2+2, where P1 and P2 are precisions of
+ T2 and T4. See the following URL for a reference:
+ 
http://stackoverflow.com/questions/9235456/determining-floating-point-square-root
+ */
+  if (fcode == BUILT_IN_SQRT || fcode == BUILT_IN_SQRTL)
+ {
+  int p1 = REAL_MODE_FORMAT (TYPE_MODE (itype))->p;
+  int p2 = REAL_MODE_FORMAT (TYPE_MODE (newtype))->p;
+  if (p1 < p2 * 2 + 2 && !flag_unsafe_math_optimizations)
+break;
+ }
+
   /* Be careful about integer to fp conversions.
  These may overflow still.  */
   if (FLOAT_TYPE_P (TREE_TYPE (arg0))
Index: gcc/testsuite/gcc.c-torture/execute/20030125-1.c
===
--- gcc/testsuite/gcc.c-torture/execute/20030125-1.c (revision 201891)
+++ gcc/testsuite/gcc.c-torture/execute/20030125-1.c (working copy)
@@ -44,11 +44,11 @@ __attribute__ ((noinline))
 double
 sin(double a)
 {
- abort ();
+ retu

Re: RFC - Next refactoring steps

2013-09-06 Thread Mike Stump

On Sep 6, 2013, at 8:21 AM, Andrew MacLeod  wrote:
> these would be really good to identify and fix, if possible.

> Do you know of any of the top of your head?

Sure, they are easy to find.  See below.  I'll note one more use of fixing 
these.  clang is adding modules to get compilation speed.  Modules require that 
the inclusion of a header be the same in every instance that it is used.  The 
usages below would prevent these headers from participating in being a module 
(and transitive closure of all they include them), thus dooming the software to 
slow compiles.  C++ features exponential compile time, where/as C features 
logarithmic compile time.  modules return things back to being linear and are 
worthwhile to control compilation speed issues.  To be competitive, I think gcc 
will have to add modules eventually as well, and as the gcc source base grows, 
making use of modules to keep compilation time down is also worthwhile. 

dfp.h:
#ifdef TREE_CODE
bool decimal_real_arithmetic (REAL_VALUE_TYPE *, enum tree_code, const 
REAL_VALUE_TYPE *,
  const REAL_VALUE_TYPE *);
#endif

double-int.h:
#ifndef GENERATOR_FILE
/* Conversion to and from GMP integer representations.  */

void mpz_set_double_int (mpz_t, double_int, bool);
double_int mpz_get_double_int (const_tree, mpz_t, bool);
#endif

expr.h:
#ifdef TREE_CODE
extern rtx expand_variable_shift (enum tree_code, enum machine_mode,
  rtx, tree, rtx, int);
extern rtx expand_shift (enum tree_code, enum machine_mode, rtx, int, rtx,
 int);
extern rtx expand_divmod (int, enum tree_code, enum machine_mode, rtx, rtx,
  rtx, int);
#endif

function.h:
#ifdef RTX_CODE
extern void diddle_return_value (void (*)(rtx, void*), void*);
extern void clobber_return_register (void);
#endif

target.h:
#ifdef GCC_TM_H

#ifndef CUMULATIVE_ARGS_MAGIC
#define CUMULATIVE_ARGS_MAGIC ((void *) &targetm.calls)
#endif

static inline CUMULATIVE_ARGS *
get_cumulative_args (cumulative_args_t arg)
{
#ifdef ENABLE_CHECKING
  gcc_assert (arg.magic == CUMULATIVE_ARGS_MAGIC);
#endif /* ENABLE_CHECKING */
  return (CUMULATIVE_ARGS *) arg.p;
}

Re: RFA: Fix mark_referenced_resources to handle COND_EXEC

2013-09-06 Thread Steven Bosscher

On Fri, Sep 6, 2013 at 12:22 PM, Joern Rennecke wrote:
> 2013-04-30  Joern Rennecke  <...>
>
> * resource.c (mark_referenced_resources): Handle COND_EXEC.

This is OK.

Ciao!
Steven

Re: [RS6000] powerpc64 -mcmodel=medium large symbol offsets

2013-09-06 Thread Alan Modra

On Fri, Sep 06, 2013 at 02:18:49PM -0400, David Edelsohn wrote:
> On Fri, Sep 6, 2013 at 3:13 AM, Alan Modra  wrote:
> > The following testcase taken from the linux kernel is miscompiled on
> > powerpc64-linux.
> >
> > /* -m64 -mcmodel=medium -O -S -fno-section-anchors */
> > static int x;
> >
> > unsigned long
> > foo (void)
> > {
> >   return ((unsigned long) &x) - 0xc000;
> > }
> >
> > generates
> > addis 3,2,x+4611686018427387904@toc@ha
> > addi 3,3,x+4611686018427387904@toc@l
> > blr
> >
> > losing the top 32 bits of the offset.  Sadly, the assembler and linker
> > do not complain, which is a hole in the ABI.  (@ha and _HA relocs as
> > per the ABI won't complain about overflow since they might be used in
> > a @highesta, @highera sequence loading a 64-bit value.)
> >
> > This patch stops combine merging large offsets into a symbol addend
> > by copying code from reg_or_add_cint_operand to a new predicate,
> > add_cint_operand, and using that to restrict the range of offsets.
> > Bootstrapped and regression tested powerpc64-linux.  OK to apply?
> >
> > * config/rs6000/predicates.md (add_cint_operand): New.
> > * config/rs6000/rs6000.md (largetoc_high_plus): Restrict offset
> > using add_cint_operand.
> > (largetoc_high_plus_aix): Likewise.
> 
> This patch should include a testcase.
> 
> But what user feedback are you expecting if the offset is too large,
> such as your example? In my test with the patch, it produces an
> unrecognizable insn error, which seems less than friendly.

The testcase gives me

.L.foo:
lis 9,0x4000
sldi 9,9,32
addis 3,2,x@toc@ha
addi 3,3,x@toc@l
add 3,3,9
blr

How did you manage to get an unrecognizable insn?  I can't see how we
generate the pattern except in combine.

-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH] C++-ify and move control dependence code

2013-09-06 Thread Steven Bosscher

On Fri, Sep 6, 2013 at 8:41 AM, Richard Biener wrote:
>> I'd recommend re-implementing the control dependence code, then. The
>> current implementation is basically taken from old RTL-SSA dce.c and
>> uses a now old-fashioned view of the CFG, e.g. using edge lists.
>> You're probably better off starting from the dominance frontiers code
>> (control dependence is the same as dominance frontiers on the reverse
>> CFG).
...
> Well, I'm not in a position to do the re-implementation at the moment.

You don't have to. You just copy the dominance frontier code and make
it work on the reverse CFG. That shouldn't take more than half an hour
or so, and the result is much easier to interpret/validate than the
edge list stuff so you'll probably get that half our back easily when
working on control dependence consumers.

In fact, I already made those changes some time ago, see the cfganal.c
parts of http://gcc.gnu.org/ml/gcc-patches/2013-01/txt00046.txt

> Would a re-implementation give me control-dependence in a more useful
> way for PHI nodes?  That is, what block is an incoming edge (to compute
> the PHI arg) control dependent on?

I suppose so, yes. The inverse dominance-frontiers output is a
relation between basic blocks, i.e. not edge based. So you'd get
B=e->src and take the control dependencies of B.

> The current DCE query looks for
> control dependece of the incoming edges predecessor, but for
>
>if
> |\
> | 
> |  /
>PHI <1, 2>
>
> this gets us the right answer for the edge from the  block
> but a wrong one for the one that uses the if block (as DCE has to
> consider all incoming edges that probably doesn't matter as it
> just gets one step ahead here instead of waiting for the control
> dependence of the if block being marked).

I'm not sure I'm following what you're trying to say here, but if I
understand correctly, then there should be a check in DCE that the PHI
block does not post-dominate its predecessor.

In any case, I recommend you don't start from the DCE control
dependence code. It is relatively slow (Harvey's algorithm is optimal,
Morgan's is quadratic in the number of edges), and it doesn't compute
control dependence for abnormal edges (I hacked that out, it made
control dependence calculation too slow). You'll want "proper",
correct control dependence, and the code in tree-ssa-dce.c is tweaked
for the purpose of DCE.

Ciao!
Steven

Re: C++ demangler fix

2013-09-06 Thread Jason Merrill


OK, thanks.

Jason

Re: Ping: RFA: Consider int and same-size short as equivalent vector components

2013-09-06 Thread Jason Merrill


On 09/05/2013 10:50 AM, Joern Rennecke wrote:

(vector_types_compatible_elements_p): New function.


Why do we need this as well as vector_types_convertible_p?  For that 
matter, why do we need both vector_types_convertible_p and 
vector_targets_convertible_p?


Jason

Re: Ping: RFA: Consider int and same-size short as equivalent vector components

2013-09-06 Thread Joern Rennecke


Quoting Jason Merrill :


On 09/05/2013 10:50 AM, Joern Rennecke wrote:

(vector_types_compatible_elements_p): New function.


Why do we need this as well as vector_types_convertible_p?  For that
matter, why do we need both vector_types_convertible_p and
vector_targets_convertible_p?


vector_targets_convertible_p is used for pointer types.  The callers
do a hop, skip and dance to check that the qualifiers are satisfactory,
while OTOH vector_targets_convertible_p ignores the number of elements
in the vectors.  That's fine with vectors as we can consider, say,
a vector of 8 elements as two consecutive vectors of 4 elements, and
that works fine with pointers.

vector_types_convertible is used for pointer value types.
It could in principle call vector_targets_convertible_p as a subroutine,
but then the check for vector type would be duplicated with its callers,
and also the purpose of vector_targets_convertible_p would become
muddled.
Where vector_types_convertible returns true, a conversion might still be
needed to make the types match.

vector_types_compatible_elements_p was supposed to be a minimal change
from same_scalar_type_ignoring_signedness to fix the treatment of
opaque vectors in binary operators.  Where it returns true, and
the other checks of the caller succeed (being vector types in he first
place, and matching number of elements), we can just treat the types
as essentially the same.

I suppose we could replace same_scalar_type_ignoring_signedness /  
vector_types_compatible_elements_p by vector_compatible_p, but that  
would

change the behaviour in a number of ways.  First, we would no longer allow
sign changes by default, this would be only available with
-flax-vector-conversions, and to the extend that the types_compatible_p
lang hook says so; OTOH, whatever else it says is compatible, would then
also be (as long as -flax-vector-conversions is in effect).
Is that change wanted?
Also, we would presumably need to insert conversions at times.
And should we allow different-sized vectors to match?  If so, what would
be the result type?

Re: [RS6000] powerpc64 -mcmodel=medium large symbol offsets

2013-09-06 Thread Alan Modra

On Sat, Sep 07, 2013 at 09:06:08AM +0930, Alan Modra wrote:
> The testcase gives me
> 
> .L.foo:
>   lis 9,0x4000
>   sldi 9,9,32
>   addis 3,2,x@toc@ha
>   addi 3,3,x@toc@l
>   add 3,3,9
>   blr
> 
> How did you manage to get an unrecognizable insn?  I can't see how we
> generate the pattern except in combine.

Never mind.  I updated and rebuilt from a clean tree and now see the
failure too.  "tocrefdi" is where combine is still munging together
the large offset.

-- 
Alan Modra
Australia Development Lab, IBM

89 matches

Mail list logo