[PATCH, PR 57748] Check for out of bounds access
Hi, this patch fixes the ICE and wrong code issues from PR57748. It contains two test cases. One for the ICE and one for the wrong code. Both test cases fail for all gcc versions, and are fixed with this patch. Bootstrapped and tested on x86_64-linux without any problems. OK for trunk and the 4.8 / 4.7 branch? Regards Bernd Edlinger2013-09-06 Bernd Edlinger PR middle-end/57748 * expr.c (expand_assignment): Check for out of bounds access to structures. testsuite: * gcc.dg/torture/pr57748-1.c: New test. * gcc.dg/torture/pr57748-2.c: New test. patch-pr57748.diff Description: Binary data
Re: Fix PR middle-end/57370
On Thu, Sep 5, 2013 at 9:23 PM, Easwaran Raman wrote: > On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener > wrote: >> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman wrote: >>> I have a new patch that supersedes this. The new patch also fixes PR >>> tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and >>> no test regression on x86_64/linux. Ok for trunk? >>> >>> 2013-07-31 Easwaran Raman >>> >>> PR middle-end/57370 >>> * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset >>> of debug statements that cause inconsistent IR. >> >> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like >> at all. The other hunks are ok, but we need to work harder to preserve >> debug stmts - simply removing all is not going to fly. >> >> Richard. > > I looked into the problem related to the debug stmts in this failing > test case in detail. Initially, the code sequence looks > > > s$n_13 = MEM[(struct S *)&s]; > # DEBUG s$n => s$n_13 > _2 = (double) s$n_13; > _4 = _2 * a_3(D); > _6 = (double) i_5(D); > _7 = _4 * _6; > _9 = (double) j_8(D); > _10 = _7 * _9; > bar (_10); > # DEBUG D#2 => (double) k_12(D) > # DEBUG D#1 => _7 * D#2 > # DEBUG t$n => (int) D#1 > > After reassociation > > s$n_13 = MEM[(struct S *)&s]; > # DEBUG s$n => s$n_13 > _2 = (double) s$n_13; > _6 = (double) i_5(D); > # DEBUG D#3 => _4 * _6 > _9 = (double) j_8(D); > _4 = _9 * _2; > _7 = _4 * a_3(D); > _10 = _7 * _6; > bar (_10); > # DEBUG D#2 => (double) k_12(D) > # DEBUG D#1 => D#3 * D#2 > # DEBUG t$n => (int) D#1 > > In the above, # DEBUG D#3 => _4 * _6 appears above the statement that > defines _4. But even if I move the def of D#3 below the statement > defining _4, it is not sufficient. Before reassociation, t$n refers to > the expression (s$n_13 * a_3(D) * i_5(D) * k_12(D)), but after > reassociation it would refer to ( j_8(D) * s$n_13 * i_5(D) * > k_12(D)). It seems the correct fix is to discard the debug temps whose > RHS contains a variable that is involved in reassociation and then > reconstruct it. Is that the expected fix here? The value of the DEBUG expression changes because the value that _4 computes changes - that is a no-no that may not happen. We cannot re-use _4 (without previously releasing it and allocating it newly) with a new value. releasing _4 should have fixed up the debug stmts. So - can you verify whether we are indeed just replacing the RHS of _4 = _2 * a_3(D) to _9 * _2 without changing the SSA name of that expression? Another reason why re-using the same LHS for another value is wrong is that annotated information like points-to sets, alignment and soon value-range information will be wrong. Thanks, Richard. > - Easwaran > >>> >>> testsuite/ChangeLog: >>> 2013-07-31 Easwaran Raman >>> >>> PR middle-end/57370 >>> PR tree-optimization/57393 >>> PR tree-optimization/58011 >>> * gfortran.dg/reassoc_12.f90: New testcase. >>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase. >>> * gcc.dg/tree-ssa/reassoc-31.c: New testcase. >>> >>> >>> On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman wrote: Ping. On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman wrote: > A newly generated statement in build_and_add_sum function of > tree-ssa-reassoc.c has to be assigned the UID of its adjacent > statement. In one instance, it was assigned the wrong uid (of an > earlier phi statement) which messed up the IR and caused the test > program to hang. Bootstraps and no test regressions on x86_64/linux. > Ok for trunk? > > Thanks, > Easwaran > > > 2013-06-27 Easwaran Raman > > PR middle-end/57370 > * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of a > phi > node for a non-phi gimple statement. > > testsuite/ChangeLog: > 2013-06-27 Easwaran Raman > > PR middle-end/57370 > * gfortran.dg/reassoc_12.f90: New testcase. > > > Index: gcc/testsuite/gfortran.dg/reassoc_12.f90 > === > --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) > +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) > @@ -0,0 +1,74 @@ > +! { dg-do compile } > +! { dg-options "-O2 -ffast-math" } > +! PR middle-end/57370 > + > + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, & > + grad_deriv,npoints, sx) > +IMPLICIT REAL*8 (t) > +INTEGER, PARAMETER :: dp=8 > +REAL(kind=dp), DIMENSION(1:npoints) :: e_ndrho_ndrho_ndrho, & > + e_ndrho_ndrho_rho > + DO ii=1,npoints > + IF( grad_deriv >= 2 .OR. grad_deriv == -2 ) THEN > +t1425 = t233 * t557 > +t1429 = beta * t225 > +t1622 = t327 * t1621 > +t1626 = t327 * t1625 >
[RS6000] powerpc64 -mcmodel=medium large symbol offsets
The following testcase taken from the linux kernel is miscompiled on powerpc64-linux. /* -m64 -mcmodel=medium -O -S -fno-section-anchors */ static int x; unsigned long foo (void) { return ((unsigned long) &x) - 0xc000; } generates addis 3,2,x+4611686018427387904@toc@ha addi 3,3,x+4611686018427387904@toc@l blr losing the top 32 bits of the offset. Sadly, the assembler and linker do not complain, which is a hole in the ABI. (@ha and _HA relocs as per the ABI won't complain about overflow since they might be used in a @highesta, @highera sequence loading a 64-bit value.) This patch stops combine merging large offsets into a symbol addend by copying code from reg_or_add_cint_operand to a new predicate, add_cint_operand, and using that to restrict the range of offsets. Bootstrapped and regression tested powerpc64-linux. OK to apply? * config/rs6000/predicates.md (add_cint_operand): New. * config/rs6000/rs6000.md (largetoc_high_plus): Restrict offset using add_cint_operand. (largetoc_high_plus_aix): Likewise. Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 202264) +++ gcc/config/rs6000/predicates.md (working copy) @@ -376,6 +376,12 @@ (ior (match_code "const_int") (match_operand 0 "gpc_reg_operand"))) +;; Return 1 if op is a constant integer valid for addition with addis, addi. +(define_predicate "add_cint_operand" + (and (match_code "const_int") + (match_test "(unsigned HOST_WIDE_INT) (INTVAL (op) + 0x80008000) + < (unsigned HOST_WIDE_INT) 0x1ll"))) + ;; Return 1 if op is a constant integer valid for addition ;; or non-special register. (define_predicate "reg_or_add_cint_operand" Index: gcc/config/rs6000/rs6000.md === --- gcc/config/rs6000/rs6000.md (revision 202264) +++ gcc/config/rs6000/rs6000.md (working copy) @@ -12207,7 +12209,7 @@ (unspec [(match_operand:DI 1 "" "") (match_operand:DI 2 "gpc_reg_operand" "b")] UNSPEC_TOCREL) - (match_operand 3 "const_int_operand" "n"] + (match_operand 3 "add_cint_operand" "n"] "TARGET_ELF && TARGET_CMODEL != CMODEL_SMALL" "addis %0,%2,%1+%3@toc@ha") @@ -12218,7 +12220,7 @@ (unspec [(match_operand:P 1 "" "") (match_operand:P 2 "gpc_reg_operand" "b")] UNSPEC_TOCREL) - (match_operand 3 "const_int_operand" "n"] + (match_operand 3 "add_cint_operand" "n"] "TARGET_XCOFF && TARGET_CMODEL != CMODEL_SMALL" "addis %0,%1+%3@u(%2)") -- Alan Modra Australia Development Lab, IBM
Re: [PATCH, PR 57748] Check for out of bounds access
On Fri, Sep 6, 2013 at 9:03 AM, Bernd Edlinger wrote: > Hi, > > this patch fixes the ICE and wrong code issues from PR57748. It contains two > test cases. > One for the ICE and one for the wrong code. Both test cases fail for all gcc > versions, > and are fixed with this patch. > > Bootstrapped and tested on x86_64-linux without any problems. > > OK for trunk and the 4.8 / 4.7 branch? This doesn't look like the correct test. The movmisalign optab query uses the wrong mode. Richard. > Regards > Bernd Edlinger
Re: Silence gcc.dg/autopar/pr49960.c
> > +2013-09-05 Jan Hubicka > > + > > + * gcc.dg/autopar/pr49960.c: Disable partial inlining > > 2013-09-05 Richard Biener > > Please add a blank line between entries? The actual commmit has the extra line. Not sure why the patch doesn't. Sorry for that. Honza
Re: RFC - Next refactoring steps
On Sep 5, 2013, at 11:54 PM, Richard Biener wrote: > Most of the GCC headerfiles do not include all their required headers but > rely on .c files doing that (in the appropriate order). I somehow like > that though I cannot explain why ;) Very old school. I can explain why I don't like it, but I'll resist. The universal standard is for each header to include all it needs and for the ordering of includes not to matter any. Deviations from this are the exception and should virtually never should be the case. > Also grepping for #includes I see that this doesn't seem to be true anymore. Yeah, everyone else hates the old school style with a passion, it was never a good idea. :-) When they come across it, people have a tendency to fix the headers as broken and over time, all the dependencies eventually get added.
[Patch AArch64] Fix register constraints for lane intrinsics.
Hi, Most of the vector-by-element instructions in AArch64 have the restriction that, if the vector they are taking an element from has type "h" then it must be in a register from the lower half of the vector register set (i.e. v0-v15). While we have imposed that restriction in places, we have not been consistent. Fix that. Tested with aarch64.exp with no regressions. OK for trunk? Thanks, James --- gcc/ 2013-09-06 James Greenhalgh * config/aarch64/aarch64-simd.md (aarch64_sqdmll_n_internal): Use iterator to ensure correct register choice. (aarch64_sqdmll2_n_internal): Likewise. (aarch64_sqdmull_n): Likewise. (aarch64_sqdmull2_n_internal): Likewise. * config/aarch64/arm_neon.h (vml_lane_16): Use 'x' constraint for element vector. (vml_n_16): Likewise. (vmll_high_lane_16): Likewise. (vmll_high_n_16): Likewise. (vmll_lane_16): Likewise. (vmll_n_16): Likewise. (vmul_lane_16): Likewise. (vmul_n_16): Likewise. (vmull_lane_16): Likewise. (vmull_n_16): Likewise. (vmull_high_lane_16): Likewise. (vmull_high_n_16): Likewise. (vqrdmulh_n_s16): Likewise. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index c085fb9c49958c5f402a28c0b39fe45ec1aadbc7..5161e48dfcea91910b9dc2ee68219c3ede55f4aa 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2797,7 +2797,7 @@ (define_insn "aarch64_sqdml (match_operand:VD_HSI 2 "register_operand" "w")) (sign_extend: (vec_duplicate:VD_HSI - (match_operand: 3 "register_operand" "w" + (match_operand: 3 "register_operand" "" (const_int 1] "TARGET_SIMD" "sqdmll\\t%0, %2, %3.[0]" @@ -2955,7 +2955,7 @@ (define_insn "aarch64_sqdml (match_operand:VQ_HSI 4 "vect_par_cnst_hi_half" ""))) (sign_extend: (vec_duplicate: - (match_operand: 3 "register_operand" "w" + (match_operand: 3 "register_operand" "" (const_int 1] "TARGET_SIMD" "sqdmll2\\t%0, %2, %3.[0]" @@ -3083,7 +3083,7 @@ (define_insn "aarch64_sqdmull_n" (match_operand:VD_HSI 1 "register_operand" "w")) (sign_extend: (vec_duplicate:VD_HSI - (match_operand: 2 "register_operand" "w"))) + (match_operand: 2 "register_operand" ""))) ) (const_int 1)))] "TARGET_SIMD" @@ -3193,7 +3193,7 @@ (define_insn "aarch64_sqdmull2_n_i (match_operand:VQ_HSI 3 "vect_par_cnst_hi_half" ""))) (sign_extend: (vec_duplicate: - (match_operand: 2 "register_operand" "w"))) + (match_operand: 2 "register_operand" ""))) ) (const_int 1)))] "TARGET_SIMD" diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 29d1378..e20d34e 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -7146,7 +7146,7 @@ vld1q_dup_u64 (const uint64_t * a) int16x4_t result;\ __asm__ ("mla %0.4h, %2.4h, %3.h[%4]"\ : "=w"(result) \ -: "0"(a_), "w"(b_), "w"(c_), "i"(d) \ +: "0"(a_), "w"(b_), "x"(c_), "i"(d) \ : /* No clobbers */); \ result; \ }) @@ -7174,7 +7174,7 @@ vld1q_dup_u64 (const uint64_t * a) uint16x4_t result; \ __asm__ ("mla %0.4h, %2.4h, %3.h[%4]"\ : "=w"(result) \ -: "0"(a_), "w"(b_), "w"(c_), "i"(d) \ +: "0"(a_), "w"(b_), "x"(c_), "i"(d) \ : /* No clobbers */); \ result; \ }) @@ -7202,7 +7202,7 @@ vld1q_dup_u64 (const uint64_t * a) int16x4_t result;\ __asm__ ("mla %0.4h, %2.4h, %3.h[%4]"\ : "=w"(result) \ -: "0"(a_), "w"(b_), "w"(c_), "i"(d) \ +: "0"(a_), "w"(b_), "x"(c_), "i"(d) \ : /* No clobbers */); \ result; \ }) @@ -7230,7 +7230,7 @@ vld1q_dup_u64 (const uint64_t * a) uint16x4_t result; \ __asm__ ("mla %0.4h, %2
Re: RFC - Next refactoring steps
On Fri, Sep 6, 2013 at 10:14 AM, Mike Stump wrote: > On Sep 5, 2013, at 11:54 PM, Richard Biener > wrote: >> Most of the GCC headerfiles do not include all their required headers but >> rely on .c files doing that (in the appropriate order). I somehow like >> that though I cannot explain why ;) > > Very old school. I can explain why I don't like it, but I'll resist. The > universal standard is for each header to include all it needs and for the > ordering of includes not to matter any. Deviations from this are the > exception and should virtually never should be the case. > >> Also grepping for #includes I see that this doesn't seem to be true anymore. > > Yeah, everyone else hates the old school style with a passion, it was never a > good idea. :-) When they come across it, people have a tendency to fix the > headers as broken and over time, all the dependencies eventually get added. Well, I still think you cannot randomly change include file order in GCC. At least some #ifdef ... hackery in some headers will suddenly break (that is, change outcome) if you include for example tm.h before or after it. But yes, making all dependencies explicit puts #includes where they belong and shows where header refactoring would make sense. It also removes weird includes from .c files that are only necessary to make included required headers not break. Richard.
[i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.
Hello, Here is a patch to fix pr58269. Actually this is not a full fix, but an obvious part. ChangeLog entry: 2013-09-06 Kirill Yukhin PR target/58269 * gcc/config/i386/i386.c (ix86_conditional_register_usage): Proper initialize extended SSE registers. Bootstrap pass. Ok for trunk? -- Thanks, K --- gcc/config/i386/i386.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a8d70bc..d6a40a8 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -4218,7 +4218,7 @@ ix86_conditional_register_usage (void) /* If AVX512F is disabled, squash the registers. */ if (! TARGET_AVX512F) -for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++) +for (i = FIRST_EXT_REX_SSE_REG; i <= LAST_EXT_REX_SSE_REG; i++) fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; } -- 1.7.11.7
[patch] Fix another fallout of partial inlining change
Hi, see http://gcc.gnu.org/ml/gcc/2013-09/msg00028.html for the context. The patch sets DECL_NO_INLINE_WARNING_P on the non-inlinable part after splitting (an alternative would be to clear DECL_DECLARED_INLINE_P). Tested on x86_64-suse-linux, OK for the mainline? 2013-09-06 Eric Botcazou * ipa-split.c (split_function): Set DECL_NO_INLINE_WARNING_P on the non-inlinable part. 2013-09-06 Eric Botcazou * gnat.dg/warn10.ad[sb]: New test. * gnat.dg/warn10_pkg.ads: New helper. -- Eric BotcazouIndex: ipa-split.c === --- ipa-split.c (revision 202287) +++ ipa-split.c (working copy) @@ -1222,6 +1222,9 @@ split_function (struct split_point *spli DECL_BUILT_IN_CLASS (node->symbol.decl) = NOT_BUILT_IN; DECL_FUNCTION_CODE (node->symbol.decl) = (enum built_in_function) 0; } + /* If the original function is declared inline, there is no point in issuing + a warning for the non-inlinable part. */ + DECL_NO_INLINE_WARNING_P (node->symbol.decl) = 1; cgraph_node_remove_callees (cur_node); ipa_remove_all_references (&cur_node->symbol.ref_list); if (!split_part_return_p)-- { dg-do compile } -- { dg-options "-O3 -gnatn -Winline" } package body Warn10 is procedure Do_Something(Driver : My_Driver) is X : Float; begin X := Get_Input_Value( Driver, 1, 1); end; end Warn10;with Warn10_Pkg; use Warn10_Pkg; package Warn10 is type My_Driver is new Root with record Extra : Natural; end record; procedure Do_Something(Driver : My_Driver); end Warn10;package Warn10_Pkg is Size : constant Natural := 100; type My_Array is array(1..Size, 1..Size) of Float; type Root is tagged record Input_Values : My_Array; end record; function Get_Input_Value( Driver : Root; I, J : Natural) return Float; end Warn10_Pkg;
Re: Ping: RFA: testsuite patches (1/6): keeps_null_pointer_checks effect on pta/alias dump files
Quoting Richard Biener : I'd say rather than just disabling the scan for keeps_null_pointer_checks you should add appropriate scanning for those targets as well (I expect you just don't see the 'NULL's in the points-to sets). Well, for gcc.dg/ipa/ipa-pta-14.c, which doesn't have NONLOCAL otherwise, you instead see NONLOCAL. Please find attached the amended patch. I have verified that this gets the new expected PASSes for atmega128-sim. 2013-09-06 Joern Rennecke * gcc.dg/ipa/ipa-pta-14.c (scan-ipa-dump) [keeps_null_pointer_checks]: Don't expect NULL in foo.result set. * gcc.dg/tree-ssa/pta-escape-1.c (scan-tree-dump): Don't expect NULL in ESCAPED set. * gcc.dg/tree-ssa/pta-escape-2.c: Likewise. * gcc.dg/tree-ssa/pta-escape-3.c: Likewise. diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c b/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c index e8abc32..ed59cbf 100644 --- a/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c +++ b/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c @@ -22,7 +22,8 @@ int main() a.p = (void *)&c; p = foo(&a, &a); /* { dg-final { scan-ipa-dump "foo.result = { NULL a\[^ \]* c\[^ \]* }" "pta" { xfail *-*-* } } } */ - /* { dg-final { scan-ipa-dump "foo.result = { NULL a\[^ \]* a\[^ \]* c\[^ \]* }" "pta" } } */ + /* { dg-final { scan-ipa-dump "foo.result = { NULL a\[^ \]* a\[^ \]* c\[^ \]* }" "pta" { target { ! keeps_null_pointer_checks } } } } */ + /* { dg-final { scan-ipa-dump "foo.result = { NONLOCAL a\[^ \]* a\[^ \]* c\[^ \]* }" "pta" { target { keeps_null_pointer_checks } } } } */ ((struct X *)p)->p = (void *)0; if (a.p != (void *)0) abort (); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c index a424417..dcfae5d 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c @@ -33,5 +33,6 @@ int main() return 0; } -/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" "alias" } } */ +/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" "alias" { target { ! keeps_null_pointer_checks } } } } */ +/* { dg-final { scan-tree-dump "ESCAPED = { ESCAPED NONLOCAL x }" "alias" { target { keeps_null_pointer_checks } } } } */ /* { dg-final { cleanup-tree-dump "alias" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c index 8580382..e613959 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c @@ -34,5 +34,6 @@ int main() return 0; } -/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" "alias" } } */ +/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" "alias" { target { ! keeps_null_pointer_checks } } } } */ +/* { dg-final { scan-tree-dump "ESCAPED = { ESCAPED NONLOCAL x }" "alias" { target { keeps_null_pointer_checks } } } } */ /* { dg-final { cleanup-tree-dump "alias" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c index 5a121a0..870dcf6 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c @@ -38,5 +38,6 @@ int main() return 0; } -/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" "alias" } } */ +/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" "alias" { target { ! keeps_null_pointer_checks } } } } */ +/* { dg-final { scan-tree-dump "ESCAPED = { ESCAPED NONLOCAL x }" "alias" { target { keeps_null_pointer_checks } } } } */ /* { dg-final { cleanup-tree-dump "alias" } } */
Re: [PATCH, PR 57748] Check for out of bounds access
On Fri, Sep 6, 2013 at 10:35 AM, Bernd Edlinger wrote: > Richard, > >> But the movmisalign path skips all this code and with the >> current code thinks the actual access is in the mode of the >> whole structure. (and also misses the address adjustment >> as shown in the other testcase for the bug) >> >> The movmisalign handling in this path is just broken. And >> it's _not_ just for optimization! If you have >> >> struct __attribute__((packed)) { >> double x; >> v2df y; >> } *p; >> >> and do >> >> p = malloc (48); // gets you 8-byte aligned memory >> p->y = (v2df) { 1., 2. }; >> >> then you cannot skip the movmisaling handling because >> we'd generate an aligned move (based on the mode) then. >> > > Ok, test examples are really helpful here. > > This time the structure is BLKmode, unaligned, > movmisalign = false anyway. > > I tried to make a test case out of your example, > and as I expected, the code is also correct: > > foo: > .cfi_startproc > movdqa .LC1(%rip), %xmm0 > movq$-1, (%rdi) > movl$0x4048f5c3, 8(%rdi) > movdqu %xmm0, 12(%rdi) > ret > > movdqu. > > The test executes without trap. > And I did everything to make the object unaligned. Yeah, well - for the expand path with BLKmode it's working fine. We're doing all the dances because of correctness for non-BLKmode expand paths. > I am sure we could completely remove the > movmisalign path, and nothing would happen. > probably. except maybe for a performance regression. In this place probably yes. But we can also fix it to use the proper mode and properly do the offset adjustment. But just adding a bounds check looks broken to me. Note that there may have been a correctness reason for this code in the light of the IPA-SRA code. Maybe Martin remembers. If removing the movmisalign handling inside the handled-component case in expand_assignment works out (make sure to also test a strict-alignment target) then that is probably fine. Richard. > > Bernd.
Re: Ping: RFA: testsuite patches (1/6): keeps_null_pointer_checks effect on pta/alias dump files
On Fri, Sep 6, 2013 at 11:13 AM, Joern Rennecke wrote: > Quoting Richard Biener : > >> I'd say rather than just disabling the scan for >> keeps_null_pointer_checks you should add >> appropriate scanning for those targets as well (I expect you just >> don't see the 'NULL's in the points-to sets). > > > Well, for gcc.dg/ipa/ipa-pta-14.c, which doesn't have NONLOCAL otherwise, > you instead see NONLOCAL. Yes. On those targets NULL == NONLOCAL (0 is just another global address). > Please find attached the amended patch. > > I have verified that this gets the new expected PASSes for atmega128-sim. Ok. Thanks, Richard. > 2013-09-06 Joern Rennecke > > * gcc.dg/ipa/ipa-pta-14.c (scan-ipa-dump) > [keeps_null_pointer_checks]: > Don't expect NULL in foo.result set. > * gcc.dg/tree-ssa/pta-escape-1.c (scan-tree-dump): Don't expect NULL > in ESCAPED set. > * gcc.dg/tree-ssa/pta-escape-2.c: Likewise. > * gcc.dg/tree-ssa/pta-escape-3.c: Likewise. > > diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c > b/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c > index e8abc32..ed59cbf 100644 > --- a/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c > +++ b/gcc/testsuite/gcc.dg/ipa/ipa-pta-14.c > @@ -22,7 +22,8 @@ int main() >a.p = (void *)&c; >p = foo(&a, &a); >/* { dg-final { scan-ipa-dump "foo.result = { NULL a\[^ \]* c\[^ \]* }" > "pta" { xfail *-*-* } } } */ > - /* { dg-final { scan-ipa-dump "foo.result = { NULL a\[^ \]* a\[^ \]* c\[^ > \]* }" "pta" } } */ > + /* { dg-final { scan-ipa-dump "foo.result = { NULL a\[^ \]* a\[^ \]* c\[^ > \]* }" "pta" { target { ! keeps_null_pointer_checks } } } } */ > + /* { dg-final { scan-ipa-dump "foo.result = { NONLOCAL a\[^ \]* a\[^ \]* > c\[^ \]* }" "pta" { target { keeps_null_pointer_checks } } } } */ >((struct X *)p)->p = (void *)0; >if (a.p != (void *)0) > abort (); > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c > b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c > index a424417..dcfae5d 100644 > --- a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-1.c > @@ -33,5 +33,6 @@ int main() >return 0; > } > > -/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" > "alias" } } */ > +/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" > "alias" { target { ! keeps_null_pointer_checks } } } } */ > +/* { dg-final { scan-tree-dump "ESCAPED = { ESCAPED NONLOCAL x }" "alias" { > target { keeps_null_pointer_checks } } } } */ > /* { dg-final { cleanup-tree-dump "alias" } } */ > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c > b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c > index 8580382..e613959 100644 > --- a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-2.c > @@ -34,5 +34,6 @@ int main() >return 0; > } > > -/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" > "alias" } } */ > +/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" > "alias" { target { ! keeps_null_pointer_checks } } } } */ > +/* { dg-final { scan-tree-dump "ESCAPED = { ESCAPED NONLOCAL x }" "alias" { > target { keeps_null_pointer_checks } } } } */ > /* { dg-final { cleanup-tree-dump "alias" } } */ > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c > b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c > index 5a121a0..870dcf6 100644 > --- a/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pta-escape-3.c > @@ -38,5 +38,6 @@ int main() >return 0; > } > > -/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" > "alias" } } */ > +/* { dg-final { scan-tree-dump "ESCAPED = { NULL ESCAPED NONLOCAL x }" > "alias" { target { ! keeps_null_pointer_checks } } } } */ > +/* { dg-final { scan-tree-dump "ESCAPED = { ESCAPED NONLOCAL x }" "alias" { > target { keeps_null_pointer_checks } } } } */ > /* { dg-final { cleanup-tree-dump "alias" } } */ >
Re: [patch] Fix another fallout of partial inlining change
> Hi, > > see http://gcc.gnu.org/ml/gcc/2013-09/msg00028.html for the context. > The patch sets DECL_NO_INLINE_WARNING_P on the non-inlinable part after > splitting (an alternative would be to clear DECL_DECLARED_INLINE_P). Sorry, I missed your mail and it seems that my original mail did not hit the mailing list. I am attaching what I wrote back then for a record. The patch fixes situation where function is externaly visible and called once. In this case it makes sense to partially inline it into the one caller. Previous heuristic was wrong assuming that the function is static and thus preventing splitting because it makes more sense to inline it always. > > Tested on x86_64-suse-linux, OK for the mainline? > > > 2013-09-06 Eric Botcazou > > * ipa-split.c (split_function): Set DECL_NO_INLINE_WARNING_P on the > non-inlinable part. Yes, this is OK. Thinking about it, we ought to prvent splitting always_inline functions: those may contain something that relies on inlining. Either you can include in your change or I will fix it as a followup. Thank you, Honza Hi, while analyzing profile issues with Martin I noticed that the testcases bellow demonstrate three different bugs in profling and splitting. This patch fixes first of it - the function t() should be split and partially inlinined, while it is not. Bootstrapped/regtested ppc64-linux, will commit it shortly. Honza Index: ipa-split.c === --- ipa-split.c (revision 202153) +++ ipa-split.c (working copy) @@ -1537,7 +1538,9 @@ execute_split_functions (void) Note that we are not completely conservative about disqualifying functions called once. It is possible that the caller is called more then once and then inlining would still benefit. */ - if ((!node->callers || !node->callers->next_caller) + if ((!node->callers + /* Local functions called once will be completely inlined most of time. */ + || (!node->callers->next_caller && node->local.local)) && !node->symbol.address_taken && (!flag_lto || !node->symbol.externally_visible)) { Index: testsuite/gcc.dg/tree-ssa/fnsplit-1.c === --- testsuite/gcc.dg/tree-ssa/fnsplit-1.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/fnsplit-1.c (working copy) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-fnsplit" } */ +#include +int a[1000]; + +void +t(int a) +{ + if (a) +printf ("I Am Completely Operational,"), +printf ("And All My Circuits Are Functioning Perfectly\n"); +} +int +main(void) +{ + int i; + for (i = 0; i < 1000; i++) +t(a[i]); + return 0; +} +/* { dg-final { scan-tree-dump-times "Splitting function at:" 1 "fnsplit"} } */ + +/* { dg-final { cleanup-tree-dump "fnsplit" } } */
Re: [PATCH, PR 57748] Check for out of bounds access
> this patch fixes the ICE and wrong code issues from PR57748. It contains two > test cases. One for the ICE and one for the wrong code. Both test cases > fail for all gcc versions, and are fixed with this patch. > > Bootstrapped and tested on x86_64-linux without any problems. > > OK for trunk and the 4.8 / 4.7 branch? Please avoid touching the 4.7 branch at this point, this is a sensitive area and the testcases look quite artificial to me. -- Eric Botcazou
Re: [i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.
On Fri, Sep 6, 2013 at 10:34 AM, Kirill Yukhin wrote: > Hello, > Here is a patch to fix pr58269. > Actually this is not a full fix, but an obvious part. > > ChangeLog entry: > 2013-09-06 Kirill Yukhin > > PR target/58269 > * gcc/config/i386/i386.c (ix86_conditional_register_usage): > Proper initialize extended SSE registers. This is OK. Thanks, Uros.
RE: [PATCH, PR 57748] Check for out of bounds access
Just for completeness, these were the test examples on this private communication: On Fri, 6 Sep 2013 11:19:18, Richard Biener wrote: > On Fri, Sep 6, 2013 at 10:35 AM, Bernd Edlinger > wrote: >> Richard, >> >>> But the movmisalign path skips all this code and with the >>> current code thinks the actual access is in the mode of the >>> whole structure. (and also misses the address adjustment >>> as shown in the other testcase for the bug) >>> >>> The movmisalign handling in this path is just broken. And >>> it's _not_ just for optimization! If you have >>> >>> struct __attribute__((packed)) { >>> double x; >>> v2df y; >>> } *p; >>> >>> and do >>> >>> p = malloc (48); // gets you 8-byte aligned memory >>> p->y = (v2df) { 1., 2. }; >>> >>> then you cannot skip the movmisaling handling because >>> we'd generate an aligned move (based on the mode) then. >>> >> >> Ok, test examples are really helpful here. >> >> This time the structure is BLKmode, unaligned, >> movmisalign = false anyway. >> >> I tried to make a test case out of your example, >> and as I expected, the code is also correct: >> >> foo: >> .cfi_startproc >> movdqa .LC1(%rip), %xmm0 >> movq $-1, (%rdi) >> movl $0x4048f5c3, 8(%rdi) >> movdqu %xmm0, 12(%rdi) >> ret >> >> movdqu. >> >> The test executes without trap. >> And I did everything to make the object unaligned. > > Yeah, well - for the expand path with BLKmode it's working fine. > We're doing all > the dances because of correctness for non-BLKmode expand paths. > >> I am sure we could completely remove the >> movmisalign path, and nothing would happen. >> probably. except maybe for a performance regression. > > In this place probably yes. But we can also fix it to use the proper mode and > properly do the offset adjustment. But just adding a bounds check looks > broken to me. > > Note that there may have been a correctness reason for this code in the > light of the IPA-SRA code. Maybe Martin remembers. > > If removing the movmisalign handling inside the handled-component > case in expand_assignment works out (make sure to also test a > strict-alignment target) then that is probably fine. > > Richard. > >> >> Bernd. #include #include typedef long long V __attribute__ ((vector_size (2 * sizeof (long long)), may_alias)); union x { long long a; float b; } __attribute__((aligned(1))) ; struct s { union x xx[0]; V x; } __attribute__((packed)); void __attribute__((noinline, noclone)) foo(struct s * x) { x->xx[0].a = -1; x->xx[0].b = 3.14; x->x[1] = 0x123456789ABCDEF; } int main() { struct s ss; memset(&ss, 0, sizeof(ss)); foo (&ss); printf("%f %llX\n", ss.xx[0].b, ss.xx[0].a); printf("%llX %llX\n", ss.x[0], ss.x[1]); } #include #include typedef long long V __attribute__ ((vector_size (2 * sizeof (long long)), may_alias)); struct __attribute__((packed)) s { long long a; float b; V x; }; void __attribute__((noinline, noclone)) foo(struct s * x) { V a = { 1, 2 }; x->a = -1; x->b = 3.14; x->x = a; } int main() { long long buf[100]; struct s* ss = (struct s*) ((char*)buf+1); memset(ss, 0, sizeof(*ss)); foo (ss); printf("%f %llX\n", ss->b, ss->a); printf("%llX %llX\n", ss->x[0], ss->x[1]); }
Re: [i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.
On Fri, Sep 06, 2013 at 11:28:53AM +0200, Uros Bizjak wrote: > On Fri, Sep 6, 2013 at 10:34 AM, Kirill Yukhin > wrote: > > Hello, > > Here is a patch to fix pr58269. > > Actually this is not a full fix, but an obvious part. > > > > ChangeLog entry: > > 2013-09-06 Kirill Yukhin > > > > PR target/58269 > > * gcc/config/i386/i386.c (ix86_conditional_register_usage): > > Proper initialize extended SSE registers. > > This is OK. But please leave out gcc/ prefix from the ChangeLog entry. Jakub
Re: [PATCH, PR 57748] Check for out of bounds access
On Fri, Sep 06, 2013 at 11:28:44AM +0200, Eric Botcazou wrote: > > this patch fixes the ICE and wrong code issues from PR57748. It contains two > > test cases. One for the ICE and one for the wrong code. Both test cases > > fail for all gcc versions, and are fixed with this patch. > > > > Bootstrapped and tested on x86_64-linux without any problems. > > > > OK for trunk and the 4.8 / 4.7 branch? > > Please avoid touching the 4.7 branch at this point, this is a sensitive area > and the testcases look quite artificial to me. Even for 4.8 it needs at least several weeks on the trunk before considering backports of such code. Jakub
Re: [patch] Fix another fallout of partial inlining change
> Sorry, I missed your mail and it seems that my original mail did not > hit the mailing list. I am attaching what I wrote back then for a record. > The patch fixes situation where function is externaly visible and called > once. In this case it makes sense to partially inline it into the > one caller. Previous heuristic was wrong assuming that the function is > static and thus preventing splitting because it makes more sense to inline > it always. OK, thanks for the explanation. > Thinking about it, we ought to prvent splitting always_inline functions: > those may contain something that relies on inlining. Either you can > include in your change or I will fix it as a followup. I'll let you make the change. -- Eric Botcazou
Re: [PATCH, C++, PR58282] Handle noexcept on transactions with -fno-exceptions
On 04/09/13 19:21, Jason Merrill wrote: > On 09/03/2013 06:03 AM, Tom de Vries wrote: >> * semantics.c (finish_transaction_stmt, build_transaction_expr): Handle >> flag_exceptions. > > I'd rather handle this at a lower level, by making > build_must_not_throw_expr return its argument if -fno-exceptions. > Jason, This patch implements that way of fixing. In addition, it adds a test-case. Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom 2013-09-02 Tom de Vries PR c++/58282 * except.c (build_must_not_throw_expr): Handle flag_exceptions. * g++.dg/tm/noexcept-6.C: New test. diff --git a/gcc/cp/except.c b/gcc/cp/except.c index fbebcba..c76d944 100644 --- a/gcc/cp/except.c +++ b/gcc/cp/except.c @@ -374,6 +374,9 @@ build_must_not_throw_expr (tree body, tree cond) { tree type = body ? TREE_TYPE (body) : void_type_node; + if (!flag_exceptions) +return body; + if (cond && !value_dependent_expression_p (cond)) { cond = cxx_constant_value (cond); diff --git a/gcc/testsuite/g++.dg/tm/noexcept-6.C b/gcc/testsuite/g++.dg/tm/noexcept-6.C new file mode 100644 index 000..4391159 --- /dev/null +++ b/gcc/testsuite/g++.dg/tm/noexcept-6.C @@ -0,0 +1,23 @@ +// { dg-do compile } +// { dg-options "-fno-exceptions -fgnu-tm -O -std=c++0x -fdump-tree-tmlower" } + +struct TrueFalse +{ + static constexpr bool v() { return true; } +}; + +int global; + +template int foo() +{ + return __transaction_atomic noexcept(T::v()) (global + 1); +} + +int f1() +{ + return foo(); +} + +/* { dg-final { scan-tree-dump-times "eh_must_not_throw" 0 "tmlower" } } */ +/* { dg-final { scan-tree-dump-times "__transaction_atomic" 1 "tmlower" } } */ +/* { dg-final { cleanup-tree-dump "tmlower" } } */
Re: [i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.
Hi Kirill, Thanks for Ilya's input on the PR thread. We've done some testing/checking across the Darwin versions last night and I've bootstrapped all langs including Ada, and tested the patch below (together with the fragment you posted earlier) on Darwin12. >> On Fri, Sep 6, 2013 at 10:34 AM, Kirill Yukhin >> wrote: >>> Here is a patch to fix pr58269. >>> Actually this is not a full fix, but an obvious part. Here's what I propose for the remainder of the fix (FAOD, I cannot approve the Darwin changes). Darwin is supposed to follow the System V psABI for x86_64 and, AFAICT for Darwin9, 10, 11 and 12 it is complying for function calls involving xmm0-7 (additional float args are, indeed, placed on the stack). Ergo, saving xmm8-15 in __builtin_apply_args() only consumes time and stack space - the content is not part of the call. As a fall-back; if I've missed a subtlety and, for some reason, we need to adjust the xmm set saved for compatibility with an older (gcc-4.2 based) system compiler, then we can override SSE_REGPARM_MAX in i386/darwin*.h for the relevant system version. Losing TARGET_MACHO conditional code is always nice :) Mike: Actually, since this seems to have uncovered a pre-existing wrong code bug, perhaps (this part of the) fix should also be applied to open branches? gcc: PR target/58269 config/i386/i386.c (ix86_function_arg_regno_p): Make Darwin use the xmm register set described in the psABI. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index a8d70bc..e68b3f9 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5706,17 +5706,9 @@ ix86_function_arg_regno_p (int regno) && (regno < FIRST_SSE_REG + SSE_REGPARM_MAX))); } - if (TARGET_MACHO) -{ - if (SSE_REGNO_P (regno) && TARGET_SSE) -return true; -} - else -{ - if (TARGET_SSE && SSE_REGNO_P (regno) - && (regno < FIRST_SSE_REG + SSE_REGPARM_MAX)) -return true; -} + if (TARGET_SSE && SSE_REGNO_P (regno) + && (regno < FIRST_SSE_REG + SSE_REGPARM_MAX)) +return true; /* TODO: The function should depend on current function ABI but builtins.c would need updating then. Therefore we use the
Fix ipa-devirt-11.C on AIX part 1
Hi, ipa-devirt-11.C (invented by me) checks for series of events that lead to an ICE. With local alias changes these events are not happening for multiple reasons. This patch solves first problem: we now redirect call to a function to call to its alias and that breaks detection of self recursion on several places. Introducewd by adding recursive_call_p predicate and also symtab logic to try to prove if two symbols must point to semantically equivalent piece of code (or vtable) Bootstrapped/regtested x86_64-linux, will commit it shortly. PR middle-end/58094 * cgraph.h (symtab_semantically_equivalent_p): Declare. * tree-tailcall.c: Include ipa-utils.h. (find_tail_calls): Use it. * ipa-pure-const.c (check_call): Likewise. * ipa-utils.c (recursive_call_p): New function. * ipa-utils.h (recursive_call_p): Dclare. * symtab.c (symtab_nonoverwritable_alias): Fix formatting. (symtab_semantically_equivalent_p): New function. * Makefile.in (tree-tailcall.o): Update dependencies. Index: cgraph.h === --- cgraph.h(revision 202315) +++ cgraph.h(working copy) @@ -627,6 +627,7 @@ bool symtab_for_node_and_aliases (symtab bool); symtab_node symtab_nonoverwritable_alias (symtab_node); enum availability symtab_node_availability (symtab_node); +bool symtab_semantically_equivalent_p (symtab_node, symtab_node); /* In cgraph.c */ void dump_cgraph (FILE *); Index: tree-tailcall.c === --- tree-tailcall.c (revision 202315) +++ tree-tailcall.c (working copy) @@ -35,6 +35,7 @@ along with GCC; see the file COPYING3. #include "target.h" #include "cfgloop.h" #include "common/common-target.h" +#include "ipa-utils.h" /* The file implements the tail recursion elimination. It is also used to analyze the tail calls in general, passing the results to the rtl level @@ -445,7 +446,7 @@ find_tail_calls (basic_block bb, struct /* We found the call, check whether it is suitable. */ tail_recursion = false; func = gimple_call_fndecl (call); - if (func == current_function_decl) + if (func && recursive_call_p (current_function_decl, func)) { tree arg; Index: ipa-pure-const.c === --- ipa-pure-const.c(revision 202315) +++ ipa-pure-const.c(working copy) @@ -541,7 +541,8 @@ check_call (funct_state local, gimple ca } /* When not in IPA mode, we can still handle self recursion. */ - if (!ipa && callee_t == current_function_decl) + if (!ipa && callee_t + && recursive_call_p (current_function_decl, callee_t)) { if (dump_file) fprintf (dump_file, "Recursive call can loop.\n"); @@ -1079,8 +1080,9 @@ ignore_edge (struct cgraph_edge *e) } /* Return true if NODE is self recursive function. - ??? self recursive and indirectly recursive funcions should - be the same, so this function seems unnecessary. */ + Indirectly recursive functions appears as non-trivial strongly + connected components, so we need to care about self recursion + only. */ static bool self_recursive_p (struct cgraph_node *node) Index: ipa-utils.c === --- ipa-utils.c (revision 202315) +++ ipa-utils.c (working copy) @@ -791,3 +791,14 @@ ipa_merge_profiles (struct cgraph_node * src->symbol.decl = oldsrcdecl; } +/* Return true if call to DEST is known to be self-recusive call withing FUNC. */ + +bool +recursive_call_p (tree func, tree dest) +{ + struct cgraph_node *dest_node = cgraph_get_create_node (dest); + struct cgraph_node *cnode = cgraph_get_create_node (func); + + return symtab_semantically_equivalent_p ((symtab_node)dest_node, + (symtab_node)cnode); +} Index: ipa-utils.h === --- ipa-utils.h (revision 202315) +++ ipa-utils.h (working copy) @@ -46,6 +46,7 @@ int ipa_reverse_postorder (struct cgraph tree get_base_var (tree); void ipa_merge_profiles (struct cgraph_node *dst, struct cgraph_node *src); +bool recursive_call_p (tree, tree); /* In ipa-profile.c */ bool ipa_propagate_frequency (struct cgraph_node *node); Index: symtab.c === --- symtab.c(revision 202315) +++ symtab.c(working copy) @@ -1106,11 +1106,48 @@ symtab_nonoverwritable_alias (symtab_nod { DECL_STATIC_CONSTRUCTOR (new_decl) = 0; DECL_STATIC_DESTRUCTOR (new_decl) = 0; - new_node = (symtab_node) cgraph_create_function_alias (new_decl, node->symbol.decl); + new_node = (symtab_node) cgraph_create_function_alias +(new_decl, node->symbol.decl); }
RFA: Fix mark_target_live_regs to take COND_EXEC into account
We found that std::basic_string, std::allocator ::copy(char*, unsigned long, unsigned long) const got miscompiled for ARC because reorg thought that all call-clobbered registers were dead after a conditional call. I can't reproduce the test case with current trunk + ARC port, but I reckon this is just due to the fickleness of the register allocator. Chances are, every other version, some other obscure function is miscompiled without this patch. bootstrapped on i686-pc-linux-gnu. OK to apply? 2013-04-29 Claudiu Zissulescu * resource.c (mark_target_live_regs): Compute resources taking into account if a call is predicated or not. diff --git a/gcc/resource.c b/gcc/resource.c Index: resource.c === --- resource.c (revision 202295) +++ resource.c (working copy) @@ -994,11 +994,18 @@ mark_target_live_regs (rtx insns, rtx ta if (CALL_P (real_insn)) { - /* CALL clobbers all call-used regs that aren't fixed except -sp, ap, and fp. Do this before setting the result of the -call live. */ - AND_COMPL_HARD_REG_SET (current_live_regs, - regs_invalidated_by_call); + /* Values in call-clobbered registers survive a COND_EXEC CALL +if that is not executed; this matters for resoure use because +they may be used by a complementarily (or more strictly) +predicated instruction, or if the CALL is NORETURN. */ + if (GET_CODE (PATTERN (real_insn)) != COND_EXEC) + { + /* CALL clobbers all call-used regs that aren't fixed except +sp, ap, and fp. Do this before setting the result of the +call live. */ + AND_COMPL_HARD_REG_SET (current_live_regs, + regs_invalidated_by_call); + } /* A CALL_INSN sets any global register live, since it may have been modified by the call. */
Fix ipa-devirt-11.C on AIX part 2
Hi, this patch makes tree-sra to do its job in the case where function has an alias. There were two problems; first recursion is not detected correctly and second we did not see the callers and thus skipped the function. Rest of tree-sra seems to work as expected. Bootstrapped/regtsted x86_64-linux, comitted. Honza * Makefile.in (tree-sra.o): Update dependencies. * tree-sra.c: Include ipa-utils.h (scan_function): Use recursive_call_p. (has_caller_p): New function. (cgraph_for_node_and_aliases): Count also callers of aliases. Index: Makefile.in === --- Makefile.in (revision 202316) +++ Makefile.in (working copy) @@ -3104,7 +3104,7 @@ tree-sra.o : tree-sra.c $(CONFIG_H) $(SY $(HASH_TABLE_H) $(TM_H) $(TREE_H) $(GIMPLE_H) $(CGRAPH_H) $(TREE_FLOW_H) \ $(IPA_PROP_H) $(DIAGNOSTIC_H) statistics.h \ $(PARAMS_H) $(TARGET_H) $(FLAGS_H) \ - $(DBGCNT_H) $(TREE_INLINE_H) $(GIMPLE_PRETTY_PRINT_H) + $(DBGCNT_H) $(TREE_INLINE_H) $(GIMPLE_PRETTY_PRINT_H) ipa-utils.h tree-switch-conversion.o : tree-switch-conversion.c $(CONFIG_H) $(SYSTEM_H) \ $(TREE_H) $(TM_P_H) $(TREE_FLOW_H) $(DIAGNOSTIC_H) $(TREE_INLINE_H) \ $(TM_H) coretypes.h $(GIMPLE_H) $(CFGLOOP_H) \ Index: tree-sra.c === --- tree-sra.c (revision 202315) +++ tree-sra.c (working copy) @@ -91,6 +91,7 @@ along with GCC; see the file COPYING3. #include "tree-inline.h" #include "gimple-pretty-print.h" #include "ipa-inline.h" +#include "ipa-utils.h" /* Enumeration of all aggregate reductions we can do. */ enum sra_mode { SRA_MODE_EARLY_IPA, /* early call regularization */ @@ -1256,8 +1257,7 @@ scan_function (void) if (DECL_BUILT_IN_CLASS (dest) == BUILT_IN_NORMAL && DECL_FUNCTION_CODE (dest) == BUILT_IN_APPLY_ARGS) encountered_apply_args = true; - if (cgraph_get_node (dest) - == cgraph_get_node (current_function_decl)) + if (recursive_call_p (current_function_decl, dest)) { encountered_recursive_call = true; if (!callsite_has_enough_arguments_p (stmt)) @@ -4906,6 +4906,16 @@ modify_function (struct cgraph_node *nod return cfg_changed; } +/* If NODE has a caller, return true. */ + +static bool +has_caller_p (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED) +{ + if (node->callers) +return true; + return false; +} + /* Return false the function is apparently unsuitable for IPA-SRA based on it's attributes, return true otherwise. NODE is the cgraph node of the current function. */ @@ -4949,7 +4959,7 @@ ipa_sra_preliminary_function_checks (str return false; } - if (!node->callers) + if (!cgraph_for_node_and_aliases (node, has_caller_p, NULL, true)) { if (dump_file) fprintf (dump_file,
RFA: Fix mark_referenced_resources to handle COND_EXEC
We found that gethostbyname_r from uClibc got miscompiled for ARC because reorg thought a COND_EXEC unconditionally kills all the conditionally set registers. Making it think that the registers are not killed is no good either, because then we would miss anti-dependencies. So, short of adding some complex predicate-tracing across all paths, it is best to consider the result registers to be used before set. bootstrapped / regtested on i686-pc-linux-gnu. Well, that's not a particular relevant port since it doesn't combine delay_slots with COND_EXEC, but then, only ARC does, and it's still waiting for review. And an arc-linux-uclibc system can't boot properly without this patch being applied to the compiler, and you have to boot the system before doing a compiler bootstrap... and the later is also a bit uncommon for embedded targets. This is also the reason I placed the test into gcc.target; in principle the test could also live in gcc.dg, but it would only be relevant to ARC. OK to apply? gcc: 2013-04-30 Joern Rennecke * resource.c (mark_referenced_resources): Handle COND_EXEC. Index: resource.c === --- resource.c (revision 202295) +++ resource.c (working copy) @@ -374,6 +374,16 @@ mark_referenced_resources (rtx x, struct case INSN: case JUMP_INSN: + if (GET_CODE (PATTERN (x)) == COND_EXEC) + /* In addition to the usual references, also consider all outputs +as referenced, to compensate for mark_set_resources treating +them as killed. This is similar to ZERO_EXTRACT / STRICT_LOW_PART +handling, execpt that we got a partial incidence instead of a partial +width. */ + mark_set_resources (x, res, 0, + include_delayed_effects + ? MARK_SRC_DEST_CALL : MARK_SRC_DEST); + #ifdef INSN_REFERENCES_ARE_DELAYED if (! include_delayed_effects && INSN_REFERENCES_ARE_DELAYED (x)) gcc/testsuite: 2013-08-31 Joern Rennecke * gcc.target/arc/cond-set-use.c: New test. diff --git a/gcc/testsuite/gcc.target/arc/cond-set-use.c b/gcc/testsuite/gcc.target/arc/cond-set-use.c new file mode 100644 index 000..aee2725 --- /dev/null +++ b/gcc/testsuite/gcc.target/arc/cond-set-use.c @@ -0,0 +1,128 @@ +/* { dg-do run } */ +/* { dg-options "-Os" } */ + +/* Based on gethostbyname_r, + * Copyright (C) 2000-2006 Erik Andersen + * + * Licensed under the LGPL v2.1, see the file COPYING.LIB + * + * Extraction / wrapping as test by + * Joern Rennecke + * Copyright (C) 2013 Free Software Foundation, Inc. + */ + +typedef unsigned size_t; +typedef int ssize_t; +typedef unsigned uint32_t; +struct resolv_answer { + char *dotted; + int atype; + int aclass; + int ttl; + int rdlength; + const unsigned char *rdata; + int rdoffset; + char* buf; + size_t buflen; + size_t add_count; +}; +struct hostent +{ + char *h_name; + char **h_aliases; + int h_addrtype; + int h_length; + char **h_addr_list; +}; + +int *__attribute__ ((noinline,weak)) nop (void * p) { return p; }; +void __attribute__ ((noinline,weak)) seta (struct resolv_answer * p) +{ p->atype = 1;} + +int ghostbyname_r( + struct hostent *result_buf, + char *buf, + size_t buflen, + struct hostent **result, + int *h_errnop) +{ + char **addr_list; + char **alias; + char *alias0; + int i0; + struct resolv_answer a; + int i; + + *result = ((void *)0); + + *h_errnop = -1; + + if ((ssize_t)buflen <= 5) + return 34; + + alias = (char **)buf; + addr_list = (char **)buf; + + /* This got turned into branch with conditional move in delay slot. */ + if ((ssize_t)buflen < 256) + return 34; + + + { + if (!nop(&i0)) { + result_buf->h_aliases = alias; + result_buf->h_addrtype = 2; + result_buf->h_length = 4; + result_buf->h_addr_list = addr_list; + *result = result_buf; + *h_errnop = 0; + return 0; + } + } + + + seta (&a); + + if (a.atype == 1) { + + int need_bytes = sizeof(addr_list[0]) * (a.add_count + 1 + 1); + + int ips_len = a.add_count * a.rdlength; + + buflen -= (need_bytes + ips_len); + if ((ssize_t)buflen < 0) { + i = 34; + goto free_and_ret; + } + + result_buf->h_addrtype = 2; + *result = result_buf; + *h_errnop = 0; + i = 0; + goto free_and_ret; + } + + /* For cse, the 1 was is loaded into a call-saved register; +the load was hoisted into a delay slot before the conditional load, +clobbering result_buf, which (conditionally) lived in the same +call-saved register, because mark_referenced_resources considered the +destination of the COND_EXEC only clobbered, but not used. */ + *h_errnop = 1; + *nop(&i0) = 1; + i = 2; + + free_and_ret: + nop (&i0); + return i; +} + +int +main () +{ + struct hostent buf, *res; + int i; + char c; + ghostbyname_r (&buf, &c, 1024, &res, &i); + ghostbyname_r (&buf, 0, 1024, &res, &i); + return 0; +}
RE: [PATCH, PR 57748] Check for out of bounds access
On Fri, 6 Sep 2013 11:30:53, Jakub Jelinek wrote: > On Fri, Sep 06, 2013 at 11:28:44AM +0200, Eric Botcazou wrote: >>> this patch fixes the ICE and wrong code issues from PR57748. It contains two >>> test cases. One for the ICE and one for the wrong code. Both test cases >>> fail for all gcc versions, and are fixed with this patch. >>> >>> Bootstrapped and tested on x86_64-linux without any problems. >>> >>> OK for trunk and the 4.8 / 4.7 branch? >> >> Please avoid touching the 4.7 branch at this point, this is a sensitive area >> and the testcases look quite artificial to me. > > Even for 4.8 it needs at least several weeks on the trunk before considering > backports of such code. > > Jakub That's true, the examples are quite artificial, because it is difficult and quite target-dependent, if there will be a movmisalign optab at all. And the original test case will no longer reproduce on trunk because of this change: 2013-08-08 Bernd Edlinger PR target/58065 * config/arm/arm.h (MALLOC_ABI_ALIGNMENT): Define. maybe we should first backport this one. Yesterday someone did that on the linaro branch, but maybe /branches/gcc-4_8-branch should be updated as well? Bernd.
Fix ICE with -O0 -fdevirtualize
Hi, this patch fixes ICE with -O0 -fdevirtualize where we try to access type inheritance graph that does not exist. Bootstrapped/regtested x86_64-linux, comitted. 2013-09-06 Jan Hubicka PR tree-optimization/58311 * ipa-devirt.c (gate_ipa_devirt): Only execute when optimizing. Index: ipa-devirt.c === --- ipa-devirt.c(revision 202315) +++ ipa-devirt.c(working copy) @@ -1114,9 +1114,7 @@ ipa_devirt (void) static bool gate_ipa_devirt (void) { - /* FIXME: We should remove the optimize check after we ensure we never run - IPA passes when not optimizing. */ - return flag_devirtualize && !in_lto_p; + return flag_devirtualize && !in_lto_p && optimize; } namespace {
[PATCH] Adjust control dependence delete
To be slightly more obvious now that it is a vec ... Committed. Richard. 2013-09-06 Richard Biener * cfganal.c (control_dependences::~control_dependences): Properly free all of the vector. Index: gcc/cfganal.c === --- gcc/cfganal.c (revision 202316) +++ gcc/cfganal.c (working copy) @@ -431,7 +431,7 @@ control_dependences::control_dependences control_dependences::~control_dependences () { - for (int i = 0; i < last_basic_block; ++i) + for (unsigned i = 0; i < control_dependence_map.length (); ++i) BITMAP_FREE (control_dependence_map[i]); control_dependence_map.release (); free_edge_list (el);
Re: [i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.
Hello, On 06 Sep 11:29, Jakub Jelinek wrote: > On Fri, Sep 06, 2013 at 11:28:53AM +0200, Uros Bizjak wrote: > > This is OK. > > But please leave out gcc/ prefix from the ChangeLog entry. Thanks, checked into main trunk: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00181.html with fixed ChangeLog. -- Thanks, K
[AArch64] Fix types of second parameter to qtbl/qtbx intrinsics
Hi, The signed variants of the qtbl and qtbx intrinsics currently take an int8x<8,16> for their control vector parameter. This should be a uint8x<8,16> parameter. Fixed as attached and checked against aarch64.exp on aarch64-none-elf with no regressions. Is this OK to commit? I have some similair patches kicking around in my tree, these feel obvious, but I'd like to check that others' share that perspective before I go committing anything! Thanks, James --- gcc/ 2013-09-06 James Greenhalgh * config/aarch64/arm_neon.h (vqtbl<1,2,3,4>_s8): Fix control vector parameter type. (vqtbx<1,2,3,4>_s8): Likewise. gcc/testsuite/ 2013-09-06 James Greenhalgh * gcc.target/aarch64/table-intrinsics.c (qtbl_tests8_< ,2,3,4>): Fix control vector parameter type. (qtb_tests8_< ,2,3,4>): Likewise. (qtblq_tests8_< ,2,3,4>): Likewise. (qtbxq_tests8_< ,2,3,4>): Likewise. diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index e20d34e..5864f2c 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -15973,7 +15973,7 @@ vqtbl1_p8 (poly8x16_t a, uint8x8_t b) } __extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) -vqtbl1_s8 (int8x16_t a, int8x8_t b) +vqtbl1_s8 (int8x16_t a, uint8x8_t b) { int8x8_t result; __asm__ ("tbl %0.8b, {%1.16b}, %2.8b" @@ -16006,7 +16006,7 @@ vqtbl1q_p8 (poly8x16_t a, uint8x16_t b) } __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) -vqtbl1q_s8 (int8x16_t a, int8x16_t b) +vqtbl1q_s8 (int8x16_t a, uint8x16_t b) { int8x16_t result; __asm__ ("tbl %0.16b, {%1.16b}, %2.16b" @@ -16028,7 +16028,7 @@ vqtbl1q_u8 (uint8x16_t a, uint8x16_t b) } __extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) -vqtbl2_s8 (int8x16x2_t tab, int8x8_t idx) +vqtbl2_s8 (int8x16x2_t tab, uint8x8_t idx) { int8x8_t result; __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t" @@ -16064,7 +16064,7 @@ vqtbl2_p8 (poly8x16x2_t tab, uint8x8_t idx) } __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) -vqtbl2q_s8 (int8x16x2_t tab, int8x16_t idx) +vqtbl2q_s8 (int8x16x2_t tab, uint8x16_t idx) { int8x16_t result; __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t" @@ -16100,7 +16100,7 @@ vqtbl2q_p8 (poly8x16x2_t tab, uint8x16_t idx) } __extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) -vqtbl3_s8 (int8x16x3_t tab, int8x8_t idx) +vqtbl3_s8 (int8x16x3_t tab, uint8x8_t idx) { int8x8_t result; __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t" @@ -16136,7 +16136,7 @@ vqtbl3_p8 (poly8x16x3_t tab, uint8x8_t idx) } __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) -vqtbl3q_s8 (int8x16x3_t tab, int8x16_t idx) +vqtbl3q_s8 (int8x16x3_t tab, uint8x16_t idx) { int8x16_t result; __asm__ ("ld1 {v16.16b - v18.16b}, %1\n\t" @@ -16172,7 +16172,7 @@ vqtbl3q_p8 (poly8x16x3_t tab, uint8x16_t idx) } __extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) -vqtbl4_s8 (int8x16x4_t tab, int8x8_t idx) +vqtbl4_s8 (int8x16x4_t tab, uint8x8_t idx) { int8x8_t result; __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t" @@ -16209,7 +16209,7 @@ vqtbl4_p8 (poly8x16x4_t tab, uint8x8_t idx) __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) -vqtbl4q_s8 (int8x16x4_t tab, int8x16_t idx) +vqtbl4q_s8 (int8x16x4_t tab, uint8x16_t idx) { int8x16_t result; __asm__ ("ld1 {v16.16b - v19.16b}, %1\n\t" @@ -16246,7 +16246,7 @@ vqtbl4q_p8 (poly8x16x4_t tab, uint8x16_t idx) __extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) -vqtbx1_s8 (int8x8_t r, int8x16_t tab, int8x8_t idx) +vqtbx1_s8 (int8x8_t r, int8x16_t tab, uint8x8_t idx) { int8x8_t result = r; __asm__ ("tbx %0.8b,{%1.16b},%2.8b" @@ -16279,7 +16279,7 @@ vqtbx1_p8 (poly8x8_t r, poly8x16_t tab, uint8x8_t idx) } __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) -vqtbx1q_s8 (int8x16_t r, int8x16_t tab, int8x16_t idx) +vqtbx1q_s8 (int8x16_t r, int8x16_t tab, uint8x16_t idx) { int8x16_t result = r; __asm__ ("tbx %0.16b,{%1.16b},%2.16b" @@ -16312,7 +16312,7 @@ vqtbx1q_p8 (poly8x16_t r, poly8x16_t tab, uint8x16_t idx) } __extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) -vqtbx2_s8 (int8x8_t r, int8x16x2_t tab, int8x8_t idx) +vqtbx2_s8 (int8x8_t r, int8x16x2_t tab, uint8x8_t idx) { int8x8_t result = r; __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t" @@ -16349,7 +16349,7 @@ vqtbx2_p8 (poly8x8_t r, poly8x16x2_t tab, uint8x8_t idx) __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) -vqtbx2q_s8 (int8x16_t r, int8x16x2_t tab, int8x16_t idx) +vqtbx2q_s8 (int8x16_t r, int8x16x2_t tab, uint8x16_t idx) { int8x16_t result = r; __asm__ ("ld1 {v16.16b, v17.16b}, %1\n\t" @@ -16386,7 +16386,7 @@ vqtbx2q_p8 (poly8x16_t r, poly8x16x2_t tab, uint8x16_t idx) __extensio
Re: [Patch AArch64] Fix register constraints for lane intrinsics.
On 6 September 2013 09:18, James Greenhalgh wrote: > > Hi, > > Most of the vector-by-element instructions in AArch64 have the restriction > that, if the vector they are taking an element from has type "h" > then it must be in a register from the lower half of the vector register > set (i.e. v0-v15). While we have imposed that restriction in places, we > have not been consistent. > > Fix that. > > Tested with aarch64.exp with no regressions. > > OK for trunk? OK /Marcus
Re: [Patch, AArch64] Fix vdup_lane_* intrinsics' lane parameter.
On 5 September 2013 17:21, Tejas Belagod wrote: > > Hi, > > This patch fixes vdup_lane_* intrinsics in arm_neon.h to have the > correct lane parameter as opposed to the present '0'. > > Tested on aarch64-none-elf. OK for trunk? > > Thanks, > Tejas Belagod > ARM. > > Changelog: > > 2013-09-05 Tejas Belagod OK /Marcus
Re: [AArch64] Fix types of second parameter to qtbl/qtbx intrinsics
On 6 September 2013 11:45, James Greenhalgh wrote: > The signed variants of the qtbl and qtbx intrinsics currently > take an int8x<8,16> for their control vector parameter. > This should be a uint8x<8,16> parameter. > > Fixed as attached and checked against aarch64.exp on aarch64-none-elf > with no regressions. > > Is this OK to commit? This is OK. /Marcus
Re: [Patch ARM] Add "type" attribute to Everything!
On 05/09/13 17:44, James Greenhalgh wrote: > > This patch changes the default type to be 'untyped' and then adds > type information to everything in the ARM backend. > > To do this, we introduce three new types: > > * "multiple" which is used where an insn will be split, or > where it will generate a sequence of more than one instruction. > * "no_insn" which is used where an insn will not generate an > instruction in the final output. > * "untyped" which is used where no type has been given, and > is the default type. > > Then we add these types everywhere appropriate in the arm backend. > > Checked and bootstrapped on arm-none-eabi and arm-none-linux-gnueabihf > with no issues. > > Thanks, > James > > --- > gcc/ > > 2013-09-05 James Greenhalgh > > * config/arm/types.md: Add "no_insn", "multiple" and "untyped" > types. > * config/arm/arm-fixed.md: Add type attribute to all insn > patterns. > (add3): Add type attribute. > (add3): Likewise. > (usadd3): Likewise. > (ssadd3): Likewise. > (sub3): Likewise. > (sub3): Likewise. > (ussub3): Likewise. > (sssub3): Likewise. > (ssmulsa3): Likewise. > (usmulusa3): Likewise. > (arm_usatsihi): Likewise. > * config/arm/vfp.md > (*movdi_vfp): Add types for all instructions. > (*movdi_vfp_cortexa8): Likewise. > (*movhf_vfp_neon): Likewise. > (*movhf_vfp): Likewise. > (*movdf_vfp): Likewise. > (*thumb2_movdf_vfp): Likewise. > (*thumb2_movdfcc_vfp): Likewise. > * config/arm/arm.md: Add type attribute to all insn patterns. > (*thumb1_adddi3): Add type attribute. > (*arm_adddi3): Likewise. > (*adddi_sesidi_di): Likewise. > (*adddi_zesidi_di): Likewise. > (*thumb1_addsi3): Likewise. > (addsi3_compare0): Likewise. > (*addsi3_compare0_scratch): Likewise. > (*compare_negsi_si): Likewise. > (cmpsi2_addneg): Likewise. > (*addsi3_carryin_): Likewise. > (*addsi3_carryin_alt2_): Likewise. > (*addsi3_carryin_clobercc_): Likewise. > (*subsi3_carryin): Likewise. > (*subsi3_carryin_const): Likewise. > (*subsi3_carryin_compare): Likewise. > (*subsi3_carryin_compare_const): Likewise. > (*arm_subdi3): Likewise. > (*thumb_subdi3): Likewise. > (*subdi_di_zesidi): Likewise. > (*subdi_di_sesidi): Likewise. > (*subdi_zesidi_di): Likewise. > (*subdi_sesidi_di): Likewise. > (*subdi_zesidi_ze): Likewise. > (thumb1_subsi3_insn): Likewise. > (*arm_subsi3_insn): Likewise. > (*anddi3_insn): Likewise. > (*anddi_zesidi_di): Likewise. > (*anddi_sesdi_di): Likewise. > (*ne_zeroextracts): Likewise. > (*ne_zeroextracts): Likewise. > (*ite_ne_zeroextr): Likewise. > (*ite_ne_zeroextr): Likewise. > (*anddi_notdi_di): Likewise. > (*anddi_notzesidi): Likewise. > (*anddi_notsesidi): Likewise. > (andsi_notsi_si): Likewise. > (thumb1_bicsi3): Likewise. > (*iordi3_insn): Likewise. > (*iordi_zesidi_di): Likewise. > (*iordi_sesidi_di): Likewise. > (*thumb1_iorsi3_insn): Likewise. > (*xordi3_insn): Likewise. > (*xordi_zesidi_di): Likewise. > (*xordi_sesidi_di): Likewise. > (*arm_xorsi3): Likewise. > (*andsi_iorsi3_no): Likewise. > (*smax_0): Likewise. > (*smax_m1): Likewise. > (*arm_smax_insn): Likewise. > (*smin_0): Likewise. > (*arm_smin_insn): Likewise. > (*arm_umaxsi3): Likewise. > (*arm_uminsi3): Likewise. > (*minmax_arithsi): Likewise. > (*minmax_arithsi_): Likewise. > (*satsi_): Likewise. > (arm_ashldi3_1bit): Likewise. > (arm_ashrdi3_1bit): Likewise. > (arm_lshrdi3_1bit): Likewise. > (*arm_negdi2): Likewise. > (*thumb1_negdi2): Likewise. > (*arm_negsi2): Likewise. > (*thumb1_negsi2): Likewise. > (*negdi_extendsid): Likewise. > (*negdi_zero_extend): Likewise. > (*arm_abssi2): Likewise. > (*thumb1_abssi2): Likewise. > (*arm_neg_abssi2): Likewise. > (*thumb1_neg_abss): Likewise. > (one_cmpldi2): Likewise. > (extenddi2): Likewise. > (*compareqi_eq0): Likewise. > (*arm_extendhisi2addsi): Likewise. > (*arm_movdi): Likewise. > (*thumb1_movdi_insn): Likewise. > (*arm_movt): Likewise. > (*thumb1_movsi_insn): Likewise. > (pic_add_dot_plus_four): Likewise. > (pic_add_dot_plus_eight): Likewise. > (tls_load_dot_plus_eight): Likewise. > (*thumb1_movhi_insn): Likewise. > (*thumb1_movsf_insn): Likewise. > (*movdf_soft_insn): Likewise. > (*thumb_movdf_insn): Likewise. > (cbranchsi4_insn): Likewise. > (cbranchsi4_scratch): Likewise. > (*negated_cbranchsi4): Likewise. > (*tbit_cbranch): Likewise. > (*tlobits_cbranch): Likewise. > (*
Re: RFC - Next refactoring steps
Hi, On Thu, 5 Sep 2013, Andrew MacLeod wrote: > 1 - I think we ought to split out the data structures from gimple.h into > gimple-core.h, like we did with tree.h Why? > gimple.h. That won't really affect my work. I think it probably ought to be > done for clarity eventually.gimple.h would then simply become a collector > of "gimple-blah.h" files which are required for a complete gimple system. Doesn't make sense to me. If you split it into multiple files, just to then create a wrapper called gimple.h including all of them again for the consumers you can just as well have all of it in one file. > 2 - all the uses of TREE_NULL to refer to an empty/nonexistent object... it > is no longer in tree-core.h, which I think is correct. what should we start > using instead in converted files? options are: > a) 0 > b) NULL > c) something we define as 0, like GIMPLE_NULL I think I'd prefer 0, but it creates a problem with varargs functions, so you'd always need "(gimple)0" for those, which you'd probably hide behind a macro, at which point you'd have two forms for the nil gimple ("0" and that macro), for consistency you'd want to use the macro in place of "0" also in the non-varargs places, et voila, you're back to "NULL" or "NULL_GIMPLE" :-/ > 3) remove tree.h from other .h files > Now that tree.h is split, there are a few .h files which directly include > tree.h themselves. It would be nice if we could remove the implementation > requirement of tree.h to ease moving to gimple.h. The 4 files are : > ipa-utils.h lto-streamer.h streamer-hooks.h tree-streamer.h > It should be possible to not directly include tree.h itself in these files. > Worst case, the header file is included after tree.h from the .C files.. that > seems to be the way most of the other include files avoid including tree.h > directly. That can be done right now I think. For the rest of the topic, tree vs gimple: I don't see much value in that. Why create a gimple_expr type that basically is just tree? You can as well use tree then. Ciao, Michael.
[ARM,AARCH64] Insn type reclassification. Split f_cvt type.
This patch splits the f_cvt attribute to: * f_cvt conversions between float representations. * f_cvti2f conversions from int to float. * f_cvtf2i conversions from float to int. Then we update the pipeline descriptions to refelct this change. Regression tested for aarch64-none-elf and arm-none-eabi and sanity checked. Bootstrapped in series with other type splitting patches. OK? Thanks, James --- 2013-09-06 James Greenhalgh * config/arm/types.md (type): Split f_cvt as f_cvt, f_cvtf2i, f_cvti2f. * config/aarch64/aarch64.md (l2): Update with new attributes. (fix_trunc2): Likewise. (fixuns_trunc2): Likewise. (float2): Likewise. * config/arm/vfp.md (*truncsisf2_vfp): Update with new attributes. (*truncsidf2_vfp): Likewise. (fixuns_truncsfsi2): Likewise. (fixuns_truncdfsi2): Likewise. (*floatsisf2_vfp): Likewise. (*floatsidf2_vfp): Likewise. (floatunssisf2): Likewise. (floatunssidf2): Likewise. (*combine_vcvt_f32_): Likewise. (*combine_vcvt_f64_): Likewise. * config/arm/arm1020e.md: Update with new attributes. * config/arm/cortex-a15-neon.md: Update with new attributes. * config/arm/cortex-a5.md: Update with new attributes. * config/arm/cortex-a53.md: Update with new attributes. * config/arm/cortex-a7.md: Update with new attributes. * config/arm/cortex-a8-neon.md: Update with new attributes. * config/arm/cortex-a9.md: Update with new attributes. * config/arm/cortex-m4-fpu.md: Update with new attributes. * config/arm/cortex-r4f.md: Update with new attributes. * config/arm/marvell-pj4.md: Update with new attributes. * config/arm/vfp11.md: Update with new attributes. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 4dfd2ab83d00601dc8192ad47fec2c1e404d1264..6a4a975bb89c48311659db0091c76266d29cdba2 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3685,7 +3685,7 @@ (define_insn "l< "TARGET_FLOAT" "fcvt\\t%0, %1" [(set_attr "v8type" "fcvtf2i") - (set_attr "type" "f_cvt") + (set_attr "type" "f_cvtf2i") (set_attr "mode" "") (set_attr "mode2" "")] ) @@ -3785,7 +3785,7 @@ (define_insn "fix_trunc0, %1" [(set_attr "v8type" "fcvtf2i") - (set_attr "type" "f_cvt") + (set_attr "type" "f_cvtf2i") (set_attr "mode" "") (set_attr "mode2" "")] ) @@ -3796,7 +3796,7 @@ (define_insn "fixuns_trunc0, %1" [(set_attr "v8type" "fcvtf2i") - (set_attr "type" "f_cvt") + (set_attr "type" "f_cvtf2i") (set_attr "mode" "") (set_attr "mode2" "")] ) @@ -3807,7 +3807,7 @@ (define_insn "float2 "TARGET_FLOAT" "scvtf\\t%0, %1" [(set_attr "v8type" "fcvti2f") - (set_attr "type" "f_cvt") + (set_attr "type" "f_cvti2f") (set_attr "mode" "") (set_attr "mode2" "")] ) diff --git a/gcc/config/arm/arm1020e.md b/gcc/config/arm/arm1020e.md index 615c6a5b16de647cbd8c0fa947f8b763a1353ee3..e16e862c1f49b36f75ba1faf20c2095fb9aeacdf 100644 --- a/gcc/config/arm/arm1020e.md +++ b/gcc/config/arm/arm1020e.md @@ -289,7 +289,7 @@ (define_insn_reservation "v10_farith" 5 (define_insn_reservation "v10_cvt" 5 (and (eq_attr "vfp10" "yes") - (eq_attr "type" "f_cvt")) + (eq_attr "type" "f_cvt,f_cvti2f,f_cvtf2i")) "1020a_e+v10_fmac") (define_insn_reservation "v10_fmul" 6 diff --git a/gcc/config/arm/cortex-a15-neon.md b/gcc/config/arm/cortex-a15-neon.md index f1cac9e1af88bd5e3f0d87ff50c44376ad82d441..b5d14e7f7f9c3965e02e0d6e0edf0044df341812 100644 --- a/gcc/config/arm/cortex-a15-neon.md +++ b/gcc/config/arm/cortex-a15-neon.md @@ -471,7 +471,7 @@ (define_insn_reservation "cortex_a15_vfp (define_insn_reservation "cortex_a15_vfp_cvt" 6 (and (eq_attr "tune" "cortexa15") - (eq_attr "type" "f_cvt")) + (eq_attr "type" "f_cvt,f_cvtf2i,f_cvti2f")) "ca15_issue1,ca15_cx_vfp") (define_insn_reservation "cortex_a15_vfp_cmpd" 8 diff --git a/gcc/config/arm/cortex-a5.md b/gcc/config/arm/cortex-a5.md index 8930baf8daff5be2d2872324cd41fd5a1cd03778..54c8c420324a155523bc961917c475c5aeb86a96 100644 --- a/gcc/config/arm/cortex-a5.md +++ b/gcc/config/arm/cortex-a5.md @@ -168,7 +168,8 @@ (define_insn_reservation "cortex_a5_bran (define_insn_reservation "cortex_a5_fpalu" 4 (and (eq_attr "tune" "cortexa5") - (eq_attr "type" "ffariths, fadds, ffarithd, faddd, fcpys, fmuls, f_cvt,\ + (eq_attr "type" "ffariths, fadds, ffarithd, faddd, fcpys, fmuls,\ +f_cvt,f_cvtf2i,f_cvti2f,\ fcmps, fcmpd")) "cortex_a5_ex1+cortex_a5_fpadd_pipe") diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md index 66d4cb436f5fb43b545f94ac57a8c5e909360353..e84b9ea1a71ef1df2476d3b25900522469074914 100644 --- a/gcc/config/arm/cortex-a53.md +++ b/gcc/config/arm/cortex-a53.md @@ -209,7 +209,8 @@ (define_insn_reservation "cortex_a53_bra (define_
[Patch ARM AARCH64] Split "type" attributes: fdiv
Hi, The type attributes "fdivs,fdivd" should be split as: fdivs -> fsqrts, fdivs fdivd -> fsqrtd, fdivd Do this and update pipelines as needed. Regression tested on aarch64-none-elf and arm-none-eabi and bootstrapped in series with other type splitting patches. OK? Thanks, James --- 2013-09-06 James Greenhalgh * config/arm/types.md: Split fdiv as fsqrt, fdiv. * config/arm/arm.md (core_cycles): Remove fdiv. * config/arm/vfp.md: (*sqrtsf2_vfp): Update for attribute changes. (*sqrtdf2_vfp): Likewise. * config/aarch64/aarch64.md: (sqrt2): Update for attribute changes. * config/arm/arm1020e.md: Update with new attributes. * config/arm/cortex-a15-neon.md: Update with new attributes. * config/arm/cortex-a5.md: Update with new attributes. * config/arm/cortex-a53.md: Update with new attributes. * config/arm/cortex-a7.md: Update with new attributes. * config/arm/cortex-a8-neon.md: Update with new attributes. * config/arm/cortex-a9.md: Update with new attributes. * config/arm/cortex-m4-fpu.md: Update with new attributes. * config/arm/cortex-r4f.md: Update with new attributes. * config/arm/marvell-pj4.md: Update with new attributes. * config/arm/vfp11.md: Update with new attributes. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 6a4a975bb89c48311659db0091c76266d29cdba2..ded37efb4c86130af8dd82db66d50cc227bfeff0 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3903,7 +3903,7 @@ (define_insn "sqrt2" "TARGET_FLOAT" "fsqrt\\t%0, %1" [(set_attr "v8type" "fsqrt") - (set_attr "type" "fdiv") + (set_attr "type" "fsqrt") (set_attr "mode" "")] ) diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 5ed8ee7dc6293bf93869545bef4cd3f60966908b..6c0fbf44288c9f6e077fe2d9836cd5c1e2042a0a 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -335,7 +335,6 @@ (define_attr "core_cycles" "single,multi alus_shift_imm, alus_shift_reg, bfm, csel, rev, logic_imm, logic_reg,\ logic_shift_imm, logic_shift_reg, logics_imm, logics_reg,\ logics_shift_imm, logics_shift_reg, extend, shift_imm, float, fcsel,\ -fdivd, fdivs,\ wmmx_wor, wmmx_wxor, wmmx_wand, wmmx_wandn, wmmx_wmov, wmmx_tmcrr,\ wmmx_tmrrc, wmmx_wldr, wmmx_wstr, wmmx_tmcr, wmmx_tmrc, wmmx_wadd,\ wmmx_wsub, wmmx_wmul, wmmx_wmac, wmmx_wavg2, wmmx_tinsr, wmmx_textrm,\ diff --git a/gcc/config/arm/arm1020e.md b/gcc/config/arm/arm1020e.md index e16e862c1f49b36f75ba1faf20c2095fb9aeacdf..8cf0890d9300527962b14f60e08c190155616425 100644 --- a/gcc/config/arm/arm1020e.md +++ b/gcc/config/arm/arm1020e.md @@ -299,12 +299,12 @@ (define_insn_reservation "v10_fmul" 6 (define_insn_reservation "v10_fdivs" 18 (and (eq_attr "vfp10" "yes") - (eq_attr "type" "fdivs")) + (eq_attr "type" "fdivs, fsqrts")) "1020a_e+v10_ds*14") (define_insn_reservation "v10_fdivd" 32 (and (eq_attr "vfp10" "yes") - (eq_attr "type" "fdivd")) + (eq_attr "type" "fdivd, fsqrtd")) "1020a_e+v10_fmac+v10_ds*28") (define_insn_reservation "v10_floads" 4 diff --git a/gcc/config/arm/cortex-a15-neon.md b/gcc/config/arm/cortex-a15-neon.md index b5d14e7f7f9c3965e02e0d6e0edf0044df341812..057507a762ab546e37b4a32a9771b4098a693d55 100644 --- a/gcc/config/arm/cortex-a15-neon.md +++ b/gcc/config/arm/cortex-a15-neon.md @@ -501,12 +501,12 @@ (define_insn_reservation "cortex_a15_vfp (define_insn_reservation "cortex_a15_vfp_divs" 10 (and (eq_attr "tune" "cortexa15") - (eq_attr "type" "fdivs")) + (eq_attr "type" "fdivs, fsqrts")) "ca15_issue1,ca15_cx_ik") (define_insn_reservation "cortex_a15_vfp_divd" 18 (and (eq_attr "tune" "cortexa15") - (eq_attr "type" "fdivd")) + (eq_attr "type" "fdivd, fsqrtd")) "ca15_issue1,ca15_cx_ik") ;; Define bypasses. diff --git a/gcc/config/arm/cortex-a5.md b/gcc/config/arm/cortex-a5.md index 54c8c420324a155523bc961917c475c5aeb86a96..03d3cc99106f4e8875b649b06dbd1341f18a5f55 100644 --- a/gcc/config/arm/cortex-a5.md +++ b/gcc/config/arm/cortex-a5.md @@ -233,14 +233,14 @@ (define_insn_reservation "cortex_a5_fpma (define_insn_reservation "cortex_a5_fdivs" 14 (and (eq_attr "tune" "cortexa5") - (eq_attr "type" "fdivs")) + (eq_attr "type" "fdivs, fsqrts")) "cortex_a5_ex1, cortex_a5_fp_div_sqrt * 13") ;; ??? Similarly for fdivd. (define_insn_reservation "cortex_a5_fdivd" 29 (and (eq_attr "tune" "cortexa5") - (eq_attr "type" "fdivd")) + (eq_attr "type" "fdivd, fsqrtd")) "cortex_a5_ex1, cortex_a5_fp_div_sqrt * 28") diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md index e84b9ea1a71ef1df2476d3b25900522469074914..e52f4377c62f217172c2f0b88af724f59ed7cef0 100644 --- a/gcc/config/arm/cortex-a53.md +++ b/gcc/config/arm/cortex-a53
[Patch AArch64] Fix types for some multiply instructions.
Hi, We don't really need to split the types on these instructions. The ARM backend already has suitable descriptions of things like mla and smlal. Use them. Regression tested on aarch64-none-elf with no regressions. OK? Thanks, James --- 2013-09-06 James Greenhalgh * config/aarch64/aarch64.md (*madd): Fix type attribute. (*maddsi_uxtw): Likewise. (*msub): Likewise. (*msubsi_uxtw): Likewise. (maddsidi4): Likewise. (msubsidi4): Likewise. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index ded37efb4c86130af8dd82db66d50cc227bfeff0..e28764da5dd608259098d3150783e6eacd09be27 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2281,7 +2281,7 @@ (define_insn "*madd" "" "madd\\t%0, %1, %2, %3" [(set_attr "v8type" "madd") - (set_attr "type" "mul") + (set_attr "type" "mla") (set_attr "mode" "")] ) @@ -2295,7 +2295,7 @@ (define_insn "*maddsi_uxtw" "" "madd\\t%w0, %w1, %w2, %w3" [(set_attr "v8type" "madd") - (set_attr "type" "mul") + (set_attr "type" "mla") (set_attr "mode" "SI")] ) @@ -2308,7 +2308,7 @@ (define_insn "*msub" "" "msub\\t%0, %1, %2, %3" [(set_attr "v8type" "madd") - (set_attr "type" "mul") + (set_attr "type" "mla") (set_attr "mode" "")] ) @@ -2323,7 +2323,7 @@ (define_insn "*msubsi_uxtw" "" "msub\\t%w0, %w1, %w2, %w3" [(set_attr "v8type" "madd") - (set_attr "type" "mul") + (set_attr "type" "mla") (set_attr "mode" "SI")] ) @@ -2373,7 +2373,7 @@ (define_insn "maddsidi4" "" "maddl\\t%0, %w1, %w2, %3" [(set_attr "v8type" "maddl") - (set_attr "type" "mul") + (set_attr "type" "mlal") (set_attr "mode" "DI")] ) @@ -2387,7 +2387,7 @@ (define_insn "msubsidi4" "" "msubl\\t%0, %w1, %w2, %3" [(set_attr "v8type" "maddl") - (set_attr "type" "mul") + (set_attr "type" "mlal") (set_attr "mode" "DI")] )
Re: [ARM,AARCH64] Insn type reclassification. Split f_cvt type.
On 06/09/13 14:24, James Greenhalgh wrote: > > This patch splits the f_cvt attribute to: > > * f_cvt conversions between float representations. > * f_cvti2f conversions from int to float. > * f_cvtf2i conversions from float to int. > > Then we update the pipeline descriptions to refelct this change. > > Regression tested for aarch64-none-elf and arm-none-eabi and sanity > checked. Bootstrapped in series with other type splitting patches. > > OK? > > Thanks, > James > > --- > 2013-09-06 James Greenhalgh > > * config/arm/types.md > (type): Split f_cvt as f_cvt, f_cvtf2i, f_cvti2f. > * config/aarch64/aarch64.md > (l2): Update with > new attributes. > (fix_trunc2): Likewise. > (fixuns_trunc2): Likewise. > (float2): Likewise. > * config/arm/vfp.md > (*truncsisf2_vfp): Update with new attributes. > (*truncsidf2_vfp): Likewise. > (fixuns_truncsfsi2): Likewise. > (fixuns_truncdfsi2): Likewise. > (*floatsisf2_vfp): Likewise. > (*floatsidf2_vfp): Likewise. > (floatunssisf2): Likewise. > (floatunssidf2): Likewise. > (*combine_vcvt_f32_): Likewise. > (*combine_vcvt_f64_): Likewise. > * config/arm/arm1020e.md: Update with new attributes. > * config/arm/cortex-a15-neon.md: Update with new attributes. > * config/arm/cortex-a5.md: Update with new attributes. > * config/arm/cortex-a53.md: Update with new attributes. > * config/arm/cortex-a7.md: Update with new attributes. > * config/arm/cortex-a8-neon.md: Update with new attributes. > * config/arm/cortex-a9.md: Update with new attributes. > * config/arm/cortex-m4-fpu.md: Update with new attributes. > * config/arm/cortex-r4f.md: Update with new attributes. > * config/arm/marvell-pj4.md: Update with new attributes. > * config/arm/vfp11.md: Update with new attributes. > > OK. R.
Re: [Patch ARM AARCH64] Split "type" attributes: fdiv
On 06/09/13 14:27, James Greenhalgh wrote: > > Hi, > > The type attributes "fdivs,fdivd" should be split as: > > fdivs -> fsqrts, fdivs > fdivd -> fsqrtd, fdivd > > Do this and update pipelines as needed. > > Regression tested on aarch64-none-elf and arm-none-eabi and > bootstrapped in series with other type splitting patches. > > OK? > > Thanks, > James > > --- > 2013-09-06 James Greenhalgh > > * config/arm/types.md: Split fdiv as fsqrt, fdiv. > * config/arm/arm.md (core_cycles): Remove fdiv. > * config/arm/vfp.md: > (*sqrtsf2_vfp): Update for attribute changes. > (*sqrtdf2_vfp): Likewise. > * config/aarch64/aarch64.md: > (sqrt2): Update for attribute changes. > * config/arm/arm1020e.md: Update with new attributes. > * config/arm/cortex-a15-neon.md: Update with new attributes. > * config/arm/cortex-a5.md: Update with new attributes. > * config/arm/cortex-a53.md: Update with new attributes. > * config/arm/cortex-a7.md: Update with new attributes. > * config/arm/cortex-a8-neon.md: Update with new attributes. > * config/arm/cortex-a9.md: Update with new attributes. > * config/arm/cortex-m4-fpu.md: Update with new attributes. > * config/arm/cortex-r4f.md: Update with new attributes. > * config/arm/marvell-pj4.md: Update with new attributes. > * config/arm/vfp11.md: Update with new attributes. > > OK. R.
Re: [PATCH, libvtv] Fix most of the testsuite.
On Thu, Sep 5, 2013 at 4:52 PM, Caroline Tice wrote: > Ping? Could somebody please review this for me? Mike already approved this upthread.
Re: [Patch AArch64] Fix types for some multiply instructions.
On 06/09/13 14:29, James Greenhalgh wrote: > > Hi, > > We don't really need to split the types on these > instructions. The ARM backend already has suitable descriptions > of things like mla and smlal. Use them. > > Regression tested on aarch64-none-elf with no regressions. > > OK? > > Thanks, > James > --- > 2013-09-06 James Greenhalgh > > * config/aarch64/aarch64.md > (*madd): Fix type attribute. > (*maddsi_uxtw): Likewise. > (*msub): Likewise. > (*msubsi_uxtw): Likewise. > (maddsidi4): Likewise. > (msubsidi4): Likewise. > OK. R.
Re: [PATCH i386 3/8] [AVX512] [1/n] Add AVX-512 patterns: VF iterator extended.
Hello, PING. -- Thanks, K
[AArch64, ARM] Rename the FCPYS type to FMOV
Hi, This patch updates the AArch64 backend such that floating point moves are correctly categorized with type "FMOV". Then in the ARM backend we rename "FCPYS" to "FMOV" everywhere where it is appropriate to do so. Regression tested on aarch64-none-elf and arm-none-eabi with no regressions. OK? Thanks, James --- gcc/ 2013-09-06 James Greenhalgh * config/arm/types.md (type): Rename fcpys to fmov. * config/arm/vfp.md (*arm_movsi_vfp): Rename type fcpys as fmov. (*thumb2_movsi_vfp): Likewise (*movhf_vfp_neon): Likewise (*movhf_vfp): Likewise (*movsf_vfp): Likewise (*thumb2_movsf_vfp): Likewise (*movsfcc_vfp): Likewise (*thumb2_movsfcc_vfp): Likewise * config/aarch64/aarch64-simd.md (move_lo_quad_): Replace type mov_reg with fmovs. * config/aarch64/aarch64.md (*movsi_aarch64): Replace type mov_reg with fmovs. (*movdi_aarch64): Likewise (*movsf_aarch64): Likewise (*movdf_aarch64): Likewise * config/arm/arm.c (cortexa7_older_only): Rename TYPE_FCPYS to TYPE_FMOV. * config/arm/iwmmxt.md (*iwmmxt_movsi_insn): Rename type fcpys as fmov. * config/arm/arm1020e.md: Update with new attributes. * config/arm/cortex-a15-neon.md: Update with new attributes. * config/arm/cortex-a5.md: Update with new attributes. * config/arm/cortex-a53.md: Update with new attributes. * config/arm/cortex-a7.md: Update with new attributes. * config/arm/cortex-a8-neon.md: Update with new attributes. * config/arm/cortex-a9.md: Update with new attributes. * config/arm/cortex-m4-fpu.md: Update with new attributes. * config/arm/cortex-r4f.md: Update with new attributes. * config/arm/marvell-pj4.md: Update with new attributes. * config/arm/vfp11.md: Update with new attributes. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index c085fb9c49958c5f402a28c0b39fe45ec1aadbc7..882fe4a19368d6ef6f9ef862dffce6d6307c5ac3 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1052,7 +1052,7 @@ (define_insn "move_lo_quad_" fmov\\t%d0, %1 dup\\t%d0, %1" [(set_attr "v8type" "*,fmov,*") - (set_attr "type" "*,mov_reg,*") + (set_attr "type" "*,fmov,*") (set_attr "simd_type" "simd_dup,*,simd_dup") (set_attr "simd_mode" "") (set_attr "simd" "yes,*,yes") diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e28764da5dd608259098d3150783e6eacd09be27..db6aa1d3fa15e17095ba26a64e020d098e9fa6c0 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -831,7 +831,7 @@ (define_insn "*movsi_aarch64" fmov\\t%s0, %s1" [(set_attr "v8type" "move,move,move,alu,load1,load1,store1,store1,adr,adr,fmov,fmov,fmov") (set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,load1,load1,store1,store1,\ - adr,adr,mov_reg,mov_reg,mov_reg") + adr,adr,fmov,fmov,fmov") (set_attr "mode" "SI") (set_attr "fp" "*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes")] ) @@ -858,7 +858,7 @@ (define_insn "*movdi_aarch64" movi\\t%d0, %1" [(set_attr "v8type" "move,move,move,alu,load1,load1,store1,store1,adr,adr,fmov,fmov,fmov,fmov") (set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,load1,load1,store1,store1,\ - adr,adr,mov_reg,mov_reg,mov_reg,mov_reg") + adr,adr,fmov,fmov,fmov,fmov") (set_attr "mode" "DI") (set_attr "fp" "*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*") (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,yes")] @@ -961,8 +961,8 @@ (define_insn "*movsf_aarch64" [(set_attr "v8type" "fmovi2f,fmovf2i,\ fmov,fconst,fpsimd_load,\ fpsimd_store,fpsimd_load,fpsimd_store,fmov") - (set_attr "type" "f_mcr,f_mrc,mov_reg,fconsts,\ - f_loads,f_stores,f_loads,f_stores,mov_reg") + (set_attr "type" "f_mcr,f_mrc,fmov,fconsts,\ + f_loads,f_stores,f_loads,f_stores,fmov") (set_attr "mode" "SF")] ) @@ -984,7 +984,7 @@ (define_insn "*movdf_aarch64" [(set_attr "v8type" "fmovi2f,fmovf2i,\ fmov,fconst,fpsimd_load,\ fpsimd_store,fpsimd_load,fpsimd_store,move") - (set_attr "type" "f_mcr,f_mrc,mov_reg,fconstd,\ + (set_attr "type" "f_mcr,f_mrc,fmov,fconstd,\ f_loadd,f_stored,f_loadd,f_stored,mov_reg") (set_attr "mode" "DF")] ) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index cac98cc..f9027dd 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -8977,7 +8977,7 @@ cortexa7_older_only (rtx insn) case TYPE_FADDS: case TYPE_FFARITHD: case TYPE_FADDD: -case TYPE_FCPYS: +case TYPE_FMOV: case TYPE_F_CVT: case TYPE_FCMPS: case TYPE_FCMPD: diff --git a/gcc/config/arm/arm1020e.md b/gcc/config/arm/arm1020e.md index 8cf0890d9300527962b14f60e08c1901556
[AArch64] Use "multiple" for type, where more than one instruction is used for a move
Hi, We could introduce a whole new type for insns which generate two moves, but we have already introduced a "multiple" classification for types in the ARM backend, so use that in place of "mov_reg" where appropriate. Regression tested on aarch64-none-elf and arm-none-eabi with no regressions. OK? Thanks, James --- gcc/ 2013-09-06 James Greenhalgh * config/aarch64/aarch64.md (*movti_aarch64): Use "multiple" for type where v8type is "move2". (*movtf_aarch64): Likewise. * config/arm/arm.md (thumb1_movdi_insn): Use "multiple" for type where more than one instruction is used for a move. (*arm32_movhf): Likewise. (*thumb_movdf_insn): Likewise. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index db6aa1d3fa15e17095ba26a64e020d098e9fa6c0..96705862de6b53322828fd60df15207af4b2ed61 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -906,7 +906,7 @@ (define_insn "*movti_aarch64" str\\t%q1, %0" [(set_attr "v8type" "move2,fmovi2f,fmovf2i,*, \ load2,store2,store2,fpsimd_load,fpsimd_store") - (set_attr "type" "mov_reg,f_mcr,f_mrc,*, \ + (set_attr "type" "multiple,f_mcr,f_mrc,*, \ load2,store2,store2,f_loadd,f_stored") (set_attr "simd_type" "*,*,*,simd_move,*,*,*,*,*") (set_attr "mode" "DI,DI,DI,TI,DI,DI,DI,TI,TI") @@ -1024,7 +1024,7 @@ (define_insn "*movtf_aarch64" ldp\\t%0, %H0, %1 stp\\t%1, %H1, %0" [(set_attr "v8type" "logic,move2,fmovi2f,fmovf2i,fconst,fconst,fpsimd_load,fpsimd_store,fpsimd_load2,fpsimd_store2") - (set_attr "type" "logic_reg,mov_reg,f_mcr,f_mrc,fconstd,fconstd,\ + (set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,fconstd,fconstd,\ f_loadd,f_stored,f_loadd,f_stored") (set_attr "mode" "DF,DF,DF,DF,DF,DF,TF,TF,DF,DF") (set_attr "length" "4,8,8,8,4,4,4,4,4,4") diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 6c0fbf44288c9f6e077fe2d9836cd5c1e2042a0a..fd0b1cbdccd23ad4d18be50417c7532b29840b91 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -6141,7 +6141,7 @@ (define_insn "*thumb1_movdi_insn" } }" [(set_attr "length" "4,4,6,2,2,6,4,4") - (set_attr "type" "multiple,mov_reg,multiple,load2,store2,load2,store2,mov_reg") + (set_attr "type" "multiple,multiple,multiple,load2,store2,load2,store2,multiple") (set_attr "pool_range" "*,*,*,*,*,1018,*,*")] ) @@ -7221,7 +7221,7 @@ (define_insn "*arm32_movhf" } " [(set_attr "conds" "unconditional") - (set_attr "type" "load1,store1,mov_reg,mov_reg") + (set_attr "type" "load1,store1,mov_reg,multiple") (set_attr "length" "4,4,4,8") (set_attr "predicable" "yes")] ) @@ -7466,7 +7466,7 @@ (define_insn "*thumb_movdf_insn" } " [(set_attr "length" "4,2,2,6,4,4") - (set_attr "type" "multiple,load2,store2,load2,store2,mov_reg") + (set_attr "type" "multiple,load2,store2,load2,store2,multiple") (set_attr "pool_range" "*,*,*,1018,*,*")] )
Re: [PATCH, vtv update] Add documentation for --enable-vtable-verify configure option...
On 2013-08-28 17:15 , Caroline Tice wrote: # Least ordering for dependencies mean linking w/o libstdc++ for as # long as the development of libvtv does not absolutely require it. Index: gcc/doc/install.texi === --- gcc/doc/install.texi(revision 202060) +++ gcc/doc/install.texi(working copy) @@ -1032,6 +1032,16 @@ and for cross builds configured with @op More documentation about multiarch can be found at @uref{http://wiki.debian.org/Multiarch}. +@item --enable-vtable-verify +Specify whether to enable or disable the vtable verification feature. +Enabling this feature causes libstdc++ to be built with its virtual calls +in verifiable mode. This means that, when linked with libvtv, every +virtual call in libstdc++ will verify the vtable pointer through which the +call will be made before actually making the call. If not linked with libvtv, +the verifier will call stub functions (in libstdc++ itself) and do nothing. +If vtable verification is disabled, then libstdc++ is not built with its +virutal calls in verifiable mode at all. s/virutal/virtual/ Could you clarify in the docs whether --enable-vtable-verify is the default behaviour or not? Why would I need --disable-vtv, if I can use --disable-vtable-verify? OK with those changes. Diego.
[AArch64, ARM] Introduce "mrs" type attribute.
Hi, This patch adds an "mrs" type to be used to categorize instructions which read or write from a special/system/co-processor register. Then we add this type to all the pipeline descriptions. This probably ends up as a miscategorization in most cases as we put "mrs" in the same category as "multiple" in the pipelines. This will give the most consistant behaviour with what came before. Regression tested on aarch64-none-elf and arm-none-eabi with no regressions. OK? Thanks, James --- gcc/ 2013-09-06 James Greenhalgh * config/arm/types.md (type): Add "mrs" type. * config/aarch64/aarch64.md (aarch64_load_tp_hard): Make type "mrs". * config/arm/arm.md (load_tp_hard): Make type "mrs". * config/arm/cortex-a15.md: Update with new attributes. * config/arm/cortex-a5.md: Update with new attributes. * config/arm/cortex-a53.md: Update with new attributes. * config/arm/cortex-a7.md: Update with new attributes. * config/arm/cortex-a8.md: Update with new attributes. * config/arm/cortex-a9.md: Update with new attributes. * config/arm/cortex-m4.md: Update with new attributes. * config/arm/cortex-r4.md: Update with new attributes. * config/arm/fa526.md: Update with new attributes. * config/arm/fa606te.md: Update with new attributes. * config/arm/fa626te.md: Update with new attributes. * config/arm/fa726te.md: Update with new attributes. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 96705862de6b53322828fd60df15207af4b2ed61..5aa127bcb47912f1986007d4491b86e92c23 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -4134,7 +4134,7 @@ (define_insn "aarch64_load_tp_hard" "" "mrs\\t%0, tpidr_el0" [(set_attr "v8type" "mrs") - (set_attr "type" "mov_reg") + (set_attr "type" "mrs") (set_attr "mode" "DI")] ) diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index fd0b1cbdccd23ad4d18be50417c7532b29840b91..8a482b570ec039aad888d7d7d902b48f7e453abc 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -12453,7 +12453,7 @@ (define_insn "load_tp_hard" "TARGET_HARD_TP" "mrc%?\\tp15, 0, %0, c13, c0, 3\\t@ load_tp_hard" [(set_attr "predicable" "yes") - (set_attr "type" "mov_reg")] + (set_attr "type" "mrs")] ) ;; Doesn't clobber R1-R3. Must use r0 for the first operand. diff --git a/gcc/config/arm/cortex-a15.md b/gcc/config/arm/cortex-a15.md index 6b1559260246a11e6d74f7f467dbeae761d934ea..ccad62076089b5e095f472fdbf298ba7226ae4ec 100644 --- a/gcc/config/arm/cortex-a15.md +++ b/gcc/config/arm/cortex-a15.md @@ -68,7 +68,7 @@ (define_insn_reservation "cortex_a15_alu shift_imm,shift_reg,\ mov_imm,mov_reg,\ mvn_imm,mvn_reg,\ -multiple,no_insn")) +mrs,multiple,no_insn")) "ca15_issue1,(ca15_sx1,ca15_sx1_alu)|(ca15_sx2,ca15_sx2_alu)") ;; ALU ops with immediate shift diff --git a/gcc/config/arm/cortex-a5.md b/gcc/config/arm/cortex-a5.md index fa3e9d59c91028214ca7aa1be2c6668b4af5e6d3..22e0a08f38e7620cef745d28c5373e2daf957f7d 100644 --- a/gcc/config/arm/cortex-a5.md +++ b/gcc/config/arm/cortex-a5.md @@ -64,7 +64,7 @@ (define_insn_reservation "cortex_a5_alu" adr,bfm,rev,\ shift_imm,shift_reg,\ mov_imm,mov_reg,mvn_imm,mvn_reg,\ -multiple,no_insn")) +mrs,multiple,no_insn")) "cortex_a5_ex1") (define_insn_reservation "cortex_a5_alu_shift" 2 diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md index 33b5ca30150fc57e7a3c4886c01b9e8092fc3ffa..48d0d03853f147d2d7cc15c1208304617b9c1ec4 100644 --- a/gcc/config/arm/cortex-a53.md +++ b/gcc/config/arm/cortex-a53.md @@ -73,7 +73,7 @@ (define_insn_reservation "cortex_a53_alu adr,bfm,csel,rev,\ shift_imm,shift_reg,\ mov_imm,mov_reg,mvn_imm,mvn_reg,\ -multiple,no_insn")) +mrs,multiple,no_insn")) "cortex_a53_slot_any") (define_insn_reservation "cortex_a53_alu_shift" 2 diff --git a/gcc/config/arm/cortex-a7.md b/gcc/config/arm/cortex-a7.md index ba9da8046ebd4a886c425238ab28df5ab9d85a8a..a72a88d90af1c5491115ee84af47ec6d4f593535 100644 --- a/gcc/config/arm/cortex-a7.md +++ b/gcc/config/arm/cortex-a7.md @@ -110,7 +110,7 @@ (define_insn_reservation "cortex_a7_alu_ logic_shift_reg,logics_shift_reg,\ mov_shift,mov_shift_reg,\ mvn_shift,mvn_shift_reg,\ -multiple,no_insn")) +mrs,multiple,no_insn")) "cortex_a7_ex1") ;; Forwarding path for unshifted operands. diff --git a/gcc/config/arm/cortex-a8.md b/gcc/config/arm/cortex-a8.md index ed0b3513
Re: [AArch64, ARM] Rename the FCPYS type to FMOV
On 06/09/13 14:44, James Greenhalgh wrote: > > Hi, > > This patch updates the AArch64 backend such that floating point > moves are correctly categorized with type "FMOV". > > Then in the ARM backend we rename "FCPYS" to "FMOV" everywhere > where it is appropriate to do so. > > Regression tested on aarch64-none-elf and arm-none-eabi with no > regressions. > > OK? > > Thanks, > James > --- > gcc/ > > 2013-09-06 James Greenhalgh > > * config/arm/types.md (type): Rename fcpys to fmov. > * config/arm/vfp.md > (*arm_movsi_vfp): Rename type fcpys as fmov. > (*thumb2_movsi_vfp): Likewise > (*movhf_vfp_neon): Likewise > (*movhf_vfp): Likewise > (*movsf_vfp): Likewise > (*thumb2_movsf_vfp): Likewise > (*movsfcc_vfp): Likewise > (*thumb2_movsfcc_vfp): Likewise > * config/aarch64/aarch64-simd.md > (move_lo_quad_): Replace type mov_reg with fmovs. > * config/aarch64/aarch64.md > (*movsi_aarch64): Replace type mov_reg with fmovs. > (*movdi_aarch64): Likewise > (*movsf_aarch64): Likewise > (*movdf_aarch64): Likewise > * config/arm/arm.c > (cortexa7_older_only): Rename TYPE_FCPYS to TYPE_FMOV. > * config/arm/iwmmxt.md > (*iwmmxt_movsi_insn): Rename type fcpys as fmov. > * config/arm/arm1020e.md: Update with new attributes. > * config/arm/cortex-a15-neon.md: Update with new attributes. > * config/arm/cortex-a5.md: Update with new attributes. > * config/arm/cortex-a53.md: Update with new attributes. > * config/arm/cortex-a7.md: Update with new attributes. > * config/arm/cortex-a8-neon.md: Update with new attributes. > * config/arm/cortex-a9.md: Update with new attributes. > * config/arm/cortex-m4-fpu.md: Update with new attributes. > * config/arm/cortex-r4f.md: Update with new attributes. > * config/arm/marvell-pj4.md: Update with new attributes. > * config/arm/vfp11.md: Update with new attributes. > > OK. R.
Re: [PATCH i386 2/8] [AVX512] Add mask registers.
Hello, On 29 Aug 15:59, Kirill Yukhin wrote: > /* Define parameter passing and return registers. */ > @@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void) > >/* If AVX512F is disabled, squash the registers. */ >if (! TARGET_AVX512F) > -for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++) > - fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; > +{ > + for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++) > + fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; > + > + for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++) > + fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = ""; > +} > } This place should be updated as here: http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00181.html I am not reposting as the change is obvious. -- Thanks, K
Re: [AArch64] Use "multiple" for type, where more than one instruction is used for a move
On 06/09/13 14:46, James Greenhalgh wrote: > > Hi, > > We could introduce a whole new type for insns which generate two moves, > but we have already introduced a "multiple" classification for > types in the ARM backend, so use that in place of "mov_reg" where > appropriate. > > Regression tested on aarch64-none-elf and arm-none-eabi with no > regressions. > > OK? > > Thanks, > James > > --- > gcc/ > > 2013-09-06 James Greenhalgh > > * config/aarch64/aarch64.md > (*movti_aarch64): Use "multiple" for type where v8type is "move2". > (*movtf_aarch64): Likewise. > * config/arm/arm.md > (thumb1_movdi_insn): Use "multiple" for type where more than one > instruction is used for a move. > (*arm32_movhf): Likewise. > (*thumb_movdf_insn): Likewise. > > OK. R.
[AArch64] Use neon__2 where appropriate as "type".
Hi, The final (!!!) patch in the series making types equivalent between AArch64 and ARM backends deals with insns in the AArch64 backend which generate ldp and stp. We could invent a new type for these and add that type to all the pipeline descriptions, but I think the types neon_ldm_2 and neon_stm_2 describe them adequately. Tested on aarch64-none-elf with no regressions. OK? Thanks, James --- gcc/ 2013-09-06 James Greenhalgh * config/aarch64/aarch64.md (*movtf_aarch64): Use neon_dm_2 as type where v8type is fpsimd_2. (load_pair): Likewise. (store_pair): Likewise. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 5aa127bcb47912f1986007d4491b86e92c23..f37f98f9994bb773785d8573a7efd1e625b5e23a 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1025,7 +1025,7 @@ (define_insn "*movtf_aarch64" stp\\t%1, %H1, %0" [(set_attr "v8type" "logic,move2,fmovi2f,fmovf2i,fconst,fconst,fpsimd_load,fpsimd_store,fpsimd_load2,fpsimd_store2") (set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,fconstd,fconstd,\ - f_loadd,f_stored,f_loadd,f_stored") + f_loadd,f_stored,neon_ldm_2,neon_stm_2") (set_attr "mode" "DF,DF,DF,DF,DF,DF,TF,TF,DF,DF") (set_attr "length" "4,8,8,8,4,4,4,4,4,4") (set_attr "fp" "*,*,yes,yes,*,yes,yes,yes,*,*") @@ -1090,7 +1090,7 @@ (define_insn "load_pair" GET_MODE_SIZE (mode)))" "ldp\\t%0, %2, %1" [(set_attr "v8type" "fpsimd_load2") - (set_attr "type" "f_load") + (set_attr "type" "neon_ldm_2") (set_attr "mode" "")] ) @@ -1106,8 +1106,8 @@ (define_insn "store_pair" XEXP (operands[0], 0), GET_MODE_SIZE (mode)))" "stp\\t%1, %3, %0" - [(set_attr "v8type" "fpsimd_load2") - (set_attr "type" "f_load") + [(set_attr "v8type" "fpsimd_store2") + (set_attr "type" "neon_stm_2") (set_attr "mode" "")] )
Re: [AArch64, ARM] Introduce "mrs" type attribute.
On 06/09/13 14:48, James Greenhalgh wrote: > > Hi, > > This patch adds an "mrs" type to be used to categorize instructions > which read or write from a special/system/co-processor register. > > Then we add this type to all the pipeline descriptions. This probably > ends up as a miscategorization in most cases as we put "mrs" in the > same category as "multiple" in the pipelines. This will give the most > consistant behaviour with what came before. > > Regression tested on aarch64-none-elf and arm-none-eabi with no > regressions. > > OK? > > Thanks, > James > > --- > gcc/ > > 2013-09-06 James Greenhalgh > > * config/arm/types.md (type): Add "mrs" type. > * config/aarch64/aarch64.md > (aarch64_load_tp_hard): Make type "mrs". > * config/arm/arm.md > (load_tp_hard): Make type "mrs". > * config/arm/cortex-a15.md: Update with new attributes. > * config/arm/cortex-a5.md: Update with new attributes. > * config/arm/cortex-a53.md: Update with new attributes. > * config/arm/cortex-a7.md: Update with new attributes. > * config/arm/cortex-a8.md: Update with new attributes. > * config/arm/cortex-a9.md: Update with new attributes. > * config/arm/cortex-m4.md: Update with new attributes. > * config/arm/cortex-r4.md: Update with new attributes. > * config/arm/fa526.md: Update with new attributes. > * config/arm/fa606te.md: Update with new attributes. > * config/arm/fa626te.md: Update with new attributes. > * config/arm/fa726te.md: Update with new attributes. > OK. R.
Re: [AArch64] Use neon__2 where appropriate as "type".
On 06/09/13 14:50, James Greenhalgh wrote: > > Hi, > > The final (!!!) patch in the series making types equivalent between > AArch64 and ARM backends deals with insns in the AArch64 backend > which generate ldp and stp. We could invent a new type for these and > add that type to all the pipeline descriptions, but I think the types > neon_ldm_2 and neon_stm_2 describe them adequately. > It's close enough. At least for now. > Tested on aarch64-none-elf with no regressions. > > OK? OK. > > Thanks, > James > > --- > gcc/ > > 2013-09-06 James Greenhalgh > > * config/aarch64/aarch64.md > (*movtf_aarch64): Use neon_dm_2 as type where v8type > is fpsimd_2. > (load_pair): Likewise. > (store_pair): Likewise. > > R.
Re: RFC - Next refactoring steps
On 09/06/2013 09:09 AM, Michael Matz wrote: Hi, On Thu, 5 Sep 2013, Andrew MacLeod wrote: 1 - I think we ought to split out the data structures from gimple.h into gimple-core.h, like we did with tree.h Why? For the seam reason we split tree.h. The new gimple mechanism needs to coexist with trees until the conversion is complete. There will be a period of time when some files have been converted over to the new gimple wrappers, and they will export functions which unconverted files still need to call. The functional prototypes will be using things like GimpleType, so those files will need to see the basic class description to compile and call the routine. (just like how the gimple function need to see the baic layout of a tree to interact with trees) They don't need to see all the implementation for the methods in gimple-type.h, but they need to see the basic class description which also contains the few routines required to cast back and forth to tree. This allows both to coexist while we transition. Doing it now before we branch into stage 2 makes life much easier in the branch I will be working on during stage2, as well as test the functionality of it with the rest of the refactoring we are doing before the branch. gimple.h. That won't really affect my work. I think it probably ought to be done for clarity eventually.gimple.h would then simply become a collector of "gimple-blah.h" files which are required for a complete gimple system. Doesn't make sense to me. If you split it into multiple files, just to then create a wrapper called gimple.h including all of them again for the consumers you can just as well have all of it in one file. Well, gimple.h right now is 5400 lines, and that just implements gimple statements. when we add the methods for types and decls and expression etc etc etc, its going to be a very very very large file. The split is to make it managable. Im fine with leaving gimple.h to be the statement implementation, and then the top of gimple.h can just include all the other ones... it just means gimple-stmt is gimple... Its just a consistency naming thing, and i don't feel strongly about it. 2 - all the uses of TREE_NULL to refer to an empty/nonexistent object... it is no longer in tree-core.h, which I think is correct. what should we start using instead in converted files? options are: a) 0 b) NULL c) something we define as 0, like GIMPLE_NULL I think I'd prefer 0, but it creates a problem with varargs functions, so you'd always need "(gimple)0" for those, which you'd probably hide behind a macro, at which point you'd have two forms for the nil gimple ("0" and that macro), for consistency you'd want to use the macro in place of "0" also in the non-varargs places, et voila, you're back to "NULL" or "NULL_GIMPLE" :-/ I prefer 0 too I think, but I had forgotten about vararg issues... so are we thinking maybe just plain old NULL? I guess thats pretty standard.. 3) remove tree.h from other .h files Now that tree.h is split, there are a few .h files which directly include tree.h themselves. It would be nice if we could remove the implementation requirement of tree.h to ease moving to gimple.h. The 4 files are : ipa-utils.h lto-streamer.h streamer-hooks.h tree-streamer.h It should be possible to not directly include tree.h itself in these files. Worst case, the header file is included after tree.h from the .C files.. that seems to be the way most of the other include files avoid including tree.h directly. That can be done right now I think. For the rest of the topic, tree vs gimple: I don't see much value in that. Why create a gimple_expr type that basically is just tree? You can as well use tree then. Because once its all done... we can do many other things because the uses are detached from the implementation. In a gimple expression we can see concisely what elements of tree are actually used, and how. We can then remove tree from the implementation of gimple_expr if it makes sense. We can change the underlying object from a tree to something which more concisely represents what we need in the middle/back end. We can easily change the implementation of gimpe-expr to use an index into a table of expressions which makes streaming much simpler.. I am also doing this for types and decls and such. We get typesafe checking at compile time, And the ultimate goal is to remove dependencies from the front end and the gimple... gimple will be detachable from trees. All the fun stuff I talked about in the original document motivating this.
Re: [PATCH, vtv update] Add documentation for --enable-vtable-verify configure option...
On Fri, Sep 6, 2013 at 6:47 AM, Diego Novillo wrote: > On 2013-08-28 17:15 , Caroline Tice wrote: > >> # Least ordering for dependencies mean linking w/o libstdc++ for as >> # long as the development of libvtv does not absolutely require it. >> Index: gcc/doc/install.texi >> === >> --- gcc/doc/install.texi(revision 202060) >> +++ gcc/doc/install.texi(working copy) >> @@ -1032,6 +1032,16 @@ and for cross builds configured with @op >> More documentation about multiarch can be found at >> @uref{http://wiki.debian.org/Multiarch}. >> >> +@item --enable-vtable-verify >> +Specify whether to enable or disable the vtable verification feature. >> +Enabling this feature causes libstdc++ to be built with its virtual calls >> +in verifiable mode. This means that, when linked with libvtv, every >> +virtual call in libstdc++ will verify the vtable pointer through which >> the >> +call will be made before actually making the call. If not linked with >> libvtv, >> +the verifier will call stub functions (in libstdc++ itself) and do >> nothing. >> +If vtable verification is disabled, then libstdc++ is not built with its >> +virutal calls in verifiable mode at all. > > > s/virutal/virtual/ > > Could you clarify in the docs whether --enable-vtable-verify is the default > behaviour or not? > It will not be the default, but yes I will clarify. > Why would I need --disable-vtv, if I can use --disable-vtable-verify? > --disable-vtable-verify prevents libstdc++ from being build with verification turned on, but the libvtv library will still be built. --disable-libvtv prevents the libvtv library from being built. Obviously I need to add a comment like this to the documentation, so I will. > > OK with those changes. > > > Diego.
Fix glitch in -fstack-usage output
The output of -fstack-usage doesn't use fully qualified names of functions (because they can be quite long in Ada) so the code attempts to strip the scope prefix. But this isn't robust enough in presence of suffixes created by the compiler, for example by the cloning machinery. Tested on x86_64-suse-linux, applied on the mainline as obvious. 2013-09-06 Eric Botcazou * toplev.c (output_stack_usage): Be prepared for suffixes created by the compiler in the function names. 2013-09-06 Eric Botcazou * gnat.dg/stack_usage2.adb: New test. -- Eric BotcazouIndex: toplev.c === --- toplev.c (revision 202160) +++ toplev.c (working copy) @@ -1017,22 +1017,35 @@ output_stack_usage (void) { expanded_location loc = expand_location (DECL_SOURCE_LOCATION (current_function_decl)); - const char *raw_id, *id; - - /* Strip the scope prefix if any. */ - raw_id = lang_hooks.decl_printable_name (current_function_decl, 2); - id = strrchr (raw_id, '.'); - if (id) - id++; + /* We don't want to print the full qualified name because it can be long, + so we strip the scope prefix, but we may need to deal with the suffix + created by the compiler. */ + const char *suffix + = strchr (IDENTIFIER_POINTER (DECL_NAME (current_function_decl)), '.'); + const char *name + = lang_hooks.decl_printable_name (current_function_decl, 2); + if (suffix) + { + const char *dot = strchr (name, '.'); + while (dot && strcasecmp (dot, suffix) != 0) + { + name = dot + 1; + dot = strchr (name, '.'); + } + } else - id = raw_id; + { + const char *dot = strrchr (name, '.'); + if (dot) + name = dot + 1; + } fprintf (stack_usage_file, "%s:%d:%d:%s\t"HOST_WIDE_INT_PRINT_DEC"\t%s\n", lbasename (loc.file), loc.line, loc.column, - id, + name, stack_usage, stack_usage_kind_str[stack_usage_kind]); }-- { dg-do compile } -- { dg-options "-O2 -fstack-usage" } with System; procedure Stack_Usage2 is Sink : System.Address; pragma Import (Ada, Sink); procedure Transmit_Data (Branch : Integer) is pragma No_Inline (Transmit_Data); X : Integer; begin case Branch is when 1 => Sink := X'Address; when others => null; end case; end; begin Transmit_Data (Branch => 1); end; -- { dg-final { scan-stack-usage-not ":Constprop" } } -- { dg-final { cleanup-stack-usage } }
Re: RFC - Next refactoring steps
On 09/06/2013 04:19 AM, Richard Biener wrote: On Fri, Sep 6, 2013 at 10:14 AM, Mike Stump wrote: On Sep 5, 2013, at 11:54 PM, Richard Biener wrote: Most of the GCC headerfiles do not include all their required headers but rely on .c files doing that (in the appropriate order). I somehow like that though I cannot explain why ;) Very old school. I can explain why I don't like it, but I'll resist. The universal standard is for each header to include all it needs and for the ordering of includes not to matter any. Deviations from this are the exception and should virtually never should be the case. Also grepping for #includes I see that this doesn't seem to be true anymore. Yeah, everyone else hates the old school style with a passion, it was never a good idea. :-) When they come across it, people have a tendency to fix the headers as broken and over time, all the dependencies eventually get added. Well, I still think you cannot randomly change include file order in GCC. At least some #ifdef ... hackery in some headers will suddenly break (that is, change outcome) if you include for example tm.h before or after it. these would be really good to identify and fix, if possible. (surely they can be fixed.. :-)if they cant be fixed for whatever reason, we ought to protect them with some mechanism.. ie assert that tm.h either has or has not been included before hand... whichever way is required. At least then we get an error if it changes. Do you know of any of the top of your head? Andrew
Re: RFC - Next refactoring steps
On 09/05/2013 08:26 PM, Joseph S. Myers wrote: On Thu, 5 Sep 2013, Andrew MacLeod wrote: Or are you suggesting that coretypes.h is a file we can assume is available? Every .c file should start with includes of config.h, system.h and coretypes.h, in that order, so all other headers can assume they have been included. OK, I can work with those assumptions :-). Andrew
[AArch64] Fix parameters to vcvtx_high
Hi, vcvtx_high_f32_f64 should have two parameters, a float32x2 which provides the lower half of the target vector, and a float64x2 which will be converted to the higher half of the target vector. Fix thusly. Tested with aarch64.exp on aarch64-none-elf. OK? Thanks, James --- gcc/ 2013-09-06 James Greenhalgh * config/aarch64/arm_neon.h (vcvtx_high_f32_f64): Fix parameters. diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 5864f2c..47b45f4 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -5756,12 +5756,12 @@ vcvtx_f32_f64 (float64x2_t a) } __extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) -vcvtx_high_f32_f64 (float64x2_t a) +vcvtx_high_f32_f64 (float32x2_t a, float64x2_t b) { float32x4_t result; __asm__ ("fcvtxn2 %0.4s,%1.2d" : "=w"(result) - : "w"(a) + : "w" (b), "0"(a) : /* No clobbers */); return result; }
Fix ipa-devirt-11.C on AIX part 3
Hi, this patch makes inlining of functions called once to work even if they are called by alias. Bootstrapped/regtested x86_64-linux, comitted. PR middle-end/58094 * ipa-inline.c (has_caller_p): New function. (want_inline_function_to_all_callers_p): Use it. (sum_callers, inline_to_all_callers): Break out from ... (ipa_inline): ... here. Index: ipa-inline.c === --- ipa-inline.c(revision 202334) +++ ipa-inline.c(working copy) @@ -750,6 +750,15 @@ check_caller_edge (struct cgraph_node *n && node->callers != edge); } +/* If NODE has a caller, return true. */ + +static bool +has_caller_p (struct cgraph_node *node, void *data ATTRIBUTE_UNUSED) +{ + if (node->callers) +return true; + return false; +} /* Decide if inlining NODE would reduce unit size by eliminating the offline copy of function. @@ -763,7 +772,7 @@ want_inline_function_to_all_callers_p (s bool has_hot_call = false; /* Does it have callers? */ - if (!node->callers) + if (!cgraph_for_node_and_aliases (node, has_caller_p, NULL, true)) return false; /* Already inlined? */ if (function->global.inlined_to) @@ -1892,6 +1901,60 @@ flatten_function (struct cgraph_node *no inline_update_overall_summary (node); } +/* Count number of callers of NODE and store it into DATA (that + points to int. Worker for cgraph_for_node_and_aliases. */ + +static bool +sum_callers (struct cgraph_node *node, void *data) +{ + struct cgraph_edge *e; + int *num_calls = (int *)data; + + for (e = node->callers; e; e = e->next_caller) +(*num_calls)++; + return false; +} + +/* Inline NODE to all callers. Worker for cgraph_for_node_and_aliases. + DATA points to number of calls originally found so we avoid infinite + recursion. */ + +static bool +inline_to_all_callers (struct cgraph_node *node, void *data) +{ + int *num_calls = (int *)data; + while (node->callers && !node->global.inlined_to) +{ + struct cgraph_node *caller = node->callers->caller; + + if (dump_file) + { + fprintf (dump_file, + "\nInlining %s size %i.\n", + cgraph_node_name (node), + inline_summary (node)->size); + fprintf (dump_file, + " Called once from %s %i insns.\n", + cgraph_node_name (node->callers->caller), + inline_summary (node->callers->caller)->size); + } + + inline_call (node->callers, true, NULL, NULL, true); + if (dump_file) + fprintf (dump_file, +" Inlined into %s which now has %i size\n", +cgraph_node_name (caller), +inline_summary (caller)->size); + if (!(*num_calls)--) + { + if (dump_file) + fprintf (dump_file, "New calls found; giving up.\n"); + break; + } +} + return false; +} + /* Decide on the inlining. We do so in the topological order to avoid expenses on updating data structures. */ @@ -2003,39 +2066,11 @@ ipa_inline (void) && want_inline_function_to_all_callers_p (node, cold)) { int num_calls = 0; - struct cgraph_edge *e; - for (e = node->callers; e; e = e->next_caller) - num_calls++; - while (node->callers && !node->global.inlined_to) - { - struct cgraph_node *caller = node->callers->caller; - - if (dump_file) - { - fprintf (dump_file, - "\nInlining %s size %i.\n", - cgraph_node_name (node), - inline_summary (node)->size); - fprintf (dump_file, - " Called once from %s %i insns.\n", - cgraph_node_name (node->callers->caller), - inline_summary (node->callers->caller)->size); - } - - inline_call (node->callers, true, NULL, NULL, true); - remove_functions = true; - if (dump_file) - fprintf (dump_file, -" Inlined into %s which now has %i size\n", -cgraph_node_name (caller), -inline_summary (caller)->size); - if (!num_calls--) - { - if (dump_file) - fprintf (dump_file, "New calls found; giving up.\n"); - break; - } - } + cgraph_for_node_and_aliases (node, sum_callers, + &num_calls, true); + cgraph_for_node_and_aliases (node, inline_to_all_callers, + &num_calls, true); +
Re: [c++-concepts] Class template constraints
Updated as per comments. I moved the resolve_template_scope function out to finish_template_type. I couldn't figure out how to get the parsed template parameter from the looked-up template in lookup_class_template. That information may not be available outside the parse state. Andrew Andrew Sutton On Wed, Sep 4, 2013 at 3:49 PM, Jason Merrill wrote: > On 09/04/2013 01:33 PM, Andrew Sutton wrote: >> >> Ah. The goal is to check after we've deduced/coerced template >> arguments into a valid substitution. With functions, that's in >> fn_type_unification (hopefully called from instantiate_template) > > > Actually fn_type_unification calls instantiate_template, but yep, we're on > the same page. > > Jason > templates.patch Description: Binary data
[gomp4] Further accel fixes
Hi! This fixes mainly VLA handling in target{, data, update} constructs, but also deals with field alignments in the structure and field order. Committed to gomp-4_0-branch. 2013-09-06 Jakub Jelinek * omp-low.c (scan_sharing_clauses): Handle VLAs in OMP_CLAUSE_{MAP,TO,FROM}. Set DECL_ALIGN (field) before calling insert_field_into_struct. (scan_omp_target): Reverse TYPE_FIELDS, verify that all field alignments are the same. (lower_omp_target): Use maybe_lookup_field instead of lookup_sfield to check if field is present. Handle VLAs. * tree-pretty-print.c (dump_omp_clause): Only check OMP_CLAUSE_MAP_KIND on OMP_CLAUSE_MAP clauses. * gimplify.c (enum gimplify_omp_var_data): Add GOVD_MAP_TO_ONLY. (omp_firstprivatize_variable, omp_add_variable, gimplify_adjust_omp_clauses_1, gimplify_adjust_omp_clauses): Handle VLAs in OMP_CLAUSE_{MAP,TO,FROM}. libgomp/ * testsuite/libgomp.c/target-2.c: New test. * testsuite/libgomp.c++/target-3.C: New test. --- gcc/omp-low.c.jj2013-09-05 17:11:14.0 +0200 +++ gcc/omp-low.c 2013-09-06 16:15:16.367638718 +0200 @@ -1574,10 +1574,24 @@ scan_sharing_clauses (tree clauses, omp_ } if (DECL_P (decl)) { - install_var_field (decl, true, 3, ctx); - if (gimple_omp_target_kind (ctx->stmt) - == GF_OMP_TARGET_KIND_REGION) - install_var_local (decl, ctx); + if (DECL_SIZE (decl) + && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST) + { + tree decl2 = DECL_VALUE_EXPR (decl); + gcc_assert (TREE_CODE (decl2) == INDIRECT_REF); + decl2 = TREE_OPERAND (decl2, 0); + gcc_assert (DECL_P (decl2)); + install_var_field (decl2, true, 3, ctx); + install_var_local (decl2, ctx); + install_var_local (decl, ctx); + } + else + { + install_var_field (decl, true, 3, ctx); + if (gimple_omp_target_kind (ctx->stmt) + == GF_OMP_TARGET_KIND_REGION) + install_var_local (decl, ctx); + } } else { @@ -1600,6 +1614,7 @@ scan_sharing_clauses (tree clauses, omp_ tree field = build_decl (OMP_CLAUSE_LOCATION (c), FIELD_DECL, NULL_TREE, ptr_type_node); + DECL_ALIGN (field) = TYPE_ALIGN (ptr_type_node); insert_field_into_struct (ctx->record_type, field); splay_tree_insert (ctx->field_map, (splay_tree_key) decl, (splay_tree_value) field); @@ -1684,6 +1699,16 @@ scan_sharing_clauses (tree clauses, omp_ TREE_TYPE (new_decl) = remap_type (TREE_TYPE (decl), &ctx->cb); } + else if (DECL_SIZE (decl) + && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST) + { + tree decl2 = DECL_VALUE_EXPR (decl); + gcc_assert (TREE_CODE (decl2) == INDIRECT_REF); + decl2 = TREE_OPERAND (decl2, 0); + gcc_assert (DECL_P (decl2)); + fixup_remapped_decl (decl2, ctx, false); + fixup_remapped_decl (decl, ctx, true); + } else fixup_remapped_decl (decl, ctx, false); } @@ -2126,6 +2151,16 @@ scan_omp_target (gimple stmt, omp_contex ctx->record_type = ctx->receiver_decl = NULL; else { + TYPE_FIELDS (ctx->record_type) + = nreverse (TYPE_FIELDS (ctx->record_type)); +#ifdef ENABLE_CHECKING + tree field; + unsigned int align = DECL_ALIGN (TYPE_FIELDS (ctx->record_type)); + for (field = TYPE_FIELDS (ctx->record_type); + field; + field = DECL_CHAIN (field)) + gcc_assert (DECL_ALIGN (field) == align); +#endif layout_type (ctx->record_type); if (kind == GF_OMP_TARGET_KIND_REGION) fixup_child_record_type (ctx); @@ -9201,7 +9236,18 @@ lower_omp_target (gimple_stmt_iterator * map_cnt++; continue; } - if (!lookup_sfield (var, ctx)) + + if (DECL_SIZE (var) + && TREE_CODE (DECL_SIZE (var)) != INTEGER_CST) + { + tree var2 = DECL_VALUE_EXPR (var); + gcc_assert (TREE_CODE (var2) == INDIRECT_REF); + var2 = TREE_OPERAND (var2, 0); + gcc_assert (DECL_P (var2)); + var = var2; + } + + if (!maybe_lookup_field (var, ctx)) continue; if (kind == GF_OMP_TARGET_KIND_REGION) @@ -9293,8 +9339,20 @@ lower_omp_target (gimple_stmt_iterator * nc = NULL_TREE; } } - else if (!
Re: [PATCH, x86] Use vector moves in memmove expanding
On Tue, Sep 3, 2013 at 12:01 PM, Eric Botcazou wrote: >> Changes were checked into trunk: >> http://gcc.gnu.org/ml/gcc-cvs/2013-07/msg00179.html > > The patch miscompiles the MPFR library on x86 Pentium Pro. Reduced testcase > attached, compile for x86 with -mtune=pentiumpro. > The change in question has @@ -23169,7 +23150,7 @@ ix86_expand_movmem (rtx dst, rtx src, rtx count_exp, rtx align_exp, if (count_exp != const0_rtx && epilogue_size_needed > 1) expand_movmem_epilogue (dst, src, destreg, srcreg, count_exp, - epilogue_size_needed); + size_needed); if (jump_around_label) emit_label (jump_around_label); return true; Michael, why did you change epilogue_size_needed to size_needed here? It looks wrong to me. -- H.J.
Re: [RFC] Fix for PR58201
Hi Honza, On 09/06/2013 01:05 AM, Jan Hubicka wrote: Hi, this is the patch I commited after testing on x86_64-linux. Honza Index: ChangeLog === *** ChangeLog (revision 202271) --- ChangeLog (working copy) *** *** 1,3 --- 1,9 + 2013-09-04 Jan Hubicka + + PR middle-end/58201 + * cgraphunit.c (analyze_functions): Clear AUX fields + after processing; initialize assembler name has. + I checked and double checked and with this commit a C++ test regressed: FAIL: g++.dg/template/cond2.C -std=gnu++98 (test for warnings, line 9) FAIL: g++.dg/template/cond2.C -std=gnu++11 (test for warnings, line 9) In practice we emit a message off by one line, line 10 instead of the correct line 9. Is it possible that you are clearing too much, so to speak? After the recent breakages, up to date testresults are arriving only now on gcc-testresults confirming my finding, for example Andreas Schwab already posted a couple. Thanks! Paolo.
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
SCHED_GROUP works after I add chain_to_prev_insn after add_branch_dependences, in order to chain control dependences to prev insn for sched group. Here is the new patch. Testing is going on. Thanks, Wei Mi. 2013-09-06 Wei Mi * config/i386/i386.c (ix86_macro_fusion_p): New function. (ix86_macro_fusion_pair_p): Ditto. * config/i386/x86-tune.def (DEF_TUNE): Add m_COREI7 for X86_TUNE_FUSE_CMP_AND_BRANCH. * sched-deps.c (group_insns_for_macro_fusion): New function. (sched_analyze_insn): Call group_insns_for_macro_fusion. (chain_to_prev_insn): Change it from static to extern. (chain_to_prev_insn_p): Ditto. * doc/tm.texi: Generated. * doc/tm.texi.in: Ditto. * sched-int.h: New declarations. * sched-rgn.c (add_branch_dependences): Chain control dependences to prev insn for sched group. * target.def: Add macro_fusion_p and macro_fusion_pair_p. Index: config/i386/i386.c === --- config/i386/i386.c (revision 201963) +++ config/i386/i386.c (working copy) @@ -24850,6 +24850,99 @@ ia32_multipass_dfa_lookahead (void) } } +/* Return true if target platform supports macro-fusion. */ + +static bool +ix86_macro_fusion_p () +{ + if (TARGET_FUSE_CMP_AND_BRANCH) +return true; + else +return false; +} + +/* Check whether current microarchitecture support macro fusion + for insn pair "CONDGEN + CONDJMP". Refer to + "Intel Architectures Optimization Reference Manual". */ + +static bool +ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp) +{ + rtx src; + if (!strcmp (ix86_tune_string, "corei7")) +{ + /* For Nehalem. */ + rtx single_set = single_set (condgen); + /* Nehalem doesn't support macro-fusion for add/sub+jmp. */ + if (single_set == NULL_RTX) +return false; + + src = SET_SRC (single_set); + if (GET_CODE (src) != COMPARE) + return false; + + /* Nehalem doesn't support macro-fusion for cmp/test MEM-IMM +insn pattern. */ + if ((MEM_P (XEXP (src, 0)) + && CONST_INT_P (XEXP (src, 1))) + || (MEM_P (XEXP (src, 1)) + && CONST_INT_P (XEXP (src, 0 + return false; + + /* Nehalem doesn't support macro-fusion for add/sub/dec/inc + jmp. */ + if (get_attr_type (condgen) != TYPE_TEST + && get_attr_type (condgen) != TYPE_ICMP) + return false; + return true; +} + else if (!strcmp (ix86_tune_string, "corei7-avx")) +{ + /* For Sandybridge. */ + enum rtx_code ccode; + rtx compare_set = NULL_RTX, test_if, cond; + rtx single_set = single_set (condgen); + if (single_set != NULL_RTX) +compare_set = single_set; + else + { + int i; + rtx pat = PATTERN (condgen); + for (i = 0; i < XVECLEN (pat, 0); i++) + if (GET_CODE (XVECEXP (pat, 0, i)) == SET + && GET_CODE (SET_SRC (XVECEXP (pat, 0, i))) == COMPARE) + compare_set = XVECEXP (pat, 0, i); + } + + if (compare_set == NULL_RTX) + return false; + src = SET_SRC (compare_set); + if (GET_CODE (src) != COMPARE) + return false; + + /* Sandybridge doesn't support macro-fusion for cmp/test MEM-IMM +insn pattern. */ + if ((MEM_P (XEXP (src, 0)) + && CONST_INT_P (XEXP (src, 1))) + || (MEM_P (XEXP (src, 1)) + && CONST_INT_P (XEXP (src, 0 +return false; + + /* Sandybridge doesn't support macro-fusion for inc/dec + +unsigned comparison jmp. */ + test_if = SET_SRC (pc_set (condjmp)); + cond = XEXP (test_if, 0); + ccode = GET_CODE (cond); + if (get_attr_type (condgen) == TYPE_INCDEC + && (ccode == GEU + || ccode == GTU + || ccode == LEU + || ccode == LTU)) + return false; + return true; +} + return false; +} + /* Try to reorder ready list to take advantage of Atom pipelined IMUL execution. It is applied if (1) IMUL instruction is on the top of list; @@ -42982,6 +43075,10 @@ ix86_memmodel_check (unsigned HOST_WIDE_ #undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \ ia32_multipass_dfa_lookahead +#undef TARGET_SCHED_MACRO_FUSION_P +#define TARGET_SCHED_MACRO_FUSION_P ix86_macro_fusion_p +#undef TARGET_SCHED_MACRO_FUSION_PAIR_P +#define TARGET_SCHED_MACRO_FUSION_PAIR_P ix86_macro_fusion_pair_p #undef TARGET_FUNCTION_OK_FOR_SIBCALL #define TARGET_FUNCTION_OK_FOR_SIBCALL ix86_function_ok_for_sibcall Index: config/i386/x86-tune.def === --- config/i386/x86-tune.def(revision 201963) +++ config/i386/x86-tune.def(working copy) @@ -196,7 +196,8 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, /* X86_TUNE_FUSE_CMP_AND_BRANCH: Fuse a compare or test i
Re: RFA: Fix mark_target_live_regs to take COND_EXEC into account
On Fri, Sep 6, 2013 at 12:20 PM, Joern Rennecke wrote: > We found that > > std::basic_string, std::allocator > ::copy(char*, unsigned long, unsigned long) const > > got miscompiled for ARC because reorg thought that all call-clobbered > registers were dead after a conditional call. Hmm, interesting to have a target with both predication and delay slots... > I can't reproduce the test case with current trunk + ARC port, but I reckon > this is just due to the fickleness of the register allocator. Chances are, > every other version, some other obscure function is miscompiled without > this patch. > > bootstrapped on i686-pc-linux-gnu. That's not a very good test, i686 doesn't use this code ;-) > OK to apply? Yes, the patch is OK. However, I wouldn't be surprised if using COND_EXECs together with delay slots has more hidden problems. Browsing through resource.c, the "if (jump_insn)" block at lines 1095:1119 jumps out as another place you should probably look into. Ciao! Steven
Re: [RS6000] powerpc64 -mcmodel=medium large symbol offsets
On Fri, Sep 6, 2013 at 3:13 AM, Alan Modra wrote: > The following testcase taken from the linux kernel is miscompiled on > powerpc64-linux. > > /* -m64 -mcmodel=medium -O -S -fno-section-anchors */ > static int x; > > unsigned long > foo (void) > { > return ((unsigned long) &x) - 0xc000; > } > > generates > addis 3,2,x+4611686018427387904@toc@ha > addi 3,3,x+4611686018427387904@toc@l > blr > > losing the top 32 bits of the offset. Sadly, the assembler and linker > do not complain, which is a hole in the ABI. (@ha and _HA relocs as > per the ABI won't complain about overflow since they might be used in > a @highesta, @highera sequence loading a 64-bit value.) > > This patch stops combine merging large offsets into a symbol addend > by copying code from reg_or_add_cint_operand to a new predicate, > add_cint_operand, and using that to restrict the range of offsets. > Bootstrapped and regression tested powerpc64-linux. OK to apply? > > * config/rs6000/predicates.md (add_cint_operand): New. > * config/rs6000/rs6000.md (largetoc_high_plus): Restrict offset > using add_cint_operand. > (largetoc_high_plus_aix): Likewise. This patch should include a testcase. But what user feedback are you expecting if the offset is too large, such as your example? In my test with the patch, it produces an unrecognizable insn error, which seems less than friendly. Thanks, David
Re: [i386, AVX-512F, pr58269] Partial fix for PR58269: properly initialize last EXT REX SSE register.
On Sep 6, 2013, at 3:04 AM, Iain Sandoe wrote: > Thanks for Ilya's input on the PR thread. > Here's what I propose for the remainder of the fix (FAOD, I cannot approve > the Darwin changes). Ok. > Mike: Actually, since this seems to have uncovered a pre-existing wrong code > bug, perhaps (this part of the) fix should also be applied to open branches? Bad wrong code gen I think is suitable for the release branches. I double checked llvm: llvm/lib/Target/X86/X86CallingConv.td and it seems to match.
Re: RFC - Next refactoring steps
On Fri, 6 Sep 2013, Richard Biener wrote: > But yes, making all dependencies explicit puts #includes where they > belong and shows where header refactoring would make sense. It also > removes weird includes from .c files that are only necessary to make > included required headers not break. One difficulty is that e.g. flags.h uses SWITCHABLE_TARGET (tm.h) but is included in lots of front-end files that don't actually care about the SWITCHABLE_TARGET use in flags.h and have no other reason to include tm.h. Since we generally want to eliminate tm.h uses in front-end files (converting target macros used there into hooks, etc.), making the "dependency" explicit in the obvious way isn't such a good idea. Instead, probably the bits that don't depend on SWITCHABLE_TARGET should move out of flags.h to other headers, the residual bit that does depend on tm.h should become e.g. switchable-target-flags.h (and include tm.h), and files that currently include flags.h should change to including the generated options.h (and any other headers needed for things formerly got from flags.h). (If other headers do start including tm.h, of course it becomes more complicated to identify which front-end files are still including tm.h, as they may include it indirectly.) -- Joseph S. Myers jos...@codesourcery.com
[C++ PATCH] Fix PR58300, issue with -fvtable-verify=preinit
Hi, On 09/06/2013 12:35 AM, Caroline Tice wrote: This fixes a bug where using -fvtable-verify=preinit sometimes causes an ICE. In particular, when the preinit flag was used, the vtable verification constructor initialization function was being written to the assembly file before it was being checked with cgraph_process_new_functions. This sometimes caused an assertion failure in decide_is_symbol_needed. This patch fixes the problem by reordering events so that the function is written to the assembly file after the call to cgraph_process_new_functions. I have verified that this fixes the test case attached to the bug, and it has passed all my regular vtable verification tests. Is this patch ok to commit? -- Caroline Tice cmt...@google.com 2013-09-04 Caroline Tice PR c++/58300 * vtable-class-hierarchy.c (vtv_generate_init_routine): In preinit case, move call to assemble_vtv_preinit_initializer to after call to cgraph_process_new_functions. Thanks Caroline. Let's add Jason in CC, I almost missed the patch myself (and I was actively paying attention to the issue) Paolo.
Fix missed propagation opportunity in DOM
I recently noticed that we were failing to propagate edge equivalences into PHI arguments in non-dominated successors. The case loos like this: ;; basic block 11, loop depth 0, count 0, freq 160, maybe hot ;;prev block 10, next block 12, flags: (NEW, REACHABLE) ;;pred: 10 [50.0%] (FALSE_VALUE,EXECUTABLE) _257 = di_13(D)->comps; _258 = (long unsigned int) _255; _259 = _258 * 24; p_260 = _257 + _259; _261 = _255 + 1; di_13(D)->next_comp = _261; if (p_260 != 0B) goto ; else goto ; ;;succ: 12 [100.0%] (TRUE_VALUE,EXECUTABLE) ;;13 (FALSE_VALUE,EXECUTABLE) ;; basic block 12, loop depth 0, count 0, freq 272, maybe hot ;; Invalid sum of incoming frequencies 160, should be 272 ;;prev block 11, next block 13, flags: (NEW, REACHABLE) ;;pred: 11 [100.0%] (TRUE_VALUE,EXECUTABLE) p_260->type = 37; p_260->u.s_builtin.type = _139; ;;succ: 13 [100.0%] (FALLTHRU,EXECUTABLE) ;; basic block 13, loop depth 0, count 0, freq 319, maybe hot ;; Invalid sum of incoming frequencies 432, should be 319 ;;prev block 12, next block 14, flags: (NEW, REACHABLE) ;;pred: 110 [100.0%] (FALLTHRU) ;;12 [100.0%] (FALLTHRU,EXECUTABLE) ;;11 (FALSE_VALUE,EXECUTABLE) # _478 = PHI <0B(110), p_260(12), p_260(11)> ret = _478; _142 = di_13(D)->expansion; _143 = _478->u.s_builtin.type; In particular note block 11 does *not* dominate block 13. However, we know that when we traverse the edge 11->13 that p_260 will have the value zero, which should be propagated into the PHI node. After fixing the propagation with the attached patch we have _478 = PHI <0B(110), p_260(12), 0B(11)> I have other code which then discovers the unconditional NULL pointer dereferences when we traverse 110->13 or 11->13 and isolates those paths. That in turn allows blocks 12 and 13 to be combined, which in turn allows discovery of additional CSE opportunities. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Applied to the trunk. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8493ee1..1f1f939 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2013-09-06 Jeff Law + + * tree-ssa-dom.c (cprop_into_successor_phis): Also propagate + edge implied equivalences into successor phis. + 2013-09-06 Claudiu Zissulescu * resource.c (mark_target_live_regs): Compute resources taking diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c index 691e6f9..f999a64 100644 --- a/gcc/tree-ssa-dom.c +++ b/gcc/tree-ssa-dom.c @@ -1642,6 +1642,28 @@ cprop_into_successor_phis (basic_block bb) if (gsi_end_p (gsi)) continue; + /* We may have an equivalence associated with this edge. While +we can not propagate it into non-dominated blocks, we can +propagate them into PHIs in non-dominated blocks. */ + + /* Push the unwind marker so we can reset the const and copies +table back to its original state after processing this edge. */ + const_and_copies_stack.safe_push (NULL_TREE); + + /* Extract and record any simple NAME = VALUE equivalences. + +Don't bother with [01] = COND equivalences, they're not useful +here. */ + struct edge_info *edge_info = (struct edge_info *) e->aux; + if (edge_info) + { + tree lhs = edge_info->lhs; + tree rhs = edge_info->rhs; + + if (lhs && TREE_CODE (lhs) == SSA_NAME) + record_const_or_copy (lhs, rhs); + } + indx = e->dest_idx; for ( ; !gsi_end_p (gsi); gsi_next (&gsi)) { @@ -1667,6 +1689,8 @@ cprop_into_successor_phis (basic_block bb) && may_propagate_copy (orig_val, new_val)) propagate_value (orig_p, new_val); } + + restore_vars_to_original_value (); } }
[patch i386]: Turn on -fms-extensions option for i386 ms-abi targets
Hello, This patch changes for i386 targets using by default ms-abi the default-value of option -fms-extensions. For those targets the option will be turned on. I will apply this patch next week, if there are no objections. Regards, Kai ChangeLog 2013-09-06 Kai Tietz * doc/invoke.texi (fms-extensions): Document changed behavior for ms-abi targets. * config/i386/i386.c (ix86_option_override_internal): Set default value of option -fms-extension for ms-abi targets. Index: doc/invoke.texi === --- doc/invoke.texi (Revision 202346) +++ doc/invoke.texi (Arbeitskopie) @@ -1856,6 +1856,8 @@ Some cases of unnamed fields in structures and uni accepted with this option. @xref{Unnamed Fields,,Unnamed struct/union fields within structs/unions}, for details. +Note that this option is off for all targets but i?86 and x86_64 +targets using ms-abi. @item -fplan9-extensions Accept some non-standard constructs used in Plan 9 code. Index: config/i386/i386.c === --- config/i386/i386.c (Revision 202134) +++ config/i386/i386.c (Arbeitskopie) @@ -3357,6 +3357,12 @@ ix86_option_override_internal (bool main_args_p) if (!global_options_set.x_ix86_abi) ix86_abi = DEFAULT_ABI; + /* For targets using ms ABI enable ms-extensions, if not + explicit turned off. For non-ms ABI we turn off this + option. */ + if (!global_options_set.x_flag_ms_extensions) +flag_ms_extensions = (MS_ABI == DEFAULT_ABI); + if (global_options_set.x_ix86_cmodel) { switch (ix86_cmodel)
Re: RFC - Next refactoring steps
On Fri, Sep 6, 2013 at 5:21 PM, Andrew MacLeod wrote: >> hackery in some headers will suddenly break (that is, change outcome) >> if you include for example >> tm.h before or after it. >> > these would be really good to identify and fix, if possible. (surely they > can be fixed.. :-)if they cant be fixed for whatever reason, we ought to > protect them with some mechanism.. ie assert that tm.h either has or has not > been included before hand... whichever way is required. At least then we > get an error if it changes. > > Do you know of any of the top of your head? sched-int.h comes to mind as an example I fixed recently (it needs INSN_SCHEDULING from insn-attr.h. But these cases are hard to discover. Usually you simply run into them while trying to re-arrange the #includes of a .c file. Ciao! Steven
Re: RFC - Next refactoring steps
On Fri, Sep 6, 2013 at 4:14 AM, Mike Stump wrote: > On Sep 5, 2013, at 11:54 PM, Richard Biener > wrote: >> Most of the GCC headerfiles do not include all their required headers but >> rely on .c files doing that (in the appropriate order). I somehow like >> that though I cannot explain why ;) > > Very old school. I can explain why I don't like it, but I'll resist. The > universal standard is for each header to include all it needs and for the > ordering of includes not to matter any. Deviations from this are the > exception and should virtually never should be the case. Agreed. Header files should be self-contained and include everything they need. This will be increasingly more important in the presence of C++ modules. Diego.
Re: Fix PR middle-end/57370
On Fri, Sep 6, 2013 at 12:05 AM, Richard Biener wrote: > On Thu, Sep 5, 2013 at 9:23 PM, Easwaran Raman wrote: >> On Tue, Aug 27, 2013 at 3:35 AM, Richard Biener >> wrote: >>> On Thu, Aug 1, 2013 at 1:34 AM, Easwaran Raman wrote: I have a new patch that supersedes this. The new patch also fixes PR tree-optimization/57393 and PR tree-optimization/58011. Bootstraps and no test regression on x86_64/linux. Ok for trunk? 2013-07-31 Easwaran Raman PR middle-end/57370 * tree-ssa-reassoc.c (build_and_add_sum): Fix UID assignment and reset of debug statements that cause inconsistent IR. >>> >>> Missing ChangeLog entry for the insert_stmt_after hunk which I do not like >>> at all. The other hunks are ok, but we need to work harder to preserve >>> debug stmts - simply removing all is not going to fly. >>> >>> Richard. >> >> I looked into the problem related to the debug stmts in this failing >> test case in detail. Initially, the code sequence looks >> >> >> s$n_13 = MEM[(struct S *)&s]; >> # DEBUG s$n => s$n_13 >> _2 = (double) s$n_13; >> _4 = _2 * a_3(D); >> _6 = (double) i_5(D); >> _7 = _4 * _6; >> _9 = (double) j_8(D); >> _10 = _7 * _9; >> bar (_10); >> # DEBUG D#2 => (double) k_12(D) >> # DEBUG D#1 => _7 * D#2 >> # DEBUG t$n => (int) D#1 >> >> After reassociation >> >> s$n_13 = MEM[(struct S *)&s]; >> # DEBUG s$n => s$n_13 >> _2 = (double) s$n_13; >> _6 = (double) i_5(D); >> # DEBUG D#3 => _4 * _6 >> _9 = (double) j_8(D); >> _4 = _9 * _2; >> _7 = _4 * a_3(D); >> _10 = _7 * _6; >> bar (_10); >> # DEBUG D#2 => (double) k_12(D) >> # DEBUG D#1 => D#3 * D#2 >> # DEBUG t$n => (int) D#1 >> >> In the above, # DEBUG D#3 => _4 * _6 appears above the statement that >> defines _4. But even if I move the def of D#3 below the statement >> defining _4, it is not sufficient. Before reassociation, t$n refers to >> the expression (s$n_13 * a_3(D) * i_5(D) * k_12(D)), but after >> reassociation it would refer to ( j_8(D) * s$n_13 * i_5(D) * >> k_12(D)). It seems the correct fix is to discard the debug temps whose >> RHS contains a variable that is involved in reassociation and then >> reconstruct it. Is that the expected fix here? > > The value of the DEBUG expression changes because the value > that _4 computes changes - that is a no-no that may not happen. > We cannot re-use _4 (without previously releasing it and allocating > it newly) with a new value. releasing _4 should have fixed up the > debug stmts. > > So - can you verify whether we are indeed just replacing the > RHS of _4 = _2 * a_3(D) to _9 * _2 without changing the SSA name of > that expression? I have confirmed that the SSA name of the LHS remains the same. The reassociation code simply calls gimple_assign_set_rhs2 to modify RHS2 and then calls update_stmt. - Easwaran > Another reason why re-using the same LHS for another value is > wrong is that annotated information like points-to sets, alignment > and soon value-range information will be wrong. > > Thanks, > Richard. > >> - Easwaran >> testsuite/ChangeLog: 2013-07-31 Easwaran Raman PR middle-end/57370 PR tree-optimization/57393 PR tree-optimization/58011 * gfortran.dg/reassoc_12.f90: New testcase. * gcc.dg/tree-ssa/reassoc-31.c: New testcase. * gcc.dg/tree-ssa/reassoc-31.c: New testcase. On Fri, Jul 12, 2013 at 7:57 AM, Easwaran Raman wrote: > Ping. > > On Thu, Jun 27, 2013 at 10:15 AM, Easwaran Raman > wrote: >> A newly generated statement in build_and_add_sum function of >> tree-ssa-reassoc.c has to be assigned the UID of its adjacent >> statement. In one instance, it was assigned the wrong uid (of an >> earlier phi statement) which messed up the IR and caused the test >> program to hang. Bootstraps and no test regressions on x86_64/linux. >> Ok for trunk? >> >> Thanks, >> Easwaran >> >> >> 2013-06-27 Easwaran Raman >> >> PR middle-end/57370 >> * tree-ssa-reassoc.c (build_and_add_sum): Do not use the UID of >> a phi >> node for a non-phi gimple statement. >> >> testsuite/ChangeLog: >> 2013-06-27 Easwaran Raman >> >> PR middle-end/57370 >> * gfortran.dg/reassoc_12.f90: New testcase. >> >> >> Index: gcc/testsuite/gfortran.dg/reassoc_12.f90 >> === >> --- gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) >> +++ gcc/testsuite/gfortran.dg/reassoc_12.f90 (revision 0) >> @@ -0,0 +1,74 @@ >> +! { dg-do compile } >> +! { dg-options "-O2 -ffast-math" } >> +! PR middle-end/57370 >> + >> + SUBROUTINE xb88_lr_adiabatic_lda_calc(e_ndrho_ndrho_ndrho, & >> + grad_deriv,npoints, sx) >> +IMPLICIT REAL*8 (t) >> +INTEGER, PARAMETER :: dp=8 >
Re: Simplify jump threading slightly
On 09/06/2013 02:10 PM, Gerald Pfeifer wrote: Hi Jeff, On Thu, 5 Sep 2013, Jeff Law wrote: Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Installed onto the trunk. is it possible this 2013-09-05 Jeff Law * tree-ssa-threadedge.c (thread_around_empty_blocks): Renamed from thread_around_empty_block. Record threading path into PATH. Recurse if threading through the initial block is successful. (thread_across_edge): Corresponding changes to slightly simplify. is causing http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58340 , or least unearthing it? That's amd64-unknown-freebsd8.0, pretty similar to your test platform, so this would be a bit surprising, but... I'll take a look. I'm not a big believer in coincidences -- with that code recently changing and you seeing a bootstrap failure, odds are, it's something in that change. The day is winding down, so it'll probably be monday before I get too far on it. jeff
Re: Simplify jump threading slightly
Hi Jeff, On Thu, 5 Sep 2013, Jeff Law wrote: > Bootstrapped and regression tested on x86_64-unknown-linux-gnu. > Installed onto the trunk. is it possible this 2013-09-05 Jeff Law * tree-ssa-threadedge.c (thread_around_empty_blocks): Renamed from thread_around_empty_block. Record threading path into PATH. Recurse if threading through the initial block is successful. (thread_across_edge): Corresponding changes to slightly simplify. is causing http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58340 , or least unearthing it? That's amd64-unknown-freebsd8.0, pretty similar to your test platform, so this would be a bit surprising, but... Gerald
Re: [PATCH, x86] Use vector moves in memmove expanding
> Michael, why did you change epilogue_size_needed to size_needed > here? It looks wrong to me. This function was changed in several places and meaning of 'size_needed' and 'epilogue_size_needed' could've been changed. It needs more careful examination and I'll do it shortly. Briefly, I significantly changed prologue and epilogue generation routines and that obviously caused that regression. I guess that in some of these function I began to look at size_needed instead of desired_align (usually, size_needed >= desired_align, but for pentium pro that's not true). I hope to dig into it on the weekend. Thanks for reporting and investigation. Michael > H.J.
Factor gimple structures out of gimple.h
This patch introduces gimple-core.h, which contains just the data structures needed to define gimple. I left everything else in gimple.h The addition of alias.h to tree-ssa-alias.h is so that we can include tree-ssa-alias.h in isolation. It now includes everything it needs. More discussion on rationale at the thread: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00300.html Tested on x86_64. 2013-09-06 Diego Novillo * Makefile.in (GIMPLE_CORE_H): New. (GIMPLE_H): Depend on GIMPLE_CORE_H. (TREE_SSA_ALIAS_H): New. Replace references to tree-ssa-alias.h with TREE_SSA_ALIAS_H. * gimple-core.h: New. Factor all gimple data structures out of ... * gimple.h: ... here. * tree-ssa-alias.h: Include alias.h. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index a72b753..2be8846 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -882,15 +882,19 @@ TREE_H = tree.h $(TREE_CORE_H) tree-check.h REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) \ cfg-flags.def cfghooks.h -GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \ +GIMPLE_CORE_H = gimple-core.h $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) \ + $(INPUT_H) gimple.def gsstruct.def $(TREE_SSA_ALIAS_H) \ + $(INTERNAL_FN_H) $(TREE_CORE_H) +GIMPLE_H = gimple.h $(GIMPLE_CORE_H) pointer-set.h $(VEC_H) \ $(GGC_H) $(BASIC_BLOCK_H) $(TREE_H) tree-ssa-operands.h \ - tree-ssa-alias.h $(INTERNAL_FN_H) $(HASH_TABLE_H) + $(HASH_TABLE_H) TRANS_MEM_H = trans-mem.h GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h COVERAGE_H = coverage.h $(GCOV_IO_H) DEMANGLE_H = $(srcdir)/../include/demangle.h RECOG_H = recog.h ALIAS_H = alias.h +TREE_SSA_ALIAS_H = tree-ssa-alias.h $(ALIAS_H) EMIT_RTL_H = emit-rtl.h FLAGS_H = flags.h flag-types.h $(OPTIONS_H) OPTIONS_H = options.h flag-types.h $(OPTIONS_H_EXTRA) @@ -946,7 +950,7 @@ TREE_PASS_H = tree-pass.h $(TIMEVAR_H) $(DUMPFILE_H) TREE_FLOW_H = tree-flow.h tree-flow-inline.h tree-ssa-operands.h \ $(BITMAP_H) sbitmap.h $(BASIC_BLOCK_H) $(GIMPLE_H) \ $(HASHTAB_H) $(CGRAPH_H) $(IPA_REFERENCE_H) \ - tree-ssa-alias.h + $(TREE_SSA_ALIAS_H) TREE_HASHER_H = tree-hasher.h $(HASH_TABLE_H) $(TREE_FLOW_H) TREE_SSA_LIVE_H = tree-ssa-live.h $(PARTITION_H) SSAEXPAND_H = ssaexpand.h $(TREE_SSA_LIVE_H) @@ -2234,7 +2238,7 @@ test-dump.o : test-dump.c $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) \ $(BITMAP_H) sbitmap.h sreal.h $(TREE_H) $(CXX_PARSER_H) $(DWARF2OUT_H) \ $(GIMPLE_PRETTY_PRINT_H) $(BASIC_BLOCK_H) insn-config.h $(LRA_INT.H) \ $(SEL_SCHED_DUMP_H) $(IRA_INT_H) $(TREE_DATA_REF_H) $(TREE_FLOW_H) \ - $(TREE_SSA_LIVE_H) tree-ssa-alias.h $(OMEGA_H) $(RTL_H) + $(TREE_SSA_LIVE_H) $(TREE_SSA_ALIAS_H) $(OMEGA_H) $(RTL_H) tree.o: tree.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \ all-tree.def $(FLAGS_H) $(FUNCTION_H) $(PARAMS_H) \ toplev.h $(DIAGNOSTIC_CORE_H) $(GGC_H) $(HASHTAB_H) $(TARGET_H) output.h $(TM_P_H) \ @@ -2722,7 +2726,7 @@ targhooks.o : targhooks.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TREE_H) \ $(EXPR_H) $(TM_H) $(RTL_H) $(TM_P_H) $(FUNCTION_H) output.h $(DIAGNOSTIC_CORE_H) \ $(MACHMODE_H) $(TARGET_DEF_H) $(TARGET_H) $(GGC_H) gt-targhooks.h \ $(OPTABS_H) $(RECOG_H) $(REGS_H) reload.h hard-reg-set.h intl.h $(OPTS_H) \ - tree-ssa-alias.h $(TREE_FLOW_H) + $(TREE_SSA_ALIAS_H) $(TREE_FLOW_H) common/common-targhooks.o : common/common-targhooks.c $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(INPUT_H) $(TM_H) $(COMMON_TARGET_H) common/common-targhooks.h @@ -2746,7 +2750,7 @@ toplev.o : toplev.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \ langhooks.h insn-flags.h $(CFGLOOP_H) hosthooks.h \ $(CGRAPH_H) $(COVERAGE_H) alloc-pool.h $(GGC_H) \ $(OPTS_H) params.def tree-mudflap.h $(TREE_PASS_H) $(GIMPLE_H) \ - tree-ssa-alias.h $(PLUGIN_H) realmpfr.h tree-diagnostic.h \ + $(TREE_SSA_ALIAS_H) $(PLUGIN_H) realmpfr.h tree-diagnostic.h \ $(TREE_PRETTY_PRINT_H) opts-diagnostic.h $(COMMON_TARGET_H) \ tsan.h diagnostic-color.h $(CONTEXT_H) $(PASS_MANAGER_H) @@ -3303,7 +3307,7 @@ alias.o : alias.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(DUMPFILE_H) \ $(ALIAS_H) $(EMIT_RTL_H) $(GGC_H) $(FUNCTION_H) cselib.h $(TREE_H) $(TM_P_H) \ langhooks.h $(TARGET_H) gt-alias.h $(TIMEVAR_H) $(CGRAPH_H) \ $(SPLAY_TREE_H) $(DF_H) \ - tree-ssa-alias.h pointer-set.h $(TREE_FLOW_H) + $(TREE_SSA_ALIAS_H) pointer-set.h $(TREE_FLOW_H) stack-ptr-mod.o : stack-ptr-mod.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ $(TM_H) $(TREE_H) $(RTL_H) $(REGS_H) $(EXPR_H) $(TREE_PASS_H) \ $(BASIC_BLOCK_H) $(FLAGS_H) output.h $(DF_H) @@ -3819,7 +3823,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ $(srcdir)/reg-stack.c $(srcdir)/cfgrtl.c \ $(srcdir)/sdbout.c $(srcdir)/stor-layout.c \ $(srcdir)/stringpool.c $(s
Re: [PATCH] Fixing improper conversion from sin() to sinf() in optimization mode.
First, thank you for your detailed comments again! Then I deeply apologize for not explaining my patch properly and responding to your previous comment. I didn't understand thoroughly the problem before submitting the patch. Previously I only considered the following three conversions for sqrt(): 1: (float) sqrt ((double) float_val) -> sqrtf (float_val) 2: (float) sqrtl ((long double) float_val) -> sqrtf (float_val) 3: (double) sqrtl ((long double) double_val) -> sqrt (double_val) We have four types here: TYPE is the type to which the result of the function call is converted. ITYPE is the type of the math call expression. TREE_TYPE(arg0) is the type of the function argument (before type conversion). NEWTYPE is chosen from TYPE and TREE_TYPE(arg0) with higher precision. It will be the type of the new math call expression after conversion. For all three cases above, TYPE is always the same as NEWTYPE. That is why I only considered TYPE during the precision comparison. ITYPE can only be double_type_node or long_double_type_node depending on the type of the math function. That is why I explicitly used those two types instead of ITYPE (no correctness issue). But you are right, ITYPE is more elegant and better here. After further analysis, I found I missed two more cases. Note that we have the following conditions according to the code in convert.c: TYPE_PRECISION(NEWTYPE) >= TYPE_PRECISION(TYPE) TYPE_PRECISION(NEWTYPE) >= TYPE_PRECISION(TREE_TYPE(arg0)) TYPE_PRECISION (NEWTYPE) < TYPE_PRECISION (ITYPE) the last condition comes from the fact that we only consider converting a math function call into another one with narrower type. Therefore we have TYPE_PRECISION(TYPE) < TYPE_PRECISION (ITYPE) TYPE_PRECISION(TREE_TYPE(arg0)) < TYPE_PRECISION (ITYPE) So for sqrt(), TYPE and TREE_TYPE(arg0) can only be float, and for sqrtl(), TYPE and TREE_TYPE(arg0) can be either float or double with four possible combinations. Therefore we have two more conversions to consider besides the three ones I mentioned above: 4: (float) sqrtl ((long double) double_val) -> (float) sqrt (double_val) 5: (double) sqrtl ((long double) float_val) -> sqrt ((double) float_val) For the first conversion here, TYPE (float) is different from NEWTYPE (double), and my previous patch doesn't handle this case.The correct way is to compare precisions of ITYPE and NEWTYPE now. To sum up, we are converting the expression (TYPE) sqrtITYPE ((ITYPE) expr) to (TYPE) sqrtNEWTYPE ((NEWTYPE) expr) and we require PRECISION (ITYPE) >= PRECISION (NEWTYPE) * 2 + 2 to make it a safe conversion. The new patch is pasted below. I appreciate your detailed comments and analysis, and next time when I submit a patch I will be more carefully about the reviewer's comment. Thank you! Cong Index: gcc/convert.c === --- gcc/convert.c (revision 201891) +++ gcc/convert.c (working copy) @@ -135,16 +135,19 @@ convert_to_real (tree type, tree expr) CASE_MATHFN (COS) CASE_MATHFN (ERF) CASE_MATHFN (ERFC) - CASE_MATHFN (FABS) CASE_MATHFN (LOG) CASE_MATHFN (LOG10) CASE_MATHFN (LOG2) CASE_MATHFN (LOG1P) - CASE_MATHFN (LOGB) CASE_MATHFN (SIN) - CASE_MATHFN (SQRT) CASE_MATHFN (TAN) CASE_MATHFN (TANH) +/* The above functions are not safe to do this conversion. */ +if (!flag_unsafe_math_optimizations) + break; + CASE_MATHFN (SQRT) + CASE_MATHFN (FABS) + CASE_MATHFN (LOGB) #undef CASE_MATHFN { tree arg0 = strip_float_extensions (CALL_EXPR_ARG (expr, 0)); @@ -155,6 +158,27 @@ convert_to_real (tree type, tree expr) if (TYPE_PRECISION (TREE_TYPE (arg0)) > TYPE_PRECISION (type)) newtype = TREE_TYPE (arg0); + /* We consider to convert + + (T1) sqrtT2 ((T2) exprT3) + to + (T1) sqrtT4 ((T4) exprT3) + + , where T1 is TYPE, T2 is ITYPE, T3 is TREE_TYPE (ARG0), + and T4 is NEWTYPE. All those types are of floating point types. + T4 (NEWTYPE) should be narrower than T2 (ITYPE). This conversion + is safe only if P1 >= P2*2+2, where P1 and P2 are precisions of + T2 and T4. See the following URL for a reference: + http://stackoverflow.com/questions/9235456/determining-floating-point-square-root + */ + if (fcode == BUILT_IN_SQRT || fcode == BUILT_IN_SQRTL) + { + int p1 = REAL_MODE_FORMAT (TYPE_MODE (itype))->p; + int p2 = REAL_MODE_FORMAT (TYPE_MODE (newtype))->p; + if (p1 < p2 * 2 + 2 && !flag_unsafe_math_optimizations) +break; + } + /* Be careful about integer to fp conversions. These may overflow still. */ if (FLOAT_TYPE_P (TREE_TYPE (arg0)) Index: gcc/testsuite/gcc.c-torture/execute/20030125-1.c === --- gcc/testsuite/gcc.c-torture/execute/20030125-1.c (revision 201891) +++ gcc/testsuite/gcc.c-torture/execute/20030125-1.c (working copy) @@ -44,11 +44,11 @@ __attribute__ ((noinline)) double sin(double a) { - abort (); + retu
Re: RFC - Next refactoring steps
On Sep 6, 2013, at 8:21 AM, Andrew MacLeod wrote: > these would be really good to identify and fix, if possible. > Do you know of any of the top of your head? Sure, they are easy to find. See below. I'll note one more use of fixing these. clang is adding modules to get compilation speed. Modules require that the inclusion of a header be the same in every instance that it is used. The usages below would prevent these headers from participating in being a module (and transitive closure of all they include them), thus dooming the software to slow compiles. C++ features exponential compile time, where/as C features logarithmic compile time. modules return things back to being linear and are worthwhile to control compilation speed issues. To be competitive, I think gcc will have to add modules eventually as well, and as the gcc source base grows, making use of modules to keep compilation time down is also worthwhile. dfp.h: #ifdef TREE_CODE bool decimal_real_arithmetic (REAL_VALUE_TYPE *, enum tree_code, const REAL_VALUE_TYPE *, const REAL_VALUE_TYPE *); #endif double-int.h: #ifndef GENERATOR_FILE /* Conversion to and from GMP integer representations. */ void mpz_set_double_int (mpz_t, double_int, bool); double_int mpz_get_double_int (const_tree, mpz_t, bool); #endif expr.h: #ifdef TREE_CODE extern rtx expand_variable_shift (enum tree_code, enum machine_mode, rtx, tree, rtx, int); extern rtx expand_shift (enum tree_code, enum machine_mode, rtx, int, rtx, int); extern rtx expand_divmod (int, enum tree_code, enum machine_mode, rtx, rtx, rtx, int); #endif function.h: #ifdef RTX_CODE extern void diddle_return_value (void (*)(rtx, void*), void*); extern void clobber_return_register (void); #endif target.h: #ifdef GCC_TM_H #ifndef CUMULATIVE_ARGS_MAGIC #define CUMULATIVE_ARGS_MAGIC ((void *) &targetm.calls) #endif static inline CUMULATIVE_ARGS * get_cumulative_args (cumulative_args_t arg) { #ifdef ENABLE_CHECKING gcc_assert (arg.magic == CUMULATIVE_ARGS_MAGIC); #endif /* ENABLE_CHECKING */ return (CUMULATIVE_ARGS *) arg.p; }
Re: RFA: Fix mark_referenced_resources to handle COND_EXEC
On Fri, Sep 6, 2013 at 12:22 PM, Joern Rennecke wrote: > 2013-04-30 Joern Rennecke <...> > > * resource.c (mark_referenced_resources): Handle COND_EXEC. This is OK. Ciao! Steven
Re: [RS6000] powerpc64 -mcmodel=medium large symbol offsets
On Fri, Sep 06, 2013 at 02:18:49PM -0400, David Edelsohn wrote: > On Fri, Sep 6, 2013 at 3:13 AM, Alan Modra wrote: > > The following testcase taken from the linux kernel is miscompiled on > > powerpc64-linux. > > > > /* -m64 -mcmodel=medium -O -S -fno-section-anchors */ > > static int x; > > > > unsigned long > > foo (void) > > { > > return ((unsigned long) &x) - 0xc000; > > } > > > > generates > > addis 3,2,x+4611686018427387904@toc@ha > > addi 3,3,x+4611686018427387904@toc@l > > blr > > > > losing the top 32 bits of the offset. Sadly, the assembler and linker > > do not complain, which is a hole in the ABI. (@ha and _HA relocs as > > per the ABI won't complain about overflow since they might be used in > > a @highesta, @highera sequence loading a 64-bit value.) > > > > This patch stops combine merging large offsets into a symbol addend > > by copying code from reg_or_add_cint_operand to a new predicate, > > add_cint_operand, and using that to restrict the range of offsets. > > Bootstrapped and regression tested powerpc64-linux. OK to apply? > > > > * config/rs6000/predicates.md (add_cint_operand): New. > > * config/rs6000/rs6000.md (largetoc_high_plus): Restrict offset > > using add_cint_operand. > > (largetoc_high_plus_aix): Likewise. > > This patch should include a testcase. > > But what user feedback are you expecting if the offset is too large, > such as your example? In my test with the patch, it produces an > unrecognizable insn error, which seems less than friendly. The testcase gives me .L.foo: lis 9,0x4000 sldi 9,9,32 addis 3,2,x@toc@ha addi 3,3,x@toc@l add 3,3,9 blr How did you manage to get an unrecognizable insn? I can't see how we generate the pattern except in combine. -- Alan Modra Australia Development Lab, IBM
Re: [PATCH] C++-ify and move control dependence code
On Fri, Sep 6, 2013 at 8:41 AM, Richard Biener wrote: >> I'd recommend re-implementing the control dependence code, then. The >> current implementation is basically taken from old RTL-SSA dce.c and >> uses a now old-fashioned view of the CFG, e.g. using edge lists. >> You're probably better off starting from the dominance frontiers code >> (control dependence is the same as dominance frontiers on the reverse >> CFG). ... > Well, I'm not in a position to do the re-implementation at the moment. You don't have to. You just copy the dominance frontier code and make it work on the reverse CFG. That shouldn't take more than half an hour or so, and the result is much easier to interpret/validate than the edge list stuff so you'll probably get that half our back easily when working on control dependence consumers. In fact, I already made those changes some time ago, see the cfganal.c parts of http://gcc.gnu.org/ml/gcc-patches/2013-01/txt00046.txt > Would a re-implementation give me control-dependence in a more useful > way for PHI nodes? That is, what block is an incoming edge (to compute > the PHI arg) control dependent on? I suppose so, yes. The inverse dominance-frontiers output is a relation between basic blocks, i.e. not edge based. So you'd get B=e->src and take the control dependencies of B. > The current DCE query looks for > control dependece of the incoming edges predecessor, but for > >if > |\ > | > | / >PHI <1, 2> > > this gets us the right answer for the edge from the block > but a wrong one for the one that uses the if block (as DCE has to > consider all incoming edges that probably doesn't matter as it > just gets one step ahead here instead of waiting for the control > dependence of the if block being marked). I'm not sure I'm following what you're trying to say here, but if I understand correctly, then there should be a check in DCE that the PHI block does not post-dominate its predecessor. In any case, I recommend you don't start from the DCE control dependence code. It is relatively slow (Harvey's algorithm is optimal, Morgan's is quadratic in the number of edges), and it doesn't compute control dependence for abnormal edges (I hacked that out, it made control dependence calculation too slow). You'll want "proper", correct control dependence, and the code in tree-ssa-dce.c is tweaked for the purpose of DCE. Ciao! Steven
Re: C++ demangler fix
OK, thanks. Jason
Re: Ping: RFA: Consider int and same-size short as equivalent vector components
On 09/05/2013 10:50 AM, Joern Rennecke wrote: (vector_types_compatible_elements_p): New function. Why do we need this as well as vector_types_convertible_p? For that matter, why do we need both vector_types_convertible_p and vector_targets_convertible_p? Jason
Re: Ping: RFA: Consider int and same-size short as equivalent vector components
Quoting Jason Merrill : On 09/05/2013 10:50 AM, Joern Rennecke wrote: (vector_types_compatible_elements_p): New function. Why do we need this as well as vector_types_convertible_p? For that matter, why do we need both vector_types_convertible_p and vector_targets_convertible_p? vector_targets_convertible_p is used for pointer types. The callers do a hop, skip and dance to check that the qualifiers are satisfactory, while OTOH vector_targets_convertible_p ignores the number of elements in the vectors. That's fine with vectors as we can consider, say, a vector of 8 elements as two consecutive vectors of 4 elements, and that works fine with pointers. vector_types_convertible is used for pointer value types. It could in principle call vector_targets_convertible_p as a subroutine, but then the check for vector type would be duplicated with its callers, and also the purpose of vector_targets_convertible_p would become muddled. Where vector_types_convertible returns true, a conversion might still be needed to make the types match. vector_types_compatible_elements_p was supposed to be a minimal change from same_scalar_type_ignoring_signedness to fix the treatment of opaque vectors in binary operators. Where it returns true, and the other checks of the caller succeed (being vector types in he first place, and matching number of elements), we can just treat the types as essentially the same. I suppose we could replace same_scalar_type_ignoring_signedness / vector_types_compatible_elements_p by vector_compatible_p, but that would change the behaviour in a number of ways. First, we would no longer allow sign changes by default, this would be only available with -flax-vector-conversions, and to the extend that the types_compatible_p lang hook says so; OTOH, whatever else it says is compatible, would then also be (as long as -flax-vector-conversions is in effect). Is that change wanted? Also, we would presumably need to insert conversions at times. And should we allow different-sized vectors to match? If so, what would be the result type?
Re: [RS6000] powerpc64 -mcmodel=medium large symbol offsets
On Sat, Sep 07, 2013 at 09:06:08AM +0930, Alan Modra wrote: > The testcase gives me > > .L.foo: > lis 9,0x4000 > sldi 9,9,32 > addis 3,2,x@toc@ha > addi 3,3,x@toc@l > add 3,3,9 > blr > > How did you manage to get an unrecognizable insn? I can't see how we > generate the pattern except in combine. Never mind. I updated and rebuilt from a clean tree and now see the failure too. "tocrefdi" is where combine is still munging together the large offset. -- Alan Modra Australia Development Lab, IBM